CN104036229A - Regression-based active appearance model initialization method - Google Patents
- Publication number: CN104036229A
- Application number: CN201310090347.5A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Abstract
The invention discloses a regression-based active appearance model (AAM) initialization method in the field of computer vision. The method is implemented as follows: when the active appearance model is used for automatic tracking of facial feature points, the target position in the first frame of the video is assumed to be known. During subsequent tracking, a double-threshold feature correspondence algorithm obtains discrete feature correspondences between adjacent frame images, and the spatial mapping between discrete feature points and structured calibration points, established by the kernel ridge regression algorithm, yields an initial calibration of the facial features. This greatly reduces the number of subsequent iterations while improving calibration accuracy. Compared with traditional AAM initialization methods, the present invention helps the active appearance model obtain more accurate facial feature point calibration results.
Description
Technical Field
The invention belongs to the technical field of image analysis, and in particular relates to a regression-based active appearance model initialization method.
Background Art
In computer vision research, locating target shape feature points with the Active Appearance Model (AAM) has been a focus of attention in recent years. AAM was first proposed by Edwards et al. in 1998 and has been widely applied to the registration and recognition of faces and other non-rigid objects. The AAM algorithm improves on the Active Shape Model (ASM): compared with ASM, it incorporates a global constraint by fusing shape and texture into a single statistical constraint, i.e., a statistical appearance constraint. Moreover, the AAM search procedure draws on the main idea of analysis-by-synthesis (ABS), continually adjusting the model parameters so that the model gradually approaches the actual input.
Although AAM is effective, for every variant of the algorithm the fitting efficiency and accuracy depend closely on the given initial position of the model. If the initial position is poor, the gradient-descent-based fitting in AAM is very likely to fall into a local minimum and produce a very poor target shape localization. The choice of initial feature points is therefore a key factor in the robustness and speed of the algorithm, and automatic, accurate initialization of the facial feature points can greatly improve its efficiency and precision. Traditional AAM initialization methods include: (1) using the mean face as the initial shape, which gives a large initialization error; (2) using the locations of facial features (such as the eyes and mouth), which places high demands on localization accuracy; (3) in video tracking, using the localization result of the previous frame as the initial information for the current frame, which only works when inter-frame changes are small.
Summary of the Invention
The object of the present invention is to address the shortcomings of existing active appearance model initialization methods by proposing a regression-based AAM initialization method, taking video tracking of facial feature points as the research object. Because the method exploits the correlation between frames, the invention can greatly improve target shape tracking performance even in a tracking environment with dropped frames.
The technical solution of the present invention is as follows: the algorithm assumes that the key feature points of the first frame are known. It first uses a double-threshold method to obtain accurate local feature point correspondences between the previous frame and the current frame, then uses the mapping between discrete local feature point correspondences and structured calibration points established by kernel ridge regression (KRR) to extract the calibration point positions of the current frame from the local feature correspondences. The specific implementation steps of the technical solution are as follows:
1. Select training videos and use the kernel ridge regression algorithm to establish the spatial mapping Mv between scattered local feature points and facial feature calibration points;
2. Detect the face with the cascade AdaBoost algorithm and normalize the face image to 250×250 pixels;
3. Compute the error e = (1/N)·Σx[I1(W(x;p)) − I2(W(x;p))]² between the reconstruction of the previous frame and the current reconstruction, where I1 and I2 are the previous-frame face image and the current face image, x is the pixel set under the mean shape s0, p is the deformation parameter from the mean shape s0 to the current reconstructed shape s, W(x;p) is the pixel set under the reconstructed shape s, and N is the number of pixels under the mean shape. If e > e0, the previous and current face images differ considerably; go to step 4. Otherwise go to step 5. Here e0 = 5e-4 is the error threshold.
4. Extract SIFT features from the previous-frame face image I1 and the current-frame face image I2 and match them with the double-threshold feature matching method, obtaining matched pairs C = {(ck, c′k), k = 1, 2, …, qC}, where qC is the number of matched pairs;
5. Extract the spatial vector V = {Vk, k = 1, 2, …, n} from the previous-frame face image I1, where n is the number of facial feature points.
6. Using the mapping Mv parameters obtained in step 1 and the matched points obtained in step 4, build the test-stage spatial position vector V of discrete feature points, feed it into the mapping Mv, and output the corresponding face calibration points, i.e., the initial value of the active appearance model used for tracking the current frame;
In the above regression-based active appearance model initialization method, step 1 is implemented as follows:
(1) Initialize the data and set the time index k = 0;
(2) Obtain scattered matched points between the previous-frame face image I1(k) and the current face image I2(k) with a matching method based on an equalized probability model, and set k = k + 1;
(3) Using the scattered matched points, obtain the KRR training data from the previous-frame face image and the current face image;
(4) If k < n, return to (1.2); otherwise collect the total training data, where n = 100 is the number of training sample pairs.
(5) For each face calibration point j, first compute the kernel function matrix Kj from the training data;
(6) Obtain the mapping set Mv = {fj(V), j = 1, 2, …, m}.
In the above regression-based active appearance model initialization method, step 4 is implemented as follows:
(1) Double-threshold initialization: set the initial thresholds η1 = 1.5 and η2 = 8; the iteration counters iter1 = 0 and iter2 = 0; the iteration limits iter_max1 = 10 and iter_max2 = 20; and the number of neighboring matched points t = 5;
(2) Extract SIFT features from the previous-frame face image I1 and the current face image I2;
(3) Perform SIFT feature matching with threshold η1, and set iter1 = iter1 + 1;
(4) If the number of matches is less than t and iter1 < iter_max1, set η1 = η1 + 0.15 and return to (4.3); if the number of matches is greater than t, or iter1 > iter_max1, go to (4.5);
(5) Perform SIFT feature matching with threshold η2, and set iter2 = iter2 + 1;
(6) If the number of matches is less than 2 and iter2 < iter_max2, set η2 = η2 + 0.02 and return to (4.5); if the number of matches is greater than 2 and less than 5, and iter2 < iter_max2, set η2 = η2 − 0.01 and return to (4.5); otherwise go to (4.7);
(7) Obtain the dense matching set B = {(bj, b′j), j = 1, 2, …, qB} produced by η1, where qB is the number of dense matches, and the exact matching set A = {(ai, a′i), i = 1, 2, …, qA} produced by η2, where qA is the number of exact matches;
(8) Compute the distance ratio between the first and last pairs of matched points in the exact matching set;
(9) For each matched pair (bj, b′j) in the dense matching set, first compute the associated distances, then compute θ = atan((at(y) − bj(y))/(at(x) − bj(x))) and θ′ = atan((a′t(y) − b′j(y))/(a′t(x) − b′j(x))), and compute sigx = sign(at(y) − bj(y)), sigy = sign(at(x) − bj(x)), sig′x = sign(a′t(y) − b′j(y)), sig′y = sign(a′t(x) − b′j(x)). If the consistency conditions on the distance ratio, the angles, and the signs hold, keep the current match; otherwise delete it. Finally, the final matched pairs C = {(ck, c′k), k = 1, 2, …, qC} are obtained, where qC is the number of final matches.
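The SIFT matching of sub-steps (3) and (5) can be sketched with a simple ratio-style threshold test. This is a minimal stand-in assuming descriptors are already given as numpy arrays; the exact form of the threshold comparison (here: keep a match when the second-nearest distance is at least η times the nearest) is an assumption, since the source does not spell it out.

```python
import numpy as np

def match_descriptors(desc1, desc2, eta):
    """Greedy nearest-neighbour matching with a distance-ratio threshold,
    a simplified stand-in for one pass of the double-threshold SIFT matching.
    desc1, desc2: (n1, d) and (n2, d) descriptor arrays (e.g. 128-D SIFT).
    A descriptor in desc1 is matched to its nearest neighbour in desc2 when
    the second-nearest distance is at least eta times the nearest (larger
    eta is stricter under this assumed criterion)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if second >= eta * best:
            matches.append((i, int(order[0])))
    return matches

# Toy descriptors: each row of desc1 has one clear neighbour in desc2.
desc1 = np.array([[0.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.0, 0.1], [10.0, 10.0], [5.0, 5.1]])
m15 = match_descriptors(desc1, desc2, eta=1.5)
```

With η = 1.5 (the patent's initial η1), both toy descriptors pass the ratio test and are matched to their nearest neighbours.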
In the above regression-based active appearance model initialization method, sub-step (3) of step 1 is carried out as follows:
(1) Let j be the current calibration point in I1(k), o the center of the current t neighboring matched points in I1(k), and o′ the center of the corresponding t matched points in I2(k);
(2) For each matched point i (i = 1, …, t), compute the distance from i to the matched-point center o and the angle between the line oi and the x-axis, (di, θi), as well as the distance from the matched point i′ to the center o′ and the angle between the line o′i′ and the x-axis, (d′i, θ′i);
(3) In I1(k), compute the distance from the calibration point j to the center o and the angle between the line oj and the x-axis, (dl, θl);
(4) In I2(k), compute the distance from the calibration point j′ to the center o′ and the angle between the line o′j′ and the x-axis, (dr, θr);
(5) Form the input training data Vj = (d1, θ1, …, d6, θ6, d′1, θ′1, …, d′6, θ′6, dl, θl) and the corresponding output training data (Δd, Δθ), forming the training sample at time k, where Δd = dr/dl and Δθ = θr − θl;
(6) Add j and j′ to the matched-point set as a new matched pair, and iterate until all calibration points have been traversed.
Compared with the prior art, the method of the present invention has the following outstanding substantive features and significant advantages:
(1) Current automatic face calibration techniques demand a good initial pose, require many iterations, and calibrate slowly. To address this, training data are used to learn the spatial relationship between discrete feature points and structured calibration points, so that the face calibration of the current frame can be initialized from online discrete feature correspondences;
(2) Kernel ridge regression is used to obtain the mapping function between discrete feature points and structured calibration points, striking a good balance between regression accuracy and speed;
(3) By establishing the mapping between discrete feature points and structured calibration points, the face calibration points can be initialized from online discrete feature correspondences, improving both the speed and accuracy of the final calibration.
The active appearance model initialization method provided by the present invention greatly improves the localization and tracking accuracy of the target shape, provides more comprehensive and accurate feature point information for subsequent face analysis, and achieves an ideal calibration effect. It has broad application prospects in civilian and military fields such as intelligent video conferencing, film and television production, and security monitoring of public places.
Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the regression-based active appearance model initialization method of the present invention.
Fig. 2 shows the tracking results on sample frames from the FGNet Talking Face video.
Fig. 3 compares the tracking error of AAM tracking based on the proposed initialization method with that of traditional AAM tracking.
Detailed Description of the Embodiments
The present invention is further described below with reference to Fig. 1.
Referring to the flow chart in Fig. 1, the regression-based active appearance model initialization method of the present invention works as follows. In the training stage, the algorithm uses the kernel ridge regression (KRR) algorithm to establish the mapping between discrete local feature point correspondences and structured calibration points. In the test stage, assuming the key feature points of the first frame are known, the double-threshold method obtains accurate local feature point correspondences between the previous frame and the current frame, and the trained mapping between discrete local feature correspondences and structured calibration points is used to extract the calibration point positions of the current frame from the local feature correspondences. The specific implementation of each step is described below:
1. Select training videos and use the kernel ridge regression algorithm to establish the spatial mapping Mv between scattered local feature points and facial feature calibration points. The specific steps of this process are:
(1) Initialize the data and set the time index k = 0;
(2) Obtain scattered matched points between the previous-frame face image I1(k) and the current face image I2(k) with a matching method based on an equalized probability model, and set k = k + 1;
(3) Using the scattered matched points, obtain the KRR training data from the previous-frame face image and the current face image, where m = 66 is the number of calibration points; the specific steps are:
(a) Let p be the current calibration point in the frontal face, o the center of the current k matched points in the frontal face, and o′ the center of the corresponding k matched points in the profile face;
(b) For each matched point i (i ranges over the matched points), compute the distance from i to the matched-point center o and the angle between the line oi and the x-axis, (di, θi), as well as the distance from the matched point i′ to the center o′ and the angle between the line o′i′ and the x-axis, (d′i, θ′i);
(c) In the frontal face, compute the distance from the calibration point p to the center o and the angle between the line op and the x-axis, (dl, θl);
(d) In the profile face, compute the distance from the calibration point p′ to the center o′ and the angle between the line o′p′ and the x-axis, (dr, θr);
(e) Form the input training data Nx = (d1, θ1, …, d6, θ6, d′1, θ′1, …, d′6, θ′6, dl, θl) and the corresponding output training data Ny = (Δd, Δθ), where Δd = dr/dl and Δθ = θr − θl;
(f) Add p and p′ to the matched-point set as a new matched pair, and iterate until all calibration points have been traversed.
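Sub-steps (b)-(d) amount to converting points into distance/angle pairs relative to the matched-point center. A small sketch of that conversion (function names are illustrative, not from the source):

```python
import math

def centroid(points):
    """Center o of a set of 2-D matched points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def polar_features(points, center):
    """Distance/angle pairs (d_i, theta_i) of each point relative to the
    given center, with the angle measured against the x-axis via atan2,
    as in sub-steps (b)-(d)."""
    cx, cy = center
    feats = []
    for (x, y) in points:
        d = math.hypot(x - cx, y - cy)
        theta = math.atan2(y - cy, x - cx)
        feats.append((d, theta))
    return feats

# A calibration point due east of the matched-point center: distance 3, angle 0.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
o = centroid(pts)
(d_l, th_l), = polar_features([(o[0] + 3.0, o[1])], o)
```

The training target for one calibration point would then be formed as Δd = dr/dl and Δθ = θr − θl from the frontal and profile versions of these features.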
(4) If k < n, return to (1.2); otherwise collect the total training data, where n = 100 is the number of training sample pairs.
(5) For each face calibration point j, first compute the kernel function matrix Kj from the training data, with k1 = 1, 2, …, n and k2 = 1, 2, …, n, where σ = 0.025; then create an identity matrix I of the same size as Kj, with I(k1, k1) = 1 for k1 = 1, 2, …, n; then compute the kernel coefficient vector αj, where λ = 0.5×10⁻⁷; finally, from the above computations, obtain the regression kernel function fj.
(6) Obtain the mapping set Mv = {fj(V), j = 1, 2, …, m}.
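Step (5) can be sketched as standard kernel ridge regression. σ = 0.025 and λ = 0.5×10⁻⁷ are taken from the text, but the kernel formulas are not legible in the source, so the common Gaussian kernel k(u, v) = exp(−||u − v||²/(2σ²)) and the closed-form solution α = (K + λI)⁻¹Y are assumptions:

```python
import numpy as np

def krr_fit(X, Y, sigma=0.025, lam=0.5e-7):
    """Kernel ridge regression with an assumed Gaussian kernel.
    X: (n, d) training inputs, Y: (n, m) training targets.
    Returns a predictor f(v) = k(v, X) @ alpha with alpha = (K + lam*I)^-1 Y."""
    X = np.asarray(X, float)
    # Pairwise squared distances and Gaussian kernel matrix K_j
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sigma ** 2))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(Y, float))

    def predict(v):
        kv = np.exp(-np.sum((X - np.asarray(v, float)) ** 2, axis=1)
                    / (2 * sigma ** 2))
        return kv @ alpha
    return predict

# Tiny example: with negligible lambda the regressor nearly interpolates
# its training targets.
X = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
f = krr_fit(X, Y)
```

The patent trains one such regressor fj per calibration point j, collected into the mapping set Mv.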
2. Detect the face with the cascade AdaBoost algorithm and normalize the face image to 250×250 pixels;
3. Compute the error e = (1/N)·Σx[I1(W(x;p)) − I2(W(x;p))]² between the reconstruction of the previous frame and the current reconstruction, where I1 and I2 are the previous-frame face image and the current face image, x is the pixel set under the mean shape s0, p is the deformation parameter from the mean shape s0 to the current reconstructed shape s, W(x;p) is the pixel set under the reconstructed shape s, and N is the number of pixels under the mean shape. If e > e0, the previous and current face images differ considerably; go to step 4. Otherwise go to step 5. Here e0 = 5e-4 is the error threshold.
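The frame-difference test of step 3 reduces to a mean-squared pixel error. A minimal numpy sketch, assuming the two warped images have already been sampled on the mean-shape pixel grid and normalized to [0, 1]:

```python
import numpy as np

def reconstruction_error(warped_prev, warped_cur):
    """Mean-squared pixel error between two images sampled on the same
    mean-shape pixel grid. The warp W(x;p) is assumed to have been applied
    already; N is the number of pixels under the mean shape."""
    warped_prev = np.asarray(warped_prev, dtype=float)
    warped_cur = np.asarray(warped_cur, dtype=float)
    n = warped_prev.size
    return float(np.sum((warped_prev - warped_cur) ** 2) / n)

E0 = 5e-4  # error threshold e0 from the text

# Identical frames give zero error; a large enough change triggers re-matching.
a = np.full((10, 10), 0.5)
b = a.copy()
b[0, 0] += 0.5
use_matching = reconstruction_error(a, b) > E0
```

When `use_matching` is true the algorithm proceeds to the SIFT matching step; otherwise it reuses the previous correspondences.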
4. Extract SIFT features from the previous-frame face image I1 and the current-frame face image I2 and match them with the double-threshold feature matching method, obtaining matched pairs C = {(ck, c′k), k = 1, 2, …, qC}, where qC is the number of matched pairs. The specific steps of this process are:
(1) Double-threshold initialization: set the initial thresholds η1 = 1.5 and η2 = 8; the iteration counters iter1 = 0 and iter2 = 0; the iteration limits iter_max1 = 10 and iter_max2 = 20; and the number of neighboring matched points t = 5;
(2) Extract SIFT features from the previous-frame face image I1 and the current face image I2;
(3) Perform SIFT feature matching with threshold η1, and set iter1 = iter1 + 1;
(4) If the number of matches is less than t and iter1 < iter_max1, set η1 = η1 + 0.15 and return to (4.3); if the number of matches is greater than t, or iter1 > iter_max1, go to (4.5);
(5) Perform SIFT feature matching with threshold η2, and set iter2 = iter2 + 1;
(6) If the number of matches is less than 2 and iter2 < iter_max2, set η2 = η2 + 0.02 and return to (4.5); if the number of matches is greater than 2 and less than 5, and iter2 < iter_max2, set η2 = η2 − 0.01 and return to (4.5); otherwise go to (4.7);
(7) Obtain the dense matching set B = {(bj, b′j), j = 1, 2, …, qB} produced by η1, where qB is the number of dense matches, and the exact matching set A = {(ai, a′i), i = 1, 2, …, qA} produced by η2, where qA is the number of exact matches;
(8) Compute the distance ratio between the first and last pairs of matched points in the exact matching set;
(9) For each matched pair (bj, b′j) in the dense matching set, first compute the associated distances, then compute θ = atan((at(y) − bj(y))/(at(x) − bj(x))) and θ′ = atan((a′t(y) − b′j(y))/(a′t(x) − b′j(x))), and compute sigx = sign(at(y) − bj(y)), sigy = sign(at(x) − bj(x)), sig′x = sign(a′t(y) − b′j(y)), sig′y = sign(a′t(x) − b′j(x)). If the consistency conditions on the distance ratio, the angles, and the signs hold, keep the current match; otherwise delete it. Finally, the final matched pairs C = {(ck, c′k), k = 1, 2, …, qC} are obtained, where qC is the number of final matches.
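The threshold-adaptation logic of sub-steps (3)-(4) above can be sketched as a small loop. `match_fn` is a caller-supplied stand-in for the SIFT matcher; only the "too few matches, loosen the threshold" branch is modeled, so this is an illustration of the control flow, not the full double-threshold scheme:

```python
def adapt_threshold(match_fn, eta, step, min_matches, iter_max):
    """Repeatedly match, adjusting the threshold by `step` until at least
    `min_matches` matches are found or `iter_max` iterations are spent.
    match_fn(eta) returns the list of matches for a given threshold."""
    it = 0
    matches = match_fn(eta)
    while len(matches) < min_matches and it < iter_max:
        eta += step
        it += 1
        matches = match_fn(eta)
    return matches, eta

# Toy match function: pretends more matches appear as the threshold changes.
fake = lambda eta: list(range(int(eta * 2)))
m, eta_final = adapt_threshold(fake, eta=1.5, step=0.15,
                               min_matches=5, iter_max=10)
```

With the patent's η1 = 1.5, step 0.15, t = 5, and iter_max1 = 10, the toy loop stops once five matches are reached.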
5. Extract the spatial vector V = {Vk, k = 1, 2, …, n} from the previous-frame face image I1, where n is the number of facial feature points.
6. Using the mapping Mv parameters obtained in step 1 and the matched points obtained in step 4, build the test-stage spatial position vector V of discrete feature points, feed it into the mapping Mv, and output the corresponding face calibration points, i.e., the initial value of the active appearance model used for tracking the current frame.
The proposed regression-based active appearance model initialization method was verified using the frames of the FGNet Talking Face video as test images.
Fig. 2 shows the tracking results on sample frames from the FGNet Talking Face video.
Fig. 3 compares the tracking error of AAM tracking based on the proposed regression-based initialization method with that of traditional AAM tracking. The error formula is given in equation (1), where the ground-truth face calibration point coordinates are (x0i, y0i) and the calibration point coordinates produced by the algorithm are (xi, yi), with i = 1, …, N; N is the number of calibration points, and N = 66 for the algorithm described here.
where M is the number of tracked frames.
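Equation (1) itself is not legible in the source. A plausible reconstruction from the surrounding definitions (ground truth (x0i, y0i), estimates (xi, yi), N = 66 points, M tracked frames) is the point-to-point Euclidean error averaged over points and frames; this form is an assumption, not quoted from the patent:

```latex
e_{\mathrm{track}} = \frac{1}{M}\sum_{f=1}^{M}\frac{1}{N}\sum_{i=1}^{N}
\sqrt{\left(x_i^{(f)}-x_i^{0\,(f)}\right)^2+\left(y_i^{(f)}-y_i^{0\,(f)}\right)^2}
\tag{1}
```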
As can be seen from the figure, compared with the traditional AAM initialization method, the initialization method proposed by the present invention helps AAM obtain more accurate tracking results.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310090347.5A CN104036229A (en) | 2013-03-10 | 2013-03-10 | Regression-based active appearance model initialization method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN104036229A true CN104036229A (en) | 2014-09-10 |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105528364A (en) * | 2014-09-30 | 2016-04-27 | 株式会社日立制作所 | Iterative video image retrieval method and device |
| CN105068748A (en) * | 2015-08-12 | 2015-11-18 | 上海影随网络科技有限公司 | User interface interaction method in camera real-time picture of intelligent touch screen equipment |
| CN108475424A (en) * | 2016-07-12 | 2018-08-31 | 微软技术许可有限责任公司 | Methods, devices and systems for 3D feature trackings |
| CN108475424B (en) * | 2016-07-12 | 2023-08-29 | 微软技术许可有限责任公司 | Method, device and system for 3D face tracking |
| CN106682582A (en) * | 2016-11-30 | 2017-05-17 | 吴怀宇 | Compressed sensing appearance model-based face tracking method and system |
| CN107391851A (en) * | 2017-07-26 | 2017-11-24 | 江南大学 | A soft-sensing modeling method for glutamic acid fermentation process based on kernel ridge regression |
| CN109934042A (en) * | 2017-12-15 | 2019-06-25 | 吉林大学 | Adaptive video object behavior trajectory analysis method based on convolutional neural network |
| CN108268840A (en) * | 2018-01-10 | 2018-07-10 | 浙江大华技术股份有限公司 | A kind of face tracking method and device |
| CN111348029A (en) * | 2020-03-16 | 2020-06-30 | 吉林大学 | A method for determining optimal values of calibration parameters for hybrid electric vehicles considering operating conditions |
| CN111348029B (en) * | 2020-03-16 | 2021-04-06 | 吉林大学 | Method for determining optimal value of calibration parameter of hybrid electric vehicle by considering working condition |
| CN112800957A (en) * | 2021-01-28 | 2021-05-14 | 内蒙古科技大学 | Video pedestrian re-identification method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
| WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 2014-09-10