CN103456029B - A Mean Shift tracking method resistant to similar-color and illumination-variation interference - Google Patents
A Mean Shift tracking method resistant to similar-color and illumination-variation interference
- Publication number: CN103456029B
- Application number: CN201310395734.XA
- Authority: CN (China)
- Legal status: Expired - Fee Related
Abstract
A Mean Shift tracking method resistant to interference from similar colors and illumination changes. Starting from the target representation, the method fully exploits the ordering relationship between the gray value of a target pixel and those of its eight neighbors, extends the local saliency operator LSN, and proposes a local saliency texture operator, the Local Ternary Number (LTN), to improve the discriminative power of the target representation against similarly colored backgrounds. To further improve the discriminative power of the texture feature, key pixels on edges, lines and corners are extracted to generate a target mask; the LTN features of the target pixels inside the mask are then combined with chrominance information, which is little affected by illumination changes, to obtain a new target model with improved resistance to similar-color and illumination-change interference. Finally, the proposed target model is embedded into the Mean Shift tracking framework, so that the target can still be tracked continuously and stably even when the scene contains similarly colored background and illumination-intensity changes.
Description
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a Mean Shift tracking method resistant to interference from similar colors and illumination changes.
Background Art
Target tracking means locating a target in the next frame of a video sequence from parameters such as the position of the region of interest in the current frame. It is an important component of computer vision and is widely applied in military and civilian fields such as human-computer interaction, intelligent surveillance, and visual navigation. In real tracking scenes, the background color may resemble the target to be tracked and impair tracking accuracy, and global illumination changes can also disturb tracking stability; continuous and accurate target tracking has therefore been a research hotspot and difficulty in computer vision in recent years.
At present, in the field of target tracking, the kernel tracking method proposed by Comaniciu et al. is the most representative; see the following reference:
[1] Comaniciu D, Ramesh V, Meer P. Kernel-based object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-577.
The kernel tracking method uses a kernel-weighted color histogram as the target description model and, through Mean Shift iterative optimization, takes the candidate region with the largest Bhattacharyya coefficient as the tracking result. It achieves good results in general scenes and is efficient and practical. However, describing the target with color information alone gives an insufficient representation, which is easily disturbed by similarly colored background and illumination changes in the scene, leading to inaccurate tracking or even failure.
Since the target representation has an important influence on tracking results, many researchers have started from the representation and proposed various improvements to the performance of the kernel tracking algorithm based on the traditional RGB color model.
Color and texture are two complementary low-level visual features: human visual cognition usually first finds a target by its color and then discriminates it by its texture, and jointly representing the target with combined color and texture features has become a consensus among researchers. Ning et al. proposed a joint color-texture histogram target model for the kernel tracking method; see the following reference:
[2] Ning J, Zhang L, Zhang D, et al. Robust object tracking using joint color-texture histogram[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2009, 23(07): 1245-1263.
This model combines the RGB color model with the 5 key Uniform LBP texture patterns, which strengthens the representation of the target to some extent with good real-time performance; however, it discards many LBP texture patterns, which limits the discriminative power of the texture feature.
By analyzing and simplifying the LBP texture operator, Tavakoli et al. proposed a new visual descriptor, the Local Similarity Number (LSN), which counts, for each pixel, the number of its 8 neighboring pixels with a similar gray value as a measure of the local saliency of that pixel. Combining LSN with the RGB color model yields a color-saliency target model, which they introduced into the Mean Shift framework for target tracking; see the following reference:
[3] Tavakoli H R, Moin M S, Heikkila J. Local Similarity Number and its application to object tracking[J]. International Journal of Advanced Robotic Systems, 2013, 10.
Compared with the color-texture model of Ning et al., this model makes full use of every local structure of the 8 neighboring pixels to extract more useful information, and Tavakoli et al. showed experimentally that it tracks better than the method of Ning et al. Its shortcoming is that each saliency level of the LSN operator may cover several different local texture structures that the operator cannot distinguish, which hinders separating the target from a similarly colored background; moreover, the RGB color model is strongly affected by illumination-intensity changes, harming the tracking stability of the algorithm in outdoor scenes with varying illumination.
Summary of the Invention
To solve the above problems, the object of the present invention is to provide a Mean Shift tracking method resistant to interference from similar colors and illumination changes.
To achieve the above object, the Mean Shift tracking method provided by the present invention comprises the following steps performed in order:
1) Read in the current frame of the video and select the target of interest to be tracked through human-computer interaction; initialize parameters of the target such as its center coordinates and scale; extract for each target pixel the local saliency texture operator, the Local Ternary Number (LTN); generate a target mask by keeping key pixels on edges, lines and corners to further improve the discriminative power of the LTN texture feature; finally, combine the LTN features of the pixels inside the target mask with chrominance information, which is little affected by illumination-intensity changes, to obtain the target representation, and represent all pixels of the target to be tracked with this method to establish the target reference model;
2) Read in the next video frame as the current frame; taking the tracking position of the target in the previous frame as the starting point, build the target candidate model in the candidate region of the current frame with the target representation of step 1); then, using the Bhattacharyya coefficient as the similarity measure between the target reference model and the target candidate model, find through Mean Shift iterative optimization the candidate region in the neighborhood with the largest Bhattacharyya coefficient as the tracking result of the target of interest, where the iteration convergence conditions can be preset; then update the converged position of the target and complete its tracking in the next frame in the same way until the video ends.
The target representation of step 1) is obtained, in the HSV color space, by quantizing the H component of each target pixel into 16 levels and then, inside the LTN mask of the target to be tracked, combining the quantized H component, i.e. the chrominance information, with the local saliency texture operator LTN of the target pixel.
The Mean Shift tracking method provided by the present invention starts from the target representation: it fully exploits the ordering relationship between the gray value of a target pixel and those of its eight neighbors, extends the local saliency operator LSN, and proposes a new local saliency texture operator, the Local Ternary Number, to improve the discriminative power of the representation against similarly colored backgrounds. To further improve the discriminative power of the texture feature, key pixels on edges, lines and corners are extracted to generate a target mask, and the LTN features of the target pixels inside the mask are combined with chrominance information, which is little affected by illumination changes, to obtain a new target model with improved resistance to similar-color and illumination-change interference. Finally, the proposed target model is embedded into the Mean Shift tracking framework, so that the target can still be tracked continuously and stably even when the scene contains similarly colored background and illumination-intensity changes.
Brief Description of the Drawings
Fig. 1 is a flow chart of the Mean Shift tracking method resistant to interference from similar colors and illumination changes provided by the present invention.
Fig. 2(a)-(i) are schematic diagrams of the local saliency levels of a center pixel.
Fig. 3(a)-(f) are schematic diagrams of the generation of the LSN and LTN operators on two sample patches.
Fig. 4 shows a target to be tracked and its LTN mask extraction result.
Fig. 5 shows partial tracking results of the car sequence obtained with the Mean Shift tracking method provided by the present invention.
Fig. 6 shows the tracking error and iteration count curves over the first 100 frames of the above car sequence for the kernel tracking method based on the traditional color model and for the tracking method of the present invention.
Fig. 7 shows partial tracking results of the woman sequence obtained with the Mean Shift tracking method provided by the present invention.
Fig. 8(a) and (b) show the distribution histograms of the number of Mean Shift iterations for the kernel tracking method based on the traditional color model and for the tracking method of the present invention, respectively.
Fig. 9(a) and (b) show the tracking error distribution histograms of the above two methods on the woman sequence, respectively.
Detailed Description
The Mean Shift tracking method resistant to interference from similar colors and illumination changes provided by the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the Mean Shift tracking method provided by the present invention comprises the following steps performed in order:
1) Read in the current frame of the video, select the target of interest to be tracked through human-computer interaction, initialize parameters of the target such as its center coordinates and scale, and extract for each target pixel the local saliency texture operator, the Local Ternary Number (LTN). This operator is obtained by extending the LSN operator, in view of the fact that each saliency level of the LSN operator may cover several different local texture structures, which hinders distinguishing targets whose color resembles the background.
The LSN operator proposed by Tavakoli et al. measures the local saliency of a center pixel by counting the number of its 8 neighbors whose gray value is similar to that of the center pixel; it is defined as follows:

$$\mathrm{LSN}_{P,R}^{d} = \sum_{i=0}^{P-1} s(g_i, g_c) \tag{1}$$

$$s(g_i, g_c) = \begin{cases} 1, & |g_i - g_c| \le d \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
where $g_i\,(i = 0, \dots, P-1)$ are the gray values of the $P$ neighboring pixels centered at $g_c$ with radius $R$, and $d$ is the similarity threshold.
$\mathrm{LSN}_{8,1}^{d}$ denotes the similarity of a pixel to its 8 neighbors at radius 1; it takes values in [0, 8], corresponding to the 9 saliency levels of the center pixel shown in Fig. 2(a)-(i).
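As an illustration of Eqs. (1)-(2), the following minimal NumPy sketch computes the LSN map of a grayscale image for P = 8, R = 1; the threshold value d = 5 is an assumption for illustration, since the text does not fix it.

```python
import numpy as np

# 8-neighborhood offsets for P = 8, R = 1
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def lsn(gray, d=5):
    """LSN of Eqs. (1)-(2): per pixel, count the 8 neighbors at radius 1
    whose gray value differs from the center by at most d."""
    g = gray.astype(np.int32)
    out = np.zeros_like(g)
    for dy, dx in OFFSETS:
        nbr = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        out[1:-1, 1:-1] += np.abs(nbr - g[1:-1, 1:-1]) <= d
    return out  # values in [0, 8]: the 9 saliency levels of Fig. 2
```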
In Fig. 2, a neighboring pixel similar to the center pixel is drawn as a white circle, otherwise as a black circle. Fig. 2(a) shows the case where the gray values of all 8 neighbors are similar to the center pixel; exactly one texture pattern corresponds to it, and its center pixel is the least salient. Figs. 2(b)-(h) show the cases where different numbers of neighbors are similar to the center pixel; each of these saliency levels corresponds to several texture patterns, and the saliency of the center pixel increases in turn. Fig. 2(i) shows the case where none of the 8 neighbors is similar to the center pixel; this center pixel is the most salient, and exactly one texture pattern corresponds to it. Thus each local saliency level represented by the LSN operator may cover several different local texture structures that the operator cannot distinguish, so when the target lies in a scene of similar color, the local saliency of target pixels and background pixels may be alike, and the local similarity number alone cannot separate the target from the background well.
The present invention extends the LSN operator and proposes a new local saliency texture operator, the Local Ternary Number (LTN). With two variables it separately counts the neighbors whose gray value is smaller and larger than that of the center pixel, defining the LTN as a two-dimensional vector:

$$\mathrm{LTN}_{P,R} = \left( L_{P,R}^{-},\; L_{P,R}^{+} \right) \tag{3}$$
The first component is the number of neighbors whose gray value is smaller than that of the center pixel:

$$L_{P,R}^{-} = \sum_{i=0}^{P-1} s^{-}(g_i, g_c), \quad s^{-}(g_i, g_c) = \begin{cases} 1, & g_i - g_c < -d \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

The second component is the number of neighbors whose gray value is larger than that of the center pixel:

$$L_{P,R}^{+} = \sum_{i=0}^{P-1} s^{+}(g_i, g_c), \quad s^{+}(g_i, g_c) = \begin{cases} 1, & g_i - g_c > d \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
When defining the LTN operator, the present invention does not use a three-dimensional vector that also includes the number of neighbors similar to the center pixel, because Eq. (3) already determines the local saliency of the pixel uniquely: the number of similar neighbors follows as $P - L_{P,R}^{-} - L_{P,R}^{+}$, so Eq. (3) carries no redundant information. $\mathrm{LTN}_{8,1}$ denotes the local saliency texture feature over the 8 neighbors at radius 1; the 9 saliency levels of the center pixel are the same as in Fig. 2.
The LTN operator encodes the gray-value difference relationship between the center pixel and its neighborhood, so it adapts well to global illumination changes of the scene and is also fairly invariant to scale and rotation changes. Building on the LSN operator, it fully exploits the ordering of the neighbor gray values relative to the center pixel, can express every local texture structure, and can distinguish the different texture patterns contained in one saliency level, which helps separate the target from the background when colors are similar.
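Under the same assumptions as the previous sketch (P = 8, R = 1, illustrative threshold d = 5, and the same OFFSETS table), a sketch of the LTN operator of Eqs. (3)-(5) follows; the number of similar neighbors can be recovered as 8 - L_minus - L_plus, consistent with Eq. (3) carrying no redundant information.

```python
def ltn(gray, d=5):
    """LTN of Eqs. (3)-(5): per pixel, the two-dimensional vector
    (L_minus, L_plus) counting neighbors darker / brighter than the
    center by more than the similarity threshold d."""
    g = gray.astype(np.int32)
    l_minus = np.zeros_like(g)
    l_plus = np.zeros_like(g)
    for dy, dx in OFFSETS:
        diff = g[1 + dy:g.shape[0] - 1 + dy,
                 1 + dx:g.shape[1] - 1 + dx] - g[1:-1, 1:-1]
        l_minus[1:-1, 1:-1] += diff < -d
        l_plus[1:-1, 1:-1] += diff > d
    return l_minus, l_plus
```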
Fig. 3 illustrates the generation of the LSN and LTN operators on two sample patches. Figs. 3(a) and (d) show the gray-value distributions of two small target patches, where the gray-filled cell marks the current pixel whose features are being extracted. Figs. 3(b) and (e) show the LSN results: the two center pixels have different local texture structures but the same local saliency of 3. Figs. 3(c) and (f) show the LTN results, which distinguish the target pixels of (a) and (d), both with local saliency 3 (8 minus the sum of the two components of Eq. (3)), according to their different local texture structures.
The 9 local saliency levels shown in Fig. 2 cover texture structures such as flat regions, spots, lines, corners and edges. Tavakoli et al. showed experimentally that the LSN mask extracts key pixels such as edges, lines and corners on the target while still obtaining a fairly complete object of interest, retaining much useful information.
To improve the texture discriminative power of the LTN feature, the present invention follows the LSN mask approach and defines the target LTN mask (Equation (6)) as a binary map that keeps the pixels whose LTN components indicate a salient local structure, such as edges, lines and corners, and suppresses flat regions.
To verify the effectiveness of the LTN mask, Fig. 4 shows a target to be tracked, the head of a table-tennis player, together with its LTN mask extraction result.
As Fig. 4(a) shows, the skin color of the table-tennis player to be tracked bears some similarity to the background, so quantizing all pixels of the target block directly into the feature space could impair the accuracy of the tracking result. Fig. 4(b) shows that while extracting important pixels such as edges, lines and corners on the target, the LTN mask preserves the integrity of the target well, providing more useful information, and suppresses flat regions of similar color without distinctive texture.
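The exact rule of Equation (6) is not reproduced in this text; the sketch below therefore keeps, as an assumed illustrative rule, every pixel at which at least t of the 8 neighbors differ from the center (i.e. the pixel does not lie in a flat region), which matches the behavior described for Fig. 4.

```python
def ltn_mask(l_minus, l_plus, t=1):
    """Binary target mask in the spirit of Eq. (6): 1 at pixels whose LTN
    indicates a salient local structure (edge, line, corner), 0 in flat
    regions where most neighbors are similar to the center. The threshold
    t is an assumption; the patent's exact rule is its Equation (6)."""
    return ((l_minus + l_plus) >= t).astype(np.uint8)
```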
The chrominance information of the target has the advantage of being little affected by illumination-intensity changes and can serve as the target color feature in tracking scenes with varying illumination such as outdoors. Based on this, the present invention combines the chrominance information of the pixels inside the target LTN mask with their LTN features to obtain a new target representation. Specifically, in the HSV color space the H component of each target pixel is quantized into 16 levels; then, inside the target LTN mask, the quantized H component, i.e. the chrominance information, is combined with the local saliency texture feature LTN of the pixel, where each of the two components of the LTN operator has 5 possible values. The present invention therefore quantizes target pixels into a feature space of 16 × 5 × 5 = 400 dimensions.
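A sketch of the resulting bin function b(x) follows. The 16 hue levels follow the text; how the 9 raw values (0-8) of each LTN component are reduced to 5 levels is not stated here, so mapping by halving (0..8 → 0..4) is an assumed illustrative choice, as is OpenCV's 0-180 hue range.

```python
def bin_index(h, l_minus, l_plus):
    """b(x) for the 400-dimensional feature space: 16 hue levels times
    5 x 5 LTN levels. Returns integer bin indices in [0, 400)."""
    h_bin = np.minimum(h.astype(np.int32) * 16 // 180, 15)   # 16 hue levels
    return h_bin * 25 + (l_minus // 2) * 5 + (l_plus // 2)   # assumed 0..8 -> 0..4
```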
All pixels inside the target LTN mask are quantized into the feature space in the same way, and the target reference model $\{q_u\}_{u=1,\dots,m}$ is established as follows:

$$q_u = C \sum_{i=1}^{n} k\!\left( \left\| \frac{x_0 - x_i}{h} \right\|^2 \right) \delta\left[ b(x_i) - u \right] \tag{7}$$

In Eq. (7), $\{x_i\}_{i=1,\dots,n}$ are the coordinates of the $n$ pixels on the target with center coordinate $x_0$; the function $b(x)$ quantizes the target pixel at coordinate $x$ into the feature space; $m$ is the number of quantization levels of the feature space; $k(x)$ is an isotropic kernel profile that assigns larger weights to pixels closer to the target center and smaller weights otherwise; $h$ is the kernel bandwidth; $\delta(x)$ is the one-dimensional Kronecker delta function; and $C$ is the normalization coefficient:

$$C = \frac{1}{\sum_{i=1}^{n} k\!\left( \left\| \frac{x_0 - x_i}{h} \right\|^2 \right)} \tag{8}$$
The physical meaning of the target reference model is the probability density distribution over the feature levels when the target pixels are quantized into the feature space proposed by the present invention.
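As a sketch of Eqs. (7)-(8): the isotropic kernel profile k is taken here to be the Epanechnikov profile, the usual choice in kernel tracking [1], though the text does not fix it. The candidate model of Eqs. (9)-(10) below can be built by calling the same function on the window centered at y.

```python
def build_model(bins, mask, m=400):
    """Kernel-weighted feature histogram {q_u} of Eqs. (7)-(8).
    Only pixels inside the LTN mask contribute; pixels near the window
    center get larger weights. The result sums to 1 (coefficient C)."""
    rows, cols = bins.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r2 = ((ys - cy) / max(cy, 1)) ** 2 + ((xs - cx) / max(cx, 1)) ** 2
    k = np.maximum(1.0 - r2, 0.0) * mask            # Epanechnikov profile
    q = np.bincount(bins.ravel(), weights=k.ravel(), minlength=m)
    return q / max(q.sum(), 1e-12)
```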
2) Read in the next video frame as the current frame and, taking the tracking position of the target in the previous frame as the starting point, build the target candidate model in the candidate region of the current frame with the target representation of step 1).
Assume the center coordinate of the tracking result of the target in the previous frame is $y$, and let $\{x_i\}_{i=1,\dots,n_h}$ be the coordinates of the pixels of the target candidate region centered at $y$. Following the construction of the target reference model in step 1), the target candidate model $\{p_u(y)\}_{u=1,\dots,m}$ at the current position is established as:

$$p_u(y) = C_h \sum_{i=1}^{n_h} k\!\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \delta\left[ b(x_i) - u \right] \tag{9}$$

where the parameters have the same meanings as in step 1), and the normalization coefficient is:

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\!\left( \left\| \frac{y - x_i}{h} \right\|^2 \right)} \tag{10}$$
With the reference model and candidate model of the target established, tracking amounts to finding the region most similar to the target in the current frame; the Bhattacharyya coefficient serves as the similarity measure between the target reference model and the candidate model:

$$\rho(y) \equiv \rho\left[ p(y), q \right] = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u} \tag{11}$$
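Eq. (11) is a one-line computation on the two histograms; a minimal sketch:

```python
def bhattacharyya(p, q):
    """Bhattacharyya coefficient of Eq. (11); 1 means identical models."""
    return float(np.sum(np.sqrt(p * q)))
```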
To find the candidate region with the largest Bhattacharyya coefficient, Eq. (11) is Taylor-expanded around the tracking starting point $y_0$ (the tracking result of the target in the previous frame):

$$\rho\left[ p(y), q \right] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\!\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \tag{12}$$

with the weights:

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\; \delta\left[ b(x_i) - u \right] \tag{13}$$
The first term on the right of Eq. (12) is constant, so maximizing the Bhattacharyya coefficient is equivalent to finding the region with the largest kernel-weighted sum of the weights $w_i$. The physical meaning of $w_i$ is the credibility that pixel $i$ of the current candidate region belongs to the target; the region where Mean Shift iterative optimization finds a maximum of this weight distribution is the tracking result of the target in the current frame. Each iteration moves the center coordinate of the candidate region from the current position $y_0$ to the new position $y_1$:

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left( \left\| \frac{y_0 - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n_h} w_i\, g\!\left( \left\| \frac{y_0 - x_i}{h} \right\|^2 \right)} \tag{14}$$

where $g(x) = -k'(x)$.
A preset maximum number of iterations $N$ and a displacement threshold $\varepsilon$ serve as the convergence conditions of the Mean Shift iteration, and the final converged position is recorded as the tracking result of the target of interest in the current frame.
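Putting Eqs. (13)-(14) together, a sketch of one frame of the tracking loop follows. With the Epanechnikov profile assumed above, g(x) = -k'(x) is constant, so Eq. (14) reduces to the plain weighted centroid of the candidate window; the values N = 20 and eps = 1.0, and the omission of image-border handling, are illustrative assumptions.

```python
def track_frame(frame_bins, frame_mask, q, y0, half, N=20, eps=1.0):
    """Mean Shift iteration of Eqs. (13)-(14) for one frame, starting at
    y0 = (row, col), the previous frame's result; half = (hy, hx) are the
    window half-sizes. Returns the converged center position."""
    y = np.asarray(y0, dtype=float)
    for _ in range(N):
        t, l = int(round(y[0])) - half[0], int(round(y[1])) - half[1]
        bins = frame_bins[t:t + 2 * half[0] + 1, l:l + 2 * half[1] + 1]
        mask = frame_mask[t:t + 2 * half[0] + 1, l:l + 2 * half[1] + 1]
        p = build_model(bins, mask)                           # Eqs. (9)-(10)
        w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12)) * mask   # Eq. (13)
        ys, xs = np.mgrid[t:t + bins.shape[0], l:l + bins.shape[1]]
        y_new = np.array([(w * ys).sum(), (w * xs).sum()]) / max(w.sum(), 1e-12)
        if np.linalg.norm(y_new - y) < eps:                   # convergence test
            return y_new
        y = y_new
    return y
```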
Finally, the tracking result of the target in the current frame is taken as the starting position of the candidate target in the next frame, and step 2) is repeated to continue tracking the target in the next video frame until the video ends.
Eq. (14) shows that accurate weights $w_i$ are an important precondition for the Mean Shift iteration to find the true position of the target, and Eq. (13) shows that the weights $w_i$ are determined directly by the representations of the target and the candidate region. In theory, therefore, when the scene contains background of color similar to the target or illumination-intensity changes, the tracking method provided by the present invention can achieve better tracking performance than the kernel tracking method based on the traditional color model; the experimental results below demonstrate the effectiveness of the present invention.
To verify the effect of the method of the present invention, its tracking performance under interference from similar background colors and illumination-intensity changes was tested on two standard test sequences, on a PC with a Pentium(R) Dual-Core 2.70 GHz CPU and 2 GB RAM, using the Visual Studio 2010 integrated development environment and OpenCV 2.4.3; the tracking results are shown in Figs. 5-9.
The car sequence shown in Fig. 5 tests the resistance of the present method to similar background colors: the car to be tracked bears some color similarity to the background of the scene. The red box is the tracking result of the kernel tracking method based on the traditional color model, and the green box is the tracking result of the present method. Qualitatively, the kernel tracking method based on the traditional color model has large tracking errors under the interference of a similarly colored background, and after frame 93 the target is completely lost. The present method keeps tracking the target accurately, mainly because its LTN feature has good texture discriminative power, so targets and background of similar color can be distinguished by texture. Table 1 reports the target tracking results over the first 100 frames of the car sequence (after frame 93 the traditional-color-model kernel tracking method loses the target completely) in terms of average tracking error, standard deviation, number of iterations and tracking speed, analyzing the tracking performance of the two methods quantitatively.
Table 1. Tracking performance of the two methods on the car sequence.
Table 1 shows that the tracking error of the present method is clearly smaller than that of the kernel tracking method based on the traditional color model. Computing the LTN operator costs a small amount of time, but the target representation of the present method is more discriminative and halves the number of iterations; since iteration is the main time-consuming part of the Mean Shift tracking framework, the average tracking speed of the present method is slightly faster than that of the kernel tracking method based on the traditional color model. The tracking error and iteration count curves over the first 100 frames of the sequence are shown in Figs. 6(a) and (b), respectively.
The woman sequence shown in Fig. 7 tests the adaptability of the present method to illumination-intensity changes of the scene; the red box is the tracking result of the kernel tracking method based on the traditional color model, and the green box is the tracking result of the present method. The results show that the kernel tracking method based on the traditional color model is easily affected by outdoor illumination-intensity changes, leading to inaccurate tracking results. The present method is less affected by illumination changes and achieves better tracking results, mainly because chrominance information is insensitive to illumination changes, and the LTN operator encodes the gray-value ordering between a center pixel and its neighbors and thus adapts well to global changes of scene illumination. Table 2 quantitatively analyzes the tracking performance of the two methods on the woman sequence in terms of average tracking error, standard deviation, number of iterations and tracking speed.
Table 2. Tracking performance of the two methods on the woman sequence.
Table 2 shows that the tracking performance of the present method is clearly better than that of the kernel tracking method based on the traditional color model. Figs. 8(a) and (b) show the distribution histograms of the number of Mean Shift iterations for the two methods. As Fig. 8 shows, the iteration counts of the present method are concentrated in the range of 1 to 3, giving faster iterative convergence than the kernel tracking method based on the traditional color model, which partly offsets the time spent on the LTN operator; the average tracking speed of the present method is therefore only slightly lower than that of the kernel tracking method based on the traditional color model.
Figs. 9(a) and (b) show the tracking error distribution histograms of the two methods on the woman sequence. Comparing the error distributions of Figs. 9(a) and 9(b) shows that the kernel tracking method based on the traditional color model is strongly affected by illumination: its tracking errors on the woman sequence are spread fairly evenly between 0 and 20, whereas the errors of the present method mostly lie between 0 and 15 and concentrate between 0 and 10, giving higher tracking accuracy than the kernel tracking method based on the traditional color model and confirming the robustness of the present method to outdoor illumination-intensity changes.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310395734.XA CN103456029B (en) | 2013-09-03 | 2013-09-03 | A Mean Shift tracking method resistant to similar-color and illumination-variation interference |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103456029A CN103456029A (en) | 2013-12-18 |
CN103456029B true CN103456029B (en) | 2016-03-30 |
Family
ID=49738356
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310395734.XA Expired - Fee Related CN103456029B (en) | 2013-09-03 | 2013-09-03 | A Mean Shift tracking method resistant to similar-color and illumination-variation interference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103456029B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104598914A (en) * | 2013-10-31 | 2015-05-06 | 展讯通信(天津)有限公司 | Skin color detecting method and device |
CN105096347B (en) * | 2014-04-24 | 2017-09-08 | 富士通株式会社 | Image processing apparatus and method |
CN105957106B (en) * | 2016-04-26 | 2019-02-22 | 湖南拓视觉信息技术有限公司 | Method and device for three-dimensional target tracking |
CN106815860B (en) * | 2017-01-17 | 2019-11-29 | 湖南优象科技有限公司 | A target tracking method based on ordered comparison features |
CN109740613B (en) * | 2018-11-08 | 2023-05-23 | 深圳市华成工业控制股份有限公司 | Visual servo control method based on Feature-Shift and prediction |
CN115994318A (en) * | 2023-01-11 | 2023-04-21 | 北京达佳互联信息技术有限公司 | Picture and information classification model generation method, device, equipment and storage medium |
CN116309687B (en) * | 2023-05-26 | 2023-08-04 | 深圳世国科技股份有限公司 | Real-time tracking and positioning method for camera based on artificial intelligence |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6590999B1 (en) * | 2000-02-14 | 2003-07-08 | Siemens Corporate Research, Inc. | Real-time tracking of non-rigid objects using mean shift |
CN101916446A (en) * | 2010-07-23 | 2010-12-15 | 北京航空航天大学 | Gray Target Tracking Algorithm Based on Edge Information and Mean Shift |
Non-Patent Citations (2)
Title |
---|
Hamed Rezazadegan Tavakoli et al. Local Similarity Number and Its Application to Object Tracking. International Journal of Advanced Robotic Systems, 2013-03-29, Vol. 10, No. 184: 1-7. *
Zhang Tao et al. Face tracking algorithm based on multi-feature Mean Shift. Journal of Electronics & Information Technology, 2009-08-31, Vol. 31, No. 8: 1816-1820. *
Also Published As
Publication number | Publication date |
---|---|
CN103456029A (en) | 2013-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160330; Termination date: 20170903 |