CN107038400A

CN107038400A - Face identification device and method and utilize its target person tracks of device and method

Info

Publication number: CN107038400A
Application number: CN201610079687.1A
Authority: CN
Inventors: 殷雄
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2016-02-04
Filing date: 2016-02-04
Publication date: 2017-08-11

Abstract

The invention provides a face recognition device and method, and a target person tracking device and method using the same. The face recognition device includes a face set generation module configured to generate a face set based on multiple video images; a feature metric normalization module configured to perform metric space transformation on the same face image under multiple different cameras, To eliminate the difference between the same face images under the multiple different cameras; and the feature envelope forming module is configured to perform envelope processing on the multiple face images transformed in the metric space, so as to combine the face sets Multiple different face set feature spaces are transformed into one same face set feature space. The present invention normalizes the feature metric space according to the differences of different cameras, and performs recognition in combination with the feature metric space and the face image set, thereby greatly improving the accuracy of face recognition in the monitoring scene.

Description

Face recognition device and method, and target person tracking device and method using the same

技术领域technical field

本发明一般地涉及计算机视频处理技术领域，更具体地，涉及人脸识别装置、人脸识别方法、利用人脸识别装置的目标人跟踪装置以及利用人脸识别方法的目标人跟踪方法，上述的装置和方法适合城市安全管理，诸如机场、监狱和图书馆等公共区域。The present invention generally relates to the technical field of computer video processing, and more specifically, to a face recognition device, a face recognition method, a target person tracking device using a face recognition device, and a target person tracking method using a face recognition method. The device and method are suitable for urban security management, public areas such as airports, prisons and libraries.

背景技术Background technique

现有的人脸识别技术对于图像质量较好的人脸具有较高的辨识度。对于LFW等数据库(其由美国马萨诸塞大学阿姆斯特分校技术视觉实验室整理完成的)的人脸识别精度能到达99％以上。但是如果直接将此类算法应用到监控场景中，其效果将会大打折扣。以LFW数据库为例，虽然一般认为此数据库是在不受控的条件下拍摄的图像，但是在分辨率，色彩真实度，人脸姿态等方面较监控场景的人脸图像好很多。因此，如果直接将一般的人脸识别算法移植到监控场景的人脸识别中，其效果将会下降很多。Existing face recognition technology has a high degree of recognition for faces with better image quality. For databases such as LFW (which are completed by the Technical Vision Laboratory of the University of Massachusetts, Amherst), the face recognition accuracy can reach more than 99%. However, if such algorithms are directly applied to monitoring scenarios, their effects will be greatly reduced. Taking the LFW database as an example, although it is generally believed that this database is an image taken under uncontrolled conditions, it is much better than the face image of the monitoring scene in terms of resolution, color fidelity, and face posture. Therefore, if the general face recognition algorithm is directly transplanted into the face recognition of the surveillance scene, its effect will be greatly reduced.

利用上述LFW数据库的主要问题在于以下两点：1、常规人脸识别一般使用单张图像，而在监控场景中，由于图像质量较差，因此单张图像信息量有限，无法保证识别的准确性；2、常规人脸识别所面临的图像成像差异不大，因此一般不会针对于不同摄像头图像进行差异性减少处理，而在监控场景中，由于不同摄像头之间的人脸分辨率、颜色、姿态等都会有较大差异，因此需要先减少差异性，才能保证识别的准确性。因此，LFW数据库无法应用于现场监控场景。The main problems of using the above LFW database are the following two points: 1. Conventional face recognition generally uses a single image, but in the monitoring scene, due to the poor image quality, the information of a single image is limited, and the accuracy of recognition cannot be guaranteed. ; 2. The imaging differences faced by conventional face recognition are not large, so generally, the difference reduction processing will not be performed for different camera images. In the monitoring scene, due to the face resolution, color, There will be large differences in posture and so on, so it is necessary to reduce the difference first to ensure the accuracy of recognition. Therefore, the LFW database cannot be applied to field monitoring scenarios.

监控场景的人脸识别较普通人脸识别具有一些新的特点。1、单目标物体可以被多张连续图所描述，而多图所提取的特征能够更完备的描述目标特点。目前，image-set的方法可以通过多张图形成类似于高维包络的空间来描述单一目标，对于两个不同的目标计算两个包络之间的最小距离，作为这两个目标的差异度，目前相关的算法有基于仿射包的图像集距离AHISD(Affine Hullbased Image Set Distance)、稀疏近似最近邻点SANP、CRNP(CollaborativelyRegularized Nearest Points)和二元线性回归分类DLRC(Dual Linear RegressionClassification)等。虽然此类算法能够比较丰富的描述目标的特征空间，但此类算法对于不同目标形成的包络易于重叠，原因在于其特征具有相似性，因此对于不同的目标，其所提取的特征差异度必须较大，基于特征形成的包络才不会重叠。2、跨摄像头的监控场景具有不同的光照，视角/姿态以及图像分辨率，这些会极大的影响识别的精度。目前，metric-learning(即，度量学习)的方法能够解决这一问题。这类算法能够将不同特征空间的特征投影到相同的度量空间，能够减小相同目标由于摄像头和姿态/光照/表情的不同造成的差异，增加不同目标的特征差异。metric-learning方法的这一特点能够弥补image-set(即，图像集，又称为图集)方法的不足。目前相关的算法有大间隔最近邻居LMNN(Large margin nearest neighbor)等。3、由于监控摄像头需要连续不断的记录数据，因此其存储的数据量是惊人的，而在大量的数据中寻找出目标图像，需要效率极快的算法，而对人脸特征进行压缩后再识别，能够极大的提高算法效率。Compared with ordinary face recognition, face recognition in surveillance scenes has some new features. 1. A single target object can be described by multiple continuous images, and the features extracted from multiple images can describe the characteristics of the target more completely. At present, the image-set method can describe a single target by forming a space similar to a high-dimensional envelope through multiple images. For two different targets, the minimum distance between the two envelopes is calculated as the difference between the two targets. At present, related algorithms include image set distance AHISD (Affine Hullbased Image Set Distance) based on affine package, sparse approximate nearest neighbor point SANP, CRNP (Collaboratively Regularized Nearest Points) and binary linear regression classification DLRC (Dual Linear Regression Classification), etc. . Although this type of algorithm can describe the feature space of the target more abundantly, the envelopes formed by this type of algorithm for different targets tend to overlap because of the similarity of their features. Therefore, for different targets, the degree of difference in the extracted features must be Larger so that the envelopes formed based on features do not overlap. 2. Surveillance scenes across cameras have different lighting, viewing angles/poses, and image resolutions, which will greatly affect the accuracy of recognition. Currently, metric-learning (ie, metric learning) methods can solve this problem. This type of algorithm can project the features of different feature spaces into the same metric space, reduce the difference of the same target due to the difference of camera and pose/illumination/expression, and increase the feature difference of different targets. This feature of the metric-learning method can make up for the shortcomings of the image-set (ie, image set, also known as atlas) method. At present, related algorithms include large margin nearest neighbor LMNN (Large margin nearest neighbor) and so on. 3. Since the surveillance camera needs to continuously record data, the amount of data it stores is astonishing. Finding the target image in a large amount of data requires an extremely fast algorithm, and the facial features are compressed and then recognized , which can greatly improve the algorithm efficiency.

因此，在使用多摄像头的监控场景下存在多摄像头人脸识别的问题，由于与常规场景差异性较大，因此在处理方法上与常规的人脸识别具有较大不同。跨摄像头跟踪的对象一般是多源图像，摄像头的不同(制造商及型号)导致图像分辨率，色彩及变形情况会有较大不同；此外，环境光源差异、人为对焦差异及摄像头安装角度差异等也会让不同摄像头生成的图像中同一个目标差异性较大；其次，在监控场景中，运动的目标距离摄像头距离变化较大，会导致成像清晰度及颜色也具有较大差异。以上这些差异都是监控场景的人脸识别所面临的独特问题。此外，跨摄像头跟踪是对多个摄像头的海量数据进行处理，因此快速处理海量数据也是跨摄像头跟踪需要解决的关键问题。Therefore, there is a problem of multi-camera face recognition in the monitoring scene using multiple cameras. Due to the large difference from the conventional scene, the processing method is quite different from the conventional face recognition. The objects tracked across cameras are generally multi-source images. Different cameras (manufacturers and models) lead to large differences in image resolution, color, and deformation; in addition, differences in environmental light sources, differences in human focus, and differences in camera installation angles, etc. It will also make the same target in the images generated by different cameras have greater differences; secondly, in the monitoring scene, the distance between the moving target and the camera varies greatly, which will lead to large differences in imaging clarity and color. These differences are unique problems faced by face recognition in surveillance scenarios. In addition, cross-camera tracking is to process massive data from multiple cameras, so fast processing of massive data is also a key issue that needs to be solved for cross-camera tracking.

发明内容Contents of the invention

针对现有技术中所存在的监控场景中的多摄像头人脸识别的差异性和针对多个摄像头的海量数据处理的技术问题，本发明提供了能够解决上述技问题的一种人脸识别装置和方法，以及目标人跟踪装置和方法。Aiming at the differences in multi-camera face recognition in the monitoring scene in the prior art and the technical problems of massive data processing for multiple cameras, the present invention provides a face recognition device and a face recognition device that can solve the above technical problems. method, and target person tracking apparatus and method.

为了解决上述缺陷，本发明提出一种能够适用于大尺度数据的监控场景下跨摄像头识别的方法。具体地通过以下三种方式来提供优点：1、针对于多个摄像头，为了减少其差异性，使用特征度量空间归一化；2、针对于视频帧的连续性，提取及合并多张人脸特征来描述同一目标，使目标特征空间更完备；以及3、针对大规模数据集运算，使用特征压缩及二阶段匹配方法。In order to solve the above-mentioned defects, the present invention proposes a cross-camera recognition method applicable to large-scale data monitoring scenarios. Specifically, the advantages are provided in the following three ways: 1. For multiple cameras, in order to reduce their differences, use feature metric space normalization; 2. For the continuity of video frames, extract and merge multiple faces Features to describe the same target, so that the target feature space is more complete; and 3. For large-scale data set operations, use feature compression and two-stage matching methods.

根据本发明的一方面，提供了一种人脸识别装置包括：人脸集生成模块，被配置为基于多幅视频图像生成人脸集；特征度量归一化模块，被配置为对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性；以及特征包络形成模块，被配置为对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间。According to an aspect of the present invention, a face recognition device is provided, including: a face set generation module configured to generate a face set based on multiple video images; a feature metric normalization module configured to perform multiple different The same face image under the camera is transformed into a metric space to eliminate the difference between the same face images under different cameras; and the feature envelope forming module is configured to wrap the multiple face images transformed in the metric space Networking processing is used to transform multiple different feature spaces of the face set into a feature space of the same face set.

优选地，多幅不同摄像头下相同人脸图像之间的差异性包括背景光源的差异性和拍摄角度的差异性。Preferably, the differences between the multiple images of the same face under different cameras include differences in background light sources and differences in shooting angles.

优选地，对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性进一步包括：将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的拍摄角度。Preferably, performing metric space transformation on the same face images from multiple different cameras to eliminate the differences between the multiple human face images further includes: performing metric space transformation on multiple human face images to make multiple different cameras The same face images under the same background light source; and the metric space transformation is performed on multiple face images, so that the same face images under multiple different cameras have the same shooting angle.

优选地，人脸识别装置还包括：识别模块，被配置为基于设定的目标人，从多个人脸集中选择出与设定的目标人匹配度最高的目标人脸集。Preferably, the face recognition device further includes: a recognition module configured to, based on the set target person, select a target face set with the highest matching degree with the set target person from multiple face sets.

优选地，人脸集生成模块包括：人脸检测模块，被配置为以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。Preferably, the face set generation module includes: a face detection module configured to perform face detection on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate.

优选地，人脸集生成模块还包括人脸对齐模块，被配置为对进行人脸检测的视频图像进行如下图像处理：被配置为从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。Preferably, the face set generation module also includes a face alignment module configured to perform the following image processing on the video image for face detection: it is configured to extract key points from the video image, and to align people according to the distance between the pupils of the eyes Scale transformation of the face; and according to the position of the extracted key points, the yaw angle of the face is estimated, and the face is rotated into a frontal face according to the yaw angle of the face, and multiple face images including the same target person are generated. A set of human faces, where keypoints include eyes, noses, and mouths.

优选地，特征度量归一化模块还包括：特征提取模块，对人脸集进行特征提取，以形成原始特征人脸集；以及压缩模块，被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。Preferably, the feature metric normalization module also includes: a feature extraction module, which extracts features from the face set to form an original feature face set; and a compression module, configured to compress the original features to form compressed features Human faces set.

优选地，识别模块还包括：粗匹配识别模块，被配置为基于设定的目标人，确定有序的预定数量的人脸集；以及精匹配识别模块，被配置为基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。Preferably, the identification module further includes: a rough matching identification module, configured to determine an orderly predetermined number of face sets based on the set target person; and a fine matching identification module, configured to, based on the set target person, A target face set is quickly determined from an ordered predetermined number of face sets.

优选地，基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。Preferably, based on the set target person, a predetermined number of face sets are determined from a plurality of compressed feature face sets; according to the matching degree with the set target person, the predetermined number of face sets are determined in order of matching degree from high to low. Sorting the face sets to generate an ordered predetermined number of face sets; and based on the set target person, quickly determine Target face set.

根据本发明的另一方面，提供了一种人脸识别方法，包括以下步骤：基于多幅视频图像生成人脸集；对人脸集中的多幅人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性；以及对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间。According to another aspect of the present invention, a face recognition method is provided, comprising the following steps: generating a face set based on multiple video images; performing metric space transformation on multiple face images in the face set to eliminate multiple The difference between the same face images under different cameras; and enveloping the multiple face images of the metric space transformation, so as to transform the feature space of multiple different face sets into one same The feature space of face sets.

优选地，对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性进一步包括：将多幅不同摄像头下相同人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅人脸图像具有相同的拍摄角度。Preferably, performing metric space transformation on the same face images under multiple different cameras to eliminate the difference between the same face images under multiple different cameras further includes: performing metric space transformation on the same face images under multiple different cameras Transformation, so that multiple images of the same face under different cameras have the same background light source; and perform metric space transformation on multiple facial images, so that multiple facial images have the same shooting angle.

优选地，人脸识别方法还包括：基于设定的目标人，从多个人脸集中选择出与设定的目标人匹配度最高的目标人脸集。Preferably, the face recognition method further includes: based on the set target person, selecting the target face set with the highest matching degree with the set target person from multiple face sets.

优选地，人脸识别方法还包括：在生成人脸集之前，以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。Preferably, the face recognition method further includes: before generating the face set, performing face detection on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate.

优选地，在对多幅人脸图像进行度量空间变换之前，对进行人脸检测的视频图像进行如下图像处理：从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。Preferably, before carrying out metric space transformation to a plurality of face images, carry out following image processing to the video image of face detection: extract key point from video image, and carry out scale transformation to face according to binocular pupil distance; And according to the position of the extracted key point, estimate the yaw angle of the face, and rotate the face into a frontal face according to the yaw angle of the face, generate a face set including multiple face images of the same target person, wherein , the key points include eyes, nose and mouth.

优选地，在对人脸集中的多幅人脸图像进行度量空间变换之前还包括以下步骤：对人脸集进行特征提取，以形成原始特征人脸集；以及被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。Preferably, the following steps are also included before performing metric space transformation on multiple face images in the face set: performing feature extraction on the face set to form an original feature face set; and being configured to compress the original features , to form a compressed feature face set.

优选地，人脸识别方法进一步包括：基于设定的目标人，确定有序的预定数量的人脸集；以及基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。Preferably, the face recognition method further includes: based on the set target person, determining an ordered and predetermined number of human face sets; and based on the set target person, quickly determining the target person from the ordered and ordered number of human faces face set.

优选地，人脸识别方法进一步包括：基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。Preferably, the face recognition method further includes: based on the set target person, determining a predetermined number of face sets from a plurality of compressed feature face sets; according to the matching degree with the set target person, the matching degree is from high to Sorting the predetermined number of human face sets in a low order to generate an ordered predetermined number of human face sets; The target face set is quickly determined from the original feature face set.

根据本发明的又一方面，提供了一种目标人跟踪装置，包括：人脸集生成模块，被配置为基于多幅视频图像生成人脸集；特征度量归一化模块，被配置为对多幅人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性；特征包络形成模块，被配置为对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间；人脸识别模块，基于设定的目标人，从多个人脸集中选择出与设定的目标人的人脸图像匹配度最高的目标人脸集；跟踪模块，根据目标人脸集中的每一幅人脸图像的拍摄位置和时间，确定目标人的路线。According to another aspect of the present invention, a target person tracking device is provided, including: a human face set generation module configured to generate a human face set based on multiple video images; a feature metric normalization module configured to The metric space transformation is performed on the face images to eliminate the difference between the multiple face images; the feature envelope forming module is configured to perform envelope processing on the multiple face images transformed by the metric space, so as to The feature space of multiple different face sets in the face set is transformed into the feature space of the same set of faces; the face recognition module, based on the set target person, selects the target person from multiple face sets that matches the set target person. The target face set with the highest matching degree of face images; the tracking module determines the route of the target person according to the shooting location and time of each face image in the target face set.

根据本发明的又一方面，提供了一种目标人跟踪方法，包括以下步骤：基于多幅视频图像生成人脸集；对人脸集中的多幅人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性；以及对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间；基于设定的目标人，从多个人脸集中选择出与设定的目标人的人脸图像匹配度最高的目标人脸集；根据目标人脸集中的每一幅人脸图像的拍摄位置和时间，确定目标人的路线。According to yet another aspect of the present invention, a target person tracking method is provided, including the following steps: generating a face set based on multiple video images; performing metric space transformation on multiple face images in the face set to eliminate multiple The difference between the face images; and enveloping the multiple face images of the metric space transformation, so as to transform the feature space of multiple different face sets of the face set into the same face set Feature space; based on the set target person, select the target face set that matches the face image of the set target person from multiple face sets; according to the shooting of each face image in the target face set Location and time, determine the route of the target person.

本发明专门针对监控场景的人脸识别装置和方法。针对于不同摄像头的差异性进行特征度量空间归一化，并结合特征度量空间和人脸图像集进行识别，大幅提高了监控场景下的人脸识别的精度。The present invention is specially aimed at the face recognition device and method of the monitoring scene. The feature metric space is normalized according to the differences of different cameras, and combined with the feature metric space and face image set for recognition, which greatly improves the accuracy of face recognition in surveillance scenarios.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the accompanying drawings required in the embodiments. Obviously, the accompanying drawings in the following description are only some of the present invention. Embodiments, for those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

图1是根据本发明的实施例的人脸识别装置的框图；Fig. 1 is a block diagram of a face recognition device according to an embodiment of the present invention;

图2是根据本发明的实施例的人脸识别方法的流程图；Fig. 2 is the flow chart of the face recognition method according to an embodiment of the present invention;

图3是根据本发明的实施例的图2中的人脸识别方法中进一步包括的特征提取和压缩步骤的示意图。FIG. 3 is a schematic diagram of feature extraction and compression steps further included in the face recognition method in FIG. 2 according to an embodiment of the present invention.

图4是根据本发明的实施例的人脸识别方法中的包络化处理所生成的特征包络的示意图；4 is a schematic diagram of a feature envelope generated by enveloping processing in a face recognition method according to an embodiment of the present invention;

图5是根据本发明的实施例的目标人跟踪装置的框图；5 is a block diagram of a target person tracking device according to an embodiment of the present invention;

图6是根据本发明的实施例的在目标人跟踪方法的流程图；Fig. 6 is a flow chart of the target person tracking method according to an embodiment of the present invention;

图7是根据本发明的实施例的在目标人跟踪过程中所进行的两个阶段匹配的示意图；FIG. 7 is a schematic diagram of two-stage matching performed in the target person tracking process according to an embodiment of the present invention;

图8A、8B和8C分别地示出了根据本发明的实施例的目标人跟踪过程中的在一个摄像头中跟踪目标人、在多摄像头中继续跟踪目标人以及被跟踪在地图中的轨迹被显示的系统示图。8A, 8B and 8C respectively show that the target person is tracked in one camera, the target person is continued to be tracked in multiple cameras, and the trajectory tracked on the map is displayed in the process of target person tracking according to an embodiment of the present invention. system diagram.

具体实施方式detailed description

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员所获得的所有其他实施例，都属于本发明保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention belong to the protection scope of the present invention.

图1是根据本发明的实施例的人脸识别装置的框图。图2是根据本发明的实施例的人脸识别方法的流程图。FIG. 1 is a block diagram of a face recognition device according to an embodiment of the present invention. Fig. 2 is a flowchart of a face recognition method according to an embodiment of the present invention.

参考图1，人脸识别装置包括人脸集生成模块102、特征度量归一化模块104和特征包络形成模块106。具体地，人脸集生成模块102被配置为基于多幅视频图像生成人脸集。特征度量归一化模块104被配置为对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性。特征包络形成模块106被配置为对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间。Referring to FIG. 1 , the face recognition device includes a face set generation module 102 , a feature metric normalization module 104 and a feature envelope formation module 106 . Specifically, the face set generation module 102 is configured to generate a face set based on multiple video images. The feature metric normalization module 104 is configured to perform metric space transformation on multiple same face images from different cameras, so as to eliminate differences among multiple same face images from different cameras. The feature envelope forming module 106 is configured to perform enveloping processing on the multiple face images of the metric space transformation, so as to transform the feature space of a plurality of different face sets of the face set into features of the same face set space.

在一个实施例中，人脸集生成模块102还包括：人脸检测模块，被配置为以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。此外，人脸集生成模块102还包括人脸对齐模块，被配置为对进行人脸检测的视频图像进行如下图像处理：被配置为从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。In one embodiment, the face set generation module 102 further includes: a face detection module configured to perform face detection on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate. In addition, the human face set generation module 102 also includes a human face alignment module, which is configured to perform the following image processing on the video image for face detection: it is configured to extract key points from the video image, and align the human faces according to the distance between the pupils of the eyes Scale transformation of the face; and according to the position of the extracted key points, the yaw angle of the face is estimated, and the face is rotated into a frontal face according to the yaw angle of the face, and multiple face images including the same target person are generated. A set of human faces, where keypoints include eyes, noses, and mouths.

具体地，多幅不同摄像头下相同人脸图像之间的差异性包括背景光源的差异性和拍摄角度的差异性。利用特征度量归一化模块104对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性进一步包括：将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的拍摄角度。另外，特征度量归一化模块104还包括特征提取模块，对人脸集进行特征提取，以形成原始特征人脸集；以及压缩模块，被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。Specifically, the differences between multiple images of the same face under different cameras include differences in background light sources and differences in shooting angles. Using the feature metric normalization module 104 to perform metric space transformation on the same face images under different cameras to eliminate the difference between the multiple face images. Make the same face images under different cameras have the same background light source; and perform metric space transformation on multiple face images, so that the same face images under different cameras have the same shooting angle. In addition, the feature metric normalization module 104 also includes a feature extraction module, which extracts features from the face set to form an original feature face set; and a compression module, configured to compress the original features to form a compressed feature face set. face set.

另外，人脸识别装置还包括识别模块，被配置为基于设定的目标人，从多个人脸集中选择出与设定的目标人匹配度最高的目标人脸集。识别模块还包括：粗匹配识别模块，被配置为基于设定的目标人，确定有序的预定数量的人脸集；以及精匹配识别模块，被配置为基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。具体地，基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。In addition, the face recognition device further includes a recognition module configured to, based on the set target person, select a target face set with the highest matching degree with the set target person from multiple face sets. The recognition module also includes: a rough matching recognition module configured to determine an ordered predetermined number of face sets based on the set target person; A predetermined number of face sets can be used to quickly determine the target face set. Specifically, based on the set target person, determine a predetermined number of face sets from multiple compressed feature face sets; Sorting the face sets to generate an ordered predetermined number of face sets; and based on the set target person, quickly determine Target face set.

参考图2，人脸识别方法包括以下步骤：在步骤210中，基于多幅视频图像生成人脸集；在步骤220中，对人脸集中的多幅人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性；以及在步骤230中，对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间。With reference to Fig. 2, the face recognition method comprises the following steps: in step 210, generate a human face set based on multiple video images; in step 220, carry out metric space transformation to the multiple pieces of human face images in the human face set, to eliminate multiple difference between the same face images under different cameras; and in step 230, enveloping the multiple face images of the metric space transformation, so as to combine the features of multiple different face sets in the face set The space is transformed into a feature space of the same face set.

人脸识别方法还包括：在生成人脸集之前，以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。另外，在对多幅人脸图像进行度量空间变换之前，对进行人脸检测的视频图像进行如下图像处理：从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。The face recognition method further includes: before generating the face set, performing face detection on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate. In addition, before performing metric space transformation on multiple face images, the following image processing is performed on the video image for face detection: extracting key points from the video image, and performing scale transformation on the face according to the distance between the pupils of the eyes; and According to the positions of the extracted key points, the yaw angle of the face is estimated, and the face is rotated into a frontal face according to the yaw angle of the face, and a face set including multiple face images of the same target person is generated, wherein, Key points include the eyes, nose and mouth.

多幅不同摄像头下相同人脸图像之间的差异性包括背景光源的差异性和拍摄角度的差异性。在对人脸集中的多幅人脸图像进行度量空间变换之前还包括：对人脸集进行特征提取，以形成原始特征人脸集；以及被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性进一步包括：将多幅不同摄像头下相同人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅人脸图像具有相同的拍摄角度。The differences among the same face images under different cameras include differences in background light sources and differences in shooting angles. Before performing metric space transformation on multiple face images in the face set, it also includes: performing feature extraction on the face set to form an original feature face set; and being configured to compress the original features to form compressed features Human faces set. Perform metric space transformation on the same face images from different cameras to eliminate the difference between the same face images from different cameras. Make the same face images under different cameras have the same background light source; and perform metric space transformation on the multiple face images so that the multiple face images have the same shooting angle.

另外，人脸识别方法还包括：基于设定的目标人，从多个人脸集中选择出与设定的目标人匹配度最高的目标人脸集。人脸识别方法进一步包括：基于设定的目标人，确定有序的预定数量的人脸集；以及基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。具体地，人脸识别方法进一步包括：基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。In addition, the face recognition method further includes: based on the set target person, selecting the target face set with the highest matching degree with the set target person from multiple face sets. The face recognition method further includes: determining an ordered and predetermined number of face sets based on the set target person; and quickly determining the target face set from the ordered and predetermined number of face sets based on the set target person. Specifically, the face recognition method further includes: based on the set target person, determining a predetermined number of face sets from a plurality of compressed feature face sets; Sorting the predetermined number of human face sets in a low order to generate an ordered predetermined number of human face sets; The target face set is quickly determined from the original feature face set.

根据本发明的上述实施例，利用监控场景的人脸识别装置和方法，其中，针对于不同摄像头的差异性进行特征度量空间归一化，并结合特征度量空间和人脸图像集进行识别，能够大幅度提高人脸识别的精度。According to the above-mentioned embodiments of the present invention, using the face recognition device and method of the monitoring scene, wherein the feature metric space is normalized for the differences of different cameras, and the feature metric space and the face image set are combined for recognition, which can Greatly improve the accuracy of face recognition.

下文中将参照图3至图8C对目标人跟踪装置及目标人跟踪方法，然后在对目标人跟踪装置中所包括的人脸识别装置进行详细描述并且对目标人跟踪方法中所使用的人脸识别方法进行详细描述。The target person tracking device and the target person tracking method will be described below with reference to Figures 3 to 8C, and then the face recognition device included in the target person tracking device will be described in detail and the face used in the target person tracking method The identification method is described in detail.

图5是根据本发明的实施例的目标人跟踪装置的框图。首先参照图5对目标人跟踪装置进行详细描述。参考图5，根据本发明的另一实施例，目标人跟踪装置包括人脸集生成模块502、特征度量归一化模块504、特征包络形成模块506、人脸识别模块510和跟踪模块512。人脸集生成模块502、特征度量归一化模块504、特征包络形成模块506包括在人脸识别装置中。在优选实施例中，图5所示的人脸集生成模块502、特征度量归一化模块504、特征包络形成模块506可以分别与图1所示的人脸集生成模块102、特征度量归一化模块104和特征包络形成模块106相同或相似。FIG. 5 is a block diagram of a target person tracking device according to an embodiment of the present invention. First, the target person tracking device will be described in detail with reference to FIG. 5 . Referring to FIG. 5 , according to another embodiment of the present invention, the target person tracking device includes a face set generation module 502 , a feature metric normalization module 504 , a feature envelope formation module 506 , a face recognition module 510 and a tracking module 512 . The face set generation module 502, the feature measure normalization module 504, and the feature envelope forming module 506 are included in the face recognition device. In a preferred embodiment, the face set generation module 502 shown in FIG. 5 , the feature metric normalization module 504 , and the feature envelope formation module 506 can be respectively combined with the face set generation module 102 , feature metric normalization module 506 shown in FIG. 1 . The normalization module 104 is the same as or similar to the feature envelope forming module 106 .

下文中，将对目标人跟踪装置中的各个模块分别进行详细描述。In the following, each module in the target person tracking device will be described in detail respectively.

人脸集生成模块502被配置为基于多幅视频图像生成人脸集。在一个实施例中，人脸集生成模块502还包括：人脸检测模块，被配置为以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。此外，人脸集生成模块502还包括人脸对齐模块，被配置为对进行人脸检测的视频图像进行如下图像处理：被配置为从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。The face set generation module 502 is configured to generate a face set based on multiple video images. In one embodiment, the face set generation module 502 further includes: a face detection module configured to perform face detection on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate. In addition, the human face set generation module 502 also includes a human face alignment module, which is configured to perform the following image processing on the video image for face detection: it is configured to extract key points from the video image, and to align the human face according to the distance between the pupils of the eyes Scale transformation of the face; and according to the position of the extracted key points, the yaw angle of the face is estimated, and the face is rotated into a frontal face according to the yaw angle of the face, and multiple face images including the same target person are generated. A set of human faces, where keypoints include eyes, noses, and mouths.

下文中，将对人脸生成模块502的实例进行详细描述。在具体实例中，首先通过人脸检测模块进行人脸检测和跟踪，然后通过人脸对齐模块进行人脸对齐。具体地，将监控视频分解为图片，每隔几帧对视频图片进行人脸检测。间隔的帧数根据实际的视频帧率确定。所检测的人脸区域，尽量少的包含背景信息，减少背景噪声对于识别的影响。为了获取同一个人的多张图片，需要对人脸图像进行跟踪。人脸跟踪可能会有两种错误：1、跟踪的目标不属于同一个人；解决方法为：实时检测相邻帧人脸图像的相似度，如果相似度低于阈值，则对所跟踪图像设定新的ID，其中，该阈值可以根据用户需要进行设置；2、跟踪的目标属于同一个人，但是跟踪图像无法准确的覆盖人脸区域。解决方法为：利用人脸跟踪框最近的人脸检测图像修正人脸跟踪结果。Hereinafter, an example of the face generation module 502 will be described in detail. In a specific example, face detection and tracking are first performed by the face detection module, and then face alignment is performed by the face alignment module. Specifically, the surveillance video is decomposed into pictures, and face detection is performed on the video pictures every few frames. The interval frames are determined according to the actual video frame rate. The detected face area contains as little background information as possible to reduce the influence of background noise on recognition. In order to obtain multiple pictures of the same person, face images need to be tracked. There may be two kinds of errors in face tracking: 1. The tracked target does not belong to the same person; the solution is to detect the similarity of adjacent frames of face images in real time, and if the similarity is lower than the threshold, set A new ID, where the threshold can be set according to user needs; 2. The tracked target belongs to the same person, but the tracked image cannot accurately cover the face area. The solution is: use the closest face detection image of the face tracking frame to correct the face tracking result.

在具体实例中，接下来进行人脸对齐。人脸对齐对识别结果影响较大。首先对人脸进行眼睛、鼻子、嘴巴四个关键点的提取，然后根据两眼瞳孔的距离对人脸进行尺度变换；然后两个瞳孔连线与水平线之间的夹角，对人脸图像旋转为水平；根据这四个关键点的位置，估计人脸的偏航角度，将人脸旋转到正面脸。如果人脸的这四个关键点无法提取，那么对这些人脸图像不进行处理。在单个图像集的图像数量比较多的情况下，可以对这些关键点无法提取的图像进行丢弃处理，但是需要保证单个图像集的图像数量大于15个。In a specific example, face alignment is performed next. Face alignment has a great influence on the recognition results. First, extract the four key points of the eyes, nose, and mouth from the face, and then perform scale transformation on the face according to the distance between the pupils of the two eyes; then rotate the face image based on the angle between the line connecting the two pupils and the horizontal line is horizontal; according to the positions of these four key points, the yaw angle of the face is estimated, and the face is rotated to the front face. If the four key points of the face cannot be extracted, then these face images are not processed. When the number of images in a single image set is relatively large, the images whose key points cannot be extracted can be discarded, but it is necessary to ensure that the number of images in a single image set is greater than 15.

目标人跟踪装置还包括特征度量归一化模块504，被配置为对多幅人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性，具体地，多幅不同摄像头下相同人脸图像之间的差异性包括背景光源的差异性和拍摄角度等的差异性。具体地，对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性进一步包括：将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的拍摄角度。另外，特征度量归一化模块504还包括特征提取模块，对人脸集进行特征提取，以形成原始特征人脸集；以及压缩模块，被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。利用该特征度量归一化模块504能够大幅度降低背景光源的差异性和拍摄角度等的差异性。The target person tracking device also includes a feature metric normalization module 504, which is configured to perform metric space transformation on multiple face images, so as to eliminate the differences between multiple face images, specifically, the same Differences among face images include differences in background light sources and differences in shooting angles. Specifically, performing metric space transformation on the same face images from multiple different cameras to eliminate the differences between the multiple face images further includes: performing metric space transformation on multiple face images so that multiple different cameras The same face images under the same background light source; and the metric space transformation is performed on multiple face images, so that the same face images under multiple different cameras have the same shooting angle. In addition, the feature metric normalization module 504 also includes a feature extraction module, which extracts features from the face set to form an original feature face set; and a compression module, configured to compress the original features to form a compressed feature face set. face set. Utilizing the feature metric normalization module 504 can greatly reduce differences in background light sources and shooting angles.

下文中，将对特征度量归一化模块504的实例进行详细描述。在具体实例中，当进行基于人脸集的多摄像头特征度量空间归一化时，首先通过特征提取模块和压缩模块人脸检测模块进行人脸检测和跟踪，然后通过特征度量归一化模块504进行多摄像头特征度量空间归一化。Hereinafter, an example of the feature metric normalization module 504 will be described in detail. In a specific example, when normalizing the feature metric space of a multi-camera based on a face set, the face detection and tracking are first performed through the feature extraction module and the compression module, the face detection module, and then through the feature metric normalization module 504 Perform multi-camera feature metric space normalization.

图3是根据本发明的实施例的图2中的人脸识别方法中进一步包括的特征提取和压缩步骤的示意图。参考图3，通过特征提取模块和压缩模块人脸检测模块进行人脸检测和跟踪。在具体实例中，对对齐后的人脸图像集进行特征提取，所提取特征可以为LBP(即，局部二值模式)、Gabor(即，盖博)或者DCT(即，离散余弦变换)特征。一般情况下，对于32×32或者64×64的图像，其特征维度能够达到1000以上，虽然单个图像处理耗时不多，但是如果当图像数量达到十万级时，其耗时长度将会极为可观，因此需要对所提取特征进行特征压缩处理。对所提取的特征进行采样；然后使用WTA hash压缩技术对所采样的特征进行压缩，压缩结果为：局部最大特征值为1，其余特征值压缩为0；最后，将所有局部区域的压缩串联起来，其结果为此特征向量的压缩结果。在特征压缩过程中，特征采样间隔、局部区域长度根据实际需求确定。压缩后的特征向量为二值压缩，压缩后的特征维度将会减少。FIG. 3 is a schematic diagram of feature extraction and compression steps further included in the face recognition method in FIG. 2 according to an embodiment of the present invention. Referring to Fig. 3, the face detection and tracking are performed by the feature extraction module and the compression module face detection module. In a specific example, feature extraction is performed on the aligned face image set, and the extracted features may be LBP (ie, local binary pattern), Gabor (ie, Gabor) or DCT (ie, discrete cosine transform) features. In general, for a 32×32 or 64×64 image, its feature dimension can reach more than 1000. Although the processing time of a single image is not much, if the number of images reaches 100,000 levels, the time-consuming length will be extremely Therefore, it is necessary to perform feature compression on the extracted features. Sampling the extracted features; then using WTA hash compression technology to compress the sampled features, the compression result is: the local maximum eigenvalue is 1, and the remaining eigenvalues are compressed to 0; finally, the compression of all local areas is connected in series , the result of which is the compressed result of this eigenvector. In the feature compression process, the feature sampling interval and local region length are determined according to actual needs. The compressed feature vector is binary compressed, and the compressed feature dimension will be reduced.

在具体实例中，对多摄像头特征度量空间归一化进行详细描述。此过程需要先训练然后再使用。首先使用不同摄像头之间的样本进行变换矩阵训练，然后在使用过程中将此变换矩阵应用到对应的摄像头上。特征度量空间归一化的目的是减少同一个人在不同摄像头下由于光照及角度的不同导致的差异性。例如，通过该特征度量归一化模块504将左侧光照变换为右侧光照或者将右侧光照变换为左侧光照。另外，例如将不同摄像机相对于目标人进行拍摄的不同角度变换为在相同角度对目标人进行拍摄。此过程将压缩后的特征向量进行度量空间变换，将不同度量空间的特征变化到同一度量空间，即具有相同的metric。此过程使用了metric-learning的方法。Metric-learning能够将输入特征空间变换到具有尺度意义的空间：L为变换矩阵。在将不同空间的特征度量标准投影到同一空间时，需要满足以下两个条件：1将不同ID目标的特征空间距离变大，且距离超过安全距离；2将相同ID目标特征空间距离缩小。这样，就可以避免误分类的情况出现。在描述其误差函数之前，先定义如下符号：In a specific example, the normalization of the multi-camera feature metric space is described in detail. This process requires training before use. First, use the samples between different cameras for transformation matrix training, and then apply this transformation matrix to the corresponding camera during use. The purpose of feature metric space normalization is to reduce the difference caused by the same person under different cameras due to different lighting and angles. For example, the feature metric normalization module 504 transforms the left illumination into the right illumination or transforms the right illumination into the left illumination. In addition, for example, the different angles at which different cameras shoot the target person are converted into shooting the target person at the same angle. This process transforms the compressed feature vector into a metric space, and changes the features of different metric spaces into the same metric space, that is, has the same metric. This process uses the metric-learning method. Metric-learning can transform the input feature space into a scale-meaning space: L is the transformation matrix. When projecting feature metrics from different spaces to the same space, the following two conditions need to be met: 1. Make the feature space distance of different ID objects larger, and the distance exceeds the safety distance; 2. Reduce the feature space distance of the same ID object. In this way, misclassification can be avoided. Before describing its error function, the following symbols are defined:

输入特征向量， input feature vector,

y_ij：如果和具有相同标签为1，否则为0，y _ij : if with have the same label as 1, otherwise 0,

η_ij：如果的相邻目标是时为1，否则为0。η _ij : if The adjacent target of is 1 when , otherwise 0.

其误差函数如下：Its error function is as follows:

ε(L)＝ε_pull(L)+ε_push(L)ε(L)＝ε _pull (L)+ε _push (L)

其中in

ε_pull为将邻域内相同目标拉近的误差函数，ε_push为将邻域内不同目标推远的误差函数。由于直接求解L可能会导致局部最优解，因此将其进行变换：M＝L^TL，这样ε(M)将具有全局最优解，相应的特征变换后的距离形式变换为：最终误差函数形式为：ε _pull is an error function that brings the same target closer in the neighborhood, and ε _push is an error function that pushes different targets in the neighborhood farther away. Since solving L directly may lead to a local optimal solution, it is transformed: M=L ^T L, so that ε(M) will have a global optimal solution, and the distance form after the corresponding feature transformation is transformed into: The final error function form is:

Minimize:Minimize:

ε(M)＝ε_pull(M)+ε_push(M)ε(M)＝ε _pull (M)+ε _push (M)

Subject to:Subject to:

M≥0M≥0

最优值求解利用树回归进行梯度逼近，在复杂情况比如线性不可分时，可引入核技巧。The optimal value solution uses tree regression for gradient approximation. In complex cases such as linear inseparability, kernel techniques can be introduced.

接下来，特征包络形成模块506被配置为对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间508。具体地，对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性进一步包括：将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的拍摄角度。Next, the feature envelope forming module 506 is configured to perform envelope processing on the multiple face images transformed in the metric space, so as to transform a plurality of different face set feature spaces of the face set into one same face The feature space 508 of the set. Specifically, performing metric space transformation on the same face images from multiple different cameras to eliminate the differences between the multiple face images further includes: performing metric space transformation on multiple face images so that multiple different cameras The same face images under the same background light source; and the metric space transformation is performed on multiple face images, so that the same face images under multiple different cameras have the same shooting angle.

图4是根据本发明的实施例的人脸识别方法中的包络化处理所生成的特征包络的示意图。下文中，将参照图4对通过特征包络形成模块所进行的包络化处理进行详细描述。Fig. 4 is a schematic diagram of a feature envelope generated by envelope processing in the face recognition method according to an embodiment of the present invention. Hereinafter, the envelope processing performed by the feature envelope forming module will be described in detail with reference to FIG. 4 .

用若干图像即图像集来描述人脸是为了扩大人脸的特征空间，更加全面的描述人脸。对同一目标，在提取其多张人脸图基础上，对压缩人脸特征进行度量空间变换后，使用image-set方法对其进行包络化处理，最终结果是将同一目标的多张图像表现为不同的加权组合。所谓包络指的是相同的人不同的图片形成的一个图像集所形成的特征空间。假设X_c为包含若干图像特征的特征集，X_c,i为其中一个样本，其中c＝1,…,C，i＝1,…,n_c。例如，n_c为5-50或者任何其他的正整数，优选地，为15、20、和50。更优选地，当n_c为5时，效率最高。包络表示为H_c，具体形式为：The purpose of using several images, that is, image sets, to describe human faces is to expand the feature space of human faces and describe human faces more comprehensively. For the same target, on the basis of extracting multiple face images, after performing metric space transformation on the compressed face features, use the image-set method to envelop them, and the final result is to express multiple images of the same target for different weighted combinations. The so-called envelope refers to the feature space formed by an image set formed by different pictures of the same person. Suppose X _c is a feature set including several image features, X _c,i is one of the samples, where c=1,...,C, i=1,...,n _c . For example, n _c is 5-50 or any other positive integer, preferably 15, 20, and 50. More preferably, when n _c is 5, the efficiency is the highest. The envelope is expressed as H _c , and the specific form is:

其中，α_c,i为每个样本的权重。对于权重有不同的获取方法，在本发明中，采用DLRC权重计算的方法。Among them, α _c,i is the weight of each sample. There are different acquisition methods for the weight, and in the present invention, the method of DLRC weight calculation is adopted.

人脸识别模块(又称识别模块)510基于设定的目标人，从多个人脸集中选择出与设定的目标人的人脸图像匹配度最高的目标人脸集。识别模块还包括：粗匹配识别模块，被配置为基于设定的目标人，确定有序的预定数量的人脸集；以及精匹配识别模块，被配置为基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。更具体地，基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。The face recognition module (also known as the recognition module) 510 selects the target face set with the highest degree of matching with the face image of the set target person from multiple face sets based on the set target person. The recognition module also includes: a rough matching recognition module configured to determine an ordered predetermined number of face sets based on the set target person; A predetermined number of face sets can be used to quickly determine the target face set. More specifically, based on the set target person, determine a predetermined number of face sets from multiple compressed feature face sets; The face sets are sorted to generate ordered predetermined number of face sets; Determine the target face set.

在以下实例中，将参照附图7以及8A至8C对跨摄像头人脸集包络识别进行详细描述。In the following examples, the cross-camera face set envelope recognition will be described in detail with reference to FIGS. 7 and 8A to 8C.

每个人脸集的特征包络形成后，需要度量不同人脸集特征的差异性，差异性是通过计算人脸集包络的最小距离确定的，最小距离通过求两组加权特征的最小化残差来确定。图3所示为不同人脸集的特征包络示意图，图中有三个人脸集，每个人人脸集由若干人脸图像组成。在实际识别过程中，每个人脸集是一组人脸图特征，每个人脸图特征包括两个特征：原始特征和压缩后特征。原始特征为附图3中的图像集特征以及压缩特征为图3中的图像集哈希编码特征。原始特征和压缩特征的特点分别是：原始特征识别率更高，但是所需时间长；压缩特征识别率低于原始特征，但是所需时间短。识别过程分为两个阶段：After the feature envelope of each face set is formed, it is necessary to measure the difference of the features of different face sets. The difference is determined by calculating the minimum distance of the envelope of the face set. Bad to be sure. Figure 3 is a schematic diagram of feature envelopes of different face sets. There are three face sets in the figure, and each face set consists of several face images. In the actual recognition process, each face set is a set of face map features, and each face map feature includes two features: original features and compressed features. The original feature is the image set feature in Figure 3 and the compressed feature is the image set hash code feature in Figure 3. The characteristics of the original features and the compressed features are: the recognition rate of the original features is higher, but it takes a long time; the recognition rate of the compressed features is lower than that of the original features, but the time required is short. The identification process is divided into two phases:

1、粗匹配阶段：使用压缩后特征，对象为全体识别人脸集；此阶段速度优先；1. Rough matching stage: use the compressed features, and the object is the set of all recognized faces; speed is the priority at this stage;

2、精确匹配阶段：使用原始特征，对象为粗匹配结果中比较靠前的N个人脸集；此阶段精度优先。2. Accurate matching stage: use the original features, and the object is the N face set that is relatively high in the rough matching result; the accuracy is prioritized in this stage.

图4为二阶段匹配示意图，在粗匹配阶段，所有的测试样本用于匹配，使用压缩后的特征进行人脸图像集的距离计算，然后根据与待匹配人脸的距离对测试样本进行排序。在粗匹配阶段可能发生错误匹配问题，比如更相似的人脸排在后面，因此需要精确匹配来调整粗匹配的结果。精确匹配仅对根据距离排序在最前面的若干个图像集进行运算，所使用的特征为原始特征。例如，在粗匹配阶段，可以从要识别的被压缩的全体人脸集中快速选择出2-50个或任意其他数量的有序的人脸集；任何在精确匹配阶段中，从粗匹配阶段所获得的有序人脸集中确定目标人脸集。有序的人脸集的数量可以根据用户需要进行确定，例如，5个、10个、以及15。Figure 4 is a schematic diagram of two-stage matching. In the rough matching stage, all test samples are used for matching, and the compressed features are used to calculate the distance of the face image set, and then the test samples are sorted according to the distance to the face to be matched. False matching problems may occur in the rough matching stage, such as more similar faces are ranked behind, so exact matching is needed to adjust the results of rough matching. Exact matching only operates on the top image sets sorted by distance, and the features used are original features. For example, in the rough matching stage, 2-50 or any other number of ordered face sets can be quickly selected from the compressed set of all faces to be recognized; The obtained ordered face set determines the target face set. The number of ordered face sets can be determined according to user needs, for example, 5, 10, and 15.

跟踪模块512根据目标人脸集中的每一幅人脸图像的拍摄位置和时间，确定目标人的路线。具体地，在人脸识别模块选择出的与设定的目标人的人脸图像匹配度最高的目标人脸集之后，跟踪模块512根据所获得的目标人脸集中的每一幅人脸图像的拍摄位置和时间，在地图上显示出此人的运动轨迹。The tracking module 512 determines the route of the target person according to the shooting location and time of each face image in the target face set. Specifically, after the face recognition module selects the target face set with the highest matching degree with the set target person's face image, the tracking module 512 according to each face image in the obtained target face set The location and time of the capture are shown on a map showing the person's trajectory.

在跨相机摄像头跟踪的场景下，本发明能够有效的对不同相机摄像头中的同一个人进行跟踪和识别，最终在地图上显示出此人的运动轨迹。图8A至图8C分别地显示了以下结果：(a)在一个相机摄像头中跟踪目标人物；(b)在多个相机摄像头中继续跟踪目标人物；(c)显示被跟踪人在地图中的轨迹。In the scene of cross-camera tracking, the present invention can effectively track and identify the same person in different cameras, and finally display the person's trajectory on the map. Figures 8A to 8C respectively show the following results: (a) tracking the target person in one camera; (b) continuing to track the target in multiple cameras; (c) showing the trajectory of the tracked person in the map .

图6是根据本发明的实施例的在目标人跟踪方法的流程图。以下将参照图6对目标人跟踪方法进行详细描述。Fig. 6 is a flowchart of a method for tracking a target person according to an embodiment of the present invention. The target person tracking method will be described in detail below with reference to FIG. 6 .

目标人跟踪方法包括以下步骤：The target person tracking method includes the following steps:

在步骤610中，基于多幅视频图像生成人脸集；在生成人脸集之前，以预定帧间隔获取的视频图像进行人脸检测，其中，根据视频帧率进行确定预定帧间隔。在对多幅人脸图像进行度量空间变换之前，对进行人脸检测的视频图像进行如下图像处理：从视频图像中的提取关键点，并且根据双眼瞳孔距离对人脸进行尺度变换；以及根据提取的关键点的位置，估计人脸的偏行角度，并根据人脸的偏航角度将人脸旋转为正面脸，生成包括同一目标人的多幅人脸图像的人脸集，其中，关键点包括眼睛、鼻子和嘴巴。In step 610, a face set is generated based on multiple video images; before the face set is generated, face detection is performed on video images acquired at predetermined frame intervals, wherein the predetermined frame interval is determined according to the video frame rate. Before the metric space transformation is performed on multiple face images, the following image processing is performed on the video image for face detection: extract key points from the video image, and perform scale transformation on the face according to the distance between the pupils of the eyes; and according to the extracted The position of the key points, estimate the yaw angle of the face, and rotate the face into a frontal face according to the yaw angle of the face, and generate a face set including multiple face images of the same target person, where the key points Includes eyes, nose and mouth.

在步骤620中，对人脸集中的多幅人脸图像进行度量空间变换，以消除多幅人脸图像之间的差异性；在进行度量空间变换以前，对人脸集进行特征提取，以形成原始特征人脸集；以及被配置为对原始特征进行压缩处理，以形成压缩特征人脸集。其中，多幅不同摄像头下相同人脸图像之间的差异性包括背景光源的差异性和拍摄角度的差异性。具体地，对多幅不同摄像头下相同人脸图像进行度量空间变换，以消除多幅不同摄像头下相同人脸图像之间的差异性进一步包括：将多幅不同摄像头下相同人脸图像进行度量空间变换，以使多幅不同摄像头下相同人脸图像具有相同的背景光源；以及将多幅人脸图像进行度量空间变换，以使多幅人脸图像具有相同的拍摄角度。In step 620, perform metric space transformation on multiple face images in the face set to eliminate the differences between multiple face images; before performing metric space transformation, perform feature extraction on the face set to form an original feature face set; and being configured to compress the original features to form a compressed feature face set. Among them, the differences between multiple images of the same face under different cameras include differences in background light sources and differences in shooting angles. Specifically, performing metric space transformation on the same face images from multiple different cameras to eliminate the differences between the same face images from different cameras further includes: performing metric space transformation on multiple same face images from different cameras Transformation, so that multiple images of the same face under different cameras have the same background light source; and perform metric space transformation on multiple facial images, so that multiple facial images have the same shooting angle.

在步骤630中，对度量空间变换的多幅人脸图像进行包络化处理，以将人脸集的多个不同的人脸集特征空间变换为一个相同的人脸集的特征空间；In step 630, envelope processing is performed on the plurality of face images transformed by the metric space, so as to transform a plurality of different face set feature spaces of the face set into a feature space of the same face set;

在步骤640中，基于设定的目标人，从多个人脸集中选择出与设定的目标人的人脸图像匹配度最高的目标人脸集。具体地，基于设定的目标人，确定有序的预定数量的人脸集；以及基于设定的目标人，从有序的预定数量的人脸集中快速确定目标人脸集。更具体地，基于设定的目标人，从多个压缩特征人脸集中确定预定数量的人脸集；根据与设定的目标人的匹配度，以匹配度从高到低的顺序对预定数量的人脸集进行排序，以生成有序的预定数量的人脸集；以及基于设定的目标人，从与有序的预定数量的人脸集相对应的预定数量的原始特征人脸集中快速确定目标人脸集。In step 640, based on the set target person, a target face set with the highest matching degree to the set target person's face image is selected from multiple face sets. Specifically, based on the set target person, determine an ordered and predetermined number of human face sets; and based on the set target person, quickly determine the ordered target human face set from the ordered and predetermined number of human faces. More specifically, based on the set target person, determine a predetermined number of face sets from multiple compressed feature face sets; The face sets are sorted to generate ordered predetermined number of face sets; Determine the target face set.

在步骤650中，根据目标人脸集中的每一幅人脸图像的拍摄位置和时间，确定目标人的路线。In step 650, the route of the target person is determined according to the shooting location and time of each face image in the target face set.

因此，本发明的实施例首先对所提取的不变特征进行压缩，然后在image-set方法和metric-learning基础上，提出跨摄像头识别方法，解决监控场景下跨摄像头跟踪的问题。Therefore, the embodiments of the present invention first compress the extracted invariant features, and then propose a cross-camera recognition method based on the image-set method and metric-learning to solve the problem of cross-camera tracking in surveillance scenarios.

因此，利用本发明的实施例的人脸识别装置和人脸识别方法能够提高监控场景下的人脸识别精度。此外，利用该人脸识别装置的目标人跟踪装置以及利用人脸识别方法的目标人跟踪方法能够不仅能够提供人脸识别精度，而且能够通过两个阶段的匹配查找目标人可以在大量的数据中以极快的速度寻找到目标图像。也就是说，在实际应用中，本发明能够辅助使用者从海量数据库中寻找目标人并提取其时间和路径信息，极大的提高工作效率。Therefore, using the face recognition device and the face recognition method of the embodiments of the present invention can improve the face recognition accuracy in a monitoring scene. In addition, the target person tracking device using the face recognition device and the target person tracking method using the face recognition method can not only provide face recognition accuracy, but also can find the target person through two-stage matching in a large amount of data Find the target image at an extremely fast speed. That is to say, in practical application, the present invention can assist users to find target persons from massive databases and extract their time and route information, greatly improving work efficiency.

以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included in the scope of the present invention. within the scope of protection.

Claims

1. A face recognition device, characterized in that, comprising:

A human face set generation module is configured to generate a human face set based on multiple video images;

The feature metric normalization module is configured to perform metric space transformation on the multiple identical face images under different cameras, so as to eliminate the difference between the multiple identical facial images under different cameras; and

The feature envelope forming module is configured to perform enveloping processing on the plurality of face images transformed in the metric space, so as to transform the feature space of a plurality of different face sets of the face set into one same person Feature space for face sets.

2 . The face recognition device according to claim 1 , wherein the differences between the plurality of images of the same face under different cameras include differences in background light sources and differences in shooting angles.

3. face recognition device according to claim 2, is characterized in that, carries out metric space transformation to described multiple identical facial images under different cameras, to eliminate the difference between described multiple facial images further include:

Performing metric space transformation on the multiple face images, so that the multiple same face images under different cameras have the same background light source; and

The multiple face images are subjected to metric space transformation, so that the multiple same face images under different cameras have the same shooting angle.

4. The face recognition device according to claim 1, further comprising: a recognition module configured to, based on the set target person, select from a plurality of faces to match the set target person The target face set with the highest degree.

5. The face recognition device according to claim 1, wherein the face set generation module comprises: a face detection module configured to perform face detection with video images acquired at predetermined frame intervals, wherein, The predetermined frame interval is determined according to the video frame rate.

6. face recognition device according to claim 5, is characterized in that, described face collection generation module also comprises face alignment module, is configured to carry out following image processing to the described video image that carries out face detection:

configured to extract key points from the video image, and perform scale transformation on the face according to the distance between the pupils of the eyes; and

Estimating the yaw angle of the human face according to the extracted position of the key point, and rotating the human face into a frontal face according to the yaw angle of the human face, generating a plurality of human face images including the same target person A human face set, wherein the key points include eyes, nose and mouth.

7. face recognition device according to claim 1, is characterized in that, described feature measure normalization module also comprises:

A feature extraction module, which extracts features from the face set to form an original feature face set; and

The compression module is configured to perform compression processing on the original features to form a compressed feature face set.

8. The face recognition device according to claim 4, wherein the recognition module further comprises:

The rough matching recognition module is configured to determine an orderly predetermined number of face sets based on the set target person; and

The fine matching recognition module is configured to quickly determine the target face set from the ordered predetermined number of face sets based on the set target person.

9. The face recognition device according to any one of claims 7 and 8, characterized in that,

Based on the set target person, determine a predetermined number of face sets from a plurality of compressed feature face sets;

According to the degree of matching with the set target person, sort the predetermined number of human face sets in descending order of matching degree, so as to generate the ordered predetermined number of human face sets; and

Based on the set target person, the target human face set is quickly determined from a predetermined number of original feature human face sets corresponding to the ordered predetermined number of human face sets.

10. A face recognition method, characterized in that, comprising the following steps:

Generate a face set based on multiple video images;

Performing metric space transformation on multiple face images in the face set to eliminate the differences between the multiple same face images under different cameras; and

Envelope processing is performed on the plurality of face images transformed by metric space, so as to transform a plurality of different feature spaces of the face set into a same feature space of the face set.

11. The face recognition method according to claim 10, characterized in that the differences between the plurality of images of the same face under different cameras include differences in background light sources and differences in shooting angles.

12. The face recognition method according to claim 11, characterized in that, performing metric space transformation on the same face images under the multiple different cameras, so as to eliminate the gap between the same face images under the multiple different cameras. The differences further include:

Performing a metric space transformation on the multiple identical face images under different cameras, so that the multiple identical facial images under different cameras have the same background light source; and

The multiple human face images are subjected to metric space transformation, so that the multiple human face images have the same shooting angle.

13. The face recognition method according to claim 10, further comprising: based on the set target person, selecting the target face with the highest matching degree with the set target person from a plurality of face sets set.

14. The face recognition method according to claim 10, further comprising: before generating the set of faces, performing face detection on video images acquired at predetermined frame intervals, wherein, performing face detection according to the frame rate of the video The predetermined frame interval is determined.

15. the face recognition method according to claim 14, is characterized in that, before carrying out metric space transformation to described multiple face images, carry out following image processing to the described video image that carries out face detection:

Extracting key points from the video image, and performing scale transformation on the face according to the distance between the pupils of the eyes; and

16. face recognition method according to claim 10, is characterized in that, carrying out metric space transformation to a plurality of pieces of face images in described face collection also comprises the following steps:

performing feature extraction on the face set to form an original feature face set; and

It is configured to compress the original features to form a compressed feature face set.

17. The face recognition method according to claim 13, further comprising:

Based on the set target person, determine an ordered and predetermined number of face sets; and

Based on the set target person, quickly determine the target human face set from the ordered predetermined number of human face sets.

18. The face recognition method according to any one of claims 16 and 17, further comprising:

19. A target person tracking device, comprising:

A feature metric normalization module configured to perform metric space transformation on the multiple face images to eliminate differences between the multiple face images;

The feature envelope forming module is configured to perform enveloping processing on the plurality of face images transformed in the metric space, so as to transform the feature space of a plurality of different face sets of the face set into one same person The feature space of the face set;

The face recognition module, based on the set target person, selects the target face set with the highest matching degree with the face image of the set target person from multiple face sets;

The tracking module determines the route of the target person according to the shooting location and time of each face image in the target face set.

20. A target person tracking method, comprising the following steps:

Generate a face set based on multiple video images;

performing a metric space transformation on multiple face images in the face set to eliminate differences among the multiple face images; and

performing enveloping processing on the plurality of face images transformed by metric space, so as to transform a plurality of different face set feature spaces of the face set into a feature space of the same face set;

Based on the set target person, select the target face set with the highest matching degree with the face image of the set target person from multiple face sets;

The route of the target person is determined according to the shooting location and time of each face image in the target face set.