
CN106909870A - Face image retrieval method and device - Google Patents


Info

Publication number
CN106909870A
CN106909870A
Authority
CN
China
Prior art keywords
image
features
face
attribute
gallery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201510963793.1A
Other languages
Chinese (zh)
Inventor
陆平
霍静
贾霞
刘金羊
刘明
张媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201510963793.1A priority Critical patent/CN106909870A/en
Priority to PCT/CN2016/111533 priority patent/WO2017107957A1/en
Publication of CN106909870A publication Critical patent/CN106909870A/en
Withdrawn legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/172: Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a face image retrieval method and device. The method includes: training on the semantic features of the face images in a gallery to obtain a first number of attribute features of the face images; training on the specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of those specified semantic features relative to the attribute features in the gallery; taking the first number of attribute features together with the second number of similarity features as the feature vector of each face image in the gallery; and computing, under preset rules, the feature vector of the image to be retrieved against the feature vector of each image in the gallery, then retrieving one or more matching images according to the calculation results. The present invention solves the problem in the related art that face recognition based only on low-level features yields poor face retrieval results, and improves both the efficiency and the accuracy of face retrieval.

Description

Face Image Retrieval Method and Device

Technical Field

The present invention relates to the field of face image recognition, and in particular to a face image retrieval method and device.

Background

With the continuous development of society and the pressing demand in many fields for fast, reliable automatic identity verification, biometric recognition technology has advanced rapidly in recent decades. Research on face recognition in particular has attracted a large number of researchers. Face recognition is widely applied, for example in assisting public security departments with criminal investigation, automatic machine identity verification, video surveillance tracking and recognition, and facial expression analysis. Many countries are currently conducting face recognition research, with methods concentrated on template matching, example-based learning, neural networks, hidden Markov models, and support vector machines.

In computer face recognition, features obtained by simple processing of large amounts of image data can be defined as low-level features, while descriptive features such as lines, surfaces, and patterns can be defined as high-level features. Principal Component Analysis (PCA) features, wavelet transform features, and various statistical features all fall into the low-level category, whereas the results of face-part shape analysis are high-level features. Using attributes such as male, female, smiling, black hair, or wearing glasses for face recognition yields good results. In addition, similarity data with respect to a particular face can also be used for recognition. Labeled Faces in the Wild (LFW) and Columbia University's Public Figures Face Database (PubFig) are two independent public datasets whose images were all captured in non-controlled environments; the variations in pose, expression, and illumination in these two datasets have a large impact on face recognition. Traditional methods in the related art use only low-level features for face recognition, which leads to poor face retrieval results.
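The low-level features mentioned above (PCA, wavelet, statistical) can be illustrated with a minimal eigenface-style PCA projection. The sketch below uses random stand-in data; the `pca_features` helper and all dimensions are assumptions for illustration, not values from the patent.

```python
import numpy as np

def pca_features(images, k):
    """Project flattened images onto their top-k principal components."""
    X = images.reshape(len(images), -1).astype(float)
    X = X - X.mean(axis=0)                     # center the data
    # Rows of Vt are the principal directions of pixel space
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                        # shape (n_images, k)

rng = np.random.default_rng(0)
faces = rng.random((20, 16, 16))               # 20 fake 16x16 "face" images
feats = pca_features(faces, k=5)
print(feats.shape)                             # (20, 5)
```

Each row of the result is a k-dimensional low-level descriptor of one image; by contrast, the attribute and similarity features described in this patent operate at the semantic level.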

No effective solution to the above problems in the related art has yet been proposed.

Summary of the Invention

The present invention provides a face image retrieval method and device, so as to at least solve the problem in the related art that using low-level features for face recognition leads to poor face retrieval results.

The present invention provides a face image retrieval method, including: training on the semantic features of the face images in a gallery to obtain a first number of attribute features of the face images; training on the specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of those specified semantic features relative to the attribute features in the gallery; taking the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery; and calculating, under preset rules, the degree of match between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, then retrieving one or more images that match the image to be retrieved according to the degree of match.

Further, training on the semantic features of the face images in the gallery to obtain the first number of attribute features includes: detecting the key points in each face image in the gallery, where the key points include the four corners of the two eyes, the tip of the nose, and the two corners of the mouth; dividing each face image into regions according to the key points, and extracting the low-level face features corresponding to the different regions; and classifying the low-level face features of the different regions with attribute classifiers to learn the first number of attribute features of different types.
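The key-point-driven region division described in this step can be sketched roughly as follows. The key-point names, coordinates, patch size, and the `crop_regions` helper are all hypothetical stand-ins; the patent does not fix concrete values, and a real system would obtain the coordinates from a key point detector.

```python
import numpy as np

# Hypothetical key-point coordinates given as (row, col); illustrative only.
KEYPOINTS = {
    "left_eye_corners": (30, 25), "right_eye_corners": (30, 55),
    "nose_tip": (45, 40), "mouth_corners": (60, 40),
}

def crop_regions(face, half=8):
    """Cut one square patch around each key point (clipped at image borders)."""
    regions = {}
    for name, (r, c) in KEYPOINTS.items():
        r0, c0 = max(r - half, 0), max(c - half, 0)
        regions[name] = face[r0:r + half, c0:c + half]
    return regions

face = np.zeros((80, 80))          # placeholder for an aligned face image
patches = crop_regions(face)
print({k: v.shape for k, v in patches.items()})
```

Low-level features (e.g. texture or gradient statistics) would then be computed per patch and fed to the per-attribute classifiers.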

Further, training on the specified semantic features of the first predetermined number of reference face images to obtain the second number of similarity features relative to the attribute features in the gallery includes: detecting the key points in the first predetermined number of reference face images, where the key points include the four corners of the two eyes, the tip of the nose, and the two corners of the mouth; extracting the specified semantic features according to the key points to obtain a dataset corresponding to the specified semantic features, and extracting the low-level face features corresponding to the different regions; and classifying the dataset with similarity classifiers to learn the second number of similarity features.

Further, the attribute classifiers and the similarity classifiers include support vector machine (SVM) classifiers.

Further, calculating the feature vector of the image to be retrieved against the feature vector of each image in the gallery under preset rules, and retrieving one or more matching images according to the calculation results, includes: obtaining the feature vector of the image to be retrieved and the feature vector of each image in the gallery; performing a distance calculation between them, where the distance calculation method includes the cosine distance method and the Euclidean distance method; and sorting the calculation results in descending order, then selecting the face images corresponding to a second predetermined number of top-ranked results as the matching images of the image to be retrieved.

According to another aspect of the present invention, a face image retrieval device is provided, including: a first semantic feature extraction module, configured to train on the semantic features of the face images in the gallery to obtain the first number of attribute features of the face images; a second semantic feature extraction module, configured to train on the specified semantic features of the first predetermined number of reference face images to obtain the second number of similarity features relative to the attribute features in the gallery; a processing module, configured to take the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery; and a retrieval module, configured to calculate, under preset rules, the degree of match between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, and to retrieve one or more matching images according to the degree of match.

Further, the first semantic feature extraction module includes: a first detection unit, configured to detect the key points in each face image in the gallery, where the key points include the four eye corners, the tip of the nose, and the two corners of the mouth; a first processing unit, configured to divide each face image into regions according to the key points and extract the low-level face features corresponding to the different regions; and a first semantic feature extraction unit, configured to classify the low-level face features of the different regions with attribute classifiers to learn the first number of attribute features of different types.

Further, the second semantic feature extraction module includes: a second detection unit, configured to detect the key points in the first predetermined number of reference face images, where the key points include the four eye corners, the tip of the nose, and the two corners of the mouth; a second processing unit, configured to extract the specified semantic features according to the key points to obtain a dataset corresponding to the specified semantic features, and to extract the low-level face features corresponding to the different regions; and a second semantic feature extraction unit, configured to classify the dataset with similarity classifiers to learn the second number of similarity features.

Further, the attribute classifiers and the similarity classifiers include support vector machine (SVM) classifiers.

Further, the retrieval module includes: an obtaining unit, configured to obtain the feature vector of the image to be retrieved and the feature vector of each image in the gallery; a calculation unit, configured to perform a distance calculation between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, where the distance calculation method includes the cosine distance method and the Euclidean distance method; and a retrieval unit, configured to sort the calculation results in descending order and select the face images corresponding to a second predetermined number of top-ranked results as the matching images of the image to be retrieved.

In the present invention, the semantic features of the face images are trained to obtain a first number of attribute features, and the specified semantic features of a first predetermined number of reference face images are trained to obtain a second number of similarity features relative to the attribute features in the gallery. The first number of attribute features and the second number of similarity features together form the feature vector of each image. By comparing the feature vector of the image to be retrieved with the feature vectors of the gallery images, one or more matching images are retrieved. Because the feature vector is composed of attribute features and similarity features, both of which are high-level features, the retrieved results match the query image closely. This solves the problem in the related art that face recognition based only on low-level features yields poor face retrieval results, and improves both the efficiency and the accuracy of face retrieval.

Brief Description of the Drawings

The accompanying drawings described here are provided for a further understanding of the present invention and constitute a part of this application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of a face image retrieval method according to an embodiment of the present invention;

Fig. 2 is a structural block diagram of a face image retrieval device according to an embodiment of the present invention;

Fig. 3 is a first optional structural block diagram of a face image retrieval device according to an embodiment of the present invention;

Fig. 4 is a second optional structural block diagram of a face image retrieval device according to an embodiment of the present invention;

Fig. 5 is a third optional structural block diagram of a face image retrieval device according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of face key point detection according to an optional embodiment of the present invention;

Fig. 7 is a schematic diagram of the coordinate system according to an optional embodiment of the present invention;

Figs. 8a-8b are schematic comparisons of a face image before and after rotational alignment according to an optional embodiment of the present invention;

Fig. 9 is a schematic diagram of region segmentation for similarity images according to an optional embodiment of the present invention;

Fig. 10 is a schematic diagram of attribute/similarity classifier learning and feature extraction according to an optional embodiment of the present invention;

Fig. 11 is a schematic diagram of the image enrollment and retrieval flow according to an optional embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.

It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

This embodiment provides a face image retrieval method. Fig. 1 is a flowchart of the face image retrieval method according to an embodiment of the present invention. As shown in Fig. 1, the flow includes the following steps:

Step S102: train on the semantic features of the face images in the gallery to obtain a first number of attribute features of the face images;

Step S104: train on the specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of the specified semantic features relative to the attribute features in the gallery;

Step S106: take the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery;

Step S108: calculate, under preset rules, the degree of match between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, and retrieve one or more matching images according to the degree of match.

As can be seen from steps S102 to S108, this embodiment first trains on the semantic features of the face images to obtain a first number of attribute features, and also trains on the specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features relative to the attribute features in the gallery. The first number of attribute features and the second number of similarity features together form the feature vector of each image, and one or more matching images are retrieved by comparing the feature vector of the image to be retrieved with those of the gallery images. In other words, this embodiment compares the query image with the gallery images by feature vector; because the feature vector is composed of attribute features and similarity features, both of which are high-level features, the matched results agree closely with the query image. This solves the problem in the related art that using low-level features for face recognition leads to poor face retrieval results, and improves both the efficiency and the accuracy of face retrieval.

In an optional implementation of this embodiment, the training on the semantic features of the face images in the gallery to obtain the first number of attribute features in step S102 can be carried out as follows:

Step S102-1: detect the key points in each face image in the gallery, where the key points include the four corners of the two eyes, the tip of the nose, and the two corners of the mouth;

It should be noted that the key points above are merely the preferred key points of this optional embodiment and do not limit the present invention; other key points, such as the hair, chin, or ears, are also possible. In other words, any feature of the human face may serve as a key point.

Step S102-2: divide the face image into regions according to the key points, and extract the low-level face features corresponding to the different regions;

Step S102-3: classify the low-level face features of the different regions with attribute classifiers to learn the first number of attribute features of different types.

For steps S102-1 to S102-3 in a concrete application scenario of this embodiment, the first number may be set to 69. Face attribute features such as male, female, smiling, black hair, or wearing glasses all describe semantic characteristics of the face, and the goal of a face attribute classifier is to classify a face image, i.e. to judge whether the face image possesses a given attribute. Based on the steps above, classifiers for 69 attributes such as smiling, black hair, and wearing glasses can be trained to represent face features. Attribute feature extraction then consists of running the 69 trained attribute classifiers on an image to obtain 69 attribute values, which are concatenated to form the attribute feature of that image.
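The concatenation of the 69 classifier outputs into one attribute feature vector can be sketched as follows. The linear scorers stand in for trained SVM attribute classifiers (the patent does not give their learned weights), and the low-level feature dimension is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 128   # assumed low-level feature dimension (the patent does not fix one)
# Stand-ins for the 69 trained attribute classifiers: each scores a low-level
# feature vector with a linear function w.x + b (weights here are random).
classifiers = [(rng.standard_normal(DIM), float(rng.standard_normal()))
               for _ in range(69)]

def attribute_features(low_level):
    """Concatenate the 69 classifier outputs into one attribute feature vector."""
    return np.array([w @ low_level + b for w, b in classifiers])

vec = attribute_features(rng.standard_normal(DIM))
print(vec.shape)    # (69,)
```

In the actual method, each classifier would consume the low-level features of its own face region rather than one shared vector; the shared vector here only keeps the sketch short.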

Based on the above description, in an optional implementation of this embodiment, steps S102-1 to S102-3 may proceed as follows:

First, the low-level features for the attributes are extracted: face detection and key point localization are performed on each face image in the gallery to obtain its key point information, and the image is rotated into alignment. The image is then segmented into regions according to the attribute's requirements (for example, the glasses attribute corresponds to the eye region, and the white-hair attribute corresponds to the hair region); different attributes may require different numbers of regions. The low-level features that are effective for the attribute are then extracted from the segmented regions.
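The rotation alignment step can be illustrated with a small geometric sketch, assuming the eye corners are given as (x, y) points. The `alignment_angle` and `rotate_points` helpers are illustrative, not from the patent.

```python
import numpy as np

def alignment_angle(left_eye, right_eye):
    """Angle (radians) of the line between the two eyes, given as (x, y) points."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return np.arctan2(dy, dx)

def rotate_points(points, theta):
    """Rotate 2-D points by -theta so the inter-eye line becomes horizontal."""
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T

# A face tilted 45 degrees: after rotation, the right eye lies on the x-axis.
theta = alignment_angle((0.0, 0.0), (1.0, 1.0))
aligned = rotate_points(np.array([[1.0, 1.0]]), theta)
print(np.round(aligned, 6))
```

A full implementation would apply the same rotation to the image pixels (e.g. via an affine warp), not just to the key point coordinates.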

Next, the extracted low-level attribute features are divided into two equal parts, one half for training and one half for testing (this is merely an example; other ratios are possible and may be chosen according to the actual situation). If an attribute uses multiple segmented regions, their features are concatenated first; for example, the "wearing earrings" attribute uses both the left and right ear regions. An SVM attribute classifier is then learned for each attribute over these low-level features, yielding 69 attribute classifiers in total, such as a smiling classifier, a black-hair classifier, and a glasses classifier.
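The half-train/half-test SVM learning described above can be sketched with a tiny hinge-loss trainer on toy data. This is a stand-in for an off-the-shelf SVM solver; the data, dimensions, and hyperparameters are all assumptions made for this illustration.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Tiny linear SVM trained by hinge-loss sub-gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # inside the margin: hinge active
                w += lr * (yi * xi - 2 * lam * w)
                b += lr * yi
            else:                              # outside: regularization only
                w -= lr * 2 * lam * w
    return w, b

rng = np.random.default_rng(2)
# Toy separable data standing in for e.g. "smiling" vs "not smiling" features
pos = rng.standard_normal((40, 5)) + 2.0
neg = rng.standard_normal((40, 5)) - 2.0
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [-1] * 40)

idx = rng.permutation(len(X))                  # half for training, half for test
half = len(X) // 2
w, b = train_linear_svm(X[idx[:half]], y[idx[:half]])
accuracy = float(np.mean(np.sign(X[idx[half:]] @ w + b) == y[idx[half:]]))
print(accuracy)
```

The held-out half plays the role of the verification step mentioned in the text: the classifier's sign on unseen features estimates how well the attribute is learned.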

Finally, the attribute classification performance is verified against the classification values produced by the attribute classifiers.

In an optional implementation of this embodiment, the training on the specified semantic features of the first predetermined number of reference face images in step S104, to obtain the second number of similarity features relative to the attribute features in the gallery, can be carried out as follows:

Step S104-1: detect the key points in the first predetermined number of reference face images, where the key points include the four corners of the two eyes, the tip of the nose, and the two corners of the mouth;

As with step S102-1, the key points above are merely the preferred key points of this optional embodiment and do not limit the present invention; other key points, such as the hair, chin, or ears, are also possible. In other words, any feature of the human face may serve as a key point.

Step S104-2: extract the specified semantic features according to the key points to obtain a dataset corresponding to the specified semantic features, and extract the low-level face features corresponding to the different regions;

Step S104-3: classify the dataset with similarity classifiers to learn the second number of similarity features.

Here, the first predetermined number may be set to 10. On this basis, for steps S104-1 to S104-3 in a concrete application scenario, the similarity classifiers are trained as follows:

First, 10 reference persons, for example, are selected and each reference person is processed individually: all images of the reference person serve as positive samples, and an equal number of other face images are selected as negative samples, forming one dataset per reference person.

Then, features are extracted from each dataset as follows: face detection and key point localization are performed first, and the images are rotated into alignment. Four sub-blocks, namely the eyes, eyebrows, nose, and mouth, are then segmented from each face image, and low-level features are extracted from each sub-block. The dataset is thereby converted into four new sub-datasets: an eye dataset, an eyebrow dataset, a mouth dataset, and a nose dataset.
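The conversion of one reference person's dataset into the four per-part sub-datasets can be sketched as follows. The part names follow the text; the sample format (a dict of per-part features plus a label) is an assumption for this illustration.

```python
# The four facial parts named in the text.
PARTS = ("eyes", "eyebrows", "nose", "mouth")

def split_into_part_datasets(samples):
    """Regroup (features_by_part, label) samples into one dataset per part."""
    datasets = {p: [] for p in PARTS}
    for feats, label in samples:
        for p in PARTS:
            datasets[p].append((feats[p], label))
    return datasets

# Two toy samples: a positive (the reference person) and a negative (someone else)
toy = [({p: [0.0] for p in PARTS}, +1), ({p: [1.0] for p in PARTS}, -1)]
subsets = split_into_part_datasets(toy)
print({p: len(v) for p, v in subsets.items()})
```

Each sub-dataset then gets its own train/test split and its own SVM, which is what yields multiple similarity values per reference person.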

Finally, each sub-dataset of each reference person is split in half, one half for training and the other for testing. An SVM model is learned on the training set, and the classification performance of the similarity classifier is verified on the test set. The model files and feature-normalization files produced by training are saved for subsequent similarity feature extraction.

It should be noted that the attribute classifiers and similarity classifiers in this embodiment may be support vector machine (SVM) classifiers.

In addition, in another optional implementation of this embodiment, step S108 (computing, by a preset rule, on the feature vector of the image to be retrieved and the feature vector of each image in the gallery, and retrieving one or more images matching the image to be retrieved according to the computation results) may be realized as follows:

Step S108-1: obtain the feature vector of the image to be retrieved and the feature vector of each image in the gallery;

Step S108-2: compute the distance between the feature vector of the image to be retrieved and the feature vector of each image in the gallery;

In this optional implementation, the distance computation may use the cosine distance or the Euclidean distance.

Step S108-3: sort the computation results in descending order, and from the sorted results select the face images corresponding to a second predetermined number of top-valued results as the matching images of the image to be retrieved.

In a specific application scenario, steps S108-1 to S108-3 may proceed as follows. After the feature vector composed of the attribute values and similarity values of a face image is obtained, face retrieval based on the combination of face attribute features and similarity features can be performed: the 69 attribute values and 40 similarity values obtained are concatenated as the feature vector of each image, and the Large Margin Nearest Neighbors (LMNN) algorithm is then used to optimize the weight of each dimension of the feature vector. The similarity of two faces can be computed from the feature vectors and weights. This embodiment uses the cosine of the angle between two vectors as the similarity value; cos θ ranges over [-1, +1], and the closer it is to +1, the more similar the faces in the two pictures.
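The two distance options in step S108-2 can be sketched in plain Python; the vectors here are short illustrative stand-ins for the full feature vectors, and the LMNN per-dimension weighting is omitted:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; in [-1, +1],
    closer to +1 means the two faces are more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    """Euclidean distance between feature vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.9, 0.1, 0.8]
gallery = {"img_a": [0.8, 0.2, 0.7], "img_b": [-0.5, 0.9, -0.1]}

# Rank gallery images by cosine similarity, largest first (step S108-3).
ranked = sorted(gallery, key=lambda k: cosine_similarity(query, gallery[k]),
                reverse=True)
print(ranked[0])  # img_a
```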

Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part contributing over the prior art, can be embodied in the form of a software product stored on a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and containing several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

This embodiment also provides a face image retrieval device, which is used to implement the above embodiments and preferred implementations; what has already been explained will not be repeated. As used below, the term "module" may be a combination of software and/or hardware realizing a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 2 is a structural block diagram of a face image retrieval device according to an embodiment of the present invention. As shown in Fig. 2, the device comprises: a first semantic feature extraction module 22, configured to train on the semantic features of the face images in the gallery to obtain a first quantity of face image attribute features; a second semantic feature extraction module 24, coupled to the first semantic feature extraction module 22 and configured to train on specified semantic features of a first predetermined number of reference face images to obtain a second quantity of similarity features of the specified semantic features relative to the attribute features in the gallery; a processing module 26, coupled to the second semantic feature extraction module 24 and configured to take the first quantity of attribute features and the second quantity of similarity features as the feature vector of each face image in the gallery; and a retrieval module 28, coupled to the processing module 26 and configured to compute, by a preset rule, the degree of match between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, and to retrieve one or more images matching the image to be retrieved according to the degree of match.

Fig. 3 is a first optional structural block diagram of the face image retrieval device according to an embodiment of the present invention. As shown in Fig. 3, the first semantic feature extraction module 22 comprises: a first detection unit 32, configured to detect the key points in each face image in the gallery, where the key points are the four eye corners, the nose tip and the two mouth corners; a first processing unit 34, coupled to the first detection unit 32 and configured to divide the face image into regions according to the key points and extract the low-level face features corresponding to the different regions; and a first semantic feature extraction unit 36, coupled to the first processing unit 34 and configured to perform classification learning on the low-level face features of the different regions by attribute classifiers to obtain different types of attribute features.

Fig. 4 is a second optional structural block diagram of the face image retrieval device according to an embodiment of the present invention. As shown in Fig. 4, the second semantic feature extraction module 24 comprises: a second detection unit 42, configured to detect the key points of the first predetermined number of reference face images, where the key points include the four eye corners, the nose tip and the two mouth corners; a second processing unit 44, coupled to the second detection unit 42 and configured to extract, according to the key points, the datasets corresponding to the specified semantic features and the low-level face features corresponding to the different regions; and a second semantic feature extraction unit 46, coupled to the second processing unit 44 and configured to perform classification learning on the datasets by similarity classifiers to obtain the second quantity of similarity features.

Optionally, the attribute classifiers and similarity classifiers involved in this embodiment are support vector machine (SVM) classifiers.

Fig. 5 is a third optional structural block diagram of the face image retrieval device according to an embodiment of the present invention. As shown in Fig. 5, the retrieval module 28 comprises: an obtaining unit 52, configured to obtain the feature vector of the image to be retrieved and the feature vector of each image in the gallery; a computing unit 54, coupled to the obtaining unit 52 and configured to compute the distance between the feature vector of the image to be retrieved and the feature vector of each image in the gallery; and a retrieval unit 56, coupled to the computing unit 54 and configured to sort the computation results in descending order and select, from the sorted results, the face images corresponding to a second predetermined number of top-valued results as the matching images of the image to be retrieved.

It should be noted that each of the above modules may be implemented by software or hardware; for the latter, this may be done in, but is not limited to, the following manner: the above modules are all located in the same processor, or the above modules are respectively located in multiple processors.

The present invention is illustrated below in conjunction with optional embodiments of the present invention.

This optional embodiment provides a method for extracting high-level semantic features of faces. The method extracts the high-level semantic features of a face through attribute classifiers and similarity classifiers, measures face similarity by combining face attribute features and similarity features, and thereby realizes similar-face retrieval. It comprises three parts: face attribute classifier learning and face attribute feature acquisition; face similarity classifier learning and similarity feature acquisition; and face retrieval based on face attribute features and similarity features. The three parts are described in detail below.

I. Face attribute classifier learning and face attribute feature acquisition;

(1) The face attribute classifier learning process. Face attributes include male, female, smiling, black hair, wearing glasses and so on; they describe the semantic characteristics of a face. The goal of a face attribute classifier is to classify a face image and judge whether it has a certain attribute. In the present invention, classifiers for 69 attribute features, such as smiling, black hair and wearing glasses, are trained for face feature representation. Attribute feature extraction means extracting the attribute features of a face through the trained attribute classifiers; in this embodiment, the 69 trained attribute classifiers are evaluated on the image to obtain 69 attribute values, which are concatenated to form the image's attribute features.

(2) The training process of the attribute classifiers is as follows:

First, labeled images for the attribute are needed: for each attribute, select a certain number of positive-example and negative-example face pictures (pictures in which the attribute is clearly present or absent) as the labeled set for that attribute.

Next, extract the low-level features of the attribute as follows: perform face detection and key-point localization on each face image in the labeled set to obtain its key-point information, and rotate the pictures into alignment. Segment the image into regions according to the attribute's requirements (for example, the region corresponding to the glasses attribute is the eye region, and the region corresponding to the white-hair attribute is the hair region); different attributes may require different numbers of regions. From each segmented region, extract the low-level features effective for the attribute in that region (such as LBP or Gabor features).

Then, divide the low-level attribute features extracted from the positive and negative example images of the labeled set into two equal parts, one half for training and one half for testing. If an attribute uses multiple segmented regions, the features must first be concatenated; for example, the wearing-earrings attribute uses both the left and right ear regions. Learn the corresponding SVM attribute classifier on the low-level features, yielding 69 attribute classifiers in total, such as a smiling-face classifier, a black-hair classifier and a glasses classifier; in addition, generate the feature normalization file on the training set. Save the model files and the feature normalization file produced by training.

Finally, after applying the same normalization and feature concatenation to the test set, verify the attribute classification performance according to the classification values of the attribute classifiers.
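The feature normalization file mentioned above can be understood as per-dimension scaling parameters fitted on the training set and reapplied unchanged to the test set. A minimal sketch, assuming simple min-max scaling to [0, 1]; the actual normalization scheme is not specified in the text:

```python
def fit_normalization(train_features):
    """Compute per-dimension min and max on the training set only."""
    dims = len(train_features[0])
    mins = [min(f[d] for f in train_features) for d in range(dims)]
    maxs = [max(f[d] for f in train_features) for d in range(dims)]
    return mins, maxs

def apply_normalization(features, mins, maxs):
    """Scale each dimension to [0, 1] using the saved training-set parameters."""
    out = []
    for f in features:
        out.append([(v - lo) / (hi - lo) if hi > lo else 0.0
                    for v, lo, hi in zip(f, mins, maxs)])
    return out

train = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]  # illustrative feature rows
test = [[2.0, 25.0]]
params = fit_normalization(train)
print(apply_normalization(test, *params))  # [[0.5, 0.75]]
```

The key design point is that the parameters come only from the training set, so the test set is scaled consistently without leaking information.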

(3) The face similarity classifier learning process comprises:

The training goal of the similarity classifiers is to train classifiers for the similarity of facial features to those of the reference persons; classifying a new face image with these classifiers determines whether the facial features of that face are similar to those of a reference person.

(4) The training process of the similarity classifiers is as follows:

First, select several (for example, 10) reference persons and process each reference person separately: take all of that reference person's pictures as positive samples and select an equal number of other people's face pictures as negative samples, forming one dataset per reference person.

Then, extract features from each dataset as follows: first perform face detection and key-point localization, and rotate the pictures into alignment. Next, segment four sub-blocks, namely the eyes, eyebrows, nose and mouth, from each face picture, and extract low-level features (such as LBP or Gabor features) from each of the four sub-blocks. This turns the dataset into four new sub-datasets: an eye dataset, an eyebrow dataset, a mouth dataset and a nose dataset.

Finally, divide each sub-dataset of each reference person in half, using one half for training and the other half for testing; learn an SVM model on the training set and verify the classification performance of the similarity classifier on the test set. Save the model file and the feature normalization file produced by training for later similarity feature extraction.

II. Attribute feature and similarity feature extraction;

(1) The feature extraction processes for the attribute classifiers and the similarity classifiers are similar, as follows:

First, for an input image, attribute features are extracted as follows: following the same process as in attribute classifier training, perform face detection and key-point localization on the face image to obtain its key-point information, rotate the picture into alignment, and segment the image into regions according to the requirements of each attribute; then invoke the trained attribute classifiers to classify the regions and obtain the attribute classifier values, and concatenate all the attribute classification values to obtain the face attribute features of the input image.

Then, similarity features are extracted from the input image by a process similar to attribute feature extraction: first perform face detection and key-point localization on the face image, and rotate the picture into alignment. Next, segment the four sub-blocks of eyes, eyebrows, nose and mouth from the face picture and extract low-level features from each. Then invoke the trained similarity classifiers to compute a similarity value for each of the four sub-blocks, and concatenate all the similarity values to obtain the face similarity features of the input image.

Finally, concatenate the attribute features and similarity features of the input image, yielding a feature vector composed of the image's 69 attribute values and 40 similarity values.
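The concatenation step itself is trivial; a sketch with placeholder classifier outputs:

```python
# Hypothetical classifier outputs: 69 attribute decision values and
# 40 similarity decision values for one input image.
attribute_values = [0.0] * 69   # e.g. smiling, black hair, glasses, ...
similarity_values = [0.0] * 40  # 10 reference persons x 4 facial regions

# The final descriptor is their concatenation: a 109-dimensional vector.
feature_vector = attribute_values + similarity_values
print(len(feature_vector))  # 109
```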

III. Face retrieval based on the combination of attribute features and similarity features;

After the feature vector composed of the attribute values and similarity values of a face image is obtained, face retrieval based on the combined face attribute features and similarity features can be performed as follows:

Concatenate the 69 attribute values and 40 similarity values obtained as the feature vector of each image, then use the LMNN (Large Margin Nearest Neighbors) algorithm to optimize the weight of each dimension of the feature vector. The similarity of two faces can then be computed from the feature vectors and weights. This patent uses the cosine of the angle between two vectors as the similarity value; cos θ ranges over [-1, +1], and the closer it is to +1, the more similar the faces in the two pictures.

Optional embodiments of the present invention are described in detail below in conjunction with the accompanying drawings.

First, the technical solutions adopted by the components of the algorithm in this optional embodiment are introduced; they include face key-point detection, image preprocessing, image region segmentation, feature extraction and classifier training.

(1) Key-point detection;

Fig. 6 is a schematic diagram of face key-point detection according to an optional embodiment of the present invention. As shown in Fig. 6, this optional embodiment uses flandmark for fast face key-point detection; the detected points are the eye corners, the nose tip and the two mouth corners, seven key points in total.

(2) Image preprocessing;

Image preprocessing rotates and aligns the original face image. The pupil positions of both eyes can be located from the key-point data obtained. Since the pupils of the rotated face should lie on one line, that is, the x values of the pupil coordinates should be equal, the rotation angle can be computed. The rotated image is saved at a size of 250 x 250 pixels, with the missing parts filled with black.

It should be noted that the coordinate system used in this optional embodiment differs from the usual one: the horizontal direction, from left to right, is the y axis, and the vertical direction, from top to bottom, is the x axis.

Suppose that in an image the coordinates of the left and right eyes are (plx, ply) and (prx, pry) respectively, the midpoint of the line between the two eyes is (mx, my), and the distance between the two pupils is d. The image scaling ratio is then ratio = d/dd (dd defaults to 75). The angle between the line joining the two eyes and the y axis is θ, and the slope of that line is k. Fig. 7 is a schematic diagram of the coordinate system according to an optional embodiment of the present invention; as shown in Fig. 7, the ellipses represent the positions of the eyes.

To segment a standard face image from the original image, the image must be processed in the following steps: rotate by θ degrees so that the line joining the two eyes is aligned with the y axis; scale the image so that the distance between the two eyes is dd; and translate the image so that the midpoint of the two eyes moves to (mx, my).
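Under the stated conventions, the alignment parameters follow directly from the pupil coordinates. A minimal sketch (the pupil coordinates are illustrative; the actual pixel-level rotation, scaling and translation would be done by an image library and is omitted):

```python
import math

DD = 75.0  # target inter-pupil distance after scaling (default from the text)

def alignment_params(pl, pr):
    """Compute the rotation angle (radians), scale ratio and eye midpoint.
    pl and pr are (x, y) pupil coordinates in the document's convention:
    x runs top-to-bottom, y runs left-to-right."""
    dx = pr[0] - pl[0]
    dy = pr[1] - pl[1]
    d = math.hypot(dx, dy)       # inter-pupil distance
    theta = math.atan2(dx, dy)   # angle between the eye line and the y axis
    ratio = d / DD               # image scaling factor
    mid = ((pl[0] + pr[0]) / 2.0, (pl[1] + pr[1]) / 2.0)
    return theta, ratio, mid

# Example: eyes already level (same x) and exactly 75 pixels apart.
theta, ratio, mid = alignment_params((100.0, 80.0), (100.0, 155.0))
print(theta, ratio, mid)  # 0.0 1.0 (100.0, 117.5)
```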

Figs. 8a and 8b are schematic comparisons of a face image before and after rotation alignment according to an optional embodiment of the present invention, where Fig. 8a is before rotation and Fig. 8b is after rotation.

(3) Region segmentation;

Image region segmentation, taking the eyes as an example, works as follows: find the key-point coordinates pLeftIndex of the left eye corner and pRightIndex of the right eye corner, and compute their midpoint from these two coordinates. Take this midpoint as the center of a rectangle; define the distance from the center to the rectangle's left and right borders as centerToLeft, the distance to its upper and lower borders as centerToUp, and the rectangle's width and height as width and height. From the center position together with centerToUp and centerToLeft, the coordinates of the upper-left corner of the segmented region are obtained; from the upper-left corner coordinates and the width and height, an image containing the segmented eye-region information is obtained.
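The rectangle computation described above can be sketched as follows; the centerToLeft and centerToUp values are illustrative assumptions, since the text does not fix them:

```python
def eye_region(p_left, p_right, center_to_left=40, center_to_up=20):
    """Compute the crop rectangle around the eye from its two corner
    keypoints, in the document's convention (x down, y right).
    Returns (top_left_x, top_left_y, width, height)."""
    # The midpoint of the two eye corners is the rectangle's center.
    cx = (p_left[0] + p_right[0]) / 2.0
    cy = (p_left[1] + p_right[1]) / 2.0
    width = 2 * center_to_left
    height = 2 * center_to_up
    # Upper-left corner: move up by center_to_up (x axis) and
    # left by center_to_left (y axis) from the center.
    return cx - center_to_up, cy - center_to_left, width, height

print(eye_region((100, 60), (100, 120)))  # (80.0, 50.0, 80, 40)
```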

(4) Feature extraction;

Feature extraction mainly uses the Gabor wavelet transform to extract features from image blocks. Compared with other feature extraction methods, the Gabor wavelet transform processes less data and can meet the system's real-time requirements; moreover, the wavelet transform is insensitive to illumination changes and tolerates a certain degree of image rotation and deformation. When recognition is based on the angle-cosine distance, the feature pattern and the feature under test need not correspond strictly, which improves the system's robustness. Therefore, the Gabor wavelet transform method is used to extract image features in the face recognition process.
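As an illustration of the Gabor filters involved, the following is a pure-Python sketch of a single real-valued Gabor kernel; in practice a bank of kernels over several scales and orientations would be convolved with each image block, and the parameter values here are illustrative:

```python
import math

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel: a Gaussian envelope modulating a
    cosine wave oriented at angle theta with wavelength lam."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                         / (2 * sigma * sigma))
            row.append(g * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel

k = gabor_kernel()
print(k[4][4])  # 1.0 at the kernel center (envelope and cosine both peak there)
```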

(5) SVM classifier;

The SVM classifiers are trained with LIBSVM, a library implementing the support vector machine (SVM) algorithm. Using LIBSVM takes two steps: first, train on a dataset to obtain a classification model; then, use the model to predict the class labels of the test dataset. The learning goal of an SVM classifier is to find a maximum-margin separating hyperplane in the feature space that separates the different classes.
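LIBSVM itself is an external C library; purely to illustrate the same train-then-predict workflow, the following is a toy linear SVM trained by sub-gradient descent on the hinge loss. It is a stand-in, not LIBSVM, and the training data and hyperparameters are illustrative:

```python
def svm_train(data, lam=0.01, lr=0.1, epochs=200):
    """Minimal linear SVM: sub-gradient descent on the regularized hinge loss."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:
                # Inside the margin or misclassified: hinge-loss gradient step.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
            else:
                # Correctly classified: only the regularizer contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w

def svm_predict(w, x):
    """Step 2 of the workflow: predict the class label (+1 / -1) of a sample."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

train_set = [([2.0, 2.0], 1), ([3.0, 2.0], 1),
             ([-2.0, -2.0], -1), ([-2.0, -3.0], -1)]
w = svm_train(train_set)                      # step 1: obtain the model
print(svm_predict(w, [2.5, 2.5]), svm_predict(w, [-2.5, -2.5]))  # 1 -1
```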

2. Specific implementation of the attribute classifiers and similarity classifiers;

(1) Implementation of the attribute classifiers. The attribute classifiers use the LFW dataset. For each of the 69 attributes in Table 1, select from the LFW dataset 1000 pictures that exhibit the attribute and 1000 that do not (no fewer than this number is recommended), and label each picture: +1 if it exhibits the attribute and -1 if it does not. Segment the region corresponding to the attribute; the specific regions are listed in Table 2. Because some attributes can share the same segmented region, the 69 attributes map onto only 19 regions. Extract Gabor wavelet transform features from the pictures used, and then train a classifier for each attribute with LIBSVM.

Table 1

Table 2

(2) Specific implementation of the similarity classifiers. The similarity classifiers use the PubFig dataset: select 10 representative persons from the PubFig image library, with no fewer than 150 pictures per person, and segment the four regions required by the similarity classifiers, namely eyes, eyebrows, nose and mouth. Fig. 9 is a schematic diagram of similarity image region segmentation according to an optional embodiment of the present invention; as shown in Fig. 9, a total of 40 regions are obtained, and features are extracted from each. Taking the eye region of the first reference person as an example, his own eye regions serve as positive examples, labeled +1. In addition, pictures that are not of this person (preferably several different people) are selected from the library, roughly equal in number to all of this reference person's pictures; their eye regions are segmented and their features extracted as negative examples, labeled -1. Gabor wavelet transform features are likewise extracted from the segmented regions, and 40 similarity classifiers are trained with LIBSVM.
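The bookkeeping behind the 40 classifiers (10 reference persons times 4 facial regions) can be sketched as follows; the person identifiers are placeholders:

```python
# 10 reference persons x 4 facial regions = 40 similarity classifiers.
reference_persons = ["person_%02d" % i for i in range(1, 11)]
regions = ["eyes", "eyebrows", "nose", "mouth"]

# One classifier per (person, region) pair; each would be trained on
# that person's region crops (+1) versus other people's crops (-1).
classifier_keys = [(p, r) for p in reference_persons for r in regions]
print(len(classifier_keys))  # 40
```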

3. Computing the attribute features and similarity features of an input picture, and feature weight learning;

(1) Perform face key-point detection, image preprocessing, image region segmentation and feature extraction on the input picture, obtaining 19 region features from the 19 regions.

(2) For the 69 attribute classifier models and 40 similarity classifier models, the corresponding values can be obtained through LIBSVM's svmpredict() function, yielding the 69 attribute values and 40 similarity values of the image. Combining the attribute values and similarity values gives the picture's feature vector.

(3) On an input picture set labeled with person identities (the LFW dataset is used), apply the LMNN algorithm to the attribute features and similarity features extracted from the picture set to optimize the weight of each dimension of the feature vector.

4. Face retrieval implementation process;

(1) Process all pictures in the retrieval gallery to obtain the 109-dimensional feature vector of each picture.

(2) Process a newly input picture to be retrieved in the same way to obtain its 109-dimensional feature vector.

(3) Compute face picture similarity using the angle cosine: for two feature vectors vi and vj, sim(vi, vj) = &lt;vi, vj&gt; / (||vi|| ||vj||).

In the above formula, &lt;x, y&gt; denotes the inner product of two vectors x and y, and ||·|| denotes the norm of a vector; sim(vi, vj) ranges over [-1, +1], and the closer it is to +1, the more similar the faces in the two pictures. Compute the angle cosine between the feature vector of the picture to be retrieved and the feature vectors of all pictures in the retrieval library, and take the top N (1000 by default) pictures with the largest cosine values. The persons in the top-ranked pictures selected are considered most likely to be the same person as in the input picture.
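The angle cosine and the top-N selection can be sketched in plain Python; short illustrative vectors stand in for the 109-dimensional feature vectors, N is reduced to 2, and the LMNN per-dimension weighting is omitted for brevity:

```python
import math

def sim(vi, vj):
    """Angle cosine between two feature vectors: <vi, vj> / (||vi|| ||vj||)."""
    inner = sum(a * b for a, b in zip(vi, vj))
    return inner / (math.sqrt(sum(a * a for a in vi)) *
                    math.sqrt(sum(b * b for b in vj)))

def retrieve(query, gallery, n=2):
    """Return the n gallery picture names with the largest cosine to the query."""
    ranked = sorted(gallery, key=lambda name: sim(query, gallery[name]),
                    reverse=True)
    return ranked[:n]

query = [1.0, 0.5, -0.2]
gallery = {
    "a.jpg": [0.9, 0.6, -0.1],   # close to the query
    "b.jpg": [-1.0, 0.2, 0.8],   # far from the query
    "c.jpg": [1.0, 0.4, -0.3],   # closest to the query
}
print(retrieve(query, gallery))  # ['c.jpg', 'a.jpg']
```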

This optional embodiment's processes of face attribute classifier learning and attribute feature acquisition, and of face similarity classifier learning and similarity feature acquisition, are as follows. Fig. 10 is a schematic diagram of the attribute/similarity classifier learning and feature extraction processes according to an optional embodiment of the present invention; as shown in Fig. 10, they comprise an attribute/similarity classifier training process and an attribute/similarity feature extraction process.

The steps of the attribute/similarity classifier training process include:

步骤S1002:带标记图库;Step S1002: a gallery with markers;

步骤S1004:快速人脸关键点检测;Step S1004: fast face key point detection;

步骤S1006:图像预处理;Step S1006: image preprocessing;

步骤S1008:图像区域分割;Step S1008: image region segmentation;

步骤S1010:特征抽取;Step S1010: feature extraction;

步骤S1012:SVM分类训练和测试。Step S1012: SVM classification training and testing.
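The SVM training of step S1012 might be sketched as below. This is a toy stand-in in which the SVM is approximated by subgradient descent on the hinge loss; the label-assignment helper, the training data, and all names are illustrative only, not the embodiment's actual implementation.

```python
import numpy as np

def make_labels(values, positive):
    """+1 for samples matching the positive condition, -1 otherwise.
    For an attribute classifier, `positive` would test attribute presence;
    for a similarity classifier, identity with the reference person."""
    return np.where(positive(values), 1.0, -1.0)

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM: subgradient descent on the hinge loss."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:  # inside the margin: hinge loss is active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:           # only the regularizer contributes
                w -= lr * lam * w
    return w, b
```

A production system would instead use a mature SVM implementation; the point here is only the ±1 labeling convention and the hinge-loss objective.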

The attribute/similarity feature extraction flow comprises the following steps:

Step S1014: test or query image;

Step S1016: fast face key-point detection;

Step S1018: image preprocessing;

Step S1020: image region segmentation;

Step S1022: feature extraction;

Step S1024: computing attribute/similarity values.
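The chain of stages S1016–S1022 can be sketched as function composition. Everything below is an illustrative stand-in — fixed fake key-point positions, horizontal bands in place of key-point-driven regions, and mean intensity in place of real low-level features — not the detectors the embodiment actually uses.

```python
import numpy as np

def detect_keypoints(image):
    # Stand-in: pretend the 7 key points (4 eye corners, nose tip,
    # 2 mouth corners) sit at fixed fractions of the image size.
    h, w = image.shape[:2]
    fractions = [(0.35, 0.3), (0.45, 0.3), (0.55, 0.3), (0.65, 0.3),
                 (0.5, 0.55), (0.4, 0.75), (0.6, 0.75)]
    return [(int(x * w), int(y * h)) for x, y in fractions]

def preprocess(image):
    # Normalize pixel values to [0, 1].
    return image.astype(np.float64) / 255.0

def segment_regions(image, keypoints):
    # Stand-in: three horizontal bands (key points unused in this toy split).
    h = image.shape[0]
    return [image[:h // 3], image[h // 3: 2 * h // 3], image[2 * h // 3:]]

def extract_features(regions):
    # Stand-in low-level feature: mean intensity per region.
    return np.array([r.mean() for r in regions])

def extraction_pipeline(image):
    keypoints = detect_keypoints(image)
    normalized = preprocess(image)
    regions = segment_regions(normalized, keypoints)
    return extract_features(regions)
```

The value of structuring the flow this way is that training (S1004–S1010) and extraction (S1016–S1022) share the exact same stages, so features computed at query time are comparable to those the classifiers were trained on.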

It should be noted that the two classifiers differ only in the SVM training and testing part: in the attribute classifier, a sample is labeled +1 if it possesses the attribute and -1 if it does not; in the similarity classifier, the current reference person is labeled +1 and everyone else -1. The face retrieval flow based on face attribute features and similarity features is described next.

FIG. 11 is a schematic diagram of the image storage and retrieval flows according to an optional embodiment of the present invention. As shown in FIG. 11, the process comprises two flows: a flow for storing images into the database and a retrieval flow.

The flow for storing images into the database comprises the following steps:

Step S1102: retrieval gallery;

Step S1104: computing attribute and similarity feature values;

Step S1106: attribute/similarity feature combination;

Step S1108: portrait attribute/similarity feature database; after this step is executed, step S1116 is executed.

The retrieval flow comprises the following steps:

Step S1110: image to be retrieved;

Step S1112: computing attribute and similarity feature values;

Step S1114: attribute/similarity feature combination;

Step S1116: feature comparison;

Step S1118: retrieval result/most similar images.

It should be noted that the final feature vector is a 109-dimensional vector comprising 69 attribute values and 40 similarity values.
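Assembling that final vector is a simple concatenation; the random values below merely stand in for real classifier outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
attribute_values = rng.uniform(-1.0, 1.0, size=69)   # stand-in outputs of the 69 attribute classifiers
similarity_values = rng.uniform(-1.0, 1.0, size=40)  # stand-in outputs of the 40 similarity classifiers

# Final per-image feature vector: 69 attribute values followed by 40 similarity values.
feature_vector = np.concatenate([attribute_values, similarity_values])
assert feature_vector.shape == (109,)
```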

An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

S1: training semantic features of the face images in a gallery to obtain a first number of attribute features of the face images;

S2: training specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of the specified semantic features relative to the attribute features in the gallery;

S3: using the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery;

S4: computing, according to a preset rule, the feature vector of the image to be retrieved against the feature vector of each image in the gallery, and retrieving one or more images matching the image to be retrieved according to the computation result. Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described here, or they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for retrieving face images, comprising:
training semantic features of face images in a gallery to obtain a first number of attribute features of the face images;
training specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of the specified semantic features relative to the attribute features in the gallery;
using the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery;
computing, according to a preset rule, a degree of matching between the feature vector of an image to be retrieved and the feature vector of each image in the gallery, and retrieving one or more images matching the image to be retrieved according to the degree of matching.

2. The method according to claim 1, wherein training the semantic features of the face images in the gallery to obtain the first number of attribute features of the face images comprises:
detecting key points in each face image in the gallery, wherein the key points comprise the four corners of both eyes, the tip of the nose, and both ends of the mouth;
dividing the face image into regions according to the key points, and extracting low-level face features corresponding to the different regions;
classifying and learning, by an attribute classifier, the plurality of low-level face features of the different regions to obtain the first number of attribute features of different types.

3. The method according to claim 1, wherein training the specified semantic features of the first predetermined number of reference face images to obtain the second number of similarity features of the specified semantic features relative to the attribute features in the gallery comprises:
detecting key points of the first predetermined number of reference face images, wherein the key points comprise the four corners of both eyes, the tip of the nose, and both ends of the mouth;
extracting the specified semantic features according to the key points to obtain a data set corresponding to the specified semantic features, and extracting low-level face features corresponding to the different regions;
classifying and learning the data set by a similarity classifier to obtain the second number of similarity features.

4. The method according to claim 2 or 3, wherein the attribute classifier and the similarity classifier comprise a support vector machine (SVM) classifier.

5. The method according to claim 1, wherein computing, according to the preset rule, the degree of matching between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, and retrieving the one or more images matching the image to be retrieved according to the degree of matching comprises:
obtaining the feature vector of the image to be retrieved and the feature vector of each image in the gallery;
performing a distance computation between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, wherein the distance computation method comprises a cosine distance method and a Euclidean distance method;
sorting the plurality of computation results in descending order, and selecting, from the sorted computation results, the face images corresponding to a second predetermined number of top-ranked computation results as the matching images of the image to be retrieved.

6. An apparatus for retrieving face images, comprising:
a first semantic feature extraction module, configured to train semantic features of face images in a gallery to obtain a first number of attribute features of the face images;
a second semantic feature extraction module, configured to train specified semantic features of a first predetermined number of reference face images to obtain a second number of similarity features of the specified semantic features relative to the attribute features in the gallery;
a processing module, configured to use the first number of attribute features and the second number of similarity features as the feature vector of each face image in the gallery;
a retrieval module, configured to compute, according to a preset rule, a degree of matching between the feature vector of an image to be retrieved and the feature vector of each image in the gallery, and to retrieve one or more images matching the image to be retrieved according to the degree of matching.

7. The apparatus according to claim 6, wherein the first semantic feature extraction module comprises:
a first detection unit, configured to detect key points in each face image in the gallery, wherein the key points comprise the four corners of the eyes, the tip of the nose, and both ends of the mouth;
a first processing unit, configured to divide the face image into regions according to the key points and to extract low-level face features corresponding to the different regions;
a second semantic feature extraction unit, configured to classify and learn, by an attribute classifier, the plurality of low-level face features of the different regions to obtain the first number of attribute features of different types.

8. The apparatus according to claim 7, wherein the second semantic feature extraction module comprises:
a second detection unit, configured to detect key points of the first predetermined number of reference face images, wherein the key points comprise the four corners of both eyes, the tip of the nose, and both ends of the mouth;
a second processing unit, configured to extract the specified semantic features according to the key points to obtain a data set corresponding to the specified semantic features, and to extract low-level face features corresponding to the different regions;
a second semantic feature extraction unit, configured to classify and learn the data set by a similarity classifier to obtain the second number of similarity features.

9. The apparatus according to claim 7 or 8, wherein the attribute classifier and the similarity classifier comprise a support vector machine (SVM) classifier.

10. The apparatus according to claim 6, wherein the retrieval module comprises:
an obtaining unit, configured to obtain the feature vector of the image to be retrieved and the feature vector of each image in the gallery;
a computation unit, configured to perform a distance computation between the feature vector of the image to be retrieved and the feature vector of each image in the gallery, wherein the distance computation method comprises a cosine distance method and a Euclidean distance method;
a retrieval unit, configured to sort the plurality of computation results in descending order, and to select, from the sorted computation results, the face images corresponding to a second predetermined number of top-ranked computation results as the matching images of the image to be retrieved.
CN201510963793.1A 2015-12-22 2015-12-22 The search method and device of facial image Withdrawn CN106909870A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510963793.1A CN106909870A (en) 2015-12-22 2015-12-22 The search method and device of facial image
PCT/CN2016/111533 WO2017107957A1 (en) 2015-12-22 2016-12-22 Human face image retrieval method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510963793.1A CN106909870A (en) 2015-12-22 2015-12-22 The search method and device of facial image

Publications (1)

Publication Number Publication Date
CN106909870A 2017-06-30

Family

ID=59089016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510963793.1A Withdrawn CN106909870A (en) 2015-12-22 2015-12-22 The search method and device of facial image

Country Status (2)

Country Link
CN (1) CN106909870A (en)
WO (1) WO2017107957A1 (en)


Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753850B (en) * 2017-11-03 2022-10-25 富士通株式会社 Training method and training device for face recognition model
CN108256438A (en) * 2017-12-26 2018-07-06 杭州魔点科技有限公司 Based on rear end recognition of face video frequency monitoring method and device
CN108985198A (en) * 2018-07-02 2018-12-11 四川斐讯信息技术有限公司 A kind of COS distance calculation method based on big data feature vector
CN110909562A (en) * 2018-09-14 2020-03-24 传线网络科技(上海)有限公司 Video auditing method and device
CN109472269A (en) * 2018-10-17 2019-03-15 深圳壹账通智能科技有限公司 Image feature configuration and verification method, device, computer equipment and medium
CN111382286B (en) * 2018-12-27 2023-05-12 深圳云天励飞技术有限公司 Data processing method and related product
CN111652015B (en) * 2019-03-27 2024-04-26 上海铼锶信息技术有限公司 Method and system for selecting key faces in picture
CN110866466B (en) * 2019-10-30 2023-12-26 平安科技(深圳)有限公司 Face recognition method, device, storage medium and server
CN112800819B (en) * 2019-11-14 2024-06-11 深圳云天励飞技术有限公司 Face recognition method and device and electronic equipment
CN110942014B (en) * 2019-11-22 2023-04-07 浙江大华技术股份有限公司 Face recognition rapid retrieval method and device, server and storage device
CN110929064B (en) * 2019-11-29 2024-02-09 交通银行股份有限公司 Face identification sample library and retrieval method
CN111079688A (en) * 2019-12-27 2020-04-28 中国电子科技集团公司第十五研究所 A method of living body detection based on infrared images in face recognition
CN113344601B (en) * 2020-03-02 2025-02-25 北京沃东天骏信息技术有限公司 A feature extraction method and device
CN111444374B (en) * 2020-04-09 2023-05-02 上海依图网络科技有限公司 Human body retrieval system and method
CN111597894B (en) * 2020-04-15 2023-09-15 新讯数字科技(杭州)有限公司 Face library updating method based on face detection technology
CN112069908B (en) * 2020-08-11 2024-04-05 西安理工大学 Person Re-ID Method Based on Co-occurrence Attributes
CN114078269B (en) * 2020-08-19 2025-10-28 浙江宇视科技有限公司 A facial image clustering method, device, server and storage medium
CN112084904A (en) * 2020-08-26 2020-12-15 武汉普利商用机器有限公司 Face searching method, device and storage medium
CN112163456B (en) * 2020-08-28 2024-04-09 北京中科虹霸科技有限公司 Identity recognition model training method, testing method, recognition method and device
CN116569226B (en) * 2020-10-12 2025-06-13 亚萨合莱有限公司 Access control using facial recognition and heterogeneous information
CN112633119A (en) * 2020-12-17 2021-04-09 北京赢识科技有限公司 Human body attribute identification method and device, electronic equipment and medium
CN113052064B (en) * 2021-03-23 2024-04-02 北京思图场景数据科技服务有限公司 Attention detection method based on face orientation, facial expression and pupil tracking
CN115668315A (en) * 2021-04-01 2023-01-31 京东方科技集团股份有限公司 Face attribute detection method and device, storage medium and electronic equipment
CN113158939B (en) * 2021-04-29 2022-08-23 南京甄视智能科技有限公司 Method and system for identifying human face shielding part
CN113535899B (en) * 2021-07-07 2024-02-27 西安康奈网络科技有限公司 Automatic studying and judging method for emotion tendencies of internet information
CN113674177B (en) 2021-08-25 2024-03-26 咪咕视讯科技有限公司 Automatic makeup method, device, equipment and storage medium for portrait lips
CN114078137A (en) * 2021-11-23 2022-02-22 北京智源人工智能研究院 A deep learning-based colposcopy image screening method, device and electronic device
CN114495200A (en) * 2021-12-23 2022-05-13 深圳太极云软技术有限公司 A face retrieval method, device, terminal and readable storage medium
CN114692759A (en) * 2022-03-30 2022-07-01 深圳万兴软件有限公司 Video face classification method, device, equipment and storage medium
CN115439870A (en) * 2022-09-06 2022-12-06 焦点科技股份有限公司 A Method for Image and Sensitive Character Recognition Based on Optical Character and Vector Search
CN115599937A (en) * 2022-09-19 2023-01-13 中国人民解放军战略支援部队信息工程大学(Cn) Retrieval and classification-based image content attribute extraction method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102150163A (en) * 2008-10-03 2011-08-10 伊斯曼柯达公司 Interactive image selection method
CN102622590A (en) * 2012-03-13 2012-08-01 上海交通大学 Identity recognition method based on face-fingerprint cooperation
CN103824052A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Multilevel semantic feature-based face feature extraction method and recognition method
CN103853795A (en) * 2012-12-07 2014-06-11 中兴通讯股份有限公司 Image indexing method and device based on n-gram model
US20140200878A1 (en) * 2013-01-14 2014-07-17 Xerox Corporation Multi-domain machine translation model adaptation
US20150086086A1 (en) * 2005-09-28 2015-03-26 Facedouble Incorporated Image Classification And Information Retrieval Over Wireless Digital Networks And The Internet
CN104732602A (en) * 2015-02-04 2015-06-24 四川长虹电器股份有限公司 Attendance checking method based on cloud human face and expression recognition
CN104883548A (en) * 2015-06-16 2015-09-02 金鹏电子信息机器有限公司 Monitoring-video face-capturing processing method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495999A (en) * 2011-11-14 2012-06-13 深圳市奔凯安全技术有限公司 Face recognition method
CN102968626B (en) * 2012-12-19 2016-04-06 中国电子科技集团公司第三研究所 A kind of method of facial image coupling
CN105100735A (en) * 2015-08-31 2015-11-25 张慧 Personnel intelligent monitoring management system and management method


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019029486A1 (en) * 2017-08-09 2019-02-14 北京市商汤科技开发有限公司 Facial image processing method and apparatus and electronic device
US11227147B2 (en) 2017-08-09 2022-01-18 Beijing Sensetime Technology Development Co., Ltd Face image processing methods and apparatuses, and electronic devices
CN107679474A (en) * 2017-09-25 2018-02-09 北京小米移动软件有限公司 Face matching process and device
CN107622256A (en) * 2017-10-13 2018-01-23 四川长虹电器股份有限公司 Intelligent album system based on facial recognition techniques
CN110032912A (en) * 2018-01-11 2019-07-19 富士通株式会社 Face verification method and apparatus and computer storage medium
CN108932321A (en) * 2018-06-29 2018-12-04 金蝶软件(中国)有限公司 Research on face image retrieval, device, computer equipment and storage medium
CN108932321B (en) * 2018-06-29 2020-10-23 金蝶软件(中国)有限公司 Face image retrieval method and device, computer equipment and storage medium
US12039454B2 (en) 2018-11-21 2024-07-16 Tencent Technology (Shenzhen) Company Limited Microexpression-based image recognition method and apparatus, and related device
CN109993102B (en) * 2019-03-28 2021-09-17 北京达佳互联信息技术有限公司 Similar face retrieval method, device and storage medium
CN109993102A (en) * 2019-03-28 2019-07-09 北京达佳互联信息技术有限公司 Similar face retrieval method, apparatus and storage medium
CN110147776A (en) * 2019-05-24 2019-08-20 北京百度网讯科技有限公司 The method and apparatus for determining face key point position
CN110147776B (en) * 2019-05-24 2021-06-11 北京百度网讯科技有限公司 Method and device for determining positions of key points of human face
CN111931567A (en) * 2020-07-01 2020-11-13 珠海大横琴科技发展有限公司 Human body recognition method and device, electronic equipment and storage medium
CN111931567B (en) * 2020-07-01 2024-05-28 珠海大横琴科技发展有限公司 Human body identification method and device, electronic equipment and storage medium
CN111914649A (en) * 2020-07-01 2020-11-10 珠海大横琴科技发展有限公司 Method and device for face recognition, electronic device and storage medium

Also Published As

Publication number Publication date
WO2017107957A1 (en) 2017-06-29
WO2017107957A9 (en) 2017-09-08

Similar Documents

Publication Publication Date Title
CN106909870A (en) The search method and device of facial image
Singh et al. Face detection and recognition system using digital image processing
US11380119B2 (en) Pose-aligned networks for deep attribute modeling
CN106372581B (en) Method for constructing and training face recognition feature extraction network
US8064653B2 (en) Method and system of person identification by facial image
Anil et al. Literature survey on face and face expression recognition
CN102938065B (en) Face feature extraction method and face identification method based on large-scale image data
CN102968626B (en) A kind of method of facial image coupling
CN107742107A (en) Facial image sorting technique, device and server
WO2015149534A1 (en) Gabor binary pattern-based face recognition method and device
Lee et al. Vasir: an open-source research platform for advanced iris recognition technologies
CN101493887B (en) Eyebrow image segmentation method based on semi-supervision learning and Hash index
CN106022223B (en) A high-dimensional partial binary pattern face recognition method and system
CN107545243A (en) Yellow race's face identification method based on depth convolution model
Paul et al. Extraction of facial feature points using cumulative histogram
CN107292218A (en) A kind of expression recognition method and device
Sudhakar et al. Facial identification of twins based on fusion score method
CN116612542A (en) Multi-mode biological feature consistency-based audio and video character recognition method and system
CN106407878A (en) Face detection method and device based on multiple classifiers
Padmashree et al. Skin segmentation-based disguised face recognition using deep learning
Nigam et al. Review of facial recognition techniques
Muthukumar et al. Vision based hand gesture recognition for Indian sign languages using local binary patterns with support vector machine classifier
Schneider et al. Feature based face localization and recognition on mobile devices
Galdámez et al. A small look at the ear recognition process using a hybrid approach
Ban et al. Gender Classification of Low‐Resolution Facial Image Based on Pixel Classifier Boosting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication; application publication date: 20170630