CN111428661A - Method for processing face image based on intelligent human-computer interaction - Google Patents
- Publication number
- CN111428661A (application CN202010232764.9A)
- Authority
- CN
- China
- Prior art keywords
- face
- predicted
- facial
- level
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Geometry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A method for processing face images based on intelligent human-computer interaction, belonging to the field of artificial intelligence. The facial features related to user information are divided into overall facial features, expression features and visual tracking; the combined analysis of these three aspects yields three-dimensional face information. Specific actions in the user's facial features are bound to behavior information, and human-computer interaction is realized through the facial feature information, reducing the selection error of facial recognition points. By analyzing the facial regions of a captured face, the method locates the basic facial features, performs visual tracking on the eye region, and binds facial expressions to behavior information to realize human-computer interaction, reduce the selection error of facial recognition points, and improve facial recognition accuracy.
Description
Technical Field
The invention belongs to the field of artificial intelligence and relates to the implementation of a face image processing method, specifically a method that performs face point detection, expression recognition and visual tracking on face images and video information collected in real time.
Background Art
Human-computer interaction is the process of information exchange between people and systems, and this process can exist in many different types of systems. The earliest human-computer interaction was realized by manually entering machine-language instructions, with computer language as the medium of interaction. With the development of graphics processing, the medium of human-computer interaction gradually shifted to graphical interfaces, which offer a better interactive experience, make interaction more convenient for users, and feed back more accurate information. With the development of ubiquitous computing, deep learning and other related technologies, the types of human-computer interaction have been continually enriched, the media used for interaction have diversified, and the main methods, including speech recognition, gesture recognition and tracking, have been continuously improved; the forms of interaction have become diverse, and the amount of information that can be transmitted has increased greatly.
As an important branch of pattern recognition, face recognition has long been a research hotspot. As face analysis technology matures, facial features are being widely used in many settings, and the problem of face image processing has been pushed to the forefront. The present invention therefore locates the facial feature regions and applies face point detection, expression recognition and visual tracking to the face in combination, overcoming the different limitations of each individual method and realizing an intelligent human-computer interaction process.
Face point detection and recognition is one of the important applications of deep learning. Face point recognition refers to locating key points on a face after the face has been detected; once the facial feature regions are located, the data are preprocessed and a recognition algorithm extracts features to complete face recognition, as shown in Fig. 1. With the rapid development of deep learning, face point recognition technology is also maturing.
Expression recognition is an important direction for computers to understand human emotions and an important aspect of human-computer interaction. By analyzing expressions, a system can capture user information, support decision making, and directly judge the user's emotional and psychological state. Convolutional neural networks are commonly used for emotion analysis: through multiple convolutional and pooling layers, high-level and multi-level features of the whole face or of local regions can be extracted, yielding good feature analyses of expression images. Experience has shown that convolutional neural networks outperform other types of neural network in image recognition, and they currently achieve the best expression recognition results.
Visual tracking is another important aspect of human-computer interaction. It makes the user's focus of attention easy to observe, which helps in analyzing the regions the user is interested in and the user's choices and preferences. Using the human eye as an input source for the computer, we track the user's gaze, determine the range of the user's line of sight, and complete the corresponding human-computer interaction.
From the above analysis it can be seen that each of the three methods addresses only one direction of face recognition, so each applied alone has shortcomings that lead to recognition errors. The present invention therefore combines the three methods into a new face recognition method: it analyzes the facial regions of a captured face to locate the basic facial features, performs visual tracking on the eye region, and binds facial expressions to behavior information to realize human-computer interaction, reduce the selection error of facial recognition points, and improve facial recognition accuracy.
Summary of the Invention
The purpose of the present invention is to solve the problem of face image processing by proposing a method that realizes face image processing through the real-time collection of face images and video information, performing face point detection, expression recognition and visual tracking.
The method of the present invention is described as follows:
The present invention realizes intelligent human-computer interaction with face image processing technology based on convolutional neural networks. The implementation comprises the following three processes:
(1) Face point detection
This method adopts a three-level cascade of convolutional neural networks: the first-level network is built with absolute value rectification and a parameter-sharing mechanism, and the second and third levels are obtained with the idea of multi-level regression. Because large changes in face pose make detection unstable, the relative position of a face point and its detection point can vary over a wide range, producing a large relative error; the input region of the first-level network is therefore chosen large, to cover as many predicted positions as possible. The output of the first level provides the selection conditions for the subsequent detection regions, so the second- and third-level detection regions shrink accordingly. Each detection region is selected as the circular region that contains 75% of all predicted positions obtained by the previous level, centered on the point where the previous level's predicted positions are densest.
Predicted positions are then re-acquired within the new prediction region. This process is repeated until the detection region shrinks to 1% of the first-level detection region; the predicted positions obtained at that point are taken as the prediction for each point, yielding multiple networks with different input regions at each level. A sketch of this coarse-to-fine refinement follows.
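To make the refinement rule concrete, the following Python sketch iterates the region selection described above: keep 75% of the predicted positions in a circle centered on the densest prediction, and stop once the region's area falls below 1% of the first-level region. The density proxy and the `predict_fn` callback are illustrative assumptions, not part of the patent.

```python
import numpy as np

def refine_region(predictions, keep_fraction=0.75):
    """One refinement step: return the circular region (center, radius)
    containing keep_fraction of the predicted positions, centered on the
    densest prediction. The density proxy used here (smallest mean
    distance to the other predictions) is an assumption."""
    preds = np.asarray(predictions, dtype=float)               # shape (m, 2)
    dists = np.linalg.norm(preds[:, None] - preds[None, :], axis=-1)
    center = preds[dists.mean(axis=1).argmin()]                # densest point
    radii = np.sort(np.linalg.norm(preds - center, axis=1))
    radius = radii[max(int(np.ceil(keep_fraction * len(radii))) - 1, 0)]
    return center, radius

def cascaded_refinement(predict_fn, region, min_area_ratio=0.01, max_steps=20):
    """Shrink the detection region until its area falls below 1% of the
    first-level region. predict_fn(region) stands in for running the
    level's networks on that region and returning predicted positions."""
    first_radius = region[1]
    preds = predict_fn(region)
    for _ in range(max_steps):
        if region[1] ** 2 <= min_area_ratio * first_radius ** 2:
            break                              # area ratio equals (r / r0)^2
        region = refine_region(preds)
        preds = predict_fn(region)
    return preds
```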
The final predicted position x of a face point can be formally expressed as an n-level cascade:

$$x = \frac{1}{l_1}\sum_{k=1}^{l_1} x_k^{(1)} + \sum_{i=2}^{n}\frac{1}{l_i}\sum_{k=1}^{l_i}\Delta x_k^{(i)}$$

where x is the predicted position and l_i is the number of predicted positions at level i of the n-level cascade. The predicted positions at level i are denoted x_1^(i), ..., x_{l_i}^(i); that is, x_1^(1) is the first predicted position at level 1, and the change of the k-th predicted position at level i relative to the corresponding k-th predicted position at level i-1 is denoted Δx_k^(i).
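The equation translates directly into a few lines of NumPy; the array shapes and the toy numbers in the sketch below are illustrative.

```python
import numpy as np

def cascade_position(level1_preds, level_deltas):
    """x = mean of the l1 level-1 predictions plus, for each level i >= 2,
    the mean of the l_i predicted increments (the equation above)."""
    x = np.mean(level1_preds, axis=0)
    for deltas in level_deltas:
        x = x + np.mean(deltas, axis=0)
    return x

# Toy usage: three level-1 predictions of one face point, two refinement levels.
x = cascade_position(
    np.array([[10.0, 12.0], [11.0, 12.5], [10.5, 11.5]]),
    [np.array([[0.2, -0.1], [0.1, 0.0]]), np.array([[0.05, 0.02]])],
)
```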
This method uses a three-level convolutional neural network design. The first level comprises three deep convolutional networks with different detection regions: the F1 network's detection region covers the whole face, the EN1 network's covers only the eyes and nose, and the NM1 network's covers only the nose and mouth. The three networks predict different regions of the same face simultaneously with the prediction method above, and their predictions are averaged to reduce variance, as sketched below; this gives the first-level predicted positions of all facial feature points and avoids deviations of the face prediction from reality caused by overly prominent local features. Following the regression idea, the corresponding second- and third-level predicted positions are then obtained from the first-level predictions for each of the three networks F1, EN1 and NM1. Because the input regions of the second and third levels are strictly constrained by the first level's predictions, the second- and third-level predicted positions can reach very high accuracy, but their search range is also strictly limited.
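A minimal sketch of the first-level fusion, assuming a five-point landmark layout (two eyes, nose, two mouth corners) as in the cascaded-CNN literature; the coverage map below encodes which of F1, EN1 and NM1 sees each landmark, and each network's output is assumed to be a dict from landmark name to an (x, y) pair.

```python
import numpy as np

# Which first-level networks cover each landmark, per the description above
# (F1: whole face; EN1: eyes and nose; NM1: nose and mouth). The five-point
# landmark layout is an assumption.
COVERAGE = {
    "left_eye":    ["F1", "EN1"],
    "right_eye":   ["F1", "EN1"],
    "nose":        ["F1", "EN1", "NM1"],
    "left_mouth":  ["F1", "NM1"],
    "right_mouth": ["F1", "NM1"],
}

def fuse_first_level(preds_by_net):
    """Average, per landmark, the predictions of every network that covers
    it; averaging over networks reduces the first level's variance."""
    return {
        lm: np.mean([preds_by_net[net][lm] for net in nets], axis=0)
        for lm, nets in COVERAGE.items()
    }
```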
(2) Facial expression recognition
For facial expression recognition, an end-to-end learning model is proposed that synthesizes face images along the two axes of pose and expression and recognizes facial expressions with the pose held fixed. The model consists of one generator, two discriminators and one classifier. Images are preprocessed before being fed to the model: a face detection algorithm is applied against a base library of 68 landmark points. After preprocessing, the face image is fed into the generator G to produce an identity code; specifically, there is a mapping f(x) such that every input image has a definite and unique identity code, which is then concatenated with an expression code e and a pose code p to represent facial variation. By applying a minimax algorithm between the generator G and the discriminator D and attaching the corresponding labels at the decoder input, new labels for face images with different poses and expressions can be obtained. The present invention uses two discriminators, denoted Datt and Di: Datt identifies and measures the degree of identity disentanglement, and Di improves the quality of the generated images. Once face image synthesis is complete, the classifier Cexp performs facial expression recognition on the face images; specifically, a deep learning algorithm is applied in the classifier to ensure that, in each representation layer, the key classification factors gradually stabilize while the feature information of each facial expression is retained.
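The wiring of the model can be sketched in PyTorch as below. Only the data flow described above is shown: G encodes an identity code f(x), concatenates it with the expression code e and pose code p, and decodes a face, while Datt, Di and Cexp are reduced to single-layer stand-ins. All architectures, dimensions and the 64x64 grayscale input are assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """G: encode a face to an identity code f(x), then decode that code
    concatenated with expression code e and pose code p into a new face."""
    def __init__(self, id_dim=128, exp_dim=6, pose_dim=3):
        super().__init__()
        self.encode = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, id_dim))
        self.decode = nn.Sequential(
            nn.Linear(id_dim + exp_dim + pose_dim, 64 * 64), nn.Tanh())

    def forward(self, img, e, p):
        identity = self.encode(img)              # f(x): unique identity code
        z = torch.cat([identity, e, p], dim=1)   # concatenate with e and p
        return self.decode(z).view(-1, 1, 64, 64), identity

G = Generator()
d_att = nn.Linear(128, 1)                                   # Datt: disentanglement
d_i = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))    # Di: image quality
c_exp = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 7))  # Cexp: 7 expressions

img, e, p = torch.randn(4, 1, 64, 64), torch.zeros(4, 6), torch.zeros(4, 3)
fake, identity = G(img, e, p)
att_score = d_att(identity)                 # scores identity disentanglement
# One side of the minimax game: G tries to make Di accept its output as real.
g_loss = F.binary_cross_entropy_with_logits(d_i(fake), torch.ones(4, 1))
expr_logits = c_exp(fake)                   # expression recognition on synthesis
```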
(3) Visual tracking
The present invention adopts a tracking-by-detection algorithm. By analyzing the gradient vector field of the captured face image, the relationship between a possible center and the directions of the image gradients is described mathematically: a candidate center c is set, a gradient vector is taken at each sampled position, and the displacements are normalized so that they share the direction of the gradients. Computing the dot products between the normalized displacements d_j relative to the candidate center and the gradient vectors g_j yields the optimal center c* of the eye region in the face image, i.e. the pupil center:

$$c^{*} = \arg\max_{c}\ \frac{1}{N}\sum_{j=1}^{N}\left(d_j^{\top} g_j\right)^{2}$$

For a candidate center c, N different gradients are selected, corresponding to the normalized displacements d_1 ... d_N and gradient vectors g_1 ... g_N; that is, the j-th selected gradient has normalized displacement d_j and gradient vector g_j, and the position at which the objective function attains its maximum is the optimal center position c*. The displacement d_j corresponding to the selected gradient at position x_j is obtained as follows:

$$d_j = \frac{x_j - c}{\lVert x_j - c \rVert_2}$$
Scaling the displacements d_j to unit length gives equal weight to all positions in the face image. To improve the method's robustness to linear changes in illumination and contrast, the gradient vectors g_j are also scaled to unit length, so that the objective function attains its maximum at the pupil center. In addition, the complexity of the algorithm can be reduced by considering only gradient vectors with significant magnitude. A brute-force sketch follows.
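The objective lends itself to a direct NumPy implementation: evaluate the mean squared dot product for every candidate pixel and keep the best. The gradient-magnitude threshold below is a heuristic assumption standing in for "gradients with significant magnitude".

```python
import numpy as np

def pupil_center(gray):
    """Return the (x, y) maximizing the mean squared dot product between
    the unit displacements d_j and unit gradients g_j (equations above).
    Brute force over all candidate pixels; a sketch, not optimized."""
    gy, gx = np.gradient(gray.astype(float))          # image gradients
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > mag.mean() + 0.5 * mag.std())
    g = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    g /= np.linalg.norm(g, axis=1, keepdims=True)     # unit-length g_j
    pos = np.stack([xs, ys], axis=1).astype(float)    # gradient locations x_j

    best_score, c_star = -np.inf, (0, 0)
    for cy in range(gray.shape[0]):
        for cx in range(gray.shape[1]):
            d = pos - (cx, cy)                        # x_j - c
            n = np.linalg.norm(d, axis=1)
            n[n == 0] = 1.0
            d /= n[:, None]                           # unit-length d_j
            score = np.mean((d * g).sum(axis=1) ** 2)
            if score > best_score:
                best_score, c_star = score, (cx, cy)
    return c_star
```

In practice the candidate set would be restricted to the detected eye region and possibly downsampled, since the exhaustive search is O(W x H x N).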
The advantages of the proposed method as applied to face image processing are as follows:
(1) The present invention realizes human-computer interaction by recognizing facial features.
(2) The facial features related to user information are divided into overall facial features, expression features and visual tracking; the combined analysis of the three aspects yields three-dimensional face information.
(3) The present invention binds specific actions in the user's facial features to behavior information and realizes human-computer interaction through the facial feature information, reducing the selection error of facial recognition points.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the human-computer interaction process based on face detection;
Fig. 2 is a block diagram of the model.
Detailed Description of the Embodiments
The method of the present invention for processing face images based on intelligent human-computer interaction is now described in further detail with reference to the model block diagram in Fig. 2 and an implementation example.
The method performs face point detection, expression recognition and visual tracking on face images and video information collected in real time to realize face image processing. The overall working framework in the field of face recognition is as follows: first the face is captured; the facial key points are located by the three-level cascade of convolutional neural networks; facial expressions are recognized by the end-to-end deep learning model, using different poses and expressions to realize face image synthesis and expression recognition; and visual tracking is realized by locating the eye centers from image gradients. After these three kinds of features are obtained, they are combined and fed into a three-layer neural network for training, so that the machine responds reasonably to the combined features, as sketched below.
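A minimal sketch of this final fusion step, assuming the three feature vectors (landmark coordinates, expression posterior, gaze direction) have already been extracted; all dimensions and the response head are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Three-layer fusion network mapping the concatenated features to a
# machine response. Dimensions below are assumptions for illustration.
landmark_dim, expr_dim, gaze_dim, n_responses = 10, 7, 2, 5
fusion_net = nn.Sequential(
    nn.Linear(landmark_dim + expr_dim + gaze_dim, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, n_responses),
)

features = torch.cat([torch.randn(1, landmark_dim),     # 5 (x, y) landmarks
                      torch.randn(1, expr_dim),         # expression posterior
                      torch.randn(1, gaze_dim)], dim=1) # gaze direction
response_logits = fusion_net(features)   # machine's response to the user
```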
The technical solutions and implementation steps adopted by the present invention are as follows:
Steps (1) face point detection, (2) facial expression recognition and (3) visual tracking are carried out exactly as described under the corresponding headings in the Summary of the Invention above, with the same network structures, equations and parameters.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232764.9A CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232764.9A CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111428661A (en) | 2020-07-17 |
Family
ID=71549134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010232764.9A Pending CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428661A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150169938A1 (en) * | 2013-12-13 | 2015-06-18 | Intel Corporation | Efficient facial landmark tracking using online shape regression method |
CN105868689A (en) * | 2016-02-16 | 2016-08-17 | 杭州景联文科技有限公司 | Cascaded convolutional neural network based human face occlusion detection method |
CN108875624A (en) * | 2018-06-13 | 2018-11-23 | 华南理工大学 | Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade |
-
2020
- 2020-03-28 CN CN202010232764.9A patent/CN111428661A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150169938A1 (en) * | 2013-12-13 | 2015-06-18 | Intel Corporation | Efficient facial landmark tracking using online shape regression method |
CN105868689A (en) * | 2016-02-16 | 2016-08-17 | 杭州景联文科技有限公司 | Cascaded convolutional neural network based human face occlusion detection method |
CN108875624A (en) * | 2018-06-13 | 2018-11-23 | 华南理工大学 | Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade |
Non-Patent Citations (3)
Title |
---|
FABIAN TIMM et al.: "Accurate Eye Centre Localisation by Means of Gradients", https://www.researchgate.net/publication/221415814 * 
FEIFEI ZHANG et al.: "Joint Pose and Expression Modeling for Facial Expression Recognition", CVPR * 
YI SUN et al.: "Deep Convolutional Network Cascade for Facial Point Detection", http://mmlab.ie.cuhk.edu.hk/CNN FACEPOINT.HTM *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112418095B (en) | A method and system for facial expression recognition combined with attention mechanism | |
CN106951867B (en) | Face identification method, device, system and equipment based on convolutional neural networks | |
Yang et al. | Faceness-net: Face detection through deep facial part responses | |
WO2020108362A1 (en) | Body posture detection method, apparatus and device, and storage medium | |
Várkonyi-Kóczy et al. | Human–computer interaction for smart environment applications using fuzzy hand posture and gesture models | |
CN111967379B (en) | Human behavior recognition method based on RGB video and skeleton sequence | |
CN104573665B (en) | A kind of continuous action recognition methods based on improvement viterbi algorithm | |
Wu et al. | Generalized zero-shot emotion recognition from body gestures | |
CN111444488A (en) | Identity authentication method based on dynamic gesture | |
Li et al. | Visual object tracking via multi-stream deep similarity learning networks | |
Al Farid et al. | Single shot detector CNN and deep dilated masks for vision-based hand gesture recognition from video sequences | |
CN110956141A (en) | A Rapid Analysis Method of Human Continuous Action Based on Local Recognition | |
Xie et al. | Towards Hardware-Friendly and Robust Facial Landmark Detection Method | |
Hanzla et al. | Human Pose Estimation and Event Recognition via Feature Extraction and Neuro-Fuzzy Classifier | |
Ghaleb et al. | Multimodal fusion based on information gain for emotion recognition in the wild | |
Renjith et al. | An effective skeleton-based approach for multilingual sign language recognition | |
CN107346207A (en) | A kind of dynamic gesture cutting recognition methods based on HMM | |
Zhao et al. | A local spatial–temporal synchronous network to dynamic gesture recognition | |
CN119339422A (en) | A micro-expression recognition method based on visual Transformer | |
CN111814604A (en) | A Pedestrian Tracking Method Based on Siamese Neural Network | |
CN112580527A (en) | Facial expression recognition method based on convolution long-term and short-term memory network | |
CN111428661A (en) | Method for processing face image based on intelligent human-computer interaction | |
Zhao et al. | Channel self-attention residual network: Learning micro-expression recognition features from augmented motion flow images | |
Hao | A Method for Estimating the Posture of Yoga Asansa Exercises Based on Lightweight OpenPose | |
Abbattista et al. | A biometric-based system for unsupervised anomaly behaviour detection at the pawn shop |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200717 |