
CN116311426A - A Children's Expression Scoring System - Google Patents


Info

Publication number
CN116311426A
CN116311426A
Authority
CN
China
Prior art keywords
module
layer
feature
facial
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310120894.7A
Other languages
Chinese (zh)
Inventor
曹操
李洋
陈畅捷
刘炫宇
蒋晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Media Tech Co ltd
Original Assignee
Shanghai Media Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Media Tech Co ltd filed Critical Shanghai Media Tech Co ltd
Priority to CN202310120894.7A
Publication of CN116311426A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face recognition, and in particular to a children's facial expression scoring system, comprising: a face detection module, which receives an externally input expression picture and detects a first facial region in it; a feature extraction module, which extracts a plurality of action units from the first facial region to obtain a first feature group; and a score generation module, which generates a similarity score from the first feature group and a pre-generated second feature group. The beneficial effects are as follows: the evaluation of dance expressions has so far relied mainly on teachers' subjective judgment and is therefore not accurate enough; by extracting the corresponding action units from the input expression picture to build the first feature group, the system quantifies facial expressions well, and by computing against a pre-collected second feature group it achieves a good quantitative evaluation.

Description

A Children's Expression Scoring System

Technical Field

The invention relates to the technical field of face recognition, and in particular to a children's facial expression scoring system.

Background Art

Dance is a performing art formed by a sequence of movements; it can be improvised or professionally choreographed, and includes street dance, modern dance, folk dance, ballet and so on. To achieve a better performance, specific facial expressions are usually added to dance movements for stronger emotional effect.

In the prior art, the evaluation of children's dance expressions mainly relies on a teacher demonstrating the expression corresponding to a specific dance movement, the students imitating it, and the teacher then giving a corresponding evaluation.

In practice, however, the inventors found that because the above approach evaluates the similarity of expressions manually, the evaluation process is relatively subjective and difficult to quantify accurately.

Summary of the Invention

In view of the above problems in the prior art, a children's facial expression scoring system is provided.

The specific technical scheme is as follows:

A children's facial expression scoring system, comprising:

a face detection module, which receives an externally input expression picture and detects a first facial region in the expression picture;

a feature extraction module, connected to the face detection module, which extracts a plurality of action units from the first facial region to obtain a first feature group;

a score generation module, connected to the feature extraction module, which generates a similarity score from the first feature group and a pre-generated second feature group.

In another aspect, the face detection module uses a pre-trained face detection model to detect faces in the expression picture;

the face detection model comprises:

a feature extraction network, which extracts a combination of feature maps from the input expression picture;

a feature pyramid, connected to the feature extraction network, which fuses the combination of feature maps to obtain a feature fusion result;

a single-stage detection module, which generates the first facial region from the feature fusion result.

In another aspect, the single-stage detection module comprises:

a convolutional layer, which extracts image features from the feature fusion result;

a context network, which generates associated features from the feature fusion result;

a merging layer, connected to the convolutional layer and the context network respectively, which generates merged features from the image features and the associated features;

a face classification layer, connected to the merging layer, which generates a face classification result from the merged features;

a coordinate regression layer, connected to the merging layer, which generates a coordinate displacement prediction result from the merged features;

a deformable convolutional network, connected to the face classification layer and the coordinate regression layer respectively, which generates the first facial region from the face classification result and the coordinate displacement prediction result.

In another aspect, the feature extraction module comprises:

an element extraction module, which identifies a plurality of facial elements in the first facial region;

a vector combination module, connected to the element extraction module, which selects the facial elements and combines them into sub-vectors;

an action unit prediction module, connected to the vector combination module, which predicts the action units from the sub-vectors and adds them to the first feature group.

In another aspect, the action unit prediction module comprises:

a preprocessing layer, which receives the input sub-vectors and generates preprocessed vectors;

a first fully connected layer, connected to the preprocessing layer, which generates a vector score for each preprocessed vector;

a weight generation layer, connected to the first fully connected layer, which generates a vector weight for each preprocessed vector;

a feed-forward layer, connected to the first fully connected layer and the weight generation layer, which screens the sub-vectors according to the vector scores and the vector weights to obtain a plurality of the action units.

In another aspect, the preprocessing layer comprises a second fully connected layer, a batch normalization layer and a linear rectification layer connected in sequence.

In another aspect, before the expression picture is collected, a reference picture is collected in advance;

the face detection module detects a second facial region in the reference picture, and the feature extraction module extracts a plurality of reference units from the second facial region to obtain the second feature group.

In another aspect, the system further comprises an action unit measurement module, connected to the feature extraction module, which selects among the reference units to screen out those actually added to the second feature group.

In another aspect, the action unit measurement module comprises:

a node generation module, which obtains the reference units and creates tree nodes;

a connection module, connected to the node generation module, which creates a decision tree from the reference units and the tree nodes;

a measurement module, connected to the connection module, which modifies the tree nodes and computes the variance of the decision tree to generate a measurement result;

a screening module, connected to the measurement module, which, according to the measurement result and a pre-configured extraction quantity, selects the reference units that have the greatest influence on the variance of the decision tree as the reference units of the second feature group.
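The screening idea above can be sketched in plain Python. This is a simplified stand-in: the patent builds a decision tree and perturbs its nodes, whereas this sketch measures each unit's influence on the variance of the raw values directly; the unit names and intensities are hypothetical.

```python
def select_reference_units(units, k):
    """Rank candidate reference units by how much removing each one
    changes the variance of the remaining values, and keep the k most
    influential (a simplified proxy for the decision-tree procedure)."""
    def variance(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)

    base = variance([v for _, v in units])

    def influence(item):
        name, _ = item
        rest = [v for n, v in units if n != name]
        return abs(base - variance(rest))

    return [name for name, _ in sorted(units, key=influence, reverse=True)[:k]]

# Hypothetical action-unit intensities:
units = [("brow_raise", 0.9), ("lip_corner_pull", 0.5),
         ("brow_lower", 0.1), ("blink", 0.48)]
print(select_reference_units(units, 2))  # the two most variance-influential units
```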

In another aspect, in the score generation module, the mean squared error between the action units in the first feature group and the reference units in the second feature group is calculated as the similarity score.
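A minimal sketch of this scoring step, assuming both feature groups are encoded as equal-length vectors of action-unit intensities (the names and values below are hypothetical, and a lower score means a closer expression match):

```python
def similarity_score(first_group, second_group):
    """Mean squared error between a student's action-unit intensities
    (first feature group) and the teacher's reference units (second
    feature group)."""
    assert len(first_group) == len(second_group)
    return sum((a - b) ** 2 for a, b in zip(first_group, second_group)) / len(first_group)

# Hypothetical intensities, e.g. brow-raise, lip-corner-pull, brow-lower:
student = [0.8, 0.6, 0.1]
teacher = [1.0, 0.5, 0.0]
print(similarity_score(student, teacher))  # ≈ 0.02
```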

The above technical scheme has the following advantages or beneficial effects:

The evaluation of dance expressions in the prior art relies mainly on teachers' subjective judgment and is therefore not accurate enough. By building the above system and extracting the corresponding action units from the input expression picture to form the first feature group, this scheme achieves a good quantitative representation of facial expressions; a good quantitative evaluation is then achieved by computing against the pre-collected second feature group.

Brief Description of the Drawings

Embodiments of the present invention are described more fully with reference to the accompanying drawings. However, the drawings are for illustration only and do not limit the scope of the invention.

Fig. 1 is an overall schematic diagram of an embodiment of the invention;

Fig. 2 is a schematic diagram of the face detection model in an embodiment of the invention;

Fig. 3 is a schematic diagram of the single-stage detection module in an embodiment of the invention;

Fig. 4 is a schematic diagram of the feature extraction module in an embodiment of the invention;

Fig. 5 is a schematic diagram of the action unit prediction module in an embodiment of the invention;

Fig. 6 is a schematic diagram of the preprocessing layer in an embodiment of the invention;

Fig. 7 is a schematic diagram of the action unit measurement module in an embodiment of the invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the invention.

It should be noted that, in the absence of conflict, the embodiments of the invention and the features in the embodiments may be combined with each other.

The invention is further described below in conjunction with the drawings and specific embodiments, but not as a limitation of the invention.

The invention comprises:

A children's facial expression scoring system, as shown in Fig. 1, comprising:

a face detection module 1, which receives an externally input expression picture and detects a first facial region in the expression picture;

a feature extraction module 2, connected to the face detection module 1, which extracts a plurality of action units from the first facial region to obtain a first feature group;

before the expression picture is collected, a reference picture is collected in advance;

the face detection module 1 detects a second facial region in the reference picture, and the feature extraction module 2 extracts a plurality of reference units from the second facial region to obtain a second feature group;

a score generation module 3, connected to the feature extraction module 2, which generates a similarity score from the first feature group and the pre-generated second feature group.

Specifically, to address the problem that the expression evaluation process in the prior art is relatively subjective and hard to quantify, this embodiment builds the children's facial expression scoring system described above. The face detection module 1 crops the input expression picture to extract the first facial region corresponding to the face; the feature extraction module 2 then extracts from the first facial region the action units corresponding to individual facial muscle movements as the first feature group, giving a good representation of the facial expression. The system likewise collects in advance a second facial region from a reference picture made by the teacher and extracts reference units to form the second feature group, giving an equally good representation of the standard expression, so that the score generation module 3 can compute a similarity score from the quantified first and second feature groups.

In practice, the expression picture is a facial photograph of the student to be evaluated, and the reference picture is a facial photograph collected from the teacher or cropped from a teaching image. These photographs are captured by an image sensor and may contain a face, background, and parts of the upper body and limbs. The first and second facial regions are face images, corresponding to the actual face area, detected and cropped by the face detection module 1; compared with the originally captured photographs, irrelevant background and edge portions have been removed. Action units and reference units are detectable facial regions corresponding to facial muscle movements, such as brow raising, lip-corner pulling, brow lowering and inner brow raising; detecting them makes it possible to recognize relatively complex facial expressions.

In one embodiment, the face detection module uses a pre-trained face detection model to detect faces in the expression picture;

as shown in Fig. 2, the face detection model comprises:

a feature extraction network 11, which extracts a combination of feature maps from the input expression picture;

a feature pyramid 12, connected to the feature extraction network 11, which fuses the combination of feature maps to obtain a feature fusion result;

a single-stage detection module 13, which generates the first facial region from the feature fusion result.

Specifically, to achieve a good recognition effect, this embodiment builds the face detection model described above. The input expression picture passes through the feature extraction network 11, built with depthwise separable convolutions, which extracts image features; the feature maps of the last three convolutional layers of the network are output to form the combination of feature maps. The 1x1 convolutions in the feature pyramid 12 then adjust the channel counts of the three effective feature layers in the combination, and the three layers are fused by upsampling and feature-value addition to obtain the feature fusion result, so that the single-stage detection module 13 can predict the actual face position from the feature fusion result as the first facial region or the second facial region, achieving a good face detection effect.
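The fusion step above (1x1 convolutions to align channels, then upsampling and element-wise addition from coarse to fine) can be sketched in plain NumPy. The channel counts and nearest-neighbour upsampling are assumptions for illustration; the patent does not fix these details.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(levels, weights):
    """Top-down fusion: align channel counts with 1x1 convolutions,
    then repeatedly upsample the coarser map and add it to the finer
    one. `levels` is ordered fine -> coarse."""
    aligned = [conv1x1(x, w) for x, w in zip(levels, weights)]
    out = aligned[-1]
    fused = [out]
    for finer in reversed(aligned[:-1]):
        out = finer + upsample2x(out)
        fused.append(out)
    return fused[::-1]  # back to fine -> coarse order

rng = np.random.default_rng(0)
# Three hypothetical effective feature layers from the backbone:
levels = [rng.normal(size=(8, 16, 16)),
          rng.normal(size=(16, 8, 8)),
          rng.normal(size=(32, 4, 4))]
weights = [rng.normal(size=(4, c)) for c in (8, 16, 32)]  # align all to 4 channels
fused = fuse(levels, weights)
print([f.shape for f in fused])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```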

In one embodiment, as shown in Fig. 3, the single-stage detection module 13 comprises:

a convolutional layer 131, which extracts image features from the feature fusion result;

a context network 132, which generates associated features from the feature fusion result;

a merging layer 133, connected to the convolutional layer 131 and the context network 132 respectively, which generates merged features from the image features and the associated features;

a face classification layer 134, connected to the merging layer 133, which generates a face classification result from the merged features;

a coordinate regression layer 135, connected to the merging layer 133, which generates a coordinate displacement prediction result from the merged features;

a deformable convolutional network 136, connected to the face classification layer 134 and the coordinate regression layer 135 respectively, which generates the first facial region from the face classification result and the coordinate displacement prediction result.

Specifically, to predict the face region well, this embodiment adds a 1x1 convolutional layer 131 to extract features from the feature fusion result, while the context network 132 identifies mutual information between contexts in the feature fusion result and generates associated features, improving recognition of the edge portions of the face. The merging layer 133 then concatenates the image features and the associated features into the merged features, which are fed into the face classification layer 134 and the coordinate regression layer 135 respectively. The output of the face classification layer 134 has dimension W/S*H/S*2K and discriminates and classifies the face portions of the image features; the coordinate regression layer 135 performs corrective coordinate regression on the detected face boxes, with output dimension W/S*H/S*4K, by predicting, at each sliding point, the relative scaling and displacement between each face-containing box and the ground truth. Finally, once the face classification result and the coordinate displacement prediction result have been generated, the deformable convolutional network 136 is added as a deformation layer to simulate geometric deformation, modelling the differences between children's and adults' expressions, and the first facial region is extracted.
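The coordinate regression step can be illustrated with one common box parameterisation: displacements scaled by the anchor size and log-space scaling factors. The patent only states that relative scaling and displacement are predicted at each sliding point, so the exact encoding below is an assumption.

```python
import math

def decode_box(anchor, deltas):
    """Decode a predicted (dx, dy, dw, dh) displacement/scaling tuple
    against an anchor box (cx, cy, w, h). A common parameterisation,
    not necessarily the one used in the patent."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (ax + dx * aw,          # shift centre x by a fraction of anchor width
            ay + dy * ah,          # shift centre y by a fraction of anchor height
            aw * math.exp(dw),     # rescale width
            ah * math.exp(dh))     # rescale height

print(decode_box((50.0, 50.0, 20.0, 20.0), (0.1, -0.1, 0.0, 0.0)))
# (52.0, 48.0, 20.0, 20.0)
```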

In one embodiment, as shown in Fig. 4, the feature extraction module 2 comprises:

an element extraction module 21, which identifies a plurality of facial elements in the first facial region;

a vector combination module 22, connected to the element extraction module 21, which selects the facial elements and combines them into sub-vectors;

an action unit prediction module 23, connected to the vector combination module 22, which predicts the action units from the sub-vectors and adds them to the first feature group.

Specifically, to generate the action units well, in this embodiment, after the face detection module 1 has extracted the face region and generated the feature layers, the element extraction module 21 identifies facial elements in the first facial region, and the vector combination module 22 assembles a specific number of facial elements into W*H sub-vectors, each of which may represent an action unit. The action unit prediction module 23 then makes a prediction for each sub-vector to determine the actual action units to add to the first feature group.
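The vector combination step can be sketched as a simple regrouping: take W*H facial-element values at a time and assemble each group into one candidate sub-vector. The dimensions and element values below are hypothetical.

```python
import numpy as np

def combine_elements(elements, w, h):
    """Assemble facial-element values into sub-vectors of length w*h,
    each a candidate action unit; any leftover elements that do not
    fill a whole sub-vector are dropped in this sketch."""
    elements = np.asarray(elements, dtype=float)
    n = (len(elements) // (w * h)) * (w * h)
    return elements[:n].reshape(-1, w * h)

elements = np.arange(12, dtype=float)      # 12 hypothetical facial-element values
subvectors = combine_elements(elements, w=2, h=3)
print(subvectors.shape)  # (2, 6): two sub-vectors of length 6
```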

In one embodiment, as shown in Fig. 5, the action unit prediction module 23 comprises:

a preprocessing layer 231, which receives the input sub-vectors and generates preprocessed vectors;

a first fully connected layer 232, connected to the preprocessing layer 231, which generates a vector score for each preprocessed vector;

a weight generation layer 233, connected to the first fully connected layer 232, which generates a vector weight for each preprocessed vector;

a feed-forward layer 234, connected to the first fully connected layer 232 and the weight generation layer 233, which screens the sub-vectors according to the vector scores and the vector weights to obtain a plurality of action units.

Specifically, to achieve a better prediction of the action units, in this embodiment the preprocessing layer 231 preprocesses the input sub-vectors; the first fully connected layer 232 then computes a vector score for each preprocessing vector, and the weight generation layer 233 computes a vector weight for each preprocessing vector with a Softmax function, so that the feedforward layer 234 can determine the actual action units using a sigmoid activation function.
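The score/weight/activation pipeline above can be sketched in pure Python. This is only an illustrative sketch, not the patented implementation: the weight vectors `w_score` and `w_ff`, the elementwise weighting, and the 0.5 decision threshold are all hypothetical choices; the patent only specifies that scores feed a Softmax and that the feedforward output uses a sigmoid.

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns the vector scores into weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_action_units(sub_vectors, w_score, w_ff, threshold=0.5):
    """Sketch of the head: FC scores -> Softmax weights -> sigmoid feedforward.

    sub_vectors: list of equal-length feature lists, one per sub-vector
    w_score, w_ff: hypothetical learned weight vectors (same length as features)
    Returns the indices of sub-vectors judged to be actual action units.
    """
    # First fully connected layer: one scalar score per sub-vector.
    scores = [sum(v * w for v, w in zip(sv, w_score)) for sv in sub_vectors]
    # Weight generation layer: Softmax over the scores.
    weights = softmax(scores)
    kept = []
    for i, (sv, wt) in enumerate(zip(sub_vectors, weights)):
        # Feedforward layer: weighted projection, then sigmoid activation.
        logit = wt * sum(v * w for v, w in zip(sv, w_ff))
        if sigmoid(logit) > threshold:
            kept.append(i)
    return kept
```

A sub-vector with a high score receives a large Softmax weight, which pushes its sigmoid output toward 1 and keeps it as an action unit.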

In one embodiment, as shown in Figure 6, the preprocessing layer 231 includes a second fully connected layer 2311, a batch normalization layer 2312 and a linear rectification layer 2313 connected in sequence.

Specifically, to achieve a better preprocessing effect on the sub-vectors, in this embodiment the preprocessing layer 231 chains, in sequence, the second fully connected layer 2311, the batch normalization layer 2312 (Batch Normalization) that normalizes the sub-vectors, and the linear rectification layer 2313 with a ReLU function, which yields a good vector preprocessing effect.
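The FC → BatchNorm → ReLU chain is a standard block; a minimal pure-Python sketch of the three stages follows. The weight matrix, bias, and epsilon are placeholder assumptions (the patent does not give layer sizes or parameters), and batch statistics are computed per forward pass as in training-mode batch normalization.

```python
import math

def fully_connected(vec, weights, bias):
    # Second fully connected layer 2311: one output per weight row.
    return [sum(v * w for v, w in zip(vec, row)) + b
            for row, b in zip(weights, bias)]

def batch_norm(batch, eps=1e-5):
    # Batch normalization layer 2312: normalize each dimension over the batch.
    dims = len(batch[0])
    out = [[0.0] * dims for _ in batch]
    for d in range(dims):
        col = [v[d] for v in batch]
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        for i, x in enumerate(col):
            out[i][d] = (x - mean) / math.sqrt(var + eps)
    return out

def relu(vec):
    # Linear rectification layer 2313.
    return [max(0.0, v) for v in vec]

def preprocess(batch, weights, bias):
    """FC -> BatchNorm -> ReLU, as in preprocessing layer 231."""
    fc = [fully_connected(v, weights, bias) for v in batch]
    return [relu(v) for v in batch_norm(fc)]
```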

In one embodiment, as shown in Figure 7, the system further includes an action unit measurement module 4 connected to the feature extraction module 2; the action unit measurement module 4 selects among the reference units to filter out those actually added to the second feature group;

The action unit measurement module 4 includes:

a node generation module 41, which obtains the reference units and creates tree nodes;

a connection module 42, connected to the node generation module 41, which creates a decision tree from the reference units and the tree nodes;

a measurement module 43, connected to the connection module 42, which changes the tree nodes and calculates the variance of the decision tree to generate measurement results;

a screening module 44, connected to the measurement module 43, which selects, according to the measurement results and a pre-configured extraction quantity, the reference units with the greatest impact on the variance of the decision tree as the reference units in the second feature group.

Specifically, to obtain a more discriminative evaluation of expression similarity, in this embodiment the action unit measurement module 4 is further arranged during collection of the reference pictures to screen out the reference units that most influence the expression, and only these reference units are added to the second feature group as the units actually used during evaluation. This widens the interval of the similarity score when computing the similarity between each action unit and its reference unit, achieving a better measurement effect.

In implementation, to achieve a good screening effect, this embodiment uses a decision tree to measure the influence of each reference unit on the expression. Specifically, the node generation module 41 turns each reference unit into a tree node or leaf, and the connection module 42 links the tree nodes according to the associations between the reference units to create the decision tree. The measurement module 43 then randomly changes the tree nodes and calculates the overall variance of the decision tree to measure its purity, so that the screening module 44 can, from the changes in variance and the corresponding changes in the reference units, screen out the top-N reference units with the greatest influence on the expression. In one embodiment, the screening module 44 collects statistics on how each reference unit's changes affect the variance and selects the 10 most influential reference units as the reference units in the second feature group.
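The perturb-and-measure-variance idea can be sketched as follows. Note this is a simplified stand-in, not the patented method: the decision tree is replaced by a generic `scorer` callable, the perturbation is Gaussian noise on a unit's feature values, and the unit encoding, trial count, and noise scale are all hypothetical. The patent instead perturbs tree nodes and measures the tree's own variance.

```python
import random

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def screen_reference_units(units, scorer, top_n, trials=20, seed=0):
    """Keep the top_n reference units whose perturbation most changes variance.

    units:  dict name -> list of feature values (hypothetical encoding)
    scorer: callable(units) -> list of expression scores; a stand-in for the
            decision tree whose variance the patent measures
    """
    rng = random.Random(seed)
    base = variance(scorer(units))
    impact = {}
    for name in units:
        deltas = []
        for _ in range(trials):
            perturbed = dict(units)
            # Randomly change this unit only, leaving the others fixed.
            perturbed[name] = [v + rng.gauss(0, 1) for v in units[name]]
            deltas.append(abs(variance(scorer(perturbed)) - base))
        impact[name] = sum(deltas) / trials
    return sorted(units, key=lambda n: impact[n], reverse=True)[:top_n]
```

A unit whose perturbation barely moves the variance contributes little to the expression and is dropped, matching the module's goal of keeping only the most influential reference units.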

In one embodiment, in the score generation module 3, mean square errors are calculated between the action units in the first feature group and the reference units in the second feature group as the similarity score.

Specifically, to achieve a good similarity calculation, this embodiment selects the mean square error as the scoring method for measuring expression similarity, which evaluates the similarity between each action unit and its reference unit well.
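As a minimal sketch, the MSE-based score can be computed as below. The representation of each unit as a numeric vector and the averaging across unit pairs are assumptions; the patent only states that MSE between action units and reference units serves as the similarity score, and it does not specify how the raw error maps to a displayed score.

```python
def mean_squared_error(action_unit, reference_unit):
    # MSE between one action unit and its reference unit, both as vectors.
    assert len(action_unit) == len(reference_unit)
    return (sum((a - r) ** 2 for a, r in zip(action_unit, reference_unit))
            / len(action_unit))

def similarity_score(first_group, second_group):
    """Average MSE across matched unit pairs; lower means more similar."""
    errors = [mean_squared_error(a, r)
              for a, r in zip(first_group, second_group)]
    return sum(errors) / len(errors)
```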

The above are only preferred embodiments of the present invention and do not limit its implementation or scope of protection. Those skilled in the art should appreciate that all solutions obtained by equivalent substitution of, or obvious variation on, the contents of the description and drawings of the present invention fall within the scope of protection of the present invention.

Claims (10)

1. A children's expression scoring system, characterized by comprising:
a face detection module, which receives an externally input expression picture and detects a first facial region from the expression picture;
a feature extraction module, connected to the face detection module, which extracts a plurality of action units from the first facial region to obtain a first feature group;
a score generation module, connected to the feature extraction module, which generates a similarity score according to the first feature group and a pre-generated second feature group.

2. The children's expression scoring system according to claim 1, characterized in that the face detection module uses a pre-trained face detection model to detect the expression picture;
the face detection model comprises:
a feature extraction network, which extracts a feature map combination from the input expression picture;
a feature pyramid, connected to the feature extraction network, which fuses the feature map combination to obtain a feature fusion result;
a single-stage detection module, which generates the first facial region according to the feature fusion result.

3. The children's expression scoring system according to claim 2, characterized in that the single-stage detection module comprises:
a convolutional layer, which extracts image features from the feature fusion result;
a context network, which generates associated features from the feature fusion result;
a merging layer, connected to the convolutional layer and the context network respectively, which generates merged features from the image features and the associated features;
a face classification layer, connected to the merging layer, which generates a face classification result from the merged features;
a coordinate regression layer, connected to the merging layer, which generates a coordinate displacement prediction result from the merged features;
a deformable convolutional network, connected to the face classification layer and the coordinate regression layer respectively, which generates the first facial region from the face classification result and the coordinate displacement prediction result.

4. The children's expression scoring system according to claim 1, characterized in that the feature extraction module comprises:
an element extraction module, which identifies a plurality of facial elements from the first facial region;
a vector combination module, connected to the element extraction module, which selects the facial elements and combines them into sub-vectors;
an action unit prediction module, connected to the vector combination module, which predicts the action units from the sub-vectors and adds them to the first feature group.

5. The children's expression scoring system according to claim 4, characterized in that the action unit prediction module comprises:
a preprocessing layer, which receives the input sub-vectors and generates preprocessing vectors;
a first fully connected layer, connected to the preprocessing layer, which generates a vector score for each preprocessing vector;
a weight generation layer, connected to the first fully connected layer, which generates a vector weight for each preprocessing vector;
a feedforward layer, connected to the first fully connected layer and the weight generation layer, which selects a plurality of the action units from the sub-vectors according to the vector scores and the vector weights.

6. The children's expression scoring system according to claim 5, characterized in that the preprocessing layer comprises a second fully connected layer, a batch normalization layer and a linear rectification layer connected in sequence.

7. The children's expression scoring system according to claim 1, characterized in that before the expression picture is collected, a reference picture is collected in advance;
the face detection module detects a second facial region from the reference picture, and the feature extraction module extracts a plurality of reference units from the second facial region to obtain the second feature group.

8. The children's expression scoring system according to claim 7, characterized by further comprising an action unit measurement module, connected to the feature extraction module, which selects among the reference units to filter out the reference units actually added to the second feature group.

9. The children's expression scoring system according to claim 8, characterized in that the action unit measurement module comprises:
a node generation module, which obtains the reference units and creates tree nodes;
a connection module, connected to the node generation module, which creates a decision tree from the reference units and the tree nodes;
a measurement module, connected to the connection module, which changes the tree nodes and calculates the variance of the decision tree to generate measurement results;
a screening module, connected to the measurement module, which selects, according to the measurement results and a pre-configured extraction quantity, the plurality of reference units with the greatest impact on the variance of the decision tree as the reference units in the second feature group.

10. The children's expression scoring system according to claim 1, characterized in that in the score generation module, mean square errors are calculated between the action units in the first feature group and the reference units in the second feature group as the similarity score.
CN202310120894.7A 2023-02-14 2023-02-14 A Children's Expression Scoring System Pending CN116311426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310120894.7A CN116311426A (en) 2023-02-14 2023-02-14 A Children's Expression Scoring System


Publications (1)

Publication Number Publication Date
CN116311426A true CN116311426A (en) 2023-06-23

Family

ID=86836978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310120894.7A Pending CN116311426A (en) 2023-02-14 2023-02-14 A Children's Expression Scoring System

Country Status (1)

Country Link
CN (1) CN116311426A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105050673A (en) * 2013-04-02 2015-11-11 日本电气方案创新株式会社 Facial-expression assessment device, dance assessment device, karaoke device, and game device
CN115546869A (en) * 2022-10-25 2022-12-30 南京邮电大学 A method and system for facial expression recognition based on multiple features
CN115690901A (en) * 2022-10-18 2023-02-03 上海东方传媒技术有限公司 A Scoring Method Applicable to Dance Movements


Similar Documents

Publication Publication Date Title
CN109522819B (en) A fire image recognition method based on deep learning
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN112287891B (en) A method for learning concentration through video assessment based on feature extraction of facial expressions
CN104866829B (en) A cross-age face verification method based on feature learning
WO2020102988A1 (en) Feature fusion and dense connection based infrared plane target detection method
CN109190537A (en) A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN107424161B (en) A Coarse-to-fine Image Layout Estimation Method for Indoor Scenes
CN107358223A (en) A kind of Face datection and face alignment method based on yolo
CN106529442A (en) A pedestrian recognition method and device
CN109583456B (en) Infrared surface target detection method based on feature fusion and dense connection
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN106897738A (en) A kind of pedestrian detection method based on semi-supervised learning
CN108615046A (en) A kind of stored-grain pests detection recognition methods and device
CN115482580A (en) Multi-person evaluation system based on machine vision skeletal tracking technology
WO2021132099A1 (en) Learning support device, learning device, learning support method, and learning support program
CN106446890B (en) A candidate region extraction method based on window scoring and superpixel segmentation
CN114093030B (en) Shooting training analysis method based on human body posture learning
CN112149616A (en) A method of character interaction behavior recognition based on dynamic information
CN109271918B (en) Method for distinguishing people with balance ability disorder based on gravity center shift model
CN104463243B (en) Sex-screening method based on average face feature
CN109508661A (en) A kind of person's of raising one's hand detection method based on object detection and Attitude estimation
CN111079656A (en) Children motion attitude automatic identification technology based on 3D convolution long-term and short-term memory network
CN118865494B (en) Human body action recognition method based on space-time interest point and space-time diagram convolution
CN113139481A (en) Classroom people counting method based on yolov3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination