CN109815814A

CN109815814A - A face detection method based on convolutional neural network

Info

Publication number: CN109815814A
Application number: CN201811572322.8A
Authority: CN
Inventors: 刘高华; 王萌; 苏寒松
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2019-05-28
Anticipated expiration: 2038-12-21
Also published as: CN109815814B

Abstract

The invention discloses a face detection method based on a convolutional neural network, comprising the following steps: step (1), establishing a database; step (2), preprocessing the images in the database; step (3), building a deep learning network; step (4), testing the training results. The method detects, with relatively high accuracy, faces that are occluded, at different angles, or in profile, as well as smaller and more blurred faces in the picture. The network structure is simple, there are fewer parameters to iterate, and the training time is shorter.

Description

A face detection method based on convolutional neural network

Technical field

The invention belongs to the fields of computer vision and artificial intelligence, and particularly relates to a face detection method based on a convolutional neural network.

Background

Face detection is the process of determining the position and size of faces in an image that contains faces. It is an important part of computer vision and a key preprocessing step for face recognition. Its accuracy largely determines the accuracy of face recognition and strongly affects subsequent work, so research on face detection has great significance and practical value.

Face detection is widely used in everyday life, for example in identity authentication and security, in face-related media and entertainment, in electronic products such as mobile phones and digital cameras, and in image retrieval. Face detection methods can be roughly divided into traditional methods (including template-matching-based methods, distance-based methods, and so on) and deep-learning-based methods.

In recent years deep learning has been continuously refined and developed, and it is widely applied to both classification and regression tasks. Deep-learning-based face detection methods are also developing continuously. However, current methods, taking the most commonly used MTCNN method as an example, are not fast enough and not accurate enough; in particular, faces in images and videos that are occluded, at different angles, in profile, or small in the frame are hard to detect. Because face detection is a preprocessing step for face recognition, its accuracy largely affects the accuracy of the subsequent recognition work, so solving these problems is essential.

Summary of the invention

Based on the prior art, the present invention proposes a face detection method based on a convolutional neural network, aimed in particular at detecting faces that are affected by illumination, occluded, in profile, or very small in the frame. By establishing a new database, building a convolutional neural network, adjusting the hyperparameters, and training the network iteratively, a good detection result can be obtained, so that faces are detected effectively.

The present invention proposes a face detection method based on a convolutional neural network, the method comprising the following steps:

Step 1: establish a database, obtain image data, preprocess it, and construct a convolutional neural network;

Step 2: perform four iterative operations on the preprocessed data through the image feature analysis module of the convolutional neural network to generate image feature parameters;

Step 3: operate on the image feature parameters through the fully connected layer of the convolutional neural network to generate a one-dimensional image vector;

Step 4: classify and regress the one-dimensional image vector through the classification layer of the convolutional neural network to obtain the position coordinates of the face image.

In step 2, the processing of the preprocessed data by the image feature analysis module comprises the following steps:

Step 2.1: the convolutional layer of the image feature analysis module extracts image features by convolving the layer weights with the preprocessed data;

Step 2.2: the activation function layer of the image feature analysis module applies the ReLU function to the image features as a nonlinear operation, yielding nonlinear feature map parameters;

Step 2.3: the max pooling layer of the image feature analysis module reduces the number of parameters of the nonlinear feature map.
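Steps 2.1 to 2.3 can be illustrated with a minimal NumPy sketch of one convolution, ReLU, and max pooling stage. The 3x3 kernel, the 2x2 pooling window, and the function names are illustrative assumptions; the patent does not state kernel sizes, strides, or channel counts.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over a 2-D `image` (no padding, stride 1).

    Like most CNN layers, this computes a cross-correlation (no kernel flip).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

def relu(x):
    """ReLU activation: y = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def feature_stage(image, kernel, bias=0.0):
    """One stage as in steps 2.1-2.3: convolution, then ReLU, then max pooling."""
    return max_pool2d(relu(conv2d_valid(image, kernel, bias)))

# Toy input: a 6x6 ramp image and a hypothetical horizontal-gradient kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
features = feature_stage(image, kernel)
print(features.shape)  # 6x6 -> 4x4 after the convolution -> 2x2 after pooling
```

In the method described here, such a stage would be applied four times in sequence, each stage consuming the feature map produced by the previous one.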

In step 4, the classification and regression performed by the classification layer on the one-dimensional image vector comprise the following steps:

Step 4.1: using stochastic gradient descent as the optimization method, the weights are updated iteratively so that the loss function is continually reduced, and the training hyperparameters are adjusted repeatedly to obtain the best training result. The hyperparameters include the number of iterations, the batch size, the maximum number of iterations, and the learning rate;

Step 4.2: the loss function selected for classification combines the center loss function with the softmax loss function; the specific expression is:

$$L = L_S + \lambda L_c$$

where $L_S$ is the softmax loss function, $L_c$ is the center loss function, and $\lambda$ is the coefficient weighting the two, taken here as $\lambda = 0.1$. In the formula, $Wx+b$ is the output of the fully connected layer, which after the log represents the probability that $x_i$ belongs to class $y_i$, and $C$ denotes the feature center of the class;

Step 4.3: the loss function used for regression is the Euclidean distance loss function; the specific expression is as follows:

$$L = \lVert \hat{y}_i - y_i \rVert_2^2, \qquad y_i \in \mathbb{R}^4$$

where $\hat{y}_i$ is the output predicted by the network and $y_i$ is the annotated ground-truth label, i.e. the coordinates of the 68 facial key points.

Step 4.4: the coordinates of the 68 facial key points output under the optimal weights are compared with the labeled facial key point coordinates and faces in the database, thereby calculating the accuracy of this convolutional neural network for face detection.
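The combined classification loss of step 4.2 can be sketched as follows. The text gives only the weighting $\lambda = 0.1$ and the roles of the two terms, so the concrete forms used here are assumptions: the softmax term is the usual mean cross-entropy, and the center term is half the mean squared distance between each feature vector and the center of its class, as commonly formulated in the center loss literature. All function and variable names are illustrative.

```python
import numpy as np

def softmax_loss(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center (assumed form)."""
    diffs = features - centers[labels]
    return float(0.5 * np.sum(diffs ** 2, axis=1).mean())

def classification_loss(logits, features, labels, centers, lam=0.1):
    """Combined loss L = L_S + lam * L_c, with lam = 0.1 as stated in the text."""
    return softmax_loss(logits, labels) + lam * center_loss(features, labels, centers)

# Two samples, two classes (non-face = 0, face = 1; this ordering is an assumption).
logits = np.zeros((2, 2))                      # uninformative scores -> L_S = ln 2
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([1, 0])

aligned_centers = np.array([[0.0, 1.0], [1.0, 0.0]])  # each feature sits on its class center
offset_centers = np.zeros((2, 2))                     # each feature is at distance 1 from its center

total_aligned = classification_loss(logits, features, labels, aligned_centers)
total_offset = classification_loss(logits, features, labels, offset_centers)
print(total_aligned, total_offset)
```

The center term only adds to the loss when features drift away from their class center, which is why the second total is larger than the first by `0.1 * 0.5`.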

Beneficial effects

Compared with the prior art, the present invention provides a face detection method based on a convolutional neural network that achieves relatively high detection accuracy for faces that are occluded, at different angles, or in profile, as well as for smaller and more blurred faces in the picture; in addition, the network structure is simple, the parameters to iterate are few, and the training time is short.

Description of drawings

Fig. 1 is a flow chart of a face detection method based on a convolutional neural network;

Fig. 2 shows the connection scheme of the convolutional neural network used in the convolutional-neural-network-based face recognition method proposed by the present invention; it comprises four convolutional layers, four ReLU activation function layers, four max pooling layers, and two fully connected layers, the last of which is the softmax classification layer;

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings:

Fig. 1 shows a flow chart of a face detection method based on a convolutional neural network.

Step 1 (110): establish a database, obtain image data, preprocess it, and construct a convolutional neural network;

In this step, a database is established to obtain image data. The database contains pictures that meet the following requirements: each picture contains at least one face; there is no requirement on the position of the face, but faces that are off-center and far from the camera are preferred; the backgrounds are complex and varied, covering a range of indoor and outdoor scenes; the position of each face is marked with a rectangular box, and 68 key points of the face are marked, covering the eyebrows, eyes, nose, mouth, and facial contour. There is no requirement on image sharpness. The established database contains 6,000 labeled images containing faces.

In this step, the images in the database are preprocessed. A spatial pyramid pooling operation is first applied to the images in the established database; from one image, this operation yields multiple images with different pixel dimensions and scales, which makes it easier to extract fixed-size feature vectors from multi-scale features. All images generated in this way are then randomly mirrored, both vertically and horizontally. Of the resulting database images, 4/5 are used as the training database and 1/5 as the test database;
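The preprocessing described above (multi-scale copies, random mirroring, and a 4/5 to 1/5 split) might look like the following sketch. The scale factors, the block-averaging downsampler, the 50% mirroring probability, and all names are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def downsample(image, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks."""
    h = (image.shape[0] // factor) * factor
    w = (image.shape[1] // factor) * factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def image_pyramid(image, factors=(1, 2, 4)):
    """Return copies of `image` at several scales (hypothetical factors)."""
    return [downsample(image, f) for f in factors]

def random_mirror(image, rng):
    """Flip left-right and/or up-down, each with probability 0.5 (assumed)."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # left-right mirror
    if rng.random() < 0.5:
        image = image[::-1, :]  # up-down mirror
    return image

def split_train_test(items, rng, train_fraction=0.8):
    """Shuffle and split into 4/5 training and 1/5 test data."""
    order = rng.permutation(len(items))
    cut = int(len(items) * train_fraction)
    return [items[i] for i in order[:cut]], [items[i] for i in order[cut:]]

rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(10)]  # stand-ins for database images
augmented = [random_mirror(level, rng) for img in images for level in image_pyramid(img)]
train_set, test_set = split_train_test(augmented, rng)
print(len(augmented), len(train_set), len(test_set))
```

In a real pipeline the face boxes and the 68 key point labels would have to be scaled and mirrored together with the pixels; that bookkeeping is omitted here.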

Step 2 (210): perform four iterative operations on the preprocessed data through the image feature analysis module of the convolutional neural network to generate image feature parameters;

Step 2.2: the activation function layer of the image feature analysis module applies the ReLU function to the image features as a nonlinear operation, yielding nonlinear feature map parameters;

In the present invention, the preprocessed test database images are fed into the trained neural network. Each test image passes through the trained weight matrices; features are extracted and passed to the classifier, and the classification and regression results are output. The classification result is expressed as probabilities: if the probability of a face is greater than the probability of a non-face, the region is judged to be a face and is marked with a rectangular box. The regression result marks the 68 key points of the face in the picture and returns their coordinates.
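The decision rule in this paragraph (face if the face probability exceeds the non-face probability) can be written as a short sketch. The ordering of the two scores and the function names are assumptions.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def is_face(logits):
    """Decide face vs non-face from two scores ordered as [non-face, face] (assumed order)."""
    p_nonface, p_face = softmax(np.asarray(logits, dtype=float))
    return bool(p_face > p_nonface), float(p_face)

decision, probability = is_face([0.0, 2.0])
print(decision, round(probability, 4))
```

With only two classes, this rule is equivalent to thresholding the face probability at 0.5.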

Step 3 (310): operate on the image feature parameters through the fully connected layer of the convolutional neural network to generate a one-dimensional image vector.

Step 4 (410): classify and regress the one-dimensional image vector through the classification layer of the convolutional neural network to obtain the position coordinates of the face image. In step 4, the classification and regression performed by the classification layer on the one-dimensional image vector comprise the following steps:

Step 4.2: the loss function selected for classification combines the center loss function with the softmax loss function; the specific expression is:

$$L = L_S + \lambda L_c$$

$y_i \in \mathbb{R}^4$

The training task of the present invention is divided into two parts: classification and regression. Classification treats face detection as a binary problem of face versus non-face; regression is the process by which the trained neural network returns the coordinates of the face bounding box and the coordinates of the 68 facial key points, thereby accomplishing face detection. The weights of the network are updated iteratively to reduce the loss function until the optimal weights are obtained; the recognition results output under the optimal weights are then compared with the labeled facial key point coordinates and faces in the database, thereby calculating the accuracy of this convolutional neural network for face detection.
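The regression part can be illustrated with a Euclidean distance loss over the box and key point coordinates. This is a minimal sketch: the squared form of the distance, the `(x1, y1, x2, y2)` box layout, and the `(68, 2)` key point layout are assumptions, since the text names only the loss type and the quantities being regressed.

```python
import numpy as np

def euclidean_loss(predicted, target):
    """Squared Euclidean distance between predicted and labeled coordinates (assumed form)."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sum((predicted - target) ** 2))

# Hypothetical layouts: box as (x1, y1, x2, y2), key points as 68 rows of (x, y).
true_box = np.array([10.0, 20.0, 110.0, 140.0])
pred_box = np.array([12.0, 19.0, 108.0, 143.0])
true_points = np.zeros((68, 2))
pred_points = np.full((68, 2), 0.5)

box_loss = euclidean_loss(pred_box, true_box)
point_loss = euclidean_loss(pred_points, true_points)
print(box_loss, point_loss)
```

Both terms shrink toward zero as the predicted coordinates approach the labeled ones, which is what training drives them to do.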

As shown in Fig. 2, the convolutional neural network used in the convolutional-neural-network-based face recognition method proposed by the present invention comprises four convolutional layers, four ReLU activation function layers, four max pooling layers, and two fully connected layers, the last of which is the softmax classification layer. Their roles are as follows: the convolutional layers extract image features by convolving the convolutional-layer weights with the parameters; the activation function layers increase the nonlinear capacity of the network, where the ReLU function is y = max(0, x); the max pooling layers reduce the output size and the number of parameters; the fully connected layers map the extracted features to a one-dimensional vector; and the classification layer separates the features extracted by the preceding network into face and non-face and regresses the coordinates of the 68 facial key points.
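To see how four convolution and pooling stages shrink a feature map, the following sketch traces spatial sizes through the network. The 96-pixel input, 3x3 unpadded convolutions, and 2x2 pooling with stride 2 are hypothetical values chosen for illustration; the patent does not give layer dimensions. ReLU is omitted because it does not change the size.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial size after max pooling along one dimension."""
    return (size - window) // stride + 1

def trace_feature_sizes(input_size, stages=4, kernel=3):
    """Return the feature map size after each conv + pool stage."""
    sizes = []
    size = input_size
    for _ in range(stages):
        size = conv_output_size(size, kernel)
        size = pool_output_size(size)
        sizes.append(size)
    return sizes

sizes = trace_feature_sizes(96)
print(sizes)  # [47, 22, 10, 4]
```

Under these assumed settings, the fully connected layer would receive a 4x4 map per channel, which it flattens into the one-dimensional vector described above.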

The whole training process of the present invention is as follows. First, the parameters of the convolutional layers and the fully connected layers are randomly initialized. When images from the established database are fed into the network, they pass through the four convolution, activation, and pooling stages to yield facial features; the fully connected layer then produces a fixed-size feature vector, and finally the classification layer gives the coordinates of the face location. During training, data propagates forward through the network and the error obtained from the loss function propagates backward, so that the parameters of the convolutional and fully connected layers are continually optimized; through repeated training and fine-tuning of the various parameters, a good training result is finally obtained.

This step obtains the optimal parameters by training on the database. Throughout training, the loss function characterizes the error between the actual labels and the predictions; training iterates to minimize the loss function, and the optimal parameters are obtained when the loss reaches its minimum. The parameters to be trained comprise the convolution kernels and biases of the convolutional layers and the neuron parameters of the fully connected layers. With data propagating forward and the error computed by the loss function propagating backward, gradient descent lets the network find the global optimum over successive iterations, at which point the optimal parameters are obtained. Once training ends and the optimal network parameters are available, they are loaded into the network, which then has the ability to detect faces and can be used for face detection. Testing then gives the accuracy of this neural network for face detection.
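The training loop described above (random initialization, forward pass, backward pass, iterative weight updates by stochastic gradient descent) is sketched below on a toy two-class problem. A single logistic unit stands in for the full network, and the data, learning rate, batch size, and iteration count are all assumed values, not taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, b, x, y):
    """Mean binary cross-entropy over the whole data set."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train_sgd(x, y, learning_rate=0.1, batch_size=16, max_iterations=500, seed=0):
    """Mini-batch SGD; the hyperparameters mirror those named in step 4.1."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=x.shape[1])  # random initialization
    b = 0.0
    history = [cross_entropy(w, b, x, y)]
    for _ in range(max_iterations):
        idx = rng.choice(len(x), size=batch_size, replace=False)
        xb, yb = x[idx], y[idx]
        p = sigmoid(xb @ w + b)                             # forward pass
        error = p - yb                                      # loss gradient w.r.t. pre-activation
        w -= learning_rate * (xb.T @ error) / batch_size    # backward pass and weight update
        b -= learning_rate * error.mean()
        history.append(cross_entropy(w, b, x, y))
    return w, b, history

# Toy data: two Gaussian blobs standing in for non-face (0) and face (1) features.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
w, b, history = train_sgd(x, y)
print(history[0], history[-1])
```

The loss recorded in `history` falls from its initial value as the weights are updated. This toy loss is convex; for a deep network the loss surface is not, so gradient descent is generally only expected to reach a good local minimum.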

The above description is merely a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A face detection method based on a convolutional neural network, characterized in that the method comprises the following steps:

Step 1: establish a database, obtain image data, preprocess it, and construct a convolutional neural network;

Step 2: perform four iterative operations on the preprocessed data through the image feature analysis module of the convolutional neural network to generate image feature parameters;

Step 3: operate on the image feature parameters through the fully connected layer of the convolutional neural network to generate a one-dimensional image vector;

Step 4: classify and regress the one-dimensional image vector through the classification layer of the convolutional neural network to obtain the position coordinates of the face image.

2. The face detection method based on a convolutional neural network as claimed in claim 1, characterized in that, in step 2, the processing of the preprocessed data by the image feature analysis module comprises the following steps:

Step 2.1: the convolutional layer of the image feature analysis module extracts image features by convolving the layer weights with the preprocessed data;

Step 2.2: the activation function layer of the image feature analysis module applies the ReLU function to the image features as a nonlinear operation, yielding nonlinear feature map parameters;

Step 2.3: the max pooling layer of the image feature analysis module reduces the number of parameters of the nonlinear feature map.

3. The face detection method based on a convolutional neural network as claimed in claim 1, characterized in that, in step 4, the classification and regression performed by the classification layer on the one-dimensional image vector comprise the following steps:

Step 4.1: using stochastic gradient descent as the optimization method, the weights are updated iteratively so that the loss function is continually reduced, and the training hyperparameters are adjusted repeatedly to obtain the best training result. The hyperparameters include the number of iterations, the batch size, the maximum number of iterations, and the learning rate;

Step 4.2: the loss function selected for classification combines the center loss function with the softmax loss function; the specific expression is:

$$L = L_S + \lambda L_c$$

where $L_S$ is the softmax loss function, $L_c$ is the center loss function, and $\lambda$ is the coefficient weighting the two, taken here as $\lambda = 0.1$. In the formula, $Wx+b$ is the output of the fully connected layer, which after the log represents the probability that $x_i$ belongs to class $y_i$, and $C$ denotes the feature center of the class;

Step 4.3: the loss function used for regression is the Euclidean distance loss function; the specific expression is as follows:

$$L = \lVert \hat{y}_i - y_i \rVert_2^2, \qquad y_i \in \mathbb{R}^4$$

where $\hat{y}_i$ is the output predicted by the network and $y_i$ is the annotated ground-truth label, i.e. the coordinates of the 68 facial key points.

Step 4.4: the coordinates of the 68 facial key points output under the optimal weights are compared with the labeled facial key point coordinates and faces in the database, thereby calculating the accuracy of this convolutional neural network for face detection.