
CN111027504A - Face key point detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111027504A
CN111027504A (application CN201911310807.4A)
Authority
CN
China
Prior art keywords
face
face image
image
key point
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911310807.4A
Other languages
Chinese (zh)
Inventor
周康明
方飞虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN201911310807.4A
Publication of CN111027504A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract


(Figure: abstract drawing of application 201911310807)

The present application provides a face key point detection method, apparatus, device, and storage medium. The method includes: acquiring a face image containing a face; and performing key point detection on the face image through a neural network to obtain position information of the face key points in the face image. The neural network is trained on a training data set that includes a plurality of first face image samples and their corresponding marking information, where the face in each first face image sample is partially occluded, and the marking information includes the position information of the key points of both the occluded and the unoccluded parts of the face. By training the neural network on face image samples in which the key points of the occluded part are marked, the present application enables the network to accurately detect face key points even in images of partially occluded faces, improving the detection accuracy of face key point detection.


Description

Face key point detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a face key point.
Background
Face key point detection refers to a technology for detecting key points such as the eyes, the nose, and face edges on a face in a face image. Face key point detection can be applied to scenarios such as local face positioning, expression recognition, intelligent driving-test judgment, and driving assistance.
Generally, a pre-constructed face key point detection network is trained on a plurality of face image samples with marked key point positions, and the trained network is then used to detect the key point positions of the face in an image to be processed.
However, occlusion may be present in the image to be processed, for example, when the face in the image is turned at too large an angle or something covers the face. When an existing face key point detection method is applied to a face image in which the face is occluded, the accuracy of the detection result is poor.
Disclosure of Invention
The embodiments of the present application provide a face key point detection method, apparatus, device, and storage medium, aiming to solve the problem that existing face key point detection methods have poor detection accuracy.
In a first aspect, an embodiment of the present application provides a method for detecting a face key point, including:
acquiring a face image containing a face;
performing key point detection on the face image through a neural network to obtain position information of the key points of the face in the face image, where the neural network is trained on a training data set, the training data set includes a plurality of first face image samples and their corresponding marking information, the face contained in each first face image sample has an occluded part, and the marking information includes the position information of the key points of both the occluded and the unoccluded parts of the face.
In one possible embodiment, the neural network comprises a convolutional neural network;
performing key point detection on the face image through the neural network to obtain the position information of the key points of the face in the face image includes:
inputting the face image into the convolutional neural network to obtain a preset number of feature maps, wherein the feature maps correspond to key points on the face in the face image one to one;
and for each feature map, determining the coordinates of the position of the pixel with the maximum pixel value in that feature map as the position coordinates of the key point corresponding to that feature map.
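The per-feature-map maximum described above can be sketched as follows. This is a minimal numpy illustration of the argmax step, not the patent's implementation; the function name and the toy data are assumptions.

```python
import numpy as np

def keypoints_from_feature_maps(feature_maps):
    """Return the (x, y) coordinates of the maximum-value pixel in each
    feature map; each map corresponds to exactly one face key point."""
    coords = []
    for fmap in feature_maps:
        # np.argmax works on the flattened map; unravel back to (row, col)
        row, col = np.unravel_index(np.argmax(fmap), fmap.shape)
        coords.append((int(col), int(row)))  # report as (x, y)
    return coords

# toy example: two 4x4 "feature maps" with known peaks
maps = np.zeros((2, 4, 4))
maps[0, 1, 2] = 1.0   # peak at row 1, col 2 -> keypoint (x=2, y=1)
maps[1, 3, 0] = 0.9   # peak at row 3, col 0 -> keypoint (x=0, y=3)
print(keypoints_from_feature_maps(maps))  # [(2, 1), (0, 3)]
```

In practice the feature maps would be the network's output tensors; the extraction step itself is independent of how they were produced.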
In a possible implementation, before acquiring a face image including a face, the method further includes:
acquiring a plurality of first face image samples and the pre-marked coordinates of the face key points in each first face image sample;
generating, for each first face image sample, a heat map corresponding to each key point on the face according to the marked coordinate of that key point, where the pixel value at the position corresponding to the marked coordinate of the key point is larger than the pixel values at all other positions, and the pixel values at the other positions decrease as the distance from the position of the marked coordinate increases;
and composing the training data set from each first face image sample and its corresponding set of heat maps.
In one possible embodiment, after the training data set is composed of each first face image sample and its corresponding set of heat maps, the method further includes:
for each first face image sample in the training data set, inputting the sample into a pre-constructed neural network to obtain a plurality of predicted feature maps output by the network, where the predicted feature maps correspond one to one to the heat maps of the sample, and adjusting the network parameters of the neural network according to the sample's set of heat maps and predicted feature maps.
In a possible implementation manner, a ratio of the number of key points of the portion of the first face image sample where the face is occluded to the total number of key points of the face is less than or equal to a preset threshold.
In a possible implementation manner, the training data set further includes a plurality of second face image samples and corresponding label information thereof, and each second face image sample includes a face that is not occluded.
In one possible implementation, acquiring a face image including a face includes:
acquiring an image acquired by a camera;
identifying the position information of the area where the face is located in the image;
and extracting the image of the area corresponding to the position information from the image as the face image.
In a possible implementation, identifying the location information of the region where the face is located in the image includes:
and inputting the image into a single-shot multibox detector (SSD) network to obtain the position information of the region where the face is located in the image.
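Once the detector returns a face bounding box, extracting the face image is a simple crop. The sketch below assumes the box format (x, y, w, h) in pixels; the SSD network itself is treated as an external component and is not implemented here.

```python
import numpy as np

def crop_face(image, box):
    """Extract the face region from an image given a detector bounding box.

    `box` is assumed to be (x, y, w, h) in pixel coordinates; a real SSD
    detector often returns normalized corner coordinates that would first
    need converting to this form."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

# toy 10x10 "image" so the crop is easy to verify
img = np.arange(100).reshape(10, 10)
face = crop_face(img, (2, 3, 4, 5))
print(face.shape)  # (5, 4): height 5, width 4
```

The cropped array is then what gets fed to the key point detection network in S202.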
In a possible implementation manner, after performing key point detection on the face image through a neural network to obtain position information of key points of the face in the face image, the method further includes:
and marking each face key point in the face image according to the position information of the face key points in the face image, and displaying the marked face image.
In a second aspect, an embodiment of the present application provides a face keypoint detection apparatus, including:
the acquisition module is used for acquiring a face image containing a face;
the processing module is used for performing key point detection on the face image through a neural network to obtain position information of the key points of the face in the face image, where the neural network is trained on a training data set, the training data set includes a plurality of first face image samples and their corresponding marking information, the face contained in each first face image sample has an occluded part, and the marking information includes the position information of the key points of both the occluded and the unoccluded parts of the face.
In a third aspect, an embodiment of the present application provides a face keypoint detection apparatus, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory, so that the at least one processor performs the face keypoint detection method as described in the first aspect and various possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the face key point detection method described in the first aspect and its various possible implementations.
According to the face key point detection method, apparatus, device, and storage medium provided by the present application, a face image containing a face is first acquired; key point detection is then performed on the face image through a neural network to obtain the position information of the face key points in the face image. The neural network is trained on a training data set that includes a plurality of first face image samples and their corresponding marking information, where the face in each first face image sample has an occluded part, and the marking information includes the position information of the key points of both the occluded and the unoccluded parts of the face. Because the neural network is trained on face image samples in which the key points of the occluded part are marked, it can accurately detect face key points even in images of partially occluded faces, which improves the detection accuracy of face key point detection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic diagram illustrating an architecture of a face keypoint detection system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a face key point detection method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a face key point detection method according to another embodiment of the present application;
fig. 4 is a schematic flow chart of a face keypoint detection method according to another embodiment of the present application;
fig. 5 is a schematic diagram of marking positions of key points of a face according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a face keypoint detection apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a face keypoint detection apparatus according to yet another embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of a face keypoint detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic structural diagram of a face keypoint detection system according to an embodiment of the present application. As shown in fig. 1, the face keypoint detection system provided by the present embodiment includes a camera 11 and a face keypoint detection apparatus 12. The camera 11 is used for acquiring a face image including a human face. The camera 11 may send the face image to the face keypoint detection device 12. The face key point detection device 12 is used for performing key point detection on the face image. The face key point detection device 12 may be a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, a portable computer, a server, and the like, and is not limited herein.
The face key point detection method provided by the embodiment can be applied to scenes such as face local positioning, expression recognition, intelligent driving test judgment, driving assistance and the like, and is not limited herein. Taking the assisted driving as an example, in the assisted driving process, it is often necessary to determine the state of the driver, such as the sight line direction and fatigue condition of the driver, and generally, the face and the positions of key points of the face are first located from the acquired image, and then the state of the driver is determined according to the positions of the key points. Therefore, the positioning accuracy of the key points of the human face plays an important role in accurately judging the state of the driver. In this application scenario, the camera may be mounted in a fixed position inside the vehicle for capturing images of the driver. The face key point detection equipment can be a vehicle-mounted terminal or a server in communication connection with the camera and the like. After the camera collects the image of the face of the driver, the image is sent to face key point detection equipment, the face key point detection equipment detects the key point position of the face in the image, then the face key point detection equipment identifies the state of the driver in the image according to the key point position of the face, or the face key point detection equipment sends the key point position of the face to other identification equipment, and the other identification equipment identifies the state of the driver in the image according to the key point position of the face.
Most existing face key point detection methods focus on faces that are largely frontal and unoccluded, and the accuracy of the detection result is poor when key points are occluded. If the driver's head is turned too far, or an obstruction lies between the driver and the camera, the positions of the key points on the driver's face cannot be detected accurately, and the driver's state consequently cannot be recognized, or is recognized incorrectly. With the face key point detection method provided by the embodiments of the present application, the neural network is trained on face image samples in which the key points of the occluded part are marked, so that the network can accurately detect face key points even in images of partially occluded faces. This improves the detection accuracy of face key point detection and, in turn, the accuracy of subsequent driver state recognition.
Fig. 2 is a schematic flow chart of a face key point detection method according to an embodiment of the present application. As shown in fig. 2, the method includes:
s201, obtaining a face image containing a face.
In this embodiment, the face keypoint detection apparatus may acquire a face image. For example, the face key point detection device may acquire a face image collected by a camera inside a vehicle, a road-mounted snapshot camera, a camera of a mobile phone, or the like, or may acquire the face image from a server and a database, which is not limited herein.
S202, carrying out key point detection on the face image through a neural network to obtain position information of key points of the face in the face image, wherein the neural network is trained through a training data set, the training data set comprises a plurality of first face image samples and corresponding marking information thereof, a blocked part exists in the face contained in each first face image sample, and the marking information comprises position information of key points of the blocked part and the non-blocked part on the face.
In this embodiment, the face key point detection device may input the face image into a neural network for face key point detection, and the neural network may detect the positions of the key points of the face in the face image and output position information of the face key points. The neural network is pre-constructed and the training of the network parameters is performed through a training data set.
The training data set includes a plurality of face image samples in which the face is partially occluded, together with their corresponding marking information. In this embodiment, a face image sample whose face is partially occluded is referred to as a first face image sample, and a face image sample whose face is not occluded is referred to as a second face image sample. The marking information of each first face image sample is the position information of the key points of the face in that sample, obtained by manually marking the sample. The marked key point position information includes the key point positions of both the unoccluded part and the occluded part of the face in the sample.
For example, assume 68 key points are marked on each first face image sample, and that for a certain sample 56 key points lie on the unoccluded part of the face and 12 on the occluded part. During manual marking, the 56 key points of the unoccluded part can be marked directly in the sample, while the positions of the 12 key points of the occluded part are estimated and marked based on the annotator's experience.
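One plausible way to represent such marking information is a record combining the coordinates with a visible/occluded flag per key point. The dict layout below is an assumption for illustration; only the 68-point scheme and the 56/12 split come from the example above.

```python
def make_label(visible_pts, occluded_pts):
    """Combine directly marked (visible) and experience-estimated
    (occluded) keypoint coordinates into one label record."""
    return {
        "keypoints": visible_pts + occluded_pts,
        "occluded_flags": [False] * len(visible_pts) + [True] * len(occluded_pts),
    }

# placeholder coordinates; real labels would hold the annotator's (x, y) marks
label = make_label([(0, 0)] * 56, [(1, 1)] * 12)
print(len(label["keypoints"]), sum(label["occluded_flags"]))  # 68 12
```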
In this method, a face image containing a face is first acquired; key point detection is then performed on the face image through a neural network to obtain the position information of the face key points in the face image. The neural network is trained on a training data set comprising a plurality of first face image samples and their corresponding marking information, where the face in each sample has an occluded part and the marking information includes the position information of the key points of both the occluded and the unoccluded parts. Training on samples in which the key points of the occluded part are marked enables the network to accurately detect face key points even in images of partially occluded faces, improving the detection accuracy of face key point detection.
Optionally, S202 may include:
inputting the face image into the convolutional neural network to obtain a preset number of feature maps, wherein the feature maps correspond to key points on the face in the face image one to one;
and for each feature map, determining the coordinates of the positions of the pixel points with the maximum pixel values in the feature map as the position coordinates of the key points corresponding to the feature map.
In this embodiment, the neural network may be a convolutional neural network for performing key point detection on a face image. The face key point detection device inputs the face image into the convolutional neural network, which outputs a preset number of feature maps for the image. The value of the preset number is not limited here; it may be, for example, 68 or 136, and equals the number of key points on the face. For example, if 68 key point positions are to be recognized on the face in each face image, the neural network outputs 68 feature maps for an input image, each corresponding to one key point. The position of the pixel with the maximum pixel value in each feature map is the position of the key point corresponding to that map. The device therefore takes, from each feature map, the coordinates of the position of the maximum-value pixel as the position coordinates of the corresponding key point, thereby obtaining the position coordinates of every key point of the face in the image.
The structure of the convolutional neural network is not limited here; for example, it may be constructed from 18 convolutional layers, 3 ReLU (Rectified Linear Unit) activation layers, 5 pooling layers, 4 deconvolution layers, 5 element-wise (eltwise) layers, and 3 BatchNorm layers.
Optionally, a ratio of the number of key points of the portion, in which the face is occluded, of the first face image sample to the total number of key points of the face is less than or equal to a preset threshold.
In this embodiment, if too much of the face in a first face image sample is occluded, the sample cannot train the neural network effectively. For example, if more than 90% of the face area in a sample is occluded, neither the neural network nor a human annotator can determine the key point positions, and training on the sample cannot improve the network parameters. The occluded part in the first face image samples is therefore limited by a preset threshold to ensure effective training of the neural network. The value of the preset threshold is not limited here; it may be, for example, 30% or 40%. The ratio of the number of key points in the occluded part of each first face image sample to the total number of face key points must be less than or equal to this threshold. For example, with a preset threshold of 40%, if the number of key points in the occluded part of a first face image sample is 12 and the total number of face key points is 68, the proportion is 12/68 ≈ 17.6%; the sample therefore meets the requirement and can be used to train the neural network model.
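The threshold check above amounts to a one-line filter when assembling the training set. The function name and the 40% default are illustrative; the text leaves the actual threshold value open.

```python
def sample_is_trainable(num_occluded, total_keypoints=68, threshold=0.40):
    """Accept a first face image sample only if the fraction of occluded
    keypoints does not exceed the preset threshold (40% as an example)."""
    return num_occluded / total_keypoints <= threshold

print(sample_is_trainable(12))   # 12/68 is about 17.6% -> True
print(sample_is_trainable(30))   # 30/68 is about 44.1% -> False
```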
Optionally, the training data set further includes a plurality of second face image samples and corresponding label information thereof, and each second face image sample includes an unobstructed face.
In this embodiment, the label information of each second face image sample includes the position information of the key points on the face in that sample. Besides the plurality of first face image samples, a plurality of second face image samples are also used to train the neural network, so that the training data set contains both image samples in which the face is partially occluded and image samples in which it is not. A neural network trained on this data set can detect key points with high accuracy both in images of unoccluded faces and in images of partially occluded faces.
Optionally, after S202, the method may further include:
and marking each face key point in the face image according to the position information of the face key points in the face image, and displaying the marked face image.
In this embodiment, the face key point detection device may mark each face key point in the face image according to the position information of the face key point in the face image, for example, the face key point in the face image may be marked by a preset color or symbol, and then the marked face image is displayed, so that the user can view the detected key point.
Alternatively, the face key point detection device may store the face image and the position information of the face key points in the detected face image in a designated file or a designated database in an associated manner.
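The marking step can be sketched as follows. For simplicity this sets single pixels in a grayscale array; a real application would more likely draw colored circles or symbols (e.g. with an image library), as the text suggests. The function name and marker value are assumptions.

```python
import numpy as np

def mark_keypoints(image, keypoints, value=255):
    """Mark detected keypoints in a grayscale image by setting the pixel
    at each (x, y) location to a preset value, then return the marked copy
    for display."""
    out = image.copy()
    for x, y in keypoints:
        out[y, x] = value  # note numpy indexing is (row, col) = (y, x)
    return out

img = np.zeros((8, 8), dtype=np.uint8)
marked = mark_keypoints(img, [(2, 5), (6, 1)])
print(marked[5, 2], marked[1, 6])  # 255 255
```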
Fig. 3 is a schematic flow chart of a face keypoint detection method according to another embodiment of the present application. In this embodiment, before the face image is detected by the neural network, a training data set is first constructed, and the neural network is then trained on that data set. This embodiment describes the specific process of constructing the training data set in detail. As shown in fig. 3, the method may further include:
s301, obtaining a plurality of first face image samples and the marked coordinates of the face key points marked in advance.
In this embodiment, a plurality of first face image samples may be acquired, and the marked coordinates of the face key points in each first face image sample are obtained by manual marking. The marked coordinates are the position coordinates of the manually marked key points.
S302, generating, for each first face image sample, a heat map corresponding to each key point on the face according to the marked coordinate of that key point, where the pixel value at the position corresponding to the marked coordinate is larger than the pixel values at the other positions, and the pixel values at the other positions decrease as the distance from the position of the marked coordinate increases.
In this embodiment, one first face image sample may be selected at a time, and for each key point on the face in that sample a corresponding heat map is generated from the key point's marked coordinates. The heat map has the same size as the first face image sample. The pixel value at the position of the marked coordinate is the largest, and the pixel values at the other positions decrease with increasing distance from it: the farther a position is from the marked coordinate of the key point, the smaller its pixel value.
Optionally, the pixel value at the position [x, y] of the corresponding key point in each heat map is 1, and the pixel values at the other positions follow a Gaussian distribution in the distance from [x, y]. Thus, for each first face image sample, a set of heat maps corresponding to that sample is obtained. For example, if the face in each first face image sample has 68 key points, each sample yields 68 heat maps.
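Such a Gaussian heat map can be generated as below: value 1 at the marked coordinate, decaying with distance. The standard deviation sigma is a free parameter not specified in the text; 2.0 here is an assumption.

```python
import numpy as np

def make_heatmap(height, width, kp_x, kp_y, sigma=2.0):
    """Heat map for one keypoint: pixel value 1 at the marked coordinate
    [kp_x, kp_y], falling off as a Gaussian with distance from it."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - kp_x) ** 2 + (ys - kp_y) ** 2  # squared distance to keypoint
    return np.exp(-d2 / (2.0 * sigma ** 2))

hm = make_heatmap(64, 64, kp_x=20, kp_y=30)
print(hm[30, 20])  # 1.0 exactly at the keypoint
print(bool(hm[30, 20] > hm[35, 25]))  # values decay with distance: True
```

A sample with 68 key points would call this once per key point, giving the set of 68 heat maps used as its label.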
And S303, forming the training data set by each first face image and a group of thermodynamic diagrams corresponding to the first face images.
In this embodiment, the set of thermodynamic diagrams corresponding to each first face image sample may be used as the label information corresponding to that sample, so as to obtain the training data set.
Optionally, after S303, the method may further include: and aiming at each first face image sample in the training data set, inputting the first face image sample into a pre-constructed neural network to obtain a plurality of predicted characteristic graphs output by the neural network, wherein the predicted characteristic graphs correspond to thermodynamic diagrams of the first face image sample one by one, and network parameters of the neural network are adjusted according to a group of thermodynamic diagrams and predicted characteristic graphs of the first face image sample.
In this embodiment, the predicted feature map and the thermodynamic map have the same size. For a certain first face image sample, calculating a loss function value corresponding to the first face image sample according to a group of thermodynamic diagrams of the first face image sample and a corresponding prediction characteristic diagram, then calculating back propagation according to the loss function value, iteratively updating network parameters, and considering that the model converges when the loss function value tends to be stabilized to a value of about 0.4 or less, thereby completing training. The loss function may be selected according to actual requirements, and is not limited herein, for example, the loss function may be a mean-square loss function, and is expressed as:
L = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2

wherein L represents the loss function, x_i represents the i-th pixel value of the predicted feature map output by the neural network, y_i represents the i-th pixel value in the thermodynamic diagram, and N is the number of pixel points in the predicted feature map and the thermodynamic diagram.
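The mean-square loss above can be sketched as follows; the function name and the zero-filled arrays are illustrative only, not part of the embodiment.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean-square loss L = (1/N) * sum_i (x_i - y_i)^2 over all pixels,
    where pred is the predicted feature map stack and target the heatmaps."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return np.mean(diff ** 2)

pred = np.zeros((68, 256, 256))     # hypothetical network output
target = np.zeros((68, 256, 256))   # hypothetical label heatmaps
target[0, 90, 120] = 1.0            # a single differing pixel
loss = mse_loss(pred, target)       # 1 / (68 * 256 * 256)
```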
In the embodiment, the neural network is trained through a plurality of first face image samples and corresponding thermodynamic diagrams, the trained neural network identifies the face image to be processed, and each feature diagram corresponds to one key point position, so that the accuracy of face key point detection is improved.
Fig. 4 is a schematic flow chart of a face keypoint detection method according to another embodiment of the present application. The embodiment describes in detail a specific implementation process for acquiring a face image. As shown in fig. 4, the method includes:
s401, obtaining an image collected by a camera.
In this embodiment, the face key point detection device may acquire an image acquired by the camera. The image collected by the camera comprises a human face and a background image except the human face.
S402, identifying the position information of the area where the face in the image is located.
In this embodiment, the face key point detection device may identify the position information of the region where the face is located by using a face detection method. For example, the position information of the region where the face is located may be represented as [ x, y, w, h ], where x, y, w, h respectively represent the vertex coordinate x value and y value of the minimum bounding rectangle of the region where the face is located, and the width and height of the rectangle.
Optionally, the image is input to a Single Shot MultiBox Detector (SSD) network, so as to obtain the position information of the region where the face is located in the image.
And S403, extracting the image of the area corresponding to the position information from the image as the face image.
In this embodiment, the face key point detection device may identify the position information of the region where the face in the image is located through the SSD network, extract the image of the region corresponding to the position information from the image as the face image, and then detect the positions of the face key points in the face image through the neural network.
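The extraction step can be sketched as a simple crop of the [x, y, w, h] region; `crop_face` is a hypothetical helper name, not from the embodiment.

```python
import numpy as np

def crop_face(image, box):
    """Crop the region [x, y, w, h] returned by the face detector.
    `image` is an H x W x 3 array; (x, y) is the top-left corner of the
    minimum bounding rectangle and (w, h) its width and height."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # hypothetical camera image
face = crop_face(frame, (100, 50, 200, 220))      # region of shape (220, 200, 3)
```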
Optionally, the structure of the SSD network is not limited herein, for example, the SSD network may be built up by 10 convolution layers, 7 Relu active layers, 2 pooling layers, 1 deconvolution layer, 1 clipping layer, 1 eltwise layer, 1 flatten layer, and 3 splicing layers.
S404, carrying out key point detection on the face image through a neural network to obtain position information of key points of the face in the face image, wherein the neural network is trained through a training data set, the training data set comprises a plurality of first face image samples and corresponding marking information thereof, a blocked part exists in the face contained in each first face image sample, and the marking information comprises position information of key points of the blocked part and the non-blocked part on the face.
In this embodiment, S404 is similar to S202 in the embodiment of fig. 2, and is not described here again.
In the embodiment, the position information of the region where the face is located in the image collected by the camera is identified, then the image of the corresponding region is extracted to be used as the face image, and then the key point identification is carried out on the face image, so that the interference of the image information outside the region where the face is located can be eliminated, and the key point identification accuracy is further improved.
In a possible implementation example, the face key point detection method provided by this embodiment may apply face detection to a driver, and may include the following steps:
step 1, installing a camera and program running equipment in a vehicle, and training an SSD network in advance. The method mainly comprises the following steps:
Step 1.1, data is prepared. RGB image data of the scene in the car is collected, for different people under different illumination conditions, through a color camera installed at a fixed position in the car; all collected images are normalized by subtracting 127.5 from each pixel value and dividing by 127.5, and the normalized images are scaled to 286 × 286 pixels.
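The normalization in step 1.1 can be sketched as below; the resize to 286 × 286 would be done with an image library and is omitted here.

```python
import numpy as np

def normalize(image):
    """Map uint8 pixel values [0, 255] to roughly [-1, 1] by computing
    (p - 127.5) / 127.5, as described in step 1.1."""
    return (image.astype(np.float32) - 127.5) / 127.5

img = np.array([[0, 127, 255]], dtype=np.uint8)   # tiny synthetic image
norm = normalize(img)                             # endpoints map to -1.0 and 1.0
```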
And step 1.2, manually marking the face position [ x, y, width, height ] of the driver in each image, recording the face position [ x, y, width, height ] of the driver in a label file, and making the face position [ x, y, width, height ] of the driver in each image and the image into a training data set of the SSD network.
And step 1.3, constructing an SSD detection network structure by adopting 10 convolution layers, 7 Relu activation layers, 2 pooling layers, 1 deconvolution layer, 1 clipping layer, 1 eltwise layer, 1 flatten layer and 3 splicing layers.
Step 1.4, when the SSD detection network is trained, inputting the normalized image data and the corresponding label data to the SSD detection network at the same time, manually setting K rectangles with different sizes and aspect ratios as preset position frames in advance, determining whether the overlap degree between each preset position frame and the face position area of the manual mark is large, if so, regarding the preset position frame as the face area, otherwise, regarding the preset position frame as the background area, respectively calculating a classification loss function value loss (two types of background and face area) and a position regression loss on each preset position frame, and adding the two to obtain a final loss of the network:
L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + L_{loc}(x, l, g)\right)
wherein x represents the set of preset position frames, c represents the category set of the preset position frames, l represents the [x, y, width, height] parameter set of the preset position frames, g represents the [x, y, width, height] parameter set of the manually marked positions, and N represents the number of preset position frames matched with the manually marked positions; L_conf(x, c) represents the cross-entropy loss between the category of the preset position frame and the actual category; L_loc(x, l, g) represents the smooth L1 loss between the position information of the preset position frame and the actual position information. Back propagation is calculated according to the loss, network parameters are updated iteratively, and the model is considered converged when the loss stabilizes at a value of about 1.0 or less.
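The "overlap degree" test used in step 1.4 to decide whether a preset position frame counts as a face region is typically an intersection-over-union computation. The sketch below assumes a 0.5 matching threshold, which the text does not specify.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes, used to decide
    whether a preset position frame overlaps the labeled face region
    enough to be treated as a face sample rather than background."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# IoU of these two boxes is 50/150 = 1/3, below the assumed 0.5 threshold.
is_face = iou((0, 0, 10, 10), (5, 0, 10, 10)) >= 0.5
```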
Step 2, training a face key point network model, which mainly comprises the following steps:
and 2.1, preparing data, intercepting and storing a corresponding position in the image as a new picture according to the marked face region information [ x, y, width, height ] during the SSD network training, and zooming the picture to 256 multiplied by 256 pixels. The image is the face image sample, and may include an image in which the face is partially occluded and an image in which the face is not occluded.
And 2.2, marking key point information [x, y] for positions on the face whose locations can be confirmed, such as the eyebrows, eyes, nose, mouth and the edge of the lower half of the face, in the acquired picture, where x and y respectively represent the horizontal and vertical coordinates of the key point, as shown in fig. 5. A total of 68 sets of key point information represent 68 key points. 68 thermodynamic diagrams of 256 × 256 pixels are then drawn according to the positions of the key points, with each key point corresponding to one thermodynamic diagram; the pixel value at the [x, y] position of the key point in the thermodynamic diagram is set to 1, the values of the remaining positions conform to a Gaussian distribution according to the distance from the [x, y] position, and the 68 thermodynamic diagrams are used as labels.
And 2.3, constructing a human face key point detection network structure by adopting 18 convolution layers, 3 Relu activation layers, 5 pooling layers, 4 deconvolution layers, 5 eltwise layers and 3 Batchnorm layers.
And 2.4, when the face key point detection network is trained, the scaled image data and the corresponding thermodynamic diagram data are input into the network, and the last convolutional layer of the network returns a 68 × 256 × 256 feature map. The mean square loss (mse loss) between the feature maps output by the network and the thermodynamic diagrams is calculated, back propagation is calculated according to the loss, network parameters are updated iteratively, and the model is considered converged when the loss stabilizes at a value of about 0.4 or less.
And 3, inputting the real-time in-car picture acquired by the camera into the SSD network acquired in the step 1, and acquiring face position information [ x, y, w, h ], wherein x, y, w, h respectively represent the vertex coordinate x value and y value of the minimum rectangular frame containing the whole face and the width and height of the rectangular frame. The method mainly comprises the following steps:
and 3.1, acquiring RGB image data of a scene in the vehicle through the camera, normalizing the image data, subtracting 127.5 from a pixel value of each point, dividing by 127.5, and zooming to 286 × 286 pixels.
And 3.2, the normalized data is input into the network to obtain a classification result and a position regression result for each preset position frame, which together yield K arrays of [score, x, y, width, height], where score represents the confidence that the current preset position frame contains a face, and x, y, width, height represent the detailed position of the face. Since there is at most one driver in the car and the driver is closest to the camera, the array with the largest score can be selected as the detection result.
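Selecting the highest-confidence detection from the K arrays can be sketched as below; the detection values are hypothetical.

```python
# K detections as [score, x, y, w, h]; keep the one with the highest
# confidence, since at most one driver face is expected in the car.
detections = [
    [0.32, 10, 20, 80, 90],
    [0.91, 140, 60, 120, 130],
    [0.55, 300, 40, 70, 85],
]
best = max(detections, key=lambda d: d[0])
score, x, y, w, h = best   # the detection with score 0.91
```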
Step 4, intercepting the image of the face area through the acquired face position information, scaling the image to 256 multiplied by 256 and then inputting the image into the face key point detection network, and the main steps comprise:
and 4.1, acquiring RGB image data of a scene in the vehicle through the camera, intercepting an image of a face area according to the face position information acquired in the step 3, and scaling the face image to 256 multiplied by 256 pixels.
And 4.2, the scaled image is input into the face key point detection network to obtain 68 predicted feature maps with a size of 256 × 256; the position of the maximum pixel value in the k-th feature map is taken as the position of the k-th key point, so that the positions of all 68 key points are obtained.
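The argmax step of 4.2 can be sketched as below; the synthetic peak is an illustrative stand-in for real network output.

```python
import numpy as np

def heatmaps_to_keypoints(feature_maps):
    """For each predicted 256x256 feature map, take the position of the
    maximum pixel value as the keypoint location, returned as (x, y)."""
    points = []
    for fmap in feature_maps:
        row, col = np.unravel_index(np.argmax(fmap), fmap.shape)
        points.append((int(col), int(row)))   # x = column, y = row
    return points

maps = np.zeros((68, 256, 256))
maps[0, 90, 120] = 0.9                        # synthetic peak for keypoint 0
keypoints = heatmaps_to_keypoints(maps)       # keypoint 0 lands at (120, 90)
```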
According to the embodiment, the neural network can be trained by marking the face image samples of the face key points of the shielded part, so that the neural network can accurately detect the face key points of the face image of the partially shielded face, and the detection accuracy of the face key point detection is improved.
Fig. 6 is a schematic structural diagram of a face keypoint detection apparatus according to an embodiment of the present application. As shown in fig. 6, the face key point detecting device 60 includes:
The acquiring module 601 is configured to acquire a face image including a face.
The processing module 602 is configured to perform key point detection on the face image through a neural network to obtain position information of key points of a face in the face image, where the neural network is trained through a training data set, the training data set includes a plurality of first face image samples and corresponding label information thereof, a face included in each first face image sample has a blocked portion, and the label information includes position information of key points of a blocked portion and an unblocked portion on the face.
The method comprises the steps of firstly, obtaining a face image containing a face; and then carrying out key point detection on the face image through a neural network to obtain the position information of face key points in the face image, wherein the neural network is trained through a training data set, the training data set comprises a plurality of first face image samples and corresponding marking information thereof, the face contained in each first face image sample has a shielded part, the marking information comprises the position information of key points of the shielded part and the non-shielded part on the face, and the neural network can be trained through the face image samples for marking the face key points of the shielded part, so that the neural network can also accurately detect the face key points on the face image of the face shielded by the part, and the detection accuracy of the face key point detection is improved.
Fig. 7 is a schematic structural diagram of a face keypoint detection apparatus according to yet another embodiment of the present application. As shown in fig. 7, the face keypoint detection apparatus 60 provided in this embodiment may further include, on the basis of the face keypoint detection apparatus provided in the embodiment shown in fig. 6: a training module 603 and a display module 604.
Optionally, the neural network comprises a convolutional neural network;
the processing module 602 is specifically configured to:
inputting the face image into the convolutional neural network to obtain a preset number of feature maps, wherein the feature maps correspond to key points on the face in the face image one to one;
and for each feature map, determining the coordinates of the positions of the pixel points with the maximum pixel values in the feature map as the position coordinates of the key points corresponding to the feature map.
Optionally, the training module 603 is configured to:
acquiring a plurality of first human face image samples and marking coordinates of human face key points marked in advance by the first human face image samples;
generating a thermodynamic diagram corresponding to each key point on the face of each first face image sample according to the mark coordinate of each key point on the face of the first face image sample, wherein the pixel value of the position corresponding to the mark coordinate of the key point in the thermodynamic diagram is larger than the pixel values of the rest positions, and the pixel values of the rest positions are reduced along with the increase of the distance from the position corresponding to the mark coordinate of the key point;
and forming the training data set by each first face image and a group of corresponding thermodynamic diagrams.
Optionally, the training module 603 is further configured to:
and aiming at each first face image sample in the training data set, inputting the first face image sample into a pre-constructed neural network to obtain a plurality of predicted characteristic graphs output by the neural network, wherein the predicted characteristic graphs correspond to thermodynamic diagrams of the first face image sample one by one, and network parameters of the neural network are adjusted according to a group of thermodynamic diagrams and predicted characteristic graphs of the first face image sample.
Optionally, a ratio of the number of key points of the portion, in which the face is occluded, of the first face image sample to the total number of key points of the face is less than or equal to a preset threshold.
Optionally, the training data set further includes a plurality of second face image samples and corresponding label information thereof, and each second face image sample includes an unobstructed face.
Optionally, the obtaining module 601 is specifically configured to:
acquiring an image acquired by a camera;
identifying the position information of the area where the face is located in the image;
and extracting the image of the area corresponding to the position information from the image as the face image.
Optionally, the obtaining module 601 is specifically configured to:
and inputting the image into a single-shot multi-frame detection SSD network to obtain the position information of the area where the face is located in the image.
Optionally, a display module 604, configured to:
and marking each face key point in the face image according to the position information of the face key points in the face image, and displaying the marked face image.
The face key point detection device provided by the embodiment of the application can be used for executing the method embodiment, the implementation principle and the technical effect are similar, and the embodiment is not repeated herein.
Fig. 8 is a schematic diagram of a hardware structure of a face keypoint detection apparatus according to an embodiment of the present application. As shown in fig. 8, the face keypoint detection apparatus 80 provided in this embodiment includes: at least one processor 801 and a memory 802. The face keypoint detection apparatus 80 further comprises a communication component 803. The processor 801, the memory 802, and the communication unit 803 are connected by a bus 804.
In a specific implementation, the at least one processor 801 executes the computer-executable instructions stored in the memory 802, so that the at least one processor 801 executes the above face keypoint detection method.
For a specific implementation process of the processor 801, reference may be made to the above method embodiments, which have similar implementation principles and technical effects, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 8, it should be understood that the Processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The application also provides a computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the above human face key point detection method is realized.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A face key point detection method, comprising: acquiring a face image containing a face; and performing key point detection on the face image through a neural network to obtain position information of face key points in the face image, wherein the neural network is trained on a training data set, the training data set comprises a plurality of first face image samples and corresponding label information thereof, the face contained in each first face image sample has an occluded part, and the label information comprises position information of key points of both the occluded part and the unoccluded part of the face.
2. The method according to claim 1, wherein the neural network comprises a convolutional neural network, and performing key point detection on the face image through the neural network to obtain position information of face key points in the face image comprises: inputting the face image into the convolutional neural network to obtain a preset number of feature maps, wherein the feature maps correspond one-to-one to the key points on the face in the face image; and for each feature map, determining the coordinates of the position of the pixel with the largest pixel value in the feature map as the position coordinates of the key point corresponding to the feature map.
3. The method according to claim 1, wherein before acquiring the face image containing a face, the method further comprises: acquiring a plurality of first face image samples and pre-marked coordinates of their face key points; for each first face image sample, generating, according to the marked coordinates of each key point on the face in the first face image sample, a heat map corresponding to the key point, wherein the pixel value at the position corresponding to the marked coordinates of the key point in the heat map is greater than the pixel values at the remaining positions, and the pixel values at the remaining positions decrease as the distance from the position corresponding to the marked coordinates of the key point increases; and composing the training data set from each first face image and its corresponding set of heat maps.
4. The method according to claim 3, wherein after composing the training data set from each first face image sample and its corresponding set of heat maps, the method further comprises: for each first face image sample in the training data set, inputting the first face image sample into a pre-built neural network to obtain a plurality of predicted feature maps output by the neural network, wherein the predicted feature maps correspond one-to-one to the heat maps of the first face image sample, and adjusting network parameters of the neural network according to the set of heat maps and predicted feature maps of the first face image sample.
5. The method according to claim 1, wherein the ratio of the number of key points in the occluded part of the face in the first face image sample to the total number of face key points is less than or equal to a preset threshold.
6. The method according to claim 1, wherein the training data set further comprises a plurality of second face image samples and corresponding label information thereof, each second face image sample containing an unoccluded face.
7. The method according to any one of claims 1-6, wherein acquiring the face image containing a face comprises: acquiring an image captured by a camera; identifying position information of the region where the face is located in the image; and extracting the image of the region corresponding to the position information from the image as the face image.
8. The method according to claim 7, wherein identifying the position information of the region where the face is located in the image comprises: inputting the image into a Single Shot MultiBox Detector (SSD) network to obtain the position information of the region where the face is located in the image.
9. The method according to any one of claims 1-6, wherein after performing key point detection on the face image through the neural network to obtain the position information of the face key points in the face image, the method further comprises: marking each face key point in the face image according to the position information of the face key points in the face image, and displaying the marked face image.
10. A face key point detection apparatus, comprising: an acquisition module, configured to acquire a face image containing a face; and a processing module, configured to perform key point detection on the face image through a neural network to obtain position information of face key points in the face image, wherein the neural network is trained on a training data set, the training data set comprises a plurality of first face image samples and corresponding label information thereof, the face contained in each first face image sample has an occluded part, and the label information comprises position information of key points of both the occluded part and the unoccluded part of the face.
11. A face key point detection device, comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions, and the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the face key point detection method according to any one of claims 1-9.
12. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the face key point detection method according to any one of claims 1-9 is implemented.
CN201911310807.4A 2019-12-18 2019-12-18 Face key point detection method, device, equipment and storage medium Pending CN111027504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911310807.4A CN111027504A (en) 2019-12-18 2019-12-18 Face key point detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911310807.4A CN111027504A (en) 2019-12-18 2019-12-18 Face key point detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111027504A true CN111027504A (en) 2020-04-17

Family

ID=70210653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911310807.4A Pending CN111027504A (en) 2019-12-18 2019-12-18 Face key point detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111027504A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104992148A (en) * 2015-06-18 2015-10-21 江南大学 ATM terminal human face key points partially shielding detection method based on random forest
CN107679490A (en) * 2017-09-29 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for detection image quality
CN108805040A (en) * 2018-05-24 2018-11-13 复旦大学 It is a kind of that face recognition algorithms are blocked based on piecemeal
CN109063695A (en) * 2018-09-18 2018-12-21 图普科技(广州)有限公司 A kind of face critical point detection method, apparatus and its computer storage medium
CN109508681A (en) * 2018-11-20 2019-03-22 北京京东尚科信息技术有限公司 The method and apparatus for generating human body critical point detection model
CN109685013A (en) * 2018-12-25 2019-04-26 上海智臻智能网络科技股份有限公司 The detection method and device of header key point in human body attitude identification
CN109960974A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Face critical point detection method, apparatus, electronic equipment and storage medium
CN110163080A (en) * 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
CN110276316A (en) * 2019-06-26 2019-09-24 电子科技大学 A human key point detection method based on deep learning
CN110287760A (en) * 2019-03-28 2019-09-27 电子科技大学 A method for occlusion detection of facial facial features based on deep learning
CN110459301A (en) * 2019-07-29 2019-11-15 清华大学 Brain neurosurgery navigation registration method based on heat map and facial key points


Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689527A (en) * 2020-05-15 2021-11-23 武汉Tcl集团工业研究院有限公司 Training method of face conversion model and face image conversion method
CN111598038A (en) * 2020-05-22 2020-08-28 深圳市瑞立视多媒体科技有限公司 Face feature point detection method, device, equipment and storage medium
CN111695495A (en) * 2020-06-10 2020-09-22 杭州萤石软件有限公司 Face recognition method, electronic device and storage medium
CN111695495B (en) * 2020-06-10 2023-11-14 杭州萤石软件有限公司 Face recognition method, electronic device and storage medium
CN111695516A (en) * 2020-06-12 2020-09-22 百度在线网络技术(北京)有限公司 Thermodynamic diagram generation method, device and equipment
CN111695516B (en) * 2020-06-12 2023-11-07 百度在线网络技术(北京)有限公司 Thermodynamic diagram generation method, device and equipment
CN111491180A (en) * 2020-06-24 2020-08-04 腾讯科技(深圳)有限公司 Method and device for determining key frame
CN111860199A (en) * 2020-06-28 2020-10-30 上海芯翌智能科技有限公司 Method and equipment for detecting key points in image
CN111860199B (en) * 2020-06-28 2022-09-27 上海芯翌智能科技有限公司 Method and equipment for detecting key points in image
CN111860343B (en) * 2020-07-22 2023-04-28 杭州海康威视数字技术股份有限公司 Method and device for determining face comparison result
CN111860343A (en) * 2020-07-22 2020-10-30 杭州海康威视数字技术股份有限公司 Method and device for determining face comparison result
CN111881813A (en) * 2020-07-24 2020-11-03 深圳市卡联科技股份有限公司 Data storage method and system of face recognition terminal
CN114140836A (en) * 2020-08-12 2022-03-04 广州久邦世纪科技有限公司 Method and system for identifying and detecting key points of human face
CN111985374B (en) * 2020-08-12 2022-11-15 汉王科技股份有限公司 Face positioning method and device, electronic equipment and storage medium
CN111985374A (en) * 2020-08-12 2020-11-24 汉王科技股份有限公司 Face positioning method and device, electronic equipment and storage medium
CN111967406A (en) * 2020-08-20 2020-11-20 高新兴科技集团股份有限公司 Method, system, equipment and storage medium for generating human body key point detection model
CN112419170A (en) * 2020-10-16 2021-02-26 上海哔哩哔哩科技有限公司 Method for training occlusion detection model and method for beautifying face image
US12367705B2 (en) 2020-10-16 2025-07-22 Shanghai Bilibili Technology Co., Ltd. Occlusion detection
CN112419170B (en) * 2020-10-16 2023-09-22 上海哔哩哔哩科技有限公司 Training method of shielding detection model and beautifying processing method of face image
WO2022078041A1 (en) * 2020-10-16 2022-04-21 上海哔哩哔哩科技有限公司 Occlusion detection model training method and facial image beautification method
CN112287802A (en) * 2020-10-26 2021-01-29 汇纳科技股份有限公司 Face image detection method, system, storage medium and equipment
WO2021204037A1 (en) * 2020-11-12 2021-10-14 平安科技(深圳)有限公司 Detection method and apparatus for facial key point, and storage medium and electronic device
CN112380981A (en) * 2020-11-12 2021-02-19 平安科技(深圳)有限公司 Face key point detection method and device, storage medium and electronic equipment
CN112560584A (en) * 2020-11-27 2021-03-26 北京芯翌智能信息技术有限公司 Face detection method and device, storage medium and terminal
CN114758368A (en) * 2020-12-28 2022-07-15 重庆中星微人工智能芯片技术有限公司 Face key point information generation method, device, equipment and computer readable medium
CN114764925A (en) * 2020-12-30 2022-07-19 北京眼神智能科技有限公司 Mask wearing detection method and device, computer readable storage medium and equipment
CN112784775A (en) * 2021-01-27 2021-05-11 北京洛塔信息技术有限公司 Key point marking method, system and device
CN112861678B (en) * 2021-01-29 2024-04-19 上海依图网络科技有限公司 Image recognition method and device
CN112861678A (en) * 2021-01-29 2021-05-28 上海依图网络科技有限公司 Image identification method and device
CN112926424A (en) * 2021-02-10 2021-06-08 北京爱笔科技有限公司 Face occlusion recognition method and device, readable medium and equipment
CN112926424B (en) * 2021-02-10 2024-05-31 北京爱笔科技有限公司 Face shielding recognition method, device, readable medium and equipment
CN112990057A (en) * 2021-03-26 2021-06-18 北京易华录信息技术股份有限公司 Human body posture recognition method and device and electronic equipment
CN113486701A (en) * 2021-05-12 2021-10-08 李美满 Training method of face recognition model and face recognition method
CN113688664B (en) * 2021-07-08 2024-04-26 三星(中国)半导体有限公司 Human face key point detection method and human face key point detection device
CN113688664A (en) * 2021-07-08 2021-11-23 三星(中国)半导体有限公司 Face key point detection method and face key point detection device
CN113505763B (en) * 2021-09-09 2022-02-01 北京爱笔科技有限公司 Key point detection method and device, electronic equipment and storage medium
CN113505763A (en) * 2021-09-09 2021-10-15 北京爱笔科技有限公司 Key point detection method and device, electronic equipment and storage medium
CN116206287A (en) * 2021-11-29 2023-06-02 魔门塔(苏州)科技有限公司 Driver state judging method and device and driver monitoring system
CN114462495A (en) * 2021-12-30 2022-05-10 浙江大华技术股份有限公司 Training method of face shielding detection model and related device
US20240193970A1 (en) * 2022-12-07 2024-06-13 Idemia Identity & Security USA LLC Automatic system and method for document authentication using portrait fraud detection
US12387511B2 (en) * 2022-12-07 2025-08-12 Idemia Public Security France Automatic system and method for document authentication using portrait fraud detection
CN116740783A (en) * 2023-06-02 2023-09-12 展讯通信(天津)有限公司 Face detection model training method and device

Similar Documents

Publication Publication Date Title
CN111027504A (en) Face key point detection method, device, equipment and storage medium
US9619708B2 (en) Method of detecting a main subject in an image
CN114120163B (en) Video frame processing method, device and related equipment and storage medium
WO2021098796A1 (en) Image processing method and apparatus, device, and computer readable storage medium
CN112330601A (en) Parking detection method, device, equipment and medium based on fisheye camera
CN108446694B (en) A target detection method and device
WO2017054314A1 (en) Building height calculation method and apparatus, and storage medium
CN106682620A (en) Human face image acquisition method and device
CN111292334B (en) A panoramic image segmentation method, device and electronic equipment
CN113490947A (en) Detection model training method and device, detection model using method and storage medium
CN109299658B (en) Face detection method, face image rendering device and storage medium
CN111832561B (en) Character sequence recognition method, device, equipment and medium based on computer vision
CN111667504B (en) Face tracking method, device and equipment
CN112991159B (en) Face illumination quality assessment method, system, server and computer readable medium
CN113313626A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110443245B (en) A method, device and equipment for locating license plate area in unrestricted scene
CN114399729B (en) Monitoring object movement identification method, system, terminal and storage medium
CN112333468B (en) Image processing method, device, equipment and storage medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
CN109919128A (en) Acquisition methods, device and the electronic equipment of control instruction
CN112183431A (en) Real-time pedestrian number statistical method and device, camera and server
CN117542005A (en) Depth recognition model training method, image depth recognition method and related equipment
WO2022206679A1 (en) Image processing method and apparatus, computer device and storage medium
CN117770774A (en) Nailfold microcirculation image processing system, method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417