Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for recognizing three-dimensional coordinates of key points of a hand, so as to realize recognition of three-dimensional coordinates of key points of the hand based on color data.
To achieve the above purpose, the invention provides the following technical solutions:
a three-dimensional coordinate identification method for key points of a hand comprises the following steps:
acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;
and inputting the target hand frame color image into a three-dimensional coordinate recognition network model of the hand key point for processing to obtain the three-dimensional coordinate of the target hand key point.
Optionally, the method further includes:
acquiring training data of the three-dimensional coordinate recognition network model of the hand key points;
and training a preset neural network model by using the training data, and obtaining the three-dimensional coordinate recognition network model of the hand key point when the accuracy of the output result of the preset neural network model is greater than a threshold value.
Optionally, the acquiring training data of the three-dimensional coordinate recognition network model of the hand key point includes:
with the orientation of a camera and the distance between the camera and an artificial hand model in a CG model set, acquiring a color image of the artificial hand model by using the camera;
acquiring three-dimensional coordinates of the hand key points according to the artificial hand model;
fusing the color image of the artificial hand model with a real scene image to obtain a color image with the artificial hand model as foreground and a real background;
according to the internal parameters of the camera, cutting the hand area out of the color image with the artificial hand model as foreground and the real background to obtain a hand frame color image, and normalizing the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the acquiring training data of the three-dimensional coordinate recognition network model of the hand key point includes:
acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;
recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;
cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;
and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the three-dimensional coordinates of the target hand key point are position coordinates relative to the center of the target hand frame color image area, and the method further includes:
acquiring a depth value of the center of the target hand frame color image area;
and calculating the real three-dimensional coordinates of the key points of the target hand according to the depth value of the center of the color image area of the target hand frame.
A hand key point three-dimensional coordinate recognition device comprises:
a color image acquisition unit, configured to acquire a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;
and the three-dimensional coordinate identification unit is used for inputting the target hand frame color image into a hand key point three-dimensional coordinate identification network model for processing to obtain the three-dimensional coordinates of the target hand key point.
Optionally, the apparatus further comprises:
the training data acquisition unit is used for acquiring training data of the three-dimensional coordinate recognition network model of the hand key points;
and the recognition model training unit is used for training a preset neural network model by using the training data, and when the accuracy of the output result of the preset neural network model is greater than a threshold value, the three-dimensional coordinate recognition network model of the hand key point is obtained.
Optionally, the training data obtaining unit is specifically configured to:
with the orientation of a camera and the distance between the camera and an artificial hand model in a CG model set, acquiring a color image of the artificial hand model by using the camera;
acquiring three-dimensional coordinates of the hand key points according to the artificial hand model;
fusing the color image of the artificial hand model with a real scene image to obtain a color image with the artificial hand model as foreground and a real background;
according to the internal parameters of the camera, cutting the hand area out of the color image with the artificial hand model as foreground and the real background to obtain a hand frame color image, and normalizing the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the training data obtaining unit is specifically configured to:
acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;
recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;
cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;
and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the three-dimensional coordinates of the target hand key point are position coordinates relative to the center of the target hand frame color image area, and the apparatus further includes:
the three-dimensional coordinate conversion unit is used for acquiring the depth value of the center of the target hand frame color image area; and calculating the real three-dimensional coordinates of the key points of the target hand according to the depth value of the center of the color image area of the target hand frame.
Compared with the prior art, the invention has the following beneficial effects:
according to the hand key point three-dimensional coordinate recognition method disclosed by the invention, the three-dimensional coordinates of the hand key points in the hand frame color image are recognized by using a pre-trained hand key point three-dimensional coordinate recognition network model. The three-dimensional coordinates of the hand key points can therefore be recognized from the color image alone, without a depth image, which improves the user experience and broadens the range of application scenarios.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Compared with a depth camera, a color camera has low cost and low energy consumption and is widely used in current mobile devices. This embodiment discloses a method for recognizing three-dimensional coordinates of hand key points, which is applied to a mobile device equipped with a color camera and recognizes the three-dimensional coordinates of the hand key points based on a color image. Referring to Fig. 1, the method disclosed in this embodiment includes the following steps:
S101: acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;
After the color camera captures a color image containing a hand, the target hand frame color image is obtained through hand detection. The target hand frame color image is the hand frame color image on which three-dimensional coordinate recognition of the hand key points is to be performed.
S102: and inputting the target hand frame color image into a three-dimensional coordinate recognition network model of the hand key point for processing to obtain the three-dimensional coordinate of the target hand key point.
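Steps S101 and S102 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this embodiment: `detect_hand_box` and `model` are hypothetical callables standing in for a hand detector and for the trained recognition network, images are plain nested lists of 8-bit RGB tuples, and scaling the pixels to [0,1] is an assumed preprocessing choice.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]      # top, left, bottom, right (inclusive)
Keypoint = Tuple[float, float, float]


def recognize_keypoints(
    color_image: Image,
    detect_hand_box: Callable[[Image], Box],    # hypothetical hand detector
    model: Callable[[Image], List[Keypoint]],   # hypothetical trained network
) -> List[Keypoint]:
    """Run hand detection, cut out the hand frame, and query the network."""
    # S101: hand detection yields the hand frame, which is cut out of the image.
    top, left, bottom, right = detect_hand_box(color_image)
    hand_frame = [row[left:right + 1] for row in color_image[top:bottom + 1]]
    # Scale 8-bit channels to [0, 1] (assumed to match the training-time range).
    normalized = [[tuple(ch / 255.0 for ch in px) for px in row] for row in hand_frame]
    # S102: the network maps the hand frame to key point coordinates.
    return model(normalized)
```

Passing the detector and the network as arguments keeps the sketch independent of any particular detection or deep learning library.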
The hand key point three-dimensional coordinate recognition network model is obtained by pre-training. Referring to Fig. 2, the training method of the model is as follows:
S201: acquiring training data of the hand key point three-dimensional coordinate recognition network model;
This embodiment provides two methods for acquiring training data. In the first method, a color camera is used to acquire a color image of an artificial hand model in a CG model, and the training data is obtained through image fusion, hand area cutting, and normalization of the three-dimensional coordinates of the hand key points. In the second method, a depth image and a color image containing a hand are acquired by a depth camera, the three-dimensional coordinates of the hand key points in the depth image are recognized by a hand key point coordinate recognition model based on depth data, and the training data is obtained through hand area cutting and normalization of the three-dimensional coordinates of the hand key points.
The two training data acquisition methods are described in detail below:
Method one
Referring to Fig. 3, the first method for obtaining training data of the hand key point three-dimensional coordinate recognition network model includes the following steps:
S301: with the orientation of a camera and the distance between the camera and an artificial hand model in a CG model set, acquiring a color image of the artificial hand model by using the camera;
The relative positions of the camera and the artificial hand model in the CG model are shown in Fig. 4. The internal parameters of the camera have been set and include a horizontal focal length, a vertical focal length, and the horizontal and vertical coordinates of the image center.
S302: acquiring three-dimensional coordinates of key points of the hand according to the artificial hand model;
As shown in Fig. 5, the hand key points correspond to the acquired color image of the artificial hand model. The three-dimensional coordinates of the hand key points include a depth component and may be world coordinates or image uvd coordinates.
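The relation between the two coordinate forms can be illustrated with the standard pinhole camera model. The sketch below assumes that the key points are expressed in the camera coordinate frame and that lens distortion is ignored; the names `Intrinsics`, `camera_to_uvd` and `uvd_to_camera` are illustrative and do not come from this embodiment.

```python
from typing import NamedTuple, Tuple

Point3 = Tuple[float, float, float]


class Intrinsics(NamedTuple):
    fx: float  # horizontal focal length, in pixels
    fy: float  # vertical focal length, in pixels
    cx: float  # horizontal coordinate of the image center
    cy: float  # vertical coordinate of the image center


def camera_to_uvd(point: Point3, k: Intrinsics) -> Point3:
    """Project a camera-frame point (x, y, z) to image coordinates (u, v, d)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy, z)


def uvd_to_camera(uvd: Point3, k: Intrinsics) -> Point3:
    """Back-project image coordinates (u, v, d) to a camera-frame point."""
    u, v, d = uvd
    return ((u - k.cx) * d / k.fx, (v - k.cy) * d / k.fy, d)
```

Because the depth component d is carried through unchanged, the two functions are inverses of each other, so labels stored in either form can be converted to the other when the internal parameters are known.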
S303: fusing the color image of the artificial hand model with the real scene image to obtain a color image with a foreground artificial hand model and a real background;
Specifically, the color image of the artificial hand model is used as a mask for the real scene image: the color image of the artificial hand model is placed at the center of the real scene image or at another set position, and the data of the real scene image inside the contour of the artificial hand is replaced with the color image data of the artificial hand model, so that the color image of the artificial hand model is fused with the real scene image.
A color image with the artificial hand model as foreground and a real background is shown in Fig. 6.
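The mask-based fusion of step S303 can be sketched as follows. This is a minimal illustration assuming that the rendered hand image comes with a boolean mask marking the pixels inside the hand contour and that the hand image fits entirely inside the scene image; `fuse_foreground` and `centered_offset` are hypothetical helper names.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def centered_offset(scene: Image, hand: Image) -> Tuple[int, int]:
    """Top-left offset that places the hand image at the center of the scene."""
    return (len(scene) - len(hand)) // 2, (len(scene[0]) - len(hand[0])) // 2


def fuse_foreground(hand: Image, mask: Mask, scene: Image, top: int, left: int) -> Image:
    """Replace scene pixels inside the hand contour with hand pixels."""
    fused = [list(row) for row in scene]  # copy so the scene stays untouched
    for r, (pixel_row, mask_row) in enumerate(zip(hand, mask)):
        for c, (pixel, inside) in enumerate(zip(pixel_row, mask_row)):
            if inside:  # only pixels inside the hand contour are overwritten
                fused[top + r][left + c] = pixel
    return fused
```

Pixels outside the contour keep the real scene data, so the same rendered hand can be pasted onto many different backgrounds.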
S304: according to the internal parameters of the camera, cutting the hand area out of the color image with the artificial hand model as foreground and the real background to obtain a hand frame color image, and normalizing the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the normalized three-dimensional coordinates of the hand key points.
The hand frame color image obtained by cutting the hand area is the image of the smallest bounding box that encloses the hand area.
Normalizing the hand frame color image and the three-dimensional coordinates of the hand key points means the following: the pixels of the hand frame color image are normalized to the interval [-1,1] or [0,1]; the coordinates of the hand key points are normalized with the centroid of the hand frame of the artificial hand, computed from the coordinates of that hand frame, as the normalization reference point; and the normalization interval of the key point coordinates, [-1,1] or [0,1], is consistent with that of the hand frame color image.
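The cutting and normalization of step S304 can be sketched as follows for the image plane. Several details are assumptions made for illustration only: the hand area is given as a boolean mask, pixel channels are 8-bit, and the key point coordinates are scaled by half the width and height of the hand frame so that points inside the frame fall in [-1,1]. The text above fixes only the reference point and the target interval, not the scale factor.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # top, left, bottom, right (inclusive)


def min_bounding_box(mask: List[List[bool]]) -> Box:
    """Smallest box enclosing every True (hand) pixel of the mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no hand pixels")
    return rows[0], cols[0], rows[-1], cols[-1]


def cut(image: list, box: Box) -> list:
    """Cut the hand frame out of the full image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def normalize_pixels(image: list, signed: bool = True) -> list:
    """Map 8-bit channels to [-1, 1] (signed) or [0, 1] (unsigned)."""
    def scale(value: float) -> float:
        return value / 127.5 - 1.0 if signed else value / 255.0
    return [[tuple(scale(ch) for ch in px) for px in row] for row in image]


def normalize_keypoints_uv(keypoints_uv: list, box: Box) -> list:
    """Express (u, v) relative to the centroid of the hand frame."""
    top, left, bottom, right = box
    cu, cv = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = max(right - left, 1) / 2.0  # assumed scale: half the frame width
    half_h = max(bottom - top, 1) / 2.0  # assumed scale: half the frame height
    return [((u - cu) / half_w, (v - cv) / half_h) for u, v in keypoints_uv]
```

With this choice a key point on the left edge of the hand frame maps to u = -1 and one on the right edge to u = 1, which matches the signed pixel range.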
Method two
Referring to Fig. 7, the second method for obtaining training data of the hand key point three-dimensional coordinate recognition network model includes the following steps:
S401: acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;
Registration here means that, with frame-synchronized transmission guaranteed, the color image and the depth image have the same size and their pixels correspond one to one. One-to-one correspondence does not mean that the pixel values in the color image and the depth image are equal; it means that corresponding pixels depict the same scene point. For example, if the center of the color image holds the RGB data of the palm of a hand, the center of the depth image holds the depth value of that palm.
S402: recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;
The hand key point coordinate recognition model based on depth data may be any existing model; its principle is known in the art and is not described here.
S403: cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;
The hand frame color image obtained by cutting the hand area is the image of the smallest bounding box that encloses the hand area.
S404: and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Specifically, the depth value at the center of the hand frame color image is used as the reference point for normalizing the coordinates of the hand key points within the cut hand area to [-1,1] or [0,1]. The hand frame color image is normalized to the corresponding range, that is, its pixels are normalized to the interval [-1,1] or [0,1].
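The depth-referenced normalization of step S404 can be sketched as follows. The sketch assumes uvd key points, a registered depth image stored as nested lists, and a hypothetical parameter `depth_half_range` (here 150, in the unit of the depth image) that sets how far from the center depth a key point may lie before it is clamped. Neither the scale factor nor the clamping is specified in this embodiment.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]          # top, left, bottom, right (inclusive)
Keypoint = Tuple[float, float, float]    # u, v, d


def center_depth(depth_image: List[List[float]], box: Box) -> float:
    """Depth value at the center pixel of the hand frame."""
    top, left, bottom, right = box
    return depth_image[(top + bottom) // 2][(left + right) // 2]


def _clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def normalize_keypoints_uvd(
    keypoints_uvd: List[Keypoint],
    box: Box,
    depth_image: List[List[float]],
    depth_half_range: float = 150.0,  # hypothetical scale, not from the source
) -> List[Keypoint]:
    """Normalize (u, v) by the hand frame and d by the center depth."""
    top, left, bottom, right = box
    cu, cv = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = max(right - left, 1) / 2.0
    half_h = max(bottom - top, 1) / 2.0
    d_ref = center_depth(depth_image, box)  # reference point for the depth axis
    return [
        ((u - cu) / half_w, (v - cv) / half_h, _clamp((d - d_ref) / depth_half_range))
        for u, v, d in keypoints_uvd
    ]
```

In practice the center pixel of a depth image can be invalid (for example a hole reported as zero), so a real pipeline would probably need a fallback such as a median over a small window.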
In both training data acquisition methods, data augmentation such as translation, rotation, scaling and mirroring can be applied after the normalized data is obtained. In addition, the artificial hand model can be driven flexibly through the degrees of freedom of its skeleton joints to pose different gestures, which is not described in detail here.
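As one example of such augmentation, horizontal mirroring must be applied to the image and to the key point labels together. The sketch below assumes key point coordinates already normalized to [-1,1] about the center of the hand frame, so that mirroring negates the horizontal coordinate; `mirror_sample` is an illustrative name.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # normalized u, v, d


def mirror_sample(image: list, keypoints: List[Keypoint]) -> Tuple[list, List[Keypoint]]:
    """Flip the hand frame left to right and mirror the labels to match."""
    mirrored_image = [list(reversed(row)) for row in image]
    # With u measured from the frame center, a horizontal flip maps u to -u.
    mirrored_keypoints = [(-u, v, d) for u, v, d in keypoints]
    return mirrored_image, mirrored_keypoints
```

If the labels were normalized to [0,1] instead, the horizontal coordinate would become 1 - u rather than -u.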
S202: and training a preset neural network model by using the training data, and obtaining a three-dimensional coordinate recognition network model of the hand key point when the accuracy of the output result of the preset neural network model is greater than a threshold value.
The input data of the hand key point three-dimensional coordinate recognition network model is a hand frame color image, and the output data is the three-dimensional coordinates of the hand key points in that hand frame color image.
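The stopping rule of step S202 can be sketched independently of any particular network. In the sketch below, `train_one_epoch` and `evaluate_accuracy` are hypothetical callables supplied by the training framework, and accuracy is assumed to be the fraction of key points whose predicted position lies within a distance tolerance of the label. This embodiment does not define the accuracy metric, so that definition is only one possibility.

```python
import math
from typing import Callable, List, Tuple

Keypoint = Tuple[float, float, float]


def keypoint_accuracy(predicted: List[Keypoint], target: List[Keypoint], tolerance: float) -> float:
    """Fraction of key points predicted within `tolerance` of the label."""
    hits = sum(1 for p, t in zip(predicted, target) if math.dist(p, t) <= tolerance)
    return hits / len(target)


def train_until_accurate(
    train_one_epoch: Callable[[], None],     # hypothetical: one pass over the data
    evaluate_accuracy: Callable[[], float],  # hypothetical: accuracy on held-out data
    threshold: float,
    max_epochs: int = 100,
) -> Tuple[int, float]:
    """Train until the accuracy exceeds the threshold, as in step S202."""
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = evaluate_accuracy()
        if accuracy > threshold:
            return epoch, accuracy  # the model at this point is the recognition model
    raise RuntimeError("accuracy threshold not reached within max_epochs")
```

The `max_epochs` limit is an added safeguard so that the loop terminates even if the threshold is never reached.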
The three-dimensional coordinates of the hand key points output by the hand key point three-dimensional coordinate recognition network model are position coordinates relative to the center of the target hand frame color image area. The depth value of the center of the target hand frame color image area is acquired by means such as monocular SLAM, and the real three-dimensional coordinates of the target hand key points are calculated according to that depth value.
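One way to carry out this calculation is sketched below. It is an assumption-laden illustration rather than the formula of this embodiment: the network output is taken to be (u, v, d) in [-1,1] relative to the center of the hand frame, scaled by half the frame size and by a hypothetical `depth_half_range`, and the result is back-projected with the pinhole model using the internal parameters of the color camera.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]        # top, left, bottom, right (inclusive)
Keypoint = Tuple[float, float, float]


class Intrinsics(NamedTuple):
    fx: float  # horizontal focal length, in pixels
    fy: float  # vertical focal length, in pixels
    cx: float  # horizontal coordinate of the image center
    cy: float  # vertical coordinate of the image center


def recover_real_coordinates(
    keypoints_norm: List[Keypoint],
    box: Box,
    center_depth: float,              # e.g. obtained from SLAM
    k: Intrinsics,
    depth_half_range: float = 150.0,  # hypothetical scale used during training
) -> List[Keypoint]:
    """Undo the normalization, then back-project to camera coordinates."""
    top, left, bottom, right = box
    cu, cv = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = max(right - left, 1) / 2.0
    half_h = max(bottom - top, 1) / 2.0
    real = []
    for u_n, v_n, d_n in keypoints_norm:
        u = cu + u_n * half_w                      # pixel column in the full image
        v = cv + v_n * half_h                      # pixel row in the full image
        z = center_depth + d_n * depth_half_range  # absolute depth
        real.append(((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z))
    return real
```

The scale factors used here must match those used when the training labels were normalized; otherwise the recovered coordinates will be distorted.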
Therefore, according to the hand key point three-dimensional coordinate recognition method disclosed in this embodiment, the three-dimensional coordinates of the hand key points in the hand frame color image are recognized by using a pre-trained hand key point three-dimensional coordinate recognition network model. The three-dimensional coordinates of the hand key points can be recognized from the color image alone, without a depth image, which improves the user experience and broadens the range of application scenarios.
Based on the hand key point three-dimensional coordinate recognition method disclosed in the above embodiment, this embodiment correspondingly discloses a hand key point three-dimensional coordinate recognition device. Referring to Fig. 8, the device includes:
a color image acquisition unit 801, configured to acquire a color image of a target hand frame, where the color image of the target hand frame is obtained after hand detection;
and a three-dimensional coordinate recognition unit 802, configured to input the target hand frame color image into a hand key point three-dimensional coordinate recognition network model for processing, so as to obtain a three-dimensional coordinate of the target hand key point.
Optionally, the apparatus further comprises:
the training data acquisition unit is used for acquiring training data of the three-dimensional coordinate recognition network model of the hand key points;
and the recognition model training unit is used for training a preset neural network model by using the training data, and when the accuracy of the output result of the preset neural network model is greater than a threshold value, the three-dimensional coordinate recognition network model of the hand key point is obtained.
Optionally, the training data obtaining unit is specifically configured to:
with the orientation of a camera and the distance between the camera and an artificial hand model in a CG model set, acquiring a color image of the artificial hand model by using the camera;
acquiring three-dimensional coordinates of the hand key points according to the artificial hand model;
fusing the color image of the artificial hand model with a real scene image to obtain a color image with the artificial hand model as foreground and a real background;
according to the internal parameters of the camera, cutting the hand area out of the color image with the artificial hand model as foreground and the real background to obtain a hand frame color image, and normalizing the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the training data obtaining unit is specifically configured to:
acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;
recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;
cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;
and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.
Optionally, the three-dimensional coordinates of the target hand key point are position coordinates relative to the center of the target hand frame color image area, and the apparatus further includes:
the three-dimensional coordinate conversion unit is used for acquiring the depth value of the center of the target hand frame color image area; and calculating the real three-dimensional coordinates of the key points of the target hand according to the depth value of the center of the color image area of the target hand frame.
According to the hand key point three-dimensional coordinate recognition device disclosed in this embodiment, the three-dimensional coordinates of the hand key points in the hand frame color image are recognized by using a pre-trained hand key point three-dimensional coordinate recognition network model. The three-dimensional coordinates of the hand key points can be recognized from the color image alone, without a depth image, which improves the user experience and broadens the range of application scenarios.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, it is described briefly; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.