CN111222401A - Method and device for identifying three-dimensional coordinates of hand key points

Info

Publication number: CN111222401A
Application number: CN201911112541.2A
Authority: CN
Inventors: 李江; 李骊
Original assignee: Beijing HJIMI Technology Co Ltd
Current assignee: Beijing HJIMI Technology Co Ltd
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2020-06-02
Anticipated expiration: 2039-11-14
Also published as: CN111222401B

Abstract

The present invention provides a method and a device for identifying three-dimensional coordinates of hand key points. A target hand frame color image is acquired, the target hand frame color image being a color image obtained after hand detection; the target hand frame color image is input into a hand key point three-dimensional coordinate recognition network model for processing, and the three-dimensional coordinates of the target hand key points are obtained. The invention thereby identifies three-dimensional coordinates of hand key points based on color data.

Description

Method and device for identifying three-dimensional coordinates of hand key points

Technical Field

The invention relates to the technical field of image processing, in particular to a method and a device for identifying three-dimensional coordinates of key points of a hand.

Background

3D gesture key point estimation is a key technology of 3D gesture control. A common current scheme for estimating hand key point coordinates based on a depth image is as follows: a depth camera is used, directly or indirectly, to obtain an infrared image and a color image; the two-dimensional coordinates of the hand key points in the image are identified with a color image algorithm in RGB space, and the depth values at the corresponding positions in the registered depth image are then taken as the coordinates in the depth direction. Alternatively, the three-dimensional coordinates of the hand key points are identified directly in the depth image with a monocular depth image algorithm.

However, hand key point estimation based on a monocular depth camera relies on the quality of the depth map data. When the depth image is noisy, the depth map is not accurate enough, the edge contour is not smooth enough, or the background depth values interfere strongly, the depth data of the hand foreground is not accurate enough, which degrades the accuracy of the hand key point coordinate estimation. In addition, among existing mobile terminal devices such as mobile phones and tablet computers, few products integrate a depth camera, and most of those that do suffer from overheating and high power consumption, so the user experience of hand key point coordinate estimation based on a depth camera is poor.

Disclosure of Invention

In view of this, the present invention provides a method and an apparatus for recognizing three-dimensional coordinates of key points of a hand, so as to realize recognition of three-dimensional coordinates of key points of the hand based on color data.

In order to achieve the above purpose, the invention provides the following specific technical scheme:

a three-dimensional coordinate identification method for key points of a hand comprises the following steps:

acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;

and inputting the target hand frame color image into a three-dimensional coordinate recognition network model of the hand key point for processing to obtain the three-dimensional coordinate of the target hand key point.

Optionally, the method further includes:

acquiring training data of the three-dimensional coordinate recognition network model of the hand key points;

and training a preset neural network model by using the training data, and obtaining the three-dimensional coordinate recognition network model of the hand key point when the accuracy of the output result of the preset neural network model is greater than a threshold value.

Optionally, the acquiring training data of the three-dimensional coordinate recognition network model of the hand key point includes:

under the condition that the direction of a camera and the distance between the camera and a prosthetic hand model in a CG model are set, acquiring a color image of the prosthetic hand model by using the camera;

acquiring three-dimensional coordinates of key points of the hand according to the prosthetic hand model;

fusing the color image of the prosthetic hand model with the real scene image to obtain a color image with a foreground prosthetic hand model and a real background;

according to internal parameters of the camera, cutting a hand area in the color image with the foreground prosthetic hand model and the real background to obtain a hand frame color image, and performing normalization processing on the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the three-dimensional coordinates of the hand key points.

Optionally, the acquiring training data of the three-dimensional coordinate recognition network model of the hand key point includes:

acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;

recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;

cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;

and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.

Optionally, the three-dimensional coordinates of the target hand key point are position coordinates relative to the center of the target hand frame color image area, and the method further includes:

acquiring a depth value of the center of the target hand frame color image area;

and calculating the real three-dimensional coordinates of the key points of the target hand according to the depth value of the center of the color image area of the target hand frame.

A hand key point three-dimensional coordinate recognition device comprises:

a color image acquisition unit, used for acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;

and the three-dimensional coordinate identification unit is used for inputting the target hand frame color image into a hand key point three-dimensional coordinate identification network model for processing to obtain the three-dimensional coordinates of the target hand key point.

Optionally, the apparatus further comprises:

the training data acquisition unit is used for acquiring training data of the three-dimensional coordinate recognition network model of the hand key points;

and the recognition model training unit is used for training a preset neural network model by using the training data, and when the accuracy of the output result of the preset neural network model is greater than a threshold value, the three-dimensional coordinate recognition network model of the hand key point is obtained.

Optionally, the training data obtaining unit is specifically configured to:

Optionally, the three-dimensional coordinates of the target hand key point are position coordinates relative to the center of the target hand frame color image area, and the apparatus further includes:

the three-dimensional coordinate conversion unit is used for acquiring the depth value of the center of the target hand frame color image area; and calculating the real three-dimensional coordinates of the key points of the target hand according to the depth value of the center of the color image area of the target hand frame.

Compared with the prior art, the invention has the following beneficial effects:

According to the method for identifying three-dimensional coordinates of hand key points disclosed herein, the three-dimensional coordinates of the hand key points in a hand frame color image are identified by a hand key point three-dimensional coordinate recognition network model obtained through pre-training. The three-dimensional coordinates of the hand key points can thus be identified from a color image alone, without a depth image, which improves the user experience and suits a wide range of application scenarios.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a schematic flowchart of a method for identifying three-dimensional coordinates of hand key points according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method for training a hand key point three-dimensional coordinate recognition network model according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a method for obtaining training data of the hand key point three-dimensional coordinate recognition network model according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating the relative positions of a camera and a prosthetic hand model in a CG model according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a captured color image of a prosthetic hand model according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an image obtained by fusing a color image of a prosthetic hand model with a real scene image according to an embodiment of the present invention;

FIG. 7 is a schematic flowchart of another method for obtaining training data of the hand key point three-dimensional coordinate recognition network model according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a device for identifying three-dimensional coordinates of hand key points according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Compared with a depth camera, a color camera is low in cost and energy consumption and is widely used in current mobile devices. This embodiment discloses a method for identifying three-dimensional coordinates of hand key points, which is applied to a mobile device equipped with a color camera and identifies the three-dimensional coordinates of hand key points based on a color image. Referring to FIG. 1, the method disclosed in this embodiment includes the following steps:

S101: acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;

After the color camera captures a color image containing a hand, the target hand frame color image is obtained through hand detection. The target hand frame color image is the hand frame color image on which three-dimensional coordinate identification of hand key points is to be performed.

S102: and inputting the target hand frame color image into a three-dimensional coordinate recognition network model of the hand key point for processing to obtain the three-dimensional coordinate of the target hand key point.

The hand key point three-dimensional coordinate recognition network model is obtained by pre-training. Referring to FIG. 2, the training method of the hand key point three-dimensional coordinate recognition network model is as follows:

S201: acquiring training data of the hand key point three-dimensional coordinate recognition network model;

This embodiment provides two methods for acquiring training data. In the first method, a camera captures a color image of a prosthetic hand model in a CG model, and the training data is obtained through image fusion, hand region cropping, and normalization of the three-dimensional coordinates of the hand key points. In the second method, a depth camera captures a depth image and a color image containing a hand, a hand key point coordinate recognition model based on depth data identifies the three-dimensional coordinates of the hand key points in the depth image, and the training data is obtained through hand region cropping and normalization of the three-dimensional coordinates of the hand key points.

The two training data acquisition methods specifically comprise the following steps:

Method one

Referring to FIG. 3, the method for obtaining training data of the hand key point three-dimensional coordinate recognition network model includes the following steps:

S301: under the condition that the direction of a camera and the distance between the camera and a prosthetic hand model in a CG model are set, acquiring a color image of the prosthetic hand model by using the camera;

The relative positions of the camera and the prosthetic hand model in the CG model are shown in FIG. 4. The intrinsic parameters of the camera have been set; they include the horizontal focal length, the vertical focal length, and the horizontal and vertical coordinates of the image center.

S302: acquiring three-dimensional coordinates of key points of the hand according to the prosthetic hand model;

As shown in FIG. 5, the captured color image of the prosthetic hand model corresponds to the hand key points. The three-dimensional coordinates of the hand key points are three-dimensional coordinates that include a depth direction, and may be world coordinates or image uvd coordinates.
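The four intrinsic parameters listed for the CG camera are those of the standard pinhole camera model, which also relates camera-space coordinates to image uvd coordinates. The following is a minimal sketch, assuming that standard pinhole projection (the text does not state a formula); the names `Intrinsics`, `fx`, `fy`, `cx`, `cy`, `camera_to_uvd`, and `uvd_to_camera` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics; field names are illustrative, not from the patent."""
    fx: float  # horizontal focal length, in pixels
    fy: float  # vertical focal length, in pixels
    cx: float  # horizontal coordinate of the image center
    cy: float  # vertical coordinate of the image center


def camera_to_uvd(point, k):
    """Project a camera-space point (x, y, z) to image (u, v, d) coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy, z)


def uvd_to_camera(uvd, k):
    """Invert camera_to_uvd: recover (x, y, z) from (u, v, d)."""
    u, v, d = uvd
    return ((u - k.cx) * d / k.fx, (v - k.cy) * d / k.fy, d)
```

Under these assumptions, a key point exported from the prosthetic hand model in camera space can be stored in either form, since the two functions are inverses of each other for points in front of the camera.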

S303: fusing the color image of the prosthetic hand model with the real scene image to obtain a color image with a foreground prosthetic hand model and a real background;

Specifically, the color image of the prosthetic hand model is used as a mask over the real image: the color image of the prosthetic hand model is placed at the center of the real scene image or at another set position, and the data of the original real image in the area inside the prosthetic hand contour is replaced with the color image data of the prosthetic hand model, so that the color image of the prosthetic hand model is fused with the real scene image.

A color image with a foreground prosthetic hand model and a real background is shown in FIG. 6.
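The mask-based fusion of step S303 can be sketched as follows. This is an illustration only, assuming that a boolean mask marking the pixels inside the hand contour is available from the rendering; plain Python lists stand in for image arrays, and the function name `fuse_hand_with_scene` and its parameters are hypothetical.

```python
def fuse_hand_with_scene(hand_rgb, hand_mask, scene_rgb, top=0, left=0):
    """Paste the masked hand pixels onto a copy of the real scene image.

    hand_rgb:  H x W list of rows of (r, g, b) tuples rendered from the hand model
    hand_mask: H x W list of rows of booleans, True inside the hand contour
    scene_rgb: list-of-rows image of a real scene, at least as large as the hand image
    top, left: where the hand image's top-left corner lands in the scene
    """
    height = len(hand_rgb)
    width = len(hand_rgb[0]) if height else 0
    if top < 0 or left < 0 or top + height > len(scene_rgb) or left + width > len(scene_rgb[0]):
        raise ValueError("hand image does not fit inside the scene at this position")

    fused = [list(row) for row in scene_rgb]  # copy so the scene stays untouched
    for r, (rgb_row, mask_row) in enumerate(zip(hand_rgb, hand_mask)):
        for c, (pixel, inside) in enumerate(zip(rgb_row, mask_row)):
            if inside:  # only pixels inside the hand contour replace the background
                fused[top + r][left + c] = pixel
    return fused
```

Pixels outside the contour keep the real background, so the result has a synthetic foreground hand over a real scene.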

S304: according to internal parameters of the camera, cutting a hand area in the color image with the foreground prosthetic hand model and the real background to obtain a hand frame color image, and performing normalization processing on the hand frame color image and the three-dimensional coordinates of the hand key points to obtain training data of the hand key point three-dimensional coordinate recognition network model, wherein the training data comprises the normalized hand frame color image and the three-dimensional coordinates of the hand key points.

The hand frame color image obtained by cutting the hand area is the image of the smallest bounding box that encloses the hand area.
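A smallest bounding box of this kind can be computed as sketched below, assuming a boolean hand mask is available (for example, the mask used for fusion). How the camera intrinsics enter the cropping is not detailed in the text and is not modeled here; the helper names `min_bounding_box` and `crop` are hypothetical.

```python
def min_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the True pixels in mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, inside in enumerate(row) if inside]
    if not rows:
        raise ValueError("mask contains no hand pixels")
    return min(rows), min(cols), max(rows), max(cols)


def crop(image, box):
    """Cut the inclusive box out of a list-of-rows image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

For a 4 x 4 image whose hand pixels occupy rows 1 to 2 and columns 1 to 2, `min_bounding_box` yields `(1, 1, 2, 2)` and `crop` returns the corresponding 2 x 2 sub-image.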

Normalizing the hand frame color image and the three-dimensional coordinates of the hand key points means the following: the pixels of the hand frame color image are normalized to the interval [-1, 1] or [0, 1]; the coordinates of the hand key points are normalized using, as the normalization reference point, the centroid coordinates calculated from the coordinates of the prosthetic hand frame, with a normalization interval of [-1, 1] or [0, 1] that is consistent with the normalization range of the hand frame color image.
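The normalization described above can be sketched as follows. The text fixes the target intervals and the reference point but not the scale factor, so the per-axis `half_extent` used here is an assumption, as are the 8-bit pixel range and the function names.

```python
def normalize_pixels(image, signed=True):
    """Map 8-bit channel values to [-1, 1] (signed) or [0, 1] (unsigned)."""
    if signed:
        return [[tuple(ch / 127.5 - 1.0 for ch in px) for px in row] for row in image]
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in image]


def normalize_keypoints(keypoints, reference, half_extent):
    """Express keypoints relative to `reference`, scaled per axis.

    reference:   reference point, e.g. the centroid derived from the hand frame
    half_extent: assumed per-axis scale; offsets within it land in [-1, 1]
    """
    if any(h <= 0 for h in half_extent):
        raise ValueError("half_extent values must be positive")
    return [
        tuple((p - r) / h for p, r, h in zip(kp, reference, half_extent))
        for kp in keypoints
    ]
```

With the signed variant, a pixel value of 0 maps to -1 and 255 maps to 1; a key point that sits exactly on the reference point maps to the origin.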

Method two

Referring to FIG. 7, the method for obtaining training data of the hand key point three-dimensional coordinate recognition network model includes the following steps:

S401: acquiring a depth image and a color image which are acquired by a depth camera and are synchronized and registered with each other;

Registration here means that, with frame-synchronized transmission guaranteed, the color image and the depth image are the same size and correspond pixel by pixel. This one-to-one correspondence does not mean that the pixel values in the color image and the depth image are equal; rather, if the center of the color image holds the RGB data of the palm of a hand, the center of the corresponding depth image holds the depth value of that palm.

S402: recognizing three-dimensional coordinates of the hand key points of the depth image by using a hand key point coordinate recognition model based on depth data;

The hand key point coordinate recognition model based on depth data may be any existing model; its principle is known and is not described here.

S403: cutting a hand area in the color image to obtain a hand frame color image corresponding to the depth image hand area;

S404: and normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain training data of the hand key point three-dimensional coordinate recognition network model comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.

Specifically, the depth value at the center of the hand frame color image is used as the reference point, and the coordinates of the hand key points in the cropped hand frame are normalized to [-1, 1] or [0, 1]. The hand frame color image is normalized to the corresponding range as well, that is, its pixels are normalized to the interval [-1, 1] or [0, 1].
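Step S404 can be illustrated with the sketch below, which reads the reference depth from the registered depth image at the center of the hand frame and normalizes uvd key points against it. The scale factors (half the box size for u and v, an assumed `depth_range` for d) and the function name are assumptions; the text only specifies the reference point and the target interval.

```python
def normalize_with_center_depth(keypoints_uvd, box, depth_image, depth_range):
    """Normalize (u, v, d) keypoints relative to the hand frame center.

    box:         (top, left, bottom, right), inclusive, in pixel coordinates
    depth_image: registered depth map, indexed as depth_image[row][col]
    depth_range: assumed depth span that should map onto [-1, 1]
    Returns the normalized keypoints and the reference depth that was used.
    """
    top, left, bottom, right = box
    cu = (left + right) / 2.0
    cv = (top + bottom) / 2.0
    center_depth = depth_image[int(cv)][int(cu)]
    if center_depth <= 0:
        raise ValueError("no valid depth at the hand frame center")
    if depth_range <= 0:
        raise ValueError("depth_range must be positive")

    half_w = max((right - left) / 2.0, 1.0)  # avoid division by zero for 1-pixel boxes
    half_h = max((bottom - top) / 2.0, 1.0)
    normalized = [
        ((u - cu) / half_w, (v - cv) / half_h, (d - center_depth) / depth_range)
        for u, v, d in keypoints_uvd
    ]
    return normalized, center_depth
```

Because the color and depth images are registered, the same box center indexes both images, which is what makes the depth lookup valid.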

After the normalized data is obtained, both training data acquisition methods may apply data augmentation such as translation, rotation, scaling, and mirroring. In addition, the prosthetic hand model can be driven flexibly through the degrees of freedom of its skeletal joints to perform different gestures, which is not described in detail here.
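As one example of such augmentation, mirroring must be applied to the image and to the key point labels together. The sketch below assumes key points normalized to [-1, 1] relative to the hand frame center, so that a horizontal flip negates the horizontal coordinate; the function name `mirror_sample` is hypothetical.

```python
def mirror_sample(image, keypoints):
    """Flip a hand frame image left-to-right and mirror its keypoint labels.

    image:     list of rows of pixels
    keypoints: list of (u, v, d) tuples, center-relative and normalized to [-1, 1]
    """
    flipped = [list(reversed(row)) for row in image]
    mirrored = [(-u, v, d) for u, v, d in keypoints]
    return flipped, mirrored
```

If the [0, 1] interval is used instead, the mirrored horizontal coordinate would be `1 - u` rather than `-u`.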

S202: and training a preset neural network model by using the training data, and obtaining a three-dimensional coordinate recognition network model of the hand key point when the accuracy of the output result of the preset neural network model is greater than a threshold value.
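The stopping rule of step S202 can be sketched as a loop that trains until a measured accuracy exceeds the threshold. The text does not define how accuracy is computed for coordinate outputs; the fraction of key points within an error tolerance is used here purely as an assumption, and `train_one_epoch` and `evaluate` are hypothetical callbacks standing in for whatever training framework is used.

```python
import math


def keypoint_accuracy(predicted, target, tolerance):
    """Fraction of keypoints whose Euclidean error is at most `tolerance`."""
    if len(predicted) != len(target) or not target:
        raise ValueError("predicted and target must be non-empty and equally long")
    hits = sum(1 for p, t in zip(predicted, target) if math.dist(p, t) <= tolerance)
    return hits / len(target)


def train_until_accurate(train_one_epoch, evaluate, threshold, max_epochs=100):
    """Run training epochs until evaluate() returns an accuracy above threshold.

    Returns (epochs_run, accuracy). Raises if the threshold is never exceeded.
    """
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = evaluate()
        if accuracy > threshold:
            return epoch, accuracy
    raise RuntimeError("accuracy threshold not reached within max_epochs")
```

The `max_epochs` cap is an addition for safety so that the loop terminates even if the threshold is never reached.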

The input data of the hand key point three-dimensional coordinate recognition network model is a hand frame color image, and its output data is the three-dimensional coordinates of the hand key points in that hand frame color image.

The three-dimensional coordinates of the hand key points output by the hand key point three-dimensional coordinate recognition network model are position coordinates relative to the center of the target hand frame color image area. The depth value of the center of the target hand frame color image area is acquired by means such as monocular SLAM, and the real three-dimensional coordinates of the target hand key points are then calculated according to that depth value.
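The text does not give the formula for this conversion. The sketch below shows one possible form, assuming that the network outputs are normalized to [-1, 1] relative to the hand frame center (half the box size for u and v, an assumed `depth_range` for d) and that the camera follows the pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`. All names are hypothetical.

```python
def relative_to_real(rel_keypoints, box, center_depth, depth_range, fx, fy, cx, cy):
    """Convert center-relative normalized keypoints to camera-space (x, y, z).

    rel_keypoints: list of (nu, nv, nd) in [-1, 1], relative to the box center
    box:           (top, left, bottom, right), inclusive, in pixel coordinates
    center_depth:  depth at the box center, obtained externally (e.g. via SLAM)
    depth_range:   assumed depth span used when the labels were normalized
    """
    top, left, bottom, right = box
    cu = (left + right) / 2.0
    cv = (top + bottom) / 2.0
    half_w = (right - left) / 2.0
    half_h = (bottom - top) / 2.0

    real = []
    for nu, nv, nd in rel_keypoints:
        u = cu + nu * half_w              # back to pixel coordinates
        v = cv + nv * half_h
        d = center_depth + nd * depth_range  # back to absolute depth
        # pinhole back-projection from (u, v, d) to camera space
        real.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return real
```

Whatever normalization is chosen at training time, the conversion at inference time has to invert that same normalization; this sketch inverts the one assumed in its own docstring.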

Therefore, according to the method for identifying three-dimensional coordinates of hand key points disclosed in this embodiment, the three-dimensional coordinates of the hand key points in a hand frame color image are identified by the hand key point three-dimensional coordinate recognition network model obtained through pre-training. The three-dimensional coordinates of the hand key points can thus be identified from a color image alone, without a depth image, which improves the user experience and suits a wide range of application scenarios.

Based on the method for identifying three-dimensional coordinates of hand key points disclosed in the above embodiment, this embodiment correspondingly discloses a device for identifying three-dimensional coordinates of hand key points. Referring to FIG. 8, the device includes:

a color image acquisition unit 801, configured to acquire a color image of a target hand frame, where the color image of the target hand frame is obtained after hand detection;

and a three-dimensional coordinate recognition unit 802, configured to input the target hand frame color image into a hand key point three-dimensional coordinate recognition network model for processing, so as to obtain a three-dimensional coordinate of the target hand key point.

Optionally, the apparatus further comprises:

Optionally, the training data obtaining unit is specifically configured to:

According to the hand key point three-dimensional coordinate recognition device disclosed in this embodiment, the three-dimensional coordinates of the hand key points in a hand frame color image are identified by the hand key point three-dimensional coordinate recognition network model obtained through pre-training. The three-dimensional coordinates of the hand key points can thus be identified from a color image alone, without a depth image, which improves the user experience and suits a wide range of application scenarios.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for identifying three-dimensional coordinates of hand key points, characterized by comprising:

acquiring a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;

inputting the target hand frame color image into a hand key point three-dimensional coordinate recognition network model for processing to obtain three-dimensional coordinates of target hand key points, wherein the three-dimensional coordinates of the target hand key points are position coordinates relative to the center of the target hand frame color image area.

2. The method according to claim 1, wherein the method further comprises:

acquiring training data of the hand key point three-dimensional coordinate recognition network model;

training a preset neural network model by using the training data, and obtaining the hand key point three-dimensional coordinate recognition network model when the accuracy of the output result of the preset neural network model is greater than a threshold.

3. The method according to claim 2, wherein the acquiring the training data of the three-dimensional coordinate recognition network model of the hand key point comprises:

with the direction of a camera and the distance between the camera and a prosthetic hand model in a CG model set, capturing a color image of the prosthetic hand model by using the camera;

obtaining three-dimensional coordinates of hand key points according to the prosthetic hand model;

fusing the color image of the prosthetic hand model with a real scene image to obtain a color image with a foreground prosthetic hand model and a real background;

cropping, according to the internal parameters of the camera, the hand region in the color image with the foreground prosthetic hand model and the real background to obtain a hand frame color image, and normalizing the hand frame color image and the three-dimensional coordinates of the hand key points to obtain the training data of the hand key point three-dimensional coordinate recognition network model, the training data comprising the normalized hand frame color image and the three-dimensional coordinates of the hand key points.

4. The method according to claim 2, wherein the acquiring the training data of the three-dimensional coordinate recognition network model of the hand key point comprises:

acquiring a depth image and a color image that are captured by a depth camera and are synchronized and registered with each other;

identifying three-dimensional coordinates of hand key points in the depth image by using a hand key point coordinate recognition model based on depth data;

cropping the hand region in the color image to obtain a hand frame color image corresponding to the hand region of the depth image;

normalizing the three-dimensional coordinates of the hand key points according to the depth value of the center of the hand frame color image to obtain the training data of the hand key point three-dimensional coordinate recognition network model, the training data comprising the hand frame color image and the normalized three-dimensional coordinates of the hand key points.

5. The method according to claim 1, wherein the method further comprises:

acquiring the depth value of the center of the target hand frame color image area;

calculating the real three-dimensional coordinates of the target hand key points according to the depth value of the center of the target hand frame color image area.

6. A device for identifying three-dimensional coordinates of hand key points, characterized by comprising:

a color image acquisition unit, configured to acquire a target hand frame color image, wherein the target hand frame color image is a color image obtained after hand detection;

a three-dimensional coordinate recognition unit, configured to input the target hand frame color image into a hand key point three-dimensional coordinate recognition network model for processing to obtain three-dimensional coordinates of target hand key points, wherein the three-dimensional coordinates of the target hand key points are position coordinates relative to the center of the target hand frame color image area.

7. The apparatus of claim 6, wherein the apparatus further comprises:

a training data acquisition unit, configured to acquire training data of the hand key point three-dimensional coordinate recognition network model;

a recognition model training unit, configured to train a preset neural network model by using the training data, and to obtain the hand key point three-dimensional coordinate recognition network model when the accuracy of the output result of the preset neural network model is greater than a threshold.

8. The device according to claim 7, wherein the training data acquisition unit is specifically used for:

9. The device according to claim 7, wherein the training data acquisition unit is specifically used for:

10. The apparatus of claim 6, wherein the apparatus further comprises:

a three-dimensional coordinate conversion unit, configured to acquire the depth value of the center of the target hand frame color image area, and to calculate the real three-dimensional coordinates of the target hand key points according to the depth value of the center of the target hand frame color image area.