Human body posture estimation method and rehabilitation training system based on OpenPose and Kinect
Technical Field
The invention belongs to the interdisciplinary field of computer vision and rehabilitation robotics, and relates to a human body posture estimation method and a rehabilitation training system based on OpenPose and Kinect.
Background
Since the 1990s, robot-assisted rehabilitation training technology has developed rapidly and attracted broad attention in developed countries. Existing research and clinical applications show that rehabilitation training robots can provide safe, reliable, highly targeted, and adaptive rehabilitation training for patients with limb movement dysfunction caused by stroke, spinal cord injury, and the like, and are of great significance in improving the quality of rehabilitation training, promoting early recovery, and reducing the burden on families and society.
In recent years, intelligent rehabilitation has developed broader training means to further improve rehabilitation efficiency. Virtual-scene human-computer interaction technology is introduced to stimulate the patient's active participation, thereby increasing training time, intensity, and frequency and improving the training effect. In addition, human body posture estimation technology captures three-dimensional motion data of the limbs during a patient's rehabilitation training to control a virtual agent in a rehabilitation game and realize human-computer interaction. Extensive applied research has been carried out at home and abroad on scene-interactive virtual environment technology and human body posture estimation technology for rehabilitation training robots, and various scene interactive rehabilitation robot systems integrating the two technologies have been developed.
Capturing human body postures with the Kinect to realize human-computer interaction with virtual games has gained increasing recognition owing to its low cost and ease of use. However, the skeleton-binding algorithm built into the Kinect is highly susceptible to illumination, foreground occlusion, and self-occlusion of the human body, leading to missed or false recognition. A stroke patient often needs to wear supporting or fixing devices to maintain body balance due to partial limb disability, so a scene interaction system based on the Kinect's built-in skeleton-binding algorithm cannot be applied to rehabilitation robots where the body is partially occluded. Moreover, most existing scene interactive rehabilitation training systems do not provide target-oriented virtual game scenes for different disabled limbs and different rehabilitation stages, and thus cannot deliver a targeted training scheme matched to the patient's rehabilitation condition.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
the invention aims to solve the defects in the prior art and provides a human body posture estimation method based on OpenPose and Kinect and a scene interactive rehabilitation training system based on OpenPose and Kinect.
The invention adopts the following technical scheme for solving the technical problems:
a human body posture estimation method based on OpenPose and Kinect comprises the following steps:
(1) calibrating a depth camera and a color camera of the Kinect to obtain internal reference matrixes of the color camera and the depth camera and a rotation matrix and a translation vector from a depth camera coordinate system to a color camera coordinate system;
(2) generating a point cloud array of a three-dimensional space by combining a depth image and a color image of Kinect according to the internal reference matrix, the rotation matrix and the translation vector obtained in the step (1);
(3) synchronizing the Kinect color image with the point cloud array through the timestamp;
(4) obtaining a two-dimensional joint point image coordinate according to the Kinect color image by using an OpenPose algorithm;
(5) searching a three-dimensional joint point space coordinate corresponding to the two-dimensional joint point image coordinate in the point cloud array synchronized in the step (3);
(6) smoothing and predicting the human body three-dimensional joint point space coordinates obtained in step (5) by using a median filtering method and the Holt two-parameter exponential smoothing method.
Preferably, the internal reference matrices of the color camera and the depth camera obtained in step (1) are, respectively,

$$K_{RGB}=\begin{bmatrix} f_{x\_RGB} & 0 & c_{x\_RGB} \\ 0 & f_{y\_RGB} & c_{y\_RGB} \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad K_{D}=\begin{bmatrix} f_{x\_D} & 0 & c_{x\_D} \\ 0 & f_{y\_D} & c_{y\_D} \\ 0 & 0 & 1 \end{bmatrix}$$

wherein $(f_{x\_RGB}, f_{y\_RGB})$ is the focal length of the color camera, $(c_{x\_RGB}, c_{y\_RGB})$ is the center point coordinate of the color camera, $(f_{x\_D}, f_{y\_D})$ is the focal length of the depth camera, and $(c_{x\_D}, c_{y\_D})$ is the center point coordinate of the depth camera; the obtained rotation matrix and translation vector from the depth camera coordinate system to the color camera coordinate system are $R_{D\text{-}RGB}$ and $t_{D\text{-}RGB}$, respectively.
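For illustration, the pinhole projection implied by such an internal reference matrix can be checked numerically. This is a minimal sketch only: the focal length and center point values below are made-up examples, not calibration results for any actual Kinect.

```python
import numpy as np

# Illustrative intrinsic matrix of the color camera; the focal lengths and
# principal point below are invented example values, not real calibration data.
f_x, f_y = 1050.0, 1050.0   # focal length in pixels
c_x, c_y = 960.0, 540.0     # center point of the image
K_RGB = np.array([[f_x, 0.0, c_x],
                  [0.0, f_y, c_y],
                  [0.0, 0.0, 1.0]])

def project(K, point_3d):
    """Project a 3-D point in camera coordinates onto the image plane."""
    p = K @ point_3d          # homogeneous pixel coordinates
    return p[:2] / p[2]       # perspective division by the depth Z

# A point 2 m in front of the camera, offset 0.1 m to the right:
pixel = project(K_RGB, np.array([0.1, 0.0, 2.0]))
# x = 1050 * 0.1 / 2 + 960 = 1012.5, y = 540.0
```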
Preferably, the step (2) of generating a point cloud array of a three-dimensional space comprises performing the following steps:
I. According to the internal reference matrix of the depth camera, the two-dimensional image coordinates of the depth image are mapped into three-dimensional space coordinates in the depth camera coordinate system. Let a point on the depth image be $(x_D, y_D)$ with depth value $depth(x_D, y_D)$; then the three-dimensional coordinates $(X_D, Y_D, Z_D)$ of the point in the depth camera coordinate system are:

$$Z_D = depth(x_D, y_D),\qquad X_D = \frac{(x_D - c_{x\_D})\,Z_D}{f_{x\_D}},\qquad Y_D = \frac{(y_D - c_{y\_D})\,Z_D}{f_{y\_D}}$$

II. The three-dimensional coordinates $(X_D, Y_D, Z_D)$ in the depth camera coordinate system are converted to three-dimensional coordinates $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system:

$$\begin{bmatrix} X_{RGB} \\ Y_{RGB} \\ Z_{RGB} \end{bmatrix} = R_{D\text{-}RGB}\begin{bmatrix} X_D \\ Y_D \\ Z_D \end{bmatrix} + t_{D\text{-}RGB}$$

III. The three-dimensional coordinates $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system are further projected onto the two-dimensional color image plane to obtain the color image coordinates $(x_{RGB}, y_{RGB})$:

$$x_{RGB} = \frac{f_{x\_RGB}\,X_{RGB}}{Z_{RGB}} + c_{x\_RGB},\qquad y_{RGB} = \frac{f_{y\_RGB}\,Y_{RGB}}{Z_{RGB}} + c_{y\_RGB}$$

The RGB value of the point at $(x_{RGB}, y_{RGB})$ in the color image is taken as the RGB value of the three-dimensional point $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system;

IV. Steps I to III are repeated for each point in the depth image, thereby generating a point cloud array of the three-dimensional space in XYZRGB format.
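Steps I to III above can be sketched for a single depth pixel as follows. All calibration quantities here (the two intrinsic matrices, the rotation, and the translation) are placeholder assumptions chosen for illustration, not measured Kinect parameters.

```python
import numpy as np

# Hypothetical calibration results, for illustration only.
K_D = np.array([[365.0, 0.0, 256.0],    # depth camera intrinsics
                [0.0, 365.0, 212.0],
                [0.0, 0.0, 1.0]])
R_D_RGB = np.eye(3)                      # rotation: depth -> color frame
t_D_RGB = np.array([0.052, 0.0, 0.0])    # translation (meters)
K_RGB = np.array([[1050.0, 0.0, 960.0],  # color camera intrinsics
                  [0.0, 1050.0, 540.0],
                  [0.0, 0.0, 1.0]])

def depth_pixel_to_color(x_d, y_d, depth):
    """Back-project a depth pixel (step I), transform it into the color
    camera frame (step II), and reproject it onto the color image (step III)."""
    Z = depth
    X = (x_d - K_D[0, 2]) * Z / K_D[0, 0]
    Y = (y_d - K_D[1, 2]) * Z / K_D[1, 1]
    P_rgb = R_D_RGB @ np.array([X, Y, Z]) + t_D_RGB
    x_rgb = K_RGB[0, 0] * P_rgb[0] / P_rgb[2] + K_RGB[0, 2]
    y_rgb = K_RGB[1, 1] * P_rgb[1] / P_rgb[2] + K_RGB[1, 2]
    return P_rgb, (x_rgb, y_rgb)

# The depth camera's center pixel at 2 m lands on the optical axis, shifted
# only by the baseline t_D_RGB before reprojection into the color image.
point, pixel = depth_pixel_to_color(256.0, 212.0, 2.0)
```

Repeating this mapping over every depth pixel, and attaching the RGB value sampled at the resulting color-image coordinate, yields the XYZRGB point cloud array of step IV.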
Preferably, the smoothing and prediction of the human body three-dimensional joint point space coordinates by the median filtering method and the Holt two-parameter exponential smoothing method in step (6) comprises the following steps:
Let the point cloud coordinates (X, Y, Z) of a joint point have n points in a certain neighborhood window S, with coordinates $(U_i, V_i, W_i)$, $i = 1, \ldots, n$. The coordinates of the joint point are replaced by the component-wise median of the n points, i.e.

$$(X, Y, Z) = \left(\operatorname{med}_i U_i,\ \operatorname{med}_i V_i,\ \operatorname{med}_i W_i\right)$$

The Holt two-parameter exponential smoothing method comprises two basic smoothing formulas and a prediction model, namely:

Smoothing formulas:

$$S_t = \alpha P_t + (1-\alpha)(S_{t-1} + b_{t-1})$$
$$b_t = \beta(S_t - S_{t-1}) + (1-\beta)b_{t-1}$$

Prediction model:

$$F_{t+m} = S_t + b_t m$$

wherein α and β are smoothing coefficients with values in (0, 1); the delay and mean square error characteristics are observed by plotting the curves of predicted and actual values, and the smoothing coefficients α and β are tuned to select an optimal prediction model, completing the filtering of the three-dimensional joint point coordinates.

For the time series of three-dimensional joint point space coordinates $P_t = \{P_1, P_2, P_3, \ldots\}$: $P_t$ is the three-dimensional joint point coordinate of the t-th period, $S_t$ is the smoothed value of the t-th period, $b_t$ is the smoothed trend value of the t-th period, m is the number of lead periods to predict, and $F_{t+m}$ is the predicted value for period t + m. Initialization sets $S_1 = P_1$ and $b_1 = P_2 - P_1$; each subsequent $S_t$ and $b_t$ is obtained iteratively from the preceding $S_{t-1}$ and $b_{t-1}$, and the predicted value $F_{t+m}$ for period t + m is calculated from $S_t$ and $b_t$ of the t-th period.
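The Holt recursion above, with the stated initialization S₁ = P₁ and b₁ = P₂ − P₁, can be sketched as follows; the sample trajectory values are purely illustrative.

```python
def holt_smooth(series, alpha, beta, m=1):
    """Holt two-parameter exponential smoothing of one coordinate axis of a
    joint trajectory. Returns the smoothed levels and m-step-ahead forecasts."""
    S = [series[0]]              # level:  S_1 = P_1
    b = [series[1] - series[0]]  # trend:  b_1 = P_2 - P_1
    for P_t in series[1:]:
        S_t = alpha * P_t + (1 - alpha) * (S[-1] + b[-1])
        b_t = beta * (S_t - S[-1]) + (1 - beta) * b[-1]
        S.append(S_t)
        b.append(b_t)
    forecasts = [S_t + b_t * m for S_t, b_t in zip(S, b)]  # F_{t+m} = S_t + b_t m
    return S, forecasts

# One coordinate axis of a joint trajectory (illustrative values, in meters).
z = [0.50, 0.52, 0.55, 0.54, 0.58, 0.60]
smoothed, predicted = holt_smooth(z, alpha=0.5, beta=0.3)
```

In practice α and β would be tuned, as described above, by comparing the predicted and actual curves for delay and mean square error.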
In another embodiment, a scene interactive rehabilitation training system based on OpenPose and Kinect is provided, which includes:
the human body posture estimation module based on OpenPose and Kinect identifies three-dimensional joint point data of a patient in real time according to a depth image and a color image of the Kinect;
the scene interactive rehabilitation training virtual scene module is used for building a progressive rehabilitation training virtual scene based on a Unity3D platform, and realizing the functions of motion control of a virtual agent, drawing of a joint point chart, visual and auditory feedback, calculation of collision acting force and user basic information input; and
and the three-dimensional joint point motion track database is used for storing the basic information of the user and the space coordinates of the three-dimensional joint points.
Preferably, the human body posture estimation module based on OpenPose and Kinect comprises a Kinect point cloud array generation node, an OpenPose node, a human body three-dimensional joint point mapping and filtering node, an ROS master controller, a Unity3D communication node and a database communication node,
the Kinect point cloud array generating node generates a point cloud array of a three-dimensional space according to the internal reference matrix of the Kinect depth camera and the color camera, the rotation matrix and the translation vector from the depth camera coordinate system to the color camera coordinate system, and the depth image and the color image of the Kinect;
the OpenPose node obtains a two-dimensional joint point image coordinate according to the color image of the Kinect;
the human body three-dimensional joint point mapping and filtering node synchronizes the color image of the Kinect with the point cloud array through the timestamp, searches the synchronized point cloud array for the three-dimensional joint point space coordinates corresponding to the two-dimensional joint point image coordinates, and smooths and predicts the three-dimensional joint point space coordinates using a median filtering method and the Holt two-parameter exponential smoothing method;
the database communication node acquires the space coordinates of the three-dimensional joint points for rehabilitation evaluation and stores the space coordinates to the three-dimensional joint point motion track database;
the Unity3D communication node acquires three-dimensional joint point space coordinates for controlling the movement of the virtual agent and sends the three-dimensional joint point space coordinates to the scene interactive rehabilitation training virtual scene module;
the ROS master controller realizes the intercommunication of the Kinect point cloud array generation node, the OpenPose node, the human body three-dimensional joint point mapping and filtering node, the Unity3D communication node and the database communication node.
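ROS itself is not reproduced here, but the topic-based fan-out among these nodes can be mimicked with a minimal in-process publish/subscribe stand-in. The topic name and joint label below are assumptions for illustration; a real deployment would use ROS publishers and subscribers coordinated by the ROS master.

```python
# Minimal stand-in for the ROS topic mechanism, illustrating how the mapping
# and filtering node fans its joint-point topic out to both the Unity3D
# communication node and the database communication node.
class Topic:
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)

joints_topic = Topic("/human_3d_joints")  # hypothetical topic name

unity_buffer, db_buffer = [], []
joints_topic.subscribe(unity_buffer.append)  # Unity3D communication node
joints_topic.subscribe(db_buffer.append)     # database communication node

# The mapping-and-filtering node publishes one frame of joint coordinates;
# both subscribers receive the same message.
joints_topic.publish({"left_wrist": (0.31, -0.12, 1.85)})
```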
Preferably, the scene interactive rehabilitation training virtual scene module comprises:
a user login interface for a patient to enter basic information;
the progressive rehabilitation training virtual scene generation module is used for providing a target-oriented virtual game environment for different disabled parts and different rehabilitation training stages based on a Unity3D platform;
the virtual agent control module is used for controlling the action of a virtual agent in the rehabilitation training virtual game through the obtained three-dimensional joint point space coordinates and simultaneously displaying the motion parameters in a virtual scene in real time; and
and the feedback module is used for triggering visual and auditory feedback according to the events in the scene and calculating the acting force to be sent to the rehabilitation training robot so as to provide force feedback for the patient.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
(1) The invention combines the two-dimensional human skeleton joint points obtained by the OpenPose algorithm with the depth data of the Kinect to obtain three-dimensional human skeleton joint points, which to a certain extent solves the problem that the Kinect's built-in skeleton-binding algorithm fails to recognize, or misrecognizes, the patient when the body is partially occluded by the rehabilitation training robot.
(2) The three-dimensional human joint point data based on OpenPose and Kinect are used to control the actions of a virtual agent in the virtual environment, and the three-dimensional joint point data are stored in the MySQL database, providing digital quantitative data for subsequent rehabilitation evaluation and facilitating the rehabilitation physician's tracking of recovery status.
(3) Aiming at different rehabilitation stages of a patient, the invention designs a plurality of progressive rehabilitation training virtual game environments with rehabilitation pertinence so as to match the scene interaction requirements of the rehabilitation training robot in different rehabilitation stages.
(4) The invention adopts a modularized design idea, the design of a rehabilitation training scene interaction system is independent of a rehabilitation training robot, and a Kinect is adopted to capture the space coordinates of human body joint points as the input of the system. The system of the invention can be conveniently applied to the existing rehabilitation training robot, and the portability and the expansibility of the software system are improved.
Drawings
FIG. 1 is a skeletal structure of OpenPose;
FIG. 2 is a schematic flow chart of the human body posture estimation method based on OpenPose and Kinect of the present invention;
FIG. 3 is a diagram of a ROS-based node communication software framework;
fig. 4(a) to 4(c) are scene screenshots of the progressive scene interactive rehabilitation training virtual scene part in the invention.
Detailed Description
The technical scheme of the invention is more clearly and more specifically described below with reference to the accompanying drawings.
As shown in fig. 2, a human body posture estimation method based on OpenPose and Kinect includes the following steps:
(1) calibrating a depth camera and a color camera of the Kinect to obtain internal reference matrixes of the color camera and the depth camera and a rotation matrix and a translation vector from a depth camera coordinate system to a color camera coordinate system;
(2) generating a point cloud array of a three-dimensional space by combining a depth image and a color image of Kinect according to the internal reference matrix, the rotation matrix and the translation vector obtained in the step (1);
(3) synchronizing the Kinect color image with the point cloud array through the timestamp;
(4) obtaining a two-dimensional joint point image coordinate according to the Kinect color image by using an OpenPose algorithm;
(5) searching a three-dimensional joint point space coordinate corresponding to the two-dimensional joint point image coordinate in the point cloud array synchronized in the step (3);
(6) smoothing and predicting the human body three-dimensional joint point space coordinates obtained in step (5) by using a median filtering method and the Holt two-parameter exponential smoothing method.
In another embodiment, the scene interactive rehabilitation training system based on OpenPose and Kinect comprises human body posture estimation based on OpenPose and Kinect, a progressive scene interactive rehabilitation training virtual scene and a three-dimensional joint point motion track database. The human body posture estimation part based on OpenPose and Kinect is used for capturing three-dimensional joint point data of a patient in real time, the progressive scene interactive rehabilitation training virtual scene part designs a progressive rehabilitation training virtual scene based on a Unity3D platform, and the three-dimensional joint point motion track database part builds a database based on MySQL, so that the joint point data can be stored and called again conveniently.
The workflow of the scene interactive rehabilitation training system based on OpenPose and Kinect comprises the following steps:
step 1: firstly, calibrating a depth camera and a color camera of Kinect to obtain internal reference matrixes of the color camera and the depth camera respectively
And
wherein (f)
x_RGB,f
y_RGB) Is the focal length of the color camera, (c)
x_RGB,c
y_RGB) Is the center point coordinates of the color camera. (f)
x_D,f
y_D) Is the focal length of the depth camera, (c)
x_D,c
y_D) Is the center point coordinates of the depth camera. Then, the transformation relation between the color camera and the depth camera is calibrated, and R is set
D-RGBAnd t
D-RGBRespectively, the rotation matrix and translation vector of the depth camera coordinate system to the color camera coordinate system.
Step 2: A point cloud array of the three-dimensional space is generated from the internal reference matrices, rotation matrix, and translation vector obtained in step 1, combined with the depth image and color image of the Kinect. The specific steps are as follows.
Step 2.1: The two-dimensional image coordinates of the depth image are mapped into three-dimensional space coordinates in the depth camera coordinate system according to the internal reference matrix of the depth camera and the pinhole imaging principle. Let a point on the depth image be $(x_D, y_D)$ with depth value $depth(x_D, y_D)$; then the three-dimensional coordinates $(X_D, Y_D, Z_D)$ of the point in the depth camera coordinate system are:

$$Z_D = depth(x_D, y_D),\qquad X_D = \frac{(x_D - c_{x\_D})\,Z_D}{f_{x\_D}},\qquad Y_D = \frac{(y_D - c_{y\_D})\,Z_D}{f_{y\_D}}$$

Step 2.2: The three-dimensional coordinates $(X_D, Y_D, Z_D)$ in the depth camera coordinate system are converted to three-dimensional coordinates $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system:

$$\begin{bmatrix} X_{RGB} \\ Y_{RGB} \\ Z_{RGB} \end{bmatrix} = R_{D\text{-}RGB}\begin{bmatrix} X_D \\ Y_D \\ Z_D \end{bmatrix} + t_{D\text{-}RGB}$$

Step 2.3: The three-dimensional coordinates $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system are further projected onto the two-dimensional color image plane to obtain the color image coordinates $(x_{RGB}, y_{RGB})$:

$$x_{RGB} = \frac{f_{x\_RGB}\,X_{RGB}}{Z_{RGB}} + c_{x\_RGB},\qquad y_{RGB} = \frac{f_{y\_RGB}\,Y_{RGB}}{Z_{RGB}} + c_{y\_RGB}$$

The RGB value of the point at $(x_{RGB}, y_{RGB})$ in the color image is taken as the RGB value of the three-dimensional point $(X_{RGB}, Y_{RGB}, Z_{RGB})$ in the color camera coordinate system.

Step 2.4: Steps 2.1 to 2.3 are repeated for each point in the depth image to generate a point cloud array of the three-dimensional space in XYZRGB format.
Step 3: The intercommunication of the Kinect node, the OpenPose node, the human body three-dimensional joint point mapping and filtering node, the Unity3D communication node, and the database communication node is realized through the ROS platform, mapping the two-dimensional human joint points to the synchronized point cloud array to obtain the three-dimensional human joint points. The three-dimensional human joint data are then sent to the progressive scene interactive rehabilitation training virtual scene module and the three-dimensional joint point motion track database module, respectively.
The ROS-based node communication software framework is shown in fig. 3, and the specific steps are as follows:
step 3.1: the software framework can be divided into three layers, namely a sensing layer, an attitude estimation and data storage layer and an application layer. The Kinect nodes of the attitude estimation and data storage layer are communicated with the Kinect nodes of the perception layer and calculate to generate a point cloud array, the human body three-dimensional joint point mapping and filtering nodes on the same layer subscribe a color image and a point cloud array topic issued by the Kinect nodes, and synchronization of the Kinect nodes and the point cloud array topic is completed through a timestamp. Step 3.2: and after the human body three-dimensional joint point mapping and filtering nodes are synchronized, the color image of the Kinect is sent to the OpenPose node in a request mode, and after a two-dimensional joint point image coordinate response returned by the OpenPose node is obtained, the three-dimensional joint point space coordinate corresponding to the two-dimensional joint point image coordinate is searched in the synchronized point cloud array. And finally, finishing filtering the space coordinates of the three-dimensional joint points by using a median filtering method and a Hott two-parameter exponential smoothing method.
Step 3.3: After the human body three-dimensional joint point mapping and filtering node completes smoothing and filtering, the space coordinates of the 18 human body three-dimensional joint points shown in fig. 1 are published as a topic, and the nodes subscribing to this topic extract the joint point information of interest. The database communication node selects the joint point motion trajectory information relevant to rehabilitation evaluation and stores it in the three-dimensional joint point motion track database, providing digital quantitative data for subsequent evaluation of the rehabilitation effect. The Unity3D communication node selects the joint point information used to control the virtual agent's movement and sends it to the progressive scene interactive rehabilitation training virtual scene via UDP communication.
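The UDP link from the Unity3D communication node to the virtual scene can be sketched with standard sockets. The host, port, and JSON payload layout below are assumptions made for illustration, not part of the original design; here a local socket stands in for the Unity3D side.

```python
import json
import socket

UNITY_ADDR = ("127.0.0.1", 15000)  # hypothetical address of the Unity3D scene

# A local receiver stands in for the Unity3D virtual scene.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(UNITY_ADDR)
receiver.settimeout(2.0)

# The Unity3D communication node sends one frame of joint coordinates.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frame = {"right_knee": [0.12, -0.40, 1.92]}  # hypothetical joint label
sender.sendto(json.dumps(frame).encode("utf-8"), UNITY_ADDR)

data, _ = receiver.recvfrom(4096)
received = json.loads(data.decode("utf-8"))
sender.close()
receiver.close()
```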
Step 4: The three-dimensional joint point space coordinates obtained in step 3 are smoothed and predicted using a median filtering method and the Holt two-parameter exponential smoothing method.
Step 4.1: The median filtering method can be expressed by formula (4). Let the point cloud coordinates (X, Y, Z) of a joint point have n points in a certain neighborhood window S, with coordinates $(U_i, V_i, W_i)$, $i = 1, \ldots, n$. The coordinates of the joint point are replaced by the component-wise median of the n points, i.e.

$$(X, Y, Z) = \left(\operatorname{med}_i U_i,\ \operatorname{med}_i V_i,\ \operatorname{med}_i W_i\right) \tag{4}$$

Step 4.2: The Holt two-parameter exponential smoothing method includes two basic smoothing formulas and a prediction model, namely:

Smoothing formulas:

$$S_t = \alpha P_t + (1-\alpha)(S_{t-1} + b_{t-1})$$
$$b_t = \beta(S_t - S_{t-1}) + (1-\beta)b_{t-1} \tag{5}$$

Prediction model:

$$F_{t+m} = S_t + b_t m \tag{6}$$

wherein α and β are smoothing coefficients with values in (0, 1). The delay and mean square error characteristics are observed by plotting the curves of predicted and actual values, and α and β are tuned to select an optimal prediction model, completing the filtering of the three-dimensional joint point coordinates.

For the time series of three-dimensional joint point space coordinates $P_t = \{P_1, P_2, P_3, \ldots\}$: $P_t$ is the three-dimensional joint point coordinate of the t-th period, $S_t$ is the smoothed value of the t-th period, $b_t$ is the smoothed trend value of the t-th period, m is the number of lead periods to predict, and $F_{t+m}$ is the predicted value for period t + m. Initialization sets $S_1 = P_1$ and $b_1 = P_2 - P_1$; each subsequent $S_t$ and $b_t$ is obtained iteratively from the preceding $S_{t-1}$ and $b_{t-1}$, and the predicted value $F_{t+m}$ for period t + m is calculated from $S_t$ and $b_t$ of the t-th period.
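The median filter of formula (4) can be sketched as follows; the neighborhood values are illustrative, with an outlier inserted to show the filter's effect.

```python
from statistics import median

def median_filter_joint(neighborhood):
    """Replace a joint's (X, Y, Z) coordinate with the component-wise median
    of the n points (U_i, V_i, W_i) in its neighborhood window S."""
    U, V, W = zip(*neighborhood)
    return (median(U), median(V), median(W))

# Five neighboring point cloud samples around one joint (illustrative, in
# meters); the outlier in the middle is suppressed by the median.
window = [(0.30, 0.10, 1.80), (0.31, 0.11, 1.81), (0.90, 0.70, 2.90),
          (0.32, 0.10, 1.82), (0.30, 0.12, 1.80)]
filtered = median_filter_joint(window)  # (0.31, 0.11, 1.81)
```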
Step 5: The user logs in at the start interface of the progressive scene interactive rehabilitation training virtual scene and enters basic information.
Step 6: The rehabilitation training system provides target-oriented games for different disabled body parts and different rehabilitation training stages. For example, different rehabilitation training virtual games are provided for the elbow joint and the knee joint, and different virtual game environments are provided for passive rehabilitation training in the early stage, active rehabilitation training in the middle stage, and resistance rehabilitation training in the late stage of rehabilitation. Taking the knee joint as an example, fig. 4(a) shows a bicycle-riding scene for passive training in the early rehabilitation stage, fig. 4(b) a lakeside walking scene for active training in the middle stage of lower-limb rehabilitation, and fig. 4(c) a hill-climbing scene for resistance training in the late stage of lower-limb rehabilitation.
In passive rehabilitation training, the rehabilitation robot drives the patient's lower limbs to move, and the angular velocity of the patient's knee joint controls the riding speed of the virtual character. Meanwhile, the patient's leftward and rightward hand-waving gestures are recognized from the motion trajectories of the left and right arm joints and used to steer the bicycle left and right, so as to collide with gold coins in the game scene and earn bonus points. In active rehabilitation training, the angular velocity of the patient's knee joint during active walking is mapped to the walking speed of the virtual character in the scene. In resistance rehabilitation training, whether the virtual character is climbing is judged from the height of the ground on which it stands; if it is climbing, an instruction is sent to the rehabilitation robot requesting it to provide resistance feedback to the patient.
Step 7: The smoothed three-dimensional human joint point information obtained in step 4 controls the movement, rotation, animation playback, and other actions of the virtual agent in the rehabilitation training virtual game environment of step 6; at the same time, motion parameters such as the joint angle change curve, the reachable space of the limb, and the movement rate are displayed in real time in the virtual training scene by means of chart drawing.
Step 8: Visual and auditory feedback is triggered by events such as collisions between the virtual agent and objects in the virtual scene, and the collision force is calculated and sent to the rehabilitation training robot to provide force feedback to the patient.
Step 9: The smoothed three-dimensional joint point data obtained in step 4 are stored in real time during rehabilitation training in the database built on the MySQL platform; when stored, the rehabilitation training data are linked to the corresponding patient according to the basic information obtained in step 5. After training, historical rehabilitation training data can be retrieved as needed.
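A sketch of how one smoothed joint frame could be written to the MySQL-based trajectory database follows. The table name, column names, and sample values are hypothetical; a real deployment would execute the parameterized statement through a MySQL client library rather than building rows in isolation.

```python
# Hypothetical schema: a joint_trajectory table keyed by patient, joint, and
# capture time. Parameterized placeholders (%s) follow the MySQL client style.
INSERT_SQL = (
    "INSERT INTO joint_trajectory "
    "(patient_id, joint_name, x, y, z, captured_at) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)

def make_row(patient_id, joint_name, xyz, captured_at):
    """Flatten one smoothed joint coordinate into a parameter tuple."""
    x, y, z = xyz
    return (patient_id, joint_name, x, y, z, captured_at)

row = make_row(42, "left_knee", (0.12, -0.41, 1.90), "2023-01-01 10:00:00")
# cursor.execute(INSERT_SQL, row)   # with e.g. a MySQL connector cursor
```

Linking each row to a patient_id recorded at login (step 5) is what allows historical trajectories to be retrieved per patient after training.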
The technical idea of the present invention is described in the above technical solutions, and the protection scope of the present invention is not limited thereto, and any changes and modifications made to the above technical solutions according to the technical essence of the present invention belong to the protection scope of the technical solutions of the present invention.