CN112800905A

CN112800905A - Pull-up counting method based on RGBD camera attitude estimation

Info

Publication number: CN112800905A
Application number: CN202110067884.2A
Authority: CN
Inventors: 朱程利; 余小欢; 陈啟煌; 伍贤彬; 马村; 陈嵩
Original assignee: Zhejiang Guangpo Intelligent Technology Co ltd
Current assignee: Guangche Technology (Hangzhou) Co.,Ltd.
Priority date: 2021-01-19
Filing date: 2021-01-19
Publication date: 2021-05-14

Abstract

The invention discloses a pull-up counting method based on RGBD camera attitude estimation, which collects RGB and Depth images, and processes the RGB and Depth images to obtain images that suppress most of the background information; Input into the mobi le net series network structure model for effective fusion, and output the key point confidence map and partial affinity field map; evaluate the correlation between the two key points through the integral function, and connect the joint points of each person , get the posture skeleton map of everyone in the image; extract the pull-up motion parameters according to the information of the skeleton key points in the posture skeleton map; judge whether to perform the pull-up action according to the motion parameters, and if so, perform the pull-up count. The present invention can adapt to various motion scenes, and divides the background information through the Depth information, the counting mechanism is more robust, the counting is more accurate, and the calculation efficiency is improved at the same time.

Description

A pull-up counting method based on RGBD camera pose estimation

技术领域technical field

本发明涉及一种人体向上计数方法，特别涉及一种基于RGBD相机姿态估计的引体向上计数方法。The invention relates to a human body counting method, in particular to a pull-up counting method based on RGBD camera attitude estimation.

背景技术Background technique

引体向上要求有一定的握力和上肢力量，这个力量必须克服自身的体重才能完成一次，引体向上对发展上肢悬垂力量、肩带力量和握力有重要作用，因此，引体向上是健身和锻炼最为常见的项目之一，为了科学有效的健身与锻炼，在做引体向上项目时，需要对引体向上进行有效的计数。Pull-ups require a certain amount of grip strength and upper body strength, which must overcome one’s own body weight to complete one time. Pull-ups play an important role in developing upper limb suspension strength, shoulder girdle strength and grip strength. Therefore, pull-ups are fitness and exercise. One of the most common projects, for scientific and effective fitness and exercise, when doing pull-ups, it is necessary to count the pull-ups effectively.

目前引体向上计数的方法有人工计数，这种计数方法依靠计数人员的主观判断，在模棱两可的情况下容易出错，另外也浪费人力；At present, the method of counting pull-ups includes manual counting. This counting method relies on the subjective judgment of the counting personnel, which is prone to errors in ambiguous situations, and also wastes manpower;

中国发明专利授权公告号CN105879358B，授权公告日2018年08月28日，专利名称《引体向上成绩测试仪》，该方法采用拉线式位移传感器，这种方法要求被测者穿戴相应的设备，给被测者带来不便，另外需要装置上下杠感知单元，设计较为复杂；China Invention Patent Authorization Announcement No. CN105879358B, the authorization announcement date is August 28, 2018, and the patent name is "Pull-up Performance Tester". This method uses a pull-wire displacement sensor. The tested person is inconvenient, and the upper and lower bar sensing unit needs to be installed, and the design is more complicated;

中国发明专利公开号CN 107122798A，公开日2017年09月01日，专利名称《基于深度卷积网络的引体向上计数检测方法及装置》，该方法公开了一种利用深度学习进行引体向上的计数，该方法存在两方面的问题，一是，采集的数据比较多，需要采集大量的鼻子过杆，头部过杆但鼻子未过杆等数据，需要对数据进行标注，需要投入大量的人力物力，二是，对模棱两可的情况不能很好的处理。Chinese Invention Patent Publication No. CN 107122798A, published on September 1, 2017, patent title "Pull-up counting detection method and device based on deep convolutional network", the method discloses a pull-up using deep learning. Counting, this method has two problems. First, it collects a lot of data. It needs to collect a large amount of data such as nose passing the rod, head passing the rod but the nose not passing the rod. Material resources, second, cannot handle ambiguous situations very well.

中国发明专利公开号CN 111282248A，公开日2020年06月16日，专利名称《一种基于骨骼和人脸关键点的引体向上检测系统及方法》，该方法通过采用单帧大臂和小臂之间的夹角来计数，该方法存在两个问题：一是，当有某个关键点检测失败之后，这些计数机制则失败，造成计数误差。二是，当RGB图像出现多人时，会检测出所有人的关键点信息，多人关键点信息会相互影响，造成计数不精确；同时检测多人的关键点信息也会带来一定的时延。Chinese Invention Patent Publication No. CN 111282248A, published on June 16, 2020, patent title "A system and method for pull-up detection based on key points of bones and faces", the method uses a single frame of the upper arm and forearm There are two problems with this method: First, when a certain key point fails to be detected, these counting mechanisms fail, resulting in counting errors. Second, when there are many people in the RGB image, the key point information of everyone will be detected, and the key point information of multiple people will affect each other, resulting in inaccurate counting; at the same time, detecting the key point information of many people will also bring a certain time. extension.

所以有必要提出一种新的方案，基于RGBD相机姿态估计的引体向上计数方法，该方法通过获取RGB和Depth图像，并对RGB和Depth图像进行处理，将处理后的RGB和Depth图像输入至网络模型中获得图像中所有人的姿态骨架图，根据姿态骨架图设计引体向上相应的动作逻辑机制，进行引体向上计数的方案。Therefore, it is necessary to propose a new solution, a pull-up counting method based on RGBD camera pose estimation, which obtains RGB and Depth images, processes the RGB and Depth images, and inputs the processed RGB and Depth images to the In the network model, the posture skeleton map of everyone in the image is obtained, and the corresponding action logic mechanism of the pull-up is designed according to the posture skeleton map, and the plan of counting the pull-up is carried out.

发明内容SUMMARY OF THE INVENTION

针对目前引体向上计数出现的问题，本发明提供了一种基于RGBD相机姿态估计的引体向上计数方法，结合RGBD图像的互补特性进行网络结构的设计，使得网络能够自适应的融合RGB和Depth图像的特征，网络同时回归人体骨骼关键点和进行关键点的关联，得到人体姿态骨架图，然后对检测的关键点进行逻辑判断和对比，设计引体向上逻辑和动作，并进行引体向上计数。Aiming at the problem of current pull-up counting, the present invention provides a pull-up counting method based on RGBD camera attitude estimation, and combines the complementary characteristics of RGBD images to design the network structure, so that the network can adaptively integrate RGB and Depth. Image features, the network simultaneously returns to the key points of the human skeleton and associates the key points to obtain the skeleton diagram of the human body posture, and then logically judges and compares the detected key points, designs the pull-up logic and action, and counts the pull-up. .

根据本发明的目的提供了一种基于RGBD相机姿态估计的引体向上计数方法，包括以下步骤：According to the purpose of the present invention, there is provided a method for counting pull-ups based on RGBD camera attitude estimation, comprising the following steps:

S1：采集RGB和Depth图像，并对RGB和Depth图像进行处理得到抑制掉大部分背景信息的图像；S1: Collect RGB and Depth images, and process RGB and Depth images to obtain images that suppress most of the background information;

S2：将处理后的RGB和Depth图像输入至mobile net系列网络结构模型中进行有效的融合，输出关键点置信度图和部分亲和场图；通过积分函数评估两个关键点之间的相关性，将各人的关节点进行连接，得到图像中所有人的姿态骨架图；S2: Input the processed RGB and Depth images into the mobile net series network structure model for effective fusion, and output the key point confidence map and partial affinity field map; evaluate the correlation between the two key points through the integral function , connect the joint points of each person to get the pose skeleton map of everyone in the image;

S3：根据姿态骨架图中骨骼关键点的信息提取引体向上运动参数；S3: Extract the pull-up motion parameters according to the information of the skeleton key points in the posture skeleton diagram;

S4：根据运动参数判断是否进行引体向上动作，如果是，则进行引体向上计数。S4: Determine whether to perform a pull-up action according to the motion parameters, and if so, perform a pull-up count.

优选的，步骤S1中对RGB和Depth图像进行处理的步骤包括,Preferably, the step of processing the RGB and Depth images in step S1 includes,

S11:利用具有时空一致性的RGBD相机，采集RGB和Depth图像，分别对RGB和Depth图像做背景分割；S11: Use a RGBD camera with spatial and temporal consistency to collect RGB and Depth images, and perform background segmentation on the RGB and Depth images respectively;

具体令RGB图像某像素点坐标为X_R(i,j)，对应的深度图像素点坐标为X_D(i,j)，根据深度图的分辨率生成一个掩码图，该掩码图对应像素点坐标为X_M(i,j)，根据场景复杂度设计一个可控阀值δ，如将人物活动的范围作为阀值标准，对掩码图进行二值化操作优化；Specifically, let the coordinates of a pixel in the RGB image be X _R (i,j), and the corresponding pixel coordinates of the depth map be X _D (i, j), and generate a mask map according to the resolution of the depth map. The mask map corresponds to The pixel coordinates are X _M (i, j), and a controllable threshold δ is designed according to the complexity of the scene. For example, the range of the character's activities is used as the threshold standard, and the mask image is optimized by the binarization operation;

S12：将优化后的掩码图分别与RGB和Depth图像进行点乘，抑制掉RGB和Depth图像中大部分的背景信息。S12: Do point multiplication of the optimized mask image with the RGB and Depth images respectively to suppress most of the background information in the RGB and Depth images.

优选的，步骤S2中得到所有人的姿态骨架图的步骤包括，Preferably, in step S2, the step of obtaining the posture skeleton diagram of all people includes:

S21:将处理后的RGB和depth图像通过mask之后,分别通过两个分支的网络得到RGB_f和Depth_f；同时学习一个1x2的权重向量[W_D,W_R],分别表示RGB和Depth模态的权重,将模态权重[W_D,W_R]分别与RGB_f和Depth_f相乘,然后进行RGB和Depth特征图的融合，得到融合后的特征；S21: After passing the processed RGB and depth images through the mask, RGB_f and Depth_f are obtained through the two branches of the network respectively; at the same time, a 1x2 weight vector [W _D , W _R ] is learned, representing the weights of the RGB and Depth modes respectively , Multiply the modal weights [W _D , W _R ] with RGB_f and Depth_f respectively, and then fuse the RGB and Depth feature maps to obtain the fused features;

S22:将融合后的特征输入到stage1网络结构中,每个stage的输出有两个分支,两个分支分别输出关键点置信图和关键点的亲和场,stage n的输入为stage n-1的输出；S22: Input the fused features into the stage1 network structure. The output of each stage has two branches. The two branches output the keypoint confidence map and the affinity field of the keypoint respectively. The input of stage n is stage n-1 Output;

S23:得到亲和场和关键点的位置后，通过积分函数评价两关键点的相关性；S23: After obtaining the position of the affinity field and the key point, evaluate the correlation of the two key points through the integral function;

S24:利用匈牙利算法求得相邻关键点的最优匹配，得到图像中所有人的姿态骨架图。S24: Use the Hungarian algorithm to obtain the optimal matching of adjacent key points, and obtain the pose skeleton map of everyone in the image.

优选的，所述mobile net系列网络结构模型中每个阶段的网络结构均采用3×3和1×1的卷积层，并使用空洞卷积增加网络的感受野。Preferably, the network structure of each stage in the mobile net series network structure model adopts 3×3 and 1×1 convolutional layers, and uses hole convolution to increase the receptive field of the network.

优选的，通过最大化操作得到真值置信图，在测试时，通过最大化操作得到关键点的位置，并利用非极大值抑制排除冗余关键点。Preferably, the true value confidence map is obtained through the maximization operation, and during the test, the positions of the key points are obtained through the maximization operation, and redundant key points are eliminated by non-maximum suppression.

优选的，每个阶段分支上在输出时都添加有损失函数，所述损失函数均用L2范数进行约束。Preferably, a loss function is added to the output of each stage branch, and the loss function is constrained by the L2 norm.

优选的，所述mobile net系列网络结构模型用NAS(Neural ArchitectureSearch)搜索的方法来权衡网络的精度和速度。Preferably, the mobile net series network structure model uses a NAS (Neural Architecture Search) search method to balance the accuracy and speed of the network.

优选的，步骤S3中步骤中所述引体向上运动参数包括：头部位置变化特征、手臂位置变化特征；所述头部位置变化特征指的是在运动的过程中头部高度变化情况，通过鼻子、耳朵、眼睛的三个关键点位置变化来估算；Preferably, the pull-up motion parameters in the step S3 include: head position change characteristics, arm position change characteristics; the head position change characteristics refer to the change of head height during the movement process, through The position changes of three key points of nose, ears and eyes are estimated;

所述手臂变化特征指在做引体向上过程中，手臂弯曲变化情况，通过手腕、手肘以及肩膀三个关键点位置变化来估算。The arm variation feature refers to the variation of the arm bending during the pull-up process, which is estimated by the position variation of three key points of the wrist, the elbow and the shoulder.

优选的，所述手臂弯曲变化情况通过判断手腕到肩膀的连线长度是否大于0.9倍的手肘到手腕和手肘到肩膀的长度之和。Preferably, the arm bending change is determined by judging whether the length of the connecting line from the wrist to the shoulder is greater than 0.9 times the sum of the lengths of the elbow to the wrist and the elbow to the shoulder.

本发明的有益效果是：The beneficial effects of the present invention are:

1.本发明不依靠计数人员的主观判断，避免了在模棱两可的情况下容易出错，同时节省了人力。1. The present invention does not rely on the subjective judgment of counting personnel, avoids being prone to errors in ambiguous situations, and saves manpower at the same time.

2.本发明不需要复杂的装置，只需要一款RGBD相机即可，价格低廉。2. The present invention does not need complicated devices, but only needs one RGBD camera, and the price is low.

3.本发明利用Depth信息分割掉背景信息，使得深度值达到最优，充分发挥RGBD的互补特性，结合RGBD多模态的输入，设计鲁棒的深度学习算法进行人体关键点的估计，并进行网络模型的压缩；使其能够在边缘设备达到实时的效果，根据运动属性设计相应的动作逻辑机制，进行引体向上计数；本发明能够适应各种运动场景，通过Depth信息分割掉背景信息，计数机制更加鲁棒，计数更加精确，同时提升了计算的效率。3. The present invention uses the Depth information to segment the background information, so that the depth value is optimized, and the complementary characteristics of RGBD are fully utilized. Combined with the multi-modal input of RGBD, a robust deep learning algorithm is designed to estimate the key points of the human body, and the Compression of the network model; enabling it to achieve real-time effects on edge devices, designing a corresponding action logic mechanism according to motion attributes, and performing pull-up counting; the present invention can adapt to various motion scenarios, segment background information through Depth information, and count The mechanism is more robust, the counting is more accurate, and the calculation efficiency is improved.

4.在设计引体向上相应的动作逻辑时，通过多个关键点的位置综合评估，避免了因为某一个关键点采集不到，计数机制失效。4. When designing the corresponding action logic of the pull-up, the position of multiple key points is comprehensively evaluated to avoid the failure of the counting mechanism because a certain key point cannot be collected.

附图说明Description of drawings

图1是本发明计数方法流程图；Fig. 1 is the flow chart of counting method of the present invention;

图2是本发明mobile net系列网络结构模型流程图；Fig. 2 is the flow chart of mobile net series network structure model of the present invention;

图3是本发明mobile net系列网络结构模型每个阶段的网络结构；Fig. 3 is the network structure of each stage of mobile net series network structure model of the present invention;

图4是本发明人体骨骼关键点图；Fig. 4 is the key point diagram of human skeleton of the present invention;

图5是引体向上开始计数的状态或者下一个计数的开始状态；Fig. 5 is the state that the pull-up starts counting or the starting state of the next counting;

图6是引体向上开始计数加1状态；Fig. 6 is the state that the pull-up starts counting and adds 1;

对图4中附图说明：0：鼻子；1：脖子；2：右肩；3：右肘4：右腕；5：左肩；6：左肘；7：左腕；8：右髋；9：右膝；10：右踝；11：左髋；12：左膝；13：左踝；14：右眼；15：左眼；16：右耳；17：左耳；Description of the drawings in Figure 4: 0: nose; 1: neck; 2: right shoulder; 3: right elbow; 4: right wrist; 5: left shoulder; 6: left elbow; 7: left wrist; 8: right hip; 9: right Knee; 10: Right Ankle; 11: Left Hip; 12: Left Knee; 13: Left Ankle; 14: Right Eye; 15: Left Eye; 16: Right Ear; 17: Left Ear;

具体实施方式Detailed ways

以下将结合附图所示的具体实施方式对本发明进行详细描述，但这些实施方式并不限制本发明，本领域的普通技术人员根据这些实施方式所做出的结构、方法、或功能上的变换均包含在本发明的保护范围内。The present invention will be described in detail below with reference to the specific embodiments shown in the accompanying drawings, but these embodiments do not limit the present invention, and those of ordinary skill in the art can make structural, method, or functional transformations according to these embodiments. All are included in the protection scope of the present invention.

如图1所示，本发明公开的一种基于RGBD相机姿态估计的引体向上计数方法，包括以下步骤：As shown in FIG. 1 , a method for counting pull-ups based on RGBD camera attitude estimation disclosed in the present invention includes the following steps:

有一种具体实施例，步骤S1中对RGB和Depth图像进行处理的步骤包括,There is a specific embodiment, the step of processing RGB and Depth images in step S1 includes,

具体令RGB图像某像素点坐标为X_R(i,j)，对应的深度图像素点坐标为X_D(i,j)，根据深度图的分辨率生成一个掩码图，该掩码图对应像素点坐标为X_M(i,j)，根据场景复杂度设计一个可控阀值δ，如将人物活动的范围作为阀值标准，对掩码图进行二值化操作优化；其公式如下：Specifically, let the coordinates of a pixel in the RGB image be X _R (i,j), and the corresponding pixel coordinates of the depth map be X _D (i, j), and generate a mask map according to the resolution of the depth map. The mask map corresponds to The pixel coordinates are X _M (i, j), and a controllable threshold δ is designed according to the complexity of the scene. For example, the range of the character's activities is used as the threshold standard, and the mask image is optimized by the binarization operation; the formula is as follows:

S12：将优化后的掩码图分别与RGB和Depth图像进行点乘，抑制掉RGB和Depth图像中大部分的背景信息，其公式如下：S12: Do point multiplication of the optimized mask image with the RGB and Depth images respectively to suppress most of the background information in the RGB and Depth images. The formula is as follows:

X_R(i,j)＝X_M(i,j)·X_R(i,j) (2)X _R (i,j) = X _M (i, j) · X _R (i, j) (2)

X_D(i,j)＝X_M(i,j)·X_D(i,j) (3)X _D (i,j) = X _M (i, j) · X _D (i, j) (3)

如图2所示，有一具体的实施例，步骤S2中得到所有人的姿态骨架图的步骤包括，As shown in Figure 2, there is a specific embodiment, and the step of obtaining the pose skeleton diagram of everyone in step S2 includes:

S22:将融合后的特征输入到stage1网络结构中,每个stage的输出有两个分支,两个分支分别输出关键点置信图S¹＝ρ¹(F)和关键点的亲和场

stage n的输入为stage n-1的输出；两个分支都是一轮迭代的预测体系结构，具体迭代公式如下：S22: Input the fused features into the stage1 network structure, the output of each stage has two branches, and the two branches output the key point confidence map S ¹ =ρ ¹ (F) and the affinity field of the key point respectively

The input of stage n is the output of stage n-1; both branches are an iterative prediction architecture, and the specific iteration formula is as follows:

有一种优选方案，如图3所示，所述mobile net系列网络结构模型中每个阶段的网络结构采用3×3和1×1的卷积层，并使用空洞卷积增加网络的感受野。There is a preferred solution, as shown in Figure 3, the network structure of each stage in the mobile net series network structure model adopts 3 × 3 and 1 × 1 convolution layers, and uses atrous convolution to increase the receptive field of the network.

有一种优选方案，在训练的时候需要分别对关节点位置和亲和区域进行监督，所述损失函数均用L2范数进行约束。为了避免梯度消失现象发生，在每个阶段的每分支输出都添加损失函数，起到中继监督作用。There is a preferred solution, the joint point position and the affinity region need to be supervised separately during training, and the loss function is constrained by the L2 norm. In order to avoid the phenomenon of gradient disappearance, a loss function is added to the output of each branch in each stage to play a role of relay supervision.

每支的损失函数如下：The loss function of each branch is as follows:

其中，

是有J个真实关键点的置信图，

是有C个真实的部分亲和场。W是一个边界标志，当图像位置P的注释消失时，W(P)＝0。这个标记是为了避免无标记部分参与到模型权重的优化。in,

is a confidence map with J real keypoints,

is that there are C real partial affinity fields. W is a boundary marker, W(P)=0 when the annotation of image position P disappears. This mark is to avoid the unmarked part participating in the optimization of model weights.

在训练时，对于每个人k的位置p,生成个人关键点置信图

During training, for each person k's position p, generate a personal keypoint confidence map

的方式为：The way is:

其中X_j,k为个人k，关键点j的真值的位置，σ为控制峰值范围的系数。通过最大化操作得到真值的置信图：where X _j,k is the individual k, the location of the true value of the key point j, and σ is the coefficient that controls the peak range. The confidence map of the truth value is obtained by the maximization operation:

在测试时，通过最大化操作得到关键点的位置，并利用非极大值抑制排除冗余关键点。During testing, the positions of key points are obtained by maximizing operation, and redundant key points are excluded by non-maximum suppression.

对于个人k的第c个肢干上的部分亲和场定义为：The partial affinity field on the cth limb of individual k is defined as:

其中v＝(x_j2,k-x_j1,k)/||x_j2,k-x_j1,k||₂，x_j,k表示个人k的第j个关键点位置，像素P是否落在肢干上的判断为：where v=(x _j2,k -x _j1,k )/||x _j2,k -x _j1,k || ₂ , x _j,k represents the jth key point position of individual k, whether the pixel P falls on Judgments on the limbs are:

0≤v·(p-x_j1,k)≤l_c,k&&|v_⊥·(p-x_j1,k)|≤σ_l 0≤v·(px _j1,k )≤l _c,k &&|v _⊥ ·(px _j1,k )|≤σ _l

其中，l_c,k和σ_l表示肢干的长度和宽度，最后对所有人相同类别的肢干进行平均，使得亲和场的输出通道与肢干种类数相等：Among them, l _{c, k} and σ _l represent the length and width of the limbs, and finally average the limbs of the same category of all people, so that the output channel of the affinity field is equal to the number of limbs:

在得到亲和场和关键点的位置d_j之后，通过以下积分函数评估两个关键点的相关性：After obtaining the affinity field and the location d _j of the keypoints, the correlation of the two keypoints is evaluated by the following integral function:

其中p(u)＝(1-u)d_j1+ud_j2在得到关键点以及相关性的边权之后，计算姿态骨骼就转换成了一个图问题。Among them, p(u)=(1-u)d _j1 +ud _j2 After obtaining the edge weights of key points and correlations, the calculation of pose bones is converted into a graph problem.

利用匈牙利算法求得相邻关键点的最优匹配，从而得到图像中所有人的姿态骨架图。The Hungarian algorithm is used to obtain the optimal matching of adjacent key points, so as to obtain the pose skeleton map of everyone in the image.

有一优选方案，mobile net系列网络结构模型用NAS(Neural ArchitectureSearch)搜索的方法来权衡网络的精度和速度。There is a preferred solution, the mobile net series network structure model uses the NAS (Neural ArchitectureSearch) search method to balance the accuracy and speed of the network.

如图4所示，人体骨骼关键点的相应位置；有一优选方案，步骤S3中步骤中所述引体向上运动参数包括：头部位置变化特征、手臂位置变化特征；所述头部位置变化特征指的是在运动的过程中头部高度变化情况，通过鼻子、耳朵、眼睛的三个关键点位置变化来估算；As shown in Figure 4, the corresponding positions of the key points of the human skeleton; there is a preferred solution, the pull-up motion parameters in step S3 include: head position change characteristics, arm position change characteristics; the head position change characteristics It refers to the change of the height of the head during the movement, which is estimated by the position changes of the three key points of the nose, ears and eyes;

具体的，头部位置的高度变化主要通过鼻子、耳朵、眼睛的位置变化综合考虑得出，头部的整体移动情况的定义如下:Specifically, the height change of the head position is mainly obtained by comprehensively considering the position changes of the nose, ears and eyes. The overall movement of the head is defined as follows:

y_head＝α·y_ears+β·y_eyes+γ·y_nose y _head ＝α·y _ears +β·y _eyes +γ·y _nose

(11) (11)

其中，y_head表示此时头部所在的高度，y_ears、y_eyes、y_nose代表耳朵、眼睛、鼻子所在高度，α、β、γ代表对应的权重。Among them, y _head represents the height of the head at this time, y _ears , y _eyes , and y _nose represent the height of the ears, eyes, and nose, and α, β, and γ represent the corresponding weights.

手臂变化特征指在做引体向上过程中，手臂弯曲变化情况，通过手腕、手肘以及肩膀三个关键点位置变化来估算；The arm change feature refers to the change of the arm bending during the pull-up process, which is estimated by the position changes of the three key points of the wrist, elbow and shoulder;

更为具体的，手臂弯曲变化情况通过判断手腕到肩膀的连线长度是否大于0.9倍的手肘到手腕和手肘到肩膀的长度之和。More specifically, the change of arm bending is determined by judging whether the length of the connecting line from the wrist to the shoulder is greater than 0.9 times the sum of the lengths of the elbow to the wrist and the elbow to the shoulder.

具体公式如下：The specific formula is as follows:

其中，x_wrist、x_elbow、x_shoulder分别表示手腕、手肘、肩膀的横坐标，y_wrist、y_elbow、y_shoulder分别表示手腕、手肘、肩膀的纵坐标。Among them, x _wrist , x _elbow and x _shoulder represent the abscissa of wrist, elbow and shoulder respectively, y _wrist , y _elbow and y _shoulder respectively represent the ordinate of wrist, elbow and shoulder.

具体的计数过程如下：The specific counting process is as follows:

1)当手臂处于伸直状态且头部位置变化|y_head-y₀|>ε，时，表示身体处于悬垂在单杠上；如图5所示，引体向上开始计数的状态或者下一个计数的开始状态；1) When the arm is in a straight state and the head position changes |y _head -y ₀ |>ε, it means that the body is hanging on the horizontal bar; as shown in Figure 5, the pull-up starts to count or the next count start state;

2)当手臂处于弯曲状态且头部位置变化|y_head-y₀|≤ε时，表示头部过单杠；2) When the arm is in a bent state and the head position changes |y _head -y ₀ |≤ε, it means that the head is over the horizontal bar;

3)当出现步骤1)到步骤2)的情况时，将计数加1，如图6所示；当出现步骤2)到步骤1)时表示进入下个计数的开始状态；如此的循环计数。3) When the situation from step 1) to step 2) occurs, increase the count by 1, as shown in Figure 6; when step 2) to step 1) occurs, it means entering the start state of the next count; such a loop count.

其中，y₀表示引体向上杆的高度，ε表示某一距离阀值；Among them, y ₀ represents the height of the pull-up bar, and ε represents a certain distance threshold;

需要说明的是，RGB图像受光线，背景嘈杂以及运动遮挡等挑战影响较大，但是RGB具有丰富的纹理特性。Depth图像具有目标的轮廓信息，能够区分有距离差异的目标，且对光线变化不是很敏感，但是Depth图像缺乏目标的纹理特征。It should be noted that RGB images are greatly affected by challenges such as light, background noise, and motion occlusion, but RGB has rich texture characteristics. The Depth image has the contour information of the target, can distinguish targets with distance differences, and is not very sensitive to light changes, but the Depth image lacks the texture features of the target.

本发明基于RGBD相机姿态估计的引体向上计数方法，通过自研的RGBD相机获得RGB图像以及Depth图像，并对RGB图像以及Depth图像处理后输入至网络模型进行RGBD特征的有效融合，有效减轻人体姿态估计中各个挑战因素对算法性能的影响，得到所有人的姿态骨架图，得到人体姿态骨架图后，根据运动员骨骼关节点信息提取引体向上运动参数，并进行引体向上计数。本发明能够适应各种运动场景，通过Depth信息分割掉背景信息，计数机制更加鲁棒，计数更加精确，同时提升了计算的效率。The present invention is based on a pull-up counting method based on RGBD camera attitude estimation, obtains RGB images and Depth images through a self-developed RGBD camera, and processes the RGB images and Depth images and inputs them to the network model for effective fusion of RGBD features, thereby effectively reducing the risk of human body damage. The influence of each challenge factor in the attitude estimation on the performance of the algorithm is obtained. After obtaining the human body pose skeleton diagram, the pull-up motion parameters are extracted according to the information of the athlete's skeletal joint points, and the pull-up is counted. The present invention can adapt to various motion scenes, and divides the background information through the Depth information, the counting mechanism is more robust, the counting is more accurate, and the calculation efficiency is improved at the same time.

尽管为示例目的，已经公开了本发明的优选实施方式，但是本领域的普通技术人员将意识到，在不脱离由所附的权利要求书公开的本发明的范围和精神前提下，各种改进、增加以及取代是可能的。Although the preferred embodiments of this invention have been disclosed for illustrative purposes, those of ordinary skill in the art will recognize that various modifications are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims. , additions and substitutions are possible.

Claims

1. A chin up-counting method based on RGBD camera pose estimation is characterized by comprising the following steps:

s1: collecting RGB and Depth images, and processing the RGB and Depth images to obtain an image with most background information suppressed;

s2: inputting the processed RGB and Depth images into a mobile net series network structure model for effective fusion, and outputting a key point confidence map and a partial affinity field map; evaluating the correlation between the two key points through an integral function, and connecting the joint points of all people to obtain a posture skeleton diagram of all people in the image;

s3: extracting a body upward motion parameter according to the information of skeleton key points in the posture skeleton diagram;

s4: and judging whether to perform pull-up action according to the motion parameters, and if so, performing pull-up counting.

2. The method of claim 1, wherein the method comprises: the step of processing the RGB and Depth images in step S1 includes,

s11, collecting RGB and Depth images by using an RGBD camera with space-time consistency, and respectively carrying out background segmentation on the RGB and Depth images;

specifically, the coordinate of a certain pixel point of the RGB image is X_R(i, j), the corresponding depth map pixel point coordinate is X_D(i, j), generating a mask map according to the resolution of the depth map, wherein the coordinate of a pixel point corresponding to the mask map is X_M(i, j), designing a controllable threshold value delta according to the complexity of the scene, and if the range of the character activity is used as a threshold value standard, carrying out binarization operation optimization on the mask map;

s12: and performing dot multiplication on the optimized mask image and the RGB and Depth images respectively to inhibit most background information in the RGB and Depth images.

3. The method of claim 1, wherein the method comprises: the step of obtaining the posture skeleton map of all persons in the step S2 includes,

s21, passing the processed RGB and Depth images through a mask, and respectively obtaining RGB _ f and Depth _ f through two branched networks; learning a 1x2 weight vector [ W ] simultaneously_D,W_R]Weights for RGB and Depth modes, respectively, are expressed, and the mode weight [ W [ ]_D,W_R]Multiplying the RGB _ f and Depth _ f respectively, and then fusing the RGB and Depth feature maps to obtain fused features;

s22, inputting the fused features into a stage1 network structure, wherein the output of each stage is provided with two branches which respectively output a key point confidence graph and an affinity field of a key point, and the input of stage n is the output of stage n-1;

s23, after the positions of the affinity field and the key points are obtained, evaluating the correlation of the two key points through an integral function;

and S24, obtaining the optimal matching of the adjacent key points by using the Hungarian algorithm to obtain the posture skeleton diagram of all people in the image.

4. The method of claim 3, wherein the method comprises: each stage in the mobile net series network structure model adopts convolution layers of 3 x 3 and 1x 1, and the cavity convolution is used for increasing the receptive field of the network.

5. The method of claim 3, wherein the method comprises: and obtaining a true confidence map through the maximization operation, obtaining the positions of the key points through the maximization operation during testing, and eliminating redundant key points by utilizing non-maximum suppression.

6. The method of claim 3, wherein the method comprises: each stage branch is added with a loss function at the output, and the loss functions are constrained by an L2 norm.

7. The method of claim 3, wherein the method comprises: the mobile net series network structure model uses the method of NAS (neural Architecture search) search to balance the accuracy and speed of the network.

8. The method of claim 1, wherein the method comprises: the parameters of the pull-up motion in the step S3 include: a head position change feature, an arm position change feature; the head position change characteristic refers to the head height change condition in the movement process, and is estimated through the position change of three key points of a nose, ears and eyes;

the arm change characteristics refer to the change situation of arm bending in the process of performing pull-up, and are estimated through the position change of three key points of a wrist, an elbow and a shoulder.

9. The RGBD camera pose estimation based pull-up counting method according to claim 8, wherein: and judging whether the length of a connecting line from the wrist to the shoulder is greater than 0.9 time of the sum of the lengths from the elbow to the wrist and the elbow to the shoulder.