TWI753382B - Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator - Google Patents
- Publication number
- TWI753382B (application TW109108622A)
- Authority
- TW
- Taiwan
- Prior art keywords
- dimensional
- joint
- human skeleton
- training
- joint point
- Prior art date
Abstract
A method for estimating a three-dimensional human skeleton for a human body in an image, and for training the deep network model of the estimator used. The estimator mainly uses a first-stage calculation model and a second-stage calculation model. The first-stage calculation model is built with a deep convolutional neural network as its backbone; the second-stage calculation model is a neural network of fully connected modules with a residual-connection architecture. When the estimator is trained, a plurality of training sample images in a first training set are input in sequence to the first-stage calculation model to obtain the optimized parameters of the first-stage calculation model. The output that the first-stage calculation model produces with these optimized parameters is then used as input to the second-stage calculation model, which is trained with a second training set to obtain the optimized parameters of the second-stage calculation model, wherein the second training set may be the same as or different from the first training set. To estimate the three-dimensional human skeleton for an image, the image is input to the first-stage and second-stage calculation models connected in series, which yields the human skeleton estimation result.
Description
The present invention relates to image processing technology, and in particular to a method for estimating a three-dimensional human skeleton for a human body in an image, a three-dimensional human skeleton estimator, and a training method for the estimator.
In the existing art, estimating the positions of human joint points in three-dimensional space relies mainly on expensive marker-based motion capture systems or on depth cameras. Such techniques, however, suffer from high equipment cost and site restrictions, and are not easily extended to detecting the skeletons of multiple people.
To address this cost problem, the paper A Simple yet Effective Baseline for 3D Human Pose Estimation (ICCV, 2017) proposed a two-stage method for estimating the human 3D skeleton: the first stage uses a two-dimensional skeleton estimator to output the two-dimensional skeleton in a two-dimensional image, and the second stage lifts the two-dimensional skeleton to a three-dimensional skeleton. This method has been shown to perform well, and most current research builds its improvements on it. For example, a known work extends the method to deep learning with neural networks. Because learning the mapping from two dimensions to three dimensions is under-constrained, the authors of that work define a grammar of the skeleton as constraints, such as kinematic relations, symmetry relations, and motion-coordination relations, and learn the interactions between joints through a bidirectional recurrent neural network connected at the end.
The paper Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation (AAAI, 2018), in turn, describes using a first stage to estimate the two-dimensional skeleton and the depth ordering among the joint points, and a second stage that regresses the three-dimensional human skeleton by neural-network deep learning.
In the aforementioned prior art, the two-stage approach to regressing the three-dimensional human skeleton still leaves room for improvement in generating the skeleton accurately. The present invention uses the two-dimensional skeleton together with the relative joint-point depths estimated by a deep learning network as input to generate true three-dimensional joint-point coordinates.
Based on the above, the present invention provides a method for estimating a three-dimensional human skeleton for a human body in an image, comprising the following steps:
S1) Estimating human bounding boxes: the image is input to a bounding box calculation model that has been trained with a neural network, and the bounding box calculation model outputs one or more human bounding boxes. Each human bounding box is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content within the human bounding box.
S2) Data normalization: a normalization method is applied so that the pixel color values of the image content within each human bounding box follow a normalized distribution.
S3) Estimating the image coordinates of a plurality of joint points of a two-dimensional human skeleton and the relative depth of each joint point with respect to a root joint: using a first-stage calculation model trained with a deep convolutional neural network as its backbone, the image within each human bounding box normalized in step S2) is input to the first-stage calculation model, whose output is a three-dimensional heat map. From the three-dimensional heat map, the image coordinates of a plurality of joint points on a two-dimensional human skeleton, the image coordinates of a root joint, and estimates of the relative depth of the joint points with respect to the root joint can be derived, wherein the root joint corresponds to one joint point on the human skeleton.
S4) Three-dimensional human skeleton estimation: using a second-stage calculation model trained with fully connected modules as its backbone, the joint-point image coordinates from step S3) and the estimates of the relative depth of the joint points with respect to the root joint are input to the second-stage calculation model, whose output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joint points, wherein the fully connected module is a neural network with a residual-connection architecture, and the output three-dimensional joint-point coordinates are relative to the root joint.
The present invention thus uses two-stage training with the two-dimensional skeleton and the relative joint-point depths as the intermediate result (that is, the first-stage output and the second-stage input), and can thereby generate accurate three-dimensional joint coordinates. In estimating the three-dimensional skeleton (and its joint-point coordinates), the present invention is more accurate than currently known techniques.
In addition, the present invention provides a method for training the model of a three-dimensional human skeleton estimator. The three-dimensional human skeleton estimator comprises a first-stage calculation model and a second-stage calculation model; the first-stage calculation model is built with a deep convolutional neural network as its backbone, and the second-stage calculation model is a neural network of fully connected modules with a residual-connection architecture. The method comprises the following steps:
SS1) Inputting a first training set: the first training set comprises a plurality of training samples, each having a human bounding box. Each human bounding box is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content within the human bounding box.
SS2) Data normalization: a normalization method is applied so that the pixel color values of the image content within each human bounding box follow a normalized distribution.
SS3) Training the parameters of the estimator of the two-dimensional joint-point image coordinates and of the relative depth of each joint point with respect to a root joint: the image within each human bounding box normalized in step SS2) is input to the first-stage calculation model, whose output is a three-dimensional heat map. From the three-dimensional heat map, the image coordinates of a plurality of joint points on the two-dimensional human skeleton and estimates of the relative depth of the joint points with respect to the root joint can be derived, wherein the root joint corresponds to one joint point on the human skeleton. A first loss function is used when training the parameters of the first-stage calculation model.
SS4) Inputting the training samples of the first training set in sequence: while the training samples of the first training set are input in sequence, the first loss function is optimized to obtain the optimized parameters of the first-stage calculation model.
SS5) Training the parameters of the three-dimensional human skeleton estimator: using the optimized parameters of the first-stage calculation model obtained in step SS4), the joint-point image coordinates and their relative depth estimates with respect to the root joint are predicted for a plurality of training samples of a second training set and input to the second-stage calculation model, whose output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joint points, wherein the output three-dimensional joint-point coordinates are relative to the root joint. A second loss function is used when training the parameters of the second-stage calculation model.
SS6) Obtaining the optimized parameters: step SS5) is executed in a loop, inputting the joint-point image coordinates and relative depth estimates with respect to the root joint that correspond to the training samples of the second training set, together with the ground-truth three-dimensional human skeleton and joint-point coordinates at the network output; the second loss function is then optimized to obtain the optimized parameters of the second-stage calculation model.
In this way, the present invention can train the model of the three-dimensional human skeleton estimator and obtain an effective model that can be operated to estimate the three-dimensional human skeleton and joint-point coordinates in an input image.
The present invention further discloses a three-dimensional human skeleton estimator, comprising:
a first-stage calculation model, built with a deep convolutional neural network as its backbone, whose input is a normalized human bounding box image, the human bounding box being a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content within the human bounding box, and whose output is a three-dimensional heat map from which the image coordinates of a plurality of joint points on a two-dimensional human skeleton and estimates of the relative depth of the joint points with respect to a root joint are derived, wherein the root joint corresponds to one joint point on the human skeleton; and
a second-stage calculation model, which is a neural network of fully connected modules with a residual-connection architecture, whose input is the joint-point image coordinates and the estimates of the relative depth of the joint points with respect to the root joint output by the first-stage calculation model, and whose output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joint points, the output three-dimensional joint-point coordinates being relative to the root joint.
To describe the technical features of the present invention in detail, the following preferred embodiments are presented together with the drawings, wherein:
As shown in FIG. 1 to FIG. 5, a first preferred embodiment of the present invention provides a method for estimating a three-dimensional human skeleton for a human body in an image, which mainly comprises the following steps:
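S1) Estimating human bounding boxes: an image is input to a bounding box calculation model that has been trained with a neural network, and the bounding box calculation model outputs one or more human bounding boxes 11. The number of human bounding boxes 11 depends mainly on the number of human figures in the image, and the output human bounding boxes 11 can be marked on the image, as shown in FIG. 2. Each human bounding box 11 is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content within the human bounding box 11. In this embodiment, the neural network may be the CenterNet technique, which uses Fully Convolutional Networks (FCN) as the network backbone, for example ResNet combined with transposed convolution layers, Hourglass, or Deep Layer Aggregation (DLA), but is not limited to these. Any technique capable of locating human bodies in an image may be used to find all the human bounding boxes 11.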
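As an illustration of the bounding box described in step S1), the following minimal sketch represents a box by its center point and size, derives the four corner points, and crops the image content inside it. The class name `BoundingBox`, its fields, and the rounding and clipping behavior are assumptions made for this example; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BoundingBox:
    """Hypothetical container for one human bounding box (pixel units)."""
    cx: float      # x of the center point (human reference point)
    cy: float      # y of the center point
    width: float
    height: float

    def corners(self):
        """Return the four corner points: top-left, top-right, bottom-right, bottom-left."""
        x0, x1 = self.cx - self.width / 2, self.cx + self.width / 2
        y0, y1 = self.cy - self.height / 2, self.cy + self.height / 2
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

    def crop(self, image):
        """Return the image content inside the box, clipped to the image borders."""
        img_h, img_w = image.shape[:2]
        (x0, y0), _, (x1, y1), _ = self.corners()
        left, right = max(0, int(round(x0))), min(img_w, int(round(x1)))
        top, bottom = max(0, int(round(y0))), min(img_h, int(round(y1)))
        return image[top:bottom, left:right]


# Toy 10 x 10 RGB image and a 4 x 6 box centered at (5, 5).
image = np.arange(10 * 10 * 3).reshape(10, 10, 3)
box = BoundingBox(cx=5, cy=5, width=4, height=6)
patch = box.crop(image)
```

The cropped `patch` is what a later normalization step would operate on; a real detector would supply the center and size.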
S2) Data normalization: a normalization method is applied so that the pixel color values of the image content within each human bounding box 11 follow a normalized distribution. In this embodiment, the normalization method is Z-score normalization, as shown in Formula (1) below:
Formula (1): $p' = \dfrac{p - \mu}{\sigma}$
where μ is the mean and σ is the standard deviation, each taken per RGB channel of the image; p is the original image pixel value (the three RGB channel values), and p' is the normalized pixel value. In this way, the pixel color values of each channel can be normalized to the range [0,1] by means of the means and standard deviations of R, G, and B.
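The per-channel Z-score normalization of step S2) can be sketched as follows. This is a minimal NumPy example; the function name `zscore_normalize` and the choice to compute μ and σ from the crop itself are assumptions (in practice the statistics are often fixed values computed over a training set). A z-score on its own yields zero mean and unit variance per channel rather than a bounded range, so the sketch does not attempt to enforce the [0,1] range stated in the text.

```python
import numpy as np


def zscore_normalize(crop, mean, std):
    """Apply p' = (p - mu) / sigma independently to each RGB channel.

    crop: H x W x 3 array of pixel values.
    mean, std: length-3 sequences, one entry per RGB channel.
    """
    crop = np.asarray(crop, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64).reshape(1, 1, 3)
    std = np.asarray(std, dtype=np.float64).reshape(1, 1, 3)
    return (crop - mean) / std


# Hypothetical 2 x 2 crop; the statistics are taken from the crop itself here.
crop = np.array([[[0, 10, 20], [2, 12, 22]],
                 [[4, 14, 24], [6, 16, 26]]], dtype=np.float64)
mu = crop.mean(axis=(0, 1))
sigma = crop.std(axis=(0, 1))
normalized = zscore_normalize(crop, mu, sigma)
```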
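S3) Estimating the image coordinates of a plurality of joint points of a two-dimensional human skeleton and the relative depth of each joint point with respect to a root joint: using a first-stage calculation model 21 trained with a deep convolutional neural network as its backbone, the image within each human bounding box 11 normalized in step S2) is input to the first-stage calculation model 21, whose output is several three-dimensional heat maps. From these three-dimensional heat maps, the image coordinates of a plurality of joint points on a two-dimensional human skeleton (all quantized to the range [0,64]) and estimates of the relative depth of the joint points with respect to the root joint (quantized to the range [0,64]) can be derived; the estimation architecture is shown in FIG. 3. The relative depth value of a joint point is defined as Z - Zroot, where Zroot is the depth value of the root joint (which can be obtained by another method or defined as the zero-depth point) and Z is the depth value of the respective joint point. Because the output of the first-stage calculation model 21 has been quantized and normalized, the relation between the normalized relative joint depth and the actual relative depth Z - Zroot is given by Formula (2) below:
Formula (2)
where scale denotes a given scaling factor.
In Formula (2), the relative depth value Z - Zroot of each joint point can be computed from the depth' predicted by the first-stage calculation model 21. In addition, the root joint corresponds to one joint point on the human skeleton; in this embodiment the coordinates of the person's pelvis point serve as the root joint coordinates, but this is not a limitation.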
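Step S3) states that joint-point image coordinates and quantized relative depths are derived from three-dimensional heat maps but does not say how. One common way, shown here purely as an assumption, is to take the position of the maximum response in each joint's heat-map volume. The function name `decode_heatmaps`, the (joint, depth, y, x) axis order, and the toy volume size are all hypothetical; the sketch returns quantized bin indices only and does not apply Formula (2).

```python
import numpy as np


def decode_heatmaps(heatmaps):
    """Return, per joint, the (x, y, depth') bin with the largest response.

    heatmaps: J x D x H x W array, one 3D volume per joint (assumed layout).
    """
    num_joints, depth_bins, height, width = heatmaps.shape
    flat_index = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    d, y, x = np.unravel_index(flat_index, (depth_bins, height, width))
    return np.stack([x, y, d], axis=1)


# Toy example: 2 joints, 4 x 4 x 4 volumes with a single peak each.
heatmaps = np.zeros((2, 4, 4, 4))
heatmaps[0, 1, 2, 3] = 1.0   # joint 0: depth bin 1, y = 2, x = 3
heatmaps[1, 3, 0, 2] = 1.0   # joint 1: depth bin 3, y = 0, x = 2
joints = decode_heatmaps(heatmaps)
```

A differentiable soft-argmax is another frequently used decoding; which one the patented model uses is not stated in the text.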
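S4) Three-dimensional human skeleton estimation: using a second-stage calculation model 31 trained with fully connected modules as its backbone, the joint-point image coordinates and the relative depth estimates of the joint points output by the first-stage calculation model in step S3) are input to the second-stage calculation model 31, whose output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joint points. The fully connected module is a neural network with a residual-connection architecture, shown in FIG. 4, and the output three-dimensional joint-point coordinates are relative to the root joint. The three-dimensional joint-point coordinates of the estimated three-dimensional human skeleton model are expressed by linear normalization, with values between -1.0 and 1.0:
Formula (3)
Formula (3) relates the normalized three-dimensional joint-point coordinates to the given three-dimensional coordinates of the root joint (the symbols themselves are missing from this text). Using Formula (3), the three-dimensional coordinates (X, Y, Z) of each joint point can be solved for in reverse.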
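To make the "fully connected modules with residual connections" of step S4) concrete, the sketch below runs a forward pass through a small network of that kind in NumPy. The hidden width, the number of blocks, the ReLU activation, the two-layer block layout, and the random initialization are assumptions chosen for illustration; only the overall idea (fully connected layers whose block output is added back to the block input) comes from the text. The input is one (x, y, depth') triple per joint and the output one (X, Y, Z) triple per joint.

```python
import numpy as np


def init_params(num_joints=17, hidden=64, num_blocks=2, seed=0):
    """Create small random weights for the hypothetical second-stage network."""
    rng = np.random.default_rng(seed)
    dim = num_joints * 3

    def dense(n_in, n_out):
        return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

    w_in, b_in = dense(dim, hidden)
    blocks = []
    for _ in range(num_blocks):
        w1, b1 = dense(hidden, hidden)
        w2, b2 = dense(hidden, hidden)
        blocks.append({"w1": w1, "b1": b1, "w2": w2, "b2": b2})
    w_out, b_out = dense(hidden, dim)
    return {"w_in": w_in, "b_in": b_in, "blocks": blocks,
            "w_out": w_out, "b_out": b_out}


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, p):
    """Two fully connected layers whose result is added to the block input."""
    h = relu(x @ p["w1"] + p["b1"])
    h = relu(h @ p["w2"] + p["b2"])
    return x + h


def second_stage(joints_2d_depth, params):
    """Map J x 3 (x, y, depth') inputs to J x 3 root-relative (X, Y, Z) outputs."""
    x = np.asarray(joints_2d_depth, dtype=np.float64).reshape(1, -1)
    h = x @ params["w_in"] + params["b_in"]
    for p in params["blocks"]:
        h = residual_block(h, p)
    out = h @ params["w_out"] + params["b_out"]
    return out.reshape(-1, 3)


params = init_params()
stage_one_output = np.zeros((17, 3))   # placeholder first-stage output
skeleton_3d = second_stage(stage_one_output, params)
```

The skip connection means a block with all-zero weights leaves its input unchanged, which is the property that usually makes such stacks easier to train.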
As steps S1) to S4) of this first embodiment show, the present invention uses the estimated two-dimensional skeleton and the relative joint depths as input and can thereby generate true three-dimensional joint coordinates; in estimating joint-point positions, the present invention is more accurate than currently known techniques.
As shown in FIG. 5 to FIG. 7, a second preferred embodiment of the present invention provides a method for training the model of a three-dimensional human skeleton estimator, mainly to explain how the estimator used by the estimation method of the first embodiment is trained. The three-dimensional human skeleton estimator comprises a first-stage calculation model 21 and a second-stage calculation model 31; the first-stage calculation model 21 is the first-stage calculation model 21 of step S3) in the first embodiment, and the second-stage calculation model 31 is the second-stage calculation model 31 of step S4) in the first embodiment. The training method of this second embodiment has the following steps:
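SS1) Inputting a first training set: the first training set 19 comprises a plurality of training samples 191, each training sample 191 being an image that has a human bounding box 11 (shown in FIG. 2). Each human bounding box 11 is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content within the human bounding box 11; the first training set 19 is shown in FIG. 6. When the first training set 19 is input, the ground-truth output corresponding to each training sample 191 is input as well.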
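SS2) Data normalization: a normalization method is applied so that the pixel color values of the image content within each human bounding box 11 follow a normalized distribution. Taking Formula (1) above as an example, the pixel color values can be normalized to the range [0,1] by means of the means and standard deviations of R, G, and B.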
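SS3) Training the parameters of the estimator of the two-dimensional joint-point image coordinates and of the relative depth of each joint point with respect to a root joint: the image within each human bounding box 11 normalized in step SS2) is input to the first-stage calculation model 21, whose output is several three-dimensional heat maps. From these three-dimensional heat maps, the image coordinates of a plurality of joint points on the two-dimensional human skeleton and estimates of the relative depth of the joint points with respect to a root joint can be derived; the network architecture for this estimation is shown in FIG. 6. The root joint corresponds to one joint point on the human skeleton, for example the pelvis point of the human skeleton model. A first loss function is used when training the network parameters of the first-stage calculation model 21. During network training, the ground truth of the relative depth values of the joint points is computed as Z - Zroot, where Zroot is the true depth value of the root joint and Z is the true depth value of the respective joint point. When the first-stage calculation model 21 is trained, the ground truth of the relative depth values must likewise be quantized and normalized so that the normalized relative joint depth lies in the range [0,64]; its relation to the actual relative depth Z - Zroot is given by Formula (2) above.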
The first loss function is the mean absolute error (MAE) of the joints, where the error concerns the two-dimensional image coordinates and the relative depth values of the joints. With L_first denoting the first loss function, it is given by Formula (4) below:
Formula (4)
where $\hat{s}_j = (\hat{x}_j, \hat{y}_j, \widehat{depth}_j)$ is the output predicted by the first-stage calculation model 21, and $s_j = (x_j, y_j, depth_j)$ is the corresponding ground-truth value.
SS4) Inputting the training samples of the first training set in sequence: while the training samples 191 of the first training set 19 are input in sequence, the first loss function is optimized to obtain the optimized parameters of the first-stage calculation model 21.
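The first loss function optimized in step SS4) is described as a mean absolute error over the joints' image coordinates and relative depths. A minimal sketch of such a loss is given below. Averaging over all J joints and all three components is an assumption, since the exact normalization of Formula (4) is not reproduced in this text; the function name `first_stage_loss` and the toy values are likewise hypothetical.

```python
import numpy as np


def first_stage_loss(pred, target):
    """Mean absolute error between predicted and true (x, y, depth') per joint.

    pred, target: J x 3 arrays. The mean runs over all 3 * J entries
    (an assumed normalization).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.abs(pred - target).mean()


# Two joints; absolute differences are 1, 0, 2 and 0, 3, 0.
pred = [[10, 20, 30], [12, 22, 34]]
target = [[11, 20, 32], [12, 25, 34]]
loss = first_stage_loss(pred, target)   # (1 + 0 + 2 + 0 + 3 + 0) / 6 = 1.0
```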
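SS5) Training the parameters of the three-dimensional human skeleton estimator: using, in a loop, the optimized parameters of the first-stage calculation model 21 obtained in step SS4), the joint-point image coordinates and their relative depth estimates with respect to the root joint are predicted for each training sample 191 and input to the second-stage calculation model 31, whose output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joint points, wherein the output three-dimensional joint-point coordinates are relative to the root joint. The two-dimensional joint-point image coordinates and relative depth values used as input for training the second-stage calculation model 31 are not limited to those obtained from the training samples 191 of the first training set 19 as shown in FIG. 6; they may also be obtained from the training samples 291 of another, second training set 29, as shown in FIG. 7, where the second training set 29 may be identical to, contained in, or different from the first training set 19. A second loss function is used when training the parameters of the second-stage calculation model 31. The second loss function L_second is composed of the three-dimensional coordinate error of the joint points and the error of a set of bone vectors, where each bone vector represents a vector between two joint points, real or virtual; a bone vector is the bone feature obtained by subtracting the predefined bone start point from the bone end point (the arrow points toward the end point). The second loss function is computed by Formula (5) below:
Formula (5)
Formula (5) is expressed in terms of the predicted three-dimensional coordinates of the j-th joint point and their corresponding ground-truth values, and of the predicted and ground-truth three-dimensional bone vectors (the symbols themselves are missing from this text). dest(k) and start(k) are two functions giving the joint indices of the end point and the start point of the k-th bone. The loss has hyperparameters whose symbols and values are missing from this text; K=22 and J=17 are the predefined numbers of bones and joint points. The function given in Formula (6) computes the degree of dissimilarity between the entire set of predicted bones and the ground-truth bones.
Formula (6)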
The second loss function above is based on the joint-point positions S_j and the bone vectors B_k. The bone-vector error L_bone lets the neural network learn the spatial relations of the skeleton structure, which strengthens the physical constraints between joint points. The joint-point error L_joint lets the neural network learn precise coordinate positions. A bone vector is the bone feature obtained by subtracting the predefined bone start point from the bone end point (the arrow points toward the end point); the definitions of the bone end points and start points are shown in FIG. 8.
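The combination of a joint-position error and a bone-vector error can be sketched as follows. The bone vector as "end point minus start point" follows the text; everything else is an assumption made for illustration, because Formulas (5) and (6) are not reproduced here: the use of mean absolute error for both terms, the weighted sum, the weight names `w_joint` and `w_bone`, and the three-joint toy skeleton.

```python
import numpy as np


def bone_vectors(joints, bones):
    """Return K x 3 bone vectors, each joint[dest(k)] - joint[start(k)].

    joints: J x 3 array of 3D joint coordinates.
    bones: list of (start, dest) joint-index pairs.
    """
    joints = np.asarray(joints, dtype=np.float64)
    start = np.array([s for s, _ in bones])
    dest = np.array([d for _, d in bones])
    return joints[dest] - joints[start]


def second_stage_loss(pred, target, bones, w_joint=1.0, w_bone=1.0):
    """Weighted sum of a joint error and a bone-vector error (assumed form)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    l_joint = np.abs(pred - target).mean()
    l_bone = np.abs(bone_vectors(pred, bones) - bone_vectors(target, bones)).mean()
    return w_joint * l_joint + w_bone * l_bone


# Toy skeleton: 3 joints, 2 bones (0 -> 1 and 1 -> 2).
bones = [(0, 1), (1, 2)]
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
pred = target + 0.5                      # whole skeleton shifted by 0.5
target_bones = bone_vectors(target, bones)
loss = second_stage_loss(pred, target, bones)
```

In this toy case the prediction is a pure translation of the ground truth, so the bone term is zero and only the joint term contributes; a prediction with distorted limb lengths or directions would be penalized by the bone term as well.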
SS6) Obtaining the optimized parameters: step SS5) is executed in a loop, inputting the joint-point image coordinates and relative depth estimates with respect to the root joint that correspond to the training samples 291 of the second training set 29, together with the ground-truth three-dimensional human skeleton and joint-point coordinates at the network output; the second loss function is then optimized to obtain the optimized parameters of the second-stage calculation model 31.
As the above steps show, this second embodiment can train the model of the two-stage three-dimensional human skeleton estimator of the present invention and obtain an effective model that is ready for operation; the trained estimator is then used, by the method of the first embodiment, to estimate the three-dimensional human skeleton for a human body in an image.
11: human bounding box; 19: first training set; 191: training sample; 21: first-stage calculation model; 29: second training set; 291: training sample; 31: second-stage calculation model
FIG. 1 is a flowchart of the first preferred embodiment of the present invention.
FIG. 2 is a schematic image of the first preferred embodiment of the present invention, showing human bounding boxes marked on the image.
FIG. 3 is a schematic diagram of a deep learning network architecture of the first preferred embodiment of the present invention, showing the architecture of the first-stage calculation model.
FIG. 4 is a schematic diagram of another deep learning network architecture of the first preferred embodiment of the present invention, showing the architecture of the second-stage calculation model.
FIG. 5 is a flowchart of the second preferred embodiment of the present invention.
FIG. 6 is a schematic diagram of a deep network architecture and its learning in the second preferred embodiment of the present invention, showing the training of the first-stage calculation model with the first training set.
FIG. 7 is a schematic diagram of a further deep network architecture and its learning in the second preferred embodiment of the present invention, showing the training of the second-stage calculation model with the two-dimensional human joint-point coordinates and the relative depth of each joint point with respect to the root joint derived from a second training set.
FIG. 8 is a schematic diagram of the joints and bones of the human skeleton in the second preferred embodiment of the present invention.
Claims (16)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109108622A TWI753382B (en) | 2020-03-16 | 2020-03-16 | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109108622A TWI753382B (en) | 2020-03-16 | 2020-03-16 | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202137144A TW202137144A (en) | 2021-10-01 |
| TWI753382B (en) | 2022-01-21 |
Family
ID=79601238
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109108622A TWI753382B (en) | 2020-03-16 | 2020-03-16 | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI753382B (en) |
- 2020-03-16 TW TW109108622A patent/TWI753382B/en active
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190295307A1 (en) * | 2015-09-21 | 2019-09-26 | TuringSense Inc. | System and method for capturing and analyzing motions to be shared |
| TW201732739A (en) * | 2016-01-22 | 2017-09-16 | 高通公司 | Object-focused active three-dimensional reconstruction |
| TWI620076B (en) * | 2016-12-29 | 2018-04-01 | 大仁科技大學 | Analysis system of humanity action |
| EP3579198A1 (en) * | 2018-06-05 | 2019-12-11 | Cristian Sminchisescu | Image processing method, system and device |
| CN110443144A (en) * | 2019-07-09 | 2019-11-12 | 天津中科智能识别产业技术研究院有限公司 | A kind of human body image key point Attitude estimation method |
| CN110570455A (en) * | 2019-07-22 | 2019-12-13 | 浙江工业大学 | A full-body 3D pose tracking method for room VR |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202137144A (en) | 2021-10-01 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN113221647B (en) | 6D pose estimation method fusing point cloud local features | |
| CN107292912B (en) | Optical flow estimation method based on multi-scale corresponding structured learning | |
| CN110827342A (en) | Three-dimensional human body model reconstruction method, storage device, and control device | |
| CN113516693B (en) | Rapid and universal image registration method | |
| CN110009674B (en) | A real-time calculation method of monocular image depth of field based on unsupervised deep learning | |
| CN110427877A (en) | Human body three-dimensional posture estimation method based on structural information | |
| CN112232106B (en) | A 2D to 3D Human Posture Estimation Method | |
| CN107204010A (en) | A kind of monocular image depth estimation method and system | |
| CN108416840A (en) | A Dense Reconstruction Method of 3D Scene Based on Monocular Camera | |
| CN111062326B (en) | A Geometry-Driven Self-Supervised Human 3D Pose Estimation Network Training Method | |
| CN115294265B (en) | Three-dimensional human body grid reconstruction method and system based on attention of graph skeleton | |
| CN113313176B (en) | A point cloud analysis method based on dynamic graph convolutional neural network | |
| CN111862299A (en) | Human body three-dimensional model construction method, device, robot and storage medium | |
| CN114155406A (en) | Pose estimation method based on region-level feature fusion | |
| CN111860297A (en) | A SLAM loopback detection method applied to indoor fixed space | |
| CN113609999A (en) | Human body model building method based on gesture recognition | |
| CN109255783B (en) | Method for detecting position arrangement of human skeleton key points on multi-person image | |
| CN116385660A (en) | Indoor single view scene semantic reconstruction method and system | |
| CN111368733A (en) | A 3D hand pose estimation method, storage medium and terminal based on label distribution learning | |
| CN110598711B (en) | An Object Segmentation Method Combined with Classification Task | |
| CN117711066A (en) | Three-dimensional human body posture estimation method, device, equipment and medium | |
| TWI753382B (en) | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator | |
| CN115376209A (en) | A 3D Human Pose Estimation Method Based on Deep Learning | |
| CN116805337B (en) | A crowd positioning method based on cross-scale visual transformation network | |
| Chang et al. | Multi-view 3d human pose estimation with self-supervised learning |