TW202137144A - Method of estimating three-dimensional human skeleton of human body in an image, three-dimensional human skeleton estimator, and training method of estimator which is obtained by using two-dimensional skeleton and relative joint point depth estimated by deep learning network as inputs
- Publication number: TW202137144A
- Application number: TW109108622A
- Authority: TW (Taiwan)
- Prior art keywords
- dimensional
- training
- joint
- human skeleton
- calculation model
Description
The present invention relates to image processing technology, and in particular to a method for estimating the three-dimensional human skeleton of a human body in an image, a three-dimensional human skeleton estimator, and a training method for the estimator.
In existing technology, the three-dimensional positions of human joints are mainly obtained with expensive marker-based motion capture systems or with depth cameras. Such approaches suffer from high equipment cost and venue restrictions, and they are difficult to extend to detecting the skeletons of multiple people.
To address this cost problem, the paper A Simple yet Effective Baseline for 3D Human Pose Estimation (ICCV, 2017) proposed a two-stage method for estimating the 3D human skeleton: in the first stage a two-dimensional skeleton estimator outputs the two-dimensional skeleton found in a two-dimensional image, and in the second stage the two-dimensional skeleton is lifted to a three-dimensional skeleton. This method has been shown to perform well, and most current research builds on it. For example, one known work extends the method with neural-network deep learning. Because learning the mapping from two dimensions to three dimensions lacks constraining conditions, the authors of that work defined a grammar of the skeleton as constraints, such as kinematic relations, symmetry relations, and motor coordination relations, and learned the interactions between joints through a bidirectional recurrent neural network connected at the end.
The paper Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation (AAAI, 2018) describes a first stage that estimates the two-dimensional skeleton and the depth ordering among the joints, followed by a second stage that regresses the three-dimensional human skeleton by neural-network deep learning.
The prior art described above regresses the three-dimensional human skeleton in two stages, but it leaves room for improvement in generating that skeleton accurately. The present invention uses the two-dimensional skeleton and the relative joint depths estimated by a deep learning network as inputs to generate true three-dimensional joint coordinates.
Based on the above, the present invention provides a method for estimating the three-dimensional human skeleton of a human body in an image, comprising the following steps:
S1) Estimating human bounding boxes: the image is input to a bounding box calculation model trained with a neural network, and the model outputs one or more human bounding boxes. Each human bounding box is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content inside the box.
S2) Data normalization: a normalization method is applied so that the pixel color values of the image content in each human bounding box are normally distributed.
S3) Estimating the image coordinates of the joints of the two-dimensional human skeleton and the relative depth of each joint with respect to a root joint: a first-stage calculation model 21, trained with a deep convolutional neural network as its backbone, receives the normalized image in each human bounding box from step S2) and outputs a three-dimensional heatmap. From this heatmap are obtained the image coordinates of a plurality of joints on a two-dimensional human skeleton, the image coordinates of a root joint, and the estimated relative depth of each joint with respect to the root joint, where the root joint corresponds to one joint on the human skeleton.
S4) Three-dimensional human skeleton estimation: a second-stage calculation model, trained with a fully connected module as its backbone, receives the joint image coordinates and the estimated relative depths with respect to the root joint from step S3) and outputs an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joints. The fully connected module is a neural network with a residual-connection architecture, and the output three-dimensional joint coordinates are relative to the root joint.
The present invention thus uses two-stage training, with the two-dimensional skeleton and the relative joint depths as intermediate results (the output of the first stage and the input of the second stage), so that accurate three-dimensional joint coordinates can be generated. In estimating the three-dimensional skeleton and its joint coordinates, the present invention is more accurate than currently known techniques.
In addition, the present invention provides a method for training a three-dimensional human skeleton estimator. The estimator comprises a first-stage calculation model, built with a deep convolutional neural network as its backbone, and a second-stage calculation model, which is a neural network of fully connected modules with a residual-connection architecture. The method comprises the following steps:
SS1) Inputting a first training set: the first training set contains a plurality of training samples, each having a human bounding box. Each human bounding box is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content inside the box.
SS2) Data normalization: a normalization method is applied so that the pixel color values of the image content in each human bounding box are normally distributed.
SS3) Training the parameters of the estimator of joint image coordinates and relative depths with respect to a root joint: the normalized image in each human bounding box from step SS2) is input to the first-stage calculation model, which outputs a three-dimensional heatmap. From this heatmap are obtained the image coordinates of a plurality of joints on the two-dimensional human skeleton and the estimated relative depth of each joint with respect to the root joint, where the root joint corresponds to one joint on the human skeleton. A first loss function is used when training the parameters of the first-stage calculation model.
SS4) Sequentially inputting the training samples of the first training set: the first loss function is optimized while the training samples of the first training set are input in sequence, yielding the optimized parameters of the first-stage calculation model.
SS5) Training the parameters of the three-dimensional human skeleton estimator: using the optimized first-stage parameters obtained in step SS4), the joint image coordinates and the relative depth estimates with respect to the root joint are predicted for the training samples of a second training set and input to the second-stage calculation model, which outputs an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joints, the output coordinates being relative to the root joint. A second loss function is used when training the parameters of the second-stage calculation model.
SS6) Obtaining the optimized parameters: step SS5) is executed in a loop, inputting the joint image coordinates and relative depth estimates corresponding to the training samples of the second training set, together with the ground-truth three-dimensional human skeleton and joint coordinates at the network output. The second loss function is then optimized to yield the optimized parameters of the second-stage calculation model.
In this way the present invention can train a three-dimensional human skeleton estimator into an effective model that can be operated to estimate the three-dimensional human skeleton and joint coordinates in an input image.
The present invention further discloses a three-dimensional human skeleton estimator comprising:
A first-stage calculation model, built with a deep convolutional neural network as its backbone. Its input is a normalized human bounding box image, the human bounding box being a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content inside the box. Its output is a three-dimensional heatmap, from which are obtained the image coordinates of a plurality of joints on a two-dimensional human skeleton and the estimated relative depth of each joint with respect to a root joint, where the root joint corresponds to one joint on the human skeleton.
A second-stage calculation model, which is a neural network of fully connected modules with a residual-connection architecture. Its input is the joint image coordinates and the estimated relative depths with respect to the root joint output by the first-stage calculation model. Its output is an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joints, the output coordinates being relative to the root joint.
To describe the technical features of the present invention in detail, the following preferred embodiments are described with reference to the drawings, in which:
As shown in FIGS. 1 to 5, the first preferred embodiment of the present invention provides a method for estimating the three-dimensional human skeleton of a human body in an image, with the following main steps:
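S1) Estimating human bounding boxes: an image is input to a bounding box calculation model trained with a neural network, and the model outputs one or more human bounding boxes 11. The number of boxes depends on the number of human figures in the image, and the output boxes can be marked on the image, as shown in FIG. 2. Each human bounding box 11 is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content inside the box. In this embodiment the neural network may be CenterNet, which uses fully convolutional networks (FCN) as its backbone, for example ResNet combined with transposed convolution layers, Hourglass, or Deep Layer Aggregation (DLA), but it is not limited to these. Any technique that can locate human bodies in an image may be used to find all the human bounding boxes 11.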
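The bounding box described in step S1) can be illustrated with a minimal sketch. It assumes the box is encoded as a center point plus a width and height (the text only says the box has a center point, four corner points, and image content); the function names `box_corners` and `crop_box` are hypothetical and are not part of the patent.

```python
import numpy as np


def box_corners(cx, cy, width, height):
    """Return the four corner points of the box as (x, y) image coordinates."""
    x0, x1 = cx - width / 2, cx + width / 2
    y0, y1 = cy - height / 2, cy + height / 2
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def crop_box(image, cx, cy, width, height):
    """Return the image content inside the box, clipped to the image borders."""
    (x0, y0), _, (x1, y1), _ = box_corners(cx, cy, width, height)
    img_h, img_w = image.shape[:2]
    x0, x1 = max(0, int(round(x0))), min(img_w, int(round(x1)))
    y0, y1 = max(0, int(round(y0))), min(img_h, int(round(y1)))
    return image[y0:y1, x0:x1]


image = np.zeros((100, 200, 3), dtype=np.uint8)   # H x W x RGB placeholder image
corners = box_corners(50, 40, 20, 30)
crop = crop_box(image, 50, 40, 20, 30)
print(corners)      # [(40.0, 25.0), (60.0, 25.0), (60.0, 55.0), (40.0, 55.0)]
print(crop.shape)   # (30, 20, 3)
```

In a real pipeline the center and size would come from the bounding box calculation model; here they are fixed numbers so the geometry is easy to check.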
S2) Data normalization: a normalization method is applied so that the pixel color values of the image content in each human bounding box 11 are normally distributed. In this embodiment the normalization method is Z-score normalization, given by Equation (1):

$$p' = \frac{p - \mu}{\sigma} \qquad (1)$$

where μ is the mean and σ is the standard deviation, each defined per RGB channel of the image, p is the original pixel value (the three RGB channel values), and p' is the normalized pixel value. The means and standard deviations of R, G, and B are thus used to normalize the pixel color values of each channel to the range [0,1].
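As a rough illustration of step S2), the Z-score formula p' = (p - μ) / σ can be applied per RGB channel with NumPy. This is a sketch under stated assumptions: μ and σ are computed from the crop itself (the text does not say whether dataset-wide statistics are used), and `zscore_normalize` is a hypothetical helper name. The sketch implements only the Z-score formula, whose output has zero mean and unit variance per channel; any further scaling toward the [0,1] range mentioned in the text is not shown.

```python
import numpy as np


def zscore_normalize(crop, eps=1e-8):
    """Apply p' = (p - mu) / sigma independently to each RGB channel.

    crop: H x W x 3 array holding the pixels inside one bounding box.
    Assumption: mu and sigma are taken from the crop itself.
    """
    pixels = crop.astype(np.float64)
    mu = pixels.mean(axis=(0, 1), keepdims=True)     # per-channel mean
    sigma = pixels.std(axis=(0, 1), keepdims=True)   # per-channel standard deviation
    return (pixels - mu) / (sigma + eps)             # eps guards against flat channels


rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(64, 48, 3))        # synthetic stand-in for a crop
normalized = zscore_normalize(crop)
print(normalized.mean(axis=(0, 1)))  # close to 0 for every channel
print(normalized.std(axis=(0, 1)))   # close to 1 for every channel
```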
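S3) Estimating the image coordinates of the joints of the two-dimensional human skeleton and the relative depth of each joint with respect to a root joint: a first-stage calculation model 21, trained with a deep convolutional neural network as its backbone, receives the normalized image in each human bounding box 11 from step S2) and outputs several three-dimensional heatmaps. From these heatmaps are obtained the image coordinates of a plurality of joints on a two-dimensional human skeleton (each quantized to the range [0,64]) and the estimated relative depth of each joint with respect to the root joint (also quantized to the range [0,64]). The estimation architecture is shown in FIG. 3. The relative depth of a joint is defined as Z-Zroot, where Zroot is the depth of the root joint (which may be obtained by another method or defined as the zero-depth point) and Z is the depth of the joint. Because the output of the first-stage calculation model 21 has been quantized and normalized, the relation between the normalized relative joint depth depth' and the actual relative depth Z-Zroot is given by Equation (2):

Equation (2) [formula not preserved in this text]

where scale is a given scaling factor.

Using Equation (2), the relative depth Z-Zroot of each joint can be computed from the depth' predicted by the first-stage calculation model 21. The root joint corresponds to one joint on the human skeleton. In this embodiment the coordinates of the pelvis are used as the root joint coordinates, but the invention is not limited to this.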
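One plausible way to read joint coordinates and a quantized relative depth off the three-dimensional heatmaps of step S3) is to take the location of the strongest response in each joint's volume. The sketch below assumes one 64 x 64 x 64 heatmap per joint stored in a J x D x H x W array and uses a plain argmax; the text does not specify the decoding rule or the array layout, so both are assumptions, and `decode_heatmaps` is a hypothetical name. Converting depth' back to Z-Zroot through Equation (2) and `scale` is not shown.

```python
import numpy as np


def decode_heatmaps(heatmaps):
    """Return one (x, y, depth') triple per joint from J volumetric heatmaps.

    heatmaps: J x D x H x W array (assumed layout). Each triple is the grid
    index of the strongest response, not a metric coordinate.
    """
    num_joints, depth_bins, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    best = flat.argmax(axis=1)                        # flat index of each peak
    d, y, x = np.unravel_index(best, (depth_bins, height, width))
    return np.stack([x, y, d], axis=1)


heatmaps = np.zeros((17, 64, 64, 64), dtype=np.float32)
heatmaps[0, 32, 10, 20] = 1.0   # joint 0 peaks at depth bin 32, row 10, column 20
coords = decode_heatmaps(heatmaps)
print(coords[0])                # [20 10 32]
```

A differentiable alternative such as a soft-argmax would also fit the description; the hard argmax is used here only because it is the simplest thing to verify.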
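S4) Three-dimensional human skeleton estimation: a second-stage calculation model 31, trained with a fully connected module as its backbone, receives the joint image coordinates and the estimated relative joint depths output by the first-stage calculation model in step S3) and outputs an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joints. The fully connected module is a neural network with a residual-connection architecture, shown in FIG. 4, and the output three-dimensional joint coordinates are relative to the root joint. The three-dimensional joint coordinates of the estimated skeleton model are expressed by linear normalization, with values between -1.0 and 1.0:

Equation (3) [formula not preserved in this text]

Equation (3) relates the normalized three-dimensional joint coordinates to the given three-dimensional coordinates of the root joint. From Equation (3), the three-dimensional coordinates (X, Y, Z) of every joint can be recovered.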
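The second-stage model of step S4) is described only as a fully connected module with residual connections. The forward pass below is a minimal NumPy sketch of that idea, not the architecture of FIG. 4: the hidden width, the single residual block, the ReLU activations, and the random untrained weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
J, HIDDEN = 17, 256   # J = 17 joints as in the text; HIDDEN is an arbitrary choice


def linear(n_in, n_out):
    """Create small random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


w_in, b_in = linear(J * 3, HIDDEN)
w1, b1 = linear(HIDDEN, HIDDEN)
w2, b2 = linear(HIDDEN, HIDDEN)
w_out, b_out = linear(HIDDEN, J * 3)


def second_stage(joints_2d_depth):
    """Map J x 3 inputs (x, y, relative depth) to J x 3 root-relative outputs."""
    h = relu(joints_2d_depth.reshape(-1) @ w_in + b_in)
    # Residual block: two fully connected layers plus a skip connection.
    h = h + relu(relu(h @ w1 + b1) @ w2 + b2)
    return (h @ w_out + b_out).reshape(J, 3)


stage1_output = rng.uniform(0, 64, size=(J, 3))   # stand-in for first-stage output
skeleton_3d = second_stage(stage1_output)
print(skeleton_3d.shape)   # (17, 3)
```

With trained weights the output would be the normalized, root-relative joint coordinates that Equation (3) maps back to (X, Y, Z).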
As steps S1) to S4) of this first embodiment show, the present invention takes the estimated two-dimensional skeleton and the relative joint depths as input and generates true three-dimensional joint coordinates. In estimating joint positions, the present invention is more accurate than currently known techniques.
As shown in FIGS. 5 to 7, the second preferred embodiment of the present invention provides a method for training a three-dimensional human skeleton estimator, mainly to explain how the estimator used in the estimation method of the first embodiment is trained. The estimator comprises a first-stage calculation model 21 and a second-stage calculation model 31. The first-stage calculation model 21 is the one used in step S3) of the first embodiment, and the second-stage calculation model 31 is the one used in step S4) of the first embodiment. The training method of this second embodiment has the following steps:
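SS1) Inputting a first training set: the first training set 19 contains a plurality of training samples 191. Each training sample 191 is an image with a human bounding box 11 (shown in FIG. 2), and each human bounding box 11 is a rectangular region comprising a center point located at the image coordinates of a human reference point, the image coordinates of four corner points, and the image content inside the box. The first training set 19 is shown in FIG. 6. When the first training set 19 is input, the ground-truth output corresponding to each training sample 191 is input as well.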
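SS2) Data normalization: a normalization method is applied so that the pixel color values of the image content in each human bounding box 11 are normally distributed, for example using Equation (1) above, whereby the means and standard deviations of R, G, and B normalize the pixel color values to the range [0,1].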
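SS3) Training the parameters of the estimator of joint image coordinates and relative depths with respect to a root joint: the normalized image in each human bounding box 11 from step SS2) is input to the first-stage calculation model 21, which outputs several three-dimensional heatmaps. From these heatmaps are obtained the image coordinates of a plurality of joints on the two-dimensional human skeleton and the estimated relative depth of each joint with respect to a root joint. The network architecture for this estimation is shown in FIG. 6. The root joint corresponds to one joint on the human skeleton, for example the pelvis of the human skeleton model. A first loss function is used when training the network parameters of the first-stage calculation model 21. During network training, the ground truth of each joint's relative depth is computed as Z-Zroot, where Zroot is the true depth of the root joint and Z is the true depth of the joint. When the first-stage calculation model 21 is trained, the ground-truth relative depths must also be quantized and normalized so that the normalized relative joint depth lies in the range [0,64]; its relation to the actual relative depth Z-Zroot is given by Equation (2) above.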
The first loss function is the mean absolute error (MAE) of the joints, taken over the two-dimensional image coordinates and relative depth values of the joints. It is denoted $L_{first}$ and given by Equation (4):

Equation (4) [formula not preserved in this text]

where $\hat{s}_j = (\hat{x}_j, \hat{y}_j, \widehat{depth}_j)$ is the prediction output by the first-stage calculation model 21 for the j-th joint and $s_j = (x_j, y_j, depth_j)$ is the corresponding ground truth.
SS4) Sequentially inputting the training samples of the first training set: the first loss function is optimized while the training samples 191 of the first training set 19 are input in sequence, yielding the optimized parameters of the first-stage calculation model 21.
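The first loss function optimized in step SS4) is described as a mean absolute error over each joint's (x, y, depth) triple. A minimal sketch follows; averaging over all 3J components is an assumption, since the exact normalization of Equation (4) is not available here, and `first_stage_loss` is a hypothetical name.

```python
import numpy as np


def first_stage_loss(pred, truth):
    """Mean absolute error over J joints, each row being (x, y, depth)."""
    return float(np.abs(pred - truth).mean())


truth = np.array([[10.0, 20.0, 32.0], [12.0, 25.0, 30.0]])
pred = np.array([[11.0, 20.0, 30.0], [12.0, 24.0, 30.0]])
loss = first_stage_loss(pred, truth)
print(loss)   # absolute errors sum to 4 over 6 components
```

In training, this value would be minimized with respect to the first-stage network parameters by a gradient-based optimizer, which the sketch leaves out.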
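SS5) Training the parameters of the three-dimensional human skeleton estimator: using the optimized parameters of the first-stage calculation model 21 obtained in step SS4), the joint image coordinates and the relative depth estimates with respect to the root joint are predicted for each training sample 191 in a loop and input to the second-stage calculation model 31, which outputs an estimated three-dimensional human skeleton model and the three-dimensional coordinates of its joints, the output coordinates being relative to the root joint. The two-dimensional joint image coordinates and relative depths used as input for training the second-stage calculation model 31 need not come from the training samples 191 of the first training set 19 as shown in FIG. 6. They may also come from the training samples 291 of a separate second training set 29, as shown in FIG. 7. The second training set 29 may be identical to, contained in, or different from the first training set 19. A second loss function is used when training the parameters of the second-stage calculation model 31. The second loss function $L_{second}$ consists of the three-dimensional coordinate error of the joints and the error of a set of bone vectors. Each bone vector represents the vector between two joints, whether the bone is real or virtual, and is the bone feature obtained by subtracting a predefined bone start point from the bone end point (the arrow points toward the end point). The second loss function is given by Equation (5):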
Equation (5) [formula not preserved in this text]

The terms of Equation (5) are the predicted three-dimensional coordinates of the j-th joint, the corresponding ground-truth three-dimensional coordinates, the predicted three-dimensional bone vectors, and the ground-truth three-dimensional bone vectors. dest(k) and start(k) are two functions that give the joint indices of the end point and the start point of the k-th bone. The function has two hyperparameters [values not preserved in this text], and K = 22 and J = 17 are the predefined numbers of bones and joints. A further function, given by Equation (6), measures the dissimilarity between two whole sets of bones.

Equation (6) [formula not preserved in this text]
The second loss function is based on the joint positions $S_j$ and the bone vectors $B_k$. The bone vector error $L_{bone}$ lets the neural network learn the spatial relations of the skeleton structure, which strengthens the physical constraints between joints. The joint error $L_{joint}$ lets the neural network learn accurate coordinate positions. Each bone vector is the bone feature obtained by subtracting the predefined bone start point from the bone end point (the arrow points toward the end point); the definition of bone end points and start points is shown in FIG. 8.
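To make the bone-vector idea concrete, the sketch below builds bone vectors as end joint minus start joint and combines a joint term and a bone term into one loss. Several things are assumptions rather than the patent's definitions: the tiny four-joint skeleton and its `BONES` table stand in for the K = 22 bones and J = 17 joints of FIG. 8, both terms use an L1 distance, and the weights `w_joint` and `w_bone` are placeholders for the hyperparameters whose values are not available here.

```python
import numpy as np

# Hypothetical 4-joint skeleton; each pair is (start(k), dest(k)).
BONES = [(0, 1), (1, 2), (0, 3)]


def bone_vectors(joints):
    """Bone k is joints[dest(k)] - joints[start(k)], pointing at the end joint."""
    start = np.array([s for s, _ in BONES])
    dest = np.array([d for _, d in BONES])
    return joints[dest] - joints[start]


def second_stage_loss(pred, truth, w_joint=1.0, w_bone=1.0):
    """Weighted sum of a joint term and a bone term (L1 distances assumed)."""
    l_joint = np.abs(pred - truth).mean()
    l_bone = np.abs(bone_vectors(pred) - bone_vectors(truth)).mean()
    return float(w_joint * l_joint + w_bone * l_bone)


truth = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0, 0.0]])
pred = truth.copy()
pred[2] += [0.0, 0.3, 0.0]            # stretch the bone between joints 1 and 2
bones = bone_vectors(truth)
loss = second_stage_loss(pred, truth)
print(bones)
print(loss)
```

The bone term penalizes the stretched bone even though only one joint moved, which is the sense in which bone vectors add a structural constraint on top of per-joint errors.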
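SS6) Obtaining the optimized parameters: step SS5) is executed in a loop, inputting the joint image coordinates and relative depth estimates with respect to the root joint that correspond to the training samples 291 of the second training set 29, together with the ground-truth three-dimensional human skeleton and joint coordinates at the network output. The second loss function is then optimized to yield the optimized parameters of the second-stage calculation model 31.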
As the above steps show, this second embodiment trains the two-stage three-dimensional human skeleton estimator of the present invention into an effective, operable model. The trained estimator is then used, by the method of the first embodiment, to estimate the three-dimensional human skeleton of a human body in an image.
11: human bounding box; 19: first training set; 191: training sample; 21: first-stage calculation model; 29: second training set; 291: training sample; 31: second-stage calculation model
FIG. 1 is a flowchart of the first preferred embodiment of the present invention.
FIG. 2 is a schematic image of the first preferred embodiment, showing human bounding boxes marked on the image.
FIG. 3 is a schematic diagram of a deep learning network architecture of the first preferred embodiment, showing the architecture of the first-stage calculation model.
FIG. 4 is a schematic diagram of another deep learning network architecture of the first preferred embodiment, showing the architecture of the second-stage calculation model.
FIG. 5 is a flowchart of the second preferred embodiment of the present invention.
FIG. 6 is a schematic diagram of a deep network architecture and its learning in the second preferred embodiment, showing the training of the first-stage calculation model with the first training set.
FIG. 7 is a schematic diagram of a further deep network architecture and its learning in the second preferred embodiment, showing the training of the second-stage calculation model with the two-dimensional skeleton joint coordinates, and the relative depth of each joint with respect to the root joint, derived from a second training set.
FIG. 8 is a schematic diagram of the joints and bones of the human skeleton in the second preferred embodiment.
Claims (16)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109108622A TWI753382B (en) | 2020-03-16 | 2020-03-16 | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202137144A | 2021-10-01 |
| TWI753382B | 2022-01-21 |
Family
ID=79601238
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW109108622A TWI753382B (en) | 2020-03-16 | 2020-03-16 | Method for estimating three-dimensional human skeleton for a human body in an image, three-dimensional human skeleton estimator, and training method for the estimator |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI753382B (en) |
- 2020-03-16: TW109108622A (TW), patent TWI753382B, active
Also Published As
| Publication number | Publication date |
|---|---|
| TWI753382B (en) | 2022-01-21 |