TWI879635B

TWI879635B - Six-dimensional object posture tracking method and device, computer-readable recording medium

Info

Publication number: TWI879635B
Application number: TW113126397A
Authority: TW
Inventors: 陳政隆; 春祿阮; 陳仁杰; 曉輝王; 何忠范; 強軍阮
Original assignee: 所羅門股份有限公司
Priority date: 2024-07-15
Filing date: 2024-07-15
Publication date: 2025-04-01

Abstract

A six-dimensional object posture tracking method, in which a two-dimensional object tracking module of a six-dimensional object posture tracking device performs object detection, tracking and classification on an input RGB image and generates a two-dimensional object tracking result for each object in the RGB image. When a six-dimensional object posture estimation module of the device determines that an identification code included in one of the two-dimensional object tracking results does not appear in an object list, the module performs six-dimensional posture estimation on the object in the RGB image corresponding to that identification code, based on the two-dimensional object tracking result to which that identification code belongs, and records the resulting six-dimensional posture estimation result in the object list. A six-dimensional object posture tracking module of the device then performs six-dimensional object posture tracking on the objects in the RGB image according to all six-dimensional posture estimation results contained in the object list.

Description

Six-dimensional object posture tracking method and device, and computer-readable recording medium

The present invention relates to a method for tracking objects in an image, and more particularly to a six-dimensional object posture tracking method for tracking the six-dimensional posture of objects in an image.

Existing six-dimensional (6D) object posture tracking techniques track the 6D posture of objects in an image, which comprises three translational degrees of freedom (displacement along the x, y and z axes) and three rotational degrees of freedom (rotation angles about the x, y and z axes). Such techniques mainly consist of two parts, object posture estimation and object posture tracking, of which posture estimation is the more time-consuming. Consequently, the more objects an image contains, the more time the posture estimation stage takes.
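To make the six degrees of freedom concrete, the following minimal sketch packs three translations and three rotation angles into a 4x4 homogeneous transform. The function names, the use of radians and the rotation order (R = Rz * Ry * Rx) are assumptions chosen for illustration; the source does not specify a parameterization.

```python
import math


def pose_to_matrix(tx, ty, tz, rx, ry, rz):
    """Build a 4x4 homogeneous transform from a 6D pose.

    Assumed convention (not stated in the source): angles in radians,
    rotation composed as R = Rz * Ry * Rx.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx, tx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx, ty],
        [-sy, cy * sx, cy * cx, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(matrix, point):
    """Apply the transform to a 3D point given in the object's model frame."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix[:3])
```

For instance, a quarter turn about the z axis followed by a translation of (1, 2, 3) maps the model point (1, 0, 0) to approximately (1, 3, 3).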

In addition, existing 6D object posture tracking techniques perform 6D posture estimation and tracking on all objects in every input image. Even if the objects in an input image are completely unchanged from the previous image, or only one object has been added, all objects in the input image are estimated and tracked again, which prevents the performance of existing techniques from being effectively improved.

Therefore, an object of the present invention is to provide a six-dimensional object posture tracking method, together with a six-dimensional object posture tracking device and a computer-readable recording medium that implement the method, which shorten the time spent estimating six-dimensional object postures and thereby improve tracking performance.

Accordingly, the present invention provides a six-dimensional object posture tracking method applied to a six-dimensional object posture tracking device. The method includes the following steps.

A two-dimensional object tracking module of the six-dimensional object posture tracking device performs object detection, tracking and classification on an input RGB image and generates a two-dimensional object tracking result for each object in the RGB image. Each two-dimensional object tracking result includes an identification code and a bounding box of the corresponding object.

A six-dimensional object posture estimation module of the six-dimensional object posture tracking device compares the two-dimensional object tracking results with an object list. When it determines that the identification code included in one of the two-dimensional object tracking results does not appear in the object list, the module performs six-dimensional posture estimation on the object in the RGB image corresponding to that identification code, based on the two-dimensional object tracking result to which that identification code belongs and a depth map corresponding to the RGB image, and records the resulting six-dimensional posture estimation result in the object list.

A six-dimensional object posture tracking module of the six-dimensional object posture tracking device performs six-dimensional object posture tracking on the objects in the RGB image according to all six-dimensional posture estimation results contained in the object list and the depth map corresponding to the RGB image.

Furthermore, a six-dimensional object posture tracking device of the present invention that implements the above method includes a storage unit and a processing unit. The storage unit stores an object list. The processing unit includes a two-dimensional object tracking module, a six-dimensional object posture estimation module and a six-dimensional object posture tracking module.

The processing unit executes the two-dimensional object tracking module to perform object detection, tracking and classification on an input RGB image and to generate a two-dimensional object tracking result for each object in the RGB image. Each two-dimensional object tracking result includes an identification code and a bounding box of the corresponding object.

The processing unit executes the six-dimensional object posture estimation module so that it compares the two-dimensional object tracking results with the object list stored in the storage unit. When it determines that the identification code included in one of the two-dimensional object tracking results does not appear in the object list, the six-dimensional object posture estimation module performs six-dimensional posture estimation on the object in the RGB image corresponding to that identification code, based on the two-dimensional object tracking result to which that identification code belongs and a depth map corresponding to the RGB image, and records the resulting six-dimensional posture estimation result in the object list.

The processing unit executes the six-dimensional object posture tracking module so that it performs six-dimensional object posture tracking on the objects in the RGB image according to all six-dimensional posture estimation results contained in the object list and the depth map corresponding to the RGB image.

In some embodiments of the present invention, when the six-dimensional object posture estimation module determines that the identification code included in one of the six-dimensional posture estimation results in the object list does not appear in any of the two-dimensional object tracking results, the module removes the six-dimensional posture estimation result to which that identification code belongs from the object list.

In some embodiments of the present invention, the two-dimensional object tracking module and the six-dimensional object posture estimation module operate asynchronously through a multithreading mechanism.

In some embodiments of the present invention, the six-dimensional object posture estimation module performs six-dimensional posture estimation on the object in the RGB image corresponding to the identification code that does not appear in the object list based on a CAD model of that object.

In some embodiments of the present invention, the six-dimensional object posture tracking module tracks multiple objects in the RGB image simultaneously through a dynamic batch inference mechanism.

In some embodiments of the present invention, each two-dimensional object tracking result further includes a class name and a mask of the corresponding object, and each six-dimensional posture estimation result includes the identification code, class name, two-dimensional posture information, six-dimensional posture information, mesh, mesh tensor, mesh diameter and model center of the corresponding object.
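The two record types listed above could be represented as follows. This is only an illustrative sketch: the field names, the types and the bounding-box layout are assumptions, since the source names the fields but does not define their formats.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class TrackResult2D:
    """One two-dimensional object tracking result (hypothetical layout)."""
    object_id: int                   # identification code, e.g. 1 for "ID1"
    class_name: str                  # e.g. "a", "b", "c"
    bbox: Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max)
    mask: Any = None                 # per-pixel mask; format not specified


@dataclass
class PoseEstimate6D:
    """One six-dimensional posture estimation result (hypothetical layout)."""
    object_id: int
    class_name: str
    pose_2d: Any = None              # two-dimensional posture information
    pose_6d: Any = None              # e.g. a 4x4 transform; format assumed
    mesh: Any = None
    mesh_tensor: Any = None
    mesh_diameter: float = 0.0
    model_center: Tuple[float, float, float] = (0.0, 0.0, 0.0)


# The object list is keyed by identification code so that membership
# checks against the tracking results are constant-time lookups.
ObjectList = Dict[int, PoseEstimate6D]
```

Keying the list by identification code is a design choice made here for the sketch; the source only requires that each recorded result carry its identification code.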

In addition, a computer-readable recording medium of the present invention that implements the above method stores a software program including a two-dimensional object tracking module, a six-dimensional object posture estimation module and a six-dimensional object posture tracking module. When the software program is loaded and executed by a computer device, the computer device carries out the six-dimensional object posture tracking method described above.

The effect of the present invention is as follows. The two-dimensional object tracking module first performs object detection, tracking and classification on the input RGB image to generate a two-dimensional object tracking result for each object in the RGB image. The six-dimensional object posture estimation module then compares the two-dimensional object tracking results with the object list to find the identification codes that are present in the tracking results but absent from the object list, and performs six-dimensional posture estimation only on the objects in the RGB image that correspond to those identification codes. This saves software and hardware computing resources, shortens the computation time of six-dimensional posture estimation, and improves the speed and performance of six-dimensional object posture tracking.

Before the present invention is described in detail, it should be noted that similar elements are denoted by the same reference numerals in the following description.

Referring to FIG. 1, which shows the flow of an embodiment of the six-dimensional object posture tracking method of the present invention, the method is implemented by a six-dimensional object posture tracking device 100 shown in FIG. 2. The six-dimensional object posture tracking device 100 is a computer device that mainly includes a storage unit 1 and a processing unit 2. The storage unit 1 (for example, a memory module) stores an object list 11.

The processing unit 2 is, for example but not limited to, one of or any combination of a central processing unit (CPU), a microprocessor (MPU), a graphics processing unit (GPU) and a tensor processing unit (TPU). The processing unit 2 loads and executes a software program stored in a computer-readable recording medium (for example but not limited to the storage unit 1). The software program includes a two-dimensional object tracking module 21, a six-dimensional object posture estimation module 22 and a six-dimensional object posture tracking module 23.

The two-dimensional object tracking module 21 includes an object detection model that performs object detection, such as but not limited to YOLOv8; an object tracking model that performs object tracking, such as but not limited to BoT-SORT; and an object classification model that performs object classification, such as but not limited to DINOv2. The combination of YOLOv8 and BoT-SORT implements multi-object tracking (MOT): under changing environmental conditions such as lighting changes and occlusion, YOLOv8 detects the target objects in the image and BoT-SORT tracks them. The two-dimensional object tracking module 21 further includes a filtering model for filtering out tiny target objects in the image. In this embodiment, the two-dimensional object tracking module 21 is a model that has been trained in advance with sample data.

The six-dimensional object posture estimation module 22 may, for example but without limitation, use the SAM-6D algorithm. SAM-6D is a zero-shot six-dimensional (6D) posture estimation framework that requires no prior training on sample data of the target object: given a CAD model of any target object, SAM-6D performs instance segmentation and posture estimation of the target object from an RGB image and its corresponding depth map. Therefore, in this embodiment, the six-dimensional object posture estimation module 22 obtains in advance the CAD model of each object in the RGB images on which six-dimensional object posture estimation is to be performed. In addition, the processing unit 2 of this embodiment lets the two-dimensional object tracking module 21 and the six-dimensional object posture estimation module 22 operate asynchronously through a multithreading mechanism to improve overall processing performance.
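The asynchronous arrangement can be pictured as a producer and a consumer connected by a queue. The sketch below is one possible arrangement using Python's standard `threading` and `queue` modules, not the patented implementation; `run_pipeline`, `track_2d` and `estimate_6d` are hypothetical names, and the two callables stand in for the real models.

```python
import queue
import threading


def run_pipeline(frames, track_2d, estimate_6d):
    """Run 2D tracking and 6D estimation on separate threads.

    track_2d(frame) -> iterable of (object_id, bbox)   (hypothetical)
    estimate_6d(frame, bbox) -> pose                    (hypothetical)
    """
    pending = queue.Queue()   # tracking results waiting for the estimator
    object_list = {}          # object_id -> pose; written by one thread only
    stop = object()           # sentinel marking the end of the stream

    def tracker_worker():
        for frame in frames:
            for object_id, bbox in track_2d(frame):
                pending.put((frame, object_id, bbox))
        pending.put(stop)

    def estimator_worker():
        while True:
            item = pending.get()
            if item is stop:
                break
            frame, object_id, bbox = item
            # Only identification codes not yet in the list are estimated.
            if object_id not in object_list:
                object_list[object_id] = estimate_6d(frame, bbox)

    threads = [
        threading.Thread(target=tracker_worker),
        threading.Thread(target=estimator_worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return object_list
```

Because the tracker never waits for the slower estimator, detection of the next frame can proceed while a new object's posture is still being estimated. A real system that also reads the list from a third thread would need a lock around it.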

In this embodiment, the six-dimensional object posture tracking module 23 may, for example but without limitation, use the object tracking technique of the FoundationPose algorithm published by NVIDIA. The six-dimensional object posture tracking module 23 applies a dynamic batch inference mechanism to track multiple objects in the image simultaneously. Dynamic batch inference is a technique used when serving deep learning models for inference. Its main purpose is to let hardware resources of the processing unit 2, such as a GPU or TPU, make full use of their parallel computing capability and strike a better balance between throughput and latency, so that the processing unit 2 is used more efficiently.
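As a rough illustration of batching a varying number of tracked objects, the sketch below groups whatever entries are currently in the object list into batches of at most `max_batch_size` and calls a batched update once per batch. `track_in_batches`, `refine_batch` and the default batch size are assumptions for illustration; they are not FoundationPose or NVIDIA APIs.

```python
def track_in_batches(object_list, refine_batch, max_batch_size=4):
    """Update every pose in object_list using batched calls.

    refine_batch(poses) -> list of updated poses, same order and length
    (hypothetical stand-in for one batched forward pass of a tracker).
    The batch size adapts to however many objects are present.
    """
    ids = list(object_list)
    batch_sizes = []
    for start in range(0, len(ids), max_batch_size):
        batch_ids = ids[start:start + max_batch_size]
        updated = refine_batch([object_list[i] for i in batch_ids])
        for object_id, pose in zip(batch_ids, updated):
            object_list[object_id] = pose
        batch_sizes.append(len(batch_ids))
    return batch_sizes
```

With six tracked objects and a maximum batch size of four, the tracker would be invoked twice, on batches of four and two objects, instead of six times.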

To illustrate the effect of this embodiment, assume that the object list 11 is initially empty before the process of FIG. 1 is performed, and that the processing unit 2 receives a first RGB image P1 and its corresponding first depth map D1, as shown in FIG. 3. As illustrated in FIG. 4, the first RGB image P1 contains three kinds of objects: two cuboid objects 41 and 42, one cube object 43, and two cylindrical objects 44 and 45. Then, in step S1 of FIG. 1, the processing unit 2 executes the two-dimensional object tracking module 21 to perform object detection, tracking and classification on the first RGB image P1, thereby generating a first tracking result image T1 as shown in FIG. 3 (which may be temporarily stored in the storage unit 1) and a two-dimensional object tracking result T11 for each of the objects 41 to 45 in the first RGB image P1 (which may also be temporarily stored in the storage unit 1).

As shown in FIG. 4, the first tracking result image T1 marks, for each of the objects 41 to 45 detected in the first RGB image P1, an identification code (for example ID1, ID2, ID3, ID4, ID5), a class name (for example, a for the cuboid objects 41 and 42, b for the cube object 43, and c for the cylindrical objects 44 and 45), a bounding box (the dashed box delimiting the object in FIG. 4) and a mask (a translucent layer delimiting the surface area of the object, not shown). Each two-dimensional object tracking result T11 includes the identification code, the class name, the bounding box and the mask of the corresponding object.

Next, in step S2 of FIG. 1, the processing unit 2 executes the six-dimensional object posture estimation module 22, which compares the object list 11 read from the storage unit 1 with the two-dimensional object tracking results T11 and determines whether the identification code included in any (one or more) of the two-dimensional object tracking results T11 does not appear in the object list 11. If not, all identification codes included in the two-dimensional object tracking results T11 already appear in the object list 11, and the process proceeds directly to step S4. If so, at least one of the two-dimensional object tracking results T11 includes an identification code that does not appear in the object list 11. In that case, in step S3 of FIG. 1, the six-dimensional object posture estimation module 22 performs six-dimensional posture estimation on the object in the first RGB image P1 corresponding to that identification code, based on the two-dimensional object tracking result T11 to which that identification code belongs and the first depth map D1 corresponding to the first RGB image P1, records the resulting six-dimensional posture estimation result, which includes the identification code, in the object list 11, and then proceeds to step S4.

At this point the object list 11 is empty, so in step S3 the six-dimensional object posture estimation module 22 performs six-dimensional posture estimation on all objects 41 to 45 in the first RGB image P1 and records the six-dimensional posture estimation result T12 of each object in the object list 11. When step S3 is complete, the object list 11 therefore holds the six-dimensional posture estimation results T12 of all objects 41 to 45 in the first RGB image P1, as shown in FIG. 5. Each six-dimensional posture estimation result T12 includes the identification code (for example ID1, ID2, ...), class name (for example a, b, c), two-dimensional posture information, six-dimensional posture information, mesh, mesh tensor, mesh diameter and model center of the corresponding object.
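Steps S2 and S3 amount to a membership test followed by estimation of only the missing entries. The following minimal sketch assumes the tracking results and the object list are both dictionaries keyed by identification code; `estimate_new_objects` and `estimate_6d` are hypothetical names, and the estimator is passed in as a callable rather than tied to any particular model.

```python
def estimate_new_objects(tracking_results, object_list, estimate_6d):
    """Steps S2 and S3: estimate poses only for unseen identification codes.

    tracking_results: dict of object_id -> 2D tracking result
    object_list:      dict of object_id -> 6D estimation result (mutated)
    estimate_6d(object_id, track) -> 6D estimation result   (hypothetical)
    Returns the identification codes that were newly estimated.
    """
    # Step S2: which tracked identification codes are absent from the list?
    new_ids = [oid for oid in tracking_results if oid not in object_list]
    # Step S3: run the expensive estimator for those objects only.
    for oid in new_ids:
        object_list[oid] = estimate_6d(oid, tracking_results[oid])
    return new_ids
```

With an empty object list and tracking results for ID1 to ID5, all five objects are estimated; on a later frame that adds ID6, only ID6 is.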

Next, in step S4, the six-dimensional object posture estimation module 22 determines whether the identification code included in any (one or more) of the six-dimensional posture estimation results T12 in the object list 11 does not appear in any of the two-dimensional object tracking results T11. If so, the object list 11 holds a six-dimensional posture estimation result for an object that is no longer present in the first RGB image P1; in step S5, the six-dimensional object posture estimation module 22 removes the six-dimensional posture estimation result to which that identification code belongs from the object list 11, and the process then proceeds to step S6. At this stage, the identification codes of the six-dimensional posture estimation results T12 recorded in the object list 11 all correspond to identification codes in the two-dimensional object tracking results T11, so the determination in step S4 is negative and the process proceeds directly to step S6.
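Steps S4 and S5 are the mirror image of the check for new objects: entries whose identification code is no longer tracked are dropped. A minimal sketch, again assuming dictionaries keyed by identification code and a hypothetical function name:

```python
def remove_missing_objects(tracking_results, object_list):
    """Steps S4 and S5: drop entries whose object is no longer tracked.

    tracking_results: dict of object_id -> 2D tracking result
    object_list:      dict of object_id -> 6D estimation result (mutated)
    Returns the identification codes that were removed.
    """
    # Step S4: which listed identification codes are absent from tracking?
    stale_ids = [oid for oid in object_list if oid not in tracking_results]
    # Step S5: remove their estimation results from the list.
    for oid in stale_ids:
        del object_list[oid]
    return stale_ids
```

The stale identification codes are collected before deletion so that the dictionary is not modified while it is being iterated.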

In step S6, the processing unit 2 executes the six-dimensional object posture tracking module 23, which performs six-dimensional posture tracking on the objects in the first RGB image P1 according to all six-dimensional posture estimation results T12 contained in the object list 11 and the first depth map D1 corresponding to the first RGB image P1.

Next, as shown in FIG. 3, when a second RGB image P2 and its corresponding second depth map D2 are input to the processing unit 2, step S1 is repeated: the two-dimensional object tracking module 21 performs object detection, tracking and classification on the second RGB image P2 and generates a second tracking result image T2 and a two-dimensional object tracking result T21 for each object in the second RGB image P2, as shown in FIG. 3 and FIG. 6. As shown in FIG. 4 and FIG. 6, in this embodiment the second RGB image P2 contains one more cube object 46 than the first RGB image P1, so the two-dimensional object tracking results T21 include a two-dimensional object tracking result for the newly added cube object 46 (with identification code ID6 and class name b).

Then, in step S2, the six-dimensional object posture estimation module 22 compares the two-dimensional object tracking results T21 with the object list 11 and determines that the identification code (ID6) included in one of the two-dimensional object tracking results T21 (namely, that of the newly added cube object 46) does not appear among the six-dimensional posture estimation results T12 in the object list 11. Step S3 is therefore performed: based on the two-dimensional object tracking result of the newly added cube object 46 and the second depth map D2 corresponding to the second RGB image P2, six-dimensional posture estimation is performed on the newly added cube object 46 in the second RGB image P2, producing a six-dimensional posture estimation result T22 that includes the identification code (ID6) of the cube object 46. The six-dimensional posture estimation result T22 is then added to the object list 11, as shown in FIG. 7.

The six-dimensional object posture estimation module 22 then performs step S4. After determining that the identification codes of the six-dimensional posture estimation results T12 and T22 in the object list 11 all appear in the two-dimensional object tracking results T21, the process proceeds to step S6, in which the six-dimensional object posture tracking module 23 performs six-dimensional posture tracking on the objects 41 to 46 in the second RGB image P2 according to the six-dimensional posture estimation results T12 and T22 in the object list 11 and the second depth map D2 corresponding to the second RGB image P2.

It can thus be seen that this embodiment refers to the two-dimensional object tracking results of the RGB image and performs six-dimensional posture estimation only on the object or objects whose identification codes are included in the two-dimensional object tracking results but do not appear in the object list 11 (that is, the objects newly added to the RGB image relative to the previous RGB image). It does not estimate again the posture of objects whose identification codes are included both in the two-dimensional object tracking results and in the object list 11. As a result, during six-dimensional object posture tracking of RGB images, the time spent estimating six-dimensional object postures is shortened and hardware computing resources are saved, which improves the performance of six-dimensional object posture tracking.

Furthermore, as shown in FIG. 3, when a third RGB image P3 and its corresponding third depth map D3 are input to the processing unit 2, step S1 is repeated again: the two-dimensional object tracking module 21 performs object detection, tracking and classification on the third RGB image P3 and generates a third tracking result image T3 and a two-dimensional object tracking result T31 for each object in the third RGB image P3. As shown in FIG. 8, in this embodiment the third RGB image P3 lacks the cuboid object 42 that was present in the second RGB image P2 shown in FIG. 6. The two-dimensional object tracking results T31 therefore do not include a two-dimensional object tracking result for the cuboid object 42.

Then, in step S2, the six-dimensional object posture estimation module 22 compares the two-dimensional object tracking results T31 with the object list 11 shown in FIG. 7 and determines that the identification codes included in the two-dimensional object tracking results T31 all appear among the six-dimensional posture estimation results T12 and T22 in the object list 11, so the process proceeds to step S4. In step S4, the six-dimensional object posture estimation module 22 determines that, among the six-dimensional posture estimation results T12 and T22 in the object list 11 shown in FIG. 7, the identification code (ID2) of the result corresponding to the cuboid object 42 does not appear in the two-dimensional object tracking results T31. Step S5 is therefore performed, and the six-dimensional posture estimation result corresponding to the cuboid object 42 is removed from the object list 11, as shown in FIG. 9. Then, in step S6, the six-dimensional object posture tracking module 23 performs six-dimensional posture tracking on the objects 41 and 43 to 46 in the third RGB image P3 according to the six-dimensional posture estimation results T12 and T22 in the object list 11 of FIG. 9 and the third depth map D3 corresponding to the third RGB image P3.

It follows that every RGB image input to the processing unit 2 goes through step S1. After the determination in step S2, either step S3 is performed followed by step S4, or step S4 is performed directly. After the determination in step S4, either step S5 is performed followed by step S6, or step S6 is performed directly. In addition, as shown in FIG. 10, steps S4 and S5 may be swapped with steps S2 and S3; that is, after step S1, steps S4 and S5 may be executed first, followed by steps S2 and S3.
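The per-image flow and the alternative ordering of FIG. 10 can be sketched in Python as follows. This is only an illustration under stated assumptions: `process_frame`, `estimate`, `track`, the `prune_first` flag, and the string poses are hypothetical names and stand-ins, and the output of step S1 is assumed to be already available as a set of identification codes.

```python
def process_frame(tracked_ids, object_list, estimate, track, prune_first=False):
    """One pass of steps S2 to S6 for a single RGB image (sketch).

    tracked_ids: identification codes produced by step S1 (assumed given).
    estimate / track: hypothetical stand-ins for the estimation and
    tracking modules.
    """
    def add_new():      # S2 -> S3: estimate only objects with unseen IDs
        for obj_id in tracked_ids:
            if obj_id not in object_list:
                object_list[obj_id] = estimate(obj_id)

    def drop_stale():   # S4 -> S5: remove entries whose IDs disappeared
        for obj_id in [i for i in object_list if i not in tracked_ids]:
            del object_list[obj_id]

    # FIG. 1 order (S2, S3, S4, S5) or FIG. 10 order (S4, S5, S2, S3).
    steps = (drop_stale, add_new) if prune_first else (add_new, drop_stale)
    for step in steps:
        step()

    # S6: track every object that is recorded in the object list.
    return {obj_id: track(pose) for obj_id, pose in object_list.items()}


def estimate(obj_id):
    return f"initial-pose-{obj_id}"      # placeholder for 6D estimation


def track(pose):
    return pose.replace("initial", "tracked")  # placeholder for 6D tracking


# Hypothetical ID sets for three consecutive images.
frames = [{1, 2, 3}, {1, 2, 3, 4}, {1, 3, 4}]
list_a, list_b = {}, {}
for ids in frames:
    out_a = process_frame(ids, list_a, estimate, track)
    out_b = process_frame(ids, list_b, estimate, track, prune_first=True)
```

In this sketch both orderings leave the object list in the same state after each image, because adding unseen identification codes and removing vanished ones act on disjoint sets of entries.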

In addition, because the six-dimensional object posture estimation module 22 generates the six-dimensional posture estimation results T12, T22 in the object list 11 with reference to the two-dimensional object tracking results T11, T21, T31 generated in advance by the two-dimensional object tracking module 21, the six-dimensional object posture tracking module 23 tracks object postures in the RGB image more accurately when it relies on the six-dimensional posture estimation results T12, T22 in the object list 11.
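One way a two-dimensional object tracking result could inform the estimation is by restricting the depth map to the object's bounding box. The following Python sketch shows that idea only; the specification does not state that the bounding box is used this way, and `crop_depth`, `rough_object_depth`, the `(x_min, y_min, x_max, y_max)` box layout, and the sample depth values are all assumptions.

```python
def crop_depth(depth_map, bbox):
    """Return the depth values inside bbox = (x_min, y_min, x_max, y_max).
    The max coordinates are exclusive (an assumed convention)."""
    x_min, y_min, x_max, y_max = bbox
    return [row[x_min:x_max] for row in depth_map[y_min:y_max]]


def rough_object_depth(depth_map, bbox):
    """Upper-median of the valid (non-zero) depth values inside the box,
    usable as a coarse distance hint for one object."""
    values = sorted(v for row in crop_depth(depth_map, bbox) for v in row if v > 0)
    return values[len(values) // 2] if values else None


# Hypothetical 4x4 depth map (arbitrary units); 0 marks missing depth.
depth_map = [
    [0,  0,  0, 0],
    [0, 80, 82, 0],
    [0, 81, 83, 0],
    [0,  0,  0, 0],
]
bbox = (1, 1, 3, 3)  # hypothetical bounding box from a 2D tracking result

region = crop_depth(depth_map, bbox)
hint = rough_object_depth(depth_map, bbox)
```

Limiting the estimator's input to `region` keeps background depth values out of the computation for that object, which is one plausible reason a tracker-guided estimate would be more reliable.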

In summary, in the above embodiment the two-dimensional object tracking module 21 performs object detection, tracking and classification on the input RGB image and generates a two-dimensional object tracking result corresponding to each object in the RGB image. The six-dimensional object posture estimation module 22 compares the two-dimensional object tracking results with the object list to find the identification codes that correspond to objects in the RGB image and that are present in the two-dimensional object tracking results but absent from the object list, and performs six-dimensional posture estimation only on the objects in the RGB image that correspond to those identification codes. Compared with existing six-dimensional object posture tracking technology, this saves software and hardware computing resources, shortens the computing time of six-dimensional posture estimation, and improves the speed and performance of six-dimensional object posture tracking, thereby achieving the effect and purpose of the present invention.
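The saving described above can be made concrete by counting how often six-dimensional posture estimation would run. The Python sketch below compares the selective scheme with a hypothetical baseline that re-estimates every detected object on every image; the baseline, the function name `count_estimations`, and the identification codes are assumptions for illustration and are not taken from the specification.

```python
def count_estimations(frames):
    """Count estimator invocations over a sequence of per-image ID sets.

    selective: estimate only IDs missing from the object list.
    naive:     hypothetical baseline that estimates every ID on every image.
    """
    known = set()          # IDs currently recorded in the object list
    selective = naive = 0
    for ids in frames:
        naive += len(ids)
        selective += len(ids - known)
        known = set(ids)   # after adding new IDs and removing stale ones
    return selective, naive


# Hypothetical ID sets: four objects, then two more appear, then one leaves.
frames = [{1, 2, 3, 4}, {1, 2, 3, 4, 5, 6}, {1, 3, 4, 5, 6}]
selective, naive = count_estimations(frames)
```

For this sequence the selective scheme runs the estimator 6 times against 15 for the baseline, and the gap widens with every additional image in which no new object appears.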

However, the foregoing is merely an embodiment of the present invention and shall not be used to limit the scope of implementation of the present invention. All simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by the patent of the present invention.

S1~S6: Steps
100: Six-dimensional object posture tracking device
1: Storage unit
11: Object list
2: Processing unit
21: Two-dimensional object tracking module
22: Six-dimensional object posture estimation module
23: Six-dimensional object posture tracking module
P1: First RGB image
D1: First depth map
P2: Second RGB image
D2: Second depth map
P3: Third RGB image
D3: Third depth map
T1: First tracking result image
T2: Second tracking result image
T3: Third tracking result image
T11, T21, T31: Two-dimensional object tracking results
T12, T22: Six-dimensional posture estimation results
41, 42: (Rectangular) objects
43, 46: (Cube) objects
44, 45: (Cylinder) objects

Other features and effects of the present invention will be clearly shown in the embodiments described with reference to the drawings, in which:
FIG. 1 shows the main process steps of an embodiment of the six-dimensional object posture tracking method of the present invention;
FIG. 2 is a schematic block diagram of the software and hardware included in an embodiment of the six-dimensional object posture tracking device of the present invention;
FIG. 3 illustrates the tracking results generated when the two-dimensional object tracking module of this embodiment tracks RGB images;
FIG. 4 illustrates the first RGB image of this embodiment and its corresponding first tracking result image;
FIG. 5 is a schematic diagram illustrating how this embodiment updates the object list according to the two-dimensional object tracking results of the first RGB image;
FIG. 6 illustrates the second RGB image of this embodiment and its corresponding second tracking result image;
FIG. 7 is a schematic diagram illustrating how this embodiment updates the object list according to the two-dimensional object tracking results of the second RGB image;
FIG. 8 illustrates the third RGB image of this embodiment and its corresponding third tracking result image;
FIG. 9 is a schematic diagram illustrating how this embodiment updates the object list according to the two-dimensional object tracking results of the third RGB image; and
FIG. 10 shows an alternative sequence of process steps of this embodiment.

S1~S6: Steps

Claims

A six-dimensional object posture tracking method, applied to a six-dimensional object posture tracking device, the method comprising: a two-dimensional object tracking module of the six-dimensional object posture tracking device performing object detection, tracking and classification on an input RGB image, and generating a two-dimensional object tracking result corresponding to each object in the RGB image, each of the two-dimensional object tracking results including an identification code and a bounding box of the corresponding object; a six-dimensional object posture estimation module of the six-dimensional object posture tracking device comparing the two-dimensional object tracking results with an object list, and, when it is determined that the identification code included in one of the two-dimensional object tracking results does not appear in the object list, the six-dimensional object posture estimation module performing six-dimensional posture estimation on the object in the RGB image corresponding to the identification code that does not appear in the object list, based on the two-dimensional object tracking result to which the identification code that does not appear in the object list belongs and a depth map corresponding to the RGB image, and recording the corresponding six-dimensional posture estimation result in the object list; and a six-dimensional object posture tracking module of the six-dimensional object posture tracking device performing six-dimensional object posture tracking on the objects in the RGB image according to all six-dimensional posture estimation results contained in the object list and the depth map corresponding to the RGB image.

A six-dimensional object posture tracking method as described in claim 1, wherein, when the six-dimensional object posture estimation module determines that the identification code included in one of the six-dimensional posture estimation results in the object list does not appear in one of the two-dimensional object tracking results, the six-dimensional object posture estimation module removes the six-dimensional posture estimation result to which the identification code that does not appear in one of the two-dimensional object tracking results belongs from the object list.

A six-dimensional object posture tracking method as described in claim 1, wherein the two-dimensional object tracking module and the six-dimensional object posture estimation module perform asynchronous operations through a multi-thread mechanism.

A six-dimensional object posture tracking method as described in claim 1, wherein the six-dimensional object posture estimation module performs six-dimensional posture estimation on the object in the RGB image corresponding to the identification code that does not appear in the object list based on a CAD model corresponding to the object in the RGB image corresponding to the identification code that does not appear in the object list.

A six-dimensional object posture tracking method as described in claim 1, wherein the six-dimensional object posture tracking module simultaneously tracks multiple objects in the RGB image through a dynamic batch inference mechanism.

A six-dimensional object posture tracking method as described in claim 1, wherein each of the two-dimensional object tracking results also includes the corresponding class name and mask of each of the objects, and each of the six-dimensional posture estimation results includes the corresponding identification code, class name, two-dimensional posture information, six-dimensional posture information, grid, grid tensor, grid diameter and model center of the object.

A six-dimensional object posture tracking device, comprising: a storage unit storing an object list; and a processing unit comprising a two-dimensional object tracking module, a six-dimensional object posture estimation module and a six-dimensional object posture tracking module; wherein the processing unit executes the two-dimensional object tracking module to perform object detection, tracking and classification on an input RGB image, and generates a two-dimensional object tracking result corresponding to each object in the RGB image, each of the two-dimensional object tracking results comprising an identification code and a bounding box of the corresponding object; the processing unit executes the six-dimensional object posture estimation module to compare the two-dimensional object tracking results with the object list stored in the storage unit, and, when it is determined that the identification code included in one of the two-dimensional object tracking results does not appear in the object list, the six-dimensional object posture estimation module performs six-dimensional posture estimation on the object in the RGB image corresponding to the identification code that does not appear in the object list, based on the two-dimensional object tracking result to which the identification code that does not appear in the object list belongs and a depth map corresponding to the RGB image, and records a corresponding six-dimensional posture estimation result in the object list; and the processing unit executes the six-dimensional object posture tracking module so that it performs six-dimensional object posture tracking on the objects in the RGB image according to all six-dimensional posture estimation results contained in the object list and the depth map corresponding to the RGB image.

A six-dimensional object posture tracking device as described in claim 7, wherein, when the six-dimensional object posture estimation module determines that the identification code included in one of the six-dimensional posture estimation results in the object list does not appear in one of the two-dimensional object tracking results, the six-dimensional object posture estimation module removes the six-dimensional posture estimation result to which the identification code that does not appear in one of the two-dimensional object tracking results belongs from the object list.

The six-dimensional object posture tracking device as described in claim 7, wherein the two-dimensional object tracking module and the six-dimensional object posture estimation module perform asynchronous operations through a multi-thread mechanism.

A six-dimensional object posture tracking device as described in claim 7, wherein the six-dimensional object posture estimation module performs six-dimensional posture estimation on the object in the RGB image corresponding to the identification code that does not appear in the object list based on a CAD model corresponding to the object in the RGB image corresponding to the identification code that does not appear in the object list.

A six-dimensional object posture tracking device as described in claim 7, wherein the six-dimensional object posture tracking module simultaneously tracks multiple objects in the RGB image through a dynamic batch inference mechanism.

A six-dimensional object posture tracking device as described in claim 7, wherein each of the two-dimensional object tracking results also includes the corresponding class name and mask of each of the objects, and each of the six-dimensional posture estimation results includes the corresponding identification code, class name, two-dimensional posture information, six-dimensional posture information, grid, grid tensor, grid diameter and model center of the object.

A computer-readable recording medium stores a software program including a two-dimensional object tracking module, a six-dimensional object posture estimation module and a six-dimensional object posture tracking module. When the software program is loaded and executed by a computer device, the computer device can complete the six-dimensional object posture tracking method described in any one of claims 1 to 6.