TWI850858B

TWI850858B - Object recognition method of three-dimensional space and computing apparatus

Info

Publication number: TWI850858B
Application number: TW111144155A
Authority: TW
Inventors: 杜宇威; 張鈞凱
Original assignee: 杜宇威; 睿締國際科技股份有限公司
Priority date: 2022-10-06
Filing date: 2022-11-18
Publication date: 2024-08-01
Also published as: CN117893954A; TWM638980U; TW202416228A

Abstract

An object recognition method of three-dimensional (3D) space and a computing apparatus are provided. In the method, multiple sensing points in a 3D space are allocated to multiple areas. The 3D space is established by the sensing points generated by scanning a space. Multiple two-dimensional (2D) images in each area are captured. The 2D images of each area are recognized. One or more objects in the 3D space are determined according to the recognized result of those 2D images of the area. Accordingly, the object in the 3D space could be recognized.

Description

Three-dimensional space object recognition method and computing device

本發明是有關於一種物件偵測技術，且特別是有關於一種三維空間的物件辨識方法及運算裝置。The present invention relates to an object detection technology, and in particular to a three-dimensional object recognition method and computing device.

為了模擬真實空間，可以對真實空間進行掃描以產生看起來像真實空間的模擬空間。模擬空間可實現在諸如遊戲、家居佈置、機器人移動等應用。值得注意的是，雖然現今已廣泛應用二維影像辨識技術，但二維影像辨識難以通盤理解三維空間的物件辨識及標記。In order to simulate the real space, the real space can be scanned to generate a simulated space that looks like the real space. The simulated space can be realized in applications such as games, home decoration, robot movement, etc. It is worth noting that although two-dimensional image recognition technology is widely used today, it is difficult for two-dimensional image recognition to fully understand the object recognition and labeling of three-dimensional space.

有鑑於此，本發明實施例提供一種三維空間的物件辨識方法及運算裝置，將三維空間轉換成二維影像，並據以借助二維影像辨識來實現三維空間中的物件辨識。In view of this, the present invention provides a method and a computing device for object recognition in three-dimensional space, which converts three-dimensional space into two-dimensional images, and realizes object recognition in three-dimensional space by means of two-dimensional image recognition.

本發明實施例的三維空間的物件辨識方法包括(但不僅限於)下列步驟：將三維空間中的多個感測點分配至多個區域。這三維空間是由掃描空間所產生的多個感測點所建立。擷取每一個區域的多個二維影像。辨識每一個區域的那些二維影像。依據那些區域的那些二維影像的辨識結果決定三維空間中的一個或更多個物件。The object recognition method in three-dimensional space of the embodiment of the present invention includes (but is not limited to) the following steps: assigning multiple sensing points in three-dimensional space to multiple regions. The three-dimensional space is established by multiple sensing points generated by scanning space. Capturing multiple two-dimensional images of each region. Recognizing those two-dimensional images of each region. Determining one or more objects in the three-dimensional space based on the recognition results of those two-dimensional images of those regions.

本發明實施例的運算裝置包括記憶體及處理器。記憶體用以儲存程式碼。處理器耦接記憶體。處理器載入程式碼以執行：將三維空間中的多個感測點分配至多個區域，擷取每一個區域的多個二維影像，辨識每一個區域的那些二維影像，並依據那些區域的那些二維影像的辨識結果決定三維空間中的一個或更多個物件。這三維空間是由掃描空間所產生的多個感測點所建立。The computing device of the embodiment of the present invention includes a memory and a processor. The memory is used to store program codes. The processor is coupled to the memory. The processor loads the program codes to execute: assigning multiple sensing points in the three-dimensional space to multiple regions, capturing multiple two-dimensional images of each region, identifying those two-dimensional images of each region, and determining one or more objects in the three-dimensional space based on the identification results of those two-dimensional images of those regions. This three-dimensional space is established by multiple sensing points generated by scanning the space.

基於上述，依據本發明實施例的三維空間的物件辨識方法及運算裝置，將三維空間初步切割成多個區域，再分別對區域擷取二維影像，並依據二維影像的辨識結果辨識三維空間中的物件。藉此，有助於三維空間的物件辨識及理解。Based on the above, according to the method and computing device for object recognition in three-dimensional space of the embodiment of the present invention, the three-dimensional space is initially divided into multiple regions, and then two-dimensional images are captured for each region, and objects in the three-dimensional space are recognized based on the recognition results of the two-dimensional images. This helps to recognize and understand objects in three-dimensional space.

為讓本發明的上述特徵和優點能更明顯易懂，下文特舉實施例，並配合所附圖式作詳細說明如下。In order to make the above features and advantages of the present invention more clearly understood, embodiments are specifically cited below and described in detail with reference to the accompanying drawings.

圖1是依據本發明一實施例的運算裝置10的元件方塊圖。請參照圖1，運算裝置10可以是手機、平板電腦、桌上型電腦、筆記型電腦、伺服器或智能助理裝置。運算裝置10包括(但不僅限於)記憶體11及處理器12。FIG1 is a block diagram of a computing device 10 according to an embodiment of the present invention. Referring to FIG1 , the computing device 10 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a server, or an intelligent assistant device. The computing device 10 includes (but is not limited to) a memory 11 and a processor 12.

記憶體11可以是任何型態的固定或可移動隨機存取記憶體(Radom Access Memory，RAM)、唯讀記憶體(Read Only Memory，ROM)、快閃記憶體(flash memory)、傳統硬碟(Hard Disk Drive，HDD)、固態硬碟(Solid-State Drive，SSD)或類似元件。在一實施例中，記憶體11用以儲存程式碼、軟體模組、資料(例如，感測點、位置資訊、色彩資訊、辨識結果或三維模型)或檔案，其詳細內容待後續實施例詳述。The memory 11 can be any type of fixed or removable random access memory (RAM), read only memory (ROM), flash memory, traditional hard disk drive (HDD), solid-state drive (SSD) or similar components. In one embodiment, the memory 11 is used to store program code, software module, data (e.g., sensing point, location information, color information, recognition result or three-dimensional model) or file, and its details will be described in detail in the subsequent embodiments.

處理器12耦接記憶體11。處理器12可以是中央處理單元(Central Processing Unit，CPU)，或是其他可程式化之一般用途或特殊用途的微處理器(Microprocessor)、數位信號處理器(Digital Signal Processor，DSP)、可程式化控制器、特殊應用積體電路(Application-Specific Integrated Circuit，ASIC)或其他類似元件或上述元件的組合。在一實施例中，處理器12用以執行運算裝置10的所有或部份作業，且可載入並執行記憶體11所儲存的程式碼、軟體模組、檔案及/或資料。在一實施例中，處理器12執行本發明實施例的所有或部分操作。在一些實施例中，記憶體11所儲存的那些軟體模組或程式碼也可能是實體電路所實現。The processor 12 is coupled to the memory 11. The processor 12 may be a central processing unit (CPU), or other programmable general-purpose or special-purpose microprocessor (Microprocessor), digital signal processor (Digital Signal Processor, DSP), programmable controller, application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC) or other similar components or a combination of the above components. In one embodiment, the processor 12 is used to execute all or part of the operations of the computing device 10, and can load and execute the program code, software module, file and/or data stored in the memory 11. In one embodiment, the processor 12 executes all or part of the operations of the embodiment of the present invention. In some embodiments, the software modules or program codes stored in the memory 11 may also be implemented by physical circuits.

下文中，將搭配運算裝置10中的各項元件說明本發明實施例所述之方法。本方法的各個流程可依照實施情形而隨之調整，且並不僅限於此。Hereinafter, the method described in the embodiment of the present invention will be described in conjunction with various components in the computing device 10. The various processes of the method can be adjusted according to the implementation situation, and are not limited thereto.

圖2是依據本發明一實施例的三維空間的物件辨識方法的流程圖。請參照圖2，處理器12將三維空間中的多個感測點分配至多個區域(步驟S210)。具體而言，三維空間是由掃描(實體或虛擬)空間所產生的一個或更多個感測點所建立。感測點可反應三維空間中的物件的存在。例如，光學、雷達或聲音的掃描訊號受物件反射而產生回波，且這回波可用於決定相對於物件的位置、深度及/或方向。在一實施例中，三維空間是由感測點所形成的點雲圖。在其他實施例中，三維空間也可能是其他模型格式。此外，依據不同應用情境，物件可能是家具、家電、植栽、加工設備或裝飾品，也可以是牆、天花板或地板，但不以此為限。FIG2 is a flow chart of a method for object recognition in three-dimensional space according to an embodiment of the present invention. Referring to FIG2 , the processor 12 allocates a plurality of sensing points in the three-dimensional space to a plurality of regions (step S210). Specifically, the three-dimensional space is established by one or more sensing points generated by scanning the (real or virtual) space. The sensing points can reflect the existence of objects in the three-dimensional space. For example, an optical, radar or acoustic scanning signal is reflected by an object to generate an echo, and the echo can be used to determine the position, depth and/or direction relative to the object. In one embodiment, the three-dimensional space is a point cloud image formed by the sensing points. In other embodiments, the three-dimensional space may also be in other model formats. In addition, depending on different application scenarios, the object may be furniture, home appliances, plants, processing equipment or decorations, or it may be a wall, ceiling or floor, but is not limited thereto.

區域分配是用於將感測點分群。相同群組/區域中的感測點的特徵相近。特徵例如是相關於位置、色彩、或其他影像特徵(例如，矩形、邊界(edge)或角(corner))。一般而言，特徵相近的感測點屬於相同物件的可能性較高。Region allocation is used to group sensing points. Sensing points in the same group/region have similar features. Features are, for example, related to position, color, or other image features (e.g., rectangle, edge, or corner). Generally speaking, sensing points with similar features are more likely to belong to the same object.

在一實施例中，處理器12可依據那些感測點在(位置)三維空間中的位置資訊及在色彩空間中的色彩資訊將那些感測點在多維度空間中分群，以決定那些區域。具體而言，這多維度空間包括三維空間及色彩空間兩者的維度。三維空間中的維度例如是空間中相互垂直的三軸。色彩空間中的維度例如是RGB(紅、綠及藍)、CMYK(青、洋紅、黃及黑)或HSV(色相、飽和度、及明度)。In one embodiment, the processor 12 may group the sensing points in a multidimensional space according to the position information of the sensing points in the three-dimensional space and the color information in the color space to determine the regions. Specifically, the multidimensional space includes dimensions of both the three-dimensional space and the color space. The dimensions in the three-dimensional space are, for example, three axes perpendicular to each other in the space. The dimensions in the color space are, for example, RGB (red, green, and blue), CMYK (cyan, magenta, yellow, and black), or HSV (hue, saturation, and value).

每一個區域中的那些感測點之間的位置資訊及色彩資訊在對應空間中的距離小於距離門檻值。也就是說，這距離門檻值是用於評估多個感測點之間在位置資訊及/或色彩資訊是否相近。反應於兩感測點之間的位置資訊及色彩資訊在對應空間中的距離小於距離門檻值，處理器12可將這兩感測點分配到相同群組/區域。反應於兩感測點之間的位置資訊及色彩資訊在對應空間中的距離未小於距離門檻值，處理器12可將這兩感測點分配到不同群組/區域。位置資訊可以是三維空間對應坐標系的座標。例如，距離門檻值是在三維空間中的5公分距離。色彩資訊可以是原色、標準色或屬性的強度。例如，距離門檻值可以是色彩空間中的3階強度。然而，距離門檻值的實際值仍須依據實際需求而定義，且本發明實施例不加以限制。The distance between the position information and color information of the sensing points in each area in the corresponding space is less than the distance threshold. In other words, this distance threshold is used to evaluate whether the position information and/or color information between multiple sensing points are similar. In response to the distance between the position information and color information of two sensing points in the corresponding space being less than the distance threshold, the processor 12 may assign the two sensing points to the same group/area. In response to the distance between the position information and color information of the two sensing points in the corresponding space being not less than the distance threshold, the processor 12 may assign the two sensing points to different groups/areas. The position information may be the coordinates of a three-dimensional space corresponding coordinate system. For example, the distance threshold is a distance of 5 centimeters in three-dimensional space. The color information may be a primary color, a standard color, or the intensity of an attribute. For example, the distance threshold may be a 3rd order intensity in the color space. However, the actual value of the distance threshold still needs to be defined according to actual needs, and the embodiment of the present invention is not limited thereto.

在一實施例中，處理器12可利用諸如k-平均 (K-means)演算法、高斯混合模型(Gaussian Mixture Model，GMM)、平均平移(Mean-Shift)演算法、階層式(Hierarchical)分群法、譜(Spectral)分群演算法、DBSCAN(Density-Based Spatial Clustering of Applications with Noise)演算法或其他聚類/分群演算法，並設定距離參數(例如是前述距離門檻值或相關於距離門檻值的參數)及點數(或稱群組/區域內的點的最少數量)參數。In one embodiment, the processor 12 may utilize a k-means algorithm, a Gaussian mixture model (GMM), a mean-shift algorithm, a hierarchical clustering method, a spectral clustering algorithm, a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, or other clustering/clustering algorithms, and set distance parameters (e.g., the aforementioned distance threshold or parameters related to the distance threshold) and point number (or the minimum number of points in a group/region) parameters.

舉例而言，圖3A是依據本發明一實施例的三維空間TS中的感測點S1~S8的示意圖。請參照圖3A，假設掃描空間後形成包括八個感測點S1~S8的三維空間TS，且感測點S1~S8的位置資訊可由相對於X、Y、Z三軸的距離所定義。圖3B是依據本發明一實施例的區域決定的示意圖。請參照圖3B，假設感測點S1~S3的距離在5公分內且皆為黑色，且感測點S4~S8的距離在8公分內且皆為紅色。因此，處理器12分配感測點S1~S3至區域A1，且分配感測點S4~S8至區域A2。區域的外型例如是以相同區域內的感測點的重心為中心所形成的幾何立體形狀或具象立體形狀。例如，球形、立方體或橄欖球形。然而，區域的外型也可能是不規則立體形狀或是由分群演算法所定義的形狀。For example, FIG3A is a schematic diagram of sensing points S1 to S8 in a three-dimensional space TS according to an embodiment of the present invention. Referring to FIG3A , it is assumed that after scanning the space, a three-dimensional space TS including eight sensing points S1 to S8 is formed, and the position information of the sensing points S1 to S8 can be defined by the distance relative to the three axes X, Y, and Z. FIG3B is a schematic diagram of area determination according to an embodiment of the present invention. Referring to FIG3B , it is assumed that the distance of sensing points S1 to S3 is within 5 centimeters and all are black, and the distance of sensing points S4 to S8 is within 8 centimeters and all are red. Therefore, the processor 12 assigns sensing points S1 to S3 to area A1, and assigns sensing points S4 to S8 to area A2. The shape of the region is, for example, a geometric three-dimensional shape or a concrete three-dimensional shape formed by the center of gravity of the sensing points in the same region, for example, a sphere, a cube, or an olive sphere. However, the shape of the region may also be an irregular three-dimensional shape or a shape defined by a clustering algorithm.

在其他實施例中，若參考更多其他特徵，則可能形成更多維度的多維度空間。此外，圖3A及圖3B所示感測點S1~S8的數量及位置僅是用於範例說明，且感測點的數量及位置仍需視實際應用情境而決定。In other embodiments, if more other features are referenced, a multi-dimensional space with more dimensions may be formed. In addition, the number and position of the sensing points S1-S8 shown in FIG3A and FIG3B are only used for example description, and the number and position of the sensing points still need to be determined according to the actual application scenario.

請參照圖2，處理器12擷取每一個區域的多個二維影像(步驟S220)。具體而言，處理器12將虛擬相機設置於三維空間中的多個觀看位置並朝向每一個區域拍攝，以擷取那些二維影像。2 , the processor 12 captures a plurality of two-dimensional images of each region (step S220 ). Specifically, the processor 12 sets the virtual camera at a plurality of viewing positions in the three-dimensional space and shoots toward each region to capture those two-dimensional images.

在一實施例中，處理器12決定某一區域的參考軸。參考軸例如是穿過區域的中心並與地面垂直(或是與重力方向平行)的假想線。然而，參考軸相對於地面的角度仍可依據需求而改變。處理器12可以這參考軸為軸心且間隔一段距離(例如，10、15或20公分，但也可能依據區域形狀動態調整距離)環繞區域，並透過虛擬相機朝區域擷取對應於多個擷取方向的多個二維影像。例如，每間隔20度，處理器12定義一個擷取方向。因此，依據參考軸旋轉，並每旋轉20度即透過虛擬相機擷取二維影像。然而，擷取方向仍可依據實際需求而變更。In one embodiment, the processor 12 determines a reference axis for a certain area. The reference axis is, for example, an imaginary line that passes through the center of the area and is perpendicular to the ground (or parallel to the direction of gravity). However, the angle of the reference axis relative to the ground can still be changed as needed. The processor 12 can circle the area with this reference axis as the axis and at intervals of a certain distance (for example, 10, 15 or 20 centimeters, but the distance may also be dynamically adjusted according to the shape of the area), and capture multiple two-dimensional images corresponding to multiple capture directions toward the area through a virtual camera. For example, the processor 12 defines a capture direction every 20 degrees. Therefore, the two-dimensional image is captured through the virtual camera every 20 degrees of rotation according to the reference axis. However, the capture direction can still be changed according to actual needs.

舉例而言，圖4是依據本發明一實施例的影像擷取的示意圖。請參照圖4，依據參考軸RS，朝擷取方向CD1可擷取二維影像IM1；朝擷取方向CD2可擷取二維影像IM2；朝擷取方向CD3可擷取二維影像IM3。For example, Fig. 4 is a schematic diagram of image capture according to an embodiment of the present invention. Referring to Fig. 4, according to reference axis RS, a two-dimensional image IM1 can be captured in the capture direction CD1; a two-dimensional image IM2 can be captured in the capture direction CD2; and a two-dimensional image IM3 can be captured in the capture direction CD3.

須說明的是，擷取影像不限於朝軸心的環繞擷取，且應用者可依據實際需求而決定擷取方式。It should be noted that the image capture is not limited to capturing in a circular manner around the axis, and the application user can determine the capture method according to actual needs.

請參照圖2，處理器12辨識每一個區域的那些二維影像(步驟S230)。具體而言，處理器12可基於神經網路的演算法(例如，YOLO(You only look once)、基於區域的卷積神經網路(Region Based Convolutional Neural Networks，R-CNN)、或快速R-CNN(Fast CNN))或是基於特徵匹配的演算法(例如，方向梯度直方圖(Histogram of Oriented Gradient，HOG)、尺度不變特徵轉換(Scale-Invariant Feature Transform，SIFT)、Harr、或加速穩健特徵(Speeded Up Robust Features，SURF)的特徵比對)辨識每一張二維影像中的物件的類型。在一實施例中，二維影像的辨識結果包括物件的類型。在一實施例中，二維影像的辨識結果包括與一個或多個物件類型的相似機率。Referring to FIG. 2 , the processor 12 identifies the two-dimensional images of each region (step S230 ). Specifically, the processor 12 may identify the type of object in each two-dimensional image based on a neural network algorithm (e.g., YOLO (You only look once), Region Based Convolutional Neural Networks (R-CNN), or Fast R-CNN) or a feature matching algorithm (e.g., Histogram of Oriented Gradient (HOG), Scale-Invariant Feature Transform (SIFT), Harr, or Speeded Up Robust Features (SURF) feature matching). In one embodiment, the recognition result of the two-dimensional image includes the type of the object. In one embodiment, the recognition result of the two-dimensional image includes a probability of similarity to one or more object types.

請參照圖2，處理器12依據那些區域的那些二維影像的辨識結果決定三維空間中的一個或更多個物件(步驟S240)。具體而言，若相同區域中的越多二維影像的辨識結果相同，則代表這區域中存在辨識結果所得出的特定物件類型的物件的可能性越高。在一實施例中，處理器12可依據那些區域中的第一區域在多個擷取方向的二維影像的辨識結果決定位於這第一區域的一個或多個物件。例如，若某一個區域中有超過特定數量的二維影像的辨識結果相同，則處理器12決定這區域存在辨識結果中的物件。Referring to FIG. 2 , the processor 12 determines one or more objects in the three-dimensional space based on the recognition results of the two-dimensional images of those regions (step S240). Specifically, if the recognition results of more two-dimensional images in the same region are the same, it means that the possibility of the existence of an object of a specific object type obtained by the recognition result in this region is higher. In one embodiment, the processor 12 may determine one or more objects located in the first region of those regions based on the recognition results of the two-dimensional images of the first region in multiple capture directions. For example, if the recognition results of more than a certain number of two-dimensional images in a certain region are the same, the processor 12 determines that the object in the recognition result exists in this region.

此外，若相鄰區域中的越多二維影像的辨識結果相同，則代表這些區域中存在辨識結果所得出的特定物件類型的物件的可能性越高。In addition, if the recognition results of more two-dimensional images in adjacent areas are the same, it means that the possibility of objects of the specific object type obtained by the recognition results existing in these areas is higher.

在一實施例中，辨識結果所得出的物件包括第一物件及第二物件。反應於第一區域的多個擷取方向的至少二者的二維影像偵測到第一物件及第二物件，處理器12可決定這第一區域中的第一物件及第二物件為相同者的機率。例如，基於神經網路的分類器決定第一物件及第二物件的相似機率，或與特定物件的影像特徵相同的比例。也就是說，在相同區域中，有兩個以上的二維影像偵測到兩個物件，處理器12將進一步判斷兩物件是否為同一者。In one embodiment, the objects obtained from the recognition result include a first object and a second object. In response to the first object and the second object being detected by at least two two-dimensional images in multiple capture directions of the first area, the processor 12 can determine the probability that the first object and the second object in the first area are the same. For example, a classifier based on a neural network determines the similarity probability of the first object and the second object, or the proportion of the same image features as a specific object. In other words, in the same area, if two or more two-dimensional images detect two objects, the processor 12 will further determine whether the two objects are the same.

處理器12比較這機率及機率門檻值，以取得比較結果。這機率門檻值可基於機器學習演算法而更新。而比較結果例如是機率大於機率門檻值、以及機率未大於機率門檻值。The processor 12 compares the probability with the probability threshold to obtain a comparison result. The probability threshold can be updated based on a machine learning algorithm. The comparison result is, for example, that the probability is greater than the probability threshold and that the probability is not greater than the probability threshold.

接著，處理器12可依據比較結果判斷第一物件及第二物件為相同者。若機率大於機率門檻值，則處理器12判斷第一物件及第二物件為相同者。若機率未大於機率門檻值，則處理器12判斷第一物件及第二物件為不同者。例如，處理器12自四張二維影像分別偵測到桌面與四個桌腳，因此處理器12可判斷桌面與四個桌腳為相同者。即，桌。Then, the processor 12 can determine that the first object and the second object are the same according to the comparison result. If the probability is greater than the probability threshold, the processor 12 determines that the first object and the second object are the same. If the probability is not greater than the probability threshold, the processor 12 determines that the first object and the second object are different. For example, the processor 12 detects the table top and the four table legs from the four two-dimensional images respectively, so the processor 12 can determine that the table top and the four table legs are the same. That is, the table.

綜上所述，在本發明實施例的三維空間的物件辨識方法及運算裝置中，對三維空間擷取影像以取得不同擷取方向的二維影像，並依據二維影像的辨識結果決定三維空間中的物件。藉此，可透過二維影像辨識技術理解三維空間中的物件。In summary, in the three-dimensional space object recognition method and computing device of the present invention, images are captured in the three-dimensional space to obtain two-dimensional images in different capture directions, and objects in the three-dimensional space are determined based on the recognition results of the two-dimensional images. In this way, objects in the three-dimensional space can be understood through two-dimensional image recognition technology.

雖然本發明已以實施例揭露如上，然其並非用以限定本發明，任何所屬技術領域中具有通常知識者，在不脫離本發明的精神和範圍內，當可作些許的更動與潤飾，故本發明的保護範圍當視後附的申請專利範圍所界定者為準。Although the present invention has been disclosed as above by the embodiments, they are not intended to limit the present invention. Any person with ordinary knowledge in the relevant technical field can make some changes and modifications without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the attached patent application.

10:運算裝置 11:記憶體 12:處理器 S210~S240:步驟 TS:三維空間 S1~S8:感測點 A1、A2:區域 X、Y、Z:軸 RS:參考軸 CD1~CD3:擷取方向 IM1~IM3:二維影像 10: Computing device 11: Memory 12: Processor S210~S240: Steps TS: Three-dimensional space S1~S8: Sensing points A1, A2: Area X, Y, Z: Axes RS: Reference axis CD1~CD3: Capture direction IM1~IM3: Two-dimensional image

圖1是依據本發明一實施例的運算裝置的元件方塊圖。圖2是依據本發明一實施例的三維空間的物件辨識方法的流程圖。圖3A是依據本發明一實施例的三維空間中的感測點的示意圖。圖3B是依據本發明一實施例的區域決定的示意圖。圖4是依據本發明一實施例的影像擷取的示意圖。 FIG. 1 is a block diagram of a computing device according to an embodiment of the present invention. FIG. 2 is a flow chart of a method for object recognition in a three-dimensional space according to an embodiment of the present invention. FIG. 3A is a schematic diagram of sensing points in a three-dimensional space according to an embodiment of the present invention. FIG. 3B is a schematic diagram of region determination according to an embodiment of the present invention. FIG. 4 is a schematic diagram of image capture according to an embodiment of the present invention.

S210~S240:步驟 S210~S240: Steps

Claims

A method for object recognition in three-dimensional space includes: generating a three-dimensional space based on a plurality of sensing points generated by scanning a space through a processor, wherein the three-dimensional space is in the form of a three-dimensional model; grouping the sensing points in the three-dimensional space according to their positions in the three-dimensional space and color information in a color space through the processor, so as to allocate the sensing points to a plurality of regions according to a grouping result, wherein each sensing point determines its position information through an echo generated by an object reflected by a scanning signal, the color information is the intensity in the color space, and the grouping result is the distance between its position information and its color information in the corresponding space. Two sensing points less than a distance threshold are allocated to an area, and each area has a three-dimensional shape in the three-dimensional space; a virtual camera is set in the three-dimensional space and photographed toward each area by the processor to capture multiple two-dimensional images of each area, including: determining a reference axis of the area; and circling around the reference axis and capturing the two-dimensional images corresponding to multiple capture directions toward the area by the virtual camera; identifying the two-dimensional images of each area by the processor; and determining at least one object in the three-dimensional space by the processor according to the recognition results of the two-dimensional images of the areas.

The object recognition method for three-dimensional space as described in claim 1, wherein the step of assigning the sensing points in the three-dimensional space to the regions includes: grouping the sensing points in a multi-dimensional space according to the position information of the sensing points in the three-dimensional space and the color information in the color space to determine the regions, wherein the multi-dimensional space includes the dimensions of both the three-dimensional space and the color space, and the distance between the position information and the color information of the sensing points in each region in the corresponding space is less than a distance threshold value.

The object recognition method for three-dimensional space as described in claim 1, wherein the step of determining the at least one object in the three-dimensional space based on the recognition results of the two-dimensional images of the regions includes: determining the at least one object located in a first region among the regions based on the recognition results of the two-dimensional images in the capture directions.

The object recognition method for three-dimensional space as described in claim 3, wherein the at least one object includes a first object and a second object, and the step of determining the at least one object in the three-dimensional space based on the recognition results of the two-dimensional images of the regions includes: detecting the first object and the second object based on the two-dimensional images of at least two of the capture directions of the first region, determining a probability that the first object and the second object in the first region are the same; comparing the probability with a probability threshold to obtain a comparison result; and judging that the first object and the second object are the same based on the comparison result.

A computing device includes: a memory for storing a program code; and a processor coupled to the memory and loaded with the program code to execute: generating a three-dimensional space based on a plurality of sensing points generated by scanning a space, wherein the three-dimensional space is in the form of a three-dimensional model; grouping the sensing points in the three-dimensional space according to their positions in the three-dimensional space and color information in a color space, so as to allocate the sensing points to a plurality of regions according to a grouping result, wherein each sensing point determines its position information through an echo generated by an object reflected by a scanning signal, the color information is the intensity in the color space, and the grouping result is its position information. and the distance between the two sensing points and their color information in the corresponding space is less than a distance threshold, and each of the regions has a three-dimensional shape in the three-dimensional space; a virtual camera is set in the three-dimensional space and shoots toward each of the regions to capture multiple two-dimensional images of each of the regions, wherein the processor is further used to: determine a reference axis of the region; and capture the two-dimensional images corresponding to multiple capture directions toward the region with the virtual camera around the reference axis; identify the two-dimensional images of each of the regions; and determine at least one object in the three-dimensional space according to the recognition results of the two-dimensional images of the regions.

The computing device as described in claim 5, wherein the processor is further used to: group the sensing points in a multi-dimensional space according to the position information of the sensing points in the three-dimensional space and the color information in the color space to determine the regions, wherein the multi-dimensional space includes the dimensions of both the three-dimensional space and the color space, and the distance between the position information and the color information of the sensing points in each region in the corresponding space is less than a distance threshold value.

The computing device as described in claim 5, wherein the processor is further used to: determine the at least one object located in a first area among the areas according to the recognition result of the two-dimensional image in the capture directions of the first area.

The computing device as described in claim 7, wherein the at least one object includes a first object and a second object, and the processor is further used to: detect the first object and the second object based on the two-dimensional images of at least two of the capture directions of the first area, determine a probability that the first object and the second object in the first area are the same; compare the probability with a probability threshold to obtain a comparison result; and determine that the first object and the second object are the same according to the comparison result.