
TWI746417B - Computer vision positioning method and device - Google Patents


Info

Publication number
TWI746417B
TWI746417B TW110131247A
Authority
TW
Taiwan
Prior art keywords
positioning
target
photo
photos
location
Prior art date
Application number
TW110131247A
Other languages
Chinese (zh)
Other versions
TW202144745A (en)
Inventor
關鑫
林孝先
Original Assignee
鑫行動股份有限公司
Priority date
Filing date
Publication date
Application filed by 鑫行動股份有限公司 filed Critical 鑫行動股份有限公司
Priority to TW110131247A priority Critical patent/TWI746417B/en
Application granted granted Critical
Publication of TWI746417B publication Critical patent/TWI746417B/en
Publication of TW202144745A publication Critical patent/TW202144745A/en

Landscapes

  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides a computer vision positioning method, comprising: performing posture and motion detection; obtaining a plurality of photos taken at a location; inputting the plurality of photos into a trained deep neural network model for inference, so as to determine whether each photo corresponds to a position within a positioning area; performing positioning according to the plurality of corresponding positions; when one or more corresponding second positions are obtained for one or more second photos taken at another location, performing positioning according to the one or more corresponding second positions to obtain a second positioning position; and determining a corrected positioning position according to the positioning position, the second positioning position, and the posture and motion detection results.

Description

Computer vision positioning method and device

The present invention relates to positioning, and in particular to computer vision positioning.

Positioning and navigation are among the functions people most often perform with electronic devices in daily life. Most modern smartphones have radio navigation capabilities, such as satellite positioning chips, cellular communication chips, and local wireless network chips. Using the navigation signals broadcast by satellites, the cell identifiers broadcast by base stations, and/or the service set identifiers of local wireless network access points, radio signals can be used for navigation.

However, to achieve higher accuracy, or to perform positioning where radio signals cannot be received or suffer from multipath propagation, one must rely on computer images in the shorter-wavelength infrared or visible bands. In other cases, the user's mobile device may be too thin and light to accommodate the radio chips and antennas mentioned above; or the environment may prohibit carrying devices with radio communication chips, as in tightly guarded military camps and factories, while positioning and navigation are still needed.

Accordingly, there is an urgent need for a computer vision positioning mechanism that requires no radio positioning, achieves at least indoor-navigation accuracy, and can be implemented on mobile electronic devices for the user's convenience.

According to an embodiment of the present application, a computer vision positioning method is provided, comprising: performing posture and motion detection; obtaining a plurality of photos taken at a location, wherein each of the plurality of photos corresponds to a different azimuth, the azimuth information being provided by the detection step; inputting the plurality of photos into a trained deep neural network model for inference, so as to determine whether each photo corresponds to a position within a positioning area; and when the inference step yields a plurality of corresponding positions within the positioning area, performing positioning according to the plurality of corresponding positions.
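The claimed flow can be sketched as follows. This is a minimal illustration only: `model`, its `predict()` method returning a `(target_id, confidence)` pair, the threshold value, and the averaging estimator are all my own assumptions, not taken from the patent.

```python
# Illustrative sketch of the positioning flow: photos tagged with azimuths
# are run through a trained model, matches above a confidence threshold are
# looked up in the map data, and the matched positions are combined.
CONFIDENCE_THRESHOLD = 0.8  # assumed value, not specified in the patent

def locate(photos_with_azimuth, model, target_positions):
    """Estimate an (x, y) position from photos taken at one spot.

    photos_with_azimuth: list of (photo, azimuth_deg) pairs, with azimuths
        supplied by the posture and motion detection step.
    target_positions: dict of target_id -> (x, y) from the map data.
    """
    matched = []
    for photo, azimuth in photos_with_azimuth:
        target_id, confidence = model.predict(photo)
        if target_id is not None and confidence >= CONFIDENCE_THRESHOLD:
            matched.append(target_positions[target_id])
    if len(matched) < 2:
        return None  # not enough matches to fix a position
    # Simplest combiner: average the matched target coordinates.
    xs = [p[0] for p in matched]
    ys = [p[1] for p in matched]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Later paragraphs describe richer combiners (midpoints, region intersections, triangle centers); the average above is only the simplest stand-in.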

According to an embodiment of the present application, a computer vision positioning device is provided, comprising: a camera module for taking photos; a posture and motion detection module for performing posture and motion detection; and a central processor module for executing program code stored in non-volatile memory to perform the following steps: causing the posture and motion detection module to perform posture and motion detection; obtaining a plurality of photos taken by the camera module at a location, wherein each of the plurality of photos corresponds to different azimuth information provided by the detection step; inputting the plurality of photos into a trained deep neural network model for inference, so as to determine whether each photo corresponds to a position within a positioning area; and when the inference step yields a plurality of corresponding positions within the positioning area, performing positioning according to the plurality of corresponding positions.

The computer vision positioning device and method provided in the present application offer a positioning mechanism that requires no radio positioning, achieves indoor-navigation accuracy, and can be implemented on mobile electronic devices for the user's convenience.

The present application provides a computer vision positioning device and its positioning method, which can be implemented on a mobile electronic device. Before positioning, the device must first load the data associated with a positioning map in order to perform computer vision positioning within that map. The following explains how to prepare the positioning map data (hereafter simply "map data").

Please refer to FIG. 1, a schematic diagram of computer vision positioning map preparation according to an embodiment of the present application. FIG. 1 shows a bird's-eye view of a positioning area 100. As shown in FIG. 1, the positioning area 100 may be located inside a building, particularly one that radio waves can hardly penetrate. However, the positioning area 100 of the present application is not limited to the inside of a building; it may also be outside a building, or any other area with distinct visual features.

When preparing the positioning map data, multiple photos must be taken within the positioning area 100. The embodiment shown in FIG. 1 includes at least four targets 110, 120, 130, and 140, each of which corresponds to a set of coordinates. The present application does not limit the scale of the coordinate system or its origin. In one embodiment, however, if a local coordinate system is used, its origin or a specific point may be assigned coordinates in a global coordinate system, so that coordinates can be converted between the local and global systems. The global coordinate system may be one of the coordinate systems used by the various satellite positioning systems.
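The local-to-global conversion mentioned above can be sketched as a planar rigid transform. This assumes the local map origin has known global coordinates and the local axes are rotated by an angle `theta` relative to the global axes; the patent only states that such a conversion is possible, so the exact form here is illustrative.

```python
import math

# Sketch of converting a point from the local map coordinate system into a
# global one: rotate by the assumed axis offset, then translate by the
# global coordinates of the local origin.
def local_to_global(point, origin_global, theta_rad):
    x, y = point
    ox, oy = origin_global
    gx = ox + x * math.cos(theta_rad) - y * math.sin(theta_rad)
    gy = oy + x * math.sin(theta_rad) + y * math.cos(theta_rad)
    return (gx, gy)
```

The inverse conversion (global to local) is the same transform with the translation removed first and the rotation applied with `-theta_rad`.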

A target within the positioning area 100 may be a specific object, for example a coiled-dragon vase, an artistic tree-root table, or a Godzilla toy model. A target may be an axially or plane-symmetric object, say an axially symmetric round table or a plane-symmetric square table, or it may be entirely asymmetric. A target may also be a scene, such as a wall, a corridor, or a window view; a scene is a collection of objects with fixed relative positions.

To make a target recognizable, one or more photos of it can first be taken within the positioning area 100, so that an artificial intelligence neural network model can learn the target. When the target is a symmetric object, the photo set formed from these photos may be taken toward the target from within a certain range. For example, when the target is the coiled-dragon vase, the photos in the set may be taken within an area of two meters around the vase.

When the target is a Godzilla toy model, the model can be treated as two target facings at the same position. The first facing is the model as seen from near its head; the second is the model as seen from near its tail. Although they belong to the same physical target, the corresponding photo sets differ, and the recognized facings may differ as well. The first facing may correspond to a first area range near the head, and the second facing to a second area range near the tail.

For example, the target 140 shown in FIG. 1 may have three different facings, each corresponding to an area range: target 140 corresponds to the three area ranges 141, 142, and 143. A person of ordinary skill in the art will understand that these facings and their corresponding area ranges may or may not overlap. For example, area ranges 141 and 142 do not overlap, whereas area ranges 141 and 143 partially overlap.

Targets 120 and 130, being axially symmetric, each have only a single facing and correspond only to area ranges 121 and 131, respectively. Target 110 is a wall window scene; since there is no way to recognize it from behind, it too has only one facing, corresponding to the area range 111.

In the embodiment of FIG. 1, only six area ranges are shown; a person of ordinary skill in the art will understand that this is for simplicity of explanation. To enable positioning in every room or area of the building, at least one target must be chosen in each room or corridor area and a photo set taken for it. For example, the target position 110 in FIG. 1 lies in a space enclosed by three walls; the left and right walls carry no decorations and are hard to recognize, while the upper wall has windows and fixtures well suited to computer vision recognition. The map preparer can therefore take a photo set at target position 110, facing north. The photos in that set may be taken within the area range 111. The present application does not, however, require that every photo in a training set be taken within the corresponding area range. Nor must a target correspond to an area range at all: the facing and the area range of a target are optional data, but a target must correspond to a position on the map.
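The data model described above (mandatory position, optional facing and area range) might be organized as follows. The field names and the rectangle encoding are my own illustration, not from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Target:
    name: str
    position: Tuple[float, float]   # mandatory: position on the map
    facing: Optional[float] = None  # optional: azimuth of the facing, deg
    area: Optional[Rect] = None     # optional: recognition area range

def targets_recognizable_from(targets: List[Target], point) -> List[Target]:
    """Targets whose (optional) area range contains the query point."""
    px, py = point
    hits = []
    for t in targets:
        if t.area is None:
            continue  # no area range recorded for this target
        xmin, ymin, xmax, ymax = t.area
        if xmin <= px <= xmax and ymin <= py <= ymax:
            hits.append(t)
    return hits
```

A non-rectangular area range (circle, sector, polygon), as allowed later in the text, would simply swap in a different containment test.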

Moreover, the area ranges 111 to 143 in the embodiment of FIG. 1 are all rectangular. The present application does not require rectangles: the area range corresponding to a target may be a circle, an ellipse, a sector, or a polygon, and may be symmetric or asymmetric.

Furthermore, the area ranges 111 to 143 in FIG. 1 all contain the positions of their corresponding targets, but the present application does not require the area range to cover the target's position. For example, if the aforementioned coiled-dragon vase is two persons tall, it cannot be recognized from too close; a distance of more than a meter may be needed to recognize the vase.

The photos within each set may differ in shooting range, shooting time, focal length, contrast, brightness, depth of field, and other parameters. For example, photos may be taken at dawn or dusk, at noon, and at night: the window in the scene of target 110 will then show different views, and the indoor lighting will vary accordingly. Likewise, if the scene contains a cup on a table, the cup may sit in different positions at different times; there may even be several cups or none, and a desk lamp in the room may be brighter or dimmer.

The human brain filters out such variations and still recognizes that these photos were taken within the area range corresponding to the target; for a computer to reach the same level of recognition, a larger number of photos is needed for learning. The photos in the present application need not be in the visible band: they may be taken partly or entirely in invisible bands, such as infrared or ultraviolet, or in a narrower visible band, as with black-and-white photos. The band used for the photos must, however, match the camera module on the positioning device, so that both operate in the same band.

After multiple photo sets, and possibly additional area ranges, have been collected, the computer can perform visual learning. Please refer to FIG. 2, a flowchart of a computer vision positioning map preparation method 200 according to an embodiment of the present application. The flow shown in FIG. 2 may be program code executed by an ordinary computer, or by a special computer with hardware acceleration to speed up the learning of the artificial intelligence module.

A person of ordinary skill in the art will understand that there are many types of artificial intelligence modules; the deep neural network (DNN) model is currently among the most common. Deep neural network models in turn vary in their number of layers, connection patterns, weights, and the filters used in each layer. The present application does not restrict the type of deep neural network model: any model that reaches a given accuracy after supervised learning can serve as the deep neural network model used here. Moreover, the present application is not limited to artificial intelligence modules based on deep neural network models.

Step 210: Provide multiple photo sets of multiple targets.

Step 220: Label the positions of the multiple targets, and optionally also label the multiple area ranges corresponding to them.

Step 230: Provide a deep neural network model.

Step 240: Perform supervised training of the deep neural network model on the labeled photo sets. In other words, the purpose of this supervised training is to let the model learn the association between each photo set and its corresponding position and area range.

Step 250: Test the deep neural network model. For example, feed unlabeled pictures into the model for inference and check whether the inference result matches the target in each picture. Multiple tests can be run repeatedly to measure the success rate.

Step 260: Determine whether the success rate exceeds a preset value. If it does, the flow proceeds to step 270; otherwise, the flow returns to step 210, where more pictures may need to be provided to deepen the learning.

Step 270: Training is complete. The deep neural network model can be saved and copied onto the computer vision positioning device.
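The control flow of steps 210 through 270 can be sketched as a train-test loop. The actual training and testing are left as injected callables (`train_once`, `measure_success`), since the patent deliberately does not fix a model type; `max_rounds` is an assumed safety limit of my own.

```python
# Framework-free sketch of the FIG. 2 loop: train, test, and repeat until
# the measured success rate exceeds the preset value.
def prepare_model(train_once, measure_success, threshold, max_rounds=10):
    """Repeat train -> test until success rate exceeds `threshold`.

    Returns True when training completes (step 270), False if the round
    limit is hit, signaling that more photos are needed (back to step 210).
    """
    for _ in range(max_rounds):
        train_once()                       # steps 210-240: label and train
        if measure_success() > threshold:  # steps 250-260: test and judge
            return True                    # step 270: training complete
    return False
```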

Please refer to FIG. 3, a schematic diagram of a computer vision positioning method according to an embodiment of the present application. When a user carries a computer vision positioning device into the positioning area 100 to perform positioning, for example at the actual position 310, the device can take a horizontal 360-degree panoramic photo or video of the surroundings. The device may also take multiple photos at different azimuths from the same or nearby positions. The device may include a posture and motion detection module, which may comprise one of, or any combination of, an accelerometer, an angular accelerometer, a magnetometer, and a gyroscope implemented as micro-electromechanical (MEMS) devices. When the device starts taking the panoramic photo, the data from the posture and motion detection module indicate the azimuth at which the panorama begins.

In one embodiment, the computer vision positioning device may include two or more wide-angle camera lenses, allowing multiple photos with different azimuths to be taken simultaneously.

In the embodiment shown in FIG. 3, the multiple photos with corresponding azimuth information taken by the device at position 310 are fed into the deep neural network model that has completed supervised learning, which infers whether each photo matches one of the four targets 110 to 140.

When the panoramic photo, or the multiple photos at different azimuths, taken at position 310 are compared against photo set 111, the deep neural network model cannot infer photo set 111 because the distance is too great; that is, the confidence of the match against photo set 111 does not exceed the threshold.

When two photos at different azimuths match targets 120 and 140 respectively, the device can look up the two positions of targets 120 and 140 in the map data. In one embodiment, a current position can be taken on the line 330 connecting the two targets 120 and 140; for example, the current position may be the midpoint between targets 120 and 140.

In another embodiment, when targets 120 and 140 carry the additional information of area ranges 121 and 143, the intersection of area ranges 121 and 143, namely area range 320, can be computed, and the current position can be taken to lie within area range 320.

In a further embodiment, it can additionally be determined whether the line 330 passes through the intersection area 320. If so, a current position can be taken on the portion of line 330 lying within area range 320; for example, the point of that portion closest to the midpoint of line 330.
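The two-target refinements above can be sketched for rectangular area ranges. This is a simplified illustration: it intersects the two rectangles and accepts the segment midpoint only when it falls inside the overlap, rather than implementing the full "closest point on the clipped segment" rule.

```python
# Sketch of the region-based two-target case, with area ranges encoded as
# axis-aligned rectangles (xmin, ymin, xmax, ymax).
def intersect_rects(a, b):
    """Intersection of two axis-aligned rectangles, or None if disjoint."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)

def point_in_rect(p, r):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def refine_position(t1, t2, area1, area2):
    """Midpoint of the two matched targets, kept only if it falls inside
    the intersection of their area ranges."""
    mid = ((t1[0] + t2[0]) / 2, (t1[1] + t2[1]) / 2)
    overlap = intersect_rects(area1, area2)
    if overlap is not None and point_in_rect(mid, overlap):
        return mid, overlap
    return None, overlap
```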

In other words, the current position, or the area range containing it, can be found from the positions corresponding to at least two matching targets, and further from the area ranges corresponding to those targets.

Please refer to FIG. 4, a schematic diagram of the triangulation step of the computer vision positioning method according to an embodiment of the present application. When the multiple photos at different azimuths taken at position 450 are compared against targets 410 to 430, the confidence of each match exceeds the threshold. The deep neural network model can therefore infer that the photos taken at position 450 are related to positions 410, 420, and 430, which can form a triangle 470.

In one embodiment, when the photos relate to more than three target positions, the three with the highest confidence or degree of match can be selected first; those three positions then define a triangle 470, which is finally used to obtain a positioning position 450.

In one embodiment, the positioning position 450 may be one of the incenter, circumcenter, centroid, or orthocenter of the triangle 470. In another embodiment, the positioning position 450 may be derived from those four centers, for example the midpoint between any two of them.
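Two of the four triangle centers mentioned above can be computed as follows; the circumcenter and orthocenter follow similarly from standard formulas. This is plain geometry, not anything specific to the patent.

```python
import math

# Centroid and incenter of a triangle of matched target positions.
def centroid(a, b, c):
    return ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)

def incenter(a, b, c):
    # Side lengths opposite each vertex (standard incenter weights).
    la = math.dist(b, c)
    lb = math.dist(a, c)
    lc = math.dist(a, b)
    s = la + lb + lc
    return ((la * a[0] + lb * b[0] + lc * c[0]) / s,
            (la * a[1] + lb * b[1] + lc * c[1]) / s)
```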

In one embodiment, when the targets have corresponding area ranges, three area ranges 411, 421, and 431 can be obtained, and their intersection, namely area range 460, is then computed.

In one embodiment, the intersected area range 460 may further be intersected with the triangle 470, and the current position judged to lie within the region obtained by that intersection.

In one embodiment, when the photos relate to more than three target positions, the positions can first be divided into multiple triangles. A position is then computed for each triangle, for example one of the aforementioned four centers or a position derived from them. Finally, the midpoint of the positions obtained for the triangles is computed. For example, with four related positions, the positions can first be divided into two triangles, the centroid of each triangle computed, and finally the midpoint of the two centroids taken.
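The multi-triangle case above can be sketched as follows. The fan triangulation from the first point and the use of centroids are one simple choice among the options the text allows, not the only one.

```python
# Sketch of the >3-targets case: fan-triangulate the matched positions,
# take each triangle's centroid, then average those centroids.
def centroid(tri):
    (ax, ay), (bx, by), (cx, cy) = tri
    return ((ax + bx + cx) / 3, (ay + by + cy) / 3)

def multi_target_position(points):
    if len(points) < 3:
        raise ValueError("need at least three matched targets")
    # Fan triangulation from the first point.
    tris = [(points[0], points[i], points[i + 1])
            for i in range(1, len(points) - 1)]
    cs = [centroid(t) for t in tris]
    return (sum(c[0] for c in cs) / len(cs),
            sum(c[1] for c in cs) / len(cs))
```

With four positions this yields exactly two triangles and the midpoint of their centroids, matching the worked example in the text.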

After the positioning position is obtained, the map azimuth of a matched target relative to that position can be computed from the map information. The azimuth recorded when the photo that matched the target was taken can then be compared with this map azimuth, so that the shooting azimuth can be identified with the map azimuth, aligning the device's orientation to the map.
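The map azimuth mentioned above is simply the bearing from the fixed position to the matched target. The sketch below measures it clockwise from map north, a common convention that the patent does not pin down.

```python
import math

# Bearing from the positioning position to a matched target, in degrees
# clockwise from map north (the +y axis here, by assumption).
def map_azimuth(position, target):
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0
```

Comparing this value with the azimuth the posture and motion module recorded at shutter time gives the offset needed to align the camera heading to the map.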

Please refer to FIG. 5, a block diagram of a computer vision positioning device 500 according to an embodiment of the present application. In one embodiment, the device 500 may be a mobile device, in particular a wearable device that is convenient for the user to carry. Although not shown in FIG. 5, the device 500 may have a battery for power.

The computer vision positioning device 500 may include a camera module 510, a posture and motion detection module 520, a display module 530, a network module 540, a memory 550, an inference module 560, and a central processor module 570 connecting the above modules. The central processor module 570 controls the device 500 and may include at least one processor core for executing the operating system and application programs stored in non-volatile memory, so as to realize the technical solutions and features of the present application through the remaining modules. A person of ordinary skill in the art, having ordinary knowledge of computer architecture and organization, can understand and implement the above computer architecture.

The camera module 510 may include at least one camera lens assembly, comprising optical elements, photosensitive elements, filters, amplifiers, encoders, and similar components, for transmitting captured photos to the central processor module 570 or storing them directly in the memory 550.

The posture and motion detection module 520 may include one of, or any combination of, an accelerometer, an angular accelerometer, a magnetometer, and a gyroscope implemented as MEMS devices. The module 520 can output information in six degrees of freedom, such as the angular offsets and angular accelerations about the pitch, roll, and yaw axes, and the displacement vectors and accelerations along three orthogonal axes. From this six-degree-of-freedom information, the module 520 can further compute the angular offset and displacement between two consecutive timing points; by accumulating the measurements of two or more timing points, the corresponding changes in posture and motion can be obtained.
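The accumulation of per-interval measurements described above can be sketched in a reduced form. For brevity this takes a 2D slice of the six degrees of freedom (planar displacement plus yaw); the per-interval deltas are assumed to have already been integrated from the raw accelerations by the module.

```python
# Sketch of accumulating per-interval pose deltas from the posture and
# motion detection module into a total displacement and heading change.
def accumulate(samples):
    """samples: iterable of (dx, dy, dyaw_deg) per timing interval."""
    x = y = yaw = 0.0
    for dx, dy, dyaw in samples:
        x += dx
        y += dy
        yaw = (yaw + dyaw) % 360.0  # keep heading in [0, 360)
    return (x, y, yaw)
```

As the text notes, this yields only relative position and azimuth change; an externally supplied initial position and posture is needed to anchor the result in a coordinate system.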

Since the camera module 510 and the posture and motion detection module 520 are fixed within the device 500, the module 520 can record the azimuth and position at the moment a photo is taken. After the device 500 moves, the angle relative to the azimuth at the shooting location and the movement vector between the two positions can be known. However, unless an initial position and posture in some coordinate system are supplied externally, the module 520 can only compute the relative position and relative azimuth change between two timing points.

A person of ordinary skill in the art will understand that the above changes in posture and motion may be computed either by the central processor module 570 or by the posture and motion detection module 520; the present application does not limit which module performs the computation.

The display module 530 can display the photos taken by the camera module 510, as well as the output frames presented by the operating system and the application programs. In one embodiment, the display module 530 may include at least one thin display screen, such as any of various liquid crystal screens. In another embodiment, the display module 530 may include a see-through optical head-mounted display; six waveguide types of head-mounted display are currently known, and the present application does not limit the type of display.

可選的網路模組540可以包含無線通信模組,例如藍芽、區域無線網路或第四代、第五代、第六代行動通信網路模組。網路模組540可以用於下載與更新定位區域的資訊,也就是圖資。The optional network module 540 may include a wireless communication module, such as a Bluetooth, a local wireless network, or a fourth, fifth, and sixth-generation mobile communication network module. The network module 540 can be used to download and update the information of the positioning area, that is, the map data.

記憶體550可以包含揮發性與非揮發性的儲存媒介，用於作為系統記憶體與長期儲存媒介。記憶體550可以存儲圖資與模型，分別供中央處理器模組570與推論模組560使用。The memory 550 may include volatile and non-volatile storage media, serving as system memory and long-term storage media. The memory 550 can store map data and models, for use by the central processing unit module 570 and the inference module 560, respectively.

可選的推論模組560可以是軟體模組，也可以是硬體模組，或者是軟硬體配合的模組。推論模組560用於利用已經監督學習完成的深度神經網路模型進行推論，以便判斷並找出輸入的照片與一組位置與/或方向是否相關，還可能輸出其相關的信心程度，其介於0~100%之間。使用硬體或軟硬體配合實施的推論模組560係用於加速運算。當中央處理器模組570所提供的運算量無法及時供應深度神經網路模型的推論計算時，可以使用硬體或軟硬體配合實施的推論模組560。當中央處理器模組570所提供的運算量足夠深度神經網路模型的推論時，可以不需要用到硬體或軟硬體配合實施的推論模組560。The optional inference module 560 may be a software module, a hardware module, or a module combining software and hardware. The inference module 560 is used to perform inference with a deep neural network model that has completed supervised learning, so as to determine whether an input photo is related to a set of positions and/or directions, and may also output an associated confidence level ranging between 0 and 100%. The inference module 560 implemented in hardware, or in combined software and hardware, is used to accelerate the computation. When the computing capacity provided by the central processing unit module 570 cannot keep up with the inference computation of the deep neural network model, the inference module 560 implemented in hardware or combined software and hardware can be used. When the computing capacity provided by the central processing unit module 570 is sufficient for the inference of the deep neural network model, such an inference module 560 may not be needed.
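How the confidence output might be consumed can be sketched as follows; the dictionary-of-scores interface and the 0.5 threshold are assumptions made for illustration, since the patent does not fix a model architecture or a decision rule:

```python
def interpret_inference(scores, threshold=0.5):
    """Map per-target scores (target id -> probability in [0, 1])
    to a (target, confidence) pair, or (None, confidence) when no
    score clears the threshold. Illustrative only."""
    if not scores:
        return None, 0.0
    target, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return None, confidence
    return target, confidence
```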

請參考圖6所示,其為根據本申請一實施例的電腦視覺定位方法600的一流程示意圖。該電腦視覺定位方法600可以適用於圖5所示的電腦視覺定位裝置500。在一實施例中,該電腦視覺定位方法600可以是由中央處理器模組570所執行的程式。在圖6所示的實施例當中,如果沒有提到任兩個步驟之間的因果關係,則本申請不限定這兩個步驟之間的執行順序。Please refer to FIG. 6, which is a schematic flowchart of a computer vision positioning method 600 according to an embodiment of the present application. The computer vision positioning method 600 can be applied to the computer vision positioning device 500 shown in FIG. 5. In one embodiment, the computer vision positioning method 600 may be a program executed by the central processing unit 570. In the embodiment shown in FIG. 6, if the causal relationship between any two steps is not mentioned, the application does not limit the execution sequence between these two steps.

步驟610：開啟姿態動態偵測。Step 610: Enable attitude and motion detection.

步驟620：取得具有方位角資訊的多張照片。這些照片可以具有不同的方位角，在同一地點同時或分時拍攝。Step 620: Obtain multiple photos with azimuth information. These photos may have different azimuths and may be taken at the same location either simultaneously or at different times.

步驟630：推論每一張照片以得到相關資訊。可以利用中央處理器模組570或推論模組560，對每一張照片進行推論以得到相關資訊。這裡所指的相關資訊是照片中是否判斷某一個目標物。Step 630: Perform inference on each photo to obtain relevant information. The central processing unit module 570 or the inference module 560 can be used to perform inference on each photo to obtain relevant information. The relevant information here is whether a certain target object is recognized in the photo.

步驟640：判斷是否得到多個目標物的資訊。判斷步驟630的推論結果當中，是否有至少一張照片可以對應到一個目標物，從圖資中獲得該目標物的位置，甚至是對應的區域範圍。當沒有兩張照片可以對應到目標物時，流程前進到可選的步驟650或回到步驟610。當至少有兩張照片可以對應到目標物時，可以進行到步驟660。Step 640: Determine whether information on multiple target objects has been obtained. Among the inference results of step 630, it is determined whether at least one photo corresponds to a target object, and the position of that target, or even its corresponding area range, is obtained from the map data. When fewer than two photos correspond to target objects, the process proceeds to optional step 650 or returns to step 610. When at least two photos correspond to target objects, the process can proceed to step 660.

可選的步驟650:提示使用者移位。由於並未找到對應的位置,因此可以透過顯示模組530請使用者移位後再進行一次照相步驟620。Optional step 650: prompt the user to shift. Since the corresponding position is not found, the display module 530 can be used to ask the user to move and perform the photographing step 620 again.

步驟660:判斷得到N個目標物的資訊?N為大於或等於2的正整數。當N為2的時候,流程進到步驟665。當N大於2的時候,流程進到步驟666。Step 660: Determine the information of N targets? N is a positive integer greater than or equal to 2. When N is 2, the flow proceeds to step 665. When N is greater than 2, the flow proceeds to step 666.

步驟665：用該兩個目標物的位置來定位。或者是利用該兩個目標物的位置與其對應的兩個區域範圍來定位。在定位之後，可以利用一個目標物相對於定位位置的地圖方位角，來對應拍攝到該目標物的照片之方位角。就可以得知目前的方位角。本步驟可以參考先前關於圖3的說明，不在此詳述。接著，流程進到步驟670。Step 665: Use the positions of the two target objects to perform positioning, or use the positions of the two target objects together with their corresponding two area ranges. After positioning, the map azimuth of a target object relative to the positioning position can be matched against the azimuth of the photo in which that target was captured, so that the current azimuth can be determined. For this step, reference may be made to the previous description of FIG. 3, which is not repeated here. Then, the flow proceeds to step 670.
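A minimal sketch of this step follows. Two assumptions are made for illustration: the positioning position is taken as the midpoint of the segment joining the two target positions (the patent allows other points on the segment), and map azimuths are measured in degrees clockwise from the +y (north) axis:

```python
import math

def locate_from_two_targets(p1, p2):
    """Place the device on the segment joining two target positions;
    the midpoint is the simplest choice (an assumption here)."""
    return ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0)

def map_azimuth(position, target):
    """Map azimuth in degrees, clockwise from +y, from the estimated
    position toward a target. Matching this against the azimuth
    recorded for the photo of that target yields the device heading."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0
```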

步驟666：用該N個目標物的位置來定位。或者是利用該N個目標物的位置與其對應的N個區域範圍來定位。在定位之後，可以利用一個目標物相對於定位位置的地圖方位角，來對應拍攝到該目標物的照片之方位角。就可以得知目前的方位角。本步驟可以參考先前關於圖4所示的實施例說明。接著，流程進到步驟670。Step 666: Use the positions of the N target objects to perform positioning, or use the positions of the N target objects together with their corresponding N area ranges. After positioning, the map azimuth of a target object relative to the positioning position can be matched against the azimuth of the photo in which that target was captured, so that the current azimuth can be determined. For this step, reference may be made to the previous description of the embodiment shown in FIG. 4. Then, the flow proceeds to step 670.

步驟670：偵測動態以進行導航。由於已經得知步驟620進行時的位置與方位角，而且姿態與動態偵測模組520可偵測當前與步驟620進行時的位移向量與方位角的夾角，因此，可以在相關的地圖上進行定位與導航。接著，流程可以進到可選的步驟680或可選的步驟690。Step 670: Detect motion for navigation. Since the position and azimuth at the time of step 620 are known, and the attitude and motion detection module 520 can detect the displacement vector and the azimuth change between the present and the time step 620 was performed, positioning and navigation can be carried out on the relevant map. Then, the flow can proceed to optional step 680 or optional step 690.

可選的步驟680:顯示擴增實境物件。在一實施例當中,該地圖可以包含擴增實境的資訊。舉例來說,該地圖可以包含某處有某物的訊息。因此,可以指引使用者如何走到該處並且發現該物。Optional step 680: display the augmented reality object. In one embodiment, the map may include augmented reality information. For example, the map can contain information that there is something somewhere. Therefore, the user can be guided how to get there and find the object.

在一實施例當中,該目標物可以對應到一虛擬寶物。判斷照片中具有該目標物時,可以認為得到該虛擬寶物。在另一實施例當中,當定位位置在對應到一虛擬寶物的某一區域內時,可以認為得到該虛擬寶物。該虛擬寶物可以用於兌換一實體的物品,或者是一種折扣券、招待券、兌換券或一種憑證。該虛擬寶物的數量可以具有限制。換言之,只有前幾名到達該區域或找到該目標物者,才能得到該虛擬寶物。可以利用網路模組540和一虛擬寶物的管理伺服器來登記取得名次、虛擬寶物、折扣券、招待券、兌換券等憑證。In one embodiment, the target object may correspond to a virtual treasure. When it is judged that there is the target object in the photo, it can be considered that the virtual treasure is obtained. In another embodiment, when the positioning position is in a certain area corresponding to a virtual treasure, it can be considered that the virtual treasure is obtained. The virtual treasure can be used to exchange for a physical item, or a discount coupon, entertainment coupon, exchange coupon or a voucher. The number of virtual treasures may have a limit. In other words, only the first few who reach the area or find the target can get the virtual treasure. The network module 540 and a virtual treasure management server can be used to register for rankings, virtual treasures, discount coupons, entertainment coupons, redemption coupons and other vouchers.

可選的步驟690:校正定位。由於初次獲得的位置的精度上不理想,可以在獲得定位資訊之後,再進行後續的校正。Optional step 690: correct positioning. Since the accuracy of the position obtained for the first time is not ideal, subsequent corrections can be made after the positioning information is obtained.

請參考圖7所示,為根據本申請一實施例的電腦視覺定位方法的校正步驟690的一流程示意圖。在圖7所示的實施例當中,如果沒有提到任兩個步驟之間的因果關係,則本申請不限定這兩個步驟之間的執行順序。Please refer to FIG. 7, which is a schematic flowchart of the calibration step 690 of the computer vision positioning method according to an embodiment of the present application. In the embodiment shown in FIG. 7, if the causal relationship between any two steps is not mentioned, the application does not limit the execution order between these two steps.

步驟710：取得一或多張照片。此步驟710和步驟620類似，但可以只取得一張照片。Step 710: Obtain one or more photos. This step 710 is similar to step 620, but as few as one photo may be obtained.

步驟720:推論每一張照片以得到相關的資訊。由於步驟710的照片的拍攝位置與方位角是已知的,可以和地圖資訊上的目標物所對應的區域範圍進行比對。例如當拍攝的照片位於某一目標物對應的區域範圍時,可以先推論該照片是否對應到該目標物。如此一來,可能可以節省推論的時間。Step 720: Infer each photo to obtain relevant information. Since the shooting location and azimuth angle of the photo in step 710 are known, it can be compared with the area range corresponding to the target on the map information. For example, when a photo is taken in a region corresponding to a certain target, it can be inferred whether the photo corresponds to the target. In this way, it may save time for inference.
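The shortcut described above — trying first the targets whose area range contains the shooting position — can be sketched as a simple reordering. The rectangular representation of an area range is an assumption, as the patent does not specify its shape:

```python
def prioritized_targets(position, targets):
    """Order candidate targets so that those whose area range contains
    the shooting position are inferred first. Each target is
    (target_id, (xmin, ymin, xmax, ymax)); rectangles are assumed."""
    def contains(box):
        xmin, ymin, xmax, ymax = box
        return xmin <= position[0] <= xmax and ymin <= position[1] <= ymax
    # False sorts before True, so containing targets come first
    return sorted(targets, key=lambda t: not contains(t[1]))
```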

步驟730:判斷是否得到至少一個目標物的資訊。當沒有得到任何一個目標物的資訊時,流程可以回到步驟710。否則,流程進到步驟740。Step 730: Determine whether the information of at least one target object is obtained. When the information of any target is not obtained, the process can return to step 710. Otherwise, the process goes to step 740.

步驟740：判斷得到N個目標物的資訊。當N=1時，進行步驟751。當N=2時，進行步驟752。當N≥3時，進行步驟753。Step 740: Determine that information on N target objects has been obtained. When N=1, proceed to step 751. When N=2, proceed to step 752. When N≥3, proceed to step 753.

步驟751:用該目標物所對應的位置來定位。當只找到一個目標物時,可以判斷該目標物的位置與當前位置的距離是否合理。比方說,當兩個位置的距離在一誤差範圍內時,無須校正目前位置。當兩個位置的距離超過該誤差範圍時,可以計算兩者的中點做為新的位置。Step 751: Use the position corresponding to the target to locate. When only one target is found, it can be judged whether the distance between the position of the target and the current position is reasonable. For example, when the distance between two positions is within an error range, there is no need to correct the current position. When the distance between the two positions exceeds the error range, the midpoint of the two positions can be calculated as the new position.
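The tolerance test and midpoint rule shared by steps 751-753 can be sketched as below; the tolerance value itself is application-specific and not fixed by the patent:

```python
import math

def correct_position(current, observed, tolerance):
    """Keep the current estimate when the newly observed position is
    within tolerance of it; otherwise take the midpoint of the two,
    as steps 751-753 describe."""
    dist = math.hypot(observed[0] - current[0], observed[1] - current[1])
    if dist <= tolerance:
        return current
    return ((current[0] + observed[0]) / 2.0,
            (current[1] + observed[1]) / 2.0)
```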

步驟752：用該兩個目標物所對應的位置來定位。如步驟665，可以利用兩個目標物位置來找到一個新的位置。接著，可以判斷該新的位置與當前位置的距離是否合理。比方說，當兩個位置的距離在一誤差範圍內時，無須校正目前位置。當兩個位置的距離超過該誤差範圍時，可以計算兩者的中點做為新的位置。或者是用新的位置當作當前位置。Step 752: Use the positions corresponding to the two target objects to perform positioning. As in step 665, the two target positions can be used to find a new position. Then, it can be judged whether the distance between the new position and the current position is reasonable. For example, when the distance between the two positions is within an error range, there is no need to correct the current position. When the distance between the two positions exceeds the error range, the midpoint of the two can be calculated as the new position, or the new position can be used as the current position.

步驟753：用該N個目標物所對應的位置來定位。如步驟666，可以利用N個目標物位置來找到一個新的位置。接著，可以判斷該新的位置與當前位置的距離是否合理。比方說，當兩個位置的距離在一誤差範圍內時，無須校正目前位置。當兩個位置的距離超過該誤差範圍時，可以計算兩者的中點做為新的位置。或者是用新的位置當作當前位置。Step 753: Use the positions corresponding to the N target objects to perform positioning. As in step 666, the N target positions can be used to find a new position. Then, it can be judged whether the distance between the new position and the current position is reasonable. For example, when the distance between the two positions is within an error range, there is no need to correct the current position. When the distance between the two positions exceeds the error range, the midpoint of the two can be calculated as the new position, or the new position can be used as the current position.

步驟760:根據舊的定位資訊與新的定位資訊來更新定位資訊。Step 760: Update the positioning information according to the old positioning information and the new positioning information.

請參考圖8所示,其為圖6所示步驟666的一實施例的一流程示意圖。在圖8所示的實施例當中,如果沒有提到任兩個步驟之間的因果關係,則本申請不限定這兩個步驟之間的執行順序。圖8所示的實施例,可以參考圖4的相關說明。Please refer to FIG. 8, which is a schematic flowchart of an embodiment of step 666 shown in FIG. 6. In the embodiment shown in FIG. 8, if the causal relationship between any two steps is not mentioned, the application does not limit the execution order between these two steps. For the embodiment shown in FIG. 8, reference may be made to the related description of FIG. 4.

步驟810:根據四個以上相應的位置,組成相鄰的多個三角形。舉例來說,四個點可以組成相鄰的兩個三角形。五個點可以組成相鄰的三個三角形。Step 810: Form multiple adjacent triangles according to more than four corresponding positions. For example, four points can form two adjacent triangles. Five points can form three adjacent triangles.

步驟820：計算每一個三角形的相關位置。如前所述，該相關位置可以和該三角形的重心、內心、外心、垂心的其中之一相關，也可以和任兩個或兩個以上的心相關。Step 820: Calculate the related position of each triangle. As mentioned above, the related position can be related to one of the triangle's centroid, incenter, circumcenter, or orthocenter, or to any two or more of these centers.

步驟830:根據該多個相關位置得到一定位位置。在一實施例中,該定位位置的第一軸座標值可以是所有相關位置第一軸座標值的平均,該定位位置的第二軸座標值可以是所有相關位置第二軸座標值的平均。Step 830: Obtain a positioning position according to the multiple related positions. In an embodiment, the first axis coordinate value of the positioning position may be the average of the first axis coordinate values of all related positions, and the second axis coordinate value of the positioning position may be the average of the second axis coordinate values of all related positions.
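Steps 810-830 can be sketched as follows. Two choices here are assumptions: the adjacent triangles are formed as a fan from the first point, and each triangle's related position is taken to be its centroid (the patent also allows the incenter, circumcenter, or orthocenter):

```python
def locate_from_many_targets(points):
    """For four or more target positions: form adjacent triangles,
    take each triangle's centroid as its related position, then
    average the centroids axis by axis (steps 810-830)."""
    assert len(points) >= 3
    centroids = []
    for i in range(1, len(points) - 1):
        tri = (points[0], points[i], points[i + 1])  # fan triangulation
        cx = sum(p[0] for p in tri) / 3.0
        cy = sum(p[1] for p in tri) / 3.0
        centroids.append((cx, cy))
    x = sum(c[0] for c in centroids) / len(centroids)
    y = sum(c[1] for c in centroids) / len(centroids)
    return x, y
```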

請參考圖9所示,其為圖6所示步驟666的一實施例的一流程示意圖。在圖9所示的實施例當中,如果沒有提到任兩個步驟之間的因果關係,則本申請不限定這兩個步驟之間的執行順序。圖9所示的實施例,可以參考圖4的相關說明。Please refer to FIG. 9, which is a schematic flowchart of an embodiment of step 666 shown in FIG. 6. In the embodiment shown in FIG. 9, if the causal relationship between any two steps is not mentioned, the application does not limit the execution sequence between these two steps. For the embodiment shown in FIG. 9, reference may be made to the related description of FIG. 4.

步驟910：根據四個以上相應的位置，選出信心程度最高的三個相應的位置。Step 910: From the four or more corresponding positions, select the three corresponding positions with the highest confidence levels.

步驟920:根據該三個相應的位置,組成一個三角形。Step 920: According to the three corresponding positions, a triangle is formed.

步驟930:計算該三角形的相關位置作為一定位位置。Step 930: Calculate the relevant position of the triangle as a positioning position.
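Steps 910-930 reduce to a sort and a centroid; the use of the centroid as the triangle's related position is again an assumption, as the patent permits other triangle centers:

```python
def locate_from_top_three(candidates):
    """Given (position, confidence) pairs for four or more recognized
    targets, keep the three with the highest confidence and return
    the centroid of the triangle they form (steps 910-930)."""
    top = sorted(candidates, key=lambda pc: pc[1], reverse=True)[:3]
    xs = [p[0] for p, _ in top]
    ys = [p[1] for p, _ in top]
    return sum(xs) / 3.0, sum(ys) / 3.0
```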

根據本申請的一實施例，提供一種電腦視覺定位方法，包含：進行姿態與動態的偵測；取得在一地點拍攝的多張照片，其中該多張照片的每一張照片對應到不同的方位角，該方位角資訊是由該偵測步驟所提供；將該多張照片輸入到訓練完成的一深度神經網路模型進行推論，以得到每一張該照片是否相應到一定位區域內的一個位置；以及當該推論步驟得到該定位區域內的多個相應的位置時，根據該多個相應的位置進行定位。According to an embodiment of the present application, a computer vision positioning method is provided, including: performing attitude and motion detection; obtaining multiple photos taken at a location, wherein each of the multiple photos corresponds to a different azimuth angle, and the azimuth information is provided by the detection step; inputting the multiple photos into a trained deep neural network model for inference to obtain whether each photo corresponds to a position in a positioning area; and when the inference step obtains multiple corresponding positions in the positioning area, performing positioning according to the multiple corresponding positions.

在一實施例中，當該推論步驟得到該定位區域內的兩個相應的位置時，更包含：根據該兩個相應的位置，得到通過該兩個相應的位置的一線段；以及根據該線段上的一定位位置以進行定位。In an embodiment, when the inference step obtains two corresponding positions in the positioning area, the method further includes: obtaining a line segment passing through the two corresponding positions according to the two corresponding positions; and performing positioning according to a positioning position on the line segment.

在一實施例中，當該推論步驟得到該定位區域內的三個相應的位置時，更包含：根據該三個相應的位置，組成一個三角形；以及根據該三角形，計算一定位位置以進行定位。In an embodiment, when the inference step obtains three corresponding positions in the positioning area, the method further includes: forming a triangle based on the three corresponding positions; and calculating a positioning position for positioning based on the triangle.

在一實施例中，如果神經網路模型可以提供推論信心程度時，當該推論步驟得到該定位區域內的四個以上相應的位置時，更包含：根據該四個以上相應的位置的推論信心程度，選出三個推論信心程度最高的相應的位置；根據該三個相應的位置，組成一個三角形；以及根據該三角形，計算一定位位置以進行定位。In an embodiment, if the neural network model can provide an inference confidence level, when the inference step obtains four or more corresponding positions in the positioning area, the method further includes: selecting the three corresponding positions with the highest inference confidence according to the inference confidence levels of the four or more corresponding positions; forming a triangle from the three corresponding positions; and calculating a positioning position from the triangle for positioning.

在一實施例中，當該推論步驟得到該定位區域內的四個以上相應的位置時，更包含：根據該四個以上相應的位置，組成多個相鄰的三角形；計算每一個該三角形的相關位置；以及根據多個該相關位置，計算一定位位置以進行定位。In an embodiment, when the inference step obtains four or more corresponding positions in the positioning area, the method further includes: forming a plurality of adjacent triangles from the four or more corresponding positions; calculating a related position of each triangle; and calculating a positioning position for positioning according to the plurality of related positions.

在一實施例中，為了在初次取得位置之後繼續校正，該電腦視覺定位方法更包含：取得在另一地點拍攝的一或多張第二照片；將該一或多張第二照片輸入到該深度神經網路模型進行推論，以得到每一張該第二照片是否相應到該定位區域內的一個第二位置；當相關於該一或多張第二照片的推論步驟得到該定位區域內的一或多個相應的第二位置時，根據該一或多個相應的第二位置進行定位，取得一第二定位位置；以及根據該定位位置、該第二定位位置以及姿態與動態的偵測結果，決定校正後的一定位位置。In an embodiment, in order to continue correction after the position is obtained for the first time, the computer vision positioning method further includes: obtaining one or more second photos taken at another location; inputting the one or more second photos into the deep neural network model for inference to obtain whether each second photo corresponds to a second position in the positioning area; when the inference step related to the one or more second photos obtains one or more corresponding second positions in the positioning area, performing positioning according to the one or more corresponding second positions to obtain a second positioning position; and determining a corrected positioning position according to the positioning position, the second positioning position, and the attitude and motion detection results.

在一實施例中，該推論步驟更包含：將該多張照片或該第二照片輸入到該深度神經網路模型進行推論，以得到每一張照片或該第二照片是否相應到該定位區域內的一目標物；當該張照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該位置；以及當該第二照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該第二位置。In an embodiment, the inference step further includes: inputting the multiple photos or the second photos into the deep neural network model for inference to obtain whether each photo or second photo corresponds to a target object in the positioning area; when a photo corresponds to the target object, searching the map information for the position corresponding to the target object; and when a second photo corresponds to the target object, searching the map information for the second position corresponding to the target object.

在一實施例中,為了支援以虛擬物件取得憑證的功能,該電腦視覺定位方法更包含:判斷該目標物是否對應到一虛擬物件;以及當該目標物對應到該虛擬物件時,透過網路向一伺服器進行登記以取得該虛擬物件的一憑證。In one embodiment, in order to support the function of obtaining a certificate with a virtual object, the computer visual positioning method further includes: determining whether the target object corresponds to a virtual object; A server registers to obtain a certificate of the virtual object.

根據本申請的一實施例，提供一種電腦視覺定位裝置，包含：一攝像模組，用於拍攝照片；一姿態與動態偵測模組，用於進行姿態與動態的偵測；以及一中央處理器模組，用於執行非揮發性記憶體當中存儲的程式碼，用於執行下列步驟：令該姿態與動態偵測模組進行姿態與動態的偵測；取得該攝像模組在一地點拍攝的多張照片，其中該多張照片的每一張照片對應到不同的方位角資訊，該方位角資訊是由該偵測步驟所提供；將該多張照片輸入到訓練完成的一深度神經網路模型進行推論，以得到每一張該照片是否相應到一定位區域內的一個位置；以及當該推論步驟得到該定位區域內的多個相應的位置時，根據該多個相應的位置進行定位。According to an embodiment of the present application, a computer vision positioning device is provided, including: a camera module for taking photos; an attitude and motion detection module for performing attitude and motion detection; and a central processing unit module for executing program code stored in a non-volatile memory to perform the following steps: causing the attitude and motion detection module to perform attitude and motion detection; obtaining multiple photos taken by the camera module at a location, wherein each of the multiple photos corresponds to different azimuth information provided by the detection step; inputting the multiple photos into a trained deep neural network model for inference to obtain whether each photo corresponds to a position in a positioning area; and when the inference step obtains multiple corresponding positions in the positioning area, performing positioning according to the multiple corresponding positions.

在一實施例中，當該推論步驟得到該定位區域內的兩個相應的位置與方位角時，該中央處理器模組更用於：根據該兩個相應的位置，得到通過該兩個相應的位置的一線段；以及根據該線段上的一定位位置以進行定位。In an embodiment, when the inference step obtains two corresponding positions and azimuth angles in the positioning area, the central processing unit module is further configured to: obtain a line segment passing through the two corresponding positions according to the two corresponding positions; and perform positioning according to a positioning position on the line segment.

在一實施例中，當該推論步驟得到該定位區域內的三個相應的位置時，該中央處理器模組更用於：根據該三個相應的位置，組成一個三角形；以及根據該三角形，計算一定位位置以進行定位。In an embodiment, when the inference step obtains three corresponding positions in the positioning area, the central processing unit module is further configured to: form a triangle according to the three corresponding positions; and calculate a positioning position for positioning according to the triangle.

在一實施例中，如果神經網路模型可以提供推論信心程度時，當該推論步驟得到該定位區域內的四個以上相應的位置時，該中央處理器模組更用於：根據該四個以上相應的位置的推論信心程度，選出三個推論信心程度最高的相應的位置；根據該三個相應的位置，組成一個三角形；以及根據該三角形，計算一定位位置以進行定位。In an embodiment, if the neural network model can provide an inference confidence level, when the inference step obtains four or more corresponding positions in the positioning area, the central processing unit module is further configured to: select the three corresponding positions with the highest inference confidence according to the inference confidence levels of the four or more corresponding positions; form a triangle from the three corresponding positions; and calculate a positioning position from the triangle for positioning.

在一實施例中，當該推論步驟得到該定位區域內的四個以上相應的位置時，該中央處理器模組更用於：根據該四個以上相應的位置，組成多個相鄰的三角形；計算每一個該三角形的相關位置；以及根據多個該相關位置，計算一定位位置以進行定位。In an embodiment, when the inference step obtains four or more corresponding positions in the positioning area, the central processing unit module is further configured to: form a plurality of adjacent triangles from the four or more corresponding positions; calculate a related position of each triangle; and calculate a positioning position for positioning according to the plurality of related positions.

在一實施例中，為了在初次取得位置之後繼續校正，該中央處理器模組更用於：取得該攝像模組在另一地點拍攝的一或多張第二照片；將該一或多張第二照片輸入到該深度神經網路模型進行推論，以得到每一張該第二照片是否相應到該定位區域內的一個第二位置；當相關於該一或多張第二照片的推論步驟得到該定位區域內的一或多個相應的第二位置時，根據該一或多個相應的第二位置進行定位，取得一第二定位位置；以及根據該定位位置、該第二定位位置以及姿態與動態的偵測結果，決定校正後的一定位位置。In an embodiment, in order to continue correction after the position is obtained for the first time, the central processing unit module is further configured to: obtain one or more second photos taken by the camera module at another location; input the one or more second photos into the deep neural network model for inference to obtain whether each second photo corresponds to a second position in the positioning area; when the inference step related to the one or more second photos obtains one or more corresponding second positions in the positioning area, perform positioning according to the one or more corresponding second positions to obtain a second positioning position; and determine a corrected positioning position according to the positioning position, the second positioning position, and the attitude and motion detection results.

在一實施例中，該中央處理器模組更用於：將該多張照片或該第二照片輸入到該深度神經網路模型進行推論，以得到每一張照片或該第二照片是否相應到該定位區域內的一目標物；當該張照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該位置；以及當該第二照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該第二位置。In an embodiment, the central processing unit module is further configured to: input the multiple photos or the second photos into the deep neural network model for inference to obtain whether each photo or second photo corresponds to a target object in the positioning area; when a photo corresponds to the target object, search the map information for the position corresponding to the target object; and when a second photo corresponds to the target object, search the map information for the second position corresponding to the target object.

在一實施例中，為了支援以虛擬物件取得憑證的功能，電腦視覺定位裝置，更包含連接到一網路的一網路模組，其中該中央處理器模組更用於：判斷該目標物是否對應到一虛擬物件；以及當該目標物對應到該虛擬物件時，令該網路模組透過該網路向一伺服器進行登記以取得該虛擬物件的一憑證。In an embodiment, in order to support the function of obtaining a certificate through a virtual object, the computer vision positioning device further includes a network module connected to a network, wherein the central processing unit module is further configured to: determine whether the target object corresponds to a virtual object; and when the target object corresponds to the virtual object, cause the network module to register with a server through the network to obtain a certificate of the virtual object.

根據本申請所提供的電腦視覺定位裝置與方法,可以提供一種能夠在不需要無線電定位的電腦視覺定位機制,可以達到室內導航的精度,還能實現在移動電子設備以方便使用者的應用。According to the computer vision positioning device and method provided in the present application, a computer vision positioning mechanism that does not require radio positioning can be provided, the accuracy of indoor navigation can be achieved, and it can also be implemented in mobile electronic equipment to facilitate user applications.

100:定位區域 110:目標物 111:區域範圍 120:目標物 121:區域範圍 130:目標物 131:區域範圍 140:目標物 141:區域範圍 142:區域範圍 143:區域範圍 200:電腦視覺定位地圖準備方法 310:實際位置 320:區域範圍 330:連線 410:目標物 411:區域範圍 420:目標物 421:區域範圍 430:目標物 431:區域範圍 450:位置 460:位置範圍 470:三角形 500:電腦視覺定位裝置 510:攝像模組 520:姿態與動態偵測模組 530:顯示模組 540:網路模組 550:記憶體 560:推論模組 570:中央處理器模組 600:電腦視覺定位方法100: positioning area 110: target 111: Area scope 120: Target 121: regional scope 130: target 131: Area Range 140: target 141: Area Range 142: Area Range 143: Area Range 200: Computer vision positioning map preparation method 310: Actual location 320: area range 330: connection 410: Target 411: Area Range 420: target 421: Area Range 430: target 431: Area Range 450: location 460: position range 470: Triangle 500: Computer vision positioning device 510: camera module 520: Attitude and motion detection module 530: display module 540: Network Module 550: Memory 560: Inference Module 570: CPU module 600: Computer vision positioning method

[圖1]為根據本申請一實施例的電腦視覺定位地圖準備的一示意圖。 [圖2]為根據本申請一實施例的電腦視覺定位地圖準備方法的一流程示意圖。 [圖3]為根據本申請一實施例的電腦視覺定位方法的一示意圖。 [圖4]為根據本申請一實施例的電腦視覺定位方法之三角形定位步驟的一示意圖。 [圖5]為根據本申請一實施例的電腦視覺定位裝置的一方塊示意圖。 [圖6]為根據本申請一實施例的電腦視覺定位方法的一流程示意圖。 [圖7]為根據本申請一實施例的電腦視覺定位方法的校正步驟的一流程示意圖。 [圖8]為圖6所示步驟666的一實施例的一流程示意圖。 [圖9]為圖6所示步驟666的另一實施例的一流程示意圖。 [Figure 1] is a schematic diagram of a computer vision positioning map prepared according to an embodiment of the present application. [Fig. 2] is a schematic flowchart of a method for preparing a computer vision positioning map according to an embodiment of the present application. [Fig. 3] is a schematic diagram of a computer vision positioning method according to an embodiment of the present application. [Fig. 4] is a schematic diagram of the triangle positioning step of the computer vision positioning method according to an embodiment of the present application. [Fig. 5] is a block diagram of a computer vision positioning device according to an embodiment of the present application. [Fig. 6] is a schematic flowchart of a computer vision positioning method according to an embodiment of the present application. [FIG. 7] is a schematic flow chart of the calibration steps of the computer vision positioning method according to an embodiment of the present application. [Fig. 8] is a schematic flowchart of an embodiment of step 666 shown in Fig. 6. [Fig. [Fig. 9] is a schematic flowchart of another embodiment of step 666 shown in Fig. 6. [Fig.

600:電腦視覺定位方法 600: Computer vision positioning method

Claims (6)

一種電腦視覺定位方法，包含：進行姿態與動態的偵測；取得在一地點拍攝的多張照片，其中該多張照片的每一張照片對應到不同的方位角，該方位角資訊是由該偵測步驟所提供；將該多張照片輸入到訓練完成的一深度神經網路模型進行推論，以得到每一張該照片是否相應到一定位區域內的一個位置；當該推論步驟得到該定位區域內的多個相應的位置時，根據該多個相應的位置進行定位取得一定位位置；取得在另一地點拍攝的一或多張第二照片；將該一或多張第二照片輸入到該深度神經網路模型進行推論，以得到每一張該第二照片是否相應到該定位區域內的一個第二位置；當相關於該一或多張第二照片的推論步驟得到該定位區域內的一或多個相應的第二位置時，根據該一或多個相應的第二位置進行定位，取得一第二定位位置；以及根據該定位位置、該第二定位位置以及姿態與動態的偵測結果，決定校正後的一定位位置。A computer vision positioning method, comprising: performing attitude and motion detection; obtaining multiple photos taken at a location, wherein each of the multiple photos corresponds to a different azimuth angle, and the azimuth information is provided by the detection step; inputting the multiple photos into a trained deep neural network model for inference to obtain whether each photo corresponds to a position in a positioning area; when the inference step obtains multiple corresponding positions in the positioning area, performing positioning according to the multiple corresponding positions to obtain a positioning position; obtaining one or more second photos taken at another location; inputting the one or more second photos into the deep neural network model for inference to obtain whether each second photo corresponds to a second position in the positioning area; when the inference step related to the one or more second photos obtains one or more corresponding second positions in the positioning area, performing positioning according to the one or more corresponding second positions to obtain a second positioning position; and determining a corrected positioning position according to the positioning position, the second positioning position, and the attitude and motion detection results.
如請求項1所述的電腦視覺定位方法，其中該推論步驟更包含：將該多張照片或該第二照片輸入到該深度神經網路模型進行推論，以得到每一張照片或該第二照片是否相應到該定位區域內的一目標物；當該張照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該位置；以及當該第二照片相應到該目標物時，在地圖資訊中尋找該目標物相應的該第二位置。The computer vision positioning method according to claim 1, wherein the inference step further comprises: inputting the multiple photos or the second photos into the deep neural network model for inference to obtain whether each photo or second photo corresponds to a target object in the positioning area; when a photo corresponds to the target object, searching the map information for the position corresponding to the target object; and when a second photo corresponds to the target object, searching the map information for the second position corresponding to the target object. 如請求項2所述的電腦視覺定位方法，更包含：判斷該目標物是否對應到一虛擬物件；以及當該目標物對應到該虛擬物件時，透過網路向一伺服器進行登記以取得該虛擬物件的一憑證。The computer vision positioning method according to claim 2, further comprising: determining whether the target object corresponds to a virtual object; and when the target object corresponds to the virtual object, registering with a server through the network to obtain a certificate of the virtual object.
A computer vision positioning device, comprising: a camera module for taking photos; an attitude-and-motion detection module for performing attitude and motion detection; and a central processor module for executing program code stored in non-volatile memory to perform the following steps: causing the attitude-and-motion detection module to perform attitude and motion detection; obtaining multiple photos taken by the camera module at a location, wherein each of the multiple photos corresponds to different azimuth information provided by the detection step; inputting the multiple photos into a trained deep neural network model for inference, so as to determine whether each photo corresponds to a position within a positioning area; when the inference step yields multiple corresponding positions within the positioning area, performing positioning according to the multiple corresponding positions to obtain a positioning position; obtaining one or more second photos taken by the camera module at another location; inputting the one or more second photos into the deep neural network model for inference, so as to determine whether each second photo corresponds to a second position within the positioning area; when the inference step on the one or more second photos yields one or more corresponding second positions within the positioning area, performing positioning according to the one or more corresponding second positions to obtain a second positioning position; and determining a corrected positioning position according to the positioning position, the second positioning position, and the attitude and motion detection results.

The computer vision positioning device according to claim 4, wherein the central processor module is further configured to: input the multiple photos or the second photo into the deep neural network model for inference, so as to determine whether each photo or the second photo corresponds to a target object within the positioning area; when a photo corresponds to the target object, look up the position corresponding to the target object in map information; and when the second photo corresponds to the target object, look up the second position corresponding to the target object in the map information.

The computer vision positioning device according to claim 5, further comprising a network module connected to a network, wherein the central processor module is further configured to: determine whether the target object corresponds to a virtual object; and when the target object corresponds to the virtual object, cause the network module to register with a server through the network to obtain a certificate of the virtual object.
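The claimed flow — inferring a position for each photo, combining several inferred positions into a single fix, and correcting a later fix with attitude-and-motion data — can be sketched roughly as follows. The map structure, the averaging rule, and the weighted fusion are illustrative assumptions; the claims do not specify how the corresponding positions are combined or how the correction is computed.

```python
# Illustrative sketch of the claimed visual-positioning flow.
# The target map, averaging, and fusion rule are assumptions, not the patent's design.

def infer_position(photo_label, target_map):
    """Map a recognized target label to its known coordinates, if any."""
    return target_map.get(photo_label)  # None when the photo matches no target

def locate(photo_labels, target_map):
    """Average the coordinates of all recognized targets into one position fix."""
    hits = [infer_position(p, target_map) for p in photo_labels]
    hits = [h for h in hits if h is not None]
    if not hits:
        return None
    xs, ys = zip(*hits)
    return (sum(xs) / len(hits), sum(ys) / len(hits))

def corrected_position(fix1, fix2, motion_delta, weight=0.5):
    """Blend the second fix with the first fix advanced by dead reckoning."""
    predicted = (fix1[0] + motion_delta[0], fix1[1] + motion_delta[1])
    return (weight * predicted[0] + (1 - weight) * fix2[0],
            weight * predicted[1] + (1 - weight) * fix2[1])

# Hypothetical map of recognized targets to coordinates in the positioning area.
target_map = {"entrance": (0.0, 0.0), "elevator": (4.0, 0.0), "cafe": (2.0, 2.0)}
fix1 = locate(["entrance", "elevator"], target_map)   # first positioning position
fix2 = locate(["cafe"], target_map)                   # second positioning position
final = corrected_position(fix1, fix2, motion_delta=(0.0, 2.0))
```

In this sketch the motion delta plays the role of the attitude-and-motion detection results: the first fix is dead-reckoned forward and then averaged with the second visual fix.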
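The virtual-object step in claims 3 and 6 — checking whether a recognized target maps to a virtual object and, if so, registering with a server to obtain a certificate — might look like the following. The registry, payload fields, and token scheme are all hypothetical; the claims only state that the client registers over the network and receives a certificate.

```python
# Hypothetical sketch of the virtual-object registration step (claims 3 and 6).
# The registry contents, payload fields, and token derivation are assumptions.
import hashlib
import json

VIRTUAL_OBJECTS = {"cafe_sign", "lobby_statue"}  # targets bound to virtual objects

def registration_payload(target, device_id):
    """Payload a client might send when a recognized target is a virtual object."""
    if target not in VIRTUAL_OBJECTS:
        return None  # ordinary target: no registration needed
    return json.dumps({"target": target, "device": device_id}, sort_keys=True)

def issue_certificate(payload):
    """Stand-in for the server side: derive a certificate token from the request."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

payload = registration_payload("cafe_sign", "device-42")
certificate = issue_certificate(payload) if payload else None
```

A real deployment would send the payload over the network module described in claim 6 and let the server return the certificate; the hash here merely stands in for that round trip.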
TW110131247A 2020-04-14 2020-04-14 Computer vision positioning method and device TWI746417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110131247A TWI746417B (en) 2020-04-14 2020-04-14 Computer vision positioning method and device

Publications (2)

Publication Number Publication Date
TWI746417B true TWI746417B (en) 2021-11-11
TW202144745A TW202144745A (en) 2021-12-01

Family

ID=79907872

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110131247A TWI746417B (en) 2020-04-14 2020-04-14 Computer vision positioning method and device

Country Status (1)

Country Link
TW (1) TWI746417B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200907826A (en) * 2007-05-29 2009-02-16 Cognex Corp System and method for locating a three-dimensional object using machine vision
CN105841687A (en) * 2015-01-14 2016-08-10 上海智乘网络科技有限公司 Indoor location method and indoor location system
CN106643738A (en) * 2017-02-17 2017-05-10 深圳大学 Indoor positioning system and method
TW201917407A (en) * 2017-10-20 2019-05-01 陳政諄 Indoor light positioning method and system
CN110207701A * 2019-04-16 2019-09-06 北京旷视科技有限公司 Indoor navigation method, apparatus, terminal device and computer storage medium
CN110245611A * 2019-06-14 2019-09-17 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120472006A (en) * 2025-07-15 2025-08-12 云途信息科技(杭州)有限公司 Road target visual positioning system and method based on urban street scene
CN120472006B (en) * 2025-07-15 2025-09-23 云途信息科技(杭州)有限公司 Road target object visual positioning system and method based on city streetscape

Also Published As

Publication number Publication date
TW202144745A (en) 2021-12-01

Similar Documents

Publication Publication Date Title
EP3752983B1 (en) Methods and apparatus for venue based augmented reality
WO2022077296A1 (en) Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium
US12229991B2 (en) Image display method and apparatus, computer device, and storage medium
CN102446048B (en) Information processing device and information processing method
US12046001B2 (en) Visual localization method, terminal, and server
CN105279750A (en) An equipment display navigation system based on IR-UWB and image moment
CN109520500A Accurate positioning method based on matching terminal-captured images against a street view library, and street view library acquisition method
CN110463165A (en) Information processing apparatus, information processing method, and recording medium
CN108933902B (en) Panoramic image acquisition device, mapping method and mobile robot
WO2021093679A1 (en) Visual positioning method and device
CN112365604A (en) AR equipment depth of field information application method based on semantic segmentation and SLAM
CN108810473A A method and system for realizing GPS mapping of camera view coordinates on a mobile platform
JP2022097830A (en) Display system and server
CN109561282A A method and apparatus for rendering ground-based auxiliary information
CN107607110A A localization method and system based on image and inertial navigation techniques
CN112348887B (en) Terminal posture determination method and related device
JP5572053B2 (en) Orientation estimation apparatus and program
CN111083633A (en) Mobile terminal positioning system and its establishment method, and mobile terminal positioning method
CN105184268A (en) Gesture recognition device, gesture recognition method, and virtual reality system
TWI746417B (en) Computer vision positioning method and device
KR20150077607A (en) Dinosaur Heritage Experience Service System Using Augmented Reality and Method therefor
TWI745932B (en) Computer vision positioning method and device
CN118011318A (en) Portable unmanned aerial vehicle space positioning method and device in manned mode
CN117115243A (en) A method and device for positioning windows on the exterior facade of a building complex based on street view pictures
RU176382U1 (en) INFORMATION GATHERING UNIT FOR A JOINT REALITY DEVICE