
TWI748685B - Method and server for optimizing joint object detection model - Google Patents


Info

Publication number
TWI748685B
Authority
TW
Taiwan
Prior art keywords
detection
object detection
model
classification
image data
Prior art date
Application number
TW109135316A
Other languages
Chinese (zh)
Other versions
TW202215307A (en)
Inventor
黃文宏
Original Assignee
中華電信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中華電信股份有限公司
Priority to TW109135316A
Application granted
Publication of TWI748685B
Publication of TW202215307A

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a method and server for optimizing a joint object detection model, mainly applied to accelerate and effectively produce artificial intelligence (AI) training image annotation data, AI image detection models, and AI image classification models. The present invention can effectively reduce the manpower and time costs of the image data management, screening, labeling, and review stages.

Description

Method and server for optimizing joint object detection model

The present invention relates to a technique for training object detection models, and particularly to a method and server for optimizing a joint object detection model.

In image-surveillance applications, deep learning combined with computer vision has become the prevailing approach, and supervised learning is the most widely used: every sample in the model training process must be correctly labeled in advance. Producing a deep learning model with high prediction accuracy, however, requires collecting a large number of annotated training images. Most current annotation methods rely on manual labeling: for each target object in the image data, the annotator must draw the object's bounding region and assign the associated object category name one by one, then review the annotations afterwards to ensure their accuracy. This annotation workflow is time-consuming and labor-intensive.

As artificial-intelligence image-recognition scenarios and application requirements keep growing in the image-surveillance field, a deep learning model trained only on a fixed annotated image set lacks data diversity and is often ill-suited to real-world scenes and applications. Building a dedicated image set for a real-world scene, in order to classify specific categories, requires considerable annotation and verification manpower to maintain. And if one or more new object feature categories must later be added to an existing application model, the cost of adding annotations to the original training image set is also considerable.

In addition, most conventional automatic annotation methods aim to guarantee the objectivity and accuracy of the annotation results, but an automatic annotation model still requires its outputs to be screened during training, so a large manual review cost remains. Moreover, the search for the best model mostly relies on existing training data sets; in practice, however, different environments and requirements call for detecting or classifying different categories, and these categories are diverse and intricate, so a training data set containing actual on-site data must be built or expanded before a suitable model can classify the specific categories.

Furthermore, when a model that has already reached its prediction-accuracy target faces a request to change the detected objects, it is difficult, under limited schedule and cost, to re-examine the entire huge training image set, label the new object categories one by one, and retrain the model to provide a detection model covering the newly required object-feature combination. The practical workaround is therefore to chain or parallelize existing object detection models to cover the required object categories. Although this initially avoids the cost of re-annotation and retraining and allows fast deployment, it demands high system resources and runs inefficiently: online system resources are limited, expanding graphics processing unit (GPU) inference hardware is expensive, and the resulting maintenance costs and pressure from frequent customer complaints make it a poor long-term solution.

In view of this, the present invention provides a method and server for optimizing a joint object detection model, which can be used to solve the above technical problems.

The present invention provides a method for optimizing a joint object detection model, including: obtaining an image database, wherein the image database includes first unlabeled image data and a plurality of annotated image sets, each annotated image set corresponding to a single object class; training a plurality of single-object detection models and a plurality of single-object classification models with the annotated image sets, wherein the annotated image sets correspond one-to-one to the single-object detection models and one-to-one to the single-object classification models; obtaining a plurality of specified detection object categories and, accordingly, a plurality of external object detection models, wherein each external object detection model is used to detect objects belonging to at least one of the specified detection object categories; in response to determining that each detection object category corresponds to one of the annotated image sets, finding, among the single-object detection models, a plurality of specific single-object detection models corresponding to the specified detection object categories; finding, among the annotated image sets, a plurality of specific annotated image sets corresponding to the specified detection object categories, and training a joint object detection model accordingly; detecting the first unlabeled image data with the specific single-object detection models, the external object detection models, and the joint object detection model to produce a plurality of first object detection results, wherein the first object detection results correspond to a first specified detection object category among the specified detection object categories; classifying each first object detection result with a first single-object classification model, corresponding to the first specified detection object category, among the single-object classification models, to obtain a first object classification result for each first object detection result; adaptively correcting the first object detection results based on each first object detection result and its corresponding first object classification result, and adding the corrected first object detection results to a first annotated image set, corresponding to the first specified detection object category, among the annotated image sets; and retraining, based on the first annotated image set, a first single-object detection model corresponding to the first specified detection object category among the single-object detection models, the first single-object classification model, and the joint object detection model.

The present invention provides a server for optimizing a joint object detection model, including a storage circuit and a processor. The storage circuit stores a plurality of modules. The processor is coupled to the storage circuit and accesses the modules to perform the following steps: obtaining an image database, wherein the image database includes first unlabeled image data and a plurality of annotated image sets, each annotated image set corresponding to a single object class; training a plurality of single-object detection models and a plurality of single-object classification models with the annotated image sets, wherein the annotated image sets correspond one-to-one to the single-object detection models and one-to-one to the single-object classification models; obtaining a plurality of specified detection object categories and, accordingly, a plurality of external object detection models, wherein each external object detection model is used to detect objects belonging to at least one of the specified detection object categories; in response to determining that each detection object category corresponds to one of the annotated image sets, finding, among the single-object detection models, a plurality of specific single-object detection models corresponding to the specified detection object categories; finding, among the annotated image sets, a plurality of specific annotated image sets corresponding to the specified detection object categories, and training a joint object detection model accordingly; detecting the first unlabeled image data with the specific single-object detection models, the external object detection models, and the joint object detection model to produce a plurality of first object detection results, wherein the first object detection results correspond to a first specified detection object category among the specified detection object categories; classifying each first object detection result with a first single-object classification model, corresponding to the first specified detection object category, among the single-object classification models, to obtain a first object classification result for each first object detection result; adaptively correcting the first object detection results based on each first object detection result and its corresponding first object classification result, and adding the corrected first object detection results to a first annotated image set, corresponding to the first specified detection object category, among the annotated image sets; and retraining, based on the first annotated image set, a first single-object detection model corresponding to the first specified detection object category among the single-object detection models, the first single-object classification model, and the joint object detection model.

100: server

102: storage circuit

104: processor

310: image database

311: integrated annotated image set

GA~GH, GN: annotated image sets

A0~E0, N0: single-object detection models

A′~E′, N′: single-object classification models

324: joint detection model

341~343: detection results

S210~S290: steps

FIG. 1 is a schematic diagram of a server for optimizing a joint object detection model according to an embodiment of the present invention.

FIG. 2 is a flowchart of a method for optimizing a joint object detection model according to an embodiment of the present invention.

FIG. 3 is a diagram of an application scenario according to an embodiment of the present invention.

Broadly speaking, to solve the problems faced by the above artificial-intelligence image-recognition applications, the present invention combines mechanisms for building single-object annotated image sets, single-object detection models, and single-object classification models, an image prediction-annotation mechanism, and a multi-model prediction-comparison mechanism to achieve better application goals.

Regarding the feature of expanding annotated image sets, the present invention can uniformly store externally imported annotated image sets after disassembly and fusion, and collect the feature annotations for the detectable object feature categories of external object detection models.

Regarding multi-model comparison and training efficiency, building single-object-feature detection models for predictive annotation, correction, and retraining is more efficient than annotating and training on images labeled with many object features at once, and the quality of the manually reviewed and corrected predicted annotations is better.

Regarding on-site application and composable target detection models, a single-object detection model can be built quickly to produce predicted annotations for a new site's images, can be flexibly combined into whatever multi-object feature set the site requires, and the new site's training images and object detection models can be further built and merged.

Regarding automatic review of the object category names predicted by a model, the region of the image inside each annotated bounding box can be used as training data; with different algorithms and parameters, a single-object-feature classification model is trained to review the object categories of predicted annotations in real time, reducing manual review costs.
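The patent does not publish code for extracting these bounding-box regions; the following is a minimal sketch under the assumption that an image is a row-major 2D pixel grid and a box is given as `(x, y, w, h)` in pixel coordinates. The names here are illustrative, not from the patent.

```python
def crop_box(image, box):
    """Extract the region inside an annotated bounding box.

    `image` is a row-major 2D list of pixel values and `box` is
    (x, y, w, h) in pixel coordinates -- a simplified stand-in for the
    annotated object regions used to train the classification model.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# A 4x4 toy "image"; the box selects the 2x2 block starting at (1, 1).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = crop_box(image, (1, 1, 2, 2))
print(patch)  # [[5, 6], [9, 10]]
```

In a real pipeline the same slicing would be done on an image array (e.g. with an imaging library) before feeding the patches to the classifier trainer.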

Regarding the working method and workflow for refining the iteration cycle of an AI object detection model, the above features, together with the multi-model prediction-comparison mechanism, give faster model convergence, fast and easily diversified annotation, higher accuracy, faster prediction, and savings in annotation and review manpower. This makes model refinement more efficient, allows specific object images to be quickly picked out of site footage, and effectively mitigates data bias. A detailed description of the present invention is provided below.

Please refer to FIG. 1, which is a schematic diagram of a server for optimizing a joint object detection model according to an embodiment of the present invention. In different embodiments, the server 100 for optimizing the joint object detection model is, for example, any of various computer devices or smart devices, but is not limited thereto. As shown in FIG. 1, the server 100 includes a storage circuit 102 and a processor 104.

The storage circuit 102 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, other similar device, or a combination of these devices, and can be used to record a plurality of program codes or modules.

The processor 104 is coupled to the storage circuit 102 and may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors combined with a digital signal processor core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), any other kind of integrated circuit, a state machine, an Advanced RISC Machine (ARM)-based processor, or the like.

In the embodiments of the present invention, the processor 104 can access the modules and program codes recorded in the storage circuit 102 to implement the method for optimizing a joint object detection model proposed by the present invention, the details of which are described below.

Please refer to FIG. 2, which is a flowchart of a method for optimizing a joint object detection model according to an embodiment of the present invention. The method of this embodiment can be executed by the server 100 of FIG. 1, and the details of each step of FIG. 2 are described below with the components shown in FIG. 1. In addition, to facilitate understanding of the concept of the present invention, FIG. 3 is used as an example in the following description, where FIG. 3 is an application scenario diagram according to an embodiment of the present invention.

First, in step S210, the processor 104 can obtain an image database 310, which may include, for example, a plurality of unlabeled images and a plurality of annotated image sets GA~GH and GN, where each of GA~GH and GN corresponds to a single object class. For example, annotated image set GA may include only images in which "object A" (e.g., a person) is labeled with a bounding box or another annotation method, annotated image set GB may include only images labeled with "object B" (e.g., a motorcycle), and annotated image set GC may include only images labeled with "object C" (e.g., a bicycle); the contents of the remaining annotated image sets follow by analogy and are not repeated here.
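The structure of such an image database can be sketched with plain data structures. This is an assumption about layout for illustration only; the class names, file names, and box coordinates below are invented, and a real system would likely store this in a database or annotation files.

```python
# A minimal sketch of the image database 310: each annotated set holds
# images labeled with exactly one object class, alongside a pool of
# unlabeled images. All records here are illustrative.
database = {
    "unlabeled": ["img_100.jpg", "img_101.jpg"],
    "annotated_sets": {
        "object A": [{"image": "img_001.jpg", "boxes": [(10, 20, 50, 80)]}],
        "object B": [{"image": "img_002.jpg", "boxes": [(5, 5, 40, 30)]}],
    },
}

def classes_covered(db):
    """Object classes for which an annotated set already exists."""
    return sorted(db["annotated_sets"])

print(classes_covered(database))  # ['object A', 'object B']
```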

Thereafter, in step S220, the processor 104 can train a plurality of single-object detection models and a plurality of single-object classification models with the annotated image sets, where the annotated image sets GA~GH and GN correspond one-to-one to the single-object detection models and one-to-one to the single-object classification models.

For example, the processor 104 can use annotated image set GA to train the single-object detection model A0 for detecting "object A" and the single-object classification model A′ for classifying "object A". As another example, the processor 104 can use annotated image set GB to train the single-object detection model B0 for detecting "object B" and the single-object classification model B′ for classifying "object B". As yet another example, the processor 104 can use annotated image set GC to train the single-object detection model C0 for detecting "object C" and the single-object classification model C′ for classifying "object C". In other words, for each annotated image set in the image database 310, the processor 104 can train a corresponding single-object detection model and single-object classification model.
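The one-to-one mapping of step S220 can be sketched as a loop over the annotated sets. The two training callables below are placeholders for whatever learner is used (the patent leaves the model family open); they are assumptions, not the patent's implementation.

```python
def build_single_object_models(annotated_sets, train_detector, train_classifier):
    """Train one detection model and one classification model per
    annotated set, mirroring the one-to-one mapping of step S220.
    `train_detector` and `train_classifier` stand in for any preferred
    learner (e.g. a neural network trainer)."""
    detectors, classifiers = {}, {}
    for cls, samples in annotated_sets.items():
        detectors[cls] = train_detector(samples)
        classifiers[cls] = train_classifier(samples)
    return detectors, classifiers

# Stub trainers that just record what they were trained on.
sets = {"object A": ["a1", "a2"], "object B": ["b1"]}
det, clf = build_single_object_models(
    sets,
    train_detector=lambda s: ("detector", len(s)),
    train_classifier=lambda s: ("classifier", len(s)),
)
print(det["object A"])  # ('detector', 2)
```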

It should be understood that the various models mentioned in the present invention are all artificial-intelligence models and can be implemented with whatever models a designer prefers (e.g., various neural networks), but are not limited thereto.

In step S230, the processor 104 can obtain a plurality of specified detection object categories and, accordingly, a plurality of external object detection models ext1~ext3, each of which can be used to detect objects belonging to at least one of the specified detection object categories.

In one embodiment, suppose a user wants to use the joint object detection model trained by the method of the present invention to recognize one or more kinds of objects in a specific field (e.g., a certain road or intersection). The user can set the above specified detection object categories according to the object classes to be detected. For example, if the user wants the joint object detection model to be able to detect objects belonging to "object A", "object B", "object C", "object D" (e.g., a dog), "object E" (e.g., a passenger car), and "object N" (e.g., a rider), the user can set "object A", "object B", "object C", "object D", "object E", and "object N" as the specified detection object categories, but it is not limited thereto.

In addition, in one embodiment, each external object detection model may be able to detect multiple object classes at once. For example, external object detection model ext1 may be able to detect "object A", "object B", "object K", and "object S". As another example, ext2 may be able to detect "object C", "object G", and "object M". Furthermore, ext3 may be able to detect "object A", "object D", "object E", and "object X". As can be seen, each of ext1~ext3 can detect objects belonging to at least one of the specified detection object categories.

In addition, in some embodiments, the external object detection models selected by the processor 104 in step S230 may also be required to meet one or more conditions, such as "intersection over union (IoU) between the predicted object box and the original annotated box > IoU target threshold", "average precision (AP) of a single object feature category > single-object AP target threshold", "model accuracy > model accuracy target threshold", or "converged model loss value < model loss target threshold", but it is not limited thereto.
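Of the conditions above, intersection over union is the one with a fixed standard definition, so it can be shown concretely. The sketch below assumes `(x, y, w, h)` boxes; the patent does not specify a box format or threshold values.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes -- the overlap
    measure named in the model-acceptance conditions above."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Two 4x4 boxes offset by 2 pixels in x: intersection 2*4 = 8, union 24.
print(iou((0, 0, 4, 4), (2, 0, 4, 4)))  # 0.3333333333333333
```

A candidate external model would then be kept only if, e.g., its IoU against the original annotations exceeds the chosen target threshold.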

Thereafter, in one embodiment, the processor 104 can determine whether each specified detection object category corresponds to one of the annotated image sets.

In the scenario of FIG. 3, since "object A", "object B", "object C", "object D", "object E", and "object N" correspond to annotated image sets GA~GH and GN respectively, the processor 104 can determine that every specified detection object category corresponds to one of the annotated image sets, and can then, in step S240, find among the single-object detection models a plurality of specific single-object detection models corresponding to the specified detection object categories.

In FIG. 3, the processor 104 can take the single-object detection models A0~E0 and N0, corresponding to "object A", "object B", "object C", "object D", "object E", and "object N", as the above specific single-object detection models, but it is not limited thereto.
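The selection in step S240 amounts to a lookup from category to model. The sketch below is an assumption about how that lookup might be coded; the model identifiers are placeholder strings.

```python
def select_specific_models(single_object_models, specified_categories):
    """Pick out the single-object detection models whose class matches a
    specified detection object category (step S240). A missing category
    raises KeyError, which in the patent's flow would instead trigger
    the image-set addition mechanism described below."""
    return {cls: single_object_models[cls] for cls in specified_categories}

models = {"object A": "A0", "object B": "B0", "object N": "N0"}
specific = select_specific_models(models, ["object A", "object N"])
print(specific)  # {'object A': 'A0', 'object N': 'N0'}
```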

However, in other embodiments, the user may set a specified detection object category that does not correspond to any annotated image set (hereinafter, the reference specified detection object category). The present invention can therefore use the proposed image-set addition mechanism to add an annotated image set corresponding to the reference specified detection object category (hereinafter, the reference annotated image set), and use it to train a corresponding single-object detection model (hereinafter, the reference single-object detection model) and single-object classification model (hereinafter, the reference single-object classification model).

In one embodiment, when the processor 104 determines that a reference specified detection object category (e.g., "bus") among the specified detection object categories does not correspond to any of the annotated image sets GA~GH and GN, the processor 104 can add, in the image database 310, a reference annotated image set corresponding to the reference specified detection object category. The processor 104 can then obtain a plurality of first images from the image database 310 and, in each first image, annotate the specific class of object (e.g., buses) corresponding to the reference specified detection object category to produce a plurality of first reference annotated images. In some embodiments, this annotation may be performed manually on the first images, but it is not limited thereto. Thereafter, the processor 104 can train the reference single-object detection model and the reference single-object classification model based on the first reference annotated images.

In one embodiment, in response to determining that the reference single-object detection model and the reference single-object classification model each satisfy the above target conditions, the processor 104 can add the reference single-object detection model to the single-object detection models and the specific single-object detection models, and add the reference single-object classification model to the single-object classification models.

In another embodiment, in response to determining that the reference single-object detection model and the reference single-object classification model do not yet satisfy the above target conditions, and the image database 310 still contains a plurality of second images not yet annotated for the specific class of object (e.g., buses), the processor 104 may be configured to: annotate each second image with the reference single-object detection model to obtain a plurality of second reference annotated images; correct the second reference annotated images and add the corrected second reference annotated images to the reference annotated image set; and train the reference single-object detection model and the reference single-object classification model again based on the reference annotated image set. The processor 104 can keep repeating these steps until the reference single-object detection model and the reference single-object classification model each satisfy the target conditions, but it is not limited thereto. In some embodiments, correcting each second reference annotated image may include, but is not limited to: adjusting the position/size of the bounding boxes labeling the specific class of object, deleting unnecessary duplicate boxes, adjusting the border of the region where the object is located, or adjusting the object's edge contour.
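The predict-correct-retrain cycle can be sketched as a loop. Every callable below is a placeholder (the patent specifies the workflow, not the implementations), and the toy run at the end only demonstrates the control flow.

```python
def refine_until_target(model, unlabeled, reference_set, predict, correct,
                        retrain, meets_target, max_rounds=10):
    """Sketch of the loop above: annotate remaining images with the
    current reference model, apply corrections, fold them into the
    reference annotated set, and retrain until the target conditions
    are met (or a round limit is hit)."""
    for _ in range(max_rounds):
        if meets_target(model):
            break
        predicted = [predict(model, img) for img in unlabeled]
        reference_set.extend(correct(p) for p in predicted)
        model = retrain(reference_set)
    return model, reference_set

# Toy run: the "model" is just the training-set size; target is >= 4.
model, ref = refine_until_target(
    model=1,
    unlabeled=["img1", "img2", "img3"],
    reference_set=["seed"],
    predict=lambda m, img: f"pred:{img}",
    correct=lambda p: p.replace("pred", "fixed"),
    retrain=len,
    meets_target=lambda m: m >= 4,
)
print(model)  # 4
```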

For example, in one embodiment, suppose the annotated image set GN does not initially exist in the image library 310. The processor 104 may then execute the set-creation mechanism above to add the annotated image set GN, corresponding to "Object N", to the image library 310. Afterwards, the processor 104 may train the single-object detection model N0 and the single-object classification model N' on the annotated image set GN. Once the single-object detection model N0 and the single-object classification model N' each satisfy the target conditions, the processor 104 may also treat the single-object detection model N0 as a specific single-object detection model, though the invention is not limited thereto.

In step S250, the processor 104 may identify, among the annotated image sets GA~GH and GN, the specific annotated image sets corresponding to the specified detection object categories, and train the joint object detection model 324 on them.

In the scenario of FIG. 3, the specific annotated image sets corresponding to the specified detection object categories include, for example, the annotated image sets GA~GE and GN, which together form the integrated annotated image set 311. The processor 104 may then train the joint object detection model 324 on the integrated annotated image set 311. In this case, the joint object detection model 324 should be able to detect "Object A", "Object B", "Object C", "Object D", "Object E", and "Object N".

In embodiments of the present invention, the processor 104 may perform the following steps on one or more unlabeled images in the image library 310. For ease of description, a first unlabeled image among them is discussed below, but the invention is not limited thereto.

Then, in step S260, the processor 104 may run the specific single-object detection models (i.e., single-object detection models A0~E0 and N0), the external object detection models ext1~ext3, and the joint object detection model 324 on the first unlabeled image to produce multiple first object detection results, where the first object detection results correspond to a first specified detection object category among the specified detection object categories.

In FIG. 3, the detection result 341 may include, for example, the output of the specific single-object detection models (i.e., A0~E0 and N0) on the first unlabeled image. As shown in FIG. 3, the detection result 341 may include the boxed codes A0~E0 and N0, indicating that the single-object detection models A0~E0 and N0 respectively detected objects belonging to "Object A", "Object B", "Object C", "Object D", "Object E", and "Object N" in the first unlabeled image. Each detected object may be marked by the corresponding single-object detection model with a bounding box or similar annotation, and may carry a corresponding detection confidence value.

In addition, the detection result 342 may include, for example, the output of the external object detection models ext1~ext3 on the first unlabeled image. As shown in FIG. 3, the detection result 342 may include the boxed codes A1, B1, C2, A3, D3, and E3: A1 and B1 indicate that the external model ext1 detected objects belonging to "Object A" and "Object B" in the first unlabeled image; C2 indicates that ext2 detected an object belonging to "Object C"; and A3, D3, and E3 indicate that ext3 detected objects belonging to "Object A", "Object D", and "Object E". Similarly, each object detected by ext1~ext3 in the first unlabeled image may be marked by the corresponding external model with a bounding box or similar annotation, and may carry a corresponding detection confidence value.

Moreover, the detection result 343 may include, for example, the output of the joint object detection model 324 on the first unlabeled image. As shown in FIG. 3, the detection result 343 may include the boxed codes A4~E4 and N4, indicating that the joint object detection model 324 detected objects belonging to "Object A", "Object B", "Object C", "Object D", "Object E", and "Object N" in the first unlabeled image. Similarly, each object detected by the joint model 324 may be marked with a bounding box or similar annotation and may carry a corresponding detection confidence value.

For ease of description, assume below that the first specified detection object category is "Object A". In this case, the first object detection results obtained by the processor 104 in step S260 include, for example, code A0 in the detection result 341, codes A1 and A3 in the detection result 342, and code A4 in the detection result 343, each of which may carry a corresponding first detection confidence value.
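As a sketch, collecting the per-category detection results of step S260 from the multiple models can look like the snippet below. The model interface and the field names (`category`, `box`, `det_conf`) are illustrative assumptions, not part of the disclosure:

```python
def collect_detections(image, models, category):
    """Run every model on the image and keep only detections of the
    requested category (e.g. "Object A"). Each model is a callable
    returning a list of {'category', 'box', 'det_conf'} dicts; the
    source model's name is attached for later comparison."""
    results = []
    for name, model in models.items():
        for det in model(image):
            if det['category'] == category:
                results.append({**det, 'model': name})
    return results
```

With models A0, ext1~ext3, and the joint model 324 registered in `models`, the returned list would correspond to the codes A0, A1, A3, and A4 above.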

Then, in step S270, the processor 104 may use the first single-object classification model, i.e., the one among the single-object classification models corresponding to the first specified detection object category, to predict and classify each first object detection result, yielding a first object classification result for each first object detection result.

Continuing the example, with the first specified detection object category assumed to be "Object A", the processor 104 may take the single-object classification model A' as the first single-object classification model. The processor 104 may then use model A' to predict and classify the first object detection results and obtain a first object classification result for each.

In one embodiment, the processor 104 may use the single-object classification model A' to classify code A0 in the detection result 341. For example, the processor 104 may apply model A' to the image region framed by the single-object detection model A0 in the first unlabeled image, obtaining a first classification confidence value as the first object classification result for that region; this value characterizes the correctness of the detection by model A0.

In another embodiment, the processor 104 may use the single-object classification model A' to classify code A4 in the detection result 343. For example, the processor 104 may apply model A' to the image region framed by the joint object detection model 324 in the first unlabeled image, obtaining a first classification confidence value as the first object classification result for that region; this value characterizes the correctness of the detection by the joint model 324.

For codes A1 and A3 in the detection result 342, the processor 104 may apply the single-object classification model A' through a mechanism similar to the above to obtain their respective first object classification results; the details are not repeated here.
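The classification-based review described above (cropping each detected bounding box and re-scoring it with the single-object classifier of the same category) can be sketched as follows; the crop representation and the classifier interface are assumptions for illustration, with `classifier` standing in for a model such as A':

```python
def verify_detections(image, detections, classifier):
    """For each detection, crop the bounding-box region and score it
    with the single-object classifier of the same category. The
    returned classification confidence reviews the detector's output.
    `image` is a 2-D list of pixel values; `classifier` is any
    callable mapping an image crop to a confidence in [0, 1]."""
    results = []
    for det in detections:
        x1, y1, x2, y2 = det['box']
        crop = [row[x1:x2] for row in image[y1:y2]]  # simple 2-D crop
        results.append({**det, 'cls_conf': classifier(crop)})
    return results
```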

Then, in step S280, the processor 104 may adaptively revise the first object detection results based on each first object detection result and its corresponding first object classification result, and add the revised first object detection results to the first annotated image set (for example, annotated image set GA), i.e., the one among the annotated image sets corresponding to the first specified detection object category (for example, "Object A").

In the first embodiment, in response to determining that every first object detection result has a first detection confidence value above the first detection confidence threshold, that the pairwise similarities among the first object detection results all satisfy the similarity condition, and that every first object detection result has a first classification confidence value above the first classification confidence threshold, the processor 104 may determine that the first object detection results need no revision and add them directly to the first annotated image set (for example, annotated image set GA).

In some embodiments, the similarity condition may include, but is not limited to, "the bounding-box Intersection over Union is greater than an overlap-rate reference threshold".
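A minimal sketch of this similarity condition: pairwise IoU between the candidate bounding boxes, compared against a threshold. The box format `(x1, y1, x2, y2)` and the 0.5 default threshold are assumptions, not values specified by the disclosure:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def boxes_similar(boxes, iou_threshold=0.5):
    """True when every pair of boxes overlaps above the IoU threshold."""
    return all(iou(a, b) > iou_threshold
               for i, a in enumerate(boxes)
               for b in boxes[i + 1:])
```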

In the second embodiment, in response to determining that any first object detection result has a first detection confidence value not above the first detection confidence threshold, that the similarity between any two of the first object detection results fails the similarity condition, or that any first object detection result has a first classification confidence value not above the first classification confidence threshold, the processor 104 may determine that the first object detection results require revision, and add the revised first object detection results to the first annotated image set (for example, annotated image set GA).
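The two branches of step S280 (accept unchanged versus revise first) reduce to a single predicate over the detection confidences, the classification confidences, and the pairwise box similarity. The dictionary layout and parameter names below are illustrative assumptions:

```python
from itertools import combinations

def needs_correction(detections, det_threshold, cls_threshold,
                     pairwise_similarity):
    """detections: list of dicts with 'det_conf' (detector confidence),
    'cls_conf' (confidence from the single-object classifier) and
    'box'. pairwise_similarity(box_a, box_b) -> bool implements the
    similarity condition. Returns False when every check passes
    (first embodiment: add the results unmodified) and True otherwise
    (second embodiment: revise before adding to the annotated set)."""
    if any(d['det_conf'] <= det_threshold for d in detections):
        return True
    if any(d['cls_conf'] <= cls_threshold for d in detections):
        return True
    if any(not pairwise_similarity(a['box'], b['box'])
           for a, b in combinations(detections, 2)):
        return True
    return False
```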

Then, in step S290, the processor 104 may retrain, on the first annotated image set, the first single-object detection model corresponding to the first specified detection object category among the single-object detection models (for example, single-object detection model A0), the first single-object classification model (for example, single-object classification model A'), and the joint object detection model 324.

In some embodiments, the processor 104 may repeatedly execute steps S260~S290 on the other unlabeled images in the image library 310 until the joint object detection model 324 satisfies one or more target conditions set by the designer.
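The iteration over steps S260~S290 can be sketched as a loop; every callable below is a placeholder for the corresponding operation described in the text, not a concrete implementation:

```python
def refine_until_converged(unlabeled_images, detect, classify, revise,
                           add_to_set, retrain, converged):
    """Run steps S260-S290 over the unlabeled pool until the joint
    model's target conditions hold. detect=S260, classify=S270,
    revise+add_to_set=S280, retrain=S290; converged() checks the
    designer-set target conditions after each retraining round."""
    for image in unlabeled_images:
        detections = detect(image)          # S260: multi-model detection
        detections = classify(detections)   # S270: classifier review
        add_to_set(revise(detections))      # S280: revise and annotate
        retrain()                           # S290: retrain the models
        if converged():
            break
```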

For example, for a second unlabeled image in the image library 310, the processor 104 may be configured to: run the specific single-object detection models, the external object detection models, and the joint object detection model on the second unlabeled image to produce multiple second object detection results (for example, code B0 in the detection result 341, code B1 in the detection result 342, and code B4 in the detection result 343), where the second object detection results correspond to a second specified detection object category (for example, "Object B") among the specified detection object categories; use the second single-object classification model corresponding to the second specified detection object category (for example, single-object classification model B') to predict and classify each second object detection result, yielding a second object classification result for each; adaptively revise the second object detection results based on each second object detection result and its corresponding second object classification result, and add the revised results to the second annotated image set corresponding to the second specified detection object category (for example, annotated image set GB); and retrain, on the second annotated image set, the second single-object detection model corresponding to the second specified detection object category (for example, single-object detection model B0), the second single-object classification model, and the joint object detection model 324.

In different embodiments, the target conditions that the joint object detection model 324 must satisfy may include at least one of: "the number of newly added site images in the training data is greater than the planned number of new images", "the average Intersection over Union between predicted object regions and the original annotated regions is greater than an overlap-rate target threshold", "the Average Precision of every single object feature category is greater than a per-category average-precision target threshold", "the Mean Average Precision over all object feature categories detectable by the joint object detection model 324 is greater than a model-level average-precision target threshold", "the model accuracy is greater than an accuracy target threshold", and "the model loss function is smaller than a loss-function target threshold", but the conditions are not limited thereto.
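These stopping criteria can be expressed as one predicate over the measured metrics; the metric names, dictionary layout, and the requirement that all conditions hold simultaneously are assumptions for illustration:

```python
def meets_target_conditions(metrics, targets):
    """Mirror the target conditions listed above. metrics/targets are
    dicts: 'new_images' (count of newly added site images),
    'mean_iou' (average predicted-vs-annotated IoU), 'per_class_ap'
    (AP per object feature category), 'map' (mean AP over all
    categories), 'accuracy', and 'loss'."""
    return (metrics['new_images'] > targets['new_images']
            and metrics['mean_iou'] > targets['mean_iou']
            and all(ap > targets['min_ap'] for ap in metrics['per_class_ap'])
            and metrics['map'] > targets['map']
            and metrics['accuracy'] > targets['accuracy']
            and metrics['loss'] < targets['loss'])
```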

In some embodiments, early in the training of the joint object detection model 324, the processor 104 may, when executing step S280, apply the mechanism of the first embodiment and add the unrevised first object detection results to the first annotated image set (for example, annotated image set GA), so as to improve the detection capability of the joint object detection model 324 relatively quickly.

In addition, in the middle stage of the training of the joint object detection model 324, the processor 104 may, when executing step S280, apply the mechanism of the second embodiment and add the revised first object detection results to the first annotated image set (for example, annotated image set GA), so as to further improve the detection capability of the joint object detection model 324 with more diverse training data.

Furthermore, in the late stage of the training of the joint object detection model 324, certain annotated image sets (for example, annotated image set GA) may still be sparse, leaving the joint object detection model 324 comparatively weak at detecting objects of the corresponding specified detection object category (for example, "Object A"). Therefore, in some embodiments, when executing step S260 the processor 104 may require the specific single-object detection models, the external object detection models, and the joint object detection model to detect only objects belonging to that specified detection object category, so as to increase the amount of corresponding training data.

For example, if the annotated image set GA is sparse, the processor 104 may, when executing step S260, have the corresponding first single-object detection model (i.e., single-object detection model A0), the external object detection models ext1~ext3, and the joint object detection model 324 run detection on other unlabeled images, revise the resulting detections according to the preceding teachings, and add them to the annotated image set GA. Afterwards, the processor 104 may train the joint object detection model 324 (and the single-object detection model A0) on the annotated image set GA to strengthen the joint model's ability to detect "Object A", though the invention is not limited thereto.

In some embodiments, if the processor 104 determines that the joint object detection model 324 has satisfied the target conditions, the processor 104 may determine that the training of the joint object detection model 324 is complete. In that case, the trained joint object detection model 324 may be used to detect "Object A", "Object B", "Object C", "Object D", "Object E", and "Object N" in the specific field, though the invention is not limited thereto.

In summary, the present invention has at least the following features.

(1) Accumulated annotation results with flexible recombination: existing training annotation image sets can be split by object feature category, classified, and reassembled into an annotated data set for each single object feature category, from which a single-object detection model for each category is trained. This accumulates image annotation results, simplifies subsequent update and maintenance costs, and increases the efficiency and combinational flexibility of building models for new applications.

(2) Integration of existing model results: for a model whose training data set is unavailable, the prediction annotation module can predict annotations over the existing image library, yielding the prediction accuracy of each of that model's object feature categories and also enabling reverse verification and correction of the corresponding existing annotation data. When such a model covers an object feature category not yet present in the annotated image sets, the revised annotation data for that category can, per the features of the present invention, be stored in an annotated image set for that category, thereby integrating existing model results.

(3) Data screening by comparison with existing models: existing machine-learning object detection models inside or outside the organization can help screen unlabeled images, and can also serve as comparison references for the target model's predicted annotations, helping to check and build the annotated image set of each single object feature category.

(4) Single-object detection models as prediction comparison references: compared with multi-category AI detection models, single-category AI detection models converge faster in training, are quicker and easier to annotate and diversify, are more accurate, and predict faster. They can therefore serve as comparison references for the main AI detection model's predictions, replacing the manpower otherwise needed for initial image screening, annotation, and review.

(5) Rapid and flexible expansion of training annotations: when a new object feature category must be added, a small number of images can be annotated from the existing image library, a single-object detection model for the category can be built, and all newly annotated images and the refined model can then be completed iteratively by screening and expanding the annotated training images through the prediction annotation module.

(6) Menu-driven generation of training data and application models: the object feature categories to detect are selected according to application needs; initially, training annotations covering the categories required by the new application can be quickly assembled from the annotated image sets, and the required joint object detection model is obtained after training. Subsequently, the single-object detection models of the categories in the requirement combination predict on newly added unlabeled images, and the multi-model prediction comparison mechanism expands the annotated images and refines the target joint model.

(7) Multi-model prediction comparison mechanism: this mechanism selects the images that actually need annotation more precisely; by comparison it picks out new images on which the AI model predicts inaccurately and feeds them back into training, making training more efficient. It can quickly screen the image data of the specific objects needed, mitigating training-data bias, and it helps reduce the refinement cost of AI object detection models, shorten the target model's iterative training cycles, and accelerate the commercialization schedule of AI models.

(8) A review mechanism for the object category of prediction results: by building an object classification model for each single object feature category, the region image inside the bounding box of a detection produced by the object detection model of the same category can be rechecked for category correctness.

Although the present invention has been disclosed through the above embodiments, they are not intended to limit it. Anyone with ordinary knowledge in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the protection scope of the invention shall therefore be defined by the appended claims.

S210~S290: steps

Claims (14)

1. A method for optimizing a joint object detection model, the method being executed by a processor of a server and comprising: obtaining an image library, wherein the image library comprises a first unlabeled image and a plurality of annotated image sets, each annotated image set corresponding to a single object category; training a plurality of single-object detection models and a plurality of single-object classification models with the annotated image sets, wherein the annotated image sets correspond one-to-one to the single-object detection models and one-to-one to the single-object classification models; obtaining a plurality of specified detection object categories and accordingly obtaining a plurality of external object detection models, wherein each external object detection model detects objects belonging to at least one of the specified detection object categories; in response to determining that each detection object category corresponds to one of the annotated image sets, identifying, among the single-object detection models, a plurality of specific single-object detection models corresponding to the specified detection object categories; identifying, among the annotated image sets, a plurality of specific annotated image sets corresponding to the specified detection object categories, and training a joint object detection model accordingly; detecting on the first unlabeled image with the specific single-object detection models, the external object detection models, and the joint object detection model to produce a plurality of first object detection results, wherein the first object detection results correspond to a first specified detection object category among the specified detection object categories; predicting and classifying each first object detection result with a first single-object classification model, among the single-object classification models, corresponding to the first specified detection object category, to obtain a first object classification result for each first object detection result; adaptively revising the first object detection results based on each first object detection result and the corresponding first object classification result, and adding the revised first object detection results to a first annotated image set, among the annotated image sets, corresponding to the first specified detection object category; and retraining, on the first annotated image set, a first single-object detection model corresponding to the first specified detection object category among the single-object detection models, the first single-object classification model, and the joint object detection model.
2. The method of claim 1, wherein in response to determining that a reference specified detection object category among the detection object categories does not correspond to any of the annotated image sets, the method further comprises: adding, to the annotated image sets of the image library, a reference annotated image set corresponding to the reference specified detection object category; obtaining a plurality of first images from the image library and annotating, in each first image, a specific object category corresponding to the reference specified detection object category to produce a plurality of first reference annotated images; training a reference single-object detection model and a reference single-object classification model on the first reference annotated images; and, in response to determining that the reference single-object detection model and the reference single-object classification model each satisfy at least one target condition, adding the reference single-object detection model to the single-object detection models and to the specific single-object detection models, and adding the reference single-object classification model to the single-object classification models.
3. The method according to claim 2, wherein, in response to determining that neither the reference single-object detection model nor the reference single-object classification model satisfies the at least one target condition and the image database still contains a plurality of second image data in which the specific class of object has not been annotated, the method further comprises: annotating each second image data with the reference single-object detection model to obtain a plurality of second reference annotated image data; correcting the second reference annotated image data, and adding the corrected second reference annotated image data to the reference annotated image data set; and retraining the reference single-object detection model and the reference single-object classification model based on the reference annotated image data set.

4. The method according to claim 1, wherein the image database further comprises second unlabeled image data, and the method further comprises: detecting the second unlabeled image data with the specific single-object detection models, the external object detection models, and the joint object detection model to generate a plurality of second object detection results, wherein the second object detection results correspond to a second specified detection object category among the specified detection object categories; predicting and classifying each second object detection result with a second single-object classification model corresponding to the second specified detection object category among the single-object classification models to obtain a second object classification result of each second object detection result; adaptively correcting the second object detection results based on each second object detection result and the corresponding second object classification result, and adding the corrected second object detection results to a second annotated image data set corresponding to the second specified detection object category among the annotated image data sets; and retraining, based on the second annotated image data set, a second single-object detection model corresponding to the second specified detection object category among the single-object detection models, the second single-object classification model, and the joint object detection model.

5. The method according to claim 1, wherein, in response to determining that the joint object detection model has satisfied at least one target condition, the method further comprises: determining that training of the joint object detection model has been completed.
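Claims 2 and 3 describe a bootstrapping loop for a newly added object category: a small hand-labeled seed set trains a reference detection/classification model pair, and while the models miss the target condition, the detection model pre-annotates further images, a reviewer corrects them, and the models are retrained on the grown set. The following is a minimal sketch of that loop, not the patented implementation; all function names, the callable interfaces, and the batch size are assumptions for illustration.

```python
def bootstrap_new_category(seed_images, unlabeled_images, annotate_by_hand,
                           correct, train_models, meets_target, batch_size=50):
    """Iteratively grow an annotated set for a new detection category.
    All callables are supplied by the caller (hypothetical interfaces):
      annotate_by_hand(img) -> labeled sample (initial manual annotation)
      correct(labeled)      -> reviewer-corrected sample
      train_models(dataset) -> (detector, classifier)
      meets_target(detector, classifier) -> bool (the target condition)
    """
    # Seed the reference annotated set with manually labeled images (claim 2).
    reference_set = [annotate_by_hand(img) for img in seed_images]
    detector, classifier = train_models(reference_set)

    remaining = list(unlabeled_images)
    while not meets_target(detector, classifier) and remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        # Pre-annotate with the current reference detector, then review the
        # annotations before they join the reference set (claim 3).
        reference_set.extend(correct(detector(img)) for img in batch)
        detector, classifier = train_models(reference_set)
    return detector, classifier, reference_set
```

The loop terminates either when the target condition is met or when the pool of unlabeled images is exhausted, matching the claim's "still has unlabeled second image data" precondition.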
6. The method according to claim 1, wherein each first object detection result has a corresponding first detection confidence value, the first object classification result of each first object detection result has a first classification confidence value, and the step of adaptively correcting the first object detection results based on each first object detection result and the corresponding first object classification result comprises: in response to determining that the first detection confidence value of every first object detection result is higher than a first detection confidence threshold, the similarity between every pair of the first object detection results satisfies a similarity condition, and the first classification confidence value corresponding to every first object detection result is higher than a first classification confidence threshold, determining that the first object detection results require no correction and adding the first object detection results to the first annotated image data set.
7. The method according to claim 1, wherein each first object detection result has a corresponding first detection confidence value, the first object classification result of each first object detection result has a first classification confidence value, and the step of adaptively correcting the first object detection results based on each first object detection result and the corresponding first object classification result comprises: in response to determining that the first detection confidence value of any first object detection result is not higher than a first detection confidence threshold, the similarity between any two of the first object detection results does not satisfy a similarity condition, or the first classification confidence value corresponding to any first object detection result is not higher than a first classification confidence threshold, determining that the first object detection results require correction and adding the corrected first object detection results to the first annotated image data set.
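The acceptance rule in claims 6 and 7 can be sketched as a small decision function over the detection results that the ensemble produced for one image. This is an illustrative sketch only: the threshold values, the choice of intersection-over-union as the similarity measure, and the data layout are assumptions not taken from the patent.

```python
def needs_correction(detections, det_threshold=0.8, cls_threshold=0.8,
                     iou_threshold=0.5):
    """Return True if the group of detection results needs correction
    (claim 7), or False if it can join the annotated set as-is (claim 6).
    Each detection is a dict with a (x1, y1, x2, y2) "box", a detection
    confidence "det_conf", and a classification confidence "cls_conf"."""
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # Any detection confidence not above the threshold -> correction needed.
    if any(d["det_conf"] <= det_threshold for d in detections):
        return True
    # Any classification confidence not above the threshold -> correction needed.
    if any(d["cls_conf"] <= cls_threshold for d in detections):
        return True
    # Any pair of boxes too dissimilar -> correction needed.
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if iou(detections[i]["box"], detections[j]["box"]) < iou_threshold:
                return True
    return False
```

Only when all three checks pass (claim 6) do the results flow unreviewed into the first annotated image data set; any single failure (claim 7) routes them through the correction step first.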
8. A server for optimizing a joint object detection model, comprising: a storage circuit storing a plurality of modules; and a processor coupled to the storage circuit and accessing the modules to perform the following steps: obtaining an image database, wherein the image database comprises first unlabeled image data and a plurality of annotated image data sets, each annotated image data set corresponding to a single class of object; training a plurality of single-object detection models and a plurality of single-object classification models with the annotated image data sets, wherein the annotated image data sets correspond one-to-one to the single-object detection models, and the annotated image data sets correspond one-to-one to the single-object classification models; obtaining a plurality of specified detection object categories, and accordingly obtaining a plurality of external object detection models, wherein each external object detection model is used to detect objects belonging to at least one of the specified detection object categories; in response to determining that each detection object category corresponds to one of the annotated image data sets, finding, among the single-object detection models, a plurality of specific single-object detection models corresponding to the specified detection object categories; finding, among the annotated image data sets, a plurality of specific annotated image data sets corresponding to the specified detection object categories, and training a joint object detection model accordingly; detecting the first unlabeled image data with the specific single-object detection models, the external object detection models, and the joint object detection model to generate a plurality of first object detection results, wherein the first object detection results correspond to a first specified detection object category among the specified detection object categories; predicting and classifying each first object detection result with a first single-object classification model corresponding to the first specified detection object category among the single-object classification models to obtain a first object classification result of each first object detection result; adaptively correcting the first object detection results based on each first object detection result and the corresponding first object classification result, and adding the corrected first object detection results to a first annotated image data set corresponding to the first specified detection object category among the annotated image data sets; and retraining, based on the first annotated image data set, a first single-object detection model corresponding to the first specified detection object category among the single-object detection models, the first single-object classification model, and the joint object detection model.
9. The server according to claim 8, wherein, in response to determining that a reference specified detection object category among the detection object categories does not correspond to any of the annotated image data sets, the processor is further configured to: add a reference annotated image data set corresponding to the reference specified detection object category to the annotated image data sets of the image database; obtain a plurality of first image data from the image database, and annotate a specific class of object corresponding to the reference specified detection object category in each first image data to generate a plurality of first reference annotated image data; train a reference single-object detection model and a reference single-object classification model based on the first reference annotated image data; and, in response to determining that the reference single-object detection model and the reference single-object classification model each satisfy at least one target condition, add the reference single-object detection model to the single-object detection models and the specific single-object detection models, and add the reference single-object classification model to the single-object classification models.
10. The server according to claim 9, wherein, in response to determining that neither the reference single-object detection model nor the reference single-object classification model satisfies the at least one target condition and the image database still contains a plurality of second image data in which the specific class of object has not been annotated, the processor is further configured to: annotate each second image data with the reference single-object detection model to obtain a plurality of second reference annotated image data; correct the second reference annotated image data, and add the corrected second reference annotated image data to the reference annotated image data set; and retrain the reference single-object detection model and the reference single-object classification model based on the reference annotated image data set.

11. The server according to claim 8, wherein the image database further comprises second unlabeled image data, and the processor is further configured to: detect the second unlabeled image data with the specific single-object detection models, the external object detection models, and the joint object detection model to generate a plurality of second object detection results, wherein the second object detection results correspond to a second specified detection object category among the specified detection object categories; predict and classify each second object detection result with a second single-object classification model corresponding to the second specified detection object category among the single-object classification models to obtain a second object classification result of each second object detection result; adaptively correct the second object detection results based on each second object detection result and the corresponding second object classification result, and add the corrected second object detection results to a second annotated image data set corresponding to the second specified detection object category among the annotated image data sets; and retrain, based on the second annotated image data set, a second single-object detection model corresponding to the second specified detection object category among the single-object detection models, the second single-object classification model, and the joint object detection model.

12. The server according to claim 8, wherein, in response to determining that the joint object detection model has satisfied at least one target condition, the processor is further configured to determine that training of the joint object detection model has been completed.
13. The server according to claim 8, wherein each first object detection result has a corresponding first detection confidence value, the first object classification result of each first object detection result has a first classification confidence value, and the processor is configured to: in response to determining that the first detection confidence value of every first object detection result is higher than a first detection confidence threshold, the similarity between every pair of the first object detection results satisfies a similarity condition, and the first classification confidence value corresponding to every first object detection result is higher than a first classification confidence threshold, determine that the first object detection results require no correction and add the first object detection results to the first annotated image data set.
14. The server according to claim 8, wherein each first object detection result has a corresponding first detection confidence value, the first object classification result of each first object detection result has a first classification confidence value, and the processor is configured to: in response to determining that the first detection confidence value of any first object detection result is not higher than a first detection confidence threshold, the similarity between any two of the first object detection results does not satisfy a similarity condition, or the first classification confidence value corresponding to any first object detection result is not higher than a first classification confidence threshold, determine that the first object detection results require correction and add the corrected first object detection results to the first annotated image data set.
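Taken together, claims 1 and 8 describe one optimization round: an ensemble of single-object, external, and joint detectors proposes results on unlabeled images, a per-category classifier scores each proposal, low-quality proposals go through correction, the accepted annotations grow the per-category data set, and the affected models are retrained. The following is a rough sketch of one such round, not the patented implementation; every interface and name is an assumption for illustration.

```python
def optimization_round(unlabeled, detectors, classifier, annotated_set,
                       correct, retrain, needs_correction):
    """One round of the joint-model optimization loop.
    `detectors` stands in for the ensemble (specific single-object, external,
    and joint models); `classifier` is the single-object classifier for the
    category; `correct` is the (possibly manual) correction step; `retrain`
    rebuilds the single-object detection model, the single-object
    classification model, and the joint model from the grown data set."""
    for image in unlabeled:
        # Every ensemble member proposes detection results for the image.
        proposals = [detect(image) for detect in detectors]
        # The per-category classifier re-scores each proposal.
        scored = [classifier(p) for p in proposals]
        # Adaptively correct before the results join the annotated set.
        if needs_correction(scored):
            scored = [correct(s) for s in scored]
        annotated_set.extend(scored)
    return retrain(annotated_set)
```

Repeating such rounds until the joint model satisfies the target condition (claims 5 and 12) is what lets the scheme cut the manual cost of screening, labeling, and review described in the abstract.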
TW109135316A 2020-10-13 2020-10-13 Method and server for optimizing joint object detection model TWI748685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109135316A TWI748685B (en) 2020-10-13 2020-10-13 Method and server for optimizing joint object detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109135316A TWI748685B (en) 2020-10-13 2020-10-13 Method and server for optimizing joint object detection model

Publications (2)

Publication Number Publication Date
TWI748685B true TWI748685B (en) 2021-12-01
TW202215307A TW202215307A (en) 2022-04-16

Family

ID=80680943

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109135316A TWI748685B (en) 2020-10-13 2020-10-13 Method and server for optimizing joint object detection model

Country Status (1)

Country Link
TW (1) TWI748685B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082152A1 (en) * 2016-09-21 2018-03-22 GumGum, Inc. Training machine learning models to detect objects in video data
TW202004513A (en) * 2018-06-05 2020-01-16 美商克萊譚克公司 Active learning for defect classifier training
TWI702536B (en) * 2019-12-31 2020-08-21 財團法人工業技術研究院 Training method and system of object detection model based on adaptive annotation design
TWI707137B (en) * 2020-01-13 2020-10-11 憶象有限公司 Intelligent production line monitoring system and implementation method thereof

Also Published As

Publication number Publication date
TW202215307A (en) 2022-04-16

Similar Documents

Publication Publication Date Title
CN113505261B (en) Data labeling method and device and data labeling model training method and device
CN114245910B (en) An AutoML system, method, and device for automated machine learning
JP2018200685A (en) Forming of data set for fully supervised learning
CN111860494A (en) Optimal method, device, electronic device and storage medium for image target detection
CN110852347A (en) Fire detection method using improved YOLO v3
CN114387499A (en) Island coastal wetland waterfowl identification method, distribution query system and medium
CN110533086A (en) The semi-automatic mask method of image data
TWI845797B (en) Object detection device and object detection method
CN114663687A (en) Model training method, target recognition method, device, equipment and storage medium
CN114708518A (en) Bolt defect detection method based on semi-supervised learning and priori knowledge embedding strategy
CN115859108A (en) Sample expansion method, device, equipment, storage medium and product
CN115859122A (en) Data identification method, automatic continuous learning model, device and equipment
CN118887382A (en) A zero-shot target detection method based on embedding localization knowledge of pre-trained models
CN117372424B (en) A defect detection method, device, equipment and storage medium
CN118628846A (en) Target detection method, device, electronic device and medium based on multimodal prompting
TWI748685B (en) Method and server for optimizing joint object detection model
CN114359670A (en) Unstructured data labeling method and device, computer equipment and storage medium
TWI860629B (en) Image Annotation Quality Optimization Method
CN118840541A (en) Open world target detection method based on self-adaptive semantic degradation learning
CN117131222A (en) Semi-automatic labeling method and device based on open world large model
EP4469916A1 (en) A system and method for quality check of labelled images
CN116453043A (en) Kitchen mouse detection method based on background modeling and target detection
CN120031883B (en) Wafer defect detection method, device, equipment and medium
CN120853209B (en) Image recognition method, device, equipment and medium based on recognition model
CN115565201B (en) Taboo picture identification method, apparatus and storage medium