
WO2025052642A1 - Training information creation assistance device, image classification device, and program - Google Patents


Info

Publication number
WO2025052642A1
Authority
WO
WIPO (PCT)
Prior art keywords
category
score
image
uncertainty
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/032758
Other languages
French (fr)
Japanese (ja)
Inventor
壮太 小松
昌義 石川
軍 陳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi High Tech Corp
Original Assignee
Hitachi High Tech Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi High Tech Corp filed Critical Hitachi High Tech Corp
Priority to PCT/JP2023/032758 (WO2025052642A1)
Priority to TW113128043 (TW202512113)
Publication of WO2025052642A1
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The present invention relates to a teacher information creation support device, an image classification device, and a program for reducing the cost of creating teacher data in, for example, an image classification system with a learning function.
  • Creating a pixel-by-pixel image classification model using deep learning requires teacher information, but assigning that teacher information is costly.
  • As a countermeasure, there is interactive segmentation, in which the model learns in collaboration with the user: the user indicates which parts of the model's inference results to correct and how to correct them.
  • One way to select the parts to be corrected is to pick, from among the parts inferred as the focus category, those with a low inference probability. However, parts that are not inferred as the focus category are never selected, so false-negative parts are missed and accuracy does not improve.
  • Patent Document 1 exploits the tendency of the inference probability to drop, selecting areas with a low model inference probability as correction candidates; however, misclassified areas unrelated to the focus category may be selected, making it difficult to improve the accuracy of the focus category.
  • Patent Document 2 selects, from among the parts classified as the focus category, those with a low inference probability as correction candidates; however, parts not inferred as the focus category are never selected, so false-negative parts are missed and accuracy does not improve.
  • The purpose of this invention is therefore to present correction-required parts, including false negatives, that are expected to improve the accuracy of the focus category.
  • An example of the teacher information creation support device comprises: a category score calculation unit that, for an image consisting of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category; an inference image determination unit that calculates an inference image based on the category scores; an adjacent category calculation unit that receives an input of a focus category and, based on the inference image and the focus category, calculates an adjacent category that is likely to be adjacent to the focus category; and a correction-required score calculation unit that determines a correction-required score for at least one pixel based on the focus category, the adjacent category, and the category scores.
  • An example of the image classification device includes: an image category inference unit that, for an image consisting of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category, and calculates an inference image based on the category scores; and a teacher information creation GUI having a focus category setting unit.
  • The focus category setting unit receives an input of a focus category, and the image classification device calculates the parts to be corrected based on the focus category and the category scores.
  • The teacher information creation GUI outputs the parts that need correction and accepts the assignment of teacher information via a teacher information assignment unit.
  • One example of the program according to the present invention causes a computer to function as the teacher information creation support device described above.
  • Another example of the program according to the present invention causes a computer to function as the image classification device described above.
  • According to the present invention, correction-required parts can be presented that lead to improved accuracy against false negatives of the focus category.
  • FIG. 1 is an example of a functional configuration of a teacher information creation support device according to a first embodiment of the present invention.
  • FIG. 2 shows an example of processing by an adjacent category calculation unit.
  • FIG. 3 shows an example of a processing result of an adjacent category calculation unit.
  • FIG. 4 illustrates an example of a correction-required score calculation unit.
  • FIG. 5 shows another example of how to calculate the uncertainty.
  • FIG. 6 shows another example of how to calculate the correction-required score.
  • FIG. 7 shows yet another example of how to calculate the correction-required score.
  • FIG. 8 is an example of a functional configuration of an image classification device according to a second embodiment of the present invention.
  • FIG. 9 is an example of a functional configuration of an image classification device according to a third embodiment of the present invention.
  • FIG. 10 is another example of a functional configuration of an image classification device according to a third embodiment of the present invention.
  • FIG. 11 is an example of a teacher information creation GUI.
  • FIG. 12 is another example of a teacher information creation GUI.
  • [Embodiment 1] FIG. 1 shows an example of the functional configuration of a teacher information creation support device according to the first embodiment of the present invention.
  • The teacher information creation support device 100 receives an input image D1, a segmentation model D2, and a focus category D4 as inputs, and includes a category score calculation unit 101, an inference image determination unit 102, an adjacent category calculation unit 103, and a correction-required score calculation unit 104.
  • The category score calculation unit 101 uses the segmentation model D2 to calculate, for each combination of pixel and category in the input image D1 consisting of multiple pixels, a category score D3 indicating the degree of certainty that the pixel belongs to that category.
  • The category score D3 is, for each pixel of the input image D1, a vector whose values indicate the likelihood of each candidate category.
  • The inference image determination unit 102 calculates an inference image D5 based on the category scores D3.
  • The inference image D5 is an image in which a single category has been identified for each pixel through inference. Specifically, it can be calculated by taking, for each pixel, the category with the maximum category score D3 as the identified category.
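As a concrete illustration of the per-pixel maximum rule above, the inference image can be sketched as a single argmax over the category axis (a minimal NumPy sketch; the array shapes and score values are hypothetical, not taken from the patent):

```python
import numpy as np

# Hypothetical category scores D3 for a 2x2 image and 3 categories,
# shaped (categories, height, width).
scores = np.array([
    [[0.7, 0.2], [0.1, 0.5]],  # scores for category 0
    [[0.2, 0.6], [0.3, 0.4]],  # scores for category 1
    [[0.1, 0.2], [0.6, 0.1]],  # scores for category 2
])

# Inference image D5: for each pixel, the category with the maximum score.
inference_image = np.argmax(scores, axis=0)
print(inference_image)  # [[0 1]
                        #  [2 0]]
```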
  • The adjacent category calculation unit 103 receives the input of the focus category D4 and calculates the adjacent category D6 based on the inference image D5 and the focus category D4.
  • An adjacent category is a category that is likely to be adjacent to the focus category (i.e., a category with a high adjacency likelihood); a specific example is described later.
  • The correction-required score calculation unit 104 determines the correction-required score D7 based on the focus category D4, the adjacent category D6, and the category scores D3.
  • The correction-required score is calculated, for example, for parts that are difficult for the segmentation model D2 to identify in relation to the focus category; a specific example is described later.
  • The teacher information creation support device 100 can be configured, for example, using a computer.
  • The computer has the hardware configuration of a known computer and includes, for example, calculation means and storage means.
  • The calculation means includes, for example, a processor.
  • The storage means includes, for example, storage media such as semiconductor memory devices and magnetic disk devices. Some or all of the storage media may be non-transitory.
  • The computer may also be equipped with input/output means.
  • The input/output means may include, for example, input devices such as a keyboard and a mouse, output devices such as a display and a printer, and communication devices such as a network interface.
  • The storage means may store a program.
  • When the processor executes this program, the computer functions as the teacher information creation support device 100 according to this embodiment or as a device according to another embodiment.
  • Segmentation model D2 is a trained model that has undergone prior machine learning and has the ability to discriminate images. However, the prior training may be performed using a dataset that has similar characteristics to input image D1, or may be performed using a dataset that is unrelated to input image D1.
  • The adjacent category calculation unit 103 extracts, from the inference image D5, the pixels inferred to belong to the focus category. Then, for each such pixel, the pixels adjacent to it are extracted as adjacent pixels (pixels that themselves belong to the focus category are not extracted as adjacent pixels).
  • The group of pixels adjacent to the location inferred to be category 1 is calculated as adjacent pixel group 300.
  • The frequency of each category within adjacent pixel group 300 is then calculated, and category 2, the most frequent category, is determined to be the adjacent category.
  • In a semiconductor measurement image, where similar structures are repeated within the image, each category corresponds to a structure of the measurement target.
  • The number of categories adjacent to a particular category is therefore limited, so the above method can determine the most frequent category as the adjacent category.
  • In a road image, for example, the asphalt parts of the road frequently appear next to the focus category, and the above method will determine the asphalt parts to be an adjacent category.
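The neighbour-frequency rule described above can be sketched as follows (a hypothetical helper on a toy inference image, assuming 4-connectivity; the function name is illustrative, not from the patent):

```python
import numpy as np
from collections import Counter

def adjacent_category(inference_image, focus):
    """Most frequent category among non-focus neighbours of focus pixels."""
    h, w = inference_image.shape
    neighbours = []
    for y in range(h):
        for x in range(w):
            if inference_image[y, x] != focus:
                continue
            # Collect 4-neighbours that are NOT the focus category.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and inference_image[ny, nx] != focus:
                    neighbours.append(int(inference_image[ny, nx]))
    # The adjacent category is the one with the highest frequency.
    return Counter(neighbours).most_common(1)[0][0] if neighbours else None

img = np.array([[2, 2, 2, 2],
                [2, 1, 1, 2],
                [3, 1, 1, 2],
                [3, 3, 2, 2]])
print(adjacent_category(img, focus=1))  # -> 2
```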
  • FIG. 4 shows an example of the correction-required score calculation unit 104.
  • For each pixel inferred in the inference image D5 to belong to the focus category D4 or to the adjacent category D6, the correction-required score calculation unit 104 extracts the category score for the focus category D4 and the category score for the adjacent category D6.
  • The correction-required score calculation unit 104 then calculates, from the extracted category scores, an uncertainty indicating the degree to which the segmentation model is insufficiently trained (e.g., contains insufficiently learned features). For example, the uncertainty is calculated as the smallness of the category score for the designated category (the focus category D4 or the adjacent category D6).
  • Specifically, the uncertainty can be 1 minus the inference probability (the value obtained by subtracting the inference probability from 1).
  • The inference probability of a pixel whose features are sufficiently trained will be large, and that of a pixel whose features are insufficiently trained will be small, so areas where 1 minus the inference probability is large (areas with large uncertainty) can be regarded as areas where training is insufficient.
  • Processing step S403 identifies parts with a large uncertainty as parts that need correction, and calculates a correction-required score for the identified parts.
  • The parts requiring correction may be, for example, parts where the uncertainty is equal to or greater than a predetermined threshold.
  • The specific method of calculating the correction-required score may be designed as appropriate by a person skilled in the art; it may be, for example, the sum of the uncertainty for the focus category and the uncertainty for the adjacent category.
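Assuming the category scores are softmax probabilities, the 1-minus-probability uncertainty and the summed correction-required score can be sketched as follows (the function name and threshold value are hypothetical):

```python
import numpy as np

def correction_score(probs, inference_image, focus, adjacent, threshold):
    """probs: (C, H, W) softmax outputs; uncertainty = 1 - inference probability."""
    unc_focus = 1.0 - probs[focus]         # uncertainty w.r.t. the focus category
    unc_adjacent = 1.0 - probs[adjacent]   # uncertainty w.r.t. the adjacent category
    score = unc_focus + unc_adjacent       # e.g. the sum of the two uncertainties
    # Only pixels inferred as the focus or adjacent category are candidates.
    mask = (inference_image == focus) | (inference_image == adjacent)
    score = np.where(mask, score, 0.0)
    return score, score >= threshold       # parts at/above the threshold need correction

probs = np.array([[[0.9, 0.4]],    # category 0 (focus)
                  [[0.05, 0.35]],  # category 1 (adjacent)
                  [[0.05, 0.25]]])
inferred = np.argmax(probs, axis=0)        # both pixels inferred as category 0
score, needs_fix = correction_score(probs, inferred, focus=0, adjacent=1,
                                    threshold=1.2)
print(needs_fix)  # [[False  True]]
```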
  • FIG. 5 shows another example of a method for calculating the uncertainty.
  • As a method for calculating the uncertainty for the adjacent category, the magnitude of the category score for the focus category can be used as the uncertainty at locations inferred to belong to the adjacent category.
  • The correction-required score calculation unit 104 extracts the group of category scores of each pixel inferred in the inference image to belong to the adjacent category. Then, in processing step S502, an uncertainty (second uncertainty) is calculated from the extracted group based on the category score for the focus category.
  • The second uncertainty can be, for example, the magnitude of the category score for the focus category.
  • The correction-required score calculation unit 104 also calculates, for each pixel inferred in the inference image to belong to the focus category, an uncertainty (first uncertainty) based on the category score for the focus category.
  • The correction-required score calculation unit 104 then calculates the correction-required score based on the first uncertainty and the second uncertainty.
  • Even at a location inferred to belong to the adjacent category, the input image may contain features of the focus category, in which case the category score of the focus category may also be high. It is therefore possible to calculate the correction-required score only for those locations within the adjacent category that are more likely to be false negatives of the focus category.
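One way to combine the two uncertainties described above can be sketched as follows (hypothetical function name; the category scores are again assumed to be softmax probabilities):

```python
import numpy as np

def combined_uncertainty(probs, inference_image, focus, adjacent):
    # First uncertainty: low focus-category confidence INSIDE the focus region.
    first = np.where(inference_image == focus, 1.0 - probs[focus], 0.0)
    # Second uncertainty: the focus-category score INSIDE the adjacent region;
    # a high value there hints at a false negative of the focus category.
    second = np.where(inference_image == adjacent, probs[focus], 0.0)
    return first + second  # correction-required score from both uncertainties

probs = np.array([[[0.8, 0.4]],   # focus category 0
                  [[0.2, 0.6]]])  # adjacent category 1
inferred = np.argmax(probs, axis=0)          # pixel 0 -> focus, pixel 1 -> adjacent
u = combined_uncertainty(probs, inferred, focus=0, adjacent=1)
print(u)  # [[0.2 0.4]]
```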
  • With the method described above in Embodiment 1, it is possible to calculate a correction-required score that is expected to improve accuracy with respect to false negatives as well as false positives.
  • FIG. 6 shows another example of a method for calculating the correction-required score.
  • The correction-required score calculation unit 104 smooths the correction-required score D7 of each pixel over a predetermined range including that pixel, and determines the parts where the smoothed correction-required score D8 is large to be the parts that need correction.
  • The specific range and the specific smoothing calculation can be designed as appropriate by a person skilled in the art; for example, a two-dimensional convolution with a Gaussian function or a simple averaging process may be used.
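The simple-average variant of this smoothing can be sketched as follows (a hypothetical pure-NumPy stand-in; a Gaussian convolution would replace the boxcar mean, and the window size is arbitrary):

```python
import numpy as np

def smooth(score, size=3):
    """Smooth the correction-required score D7 with a size x size moving average."""
    pad = size // 2
    padded = np.pad(score, pad, mode="edge")
    out = np.zeros_like(score, dtype=float)
    h, w = score.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + size, x:x + size].mean()
    return out

score = np.zeros((5, 5))
score[2, 2] = 1.0          # one isolated high-score pixel
smoothed = smooth(score)   # D8: the peak is spread over its 3x3 neighbourhood
```

Smoothing spreads an isolated peak over its neighbourhood, so a single noisy pixel no longer dominates the choice of correction-required parts.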
  • FIG. 7 shows yet another example of a method for calculating the correction-required score.
  • In processing step S701, the correction-required score calculation unit 104 identifies parts with a high correction-required score as correction-required parts D9.
  • The definition of a part with a high correction-required score can be determined as appropriate by a person skilled in the art; it may be, for example, a part whose correction-required score is equal to or greater than a threshold value, or an area where the standard deviation of the correction-required score is high.
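The thresholding variant of this identification step can be sketched as follows (hypothetical score values and threshold):

```python
import numpy as np

# Hypothetical correction-required scores for a 2x2 image.
score = np.array([[0.1, 0.8],
                  [0.9, 0.3]])

# Threshold variant: pixels whose score is at or above the threshold
# become the correction-required parts D9.
needs_correction = score >= 0.5
print(needs_correction)  # [[False  True]
                         #  [ True False]]
```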
  • [Embodiment 2] In the second embodiment, a GUI is proposed for creating teacher information using the correction-required score calculated in the first embodiment. Descriptions of parts common to the first embodiment may be omitted.
  • FIG. 8 shows an example of the functional configuration of an image classification device according to the second embodiment of the present invention.
  • The image classification device 800 also functions as a teacher information creation support device similar to that of the first embodiment.
  • The image classification device 800 may further have a configuration similar to that of the teacher information creation support device 100 of FIG. 1.
  • The image classification device 800 receives an input image D801 and a segmentation model D802 as inputs, and is equipped with an image category inference unit 801 and a teacher information creation GUI 802.
  • The image category inference unit 801 uses the segmentation model D802 to calculate, for each combination of pixel and category in the input image D801 consisting of multiple pixels, a category score D804 indicating the degree of certainty that the pixel belongs to that category, and calculates an inference image D803 based on the category scores D804.
  • The inference image D803 and the category scores D804 are calculated in the same way as the inference image D5 and the category scores D3 of FIG. 1, respectively.
  • The teacher information creation GUI 802 is a GUI with which the user creates per-pixel teacher information D805 using the input image D801, the inference image D803, and the category scores D804.
  • The teacher information creation GUI 802 has a function of displaying the input image D801 and the inference image D803 simultaneously to make errors in the inference results easier to find, and a function of having the machine find the parts that need correction from the category scores D804, making the parts to be corrected easy to select.
  • FIG. 11 shows an example of the teacher information creation GUI 802.
  • The teacher information creation GUI has: an input image display unit that displays the input image D801; an inference image display unit that displays the inference image D803; a teacher information assignment unit with a function for assigning teacher information; a focus category setting unit that accepts input of a focus category; a correction-required part display unit that outputs (for example, displays) the parts needing correction, calculated from the input focus category, the input image, and the category scores in the manner of the teacher information creation support device 100 of FIG. 1; an uncertainty display unit that similarly displays the calculated uncertainty; and a model learning start unit that accepts an operation to start training the segmentation model.
  • The uncertainty display unit and the correction-required part display unit present reference information for assigning teacher information, and may be omitted.
  • The device may also have an automatic correction part setting unit that accepts an operation for calculating the correction parts automatically.
  • The image classification device 800 can calculate the parts that need correction based on the focus category and the category scores, for example in the same manner as the teacher information creation support device 100 of Embodiment 1.
  • The teacher information creation GUI 802 accepts the assignment of teacher information via the teacher information assignment unit.
  • The teacher information assignment unit can set a category for each pixel through click operations on a computer or with a pen tablet.
  • In this example, the input image is displayed as the background and the assigned teacher information is superimposed on it, but displaying the input image as the background is not essential.
  • The GUI may also have a function for assigning the same category to multiple correction-required areas with a single operation.
  • The uncertainty display unit displays information to be used as a reference when assigning teacher information. As in the teacher information creation support device 100 of Embodiment 1, the image classification device 800 calculates the uncertainty indicating the degree to which the segmentation model is insufficiently trained, and the uncertainty display unit displays this uncertainty. In the example of FIG. 11, white areas represent areas with high uncertainty.
  • The uncertainty display unit may also display the uncertainty for the focus category (first uncertainty) and the uncertainty for the adjacent category (second uncertainty) shown in FIG. 4 separately. Furthermore, if there are multiple adjacent categories, the second uncertainty may be displayed divided into multiple regions.
  • The correction-required part display unit displays highly uncertain areas as areas that need correction.
  • In this example, the areas that need correction are displayed as white areas.
  • The user can refer to the displayed areas when assigning teacher information. However, if the areas that need correction are selected automatically, the correction-required part display unit need not be shown.
  • FIG. 12 shows another example of the teacher information creation GUI 802.
  • Here, the GUI of FIG. 11 is applied to detecting defects in structures during the semiconductor manufacturing process.
  • Three categories are displayed: the categories "Structure 1" and "Structure 2" represent proper semiconductor structures, and the category "Defect" represents a structural defect.
  • Users of the GUI can efficiently create teacher information that can be used to detect defects in semiconductor structures.
  • [Embodiment 3] In the third embodiment, an image classification device is proposed for a segmentation model in the middle of training, in interactive segmentation in which training of the segmentation model and annotation by the user are performed alternately. Descriptions of parts common to the first and second embodiments may be omitted.
  • FIG. 9 shows an example of the functional configuration of an image classification device according to the third embodiment of the present invention.
  • The image classification device 900 first trains the segmentation model D802 using the teacher information D805 calculated by the method shown in the second embodiment, together with the input image.
  • The model learning unit 901 trains the segmentation model D802 using the input image D801 and the teacher information D805.
  • The segmentation model D802 used here has been trained in advance at the input stage.
  • For this prior training, a small amount of teacher information for the input image D801 may be used, or a public dataset or the like may be used.
  • After the model learning unit 901 trains the segmentation model D802, teacher information D805 is created again through processing by the image category inference unit 801 and so on, and the segmentation model D802 is trained using that teacher information D805. By repeating this process until no misclassifications remain in the inference image D803, a segmentation model D802 with good classification accuracy can be obtained.
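The train/infer/correct loop above can be sketched abstractly as follows. Everything here is a hypothetical stand-in: the dict-based "model" and the toy `train`/`infer` callbacks only simulate the cycle, and a ground-truth table stands in for the user, who in practice supplies the corrections interactively:

```python
def interactive_training(model, image, truth, train, infer, max_rounds=10):
    teacher = {}                          # teacher information D805, grows each round
    for _ in range(max_rounds):
        train(model, image, teacher)      # model learning unit 901
        inference = infer(model, image)   # image category inference unit 801
        wrong = {p for p, c in inference.items() if truth[p] != c}
        if not wrong:                     # stop once no misclassification remains
            return model, inference
        for p in wrong:                   # simulated user corrections
            teacher[p] = truth[p]
    return model, infer(model, image)

# Toy demo: the "model" is a pixel->category dict defaulting to category 0.
truth = {(0, 0): 1, (0, 1): 0}
model, result = interactive_training(
    {}, None, truth,
    train=lambda m, img, t: m.update(t),
    infer=lambda m, img: {p: m.get(p, 0) for p in truth},
)
print(result == truth)  # True
```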
  • FIG. 10 shows another example of the functional configuration of an image classification device according to the third embodiment of the present invention.
  • The image classification device 1000 calculates the uncertainty in the same manner as in the first and second embodiments, and performs training that is weighted by the per-pixel uncertainty.
  • The model learning unit 1001 trains the segmentation model based on the uncertainty information D1001 so that the loss of pixels with high uncertainty is larger than the loss of pixels with low uncertainty.
  • This type of training is expected to improve accuracy: for example, by increasing the per-pixel loss in proportion to the uncertainty and training the model according to that loss, the features of areas with high uncertainty can be learned more strongly.
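A minimal sketch of such uncertainty-proportional weighting of a per-pixel cross-entropy loss (hypothetical function; `probs` is assumed to be softmax output):

```python
import numpy as np

def weighted_pixel_loss(probs, target, uncertainty):
    """probs: (C, H, W) softmax outputs; target: (H, W) true categories."""
    h, w = target.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    p_true = probs[target, rows, cols]      # probability of the true category
    cross_entropy = -np.log(p_true)         # per-pixel cross-entropy
    weights = 1.0 + uncertainty             # larger loss where uncertainty is high
    return float(np.mean(weights * cross_entropy))

probs = np.array([[[0.8]], [[0.2]]])        # one pixel, two categories
target = np.array([[0]])
loss_flat = weighted_pixel_loss(probs, target, np.array([[0.0]]))
loss_weighted = weighted_pixel_loss(probs, target, np.array([[0.5]]))
print(loss_weighted > loss_flat)  # True: uncertain pixels weigh more in training
```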

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This training information creation assistance device comprises: a category score calculation unit that uses a segmentation model for an image which is composed of a plurality of pixels to calculate, for each combination of a pixel and a category, a category score indicating the degree of confidence that the pixel belongs to the category; an inference image determination unit that calculates an inference image on the basis of the category score; an adjacent category calculation unit that receives input of a category of interest and calculates, on the basis of the inference image and the category of interest, an adjacent category which is likely to be adjacent to the category of interest; and a correction-required score calculation unit that determines a correction-required score for at least one pixel on the basis of the category of interest, the adjacent category, and the category score.

Description

Teacher information creation support device, image classification device, and program

The present invention relates to a teacher information creation support device, an image classification device, and a program for reducing the cost of creating teacher data in, for example, an image classification system with a learning function.

Creating a pixel-by-pixel image classification model using deep learning requires teacher information, but assigning that teacher information is costly. As a countermeasure, there is interactive segmentation, in which the model learns in collaboration with the user: the user indicates which parts of the model's inference results to correct and how to correct them. One way to select the parts to be corrected is to pick, from among the parts inferred as the focus category, those with a low inference probability; however, parts that are not inferred as the focus category are never selected, so false-negative parts are missed and accuracy does not improve.

International Publication No. 2020/129235 (Patent Document 1)
International Publication No. 2020/189269 (Patent Document 2)

In Patent Document 1, the tendency of the inference probability to drop is exploited to select areas with a low model inference probability as correction candidates; however, misclassified areas unrelated to the focus category may be selected, making it difficult to improve the accuracy of the focus category.

In Patent Document 2, from among the parts classified as the focus category, those with a low inference probability are selected as correction candidates; however, parts not inferred as the focus category are never selected, so false-negative parts are missed and accuracy does not improve.

The purpose of this invention is therefore to present correction-required parts, including false negatives, that are expected to improve the accuracy of the focus category.

An example of the teacher information creation support device according to the present invention comprises: a category score calculation unit that, for an image consisting of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category; an inference image determination unit that calculates an inference image based on the category scores; an adjacent category calculation unit that receives an input of a focus category and, based on the inference image and the focus category, calculates an adjacent category that is likely to be adjacent to the focus category; and a correction-required score calculation unit that determines a correction-required score for at least one pixel based on the focus category, the adjacent category, and the category scores.

An example of the image classification device according to the present invention includes: an image category inference unit that, for an image consisting of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category, and calculates an inference image based on the category scores; and a teacher information creation GUI having a focus category setting unit. The focus category setting unit receives an input of a focus category, the image classification device calculates the parts to be corrected based on the focus category and the category scores, and the teacher information creation GUI outputs the parts that need correction and accepts the assignment of teacher information via a teacher information assignment unit.

One example of the program according to the present invention causes a computer to function as the teacher information creation support device described above.

Another example of the program according to the present invention causes a computer to function as the image classification device described above.

According to the present invention, correction-required parts can be presented that lead to improved accuracy against false negatives of the focus category.

FIG. 1 is an example of a functional configuration of a teacher information creation support device according to a first embodiment of the present invention.
FIG. 2 shows an example of processing by an adjacent category calculation unit.
FIG. 3 shows an example of a processing result of an adjacent category calculation unit.
FIG. 4 illustrates an example of a correction-required score calculation unit.
FIG. 5 shows another example of how to calculate the uncertainty.
FIG. 6 shows another example of how to calculate the correction-required score.
FIG. 7 shows yet another example of how to calculate the correction-required score.
FIG. 8 is an example of a functional configuration of an image classification device according to a second embodiment of the present invention.
FIG. 9 is an example of a functional configuration of an image classification device according to a third embodiment of the present invention.
FIG. 10 is another example of a functional configuration of an image classification device according to a third embodiment of the present invention.
FIG. 11 is an example of a teacher information creation GUI.
FIG. 12 is another example of a teacher information creation GUI.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[Example 1]
Fig. 1 shows an example of the functional configuration of a teacher information creation support device according to the first embodiment of the present invention. To outline the functional configuration of Fig. 1: the teacher information creation support device 100 receives an input image D1, a segmentation model D2, and a focus category D4 as inputs, and includes a category score calculation unit 101, an inference image determination unit 102, an adjacent category calculation unit 103, and a correction-required score calculation unit 104.

The category score calculation unit 101 uses the segmentation model D2 to calculate, for the input image D1 consisting of multiple pixels, a category score D3 for each combination of pixel and category, indicating the degree of certainty that the pixel belongs to that category. The category score D3 is, for each pixel of the input image D1, a vector whose elements indicate the likelihood of each category to be identified.

The inference image determination unit 102 calculates an inference image D5 based on the category scores D3. The inference image D5 is an image in which a single category has been identified for each pixel through inference. Specifically, it can be calculated by taking, for each pixel, the category with the maximum value in the category score D3 as the identified category.
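As a minimal sketch of this per-pixel argmax step (the H x W x C array layout and the toy score values are assumptions for illustration, not part of the embodiment):

```python
import numpy as np

# Hypothetical category scores D3 for a 2x2 image with 3 categories,
# laid out as H x W x C (an assumed convention for this sketch).
scores = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.3, 0.3, 0.4], [0.6, 0.3, 0.1]],
])

# Inference image D5: for each pixel, take the category whose
# category score is maximal as the identified category.
inference_image = np.argmax(scores, axis=-1)

print(inference_image)
# [[0 1]
#  [2 0]]
```

The same operation applies unchanged whether the scores are softmax probabilities or raw logits, since argmax is order-preserving.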

The adjacent category calculation unit 103 receives the input of the focus category D4 and calculates an adjacent category D6 based on the inference image D5 and the focus category D4. An adjacent category is a category that is likely to be adjacent to the focus category (i.e., a category with a high likelihood of adjacency); a specific example is described later.

The correction-required score calculation unit 104 determines a correction-required score D7 based on the focus category D4, the adjacent category D6, and the category scores D3. The correction-required score is calculated, for example, for portions that are difficult for the segmentation model D2 to identify with respect to the focus category; a specific example is described later.

The teacher information creation support device 100 can be configured using, for example, a computer. The computer has the hardware configuration of a known computer and includes, for example, calculation means and storage means. The calculation means includes, for example, a processor, and the storage means includes storage media such as a semiconductor memory device and a magnetic disk device. Some or all of the storage media may be non-transitory storage media.

The computer may also include input/output means. The input/output means include, for example, input devices such as a keyboard and a mouse, output devices such as a display and a printer, and communication devices such as a network interface.

The storage means may store a program. When the processor executes this program, the computer functions as the teacher information creation support device 100 according to this embodiment or as a device according to another embodiment.

The details of each component in Fig. 1 are described below.

The input image D1 is the image for which the correction-required score is to be calculated. For at least one pixel (a pixel or a pixel group) in this image, a correction-required score that is expected to promote improvement of the accuracy of the segmentation model D2 is calculated.

The segmentation model D2 is a trained model on which machine learning has been performed in advance and which has discrimination performance for images. The prior training may be performed using a dataset with characteristics similar to those of the input image D1, or using a dataset unrelated to the input image D1.

The segmentation model D2 calculates a feature value for each pixel of the input image D1 and, from the feature value, calculates a confidence for classifying the pixel into each category. The confidence may be a value normalized per pixel by a softmax function to indicate an inference probability, or it may be the logits before softmax normalization; either is acceptable.

The focus category D4 indicates, among the categories identified by the segmentation model, the category whose accuracy should be improved with priority. For a semiconductor measurement image, for example, it is a category indicating the structure to be measured.

Fig. 2 shows an example of the processing of the adjacent category calculation unit. First, in processing step S201, the adjacent category calculation unit 103 extracts an adjacent pixel group.

Here, the adjacent category calculation unit 103 first extracts the pixels of the inference image D5 that belong to the focus category. Then, for each pixel inferred to belong to the focus category, the pixels adjacent to that pixel are extracted as adjacent pixels (however, if an adjacent pixel also belongs to the focus category, that pixel is not extracted as an adjacent pixel).

Next, in processing step S202, the adjacent category calculation unit 103 determines the adjacent category. Here, the adjacent category calculation unit 103 obtains the frequency of each category in the inference image D5 over the adjacent pixel group, and determines a high-frequency category as the adjacent category for the focus category D4. If there are multiple high-frequency categories, multiple adjacent categories may be determined. In this way, categories that are likely to be adjacent to the focus category D4 can be calculated.

Thus, in this embodiment, a category that is likely to be adjacent to the focus category D4 is a category that frequently appears as the category of pixels adjacent to pixels belonging to the focus category D4 in the inference image D5. The definition of "frequent" can be determined appropriately by those skilled in the art; for example, it may mean exceeding a predetermined threshold.

Fig. 3 shows an example of the processing result of the adjacent category calculation unit. Fig. 3 shows an example in which an inference image in which each pixel has been classified as category 1, 2, or 3 is input, and category 1 is the focus category.

First, in S201, the group of pixels adjacent to the locations inferred to be category 1 is calculated as the adjacent pixel group 300. Next, in S202, the frequency of the categories to which the adjacent pixel group 300 belongs is obtained, and category 2, being the high-frequency category, is determined as the adjacent category.
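Steps S201 and S202 can be sketched as follows. The 4-neighborhood definition of adjacency and the toy inference image are assumptions for illustration; the embodiment does not fix a particular neighborhood:

```python
import numpy as np
from collections import Counter

def adjacent_category(inference_image, focus_category):
    """Return the category most frequently adjacent to the focus category."""
    h, w = inference_image.shape
    neighbor_categories = []
    for y in range(h):
        for x in range(w):
            if inference_image[y, x] != focus_category:
                continue
            # S201: collect the adjacent pixel group (4-neighborhood assumed),
            # excluding pixels that themselves belong to the focus category.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    c = int(inference_image[ny, nx])
                    if c != focus_category:
                        neighbor_categories.append(c)
    # S202: the highest-frequency category among the adjacent pixels
    return Counter(neighbor_categories).most_common(1)[0][0]

# Toy inference image: a category-1 region bordered mostly by category 2.
img = np.array([
    [2, 2, 2, 3],
    [2, 1, 1, 3],
    [2, 1, 1, 3],
    [2, 2, 2, 3],
])
print(adjacent_category(img, focus_category=1))  # 2
```

Taking `most_common` with a count greater than one would yield multiple adjacent categories, as the embodiment allows.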

For example, for a category indicating the structure to be measured in a semiconductor measurement image, in which similar structures are repeated within the image, the categories adjacent to a particular category are limited, so the above method can determine the high-frequency category as the adjacent category.

As another example, if the white lines in an image of a road are set as the focus category, the asphalt part of the road frequently appears as an adjacent category, and the above method registers the asphalt part as the adjacent category.

Fig. 4 shows an example of the correction-required score calculation unit 104. First, in processing step S401, the correction-required score calculation unit 104 extracts, for each pixel inferred to belong to the focus category D4 in the inference image D5 and each pixel inferred to belong to the adjacent category D6 in the inference image D5, the category score for the focus category D4 and the category score for the adjacent category D6.

Then, based on the extracted category scores, the correction-required score calculation unit 104 calculates, in processing step S402, uncertainty as a value indicating the degree to which the segmentation model is insufficiently trained (for example, the degree to which the pixel contains insufficiently trained features). For example, the smallness of the category score for the specified categories (the focus category D4 and the adjacent category D6) among the extracted category scores is calculated as the uncertainty.

For example, if the category score is an inference probability, the uncertainty value is 1 minus the inference probability. In general, in a trained segmentation model, the inference probability of a pixel with sufficiently trained features is large and that of a pixel with insufficiently trained features is small, so locations where 1 minus the inference probability is large (locations with large uncertainty) can be said to be insufficiently trained.

By inputting the focus category and the adjacent category as the specified categories, the uncertainty of locations inferred to be the focus category and the uncertainty of locations inferred to be the adjacent category are obtained. Then, based on these uncertainties, processing step S403 identifies locations with large uncertainty as parts requiring correction and calculates the correction-required score for the identified parts.

The parts requiring correction can be, for example, locations where the uncertainty is equal to or greater than a predetermined threshold. The specific method of calculating the correction-required score can be designed appropriately by those skilled in the art; for example, it may be the sum of the uncertainty for the focus category and the uncertainty for the adjacent category.
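The uncertainty and thresholding steps above can be sketched as follows. The toy probabilities and the threshold value 0.3 are illustrative assumptions:

```python
import numpy as np

# Hypothetical per-pixel inference probabilities (after softmax) for a
# strip of 5 pixels; columns are category 0 (focus) and category 1 (adjacent).
probs = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.10, 0.90],
])
pred = probs.argmax(axis=1)  # identified category per pixel

# Uncertainty = 1 - inference probability of the identified category
# (the "smallness of the category score" described above).
uncertainty = 1.0 - probs[np.arange(len(probs)), pred]

# Parts requiring correction: uncertainty at or above a threshold.
needs_correction = uncertainty >= 0.3
print(needs_correction)  # [False  True False  True False]
```

Pixels whose winning score barely exceeds the runner-up are exactly the ones flagged, matching the intuition that low-confidence locations are insufficiently trained.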

Locations with large uncertainty values are locations that are difficult for the segmentation model to identify, and training on them with priority can be expected to improve the model's identification accuracy. This makes it possible to calculate the correction-required score from candidates that include, in addition to locations inferred to be the focus category, where false positives may occur, locations inferred to be the adjacent category, where false negatives may occur.

Fig. 5 shows another example of a method of calculating uncertainty. As shown in Fig. 5, as a method of calculating the uncertainty for the adjacent category, the magnitude of the category score for the focus category at locations inferred to belong to the adjacent category may be taken as the uncertainty.

In the example of Fig. 5, in processing step S501, the correction-required score calculation unit 104 extracts the category scores for each pixel inferred to belong to the adjacent category in the inference image. Then, in processing step S502, it calculates an uncertainty (second uncertainty) based on the category score for the focus category among the extracted category scores. The second uncertainty can be, for example, the magnitude of the category score for the focus category.

For locations inferred to belong to the focus category, the same calculation method as in Fig. 4 can be used. That is, the correction-required score calculation unit 104 calculates an uncertainty (first uncertainty) for each pixel inferred to belong to the focus category in the inference image, based on the category score for the focus category.

In this case, the correction-required score calculation unit 104 calculates the correction-required score based on the first uncertainty and the second uncertainty.

At false-negative locations, the input image may contain features of the focus category; that is, at such locations, even if the category with the highest category score is the adjacent category, the category score of the focus category may also be high. Therefore, the correction-required score can be calculated limited to locations within the adjacent category where a false negative for the focus category is more likely.
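The first and second uncertainties can be sketched together as follows. The two-category layout and the toy probabilities are assumptions for illustration:

```python
import numpy as np

FOCUS, ADJACENT = 0, 1

# Hypothetical softmax probabilities for 4 pixels, columns = [FOCUS, ADJACENT].
probs = np.array([
    [0.05, 0.95],   # confidently adjacent
    [0.45, 0.55],   # adjacent, but the focus score is also high (possible FN)
    [0.90, 0.10],   # confidently focus
    [0.55, 0.45],   # focus with a small margin
])
pred = probs.argmax(axis=1)

# First uncertainty: in the region predicted as the focus category,
# the smallness of the focus-category score (Fig. 4 method).
first_uncertainty = np.where(pred == FOCUS, 1.0 - probs[:, FOCUS], 0.0)

# Second uncertainty: in the region predicted as the adjacent category,
# the *magnitude* of the focus-category score (Fig. 5 method),
# highlighting likely false negatives for the focus category.
second_uncertainty = np.where(pred == ADJACENT, probs[:, FOCUS], 0.0)

print(first_uncertainty)   # [0.   0.   0.1  0.45]
print(second_uncertainty)  # [0.05 0.45 0.   0.  ]
```

Pixel 1 receives a large second uncertainty even though the adjacent category won, which is precisely the false-negative candidate the text describes.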

In Example 1, the method described above makes it possible to calculate a correction-required score that is expected to improve accuracy with respect to false negatives in addition to false positives.

Fig. 6 shows another example of a method of calculating the correction-required score. In the example of Fig. 6, in processing step S601, the correction-required score calculation unit 104 smooths, for each pixel, the correction-required score D7 within a predetermined range including that pixel, and calculates locations where the value of the smoothed correction-required score D8 is large as parts requiring correction.

When a user makes corrections, it is more efficient to correct a large area at once than to correct a single isolated pixel, so calculating locations with a high correction-required score over a predetermined range enables efficient correction. The specific method of determining the predetermined range and the specific operation for smoothing within it can be designed appropriately by those skilled in the art; for example, a two-dimensional convolution with a Gaussian function or a simple averaging operation may be used.
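A minimal sketch of the smoothing step, using the simple averaging option mentioned above (the window radius and toy score map are assumptions; a Gaussian convolution would serve equally well):

```python
import numpy as np

def smooth_scores(score_map, radius=1):
    """Smooth a 2-D correction-required score map with a box average."""
    h, w = score_map.shape
    out = np.zeros_like(score_map, dtype=float)
    for y in range(h):
        for x in range(w):
            # Mean over a (2*radius+1)^2 window, clipped at the borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            out[y, x] = score_map[y0:y1, x0:x1].mean()
    return out

# An isolated high score is damped by smoothing, while a cluster of
# high scores stays high, steering the user toward larger regions.
scores = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.9],
    [0.0, 0.0, 0.9, 0.9],
])
smoothed = smooth_scores(scores)
print(smoothed[1, 1] < smoothed[2, 2])  # True: the cluster outranks the lone pixel
```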

Fig. 7 shows yet another example of a method of calculating the correction-required score. In the example of Fig. 7, in processing step S701, the correction-required score calculation unit 104 identifies locations with a large correction-required score as parts requiring correction D9. The definition of a location with a large correction-required score can be determined appropriately by those skilled in the art; for example, it may be a location whose correction-required score is equal to or greater than a threshold, or a region where the standard score of the correction-required score is large.

By determining the parts requiring correction in this way, the user can efficiently grasp the locations with a large correction-required score.

[Example 2]
In the second embodiment, a GUI for creating teacher information using the correction-required score calculated in the first embodiment is proposed. Descriptions of parts common to the first embodiment may be omitted.

Fig. 8 shows an example of the functional configuration of an image classification device according to the second embodiment of the present invention. The image classification device 800 also functions as a teacher information creation support device similar to that of the first embodiment. For example, the image classification device 800 may further have a configuration similar to that of the teacher information creation support device 100 of Fig. 1.

To outline the functional configuration of Fig. 8: the image classification device 800 receives an input image D801 and a segmentation model D802 as inputs, and includes an image category inference unit 801 and a teacher information creation GUI 802.

The image category inference unit 801 uses the segmentation model D802 to calculate, for the input image D801 consisting of multiple pixels, a category score D804 for each combination of pixel and category, indicating the degree of certainty that the pixel belongs to that category, and calculates an inference image D803 based on the category scores D804. The inference image D803 and the category scores D804 are calculated in the same way as the inference image D5 and the category scores D3 of Fig. 1, respectively.

The teacher information creation GUI 802 is a GUI intended to let the user create per-pixel teacher information D805 using the input image D801, the inference image D803, and the category scores D804. The teacher information creation GUI 802 displays the input image D801 and the inference image D803 simultaneously to make it easier to find errors in the inference results, and has a function by which the machine obtains parts requiring correction from the category scores D804, making it easy to select the parts to correct.

Fig. 11 shows an example of the teacher information creation GUI 802. The teacher information creation GUI has: an input image display section that displays the input image D801; an inference image display section that displays the inference image D803; a teacher information assignment section with a function for assigning teacher information; a focus category setting section that accepts input of the focus category; a correction-required part display section that outputs (for example, displays) the parts requiring correction calculated, based on the teacher information creation support device 100 shown in Fig. 1, from the input focus category, the input image, and the category scores; an uncertainty display section that displays the similarly calculated uncertainty; and a model training start section that accepts an operation to start training the segmentation model.

The uncertainty display section and the correction-required part display section are displayed as reference information for assigning teacher information and may be omitted.

The GUI may also have an automatic correction-part setting section that accepts an operation for automatically calculating the parts to correct.

The image classification device 800 can calculate the parts requiring correction based on the focus category and the category scores, for example in the same manner as the teacher information creation support device 100 of Example 1.

The teacher information creation GUI 802 accepts the assignment of teacher information via the teacher information assignment section. The teacher information assignment section allows the category of each pixel to be set using mouse clicks on a computer or a pen tablet. In the example of Fig. 11, the input image is displayed as the background and the assigned teacher information is displayed superimposed on its foreground, but the input image need not be displayed as the background. The GUI may also have a function for correcting multiple parts requiring correction to the same category with a single operation.

The uncertainty display section displays information to be referred to when assigning teacher information. The image classification device 800 calculates, in the same manner as the teacher information creation support device 100 of Example 1, the uncertainty indicating the degree to which the segmentation model is insufficiently trained, and the uncertainty display section displays this uncertainty. In the example of Fig. 11, white areas represent areas of high uncertainty.

Locations that are not selected as parts requiring correction but have relatively large uncertainty are often locations where the inference is wrong, and even when it is not wrong, the segmentation model is often insufficiently trained there. Therefore, the user can improve the accuracy of the segmentation model by referring to the uncertainty display section and assigning teacher information to locations with relatively large uncertainty. However, when the parts requiring correction are selected automatically, the uncertainty display section need not be displayed.

The uncertainty display section may also display separately the uncertainty within the focus category (first uncertainty) and the uncertainty within the adjacent category (second uncertainty) shown in Fig. 4. Furthermore, when there are multiple adjacent categories, the second uncertainty may be displayed divided into multiple regions.

The correction-required part display section displays locations with large uncertainty as parts requiring correction. In the example of Fig. 11, the parts requiring correction are represented as white areas. The user can refer to the parts requiring correction when assigning teacher information. However, when the parts requiring correction are selected automatically, the correction-required part display section need not be displayed.

Fig. 12 shows another example of the teacher information creation GUI 802. This example applies the GUI of Fig. 11 to detecting structural defects in the semiconductor manufacturing process. As in Fig. 11, three categories are displayed; in particular, the categories "structure 1" and "structure 2" represent proper semiconductor structures, and the category "defect" represents a structural defect.

With such a GUI, the user can efficiently create teacher information usable for defect detection in semiconductor structures.

[Example 3]
In the third embodiment, an image classification device is proposed for the case of a segmentation model partway through training in interactive segmentation, in which training of the segmentation model and annotation by the user are performed alternately. Descriptions of parts common to the first and second embodiments may be omitted.

Fig. 9 shows an example of the functional configuration of an image classification device according to the third embodiment of the present invention. The image classification device 900 first trains the segmentation model D802 using the teacher information D805 calculated by the method shown in the second embodiment and the input image.

The model learning unit 901 trains the segmentation model D802 using the input image D801 and the teacher information D805.

The segmentation model D802 used at the input stage is one trained in advance. The prior training may use a small amount of teacher information for the input image D801, or may be performed using a public dataset or the like.

In the image classification device 900, the model learning unit 901 trains the segmentation model D802, after which the teacher information D805 is created again through the processing of the image category inference unit 801 and the other units, and the segmentation model D802 is trained using that teacher information D805. By repeating this process until there is no more misclassification in the inference image D803, a segmentation model D802 with good classification accuracy can be obtained.

Fig. 10 shows another example of the functional configuration of an image classification device according to the third embodiment of the present invention. The image classification device 1000 calculates the uncertainty in the same manner as in the first and second embodiments and performs emphasized training using the per-pixel uncertainty.

Locations with large uncertainty are locations where the segmentation model is insufficiently trained. Based on the uncertainty information D1001, the model learning unit 1001 trains the segmentation model so that the loss of pixels with large uncertainty becomes larger than the loss of pixels with small uncertainty.

Such training can be expected to improve accuracy. For example, by increasing the per-pixel loss in proportion to the uncertainty and training the model according to that loss, the features of locations with large uncertainty can be learned more strongly.
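This uncertainty-weighted loss can be sketched as follows. The exact weighting scheme, here `(1 + uncertainty)`, and the toy values are illustrative assumptions consistent with "loss proportional to uncertainty":

```python
import numpy as np

def weighted_pixel_loss(probs, targets, uncertainty):
    """Per-pixel cross-entropy scaled up by the pixel's uncertainty.

    High-uncertainty (insufficiently trained) pixels contribute more to
    the training signal, emphasizing their features during learning.
    """
    ce = -np.log(probs[np.arange(len(targets)), targets])
    return ce * (1.0 + uncertainty)

# Two pixels with the same target category 0; the second is less certain.
probs = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
])
targets = np.array([0, 0])
uncertainty = np.array([0.1, 0.4])  # e.g., 1 - inference probability

losses = weighted_pixel_loss(probs, targets, uncertainty)
print(losses[1] > losses[0])  # True: the uncertain pixel is emphasized
```

In an actual training loop this weighting would be applied before summing the pixel losses into the batch loss.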

100: Teacher information creation support device
101: Category score calculation unit
102: Inference image determination unit
103: Adjacent category calculation unit
104: Correction-required score calculation unit
300: Adjacent pixel group
800: Image classification device
801: Image category inference unit
802: Teacher information creation GUI
900: Image classification device
901: Model learning unit
1000: Image classification device
1001: Model learning unit
D1: Input image
D2: Segmentation model
D3: Category score
D4: Focus category
D5: Inference image
D6: Adjacent category
D7: Correction-required score
D8: Correction-required score (smoothed)
D9: Parts requiring correction
D801: Input image
D802: Segmentation model
D803: Inference image
D804: Category score
D805: Teacher information
D1001: Uncertainty information
S201: Processing step
S202: Processing step
S401: Processing step
S402: Processing step
S403: Processing step
S501: Processing step
S502: Processing step
S601: Processing step
S701: Processing step

Claims (13)

1. A teacher information creation support device comprising:
 a category score calculation unit that, for an image made up of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category;
 an inference image determination unit that calculates an inference image based on the category scores;
 an adjacent category calculation unit that receives an input of a category of interest and, based on the inference image and the category of interest, calculates an adjacent category that is likely to be adjacent to the category of interest; and
 a correction-required score calculation unit that determines a correction-required score for at least one pixel based on the category of interest, the adjacent category, and the category scores.

2. The teacher information creation support device according to claim 1, wherein the correction-required score calculation unit:
 extracts, for each pixel inferred in the inference image to belong to the category of interest and for each pixel inferred in the inference image to belong to the adjacent category, the category score for the category of interest and the category score for the adjacent category;
 calculates, based on each extracted category score, an uncertainty as a value indicating the degree to which the segmentation model is insufficiently trained; and
 calculates the correction-required score based on the uncertainty.

3. The teacher information creation support device according to claim 1, wherein the correction-required score calculation unit:
 calculates a first uncertainty for each pixel inferred in the inference image to belong to the category of interest, based on the category score for the category of interest;
 calculates a second uncertainty for each pixel inferred in the inference image to belong to the adjacent category, based on the category score for the category of interest; and
 calculates the correction-required score based on the first uncertainty and the second uncertainty.

4. The teacher information creation support device according to claim 1, wherein the correction-required score calculation unit identifies locations where the correction-required score is large as correction-required locations.

5. The teacher information creation support device according to claim 1, wherein the correction-required score calculation unit smooths, for each pixel, the correction-required score within a predetermined range including that pixel, and identifies locations where the smoothed value is large as correction-required locations.

6. An image classification device comprising:
 an image category inference unit that, for an image made up of a plurality of pixels, uses a segmentation model to calculate, for each combination of a pixel and a category, a category score indicating the degree of certainty that the pixel belongs to that category, and calculates an inference image based on the category scores; and
 a teacher information creation GUI having a category-of-interest setting unit,
 wherein the category-of-interest setting unit receives an input of a category of interest,
 the image classification device calculates correction-required locations based on the category of interest and the category scores, and
 the teacher information creation GUI outputs the correction-required locations and accepts assignment of teacher information via a teacher information assignment unit.

7. The image classification device according to claim 6, comprising a model learning unit that trains the segmentation model using the teacher information.

8. The image classification device according to claim 7, wherein the image classification device calculates an uncertainty indicating the degree to which the segmentation model is insufficiently trained, and the model learning unit trains the segmentation model so that the loss of pixels with large uncertainty is smaller than the loss of pixels with small uncertainty.

9. The image classification device according to claim 6, wherein the image classification device calculates an uncertainty indicating the degree to which the segmentation model is insufficiently trained, and displays the uncertainty.

10. The image classification device according to claim 6, wherein the image classification device:
 calculates, based on the inference image and the category of interest, an adjacent category that is likely to be adjacent to the category of interest;
 calculates a first uncertainty for each pixel inferred in the inference image to belong to the category of interest, based on the category score for the category of interest; and
 calculates a second uncertainty for each pixel inferred in the inference image to belong to the adjacent category, based on the category score for the category of interest,
 and wherein the teacher information creation GUI displays the first uncertainty and the second uncertainty.

11. The image classification device according to claim 8, wherein the teacher information creation GUI displays locations where the uncertainty is large as the correction-required locations.

12. A program that causes a computer to function as the teacher information creation support device according to claim 1.

13. A program that causes a computer to function as the image classification device according to claim 6.
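For readers outside patent practice, the processing flow recited in claims 1 to 4 can be sketched roughly as follows. The 4-neighborhood adjacency count and the min-of-two-scores heuristic for the correction-required score are illustrative assumptions, not the claimed method, which leaves these computations open.

```python
import numpy as np

def inference_image(scores):
    """Assign each pixel the category with the highest score
    (scores: H x W x C)."""
    return np.argmax(scores, axis=-1)

def adjacent_category(pred, focus):
    """Category (other than focus) that most often borders
    focus-category pixels in the 4-neighborhood."""
    counts = {}
    h, w = pred.shape
    for y in range(h):
        for x in range(w):
            if pred[y, x] != focus:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and pred[ny, nx] != focus:
                    c = int(pred[ny, nx])
                    counts[c] = counts.get(c, 0) + 1
    return max(counts, key=counts.get) if counts else None

def correction_score(scores, pred, focus, adj):
    """High where the model hesitates between the category of interest
    and the adjacent category: the smaller of the two scores, masked to
    pixels predicted as either category."""
    mask = (pred == focus) | (pred == adj)
    return np.minimum(scores[..., focus], scores[..., adj]) * mask
```

Given category scores from a segmentation model, `inference_image` plays the role of the inference image determination unit, `adjacent_category` that of the adjacent category calculation unit, and `correction_score` that of the correction-required score calculation unit.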
PCT/JP2023/032758 2023-09-07 2023-09-07 Training information creation assistance device, image classification device, and program Pending WO2025052642A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2023/032758 WO2025052642A1 (en) 2023-09-07 2023-09-07 Training information creation assistance device, image classification device, and program
TW113128043A TW202512113A (en) 2023-09-07 2024-07-29 Instructional information generation support device, image classification device and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/032758 WO2025052642A1 (en) 2023-09-07 2023-09-07 Training information creation assistance device, image classification device, and program

Publications (1)

Publication Number Publication Date
WO2025052642A1 true WO2025052642A1 (en) 2025-03-13

Family

ID=94923801

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/032758 Pending WO2025052642A1 (en) 2023-09-07 2023-09-07 Training information creation assistance device, image classification device, and program

Country Status (2)

Country Link
TW (1) TW202512113A (en)
WO (1) WO2025052642A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254529A1 (en) * 2014-03-10 2015-09-10 Canon Kabushiki Kaisha Image processing apparatus and image processing method
CN111428762A (en) * 2020-03-12 2020-07-17 武汉大学 Interpretable remote sensing image feature classification method combining deep data learning and ontology knowledge reasoning
JP2021022236A (en) * 2019-07-29 2021-02-18 セコム株式会社 Classification reliability calculation device, area division device, learning device, classification reliability calculation method, learning method, classification reliability calculation program, and learning program


Also Published As

Publication number Publication date
TW202512113A (en) 2025-03-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23951540

Country of ref document: EP

Kind code of ref document: A1