JP2023000929A

JP2023000929A - Machine learning device and machine learning method

Info

Publication number: JP2023000929A
Application number: JP2021102015A
Authority: JP
Inventors: 弘法松尾; Hironori Matsuo; 忠政入佐; Tadamasa Irisa; 雅一戸部田; Masakazu Tobeta
Original assignee: Aisin Corp
Current assignee: Aisin Corp
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2023-01-04

Abstract

PROBLEM TO BE SOLVED: To create a highly accurate learning model even when using synthetic data. SOLUTION: A machine learning device stores a first learning model that outputs a feature amount of input data, a second learning model that receives the feature amount output from the first learning model and outputs an identification error as to whether or not the data input to the first learning model is real data, and a third learning model that receives the identification error output from the second learning model and outputs generated data obtained by imparting features of the real data to synthetic data. The machine learning device inputs the real data, the synthetic data, and the generated data to the first learning model and performs machine learning of the second learning model on the basis of the output feature amount. The machine learning device inputs the feature amount output from the first learning model to the second learning model and performs machine learning of the third learning model on the basis of the output identification error. The machine learning device inputs the identification error output from the second learning model to the third learning model and performs machine learning of the first learning model on the basis of the output generated data. SELECTED DRAWING: Figure 7

Description

Embodiments of the present invention relate to a machine learning device and a machine learning method.

In recent years, much research and development has been carried out on learning models that receive data and output feature amounts (intermediate output data) and final output data for that data. With such a learning model, for example, people and vehicles can be detected from image data. In addition to real data, synthetic data is sometimes used as training data for a learning model. In that way, even when little real data is available, the amount of training data can be increased by using synthetic data.

Patent Document 1: JP 2018-163444 A
Patent Document 2: JP 2021-12595 A
Patent Document 3: WO 2018/198233

However, in the prior art, synthetic data may have characteristics that differ from those of real data, and learning models created using synthetic data were often not highly accurate.

The present invention has been made in view of the above circumstances, and its object is to provide a machine learning device and a machine learning method that can stably create a highly accurate learning model even when synthetic data is used.

A machine learning device according to an embodiment includes, for example: a storage unit that stores a first learning model that receives data and outputs a feature amount of the data, a second learning model that receives the feature amount output from the first learning model and outputs an identification error as to whether or not the data input to the first learning model is real data, and a third learning model that receives the identification error output from the second learning model and outputs generated data in which features of the real data are added to synthetic data; a second learning processing unit that performs a second process of inputting the real data, the synthetic data, and the generated data to the first learning model and machine-learning the second learning model based on the output feature amount; a third learning processing unit that performs a third process of inputting the feature amount output from the first learning model to the second learning model and machine-learning the third learning model based on the output identification error; and a first learning processing unit that performs a first process of inputting the identification error output from the second learning model to the third learning model and machine-learning the first learning model based on the output generated data.
According to this configuration, for example, the third learning model outputs generated data in which features of the real data are added to the synthetic data, and by using that generated data, a highly accurate first learning model can be created stably.

Further, the machine learning device of the embodiment repeats, for example, a series of processes, in which the second process and the third process are repeated until a first repetition end condition is satisfied and the first process is then performed, until a second repetition end condition is satisfied.
According to this configuration, for example, the accuracy of the first learning model can be further improved by performing the overall repetition described above.

Further, the machine learning device of the embodiment further includes, for example, a difference calculation processing unit that calculates a difference between the synthetic data and the generated data, and the third learning processing unit additionally uses the difference in performing the third process.
According to this configuration, for example, using the difference avoids a situation in which the third learning processing unit produces generated data that departs greatly from the synthetic data solely in order to bring the synthetic data closer to the real data.

Further, in the machine learning device of the embodiment, for example, the third learning processing unit performs the third process with weights applied to the identification error and the difference.
According to this configuration, for example, weighting suited to the situation, such as changing the weights between the first half and the second half of the overall repetition, can further improve the accuracy of the first learning model and shorten the computation time.

Further, a machine learning method according to an embodiment is, for example, a machine learning method using a first learning model that receives data and outputs a feature amount of the data, a second learning model that receives the feature amount output from the first learning model and outputs an identification error as to whether or not the data input to the first learning model is real data, and a third learning model that receives the identification error output from the second learning model and outputs generated data in which features of the real data are added to synthetic data. The method includes: a second learning processing step of performing a second process of inputting the real data, the synthetic data, and the generated data to the first learning model and machine-learning the second learning model based on the output feature amount; a third learning processing step of performing a third process of inputting the feature amount output from the first learning model to the second learning model and machine-learning the third learning model based on the output identification error; and a first learning processing step of performing a first process of inputting the identification error output from the second learning model to the third learning model and machine-learning the first learning model based on the output generated data.
According to this configuration, for example, the third learning model outputs generated data in which features of the real data are added to the synthetic data, and by using that generated data, a highly accurate first learning model can be created stably.

FIG. 1 is a diagram showing an example of the hardware configuration of the machine learning device according to the embodiment.
FIG. 2 is a diagram showing the functional configuration of the machine learning device according to the embodiment.
FIG. 3 is a diagram showing examples of synthetic data and generated data according to the embodiment.
FIG. 4 is an explanatory diagram of the learning of the second learning model in the embodiment.
FIG. 5 is an explanatory diagram of the learning of the third learning model in the embodiment.
FIG. 6 is an explanatory diagram of the learning of the first learning model in the embodiment.
FIG. 7 is a flowchart showing the processing executed by the machine learning device according to the embodiment.

Exemplary embodiments of the present invention are disclosed below. The configurations of the embodiments shown below, and the actions, results, and effects brought about by those configurations, are examples. The present invention can also be realized by configurations other than those disclosed in the following embodiments, and can obtain at least one of the various effects based on the basic configuration and the effects derived from it.

To make the present embodiment easier to understand, the prior art is described again. In the following, machine learning and learning models are also referred to as AI (Artificial Intelligence).

In general, an AI trained on synthetic data tends not to fit real data. This is thought to be because the feature amounts calculated by the AI differ greatly between synthetic data and real data. The following methods 1 to 3, for example, have been proposed so far to close this gap.

Method 1 (Patent Document 1) uses a discriminator that classifies CG (Computer Graphics) data and live-action data, and closes the gap between CG data and live-action data by correcting the CG data so that the CG data and the live-action data are classified.

Method 2 (Patent Document 2) measures the distance between the distributions of the feature amounts that the AI being trained calculates for live-action data and for CG data, and trains the AI under a constraint that makes this distance small, thereby closing the feature-amount gap between live-action data and CG data.

Method 3 (Patent Document 3) builds into the AI a mechanism that identifies the gap (offset) between the feature amounts of CG data and live-action data, and trains the AI on feature amounts from which the offset has been subtracted, thereby closing the gap between CG data and live-action data.

Methods 1 to 3 are techniques for closing the gap between CG data and live-action data, but they have problems. For example, the approach of bringing CG data closer to live-action data from a human viewpoint rests on the idea that, because the data has come to look like live-action data to a person, it should also be close to live-action data for the AI. However, an AI does not necessarily recognize images using the same features as a person. It is therefore unclear whether the CG data has come closer to live-action data from the viewpoint of the AI that is actually trained on it. In other words, there is no guarantee that the AI can learn effectively from the CG data.

In the approach of building a mechanism that closes the gap between CG data and live-action data into the AI trained on CG data, the AI is trained to fit both live-action data and CG data (so that it can calculate feature amounts for both). However, if this approach is used to improve the performance of an AI trained only on live-action data (that is, to improve an existing AI), there is a risk that its fit to live-action data will degrade as it also tries to fit the CG data.

In short, synthetic data may have characteristics that differ from those of real data, and in the prior art, learning models created using synthetic data were often not highly accurate.

A technique that can stably create a highly accurate learning model even when synthetic data is used is therefore described below.

FIG. 1 is a diagram showing an example of the hardware configuration of a machine learning device 100. As shown in FIG. 1, the machine learning device 100 includes a processor 101, a ROM 102, a RAM 103, an input unit 104, a display unit 105, a communication I/F 106, and an HDD 107. In this example, the machine learning device 100 has the same hardware configuration as an ordinary computer. The hardware elements of the machine learning device 100 are not limited to those illustrated in FIG. 1; for example, the device may further include a camera or the like.

The processor 101 is a hardware circuit composed of, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an MPU (Micro Processing Unit). By executing programs, the processor 101 controls the overall operation of the machine learning device 100 and implements its various functions, which are described later.

The ROM 102 is a non-volatile memory that stores various data, including a program for starting the machine learning device 100. The RAM 103 is a volatile memory that provides a work area for the processor 101.

The input unit 104 is a device with which a user of the machine learning device 100 performs various operations. The input unit 104 is composed of, for example, a mouse, a keyboard, a touch panel, or hardware keys.

The display unit 105 displays various kinds of information. The display unit 105 is composed of, for example, a liquid crystal display or an organic EL (Electro Luminescence) display. The input unit 104 and the display unit 105 may be integrated, for example in the form of a touch panel. The communication I/F 106 is an interface for connecting to a network. The HDD (Hard Disk Drive) 107 stores various data.

FIG. 2 is a diagram showing the functional configuration of the machine learning device 100 of the embodiment. The machine learning device 100 includes a processing unit 1 and a storage unit 2. The storage unit 2 is implemented by, for example, the ROM 102, the RAM 103, and the HDD 107, and stores various data, for example a first learning model 21, a second learning model 22, and a third learning model 23. Each learning model is data used by the processing unit 1, but for convenience of explanation, the learning models are sometimes described below as if they themselves perform input and output.

Of the first learning model 21, the second learning model 22, and the third learning model 23, the direct target of accuracy improvement is the first learning model 21. The second learning model 22 and the third learning model 23 are used to improve the accuracy of the first learning model 21. In the following, real data is data that was actually obtained, for example live-action (photographed) data. Synthetic data is data synthesized as a substitute for real data, for example CG data. Generated data is data in which features of real data have been added to synthetic data (that is, processed synthetic data).

FIG. 3 shows examples of (a) synthetic data and (b) generated data in the embodiment. The generated data shown in FIG. 3(b) is the synthetic data shown in FIG. 3(a) with features of real data added. In FIG. 3, the only difference between the two is the presence or absence of the shadow of vehicle C on road R, but the difference is not limited to this. For example, more complex processing may be applied to the vehicle C portion, and portions other than vehicle C, such as road R, may also be processed.

Returning to FIG. 2, the first learning model 21 receives data (real data, synthetic data, or generated data) and outputs feature amounts (intermediate output data) and final output data for that data. The type of machine learning used, including for the first learning model 21, is arbitrary.

The second learning model 22 receives the feature amounts (intermediate output data) and final output data output from the first learning model 21, identifies whether or not the data input to the first learning model 21 is real data, and outputs the identification error. By obtaining ground-truth data indicating whether the data input to the first learning model 21 is real data (for example, a label distinguishing real data, synthetic data, and generated data), the second learning model 22 can determine whether its identification result is correct and calculate the identification error.
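The relationship between the label and the identification error can be sketched as follows. This is only an illustration under assumptions that the description does not state: the second learning model is assumed to output a probability `p_real` that the input was real data, and the error is taken as the absolute gap between that probability and the label, which yields a value from 0 to 1 as the description later assumes. The function names are hypothetical.

```python
def identification_error(p_real: float, is_real: bool) -> float:
    """Return an error in [0, 1] between a predicted 'real' probability and the label.

    Assumption: the discriminator emits a probability; the patent text only
    says that an identification error is computed from ground-truth labels.
    """
    if not 0.0 <= p_real <= 1.0:
        raise ValueError("p_real must lie in [0, 1]")
    target = 1.0 if is_real else 0.0  # synthetic and generated data are both 'not real'
    return abs(target - p_real)


def mean_identification_error(predictions, labels):
    """Average the per-sample identification error over a batch."""
    errors = [identification_error(p, y) for p, y in zip(predictions, labels)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # One real sample scored 0.9 and one synthetic sample scored 0.2.
    print(mean_identification_error([0.9, 0.2], [True, False]))
```

Any other bounded error measure (for example a thresholded 0/1 error) would fit the text equally well; the absolute gap is used here only because it is simple.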

The third learning model 23 receives the identification error output from the second learning model 22 and outputs generated data in which features of the real data are added to the synthetic data.

The processing unit 1 is implemented by the processor 101 executing programs stored in the ROM 102 and the HDD 107. The processing unit 1 executes various kinds of arithmetic processing. As its functional configuration, the processing unit 1 includes a first learning processing unit 11, a second learning processing unit 12, a third learning processing unit 13, an estimation processing unit 14, a setting unit 15, a difference calculation processing unit 16, and a control unit 17.

FIG. 4 is an explanatory diagram of the learning of the second learning model 22 in the embodiment. The second learning processing unit 12 performs a second process of inputting real data, synthetic data, and generated data to the first learning model 21 and machine-learning the second learning model 22 based on the output feature amounts (intermediate output data) and final output data. At the outset there is no generated data, so the second process is performed using only real data and synthetic data.

FIG. 5 is an explanatory diagram of the learning of the third learning model 23 in the embodiment. The third learning processing unit 13 performs a third process of inputting the feature amounts (intermediate output data) and final output data output from the first learning model 21 to the second learning model 22 and machine-learning the third learning model 23 based on the output identification error.

The difference calculation processing unit 16 calculates the difference between the synthetic data and the generated data, and the third learning processing unit 13 additionally uses that difference in performing the third process. The third learning processing unit 13 may also perform the third process with weights applied to the identification error and the difference. The third learning processing unit 13 is described in more detail below.

For example, where e is the identification error (0 to 1) of the second learning model 22, the third learning processing unit 13 feeds (1 - e) back to the third learning model 23 as the error of the third learning model 23. That is, the third learning processing unit 13 learns how to convert the synthetic data so that the error of the third learning model 23 becomes small. In doing so, it uses the difference between the synthetic data and the generated data calculated by the difference calculation processing unit 16 to keep the generated data from departing far from the synthetic data. In this way, the third learning processing unit 13 produces generated data that the first learning model 21 is deceived into treating as real data. The third learning processing unit 13 may also weight the identification error and the difference when training the third learning model 23. For example, the weight of the identification error of the second learning model 22 may be made large early in learning, and the weight of the difference between the synthetic data and the generated data may be made large once the first learning model 21 has come to be deceived to some extent.
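As a rough sketch of how the two terms might be combined, the snippet below forms a weighted sum of (1 - e) and a difference term, and switches the weights once the identification error passes a threshold. Several details are assumptions made for illustration and are not given in the description: the weighted-sum form, the mean absolute difference as the difference measure, the specific weight values, and the threshold. All function names are hypothetical.

```python
def mean_abs_difference(synthetic, generated):
    """Mean absolute per-element difference between two equally sized sequences.

    Assumption: the patent does not say how the difference is measured.
    """
    if len(synthetic) != len(generated):
        raise ValueError("inputs must have the same length")
    return sum(abs(s - g) for s, g in zip(synthetic, generated)) / len(synthetic)


def third_model_loss(e, difference, w_error=1.0, w_diff=1.0):
    """Weighted sum of the fed-back error (1 - e) and the difference term."""
    return w_error * (1.0 - e) + w_diff * difference


def schedule_weights(e, fooled_threshold=0.4):
    """Pick (w_error, w_diff) from the current identification error e.

    Early on (small e) the identification error dominates; once the
    discriminator is being fooled to some extent, the difference term dominates.
    The numbers here are placeholders, not values from the patent.
    """
    if e < fooled_threshold:
        return 0.8, 0.2
    return 0.3, 0.7


if __name__ == "__main__":
    e = 0.25
    diff = mean_abs_difference([0.0, 1.0], [0.5, 1.0])
    w_error, w_diff = schedule_weights(e)
    print(third_model_loss(e, diff, w_error, w_diff))
```

A smooth schedule over the training iterations would serve the same purpose as the step change used here; the step is chosen only to keep the example short.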

FIG. 6 is an explanatory diagram of the learning of the first learning model 21 in the embodiment. The first learning processing unit 11 performs a first process of inputting the identification error output from the second learning model 22 to the third learning model 23 and machine-learning the first learning model 21 based on the output generated data. In this machine learning, real data is also input to the first learning model 21.

Returning to FIG. 2, the processing unit 1 repeats, for example, a series of processes, in which the second process and the third process are repeated until the first repetition end condition is satisfied and the first process is then performed, until the second repetition end condition is satisfied (details are described later with reference to FIG. 7).

The estimation processing unit 14 performs estimation processing such as object detection by inputting data to the trained first learning model 21 and obtaining the feature amounts (intermediate output data) and final output data for that data.

The setting unit 15 sets various parameters, weights, and the like.

The control unit 17 performs processing other than that performed by the units 11 to 16. For example, the control unit 17 controls the display of various kinds of information on the display unit 105.

Next, the processing executed by the machine learning device 100 is described. FIG. 7 is a flowchart showing the processing executed by the machine learning device 100 of the embodiment.

First, in step S1, the second learning processing unit 12 performs the second process of inputting real data and synthetic data to the first learning model 21 and machine-learning the second learning model 22 based on the output feature amounts (intermediate output data) and final output data (FIG. 4).

Next, in step S2, the second learning processing unit 12 determines whether learning is complete, that is, whether a predetermined learning end condition is satisfied. If Yes, the process proceeds to step S3; if No, it returns to step S1. The learning end condition here may be, for example, a condition set on the loss during training of the second learning model 22. More specifically, the end of learning is generally judged from the balance between the loss on the training data and the loss on the validation data, but a mechanism that judges completion automatically, such as early stopping, may also be used.
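One common form of early stopping is sketched below as a possible reading of the step S2 check. It is an assumption-laden illustration: the description only names early stopping as an option, and the `patience` and `min_delta` parameters, as well as the function name, are hypothetical choices borrowed from common practice.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when validation loss has stopped improving.

    val_losses: validation loss recorded after each epoch, oldest first.
    patience:   number of most recent epochs allowed without improvement.
    min_delta:  minimum decrease that counts as an improvement.
    """
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    # Stop if none of the recent epochs beat the earlier best by at least min_delta.
    return recent_best > best_before - min_delta


if __name__ == "__main__":
    history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
    print(should_stop(history))
```

Monitoring the validation loss rather than the training loss reflects the balance between the two that the paragraph mentions: the training loss keeps falling even after the validation loss has flattened or started to rise.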

In step S3, the third learning processing unit 13 performs the third process of inputting the feature amounts (intermediate output data) and final output data output from the first learning model 21 to the second learning model 22 and machine-learning the third learning model 23 based on the output identification error.

Next, in step S4, the third learning processing unit 13 determines whether learning is complete, that is, whether a predetermined learning end condition is satisfied. If Yes, the process proceeds to step S5; if No, it returns to step S3. The learning end condition here may be based on, for example, the identification error of the second learning model 22 or the number of learning iterations. More specifically, learning ends, for example, when the second learning model 22 can no longer correctly distinguish real data from generated data (when the rate of correct identification is around 0.5). Because the rate may happen to be around 0.5 by chance early in learning, completion of a certain number of learning iterations may also be required as a learning end condition.
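The step S4 check described above can be sketched as a small predicate. The tolerance around 0.5 and the minimum iteration count are hypothetical values chosen for illustration; the description gives only "around 0.5" and "a certain number" of iterations.

```python
def third_learning_done(accuracy, iterations, min_iterations=1000, tolerance=0.05):
    """Return True when the discriminator is near chance and enough training has run.

    accuracy:       rate at which the second learning model identifies correctly.
    iterations:     number of learning iterations performed so far.
    min_iterations: guard against an accidental ~0.5 rate early in learning (assumed value).
    tolerance:      how close to 0.5 counts as 'around 0.5' (assumed value).
    """
    near_chance = abs(accuracy - 0.5) <= tolerance
    return iterations >= min_iterations and near_chance


if __name__ == "__main__":
    print(third_learning_done(0.52, 10))    # too early to trust the rate
    print(third_learning_done(0.52, 2000))  # near chance after enough iterations
```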

In step S5, the processing unit 1 determines whether the first repetition end condition is satisfied. If Yes, the process proceeds to step S6; if No, it returns to step S1. The first repetition end condition may be, for example, a number of repetitions (N1).

From step S6 onward, the generated data refined by repeating steps S1 to S4 (that is, data for which the realization processing has progressed) is used to train the first learning model 21, whose performance is to be improved by making use of synthetic data. For example, if the learning target is an AI for object detection (the first learning model 21), the generated data is used as teacher data, with information on the types and positions of the detection targets attached, and the model is trained so that it detects them correctly. This teacher data is prepared when the synthetic data is created. Step S6 and the subsequent steps are described below.

In step S6, the first learning processing unit 11 performs the first process of inputting the identification error output from the second learning model 22 to the third learning model 23 and machine-learning the first learning model 21 based on the output generated data.

Next, in step S7, the first learning processing unit 11 determines whether learning is complete, that is, whether a predetermined learning end condition is satisfied. If Yes, the process proceeds to step S8; if No, it returns to step S6. The learning end condition here may be, for example, a condition set on the loss during training of the first learning model 21.

In step S8, the processing unit 1 determines whether a second repetition end condition is satisfied. If Yes, the process ends; if No, it returns to step S1. The second repetition end condition may be, for example, a number of repetitions (N2).

As the learning of the first learning model 21 progresses through the repetition of step S6, the feature amounts that the first learning model 21 computes from the real data and the generated data may change. The learning of the second learning model 22 (step S1) and of the third learning model 23 (step S3) must therefore be performed again, so the process returns to step S1; at that point, the second learning model 22 and the third learning model 23 are returned to their initial state before learning. By repeating the above, the performance of the first learning model 21 can be improved using the synthetic data (generated data).
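The control flow of steps S1 to S8 can be sketched as nested loops. This is only an outline under stated assumptions: every callable name is hypothetical and supplied by the caller, the repetition end conditions are taken to be plain counts N1 and N2, step S2 is assumed to be the end check for the second learning model in the same way that S4 and S7 are for the third and first, and the N1 count is assumed to restart each time the process returns from step S8.

```python
def run_training_cycle(train_second, second_done,
                       train_third, third_done,
                       train_first, first_done,
                       reset_second_and_third, n1, n2):
    """Outline of the FIG. 7 flow; all arguments are hypothetical callables/counts."""
    for outer in range(n2):                # S8: second repetition end condition (N2)
        if outer > 0:
            # Returning from S8 to S1: models 22 and 23 go back to their
            # initial state, because model 21's feature amounts may have changed.
            reset_second_and_third()
        for _ in range(n1):                # S5: first repetition end condition (N1)
            while True:
                train_second()             # S1: second process (model 22)
                if second_done():          # S2: assumed end check for model 22
                    break
            while True:
                train_third()              # S3: third process (model 23)
                if third_done():           # S4: end check for model 23
                    break
        while True:
            train_first()                  # S6: first process (model 21)
            if first_done():               # S7: end check for model 21
                break
```

Passing the training and end-check routines as callables keeps the outline independent of any particular model implementation, which matches the remark that the first learning model needs no special mechanism.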

As described above, according to the machine learning device 100 of the present embodiment, the third learning model 23 outputs generated data in which features of the real data have been added to the synthetic data, and using that generated data makes it possible to stably create a highly accurate first learning model 21. In other words, because the generated data is produced by bringing the synthetic data closer to the real data (realization processing) from the viewpoint of the first learning model 21, whose performance is to be improved using synthetic data, a learning effect equivalent to that of real data can be expected.

Further, performing the overall iterative process of FIG. 7 can further improve the accuracy of the first learning model 21.

Moreover, using the above-described difference avoids a situation in which the third learning processing unit 13 produces generated data that departs greatly from the synthetic data merely in order to bring the synthetic data closer to the real data.

In addition, weighting suited to the situation, such as changing the above-described weights between the first half and the second half of the overall iterative process of FIG. 7, can achieve further improvement in the accuracy of the first learning model 21, shorter computation time, and the like.
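As a rough illustration of weighting the identification error and the difference, and of switching the weights between the first and second half of the overall repetition, consider the following sketch. The weighted-sum form, the function names, and the specific weight values are assumptions made for illustration; this section does not state how the two quantities are actually combined.

```python
def combined_training_signal(identification_error, difference,
                             w_error, w_difference):
    """Hypothetical weighted sum of the two quantities used in the third process."""
    return w_error * identification_error + w_difference * difference


def weights_for_repetition(repetition, total_repetitions,
                           first_half=(1.0, 0.1), second_half=(1.0, 1.0)):
    """Return (w_error, w_difference); the values are assumed, not from the source."""
    # Use one weight pair in the first half of the overall repetition
    # and another pair in the second half.
    if repetition < total_repetitions / 2:
        return first_half
    return second_half
```

For example, `combined_training_signal(0.4, 0.2, *weights_for_repetition(8, 10))` uses the second-half weights, so the difference term contributes more strongly late in the cycle under these assumed values.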

Note that the number of feature amounts used in the first learning model 21 is large, ranging from tens of thousands to billions. Humans therefore cannot accurately grasp the characteristics of the feature amounts of the real data and of the synthetic data. According to the machine learning device 100 of the present embodiment, generating generated data that the first learning model 21 is fooled into treating as real data makes it possible to stably create a highly accurate first learning model 21 using such generated data.

In addition, when the first learning model 21 is trained using synthetic data (generated data), the first learning model 21 whose performance is to be improved does not need any special mechanism, so an existing AI (first learning model 21) can be used effectively.

In addition, repeating the cycle of training the third learning model 23, which performs realization processing to close the gap between real data and synthetic data, and training the first learning model 21, whose performance is to be improved, allows performance to be improved in a dual manner. As a result, the performance of the first learning model 21 can be expected to improve significantly compared with the conventional technology.

The program for executing the information processing in the above-described embodiment may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM, flexible disk (FD), CD-R, DVD (Digital Versatile Disk), or USB (Universal Serial Bus) memory. The program may also be provided or distributed via a network such as the Internet, or provided by being incorporated in advance in a ROM or the like.

The program has a module configuration including each of the above-described functional components. As actual hardware, for example, a CPU (processor circuit) reads the program from the ROM or HDD and executes it, whereby each of the above-described functional units is loaded onto the RAM and generated on the RAM. Some or all of the above-described functional units may also be implemented using dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

Although an embodiment has been described, the above embodiment is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The above embodiment is included in the scope and gist of the invention, and is included in the invention described in the claims and its equivalents.

For example, the target data is not limited to image data such as photographed data and CG data, and may instead be sensor data (such as waveform data) or the like.

When the second learning model 22 discriminates between real data and synthetic data (generated data), the approach is not limited to direct discrimination. An indirect approach may be adopted instead, for example an AI that measures the difference between the feature-amount distributions of the real data and the synthetic data (generated data), or an existing measure such as the Kullback-Leibler divergence or the JS divergence (Jensen-Shannon divergence).
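The two existing measures named above can be computed for discrete distributions as in the following sketch. It assumes, purely for illustration, that the feature amounts of the real data and of the generated data have already been summarized as normalized histograms over the same bins; how such histograms would be built is not described in the source.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    p and q are sequences of probabilities over the same bins.
    q must be positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric measure built from two KL terms."""
    # The mixture m is positive wherever p or q is positive,
    # so both KL terms below are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With natural logarithms the JS divergence lies between 0 (identical distributions) and ln 2 (distributions with no overlap), so a value close to 0 would indicate that the generated data's feature distribution is hard to tell apart from that of the real data.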

The calculation of the difference between the synthetic data and the generated data in the difference calculation processing unit 16 may be replaced with an AI trained in advance to measure the closeness between pieces of real data and between pieces of synthetic data.

When weighting the identification error and the difference between the synthetic data and the generated data, a mechanism that optimizes the weighting itself (such as reinforcement learning) may be incorporated into the overall cycle (the repetition of steps S1 to S8 in FIG. 7).

Reference Signs List: 1…processing unit, 2…storage unit, 11…first learning processing unit, 12…second learning processing unit, 13…third learning processing unit, 14…estimation processing unit, 15…setting unit, 16…difference calculation processing unit, 17…control unit, 21…first learning model, 22…second learning model, 23…third learning model, 100…machine learning device, 101…processor, 102…ROM, 103…RAM, 104…input unit, 105…display unit, 106…communication I/F, 107…HDD

Claims

1. A machine learning device comprising:
a storage unit that stores a first learning model that receives data as input and outputs a feature amount of the data, a second learning model that receives the feature amount output from the first learning model and outputs an identification error as to whether the data input to the first learning model is real data, and a third learning model that receives the identification error output from the second learning model and outputs generated data in which features of the real data are added to synthetic data;
a second learning processing unit that performs a second process of inputting the real data, the synthetic data, and the generated data into the first learning model and machine-learning the second learning model based on the output feature amount;
a third learning processing unit that performs a third process of inputting the feature amount output from the first learning model into the second learning model and machine-learning the third learning model based on the output identification error; and
a first learning processing unit that performs a first process of inputting the identification error output from the second learning model into the third learning model and machine-learning the first learning model based on the output generated data.

2. The machine learning device according to claim 1, wherein a series of processes, in which the second process and the third process are repeated until a first repetition end condition is satisfied and the first process is then performed, is repeated until a second repetition end condition is satisfied.

3. The machine learning device according to claim 1, further comprising a difference calculation processing unit that calculates a difference between the synthetic data and the generated data,
wherein the third learning processing unit further uses the difference to perform the third process.

4. The machine learning device according to claim 3, wherein the third learning processing unit weights the identification error and the difference and performs the third process.

5. A machine learning method using a first learning model that receives data as input and outputs a feature amount of the data, a second learning model that receives the feature amount output from the first learning model and outputs an identification error as to whether the data input to the first learning model is real data, and a third learning model that receives the identification error output from the second learning model and outputs generated data in which features of the real data are added to synthetic data, the method comprising:
a second learning processing step of performing a second process of inputting the real data, the synthetic data, and the generated data into the first learning model and machine-learning the second learning model based on the output feature amount;
a third learning processing step of performing a third process of inputting the feature amount output from the first learning model into the second learning model and machine-learning the third learning model based on the output identification error; and
a first learning processing step of performing a first process of inputting the identification error output from the second learning model into the third learning model and machine-learning the first learning model based on the output generated data.