WO2026009328A1

WO2026009328A1 - Learning device, learning method, and learning program

Info

Publication number: WO2026009328A1
Application number: PCT/JP2024/023973
Authority: WO
Inventors: 芙巳雄二瓶; 亮石井; 順一澤瀬; 徹也山口
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2024-07-02
Filing date: 2024-07-02
Publication date: 2026-01-08
Anticipated expiration: 2027-01-02

Abstract

This learning device receives, as input, information indicating the type of a unit of analysis in a machine learning model and data extracted for each unit of analysis of that type. For each type of unit of analysis, the learning device then adds, to the data extracted in that unit of analysis, embedded information indicating the type of the unit of analysis. The learning device trains the machine learning model using the data to which this embedded information has been added, and then outputs the parameters of the trained machine learning model.

Description

Learning device, learning method, and learning program

The present invention relates to a learning device, a learning method, and a learning program for a machine learning model.

There has long been a need for a modeling method that is robust to the type of unit of analysis, which must be chosen whenever conversations are modeled with machine learning.

Here, conversation modeling refers, for example, to efforts to estimate attributes of the conversation itself, attributes of the conversation participants, and the like from data such as the audio, video, and text of the conversation. The conversation data may be multimodal data including, for example, the audio, video, and text of the conversation. Conversation modeling also includes efforts to reconstruct the multimodal conversation data itself, for example with an autoencoder or a transformer.

A unit of analysis is a category that defines the time intervals by which the data are segmented, which is necessary for instantiating conversational data. Typical units of analysis include utterances and turns, and instances extracted on the basis of these units of analysis are used to train the model.

A technique has previously been proposed for estimating the range of paragraphs in sequential dialogue data containing multiple topics (see Patent Document 1).

Patent Document 1: Japanese Patent No. 7425368

However, because the above technique limits the unit of analysis to utterances, data segmented in other ways cannot be input into the model. Furthermore, the model is not robust to the type of data segmentation.

Accordingly, an object of the present invention is to solve the above problems by constructing a model that can accept data segmented in various ways and that is robust to the type of data segmentation (that is, the model's performance does not degrade regardless of the type of segmentation).

To solve the above problems, the present invention includes: a data input unit that accepts input of information indicating the type of unit of analysis in a machine learning model and of data extracted for each unit of analysis of that type; an addition unit that adds, to the data extracted in the unit of analysis, embedded information indicating the type of the unit of analysis; and a model learning unit that trains the machine learning model using the data extracted for each unit of analysis to which the embedded information indicating the type of that unit of analysis has been added.

According to the present invention, it is possible to construct a model that can accept data segmented in various ways and that is robust to the type of data segmentation.

FIG. 1 is a diagram for explaining the underlying technology of the learning device.
FIG. 2 is a diagram illustrating an example of processing executed by the learning device.
FIG. 3 is a diagram illustrating an example of the configuration of the learning device.
FIG. 4 is a flowchart illustrating an example of a processing procedure executed by the learning device.
FIG. 5 is a diagram illustrating an example of a computer that executes a learning program.

Hereinafter, a mode for carrying out the present invention (embodiment) is described with reference to the drawings. The present invention is not limited to this embodiment.

[Prerequisite technology]
First, the underlying technology of the learning device of the present embodiment is described with reference to FIG. 1. Data d in FIG. 1 is time-series data showing a conversation among members a, b, ..., z. In FIG. 1, each rectangle containing text represents an utterance by a member. The following description takes as an example the case in which the type of unit of analysis (u) of the machine learning model trained by the learning device is the utterance (unit number = k).

FIG. 1 shows the extraction interval of the extracted data $d_i$ taken from data $d$. The extraction interval of $d_i$ is defined by the unit of analysis of the machine learning model trained by the learning device. In this example, the unit of analysis is the utterance. The learning device therefore sets the extraction interval of $d_i$ on the basis of the start time (interval start time $s_i^k$) and the end time (interval end time $e_i^k$) of each utterance (unit $k$) in the users' conversation.

Besides the utterance, the unit of analysis (u) can take many other forms, such as a very short period corresponding to one video frame, a speaking turn, the five seconds following an utterance, or a randomly sampled three-second interval. Which of these is appropriate depends on what the machine learning model is meant to analyze.
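As a concrete illustration of how different units of analysis yield different extraction intervals from the same conversation, the following Python sketch derives (start, end) pairs for several of the unit types listed above from a list of utterance timestamps. All function names are hypothetical, and the definition of a turn used here (consecutive utterances by the same speaker merged into one interval) is an assumption made for illustration, not one given in the source.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]        # (start s, end e) in seconds
Utterance = Tuple[str, float, float]  # (speaker, start, end)


def utterance_intervals(utts: List[Utterance]) -> List[Interval]:
    """Utterance unit: one interval per utterance."""
    return [(s, e) for _, s, e in utts]


def turn_intervals(utts: List[Utterance]) -> List[Interval]:
    """Turn unit (assumed definition): merge consecutive utterances by one speaker."""
    turns: List[Interval] = []
    prev_speaker = None
    for spk, s, e in sorted(utts, key=lambda u: u[1]):
        if turns and spk == prev_speaker:
            turns[-1] = (turns[-1][0], max(turns[-1][1], e))
        else:
            turns.append((s, e))
        prev_speaker = spk
    return turns


def post_utterance_intervals(utts: List[Utterance], total: float,
                             length: float = 5.0) -> List[Interval]:
    """The `length` seconds after each utterance ends, clipped to the recording."""
    return [(e, min(e + length, total)) for _, _, e in utts if e < total]


def random_intervals(total: float, count: int, length: float = 3.0,
                     seed: int = 0) -> List[Interval]:
    """`count` windows of `length` seconds at uniformly random start times."""
    rng = random.Random(seed)
    starts = [rng.uniform(0.0, total - length) for _ in range(count)]
    return [(s, s + length) for s in starts]


def frame_intervals(total: float, fps: float = 30.0) -> List[Interval]:
    """Frame unit: one interval per video frame of duration 1/fps."""
    step = 1.0 / fps
    return [(i * step, (i + 1) * step) for i in range(int(total * fps))]
```

For example, with utterances `("a", 0.0, 1.5)`, `("a", 1.6, 2.0)`, and `("b", 2.2, 3.0)`, the utterance unit yields three intervals, while the turn unit yields two, `(0.0, 2.0)` and `(2.2, 3.0)`.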

For example, when the task of the machine learning model is to estimate the intention behind a user's utterance, the utterance is commonly used as the unit of analysis. When the task is to recognize a user's facial expression, a very short period corresponding to one video frame is commonly used as the unit of analysis.

When the units of analysis differ, so do the parts of the data a task should focus on and the amount and quality of the information available. For this reason, conventional conversation modeling has not trained on different units of analysis together.

[Overview]
In contrast, the learning device of the present embodiment trains a machine learning model on different units of analysis (analysis units) together.

For example, the learning device extracts data from the training data of the machine learning model over the data extraction intervals of an arbitrary analysis unit and adds to the extracted data an embedding of information indicating the type of analysis unit (AU (Analyze Unit) embedding). In this way, the learning device embeds information indicating the type of analysis unit into the data used to train the machine learning model.

As a result, even when the learning device trains the machine learning model on data extracted with multiple types of analysis units, the AU embedding absorbs the differences between data from different analysis units, so a machine learning model that is robust to the type of analysis unit can be constructed. In addition, because the learning device can train the model on data extracted with a variety of analysis units, it increases the amount of training data available and can construct a machine learning model with higher performance.

The input data to the machine learning model may be any data. For example, the input data may be the audio signals, facial video, or speech transcripts of each or all of the conversation participants, or any combination of these. The machine learning model is trained using, for example, deep learning.

Next, an example of the processing executed by the learning device is described with reference to FIG. 2. The inputs to the learning device are, for example, the conversation data, the type of analysis unit, and the start and end times of the data for that analysis unit.

Here, it is assumed that there are multiple types of analysis unit $u$, indexed by $k \in \mathbb{N}$ ($\mathbb{N}$: the natural numbers). Using the start time $s_i^k \in \mathbb{R}_{\geq 0}$ and the end time $e_i^k \in \mathbb{R}_{\geq 0}$ ($\mathbb{R}$: the real numbers) of the $i$-th ($i \in \mathbb{N}$) data extraction interval of the $k$-th analysis unit, the learning device extracts the data $d_i$ from data $d$.
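The extraction step can be sketched as follows, assuming (this is not stated in the source) that data d is a one-dimensional signal sampled at `sr` samples per second, so that the extracted data for interval i consists of the samples from index round(start × sr) up to, but not including, round(end × sr). The function name `extract_segments` is hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_segments(d: Sequence[float], intervals: List[Tuple[float, float]],
                     sr: float) -> List[List[float]]:
    """Return d_i = d[s_i^k, e_i^k) for every interval i of one analysis unit k.

    Assumes `d` is a 1-D signal sampled at `sr` samples per second.
    """
    segments = []
    for s, e in intervals:
        if not 0.0 <= s < e:
            raise ValueError(f"invalid interval ({s}, {e})")
        lo = int(round(s * sr))               # first sample index in the interval
        hi = min(int(round(e * sr)), len(d))  # exclusive end, clipped to the signal
        segments.append(list(d[lo:hi]))
    return segments
```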

The learning device then vectorizes the extracted data $d_i$, for example embedding it into an $n$-dimensional data vector ($n \in \mathbb{N}$).
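The source leaves the vectorizer open (the vectorization unit 134 may use, for example, wav2vec, HuBERT, or OpenSMILE). Purely as a stand-in, the hypothetical `vectorize` function below average-pools a variable-length segment into n contiguous bins, which is enough to show that segments of any length map into the same n-dimensional space.

```python
from typing import List, Sequence


def vectorize(segment: Sequence[float], n: int) -> List[float]:
    """Toy stand-in vectorizer: average-pool a segment into n contiguous bins."""
    length = len(segment)
    if length == 0 or n <= 0:
        raise ValueError("segment must be non-empty and n positive")
    vec = []
    for j in range(n):
        lo = j * length // n
        hi = max((j + 1) * length // n, lo + 1)  # every bin gets at least one sample
        chunk = segment[lo:hi]
        vec.append(sum(chunk) / len(chunk))
    return vec
```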

The learning device also embeds (emb) the number $k$ of analysis unit $u$ into an $n$-dimensional vector (AU embedding, or analysis unit vector) that indicates the type of analysis unit. The embedding (emb) is, for example, a process that maps an arbitrary integer to an $n$-dimensional vector.
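A minimal sketch of the embedding (emb), assuming it is realized as a trainable lookup table with one n-dimensional row per analysis unit type, as an embedding layer of a neural network would be. The class name `AUEmbedding`, the zero-based unit numbering, and the Gaussian initialization are illustrative assumptions.

```python
import random
from typing import List


class AUEmbedding:
    """Trainable lookup table: analysis unit number k -> n-dimensional vector."""

    def __init__(self, num_units: int, n: int, seed: int = 0, scale: float = 0.1):
        rng = random.Random(seed)
        # One row per analysis unit type; a training loop would update these rows.
        self.table: List[List[float]] = [
            [rng.gauss(0.0, scale) for _ in range(n)] for _ in range(num_units)
        ]

    def __call__(self, k: int) -> List[float]:
        if not 0 <= k < len(self.table):
            raise IndexError(f"unknown analysis unit number {k}")
        return list(self.table[k])  # copy, so callers cannot mutate the table
```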

The learning device inputs the sum of the data vector and the analysis unit vector into the model (machine learning model) and trains the model. Any model may be used, such as a neural network or an SVM (Support Vector Machine).
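The following sketch shows the sum being fed to a model and the gradient flowing back through the sum into the embedding row for unit k, so that, in this sketch, the AU embedding is learned jointly with the model. Logistic regression trained by plain stochastic gradient descent is swapped in as the simplest possible model; the source allows any model, and the names `train` and `forward`, the binary labels, and the hyperparameters are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int, int]  # (data vector v, unit number k, label y in {0, 1})


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def forward(w: List[float], b: float, emb: List[List[float]],
            v: List[float], k: int) -> Tuple[List[float], float]:
    x = [vi + ei for vi, ei in zip(v, emb[k])]  # model input: v + emb(k)
    return x, sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples: List[Sample], num_units: int, n: int, epochs: int = 200,
          lr: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    emb = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(num_units)]
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for v, k, y in samples:
            x, p = forward(w, b, emb, v, k)
            g = p - y  # gradient of the log loss with respect to the logit
            for j in range(n):
                grad_w, grad_e = g * x[j], g * w[j]  # both use pre-update values
                w[j] -= lr * grad_w
                emb[k][j] -= lr * grad_e  # only the row of unit k is updated
            b -= lr * g
    return w, b, emb
```

A toy data set in which unit 0 carries salient values (±2.0) and unit 1 only faint values (±0.2) can be passed as `train(samples, num_units=2, n=2)`; it loosely mirrors the analysis units A and B discussed in this section and does not reproduce any experiment from the source.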

When the learning device inputs multimodal data into the model, it executes the series of processes indicated by reference numeral 201 in FIG. 2 in parallel for each modality. The learning device then fuses the per-modality sums of the data vector and the analysis unit vector using a multimodal fusion method, for example, and inputs the result into the model. Examples of multimodal fusion methods include early fusion and vector concatenation.
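Vector concatenation, one of the fusion methods named above, can be sketched as follows. The sketch assumes that each modality has its own data vector and an AU vector of matching dimension (sizes may differ across modalities) and that modalities are concatenated in a fixed order; the function names and modality keys are hypothetical.

```python
from typing import Dict, List, Sequence


def add_au(v: Sequence[float], e: Sequence[float]) -> List[float]:
    """Sum of one modality's data vector and its AU vector (same dimension required)."""
    if len(v) != len(e):
        raise ValueError(f"dimension mismatch: {len(v)} vs {len(e)}")
    return [a + b for a, b in zip(v, e)]


def fuse_concat(data_vecs: Dict[str, List[float]], au_vecs: Dict[str, List[float]],
                order: Sequence[str]) -> List[float]:
    """Concatenate the per-modality sums in a fixed modality order."""
    fused: List[float] = []
    for m in order:
        fused.extend(add_au(data_vecs[m], au_vecs[m]))
    return fused
```

The modality order must be the same at training and inference time, since concatenation makes position meaningful.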

By performing the above processing, the learning device can construct a model that performs tasks on multimodal data on the basis of a variety of analysis units.

Because conventional methods train a model on only one analysis unit, they do not consider, for example, the path that creates the analysis unit vector shown in FIG. 2 (reference numeral 202). In contrast, the learning device of the present embodiment also includes this path (reference numeral 202) during model training. This allows the learning device, when training the model on data extracted via the path indicated by reference numeral 201, to apply weighting based on the information (analysis unit vector) obtained via the path indicated by reference numeral 202.

For example, consider a case in which the learning device trains a single model using data from two types of analysis units (analysis units A and B). Assume that the data of analysis unit A always contain salient signals, whereas the data of analysis unit B always contain only faint signals.

In this case, if the learning device weights the data of the two analysis units equally during training, the data of analysis unit B may have little influence on the model. If instead the data of analysis units A and B can each be weighted appropriately during training, for example by weighting the data of analysis unit B heavily and the data of analysis unit A lightly, then the data of both analysis units are expected to contribute effectively to the model even though it is trained on two different analysis units.

Therefore, to create weights suited to each type of analysis unit, the learning device performs the process of embedding information indicating the type of analysis unit, shown by reference numeral 202 in FIG. 2. As a result, when the model to be trained is based on a neural network, for example, its parameters are adjusted to the type of analysis unit during training; that is, appropriate weights are created. The learning device can thereby accept data of various types of analysis units and construct a model that is robust to the type of analysis unit.
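One way to see how an added vector can act as unit-specific weighting is to follow it through the first layer of a network. This is an illustrative reading, not a derivation given in the source; it assumes a first layer with weight matrix $W \in \mathbb{R}^{m \times n}$, bias $b \in \mathbb{R}^m$, and elementwise activation $\phi$, with $v$ the data vector and $e_k = \mathrm{emb}(k)$ the analysis unit vector:

$$
\begin{aligned}
h_k &= \phi\big(W(v + e_k) + b\big) \\
    &= \phi\big(Wv + b_k\big), \qquad b_k := W e_k + b .
\end{aligned}
$$

Under this assumption, the AU embedding gives each analysis unit its own first-layer bias $b_k$. With a ReLU, $\phi(z) = \max(0, z)$, a hidden unit $j$ whose $(b_k)_j$ is strongly negative stays inactive for analysis unit $k$, so different analysis units can be routed through different rows of $W$; this is one mechanism by which training can learn unit-specific weighting. For a purely linear model $w^\top(v + e_k)$, the same term only shifts the output by $w^\top e_k$ per unit.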

[Configuration example]
Next, an example configuration of the learning device 10 is described with reference to FIG. 3. The learning device 10 includes, for example, an input/output unit 11, a memory unit 12, and a control unit 13.

The input/output unit 11 is an interface that handles the input and output of various data. For example, the input/output unit 11 accepts input of the data used to train the model (data d).

The memory unit 12 stores data, programs, and the like referenced when the control unit 13 executes various processes. The memory unit 12 is implemented by a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disc. For example, the memory unit 12 stores the training data for the model (data d) received by the input/output unit 11 and the model parameters obtained when the control unit 13 trains the model.

The control unit 13 controls the learning device 10 as a whole. Its functions are implemented, for example, by a CPU (Central Processing Unit) executing a program stored in the memory unit 12.

The control unit 13 includes, for example, a learning unit 131 that trains a model. The learning unit 131 includes a data input unit 132, a data extraction unit 133, a vectorization unit 134, an embedding unit 135, an addition unit 136, and a model learning unit 137. The estimation unit 138, shown with a dashed line, is optional; the case in which it is included is described later.

The data input unit 132 accepts input of data for training the model. The data input unit 132 also accepts input of information indicating the type of analysis unit in the model and information indicating the extraction intervals (start time and end time) of the data for that type of analysis unit.
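The inputs accepted by the data input unit 132 can be pictured as a list of records, one per extraction interval. The record layout below (`UnitInterval` with fields `unit_type`, `start`, and `end`) is a hypothetical illustration, not a format defined in the source; it checks that each interval satisfies 0 ≤ start < end and groups the intervals by analysis unit type, which is the form the data extraction unit 133 works with.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UnitInterval:
    """One extraction interval of analysis unit type k (hypothetical record layout)."""
    unit_type: int  # k, the number identifying the analysis unit type
    start: float    # s_i^k in seconds
    end: float      # e_i^k in seconds

    def __post_init__(self) -> None:
        if self.unit_type < 0:
            raise ValueError("unit_type must be non-negative")
        if not 0.0 <= self.start < self.end:
            raise ValueError(f"invalid interval [{self.start}, {self.end})")


def group_by_unit(records: List[UnitInterval]) -> Dict[int, List[UnitInterval]]:
    """Group intervals by analysis unit type, preserving input order."""
    groups: Dict[int, List[UnitInterval]] = {}
    for r in records:
        groups.setdefault(r.unit_type, []).append(r)
    return groups
```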

The data extraction unit 133 extracts the data for each analysis unit from the training data of the model, based on the information indicating the data extraction intervals for each type of analysis unit. For example, if the types of analysis unit are analysis unit A and analysis unit B, the data extraction unit 133 extracts the data of analysis unit A and the data of analysis unit B from the training data.

The vectorization unit 134 vectorizes the data extracted by the data extraction unit 133, for example by converting it into an n-dimensional vector (data vector). If the data is audio, for example, the vectorization unit 134 converts it into a vector using a pretrained model such as wav2vec or HuBERT, or a toolkit such as OpenSMILE.

The embedding unit 135 performs the embedding process on the information indicating the type of analysis unit received by the data input unit 132. For example, the embedding unit 135 embeds this information into an n-dimensional vector (analysis unit vector) using an embedding layer of a neural network.

The addition unit 136 computes, for each type of analysis unit, the sum of the data vector and the analysis unit vector. The model learning unit 137 inputs these sums for each type of analysis unit into the model and trains the model. The model learning unit 137 then stores the model parameters obtained by training in the memory unit 12.

This allows the learning device 10 to construct a model that is robust to a variety of analysis units.

[Example of processing procedure]
Next, an example of the processing procedure executed by the learning device 10 is described with reference to FIG. 4. First, the data input unit 132 of the learning device 10 receives input of the training data for the model, information indicating the type of analysis unit, and information indicating the extraction intervals of the data for that analysis unit (S1).

After S1, the learning device 10 executes the following processes S2 to S5 for each type of analysis unit received in S1.

First, based on the information received in S1 that indicates the data extraction intervals for each analysis unit, the data extraction unit 133 extracts the data for each analysis unit from the training data of the model (S2). The vectorization unit 134 then vectorizes the data extracted in S2 (S3). The embedding unit 135 performs the embedding process on the information indicating the type of analysis unit received in S1 (S4). The addition unit 136 then adds the data vectorized in S3 to the result of the embedding process in S4 (the analysis unit vector) (S5).

After the learning device 10 has executed S2 to S5 for all types of analysis units received in S1, the model learning unit 137 inputs the sums of the vectorized data and the analysis unit vectors into the model and trains the model (S6).
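Steps S1 to S6 can be sketched as a driver function with the individual units passed in as callables. The function `run_procedure` and its parameters are hypothetical; the dimension check reflects the requirement in claim 3 that the embedded information have the same number of dimensions as the vectorized data.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def run_procedure(d: Sequence[float],
                  intervals_by_unit: Dict[int, List[Interval]],
                  extract: Callable[[Sequence[float], List[Interval]], List[Sequence[float]]],
                  vectorize: Callable[[Sequence[float]], List[float]],
                  embed: Callable[[int], List[float]],
                  train: Callable[[List[Tuple[int, List[float]]]], Any]) -> Any:
    """S1 inputs -> S2 to S5 for each analysis unit type -> S6 training."""
    inputs: List[Tuple[int, List[float]]] = []
    for k, intervals in intervals_by_unit.items():
        segments = extract(d, intervals)  # S2: extract data per interval
        e_k = embed(k)                    # S4: AU vector for unit type k
        for seg in segments:
            v = vectorize(seg)            # S3: data vector
            if len(v) != len(e_k):
                raise ValueError("data vector and AU vector differ in dimension")
            inputs.append((k, [a + b for a, b in zip(v, e_k)]))  # S5: addition
    return train(inputs)                  # S6: train on all unit types together
```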

By executing the above processing, the learning device 10 can accept data of various types of analysis units and construct a model that is robust to the type of analysis unit.

[Other embodiments]
The learning device 10 may also perform estimation on input data using the model trained by the above processing. In this case, the learning device 10 further includes the estimation unit 138 shown in FIG. 3. The estimation unit 138 includes a data input unit 139 and an estimation processing unit 140.

The data input unit 139 accepts the input data for the trained model (the data to be estimated and information indicating the type of analysis unit).

The type of analysis unit that is input must be one or more of the types of analysis unit used to train the model. For example, if the model was trained with analysis unit types A, B, and C, then any one of A, B, and C, or any combination of them, can be input.

The estimation processing unit 140 inputs the data to be estimated and the information indicating the type of analysis unit into the trained model, and uses the model to perform estimation on the data with that analysis unit. The estimation processing unit 140 then outputs the estimation result.

For example, consider a case in which the learning unit 131 trains a model with analysis units A, B, and C, and the data input unit 139 receives the data to be estimated together with information indicating analysis units A and C. In this case, the estimation processing unit 140 inputs the data to be estimated and the information indicating analysis units A and C into the model and uses the model to perform estimation on the data with analysis units A and C. The estimation processing unit 140 then outputs the estimation result.
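A minimal sketch of this estimation step, assuming the trained model is a callable applied to the summed vector and that analysis unit types are identified by labels such as `"A"`; the function `estimate` and its parameters are hypothetical. It rejects unit types that were not used in training, in line with the requirement that the input type be among the trained types.

```python
from typing import Callable, Dict, Iterable, List


def estimate(model: Callable[[List[float]], float],
             trained_units: Iterable[str],
             requested_units: Iterable[str],
             inputs_by_unit: Dict[str, List[float]],
             embed: Callable[[str], List[float]]) -> Dict[str, float]:
    """Run the trained model once per requested analysis unit type."""
    trained = set(trained_units)
    requested = list(requested_units)
    unknown = [u for u in requested if u not in trained]
    if unknown:
        raise ValueError(f"analysis unit types not used in training: {unknown}")
    results: Dict[str, float] = {}
    for u in requested:
        x = [a + b for a, b in zip(inputs_by_unit[u], embed(u))]  # data vector + AU vector
        results[u] = model(x)
    return results
```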

[System configuration, etc.]
The components of each device shown in the figures are functional concepts and need not be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to that shown in the figures, and all or part of each device may be functionally or physically distributed or integrated in any unit according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device may be implemented by a CPU and a program executed by the CPU, or as hardware using wired logic.

Of the processes described in the above embodiment, all or part of those described as being performed automatically may instead be performed manually, and all or part of those described as being performed manually may instead be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above text and drawings may be changed as desired unless otherwise specified.

[Program]
The learning device 10 can be implemented by installing a program (learning program) on a desired computer as packaged software or online software. For example, by causing an information processing device to execute the program, the information processing device can be made to function as the learning device 10. The information processing device referred to here includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as terminals such as PDAs (Personal Digital Assistants).

FIG. 5 is a diagram illustrating an example of a computer that executes the learning program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100, into which a removable storage medium such as a magnetic disk or an optical disc is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process executed by the learning device 10 is implemented as a program module 1093 in which computer-executable code is written. The program module 1093 is stored, for example, on the hard disk drive 1090; for example, a program module 1093 for executing processes equivalent to the functional configuration of the learning device 10 is stored there. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The data used in the processing of the above embodiment is stored as program data 1094, for example in the memory 1010 or on the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or on the hard disk drive 1090 into the RAM 1012 as needed and executes them.

The program module 1093 and the program data 1094 need not be stored on the hard disk drive 1090; they may instead be stored on a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored on another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)) and read by the CPU 1020 from that computer via the network interface 1070.

REFERENCE SIGNS LIST
10 Learning device
11 Input/output unit
12 Memory unit
13 Control unit
131 Learning unit
132, 139 Data input unit
133 Data extraction unit
134 Vectorization unit
135 Embedding unit
136 Addition unit
137 Model learning unit
138 Estimation unit
140 Estimation processing unit

Claims

1. A learning device comprising:
a data input unit that receives input of information indicating the type of unit of analysis in a machine learning model and data extracted for each unit of analysis of the type;
an adding unit that adds embedded information indicating the type of the unit of analysis to the data extracted in the unit of analysis; and
a model learning unit that trains the machine learning model using data in which the embedded information indicating the type of unit of analysis has been added to the data extracted for each unit of analysis.

2. The learning device according to claim 1, wherein the data input unit receives input of data extracted from multimodal data for each unit of analysis of the type.

3. The learning device according to claim 1, further comprising a vectorization unit that vectorizes the data extracted in the unit of analysis,
wherein the adding unit adds, to the vectorized data, the embedded information having the same number of dimensions as the vectorized data.

4. The learning device according to claim 1, further comprising an estimation unit that inputs data to be estimated and information indicating the type of unit of analysis of the data into the trained machine learning model and thereby outputs an estimation result for the data using the unit of analysis of that type.

5. A learning method executed by a learning device, comprising:
receiving input of information indicating the type of unit of analysis in a machine learning model and data extracted for each unit of analysis of the type;
adding embedded information indicating the type of the unit of analysis to the data extracted in the unit of analysis; and
training the machine learning model using data in which the embedded information indicating the type of unit of analysis has been added to the data extracted for each unit of analysis.

6. A learning program for causing a computer to execute:
a step of receiving input of information indicating the type of unit of analysis in a machine learning model and data extracted for each unit of analysis of the type;
a step of adding embedded information indicating the type of the unit of analysis to the data extracted in the unit of analysis; and
a step of training the machine learning model using data in which the embedded information indicating the type of unit of analysis has been added to the data extracted for each unit of analysis.