JP5766753B2

JP5766753B2 - System parameter learning apparatus, information processing apparatus, method, and program

Info

Publication number: JP5766753B2
Application number: JP2013154983A
Authority: JP
Inventors: 鈴木　潤; 潤鈴木
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 2013-07-25
Filing date: 2013-07-25
Publication date: 2015-08-19
Anticipated expiration: 2033-07-25
Also published as: JP2015026217A

Description

本発明は、システムパラメータ学習装置、情報処理装置、方法、及びプログラムに係り、特に、システムパラメータを学習するシステムパラメータ学習装置、情報処理装置、方法、及びプログラムに関する。 The present invention relates to a system parameter learning apparatus, information processing apparatus, method, and program, and more particularly, to a system parameter learning apparatus, information processing apparatus, method, and program for learning system parameters.

図１１に示すような、音声認識、機械翻訳、文字認識、物体認識、ＤＮＡの構造予測などといった情報処理における識別問題は、図１２に示すように、入力が与えられた時に、出力を予測するシステムとみなすことができる。 As shown in FIG. 11, identification problems in information processing such as speech recognition, machine translation, character recognition, object recognition, DNA structure prediction, etc. predict output when input is given as shown in FIG. It can be regarded as a system.

これらのシステムは一般的に、実行フェーズと構築フェーズに分けることができる。構築フェーズとは、人手により事前にシステムを設計し、システムパラメータ等を決定する作業を指す。そして、実行フェーズでは、構築フェーズで定義された設計に基づき入力を処理し、出力はシステムパラメータに依存して決定される。 These systems can generally be divided into an execution phase and a construction phase. The construction phase refers to the work of designing a system in advance by hand and determining system parameters and the like. In the execution phase, the input is processed based on the design defined in the construction phase, and the output is determined depending on the system parameters.

構築フェーズでは、様々な方法でシステムを構築することができる。例えば、人手により変換規則を記述しておいて、その規則に則って入力を出力へ変換し、それを出力する方法が考えられる。 In the construction phase, the system can be constructed in various ways. For example, a method is conceivable in which a conversion rule is described manually, an input is converted into an output in accordance with the rule, and the output is output.

ただし、変換規則を人手により準備するのは網羅性や整合性を保持するためのコストが非常にかかるため、図１３に示すように、データから自動的にシステムを構築する機械学習手法を用いてシステムを自動構築する方法を用いるのが近年では主流である。 However, manual preparation of conversion rules is very expensive to maintain completeness and consistency. Therefore, as shown in FIG. 13, a machine learning method for automatically constructing a system from data is used. In recent years, the method of automatically constructing a system has been mainstream.

構築フェーズでは、まず、対象とするシステムの入力とそれに対応する出力のペアを用意する。これは、一般に、正解データ或いは教師データとよばれる。教師データとは、教師データ中の入力がシステムに入力された際に、どのような出力がされるべきかを表したデータと言える。次に、この教師データを使って、システムを構築する。必要な要件は、教師データ中の入力に対して、正しい出力が行えるシステムであることである。そこで、機械学習に基づく構築フェーズは、教師データを使って、教師データを正しく判別できるようなシステムパラメータの集合を学習することに帰着する。 In the construction phase, first, an input pair of the target system and a corresponding output pair are prepared. This is generally called correct answer data or teacher data. Teacher data can be said to be data representing what kind of output should be made when input in teacher data is input to the system. Next, a system is constructed using the teacher data. The necessary requirement is that the system can perform correct output with respect to the input in the teacher data. Therefore, the construction phase based on machine learning results in learning a set of system parameters that can correctly discriminate the teacher data using the teacher data.

以上の処理を数式的に表すと以下のようになる。
まず実行フェーズを示す。ｘ＾を一つの入力を表すこととし、Ｘを、システムが受け付ける取り得る全ての入力ｘ＾の集合とする。なお、記号に付された「＾」は、当該記号が行列または多次元配列またはベクトルであることを表わしている。同様に、ｙ＾を一つの出力を表すこととし、Ｙを、システムが許容する取り得る全ての出力ｙ＾の集合とする。また、Ｙ（ｘ）を、ｘ＾が与えられた時に取り得る全ての出力ｙ＾の集合とする。よって、ｘ＾∈Ｘ、ｙ＾∈Ｙ（ｘ＾)⊆Ｙの関係が成り立つ。 The above processing is expressed mathematically as follows.
First, the execution phase is shown. Let x ^ represent one input and let X be the set of all possible inputs x ^ that the system can accept. Note that “＾” attached to a symbol indicates that the symbol is a matrix, a multidimensional array, or a vector. Similarly, let y denote a single output, and let Y be the set of all possible outputs y that the system allows. Also, let Y (x) be the set of all outputs y ^ that can be taken when x ^ is given. Therefore, the relationship x ^ ∈X, y ^ ∈Y (x ^) ⊆Y holds.

次に、ｗ＾をシステムパラメータの集合をベクトル表記したものとする。ここで、ｗ_ｄをベクトルｗ＾のｄ番目の要素であり、同時にｄ番目のシステムパラメータとする。つまり、ｗ＾＝（ｗ_１，．．．，ｗ_Ｎ）かつｄ＝｛１，．．．，Ｎ｝の関係が成り立つ。ただし、システムパラメータ数はＮであり、ｗ＾はＮ次元ベクトルとする。この時、入力ｘ＾が与えられた時に出力ｙ＾を返すシステムを以下のように書きあらわすことができる。 Next, let ＾ be a vector representation of a set of system parameters. Here, w _d is the d-th element of the vector w ^, and at the same time, is the d-th system parameter. That is, w ^ = (w ₁ ,..., W _N ) and d = {1,. . . , N} holds. However, the number of system parameters is N, and w ^ is an N-dimensional vector. At this time, a system that returns an output y ^ when an input x ^ is given can be expressed as follows.

ただし、Φ（ｘ＾，ｙ＾；ｗ＾）は、ｘ＾からｙ＾へ変換する際のスコアを決定する関数であり、ここでは、単にスコア関数と呼ぶ。つまり、ｘ＾が与えられた際に得られる可能性がある全ての出力ｙ＾の中で、この変換スコアが最も高くなるｙ＾が出力として採用されることになる。つまり、ｗ＾は、どの出力が選ばれるかを制御するシステムパラメータであり、システム全体の性能を決定する要因と言える。よって、システムパラメータｗ＾をいかに精度よく求めるかという事が、構築フェーズの最大の要件となる。ここで言う精度よくとは、あらゆる入力に対して可能な限り多くの正しい出力を行うことが可能なｗ＾を求めることを意味する。なお、記号の前に付された「’」は、当該記号が推定された値であることを表わしている。 However, Φ (x ^, y ^; w ^) is a function for determining a score when converting from x ^ to y ^, and is simply referred to as a score function here. That is, among all the outputs y ^ that may be obtained when x ^ is given, y ^ having the highest conversion score is adopted as the output. In other words, w ^ is a system parameter that controls which output is selected, and can be said to be a factor that determines the performance of the entire system. Therefore, how to obtain the system parameter w ^ with high accuracy is the greatest requirement in the construction phase. The term “accurate” as used herein means obtaining w ^ that can perform as many correct outputs as possible for every input. Note that “′” added in front of the symbol indicates that the symbol is an estimated value.

次に、構築フェーズについて述べる。実際に、あらゆる可能な入力に対して最良のパラメータｗ＾を求めることは非常に困難を伴う。それは、実際に、あらゆる可能な入力を列挙することが事実上困難でありことに起因する。そこで、パターン認識の分野では、実データに基づいてｗ＾を決定する。まず、教師データをＤで表す。教師データは入力ｘ＾、出力ｙ＾のペアの集合で構成される。つまり、Ｄ＝｛（ｘ＾_ｉ，ｙ＾_ｉ）｝^Ｎ _ｉ＝１である。このとき、ｘ_ｉを、教師データ中のｉ番目の入力データ、ｙ_ｉを、ｉ番目の入力に対応する出力データとする。
システムパラメータの学習は、以下の最適化問題を解くことで得られる。 Next, the construction phase will be described. In fact, it is very difficult to find the best parameter w for every possible input. That is due to the fact that it is virtually difficult to enumerate all possible inputs. Therefore, in the field of pattern recognition, w ^ is determined based on actual data. First, the teacher data is represented by D. The teacher data is composed of a set of pairs of input x ^ and output y ^. That is, D = {(x ^ _i , y ^ _i )} ^N _{i = 1} . At this time, x _i is the i-th input data in the teacher data, and y _i is the output data corresponding to the i-th input.
Learning system parameters can be obtained by solving the following optimization problem.

このとき、Ｌ（Φ（），ｗ＾，Ｄ）は、リスク関数や損失関数とよばれ、教師データ内の入力に対してどの程度正しい出力を得られるかといった値を返す関数である。現在のシステムパラメータｗ＾を用いて、実際に上記（１）式を用いて判別を行なってみて、より多く間違える場合には、より大きな値となるような関数を用いる。Ω（ｗ＾）は、一般に正則化項とよばれ、教師データが有限個しかない状況で、教師データに現れないデータに対してもより正しく判別できるように、システムパラメータが教師データに過適応しないように、ペナルティを与える項である。例えば、パラメータのＬ２ノルムがなるべく小さくなるような制約を課すことで、パラメータが極端に大きな値をとらないように制限するといったことが、よく用いられる。最終的に、上記（２）式で得られる’ｗ＾は、正則化項の条件のもとで，教師データを最もよく識別することができるパラメータの集合と言える。 At this time, L (Φ (), w ^, D) is called a risk function or a loss function, and is a function that returns a value indicating how much correct output can be obtained with respect to the input in the teacher data. When the current system parameter w is actually used to make a discrimination using the above equation (1), if more mistakes are made, a function having a larger value is used. Ω (w ^) is generally called a regularization term, and in a situation where there is only a limited number of teacher data, system parameters are over-adapted to teacher data so that it can be correctly identified even for data that does not appear in the teacher data. This is a term that gives a penalty. For example, it is often used that a parameter is restricted so as not to take an extremely large value by imposing a constraint such that the L2 norm of the parameter becomes as small as possible. Finally, 'w ^ obtained by the above equation (2) can be said to be a set of parameters that can best identify the teacher data under the condition of the regularization term.

以上が、本発明で対象とする情報処理システムの実行フェーズと構築フェーズを数式的に定義したものである。 The above is a mathematical definition of the execution phase and the construction phase of the information processing system targeted by the present invention.

上記式（２）に基づいたシステムパラメータの獲得は、パターン認識では教師あり学習と呼ばれる。この時、学習後のシステムパラメータ’ｗ＾は、実数値で表される。よって、構築フェーズ終了時に’ｗ＾の値をファイルなどに書き出しておき、実行フェーズでは、書きだしたファイルを読み込んでスコア関数を計算し、出力を得る。
つまり、パラメータ数が多くなればなるほど、その情報を保持するために必要なファイルサイズは大きくなる。 Acquisition of system parameters based on the above equation (2) is called supervised learning in pattern recognition. At this time, the learned system parameter 'w ^ is represented by a real value. Therefore, at the end of the construction phase, the value of “w” is written to a file or the like, and in the execution phase, the written file is read and a score function is calculated to obtain an output.
In other words, the larger the number of parameters, the larger the file size required to hold that information.

ファイルサイズは、そのまま実行時のメモリ占有量と同じとなる。メモリ占有量は、携帯端末等の限定されたリソースしか持たない計算環境で、非常に大きな問題となる可能性がある。また、一般的な計算機上での実行時にも、近年のマルチコアな計算機上で同時に複数実行する際や、他のプログラムになるべく影響を与えないという観点で、メモリ占有量は極力少ないことが望まれる。この問題に対し、これまで、Ｌ１正則化項を用いることで、学習時にパラメータが極力ゼロとなるようにする、といった方策がしばしば用いられてきた（例えば、非特許文献１〜３を参照）。この方法は、パラメータがゼロになった場合は、そのパラメータに関わる項目を消去できることから、モデルサイズを削減できるといった方法論に基づいている。 The file size is the same as the memory occupancy during execution. Memory occupancy can be a significant problem in computing environments with limited resources, such as portable terminals. In addition, even when executing on a general computer, it is desirable that the memory occupancy is as small as possible from the viewpoint of not affecting other programs as much as possible when simultaneously executing multiple on a recent multi-core computer. . In order to solve this problem, a method has been used so far in which the parameter is made zero as much as possible by using the L1 regularization term (see, for example, Non-Patent Documents 1 to 3). This method is based on a methodology in which, when a parameter becomes zero, items related to the parameter can be deleted, so that the model size can be reduced.

Galen Andrew and Jianfeng Gao、「Scalable Training of L1-regularized Log-linear Models」、In Zoubin Ghahra-mani, editor, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007) Omnipress、2007、p.33-40Galen Andrew and Jianfeng Gao, `` Scalable Training of L1-regularized Log-linear Models '', In Zoubin Ghahra-mani, editor, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007) Omnipress, 2007, p.33-40 John Duchi and Yoram Singer、「Efficient Online and Batch Learning Using Forward Backward Splitting」、Journal of Machine Learning Research, 10（2009）、p.2899-2934John Duchi and Yoram Singer, “Efficient Online and Batch Learning Using Forward Backward Splitting”, Journal of Machine Learning Research, 10 (2009), p.2899-2934 Lin Xiao、「Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization」、Journal of Machine Learning Research, 11（2010）、p.2543-2596Lin Xiao, “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization”, Journal of Machine Learning Research, 11 (2010), p.2543-2596

しかし、従来法では、パラメータの値を極力ゼロにするように学習を行うが、ゼロになった値以外は従来通り残るという問題点がある。ここで、値をゼロにするのは、真にそのシステムパラメータに関わる項目に意味がない場合を除き、システムの精度を低下させる恐れがあるため、値をゼロにする効果とシステム性能はトレードオフの関係になる。つまり、なるべくゼロを多くしてモデルサイズを小さくするのが望ましいが、あまりにゼロを多くしすぎると今度はシステム性能が十分でなくなるという問題が発生する可能性がある。 However, in the conventional method, learning is performed so that the parameter value becomes zero as much as possible. However, there is a problem that the values other than the zero value remain as before. Here, setting the value to zero is a trade-off between the effect of setting the value to zero and the system performance because there is a risk of reducing the accuracy of the system unless the item related to the system parameter is truly meaningless. It becomes a relationship. In other words, it is desirable to reduce the model size by increasing the number of zeros as much as possible. However, if the number of zeros is excessively large, there is a possibility that the system performance will be insufficient.

本発明は、上記の事情を鑑みてなされたもので、システムパラメータを記憶するための記憶容量を削減することができるシステムパラメータ学習装置、方法、及びプログラムを提供することを目的とする。 The present invention has been made in view of the above circumstances, and an object thereof is to provide a system parameter learning apparatus, method, and program capable of reducing the storage capacity for storing system parameters.

上記の目的を達成するために本発明に係るシステムパラメータ学習装置は、入力データに対する出力データのスコアを決定するためのスコア関数を用いて、入力データに対して所定の情報処理を行って出力データを出力する情報処理システムにおいて設定される、複数のシステムパラメータを学習するシステムパラメータ学習装置であって、複数の入力データの各々と前記複数の入力データに対する複数の正解出力データの各々とのペアである教師データを入力する教師データ入力部と、前記教師データ入力部により入力された前記教師データと、前記スコア関数とに基づいて、前記複数のシステムパラメータの各々の値が、予め定められた有限個の離散値の集合に含まれる制約を満たし、かつ、最適化された前記複数のシステムパラメータを学習する学習部と、を含んで構成されている。 In order to achieve the above object, a system parameter learning device according to the present invention performs predetermined information processing on input data using a score function for determining a score of output data with respect to input data, and outputs data. A system parameter learning device for learning a plurality of system parameters set in an information processing system that outputs a plurality of input data and a pair of correct output data for each of the plurality of input data Based on a teacher data input unit that inputs certain teacher data, the teacher data input by the teacher data input unit, and the score function, the values of each of the plurality of system parameters are determined in advance. The plurality of system parameters satisfying the constraints included in the set of discrete values and optimized It is configured to include a learning unit for learning the.

本発明に係るシステムパラメータ学習方法は、教師データ入力部と、学習部とを含み、入力データに対する出力データのスコアを決定するためのスコア関数を用いて、入力データに対して所定の情報処理を行って出力データを出力する情報処理システムにおいて設定される、複数のシステムパラメータを学習するシステムパラメータ学習装置におけるシステムパラメータ学習方法であって、前記教師データ入力部によって、複数の入力データの各々と前記複数の入力データに対する複数の正解出力データの各々とのペアである教師データを入力するステップと、前記学習部によって、前記教師データ入力部により入力された前記教師データと、前記スコア関数とに基づいて、前記複数のシステムパラメータの各々の値が、予め定められた有限個の離散値の集合に含まれる制約を満たし、かつ、最適化された前記複数のシステムパラメータを学習するステップとを含む。 A system parameter learning method according to the present invention includes a teacher data input unit and a learning unit, and performs predetermined information processing on input data using a score function for determining a score of output data with respect to input data. A system parameter learning method in a system parameter learning apparatus for learning a plurality of system parameters, set in an information processing system that performs output data output, wherein each of a plurality of input data is Based on the step of inputting teacher data that is a pair with each of a plurality of correct output data for a plurality of input data, the teacher data input by the teacher data input unit by the learning unit, and the score function And each of the plurality of system parameters has a predetermined finite value. Satisfy the constraints in the set of discrete values, and, and a step of learning a plurality of system parameters optimized.

本発明に係る前記学習部は、以下の式に従って、前記最適化されたＮ個のシステムパラメータ’ｗ＾を学習するようにすることができる。

ただし、Ｌ（Φ（）、ｗ＾、Ｄ）は、前記教師データＤの入力データｘ_ｉに対してシステムパラメータｗ＾を用いて前記所定の情報処理を行ったときに得られる出力データが、前記教師データＤの正解出力データｙ_ｉとどの程度間違っているかを示す値を返すリスク関数であり、Ω（ｗ＾）は、正則化項であり、Ｓ^Ｎは、前記Ｎ個のシステムパラメータの各々についての前記有限個の離散値の直積集合である。 The learning unit according to the present invention can learn the optimized N system parameters' w ^ according to the following equation.

However, L (Φ (), w ^, D) , the output data obtained when performing the predetermined processing using the system parameter w ^ with respect to the input data x _i of the training data D is, This is a risk function that returns a value indicating how wrong the correct output data y _i of the teacher data D is, Ω (w ^) is a regularization term, and S ^N is the N system parameters. A Cartesian product set of the finite number of discrete values for each.

また、本発明に係る前記学習部は、以下の式に従って、前記最適化されたＮ個のシステムパラメータ’ｗ＾を学習するようにすることができる。

ただし、α＾は、ラグランジュ未定乗数であり、ρは、予め定められたチューニングパラメータであり、ｕ＾は、補助パラメータである。 In addition, the learning unit according to the present invention can learn the optimized N system parameters' w ^ according to the following equation.

Where α ^ is a Lagrange undetermined multiplier, ρ is a predetermined tuning parameter, and u ^ is an auxiliary parameter.

また、本発明に係る前記学習部によって学習された前記複数のシステムパラメータに基づいて、値が同一となるシステムパラメータでグループを構成し、各グループにインデックス番号を付与し、各グループについて、前記グループを構成するシステムパラメータの各々を、前記グループに付与された前記インデックス番号のシステムパラメータに変換し、前記インデックス番号のシステムパラメータを保存する重複パラメータ圧縮部を更に含むようにすることができる。 Further, based on the plurality of system parameters learned by the learning unit according to the present invention, a group is configured with system parameters having the same value, an index number is assigned to each group, and for each group, the group Is further converted to a system parameter of the index number assigned to the group, and further includes a duplicate parameter compression unit for storing the system parameter of the index number.

また、本発明に係る前記複数のシステムパラメータは、入力データｘ＾及び出力データｙ＾に対する複数の素性抽出関数ｆ_ｄ（ｘ＾、ｙ＾）の重みであって、前記重複パラメータ圧縮部は、各グループについて、前記グループを構成するシステムパラメータの各々を、前記グループに付与された前記インデックス番号のシステムパラメータに変換し、前記インデックス番号のシステムパラメータを保存すると共に、以下の式に示すように、前記スコア関数を定義して保存するようにすることができる。

ただし、Φ（ｘ＾，ｙ＾；ｗ＾）はスコア関数であり、ｖ_ｋは、グループに付与されたインデックス番号ｋのシステムパラメータである。 The plurality of system parameters according to the present invention are weights of a plurality of feature extraction functions f _d (x ^, y ^) for the input data x ^ and output data y ^, For each group, each of the system parameters constituting the group is converted to the system parameter of the index number assigned to the group, and the system parameter of the index number is stored, as shown in the following equation: The score function can be defined and stored.

Here, Φ (x ^, y ^; w ^) is a score function, and _vk is a system parameter of the index number k assigned to the group.

本発明に係る情報処理装置は、入力データを受け付ける入力部と、前記スコア関数と、上記のシステムパラメータ学習装置によって保存された各グループの前記インデックス番号のシステムパラメータとに基づいて、前記入力データに対して、前記所定の情報処理を行って出力データを出力する情報処理部と、を含んで構成されている。 The information processing apparatus according to the present invention, based on the input unit that receives input data, the score function, and the system parameter of the index number of each group stored by the system parameter learning apparatus, On the other hand, an information processing unit that performs the predetermined information processing and outputs output data is included.

本発明に係る情報処理方法は、入力部、及び情報処理部を含む情報処理装置における情報処理方法であって、前記入力部によって、入力データを受け付けるステップと、前記情報処理部によって、前記スコア関数と、上記のシステムパラメータ学習装置によって保存された各グループの前記インデックス番号のシステムパラメータとに基づいて、前記入力データに対して、前記所定の情報処理を行って出力データを出力するステップと、を含む。 An information processing method according to the present invention is an information processing method in an information processing apparatus including an input unit and an information processing unit, the step of receiving input data by the input unit, and the score function by the information processing unit. And outputting the output data by performing the predetermined information processing on the input data based on the system parameter of the index number of each group stored by the system parameter learning device. Including.

本発明に係る第１のプログラムは、コンピュータを、システムパラメータ学習装置の各部として機能させるためのプログラムである。 The first program according to the present invention is a program for causing a computer to function as each part of the system parameter learning device.

本発明に係る第２のプログラムは、コンピュータを、情報処理装置の各部として機能させるためのプログラムである。 The second program according to the present invention is a program for causing a computer to function as each unit of the information processing apparatus.

以上説明したように、本発明のシステムパラメータ学習装置、方法、及びプログラムによれば、教師データに基づいて、複数のシステムパラメータの各々の値が、予め定められた有限個の離散値の集合に含まれる制約を満たし、かつ、最適化された複数のシステムパラメータを学習することにより、システムパラメータを記憶するための記憶容量を削減することができる、という効果が得られる。 As described above, according to the system parameter learning apparatus, method, and program of the present invention, based on the teacher data, each value of the plurality of system parameters is converted into a set of a predetermined finite number of discrete values. By learning a plurality of optimized system parameters that satisfy the included constraints, the storage capacity for storing the system parameters can be reduced.

また、本発明の情報処理装置、方法、及びプログラムによれば、システムパラメータ学習装置によって保存された各グループのインデックス番号のシステムパラメータに基づいて、入力データに対して、所定の情報処理を行って出力データを出力することにより、実行時に必要とされるリソースを削減することができる、という効果が得られる。 In addition, according to the information processing apparatus, method, and program of the present invention, predetermined information processing is performed on input data based on the system parameter of each group index number stored by the system parameter learning apparatus. By outputting the output data, it is possible to reduce the resources required at the time of execution.

本発明の実施の形態を適用する問題の例を示す図である。It is a figure which shows the example of the problem to which embodiment of this invention is applied. 素性とシステムの定義づけの一例を説明するための概念図である。It is a conceptual diagram for demonstrating an example of the definition of a feature and a system. 本発明の実施の形態に係るシステムパラメータ学習装置の構成を示す概略図である。It is the schematic which shows the structure of the system parameter learning apparatus which concerns on embodiment of this invention. 素性抽出関数の一例を説明するための第１の概念図である。It is a 1st conceptual diagram for demonstrating an example of a feature extraction function. 素性抽出関数の一例を説明するための第２の概念図である。It is a 2nd conceptual diagram for demonstrating an example of a feature extraction function. 教師データに基づいて、システムパラメータ値の学習処理を説明するための概念図である。It is a conceptual diagram for demonstrating the learning process of a system parameter value based on teacher data. 本発明の実施の形態に係る情報処理装置の構成を示す概略図である。It is the schematic which shows the structure of the information processing apparatus which concerns on embodiment of this invention. 本発明の実施の形態に係るシステムパラメータ学習装置におけるシステムパラメータ学習処理ルーチンの内容を示すフローチャートである。It is a flowchart which shows the content of the system parameter learning process routine in the system parameter learning apparatus which concerns on embodiment of this invention. 本発明の実施の形態に係る情報処理装置における情報処理ルーチンの内容を示すフローチャートである。It is a flowchart which shows the content of the information processing routine in the information processing apparatus which concerns on embodiment of this invention. 本実施の形態を用いた場合における実験結果を示す図である。It is a figure which shows the experimental result in the case of using this Embodiment. 従来技術の概要を説明するための第１の説明図である。It is the 1st explanatory view for explaining an outline of conventional technology. 従来技術の概要を説明するための第２の説明図である。It is the 2nd explanatory view for explaining an outline of conventional technology. 従来技術の概要を説明するための第３の説明図である。It is the 3rd explanatory view for explaining an outline of conventional technology.

以下、図面を参照して本発明の実施の形態を詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

＜概要＞
本発明に係る実施の形態は、ある入力データに対して、事前に定めたシステムの動作定義に基づいて、入力データに対応する出力データを選択し、当該出力データを返す情報処理システムを構築する技術分野に関するものであり、事前に設計されたシステムパラメータを計算機により自動的に獲得する技術、一般にパターン認識や機械学習と呼ばれる技術分野に属する。 <Overview>
The embodiment according to the present invention constructs an information processing system that selects output data corresponding to input data based on a predetermined system operation definition for certain input data and returns the output data. The present invention relates to a technical field, and belongs to a technology field in which pre-designed system parameters are automatically acquired by a computer, generally called a pattern recognition or machine learning.

対象となる情報処理システムには、例えば、音声認識、機械翻訳、文字認識、物体認識、ＤＮＡの構造予測を行う情報処理システムが挙げられる。 Examples of the target information processing system include an information processing system that performs speech recognition, machine translation, character recognition, object recognition, and DNA structure prediction.

本発明に係る実施の形態では、本質的に、必要リソース量（ファイルサイズ、メモリ専有量など）は、どのような計算環境であれ少ないほうがよりよい、という一般的な考えに基づいて、学習後のモデルサイズを圧縮するという課題に取り組む。 In the embodiment according to the present invention, the necessary resource amount (file size, memory exclusive amount, etc.) is essentially based on the general idea that it is better to be smaller in any calculation environment. Tackle the challenge of compressing the model size.

本発明に係る実施の形態では、従来技術とは異なり、システムパラメータ学習後に複数のシステムパラメータが同じ値を取れば、この重複情報を利用して、実際に保持しておかなくてはいけない情報量を減らすことが可能であるという原理を利用する。 In the embodiment according to the present invention, unlike the prior art, if a plurality of system parameters have the same value after system parameter learning, the amount of information that must be actually retained using this duplicate information The principle that it is possible to reduce is used.

例えば、システムパラメータ数が５の場合に、上記（２）式で最終的に得られたシステムパラメータ集合が’ｗ＾＝（０．３，０．３，−０．６，−０．６，１．０）だったとする。すると、’ｗ＾では、０．３と−０．６が２回重複して出現するため、１番目と２番目のシステムパラメータを合わせて、また、３番目と４番目のシステムパラメータを合わせて、’ｗ＾＝（０．３，−０．６，１．０）という３つの情報が最低限あれば同じ結果が得られる。このように、重複した値が多ければ多いほど等価な情報についての情報量を減らして保持することが可能であることがわかる。 For example, when the number of system parameters is 5, the set of system parameters finally obtained by the above equation (2) is' w ^ = (0.3, 0.3, -0.6, -0.6, 1.0). Then, in 'w ^, 0.3 and -0.6 appear twice, so the first and second system parameters are combined, and the third and fourth system parameters are combined. , 'W ^ = (0.3, -0.6, 1.0), the same result can be obtained if there is at least three pieces of information. Thus, it can be seen that the more overlapping values, the smaller the amount of information about equivalent information can be retained.

つまり、本実施の形態では、上記（２）式のシステムパラメータの学習を行う際に、複数のシステムパラメータの各々の値が同じ値を取るように、所定の制約を追加して、システムパラメータの学習を行うことで、結果として得られるシステムパラメータの各々の値が重複するようにする。 That is, in the present embodiment, when learning the system parameter of the above formula (2), a predetermined constraint is added so that each value of the plurality of system parameters takes the same value, and the system parameter By performing learning, each value of the resulting system parameter is made to overlap.

そして、得られたシステムパラメータに対して、同じ値を持つシステムパラメータを一つにまとめることで、保持すべき情報量を削減する。最後に、削減したシステムパラメータを用いて、実際に情報処理システムを実行する。 Then, the system parameters having the same value are combined into one system parameter obtained, thereby reducing the amount of information to be retained. Finally, the information processing system is actually executed using the reduced system parameters.

以上をまとめると、本実施の形態では、大きく分けて以下の３つの処理で構成される。 In summary, the present embodiment is roughly composed of the following three processes.

１．第１の処理として、通常のシステムパラメータ学習時に最終的に得られるシステムパラメータの値が同じになるようなシステムパラメータの獲得処理
２．第２の処理として、得られたシステムパラメータの重複する値を一つにまとめて保持しておくべき情報量を最小限にするシステムパラメータの圧縮処理
３．第３の処理として、最小限に圧縮したシステムパラメータを用いて、情報処理システムを実行フェーズで動作させる処理 1. As a first process, a system parameter acquisition process in which the system parameter value finally obtained at the time of normal system parameter learning becomes the same. As a second process, a system parameter compression process for minimizing the amount of information that should be held by holding duplicate values of the obtained system parameters together 3. As a third process, a process for operating the information processing system in the execution phase using the system parameters compressed to the minimum

より具体的には、第１の処理では、上記（２）式のシステムパラメータ学習問題を、以下の問題に置き換える。
本実施の形態では、以下の（３）式に従って、最適化されたＮ個のシステムパラメータ’ｗ＾を学習する。 More specifically, in the first process, the system parameter learning problem of the above equation (2) is replaced with the following problem.
In this embodiment, the optimized N system parameters' w ^ are learned according to the following equation (3).

ただし、Ｌ（Φ（）、ｗ＾、Ｄ）は、教師データＤの入力データｘ_ｉに対してシステムパラメータｗ＾を用いて所定の情報処理を行ったときに得られる出力データ'ｙ_ｉが、正解出力データｙ_ｉとどの程度間違っているかを示す値を返すリスク関数であり、Ω（ｗ＾）は、正則化項であり、Ｓ^Ｎは、Ｎ個のシステムパラメータの各々についての有限個の離散値の直積集合である。 However, L (Φ (), w ^, D) is output data 'y _i obtained when predetermined information processing is performed on the input data x _i of the teacher data D using the system parameter w ^. , Is a risk function that returns a value indicating how wrong the correct answer data y _i is, Ω (w ^) is a regularization term, and S ^N is a finite number for each of the N system parameters. Is a Cartesian product set of discrete values.

上記（２）式との違いは、単純に制約項ｗ＾∈Ｓ^Ｎが増えただけである。この制約は、システムパラメータｗ＾がある任意の離散集合Ｓ^Ｎの要素となる場合にだけ解として認めるということを意味している。つまり、上記（２）式と同等の最適化問題を解くが、解は制約を満たしている必要がある。この制約は、システムパラメータの値をなるべく重複してとるように設計するために、有限個の離散値からなる集合Ｓの直積集合Ｓ^Ｎで構成する。ただし、効率的な計算のために、離散値の集合Ｓは、必ずゼロを含むことと、ゼロを中心として対称となる値（例えば１と−１，５と−５など）が必ず含まれることとする。 The difference from the above equation (2) is simply that the constraint term w ^ εS ^N is increased. This restriction means that the solution is allowed only when the system parameter w is an element of an arbitrary discrete set ^SN . That is, the optimization problem equivalent to the above equation (2) is solved, but the solution needs to satisfy the constraints. This constraint is constituted by a Cartesian product set ^SN of a set S composed of a finite number of discrete values in order to design the system parameter values to be duplicated as much as possible. However, for efficient calculation, the set S of discrete values must always include zero and must always include values that are symmetric about zero (for example, 1 and -1, 5 and -5). And

第２の処理として、同じ値となったシステムパラメータを一つにまとめる。これは、構築フェーズの後処理的な位置づけになる。例えば、ｗ_ｉ＝ｗ_ｊ＝ｗ_ｋとなったと仮定する。つまり、ｉ，ｊ，ｋ番目のシステムパラメータ(システムパラメータベクトルの要素) が同じ値となったとき、ｗ_ｉ，ｗ_ｊ，ｗ_ｋを削除し、新たにｗ_ｌとするといった処理に相当する。ただし、新たに追加したｗ_ｌはｗ_ｉなどと同じ値であり、インデックスｉ，ｊ，ｋはインデックスｌに新たに振り直されたとみなすことと等価である。このように、同じ値を保持していても冗長な情報なので、それらを一つにまとめて新たに割り振ったインデックスを、元の値のインデックスが指すことで、同じ情報を得られるようにする。こうすることで、従来と同じ形式のシステムパラメータでありながら、システムパラメータの値が重複した分を減らした、システムパラメータの集合を獲得することができる。 As a second process, system parameters having the same value are combined into one. This is a post-processing position of the construction phase. For example, assume that w _i = w _j = w _k . That is, when the i, j, k-th system parameters (elements of the system parameter vector) have the same value, this corresponds to a process of deleting w _i , w _j , w _k and newly setting them to w _l . However, the newly added w _l is the same value as w _i and the like, and the indexes i, j, and k are equivalent to assuming that the index l is newly reassigned. In this way, even if the same value is held, it is redundant information, so that the same information can be obtained by referring to the index of the original value that points to the newly allocated index. By doing so, it is possible to obtain a set of system parameters, which are system parameters in the same format as before, but with a reduced amount of duplicate system parameter values.

第３の処理は、得られたシステムパラメータを用いて実際にシステムを動作させる。この処理は、第２の処理で、従来と同じ形式のシステムパラメータを得るようにすることで、処理としては、従来と同等の上記（１）式の処理が使える。 In the third process, the system is actually operated using the obtained system parameters. In this process, the system parameter of the same format as in the conventional system is obtained in the second process, so that the process of the above formula (1) equivalent to the conventional process can be used.

以下では、図１に示すような、自然言語処理の固有表現抽出と係り受け解析に対する問題について、本発明の実施の形態を適用した場合を想定して説明する。これらの問題は、構造予測問題と呼ばれ、グラフ構造などに変換されたものを入力として受け取り、同じくグラフ構造で表されるものを出力とする問題とみなすことができる。 In the following, the problem of natural language processing specific expression extraction and dependency analysis as shown in FIG. 1 will be described assuming that the embodiment of the present invention is applied. These problems are called structure prediction problems, and can be regarded as problems that receive as an input what has been converted to a graph structure or the like and output what is also represented by the graph structure.

以下で説明する、構築および実行フェーズでは、図２に示すように、最初にそれぞれの入力と出力の特徴付ける素性の定義を人手にて与える。ここでは、素性は関数として定義することを想定するため、素性抽出関数の集合を定義することと等価である。また、上記図２に示すように、ある入力データが与えられた場合に、どのような出力データが出力されるかといった、システムの動作定義も人手にて与える。これは、実際に解く問題、ここでは固有表現抽出や係り受け解析、の問題の定義に従って自動的に決まるものである。 In the construction and execution phase described below, as shown in FIG. 2, first, the definition of the features that characterize each input and output is given manually. Here, since the feature is assumed to be defined as a function, it is equivalent to defining a set of feature extraction functions. Further, as shown in FIG. 2, the system operation definition such as what kind of output data is output when given input data is given manually. This is automatically determined according to the definition of the problem to be solved, here, the specific expression extraction and dependency analysis.

＜システムパラメータ学習装置のシステム構成＞
本発明の実施の形態に係るシステムパラメータ学習装置１００は、ＣＰＵと、ＲＡＭと、後述するシステムパラメータ学習処理ルーチンを実行するためのプログラムを記憶したＲＯＭとを備えたコンピュータで構成され、機能的には次に示すように構成されている。図３に示すように、システムパラメータ学習装置１００は、教師データ入力部１と、演算部２と、システムパラメータ記憶部４とを備えている。 <System configuration of system parameter learning device>
A system parameter learning apparatus 100 according to an embodiment of the present invention is configured by a computer including a CPU, a RAM, and a ROM that stores a program for executing a system parameter learning processing routine described later, and is functionally Is configured as follows. As shown in FIG. 3, the system parameter learning device 100 includes a teacher data input unit 1, a calculation unit 2, and a system parameter storage unit 4.

教師データ入力部１は、教師データの入力を受け付ける。ここで、教師データは、上記図１で示したように、予め定められた入力データと、当該入力データに対する正解出力データとの複数ペアである。また、教師データ入力部１は、後述する学習部３で用いるパラメータρの入力を受け付ける。パラメータρは、人手によって予め与えられる。 The teacher data input unit 1 receives input of teacher data. Here, as shown in FIG. 1, the teacher data is a plurality of pairs of predetermined input data and correct output data for the input data. The teacher data input unit 1 accepts input of a parameter ρ used in the learning unit 3 described later. The parameter ρ is given in advance by hand.

演算部２は、教師データベース２０、学習部３、及び重複パラメータ圧縮部４０を備えている。 The calculation unit 2 includes a teacher database 20, a learning unit 3, and a duplicate parameter compression unit 40.

教師データベース２０には、教師データ入力部１により受け付けた教師データが格納される。 Teacher data received by the teacher data input unit 1 is stored in the teacher database 20.

学習部３は、教師データベース２０に格納された教師データと、予め定められたスコア関数とに基づいて、複数のシステムパラメータを学習する。具体的には、学習部３は、教師データとスコア関数とに基づいて、複数のシステムパラメータの各々の値が、予め定められた有限個の離散値の集合に含まれる制約を満たし、かつ、最適化された複数のシステムパラメータを学習する。学習部３は、初期化部３０、システムパラメータ更新部３２、補助パラメータ更新部３４、未定乗数更新部３６、及び収束判定部３８を備えている。 The learning unit 3 learns a plurality of system parameters based on teacher data stored in the teacher database 20 and a predetermined score function. Specifically, the learning unit 3 satisfies, based on the teacher data and the score function, each value of the plurality of system parameters satisfies a constraint included in a predetermined finite set of discrete values, and Learn multiple optimized system parameters. The learning unit 3 includes an initialization unit 30, a system parameter update unit 32, an auxiliary parameter update unit 34, an undetermined multiplier update unit 36, and a convergence determination unit 38.

学習部３における学習アルゴリズムでは、目的関数を最小化するシステムパラメータｗ＾＝（ｗ_１，ｗ_２，．．．ｗ_Ｎ）を学習する。具体的には、正解データと教師あり学習アルゴリズムを人手により決定し、それを構築フェーズの設定として与える。以下では、システムパラメータｗ＾を学習するための目的関数について最初に説明する。 The learning algorithm in the learning unit 3 learns system parameters w ^ = (w ₁ , w ₂ ,... W _N ) that minimize the objective function. Specifically, the correct answer data and the supervised learning algorithm are determined manually and given as the setting of the construction phase. Hereinafter, an objective function for learning the system parameter w will be described first.

ここでは、素性を抽出するための素性抽出関数ｆを、入力データｘ＾と出力データｙ＾の組み合わせで定義される関数とする。個々の素性抽出関数は、ｆ_ｄ（ｘ＾，ｙ＾）の形式の関数であり、任意の実数値を返す関数である。ここで、ｗ＾と同様に、素性抽出関数の集合をベクトル表記ｆ＾（ｘ＾，ｙ＾）で表す。このとき、ｆ_ｄ（ｘ＾，ｙ＾）は、ｆ＾（ｘ＾，ｙ＾）のｄ番目の要素を表す。素性抽出関数によって抽出される素性の例を、図４、及び図５に示す。 Here, the feature extraction function f for extracting features is a function defined by a combination of input data x ^ and output data y ^. Each feature extraction function is a function of the form f _d (x ^, y ^) and is a function that returns an arbitrary real value. Here, like w ^, a set of feature extraction functions is represented by a vector notation f ^ (x ^, y ^). At this time, f _d (x ^, y ^) represents the d-th element of f ^ (x ^, y ^). Examples of features extracted by the feature extraction function are shown in FIG. 4 and FIG.

また、上記の素性抽出関数の数が、システムパラメータの数Ｎとなるように設計すると、スコア関数（）は以下の（４）式のように線形関数として定義できる。 If the number of feature extraction functions is designed to be the number N of system parameters, the score function () can be defined as a linear function as shown in the following equation (4).

つまり、システムパラメータｗ＾_ｄは素性抽出関数ｆ_ｄの重みであり、値が大きければ、素性抽出関数ｆ_ｄがよりシステムの出力データの選択に影響をおよぼし、マイナス側に大きければ、システムが出力データｙ＾をより選ばないような重み付けをしたことに相当する。つまりここでのシステムパラメータは素性の重みを決める値である。学習部３では、図６に示すように教師あり学習アルゴリズムによってシステムパラメータの学習を行う。 In other words, the system parameter w ^ _d is the weight of the feature extraction function f _d , and if the value is large, the feature extraction function f _d more influences the selection of the output data of the system. This corresponds to weighting such that data y is not selected more. In other words, the system parameter here is a value that determines the weight of the feature. The learning unit 3 learns system parameters using a supervised learning algorithm as shown in FIG.

目的関数は上記（３）式で表わされるが、上記（３）式は、解に離散制約が入るので、基本的に離散最適化問題となり、厳密に解を求めるのが非常に困難な問題の系になる。しかし、双対分解に基づく方法（例えば、参考文献１（Stephen Boyd, Neal Parikh, Eric Chu, BorjaPeleato, and Jonathan Eckstein、「Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers」、2011、Foundations and Trends in Machine Learning）を参照）を活用することで効率的な解法が得られる。まず、双対分解に基づいて上記（３）式の制約を分解する。 The objective function is expressed by the above equation (3). However, the above equation (3) is basically a discrete optimization problem because the solution has discrete constraints, and it is very difficult to obtain a solution strictly. Become a system. However, methods based on dual decomposition (eg, Reference 1 (Stephen Boyd, Neal Parikh, Eric Chu, BorjaPeleato, and Jonathan Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, 2011, Foundations and Trends in (See Machine Learning) to obtain an efficient solution. First, the constraint of the above equation (3) is decomposed based on dual decomposition.

ここで、ｕ＾は、補助パラメータである。これは、等式制約ｗ＾＝ｕ＾を用いて従来の最適化問題と制約を分離したことになる。次に、上記参考文献１の拡張ラグランジュ緩和を用いて制約を目的関数に代入する。 Here, u ^ is an auxiliary parameter. This means that the conventional optimization problem and the constraint are separated by using the equality constraint w ^ = u ^. Next, the constraint is substituted into the objective function by using the extended Lagrangian relaxation of Reference Document 1 above.

α＾はラグランジュ乗数である。上記参考文献１にしたがって、初期化部３０、システムパラメータ更新部３２、補助パラメータ更新部３４、未定乗数更新部３６、及び収束判定部３８における処理によって、上記（６）式に示す目的関数を最適化することにより、本実施の形態では、値が重複したシステムパラメータが得られる。 α ^ is a Lagrange multiplier. According to the above reference 1, the objective function shown in the above equation (6) is optimized by the processing in the initialization unit 30, the system parameter update unit 32, the auxiliary parameter update unit 34, the undetermined multiplier update unit 36, and the convergence determination unit 38. In this embodiment, system parameters with overlapping values are obtained.

リスク関数は、基本的にどのような関数を用いてもよいが、ここでは、凸関数であることに限定する。本実施の形態では、リスク関数として、以下の（７）式に示すヒンジ損失関数を用いる。 As the risk function, basically any function may be used, but here, the risk function is limited to a convex function. In the present embodiment, a hinge loss function shown in the following equation (7) is used as the risk function.

ここで、Ｅ（ｙ＾，’ｙ＾）は、ｙ＾と’ｙ＾とがどの程度違っているかを示す関数である。ｙ＾と’ｙ＾との違いが大きい程、Ｅ（ｙ＾，’ｙ＾）は０以上の大きい値となり、ｙ＾と’ｙ＾とが同一である場合にＥ（ｙ＾，’ｙ＾）は０となる。 Here, E (y ^, 'y ^) is a function indicating how much y ^ differs from' y ^. The larger the difference between y ^ and 'y ^, the larger E (y ^,' y ^) is greater than 0. If y ^ and 'y ^ are the same, E (y ^,' y ^) Becomes 0.

また、リスク関数と同様に、正則化項も凸関数であることとする。例えば、正則化項について、以下の（８）式に示すＬ１ノルム正則化項を用いる。 Also, like the risk function, the regularization term is also a convex function. For example, for the regularization term, the L1 norm regularization term shown in the following equation (8) is used.

リスク関数、及び目的関数が凸関数であるため、上記（７）式の最適化問題は基本的に凸最適化であり、大域的な最適解が必ず存在する。 Since the risk function and the objective function are convex functions, the optimization problem of the above equation (7) is basically convex optimization, and there is always a global optimum solution.

初期化部３０は、最適化に用いる３種類のパラメータｗ＾、ｕ＾、α＾を全て０にセットする。以下繰り返し計算となるため、繰り返し回数を管理する変数をｔとし、ｔ＝０とする。 The initialization unit 30 sets all three parameters w ^, u ^, and α ^ used for optimization to zero. Since the calculation is repeated, the variable for managing the number of repetitions is t, and t = 0.

システムパラメータ更新部３２は、初期化部３０で初期化された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）、又は後述する各処理で前回更新された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）を用いて、以下に説明するように、システムパラメータｗ＾^（ｔ）からｗ＾^{（ｔ＋１）}へ更新する。
反復計算ｔの時点で、ｕ＾^（ｔ）とα＾^（ｔ）を固定したときのｗ＾の最適解は、目的関数Ｏ（ｗ＾，ｕ＾^（ｔ），α＾^（ｔ）；Ｄ）を最小にするｗ＾を見つける問題である。 The system parameter update unit 32 includes three types of parameters w ^, u ^ ^(t) , α ^ ^(t) initialized by the initialization unit 30, or three types of parameters w updated last time in each process described later. Using {circumflex over ⁽ u ⁾ }, {circumflex over ⁽ u ⁾ } ^(t) , and {circumflex over ⁽ α ⁾ } ^(t) , the system parameters w ^ ^(t ⁾ are updated to w ^ ^{(t + 1) as described} below.
At the time of the iterative calculation t, the optimal solution of w ^ when u ^ ^(t) and α ^ ^(t) are fixed is the objective function O (w ^, u ^ ^(t) , α ^ ^(t) ; D ) To minimize w).

定義にしたがって、ｗ＾に関係する項のみを取り出すと、以下の式になる。 If only the terms related to w ^ are taken out according to the definition, the following equation is obtained.

この最適化問題は、上記（２）式にＬ２ノルム正則化項が追加された最適化問題とみなすことができる。この問題は、上記（２）式が凸最適化問題であれば、必ず凸最適化問題となるため、上記（２）式を解く方法を用いて容易に解くことができる。
そこで、システムパラメータ更新部３２は、初期化部３０で初期化された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）、又は後述する各処理で前回更新された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）を用いて、上記（１０）式に従って、システムパラメータｗ＾^（ｔ）からｗ＾^{（ｔ＋１）}へ更新する。 This optimization problem can be regarded as an optimization problem in which the L2 norm regularization term is added to the above equation (2). This problem is always a convex optimization problem if the above equation (2) is a convex optimization problem, and can be easily solved using a method for solving the above equation (2).
Therefore, the system parameter update unit 32 includes the three types of parameters w ^, u ^ ^(t) , α ^ ^(t) initialized by the initialization unit 30, or the three types of parameters updated last time in each process described later. Using the parameters w ^, u ^ ^(t) , α ^ ^(t) , the system parameters w ^ ^(t ⁾ are updated to w ^ ^{(t + 1)} according to the above equation (10).

補助パラメータ更新部３４は、システムパラメータ更新部３２で更新されたｗ＾^{（ｔ＋１）}と、初期化部３０で初期化されたパラメータα＾^（ｔ）、又は前回更新されたパラメータα＾^（ｔ）とを用いて、以下に説明するように、補助パラメータｕ＾^（ｔ）からｕ＾^{（ｔ＋１）}へ更新する。
ｗ＾^{（ｔ＋１）}とα＾^（ｔ）を固定したときｕ＾の最適解は、目的関数Ｏ（ｗ＾^{（ｔ＋１）}、ｕ＾，α＾^（ｔ）；Ｄ）に制約ｕ＾∈Ｓ^Ｎが追加された最適化問題の解である。まず、ｕ＾に関連する項だけ集めると、以下の目的関数となる。 The auxiliary parameter update unit 34 is updated by the system parameter update unit 32 ＾ ^{(t + 1)} , the parameter α ^ ^(t) initialized by the initialization unit 30, or the parameter α ^ ^(t) updated last time. As described below, the auxiliary parameters u ^ ^(t ⁾ are updated to u ^ ^{(t + 1)} .
When w ^ ^{(t + 1)} and α ^ ^(t) are fixed, the optimal solution of u ^ is constrained to the objective function O (w ^ ^{(t + 1)} , u ^, α ^ ^(t) ; D) u ^ ∈S ^N Is the solution to the added optimization problem. First, if only terms related to u ^ are collected, the following objective function is obtained.

まず、制約が無い場合を考えると、目的関数Ｏ（ｗ＾，ｕ＾，α＾；Ｄ）のｕ＾に関する勾配が零ベクトルになる点である。その関係から以下の関係式が得られる。 First, considering the case where there is no restriction, the gradient of the objective function O (w ^, u ^, α ^; D) with respect to u ^ is a zero vector. From this relationship, the following relational expression is obtained.

次に、制約を考慮すると、上記式（１１）から、制約が無い場合の最適解ｕ＾＝ｗ＾^{（ｔ＋１）}＋α＾^（ｔ）からＬ２距離で最も近いＳ^Ｎに属する点であることがわかる。よって、ｕ＾＝ｗ＾^{（ｔ＋１）}＋α＾^（ｔ）から最も近いＳ^Ｎに属する点を見つける処理を行えばよい。 Next, considering the constraints, from the above equation (11), the optimal solution u ^ = w ^ ^{(t + 1)} + α ^ ^(t) when there is no constraint is the point belonging to the nearest ^SN at L2 distance. Recognize. Therefore, a process for finding a point belonging to the nearest ^SN from u ^ = w ^ ^{(t + 1)} + α ^ ^(t) may be performed.

Ｓ^ＮがＳの直積集合であることから、各軸に並行に格子状に並んだ点がＳ^Ｎに含まれる点であるため、Ｌ２距離で最も近い点は、単純にパラメータ毎に最も近いＳに含まれる点を見つける問題に帰着できる。このことから、各パラメータに対して独立に処理が可能であり、非常に効率的に求めることができる。
以上より、補助パラメータ更新部３４は、システムパラメータ更新部３２で更新されたｗ＾^{（ｔ＋１）}と、初期化部３０で初期化されたパラメータα＾^（ｔ）、又は前回更新されたパラメータα＾^（ｔ）とを用いて、上記（１３）式に従って、ｕ＾を計算し、計算されたｕ＾に最も近い点を、Ｓ^Ｎに属する点から見つけて、補助パラメータｕ＾^（ｔ）からｕ＾^{（ｔ＋１）}へ更新する。 Since S ^N is a Cartesian product set of S, since points that are arranged in a grid parallel to each axis are included in S ^N , the closest point in the L2 distance is simply the closest S for each parameter. To the problem of finding points included in. Thus, each parameter can be processed independently and can be obtained very efficiently.
As described above, the auxiliary parameter updating unit 34 is updated by the system parameter updating unit 32, w ^ ^{(t + 1)} , the parameter α ^ ^(t) initialized by the initialization unit 30, or the parameter α ^ updated last time. ^(T) is used to calculate u ^ according to the above equation (13), the closest point to the calculated u ^ is found from the points belonging to ^SN , and u is calculated from the auxiliary parameter u ^ ^(t). ^ Update to ^{(t + 1)} .

未定乗数更新部３６は、システムパラメータ更新部３２で更新されたｗ＾^{（ｔ＋１）}と、補助パラメータ更新部３４で更新されたｕ＾^{（ｔ＋１）}と、初期化部３０で初期化されたパラメータα＾^（ｔ）、又は前回更新されたパラメータα＾^（ｔ）とを用いて、以下に説明するように、ラグランジュ未定乗数α＾を更新する。
ｗ＾^{（ｔ＋１）}とｕ＾^{（ｔ＋１）}を固定したときの最適値の方向は、目的関数Ｏ（ｗ＾^{（ｔ＋１）}，ｕ＾^{（ｔ＋１）}，α＾；Ｄ）のα＾に関する勾配方向である。 The undetermined multiplier updating unit 36 includes w ^ ^{(t + 1)} updated by the system parameter updating unit 32, u ^ ^{(t + 1)} updated by the auxiliary parameter updating unit 34, and the parameter α initialized by the initialization unit 30. Using La ^(t) or the parameter α ^ ^(t) updated last time, the Lagrange undetermined multiplier α ^ is updated as described below.
The direction of the optimum value when w ^ ^{(t + 1)} and u ^ ^{(t + 1)} are fixed is the gradient direction with respect to α ^ of the objective function O (w ^ ^{(t + 1)} , u ^ ^{(t + 1)} , α ^; D). is there.

この関係から以下の更新式を得る。 From this relationship, the following update formula is obtained.

以上より、未定乗数更新部３６は、システムパラメータ更新部３２で更新されたｗ＾^{（ｔ＋１）}と、補助パラメータ更新部３４で更新されたｕ＾^{（ｔ＋１）}と、初期化部３０で初期化されたパラメータα＾^（ｔ）、又は前回更新されたパラメータα＾^（ｔ）とを用いて、上記（１５）式に従って、ラグランジュ未定乗数α＾^（ｔ）からα＾^{（ｔ＋１）}へ更新する。 As described above, the undetermined multiplier update unit 36 is initialized by the initialization unit 30 with w ^ ^{(t + 1)} updated by the system parameter update unit 32, u ^ ^{(t + 1)} updated by the auxiliary parameter update unit 34. parameter alpha ^{^ (t),} or by using the parameters alpha ^{^} and ^(t), which was last updated, according to the above (15), to update the Lagrange multiplier alpha ^{^ (t)} alpha ^{^} to ^{(t + 1).}

収束判定部３８は、予め定められた条件が満たされたか否かを判定し、当該条件が満たされるまで、システムパラメータ更新部３２、補助パラメータ更新部３４及び未定乗数更新部３６による更新処理を繰り返す。具体的には、収束判定部３８は、上記の各処理によって得られたシステムパラメータｗ＾が最適値になっているか判定する。より詳細には、二つの小さな正の実数ε_１、ε_２をあたえ、以下の（１６）式を満たした際に収束したと判定する（上記参考文献１参照）。 The convergence determination unit 38 determines whether or not a predetermined condition is satisfied, and repeats the update process by the system parameter update unit 32, the auxiliary parameter update unit 34, and the undetermined multiplier update unit 36 until the condition is satisfied. . Specifically, the convergence determination unit 38 determines whether or not the system parameter w ^ obtained by each of the above processes is an optimal value. More specifically, two small positive real numbers ε ₁ and ε ₂ are given, and it is determined that convergence has occurred when the following expression (16) is satisfied (see the above-mentioned Reference 1).

そして、収束判定部３８による収束判定で、収束していなかった場合は、ｔ＝ｔ＋１としてシステムパラメータ更新部３２に戻る。収束していると判定された場合は、後述する重複パラメータ圧縮部４０に移る。 When the convergence is not determined by the convergence determination by the convergence determination unit 38, t = t + 1 is returned to the system parameter update unit 32. If it is determined that it has converged, the process proceeds to the duplicate parameter compression unit 40 described later.

重複パラメータ圧縮部４０は、学習部３によって学習された複数のシステムパラメータに基づいて、値が同一となるシステムパラメータでグループを構成し、各グループにインデックス番号を付与し、各グループについて、グループを構成するシステムパラメータの各々を、当該グループに付与されたインデックス番号のシステムパラメータに変換し、当該インデックス番号のシステムパラメータを、後述するシステムパラメータ記憶部４に保存する。具体的には、上記（４）式で定義したように、スコア関数には線形式を仮定したので、ｗ_ｉ＝ｗ_ｊ＝，，，となるような、同じ値を取るシステムパラメータでグループを構築する。仮に、Ｋ個のグループができたと仮定すると、各グループに１からＫまでのインデックス番号を付け｛ｇ’_ｋ｝^Ｋ _ｋ＝１のようにＫ個のグループの集合を構築する。ここで、ｗ_ｉ＝ｗ_ｊ＝ｗ_ｌの時に、ｇ’_ｋをシステムパラメータの元の番号の集合と考え、ｇ’_ｋ＝｛ｉ，ｊ，ｌ｝と定義する。つまり、元のシステムパラメータ番号から、そのシステムパラメータと同じ値になるグループのインデックス番号への変換を、一時的に記憶しておくことを意味する。また、ｖ_ｋをｋ番目のグループの値とする、つまり、ｗ_ｉ＝ｗ_ｊ＝ｗ_ｌでｇ’_ｋ＝｛ｉ，ｊ，ｌ｝の時、ｗ_ｉ＝ｗ_ｊ＝ｗ_ｌ＝ｖ_ｋである。 Based on the plurality of system parameters learned by the learning unit 3, the duplicate parameter compression unit 40 forms a group with system parameters having the same value, assigns an index number to each group, and sets the group for each group. Each of the constituting system parameters is converted into a system parameter having an index number assigned to the group, and the system parameter having the index number is stored in a system parameter storage unit 4 to be described later. Specifically, as defined in the above equation (4), since the score function is assumed to have a linear format, groups are defined by system parameters having the same values such that w _i = w _j =,. To construct. Assuming that K groups are created, an index number from 1 to K is assigned to each group, and a set of K groups is constructed such that {g ′ _k } ^K _{k = 1} . Here, when w _i = w _j = w _l , g ′ _k is considered as a set of system parameter original numbers, and g ′ _k = {i, j, l} is defined. That is, it means that the conversion from the original system parameter number to the index number of the group having the same value as the system parameter is temporarily stored. Also, let v _{k be} the value of the k-th group, that is, when w _i = w _j = w _l and g ′ _k = {i, j, l}, w _i = w _j = w _l = v _k It is.

また、重複パラメータ圧縮部４０は、このインデックス番号の変換を素性関数ｆにも同じように適用する。このときに、新しくｇ_ｋを以下の（１７）式のように、元の素性抽出関数の単純な線形結合で定義する。 Further, the duplicate parameter compression unit 40 applies the conversion of the index number to the feature function f in the same manner. At this time, g _k is newly defined by a simple linear combination of the original feature extraction functions as in the following equation (17).

すると、上記式（４）のスコア関数は、 Then, the score function of the above equation (4) is

という関係が成り立つ。ここで、複数のシステムパラメータｗ＾は、入力データｘ＾及び出力データｙ＾に対する複数の素性抽出関数ｆ_ｄ（ｘ＾、ｙ＾）の重みとなる。つまり、もともとの線形関数Σ^Ｎ _ｄ＝１ｗ_ｄｆ_ｄ＾（ｘ＾，ｙ＾）は、新たな線形関数Σ^Ｋ _ｋ＝１ｖ_ｋｇ_ｋ（ｘ＾，ｙ＾）と等価で変換できる。ただし、Ｋはグループの数なので、Ｎと比べると圧倒的に少ない数になっていることが想定できる。また、ｇ_ｋはグループ内に属する素性関数の総和なので、事前に容易に計算できる。
以上のように、重複パラメータ圧縮部４０は、各グループについて、当該グループを構成するシステムパラメータの各々を、当該グループに付与された当該インデックス番号のシステムパラメータに変換し、当該インデックス番号のシステムパラメータを保存すると共に、上記の（１７）式に示すように、スコア関数を定義する。 This relationship holds. Here, the plurality of system parameters w ^ are weights of the plurality of feature extraction functions f _d (x ^, y ^) for the input data x ^ and the output data y ^. That is, the original linear function Σ ^N _{d = 1} w _d f _d ^ (x ^, y ^) can be converted equivalently to the new linear function Σ ^K _{k = 1} v _k g _k (x ^, y ^). . However, since K is the number of groups, it can be assumed that the number is much smaller than N. Moreover, since g _k is the sum of the feature functions belonging to the group, it can be easily calculated in advance.
As described above, for each group, the duplicate parameter compression unit 40 converts each system parameter constituting the group into a system parameter of the index number assigned to the group, and converts the system parameter of the index number to In addition to saving, a score function is defined as shown in equation (17) above.

よって、システムパラメータ更新部３２により得られた重複が多くあるシステムパラメータの集合から、重複部分を融合し、等価だが無駄のない形式に圧縮することで、大幅にモデルサイズを削減することができる。また追加の処理として、ｗ_ｄ＝０になる場合、ｆ_ｄは出力の選択に何も寄与しないので、素性関数の定義そのものを削除することが可能である。この処理もここで合わせて行う。 Therefore, the model size can be greatly reduced by merging the overlapping parts from the system parameter set obtained by the system parameter updating unit 32 and having many overlappings and compressing them into an equivalent but useless format. As an additional process, when w _d = 0, f _d contributes nothing to the output selection, so that the definition of the feature function itself can be deleted. This processing is also performed here.

システムパラメータ記憶部４には、重複パラメータ圧縮部４０によって圧縮されたシステムパラメータｖ_ｋと、新たに定義されたスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とが格納される。 The system parameter storage unit 4 stores the system parameter v _k compressed by the duplicate parameter compression unit 40 and the newly defined score function Φ (x ^, y ^; w ^).

＜情報処理装置のシステム構成＞
前述のシステムパラメータ学習装置１００で得られた圧縮されたシステムパラメータを用いて、情報処理装置２００によって、未知の入力データに対して所定の情報処理を行う。システムパラメータの圧縮を行う場合と、仮にシステムパラメータの圧縮をしなかった場合とで、処理結果は完全に一致する。よって、圧縮を行うことによって、モデルのサイズ（システムパラメータ自体のサイズ）を大幅に削減できる分、実行時に必要とされるリソースを削減できるというメリットだけを享受することができる。
図７は、本発明の実施の形態に係る情報処理装置２００を示すブロック図である。この情報処理装置２００は、ＣＰＵと、ＲＡＭと、後述する情報処理ルーチンを実行するためのプログラムを記憶したＲＯＭとを備えたコンピュータで構成され、機能的には次に示すように構成されている。 <System configuration of information processing apparatus>
The information processing apparatus 200 performs predetermined information processing on unknown input data using the compressed system parameters obtained by the system parameter learning apparatus 100 described above. The processing results are completely the same when the system parameters are compressed and when the system parameters are not compressed. Therefore, by performing the compression, it is possible to enjoy only the merit that the resources required at the time of execution can be reduced by the amount that the size of the model (the size of the system parameter itself) can be greatly reduced.
FIG. 7 is a block diagram showing an information processing apparatus 200 according to the embodiment of the present invention. The information processing apparatus 200 is configured by a computer including a CPU, a RAM, and a ROM that stores a program for executing an information processing routine described later, and is functionally configured as follows. .

本実施の形態に係る情報処理装置２００は、図７に示すように、入力部５と、システムパラメータ記憶部６と、情報処理部７と、出力部８とを備えている。 As shown in FIG. 7, the information processing apparatus 200 according to the present embodiment includes an input unit 5, a system parameter storage unit 6, an information processing unit 7, and an output unit 8.

入力部５は、入力データｘ＾を受け付ける。 The input unit 5 accepts input data x ^.

システムパラメータ記憶部６には、上記システムパラメータ学習装置によって圧縮されたシステムパラメータｖ_ｋと、新たに定義されたスコア関数ｇ_ｋとが格納される。 The system parameter storage unit 6 stores a system parameter v _k compressed by the system parameter learning device and a newly defined score function g _k .

情報処理部７は、システムパラメータ記憶部６に格納された、インデックス番号ｋのシステムパラメータｖ_ｋと、新たに定義されたスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とに基づいて、入力部５により受け付けた入力データｘ＾に対して、所定の情報処理を行う。具体的には、情報処理部７は、入力部５により受け付けた入力データｘ＾と、システムパラメータ記憶部６に格納された、インデックス番号ｋのシステムパラメータｖ_ｋと、新たに定義されたスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とに基づいて、所定の最適化手法を用いて、スコア関数Φ（ｘ＾，ｙ＾；ｗ＾）が最大となる出力データｙ＾を算出する。 The information processing unit 7 is input based on the system parameter v _k of the index number k stored in the system parameter storage unit 6 and the newly defined score function Φ (x ^, y ^; w ^). Predetermined information processing is performed on the input data x ^ received by the unit 5. Specifically, the information processing unit 7 receives the input data x ^ received by the input unit 5, the system parameter v _k of the index number k stored in the system parameter storage unit 6, and a newly defined score function Based on Φ (x ^, y ^; w ^), output data y ^ having the maximum score function Φ (x ^, y ^; w ^) is calculated using a predetermined optimization method.

出力部８は、情報処理部７によって算出された出力データｙ＾を結果として出力する。 The output unit 8 outputs the output data y ^ calculated by the information processing unit 7 as a result.

＜システムパラメータ学習装置の作用＞
次に、本実施の形態に係るシステムパラメータ学習装置１００の作用について説明する。まず、教師データと、パラメータρとが、システムパラメータ学習装置１００に入力されると、システムパラメータ学習装置１００によって、入力された教師データが、教師データベース２０へ格納される。 <Operation of system parameter learning device>
Next, the operation of the system parameter learning apparatus 100 according to the present embodiment will be described. First, when teacher data and a parameter ρ are input to the system parameter learning device 100, the input teacher data is stored in the teacher database 20 by the system parameter learning device 100.

そして、システムパラメータ学習装置１００によって、図８に示すシステムパラメータ学習処理ルーチンが実行される。 Then, the system parameter learning apparatus 100 executes the system parameter learning process routine shown in FIG.

まず、ステップＳ１００において、初期化部３０によって、最適化に用いる３種類の最適化変数ｗ＾、ｕ＾、α＾を全て０にセットし、初期化する。 First, in step S100, the initialization unit 30 sets all three types of optimization variables w ^, u ^, α ^ used for optimization to 0 and initializes them.

次に、ステップＳ１０２において、初期化部３０によって、繰り返し回数を管理する変数ｔを、ｔ＝０と設定し、初期化する。 In step S102, the initialization unit 30 initializes the variable t for managing the number of repetitions by setting t = 0.

ステップＳ１０４において、システムパラメータ更新部３２によって、上記ステップＳ１００で初期化された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）、又は後述する各ステップで前回更新された３種類のパラメータｗ＾、ｕ＾^（ｔ）、α＾^（ｔ）を用いて、上記（１０）式に従って、システムパラメータｗ＾^（ｔ）からｗ＾^{（ｔ＋１）}へ更新する。 In step S104, the three types of parameters w ^, u ^ ^(t) , α ^ ^(t) initialized in step S100 by the system parameter update unit 32, or the three types last updated in each step described later. System parameters w ^ ^(t) to w ^ ^{(t + 1)} according to the above equation (10) using the parameters w ^, u ^ ^(t) , α ^ ^(t) .

ステップＳ１０６において、補助パラメータ更新部３４によって、上記ステップＳ１０４で更新されたｗ＾^{（ｔ＋１）}と、上記ステップＳ１００で初期化されたパラメータα＾^（ｔ）、又は後述するステップＳ１０８で前回更新されたパラメータα＾^（ｔ）とを用いて、上記（１３）式に従って、ｕ＾を計算し、計算されたｕ＾に最も近い点を、Ｓ^Ｎに属する点から見つけて、補助パラメータｕ＾^（ｔ）からｕ＾^{（ｔ＋１）}へ更新する。 In step S106, the auxiliary parameter update unit 34 updates w ^ ^{(t + 1)} updated in step S104, the parameter α ^ ^(t) initialized in step S100, or updated last time in step S108 described later. Using the parameter α ^ ^(t) , u ^ is calculated according to the above equation (13), the point closest to the calculated u ^ is found from the points belonging to ^SN , and the auxiliary parameter u ^ ^{(t )} To u ^ ^{(t + 1)} .

ステップＳ１０８において、未定乗数更新部３６によって、上記ステップＳ１０４で更新されたｗ＾^{（ｔ＋１）}と、上記ステップＳ１０６で更新されたｕ＾^{（ｔ＋１）}と、上記ステップＳ１００で初期化されたパラメータα＾^（ｔ）、又は本ステップＳ１０８で前回更新されたパラメータα＾^（ｔ）とを用いて、上記（１５）式に従って、ラグランジュ未定乗数α＾^（ｔ）からα＾^{（ｔ＋１）}へ更新する。 In step S108, the undetermined multiplier updating unit 36 updates w ^ ^{(t + 1)} updated in step S104, u ^ ^{(t + 1)} updated in step S106, and the parameter α ^ initialized in step S100. ^(T) or using the parameter α ^ ^(t) updated last time in this step S108, the Lagrange undetermined multiplier α ^ ^(t ⁾ is updated to α ^ ^{(t + 1)} according to the above equation (15).

ステップＳ１１０において、収束判定部３８によって、上記ステップＳ１０４で更新されたｗ＾^{（ｔ＋１）}と、前回更新されたｗ＾^（ｔ）と、上記ステップＳ１０６で更新されたｕ＾^{（ｔ＋１）}と、前回更新されたｕ＾^（ｔ）とに基づいて、上記（１６）式に示す収束条件を満たしているか否かを判定する。そして、収束していないと判定された場合には、ステップＳ１１２で繰り返しを管理する変数ｔをインクリメントして、ステップＳ１０４へ移行し、上記ステップＳ１０４〜ステップＳ１０８の各処理を繰り返す。収束したと判定された場合には、ステップＳ１１４へ移行する。 In step S110, the convergence determining unit 38, and updated in step ^{S104 w ^ (t + 1)} , and ^{w ^} was last updated ^(t), ^{u ^} updated in step S106 and ^{(t + 1),} the last Based on the updated u ^ ^(t) , it is determined whether or not the convergence condition shown in the above equation (16) is satisfied. If it is determined that it has not converged, the variable t for managing repetition is incremented in step S112, the process proceeds to step S104, and the processes in steps S104 to S108 are repeated. When it determines with having converged, it transfers to step S114.

ステップＳ１１４において、重複パラメータ圧縮部４０によって、上記ステップＳ１０４の更新処理によって最終的に得られたシステムパラメータｗ＾の値に基づいて、値が同一となるシステムパラメータでグループを構成し、各グループについて、グループを構成するシステムパラメータの各々を、当該グループに付与された当該インデックス番号のシステムパラメータに変換し、当該インデックス番号のシステムパラメータをシステムパラメータ記憶部４に保存すると共に、上記の（１８）式に示すように、新たなスコア関数を定義して、システムパラメータ記憶部４に保存して、システムパラメータ学習処理ルーチンを終了する。 In step S114, the duplicate parameter compression unit 40 forms a group with system parameters having the same value based on the value of the system parameter w ^ finally obtained by the update process in step S104. Each of the system parameters constituting the group is converted into the system parameter of the index number assigned to the group, the system parameter of the index number is stored in the system parameter storage unit 4, and the above equation (18) As shown in FIG. 5, a new score function is defined and stored in the system parameter storage unit 4, and the system parameter learning processing routine is terminated.

＜情報処理装置の作用＞
次に、本実施の形態に係る情報処理装置２００の作用について説明する。まず、システムパラメータ学習装置１００のシステムパラメータ記憶部４に記憶されているシステムパラメータｖ_ｋの各々と新たに定義されたスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とが、情報処理装置２００に入力されると、システムパラメータ記憶部６に格納される。そして、対象としての入力データｘ＾が情報処理装置２００に入力されると、情報処理装置２００によって、図９に示す実行処理ルーチンが実行される。 <Operation of information processing device>
Next, the operation of the information processing apparatus 200 according to this embodiment will be described. First, each of the system parameters v _k stored in the system parameter storage unit 4 of the system parameter learning device 100 and the newly defined score function Φ (x ^, y ^; w ^) Is stored in the system parameter storage unit 6. When the target input data x ^ is input to the information processing apparatus 200, the information processing apparatus 200 executes an execution processing routine shown in FIG.

まず、ステップＳ２００において、入力部５によって、入力データｘ＾を受け付ける。 First, in step S200, the input data 5 is received by the input unit 5.

次に、ステップＳ２０２において、情報処理部７によって、システムパラメータ記憶部６に記憶されたシステムパラメータｖ_ｋの各々と、新たに定義されたスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とを読み込む。 Next, in step S202, each of the system parameters v _k stored in the system parameter storage unit 6 and the newly defined score function Φ (x ^, y ^; Read.

ステップＳ２０４において、情報処理部７によって、上記ステップＳ２００で受け付けた入力データｘ＾と、上記ステップＳ２０２で読み込んだシステムパラメータｖ_ｋの各々及びスコア関数Φ（ｘ＾，ｙ＾；ｗ＾）とに基づいて、所定の最適化手法を用いて、スコア関数Φ（ｘ＾，ｙ＾；ｗ＾）が最大となる出力データｙ＾を算出する。 In step S204, the information processing unit 7, the input data x ^ accepted in step S200, the read system parameters at step S202 _v respectively and the score function of _{k Φ (x ^, y ^} ; w ^) and the Based on this, output data y ^ having the maximum score function Φ (x ^, y ^; w ^) is calculated using a predetermined optimization method.

ステップＳ２０６において、出力部８によって、上記ステップＳ２０４で算出された出力データｙ＾を結果として出力して、情報処理ルーチンを終了する。 In step S206, the output unit 8 outputs the output data y ^ calculated in step S204 as a result and ends the information processing routine.

＜実験結果＞
次に、本実施の形態の実験結果を示す。実際にテキスト処理の問題では、識別する際に入力と出力を特徴付ける、いわゆる素性として単語等の離散シンボル的なものを扱うため、全体の素性数が数百万から、数十億程度まで利用されることが往々にしてあり得る。図１０に、自然言語処理の構文解析と固有表現抽出問題において、本実施の形態で説明した手法を用いた際の効果を示す。 <Experimental result>
Next, experimental results of the present embodiment are shown. In fact, in the text processing problem, since the discrete features such as words are used as so-called features that characterize the input and output at the time of identification, the total number of features is used from several million to several billions. It can often happen. FIG. 10 shows the effect of using the method described in this embodiment in the syntax analysis of natural language processing and the proper expression extraction problem.

図１０中の横軸は、グループの数を表し、縦軸は、システムの解析精度を表している。この図からもわかるように、本実施の形態の方法を用いると、グループの数を２から８といった非常に少ない数で従来と同等の解析精度を得ることができる。 The horizontal axis in FIG. 10 represents the number of groups, and the vertical axis represents the analysis accuracy of the system. As can be seen from this figure, when the method of the present embodiment is used, an analysis accuracy equivalent to the conventional one can be obtained with a very small number of groups, such as 2 to 8.

以上説明したように、本発明の実施の形態に係るシステムパラメータ学習装置によれば、教師データに基づいて、複数のシステムパラメータの各々の値が、予め定められた有限個の離散値の集合に含まれる制約を満たし、かつ、最適化された複数のシステムパラメータを学習することにより、値が重複するようにシステムパラメータを学習することができ、学習されたシステムパラメータを圧縮することにより、システムパラメータを記憶するための記憶容量を削減することができる。 As described above, according to the system parameter learning device according to the embodiment of the present invention, based on the teacher data, each value of the plurality of system parameters is converted into a set of a predetermined finite number of discrete values. By learning multiple optimized system parameters that satisfy the included constraints, system parameters can be learned so that their values overlap, and by compressing the learned system parameters, system parameters Can be reduced in storage capacity.

また、本発明の実施の形態に係る情報処理装置によれば、システムパラメータ学習装置によって学習された、圧縮されたシステムパラメータに基づいて、入力データに対して、所定の情報処理を行って出力データを出力することにより、モデルのサイズ（システムパラメータ自体のサイズ）を大幅に削減できる分、実行時に必要とされるリソースを削減できる。 Further, according to the information processing apparatus according to the embodiment of the present invention, the input data is subjected to predetermined information processing based on the compressed system parameters learned by the system parameter learning apparatus, and the output data Since the model size (the size of the system parameter itself) can be greatly reduced, the resources required at the time of execution can be reduced.

また、従来とほぼ同等の精度を保ちつつシステムパラメータ自体のサイズを大幅に削減することが可能となる。具体的には、システムパラメータ数１億のモデルの場合は、１００，０００，０００ｘ８ｂｙｔｅ＝８００，０００，０００ｂｙｔｅであるので、８００ＭＢの容量を必要とする。しかし、本実施の形態を用いることで、システムパラメータのユニークな値を、例えば、８程度といった非常に小さい値にまで削減できる。この場合、８×８ｂｙｔｅ＝６４ｂｙｔｅであり、計算上は、約１２５０万分の１程度に圧縮できたことになる。 In addition, it is possible to greatly reduce the size of the system parameter itself while maintaining almost the same accuracy as before. Specifically, in the case of a model with 100 million system parameters, since 100,000,000 × 8 bytes = 800,000,000 bytes, a capacity of 800 MB is required. However, by using this embodiment, the unique value of the system parameter can be reduced to a very small value such as about 8. In this case, 8 × 8 bytes = 64 bytes, and in terms of calculation, the compression can be performed to about 12.5 million.

また、従来技術で示した方法との違いとしては、従来技術では、「同じ値を取るように」といった概念は存在しないため、このような大幅なモデルサイズの削減はできない。逆に、本実施の形態の「同じ値を取るようにする」、といった概念の中には、従来技術の方策である極力ゼロを取るという概念も含まれる。これは、本実施の形態では、同じ値を取るという考えの中に、値がゼロになる点でも同じ値をなるべくとるようにする、という考えが含まれるからである。つまり、本実施の形態は、従来技術を包含する上位概念と考えることができる。また、その効果も、単に従来技術のようにゼロを増やすだけでなく、ゼロ以外の値に対しても大幅に必要情報量を削減できることから、従来技術に比べて、システム性能を大幅に落とすことなく、大幅なサイズ削減を可能とする。 Further, as a difference from the method shown in the prior art, in the prior art, there is no concept such as “to have the same value”, and thus such a large model size cannot be reduced. Conversely, the concept of “take the same value” in the present embodiment includes the concept of taking zero as much as possible, which is a measure of the prior art. This is because, in the present embodiment, the idea of taking the same value includes the idea of taking the same value as much as possible even when the value becomes zero. In other words, the present embodiment can be considered as a superordinate concept including the prior art. In addition, the effect is not only to increase zero as in the conventional technology, but also to significantly reduce the amount of required information for non-zero values, so that the system performance is greatly reduced compared to the conventional technology. Without significant size reduction.

なお、本発明は、上述した実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 Note that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the present invention.

例えば、システムパラメータ記憶部４、６及び教師データベース２０は、外部に設けられ、システムパラメータ学習装置１００及び情報処理装置２００とネットワークで接続されていてもよい。 For example, the system parameter storage units 4 and 6 and the teacher database 20 may be provided outside and connected to the system parameter learning device 100 and the information processing device 200 via a network.

また、上記実施の形態では、システムパラメータ学習装置１００と情報処理装置２００とを別々の装置として構成する場合を例に説明したが、システムパラメータ学習装置１００と情報処理装置２００とを１つの装置として構成してもよい。 In the above embodiment, the case where the system parameter learning device 100 and the information processing device 200 are configured as separate devices has been described as an example. However, the system parameter learning device 100 and the information processing device 200 are configured as one device. It may be configured.

上述のシステムパラメータ学習装置１００及び情報処理装置２００は、内部にコンピュータシステムを有しているが、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。 The system parameter learning device 100 and the information processing device 200 described above have a computer system inside. If the “computer system” uses a WWW system, a homepage providing environment (or display environment) is used. ).

例えば、本願明細書中において、プログラムが予めインストールされている実施形態として説明したが、当該プログラムを、コンピュータ読み取り可能な記録媒体に格納して提供することも可能である。 For example, in the present specification, the embodiment has been described in which the program is installed in advance. However, the program may be provided by being stored in a computer-readable recording medium.

１教師データ入力部
２演算部
３学習部
４、６システムパラメータ記憶部
５入力部
７情報処理部
８出力部
２０教師データベース
３０初期化部
３２システムパラメータ更新部
３４補助パラメータ更新部
３６未定乗数更新部
３８収束判定部
４０重複パラメータ圧縮部
１００システムパラメータ学習装置
２００情報処理装置 DESCRIPTION OF SYMBOLS 1 Teacher data input part 2 Calculation part 3 Learning part 4, 6 System parameter memory | storage part 5 Input part 7 Information processing part 8 Output part 20 Teacher database 30 Initialization part 32 System parameter update part 34 Auxiliary parameter update part 36 Undecided multiplier update part 38 Convergence determining unit 40 Duplicate parameter compressing unit 100 System parameter learning device 200 Information processing device

Claims

Learning a plurality of system parameters set in an information processing system that performs predetermined information processing on input data and outputs the output data, using a score function for determining the score of the output data with respect to the input data A system parameter learning device,
A teacher data input unit that inputs teacher data that is a pair of each of a plurality of input data and each of a plurality of correct output data for the plurality of input data;
Based on the teacher data input by the teacher data input unit and the score function, the values of the plurality of system parameters satisfy a constraint included in a predetermined finite set of discrete values. And a learning unit that learns the plurality of optimized system parameters.

The system parameter learning device according to claim 1, wherein the learning unit learns the optimized N system parameters' w ^ according to the following expression.

However, L (Φ (), w ^, D) , the output data obtained when performing the predetermined processing using the system parameter w ^ with respect to the input data x _i of the training data D is, This is a risk function that returns a value indicating how wrong the correct output data y _i of the teacher data D is, Ω (w ^) is a regularization term, and S ^N is the N system parameters. A Cartesian product set of the finite number of discrete values for each.

3. The system parameter learning apparatus according to claim 2, wherein the learning unit learns the optimized N system parameters' w ^ according to the following expression.

Where α ^ is a Lagrange undetermined multiplier, ρ is a predetermined tuning parameter, and u ^ is an auxiliary parameter.

Based on the plurality of system parameters learned by the learning unit, a group is configured with system parameters having the same value, an index number is assigned to each group, and for each group, a system parameter that configures the group The system parameter according to any one of claims 1 to 3, further comprising a duplicate parameter compression unit that converts each into a system parameter of the index number assigned to the group and stores the system parameter of the index number. Learning device.

The plurality of system parameters are weights of a plurality of feature extraction functions f _d (x ^, y ^) for input data x ^ and output data y ^,
The duplicate parameter compression unit converts, for each group, each system parameter constituting the group into a system parameter of the index number assigned to the group, and stores the system parameter of the index number, and The system parameter learning apparatus according to claim 4, wherein the score function is defined and stored as shown in the equation:

Here, Φ (x ^, y ^; w ^) is a score function, and _vk is a system parameter of the index number k assigned to the group.

An input unit for receiving input data;
6. Based on the score function and the system parameter of the index number of each group stored by the system parameter learning device according to claim 4 or 5, the input information is subjected to the predetermined information processing and output. An information processing unit for outputting data;
An information processing apparatus including:

An information processing system that includes a teacher data input unit and a learning unit, performs a predetermined information process on the input data using a score function for determining a score of the output data with respect to the input data, and outputs the output data A system parameter learning method in a system parameter learning device that learns a plurality of system parameters set in
Inputting teacher data which is a pair of each of a plurality of input data and each of a plurality of correct output data for the plurality of input data by the teacher data input unit;
Based on the teacher data input by the teacher data input unit and the score function by the learning unit, each value of the plurality of system parameters is set to a predetermined finite set of discrete values. Learning the plurality of system parameters that satisfy and satisfy the constraints included therein, and a system parameter learning method.

An information processing method in an information processing apparatus including an input unit and an information processing unit,
Receiving input data by the input unit;
Based on the score function and the system parameter of the index number of each group stored by the system parameter learning device according to claim 4 or 5, the information processing unit performs the predetermined function on the input data. Performing information processing and outputting output data;
An information processing method including:

The program for functioning a computer as each part of the system parameter learning apparatus of any one of Claims 1-5.

The program for functioning a computer as each part of the information processing apparatus of Claim 6.