JP2016161968A

JP2016161968A - Word vector learning device, natural language processing device, method, and program

Info

Publication number: JP2016161968A
Application number: JP2015037057A
Authority: JP
Inventors: Jun Suzuki
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2015-02-26
Filing date: 2015-02-26
Publication date: 2016-09-05
Anticipated expiration: 2035-02-26
Also published as: JP6517537B2

Abstract

PROBLEM TO BE SOLVED: To allow efficient learning of word vectors. SOLUTION: A document data update part 26 updates a word list and co-occurrence information on the contexts of words, and adds word vectors and context vectors for newly added words. A number-of-dimensions increasing part 28 updates the number of dimensions. A repetition optimization part 30 alternately repeats two steps, fixing r_{i,d} for all words i while optimizing the variable c_{j,d} for all words j, and fixing c_{j,d} for all words j while optimizing the variable r_{i,d} for all words i, so that a solution to the optimization problem is finally obtained. SELECTED DRAWING: Figure 1

Description

The present invention relates to a word vector learning device, a natural language processing device, a method, and a program, and more particularly to a word vector learning device, a natural language processing device, a method, and a program for learning word vectors for words.

Because individual words are discrete symbols and are not grounded in physical phenomena, quantitatively expressing the similarity between words is not straightforward. By comparison, speech, for example, is generally treated on a computer as time-series data of frequencies. The similarity between arbitrary speech segments can therefore be measured to some extent by vectorizing various continuous-valued features computed from the frequencies and calculating the distance between the resulting vectors. Likewise, the similarity between images can be calculated fairly easily by vectorizing pixel information as features and calculating the distance between the vectors. Thus, similarity between things grounded in physical phenomena, such as waveforms or colors, can be handled relatively naturally on a computer, whereas similarity between things described by discrete symbols that follow no physical phenomenon, such as language, cannot be handled simply on a computer.

Against this background, various methodologies have been devised to calculate the similarity between discrete symbols such as words. One of them is the distributed semantic representation. As with speech and images, this method assigns one vector to each word and attempts to express the semantic similarity between words by the distance between their vectors. Because semantic closeness between words is expressed by a distance calculation in a vector space, the method is well suited to computers.

FIG. 9 shows an overview of the similarity between words under the distributed semantic representation.

Here, the i-th word is denoted w_i, and the vector assigned to the i-th word w_i is denoted r′_i. A symbol that represents a vector is written with a trailing “′”.
Hereinafter, a vector assigned to a word is specifically called a “word vector”. That is, the word vector of word w_i is r′_i. In computation, as expressed in equation (1) below, the similarity between two words w_i and w_j is generally defined as the inner product of, or the cosine distance between, the word vectors of w_i and w_j.
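
Equation (1) itself is not reproduced in this text. As a minimal sketch of the two similarity measures named above, the following computes the inner product and the cosine measure (the text's “cosine distance”, computed here as the usual cosine similarity) for plain Python lists. The function names and the toy vectors are hypothetical and are not taken from the patent.

```python
import math

def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product normalized by the vector lengths; 0.0 for a zero vector."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return inner_product(u, v) / (norm_u * norm_v)

# Hypothetical toy word vectors (illustration only).
r_cat = [1.0, 2.0, 0.0]
r_dog = [2.0, 4.0, 0.0]
r_car = [0.0, 0.0, 3.0]

sim_cat_dog = cosine_similarity(r_cat, r_dog)  # same direction
sim_cat_car = cosine_similarity(r_cat, r_car)  # orthogonal
```

With either measure, a larger value is read as the two words being more similar.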

In this case, the larger the value of the inner product or cosine distance, the more similar the words w_i and w_j are. This has the advantage that semantically similar words can be handled within various language processing applications such as translation, dialogue, document summarization, and proofreading. As a result, such applications have been shown to yield better results than processing schemes that do not use semantic closeness between words. Many methods have been proposed for obtaining the word vector of each word. The basic methodology is first to define, for each word in a text, the context information of that word. Context information is not restricted to any particular form and various kinds of information can be used, but in the simplest and most common case the words that appear around each word are used as its context information. Changing the definition of context has little effect on the word vector estimation algorithm itself. In the following discussion, the context information of a word is therefore taken to be the words that appear around it.

In recent years, methods fast enough to handle large-scale data such as that obtained from the web have become mainstream (see Non-Patent Document 1 and Non-Patent Document 2). Methods that can handle large-scale data are mainstream because the more data there is, the more accurately linguistic phenomena can be captured, so the accuracy of similarity estimation can be expected to improve in theory as well. Here, as a representative conventional method, the word vector acquisition method of Non-Patent Document 2 is described.

In the method of Non-Patent Document 2, to represent the case where a word appears as the context of another word, each word is assigned a second vector in addition to its word vector. For convenience, this is called a “context vector” in contrast to the word vector. That is, the i-th word w_i has two vectors, a word vector r′_i and a context vector c′_i.

Assume that there are N words. Let (r′_i)_{i=1}^N be the list of all word vectors arranged in order from i = 1 to N. Similarly, let (c′_j)_{j=1}^N be the list of all context vectors arranged in order from j = 1 to N. Let X_{i,j} be the number of times the j-th word appeared as the context of the i-th word. Acquisition of the distributed semantic representation according to Non-Patent Document 2 can then be formulated, by equation (2) below, as the problem of minimizing the objective function of equation (3) below.

Here, ^r′_i and ^c′_j are the estimated word vector and context vector.
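
Equations (2) and (3) are not reproduced in this text. As a hedged reconstruction only, the following shows a form consistent with the surrounding definitions and with the GloVe objective of Non-Patent Document 2. The weighting function f is an assumption borrowed from GloVe, and GloVe's bias terms are omitted because the definition of b_{i,j,d} given later in this document contains none; the exact form in the original publication may differ.

```latex
% Assumed form of equations (2) and (3); not taken verbatim from the publication.
\bigl(\hat{r}'_i\bigr)_{i=1}^{N},\ \bigl(\hat{c}'_j\bigr)_{j=1}^{N}
  = \operatorname*{arg\,min}_{(r'_i)_{i=1}^{N},\ (c'_j)_{j=1}^{N}}
    J\bigl((r'_i)_{i=1}^{N},\ (c'_j)_{j=1}^{N}\bigr)
  \tag{2}

J\bigl((r'_i)_{i=1}^{N},\ (c'_j)_{j=1}^{N}\bigr)
  = \sum_{i=1}^{N} \sum_{j=1}^{N}
    f(X_{i,j}) \bigl( r'_i \cdot c'_j - \log X_{i,j} \bigr)^{2}
  \tag{3}
```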

The finally obtained ^r′_i is the word vector of the i-th word. It is the word vector used in, for example, the similarity calculation of equation (1), and it is also used in natural language processing applications such as translation and proofreading.

Non-Patent Document 1: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR, 2013.
Non-Patent Document 2: Jeffrey Pennington, Richard Socher, and Christopher Manning, Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

The problem with the currently mainstream methods, such as that of Non-Patent Document 2 described above, is that the “number of words” to be handled and the “number of dimensions” of the output vector space must be explicitly determined before learning. To make the discussion clear, the number of words is hereinafter denoted N and the number of dimensions of the output vector space is denoted D. In other words, N and D are fixed before learning of the distributed semantic representation begins.

Considering how distributed semantic representations are typically built, the data available for learning in most cases grows and is updated daily, and ideally one would want to learn incrementally as the training data is updated. In addition, the appropriate number of dimensions of a word vector is thought to depend on the amount of training data, so the framework should be able to accommodate naturally, for example, increasing the number of dimensions when the amount of training data grows. Furthermore, even apart from these reasons, learning a distributed semantic representation is fundamentally a non-convex optimization problem with multiple local optima, so even with the same data and the same objective function, the resulting word vectors can differ greatly depending on the initial values and the procedure used during learning. Consequently, if the word vectors before and after relearning bear no relation to each other even when the training data has grown only slightly, then every time the distributed semantic representation is relearned, the parameters of any system that uses it must also be re-estimated, which is a drawback.

The present invention has been made to solve the above problems, and an object thereof is to provide a word vector learning device, method, and program capable of learning word vectors efficiently.

Another object is to provide a natural language processing device and program that perform natural language processing based on the semantic similarity of words using the learned word vectors.

To achieve the above object, the word vector learning device according to the first invention is a word vector learning device that learns, based on document data and for each word, a word vector for the word and a context vector representing that the word appears as the context of another word. The device includes an iterative optimization unit that takes each dimension of the word vectors and context vectors in turn as a target dimension and repeatedly estimates the value of the target dimension of the word vector and context vector for each of the words so as to optimize an objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and context vector for each of the words and on the number of times, for each word pair, that one word appeared as the context of the other.

In the word vector learning device according to the first invention, the iterative optimization unit may include: a context vector optimization unit that, with the value of the target dimension of the word vector for each of the words fixed, estimates the value of the target dimension of the context vector for each of the words so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and context vector for each of the words and on the number of times, for each word pair, that one word appeared as the context of the other; a word vector optimization unit that, with the value of the target dimension of the context vector for each of the words fixed, estimates the value of the target dimension of the word vector for each of the words so as to optimize the objective function for the target dimension, based on the same values and counts; and an iteration determination unit that causes the estimation by the context vector optimization unit and the estimation by the word vector optimization unit to be repeated alternately until a predetermined iteration end condition is satisfied. Each dimension of the word vectors and context vectors is taken in turn as the target dimension, and the processing by the context vector optimization unit, the word vector optimization unit, and the iteration determination unit is repeated.

In the word vector learning device according to the first invention, when learning the word vectors and context vectors with the number of dimensions increased relative to the already learned word vectors and context vectors, the iterative optimization unit may take each not-yet-learned dimension of the word vectors and context vectors in turn as the target dimension and repeatedly estimate the value of the target dimension of the word vector and context vector for each of the words so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and context vector for each of the words and on the number of times, for each word pair, that one word appeared as the context of the other.

In the word vector learning device according to the first invention, the iterative optimization unit may repeatedly estimate the value of the target dimension d of the word vector r′_i for each word i and of the context vector c′_j for each word j so as to optimize, in accordance with equation (4) below, the objective function for the target dimension d expressed by equation (5) below.

Here, r_{i,d} is the element of dimension d of the word vector r′_i of word i, and c_{j,d} is the element of dimension d of the context vector c′_j of word j; ^r_{i,d} and ^c_{j,d} are the estimates of r_{i,d} and c_{j,d}, respectively; b_{i,j,d} = −r′_i^(d)・c′_j^(d) + log(X_{i,j}); r′_i^(d) is the word vector r′_i of word i with the element of dimension d replaced by 0; c′_j^(d) is the context vector c′_j of word j with the element of dimension d replaced by 0; and X_{i,j} is the number of times word j appeared as the context of word i.
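
Equations (4) and (5) are not reproduced in this text. The following worked derivation is a sketch under stated assumptions: it takes a squared-error objective as in GloVe (Non-Patent Document 2) with a weighting function f, both of which are assumptions, and shows how the definition of b_{i,j,d} above isolates the single element of dimension d. The closed-form update on the last line is likewise an illustration of a one-variable least-squares solution, not necessarily the update used in the publication.

```latex
% Splitting the inner product into dimension d and the remaining dimensions:
r'_i \cdot c'_j - \log X_{i,j}
  = r_{i,d}\, c_{j,d} + r'^{(d)}_i \cdot c'^{(d)}_j - \log X_{i,j}
  = r_{i,d}\, c_{j,d} - b_{i,j,d}

% Assumed per-dimension objective (cf. equation (5)):
J_d = \sum_{i=1}^{N} \sum_{j=1}^{N}
      f(X_{i,j}) \bigl( r_{i,d}\, c_{j,d} - b_{i,j,d} \bigr)^{2}

% With every r_{i,d} fixed, each c_{j,d} is a one-variable least-squares problem:
\hat{c}_{j,d}
  = \frac{\sum_{i} f(X_{i,j})\, r_{i,d}\, b_{i,j,d}}
         {\sum_{i} f(X_{i,j})\, r_{i,d}^{2}}
```

By symmetry, with every c_{j,d} fixed, the same expression with the roles of r and c exchanged, and the sum taken over j, gives the update for r_{i,d}.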

The natural language processing device according to the second invention includes a natural language processing unit that performs, on an input document, natural language processing based on the semantic similarity between words derived from the word vectors, using the word vector of each word learned by the above word vector learning device.

The word vector learning method according to the third invention is a word vector learning method in a word vector learning device that learns, based on document data and for each word, a word vector for the word and a context vector representing that the word appears as the context of another word. The method includes a step in which an iterative optimization unit takes each dimension of the word vectors and context vectors in turn as a target dimension and repeatedly estimates the value of the target dimension of the word vector and context vector for each of the words so as to optimize an objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and context vector for each of the words and on the number of times, for each word pair, that one word appeared as the context of the other.

In the word vector learning method according to the third invention, the step in which the iterative optimization unit repeats the estimation includes: a step in which a context vector optimization unit, with the value of the target dimension of the word vector for each of the words fixed, estimates the value of the target dimension of the context vector for each of the words so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and context vector for each of the words and on the number of times, for each word pair, that one word appeared as the context of the other; a step in which a word vector optimization unit, with the value of the target dimension of the context vector for each of the words fixed, estimates the value of the target dimension of the word vector for each of the words so as to optimize the objective function for the target dimension, based on the same values and counts; and a step in which an iteration determination unit causes the estimation by the context vector optimization unit and the estimation by the word vector optimization unit to be repeated alternately until a predetermined iteration end condition is satisfied. Each dimension of the word vectors and context vectors is taken in turn as the target dimension, and the step of estimation by the context vector optimization unit, the step of estimation by the word vector optimization unit, and the processing by the iteration determination unit are repeated.

The program according to the fourth invention is a program for causing a computer to function as each unit of the above word vector learning device or natural language processing device.

According to the word vector learning device, method, and program of the present invention, each dimension of the word vectors and context vectors is taken in turn as a target dimension, and the value of the target dimension of the word vector and context vector for each word is repeatedly estimated so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension and on the number of times, for each word pair, that one word appeared as the context of the other. Word vectors can thereby be learned efficiently.

According to the natural language processing device and program of the present invention, natural language processing based on the semantic similarity of words can be performed using the learned word vectors.

FIG. 1 is a block diagram showing the configuration of the word vector learning device according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of a word list and context co-occurrence information.
FIG. 3 is a diagram showing an example of one-dimensional word vectors.
FIG. 4 is a diagram showing an example of two-dimensional word vectors obtained by increasing the number of dimensions.
FIG. 5 is a diagram showing an example of 10-dimensional word vectors.
FIG. 6 is a block diagram showing the configuration of the natural language processing device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the word vector learning processing routine in the word vector learning device according to the embodiment of the present invention.
FIG. 8 is a flowchart showing the natural language processing routine in the natural language processing device according to the embodiment of the present invention.
FIG. 9 is a diagram showing an overview of the similarity between words under the distributed semantic representation.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

First, an overview of the embodiment of the present invention is given. In view of situations that can arise when distributed semantic representations are used in real environments, the embodiment proposes a framework for acquiring distributed semantic representations of words that, with respect to updates of the number of words N and the number of dimensions D, allows incremental additions and continual learning, and yields results that preserve the previous results as far as possible. As in the conventional method, this embodiment obtains word vectors by basically following the optimization problem of equation (2) above. However, the framework is changed so that learning can continue even when the number of words N and the number of dimensions D increase over time.

First, all word vectors r′_i and context vectors c′_j are D-dimensional vectors, where D ≥ 1. For an arbitrary target dimension d, where 1 ≤ d ≤ D, only the elements of dimension d of all word vectors are treated as optimization variables, and the values of all other elements (those of dimensions other than d) are regarded as constants. Under this setting, solving the optimization problem of equation (2) above reduces to solving a one-variable optimization problem for each word. Naturally, because only the element of dimension d of each vector is chosen as an optimization variable, the only elements of the word vectors that change after optimization are those of the selected dimension d. After learning for target dimension d is complete, the choice of target dimension d is changed as appropriate and the same learning is repeated. Even with this change, the procedure can be interpreted as an optimization algorithm that obtains a solution to the original optimization problem simply by repeating the optimization dimension by dimension; it is essentially the same problem solved in one particular way.
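
The per-dimension optimization described above can be sketched as follows. This is a minimal illustration under assumptions, not the update rule of the publication: it uses an unweighted squared error (r′_i · c′_j − log X_{i,j})^2 over observed pairs only, and solves each one-variable problem in closed form while alternating between context and word vectors. The names `optimize_dimension` and `loss` are hypothetical, and the dimension index d is 0-based here whereas the text counts from 1.

```python
import math

def loss(R, C, X):
    """Unweighted squared error over observed pairs (an assumed objective)."""
    return sum(
        (sum(a * c for a, c in zip(R[i], C[j])) - math.log(x)) ** 2
        for (i, j), x in X.items()
    )

def optimize_dimension(R, C, X, d, n_iters=10):
    """Update only element d of every word vector R[i] and context vector C[j].

    R, C: lists of D-dimensional lists. X: dict (i, j) -> count (> 0).
    All elements of dimensions other than d are treated as constants.
    """
    # b[(i, j)] = log X_ij minus the inner product with dimension d zeroed out.
    b = {}
    for (i, j), x in X.items():
        rest = sum(R[i][k] * C[j][k] for k in range(len(R[i])) if k != d)
        b[(i, j)] = math.log(x) - rest

    for _ in range(n_iters):
        # Fix every R[i][d]; each C[j][d] is a one-variable least-squares fit.
        num, den = {}, {}
        for (i, j), bij in b.items():
            num[j] = num.get(j, 0.0) + R[i][d] * bij
            den[j] = den.get(j, 0.0) + R[i][d] ** 2
        for j in num:
            if den[j] > 0.0:
                C[j][d] = num[j] / den[j]
        # Fix every C[j][d]; solve for each R[i][d] in the same way.
        num, den = {}, {}
        for (i, j), bij in b.items():
            num[i] = num.get(i, 0.0) + C[j][d] * bij
            den[i] = den.get(i, 0.0) + C[j][d] ** 2
        for i in num:
            if den[i] > 0.0:
                R[i][d] = num[i] / den[i]
    return R, C
```

Because each half-step is an exact minimization over the variables it touches, the assumed loss cannot increase from one step to the next, and the elements of all other dimensions are left unchanged.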

This embodiment assumes a learning environment in which learning is, in principle, performed perpetually. However, processing actually occurs only when one of two events occurs: (1) the document data is updated, or (2) an increase in the number of dimensions of the word vectors is requested. Learning takes a certain amount of time, but if learning finishes before the next of these events occurs, the process waits in a stopped state.

At the start of learning, the number of word vectors is set to N = 0 and the number of vector dimensions to D = 1. Thus, in this embodiment, learning can always start from the state in which the number of words and the number of dimensions are minimal.

The main processing flow consists of Process 1 to Process 4 below.

(Process 1): Standby
When a document data update signal is received, proceed to Process 2A.
When a signal to increase the number of dimensions of the word vectors is received, proceed to Process 2B.

(Process 2A): Update of document data
Read the document data and update the co-occurrence information of words and context words.
Update the number of words N according to the number of words that appear in the document data.
Set the target dimension d to d = 1.

(Process 2B): Addition of a dimension to the vectors
Append one element to the end of each word vector and context vector and initialize it.
Set the number of dimensions D to D = D + 1.
Set the initial value of d to d = D.

(Process 3)
Solve the optimization problem for the target dimension d according to equation (6) described later, and update the value of the target dimension d of the word vectors.

(Process 4)
If d = D, return to Process 1.
Otherwise, set d = d + 1 and return to Process 3.
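
The control flow of Process 1 to Process 4 can be sketched as a single handler that is called once per received signal. This is an illustration under assumptions: the signal names, the `state` dictionary layout, the initial value of a new element, and the caller-supplied `optimize_dimension` callback standing in for Process 3 are all hypothetical, and reading the document data in Process 2A is omitted.

```python
def run_process(signal, state, optimize_dimension, init_value=0.1):
    """Handle one signal and return the list of dimensions that were trained.

    state: dict with "D" (number of dimensions) and "R", "C" (lists of vectors).
    optimize_dimension(state, d): stand-in for Process 3 (1-based d).
    """
    if signal == "update_documents":       # Process 2A
        # Updating co-occurrence counts and N from the documents is omitted.
        d = 1
    elif signal == "add_dimension":        # Process 2B
        for vec in state["R"] + state["C"]:
            vec.append(init_value)         # initial value is an assumption
        state["D"] += 1
        d = state["D"]
    else:
        raise ValueError(f"unknown signal: {signal}")

    trained = []
    while True:
        optimize_dimension(state, d)       # Process 3
        trained.append(d)
        if d == state["D"]:                # Process 4: return to standby
            return trained
        d += 1
```

Under this reading, a document update retrains every dimension from 1 to D in order, whereas adding a dimension trains only the newly appended dimension, which leaves the previously learned dimensions untouched.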

In the embodiment of the present invention, Process 1 to Process 4 above are described taking as an example a word vector learning device applied to the problem of acquiring distributed semantic representations of words from electronic documents written in natural language that exist on the web.

<Configuration of the Word Vector Learning Device According to the Embodiment of the Present Invention>

Next, the configuration of the word vector learning device according to the embodiment of the present invention is described. As shown in FIG. 1, the word vector learning device 100 according to the embodiment can be implemented by a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the word vector learning processing routine described later. Functionally, as shown in FIG. 1, the word vector learning device 100 includes an input unit 10, a calculation unit 20, and an output unit 50. The initial state of the word vector learning device 100 is assumed to be one in which nothing has been learned; the number of words N is therefore set to N = 0 and the number of dimensions D to D = 1.

The input unit 10 receives either document data together with a document data update signal, or a dimension increase signal. In this embodiment, the document data is text generated by general users and posted to SNS sites and the like on a given day.

The calculation unit 20 includes a document data update unit 26, a dimension number increasing unit 28, an iterative optimization unit 30, and a vector storage unit 40.

When the document data update unit 26 receives a document data update signal from the input unit 10, it updates the word list and the word-context co-occurrence information to reflect the updated document data, and prepares word vectors and context vectors for the newly added words, as described below.

The document data update unit 26 first updates the word list and the word-context co-occurrence information based on the received document data. N is set to the number of words in the resulting word list. In this embodiment, the word list contains every word that appears in the received document data. For example, if there are 10,000 such words, N = 10,000. Alternatively, the word list may be restricted to words that appear at least a certain number of times.

Next, the document data update unit 26 collects pairs, each consisting of a word and a word extracted from its context, to obtain the co-occurrence information. For example, the two words before and the two words after a target word are extracted as its context. The window size is arbitrary and may be set to any number of words. The number of occurrences of each pair (the co-occurrence frequency) is recorded as well. These pairs and frequencies are the statistics acquired in advance.
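The pair collection described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the function name `build_cooccurrence`, the `min_count` parameter (modelling the optional frequency cutoff), and the choice to drop filtered words only after windowing positions are fixed are all assumptions. Input is assumed to be already tokenized.

```python
from collections import Counter


def build_cooccurrence(sentences, window=2, min_count=1):
    """Return (word_list, pair_counts) from tokenized sentences.

    pair_counts[(w, c)] is the number of times c occurred within
    `window` tokens before or after w.
    """
    freq = Counter(tok for sent in sentences for tok in sent)
    vocab = {w for w, n in freq.items() if n >= min_count}
    pairs = Counter()
    for sent in sentences:
        for pos, word in enumerate(sent):
            if word not in vocab:
                continue
            lo = max(0, pos - window)
            hi = min(len(sent), pos + window + 1)
            for ctx_pos in range(lo, hi):
                # Skip the target position itself and filtered-out words.
                if ctx_pos != pos and sent[ctx_pos] in vocab:
                    pairs[(word, sent[ctx_pos])] += 1
    return sorted(vocab), pairs
```

With `window=2` this reproduces the "two words before and after" setting from the text; other window sizes only change the `window` argument.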

FIG. 2 shows an example of the word list and the context co-occurrence information. For simplicity, this example assumes that word boundaries can be obtained easily with commonly used tools: freely available tools exist for Japanese, and for English, whitespace can be used as the word delimiter. Although this embodiment is described using Japanese examples, it may also be applied to English. In this embodiment, the word-context co-occurrence information is extracted from the document data in advance to simplify and speed up processing.

Next, for each word newly added by the document data update, the document data update unit 26 adds a word vector and a context vector for that word to the word vectors and context vectors stored in the vector storage unit 40. In the initial state, D = 1, so the document data update unit 26 prepares a one-dimensional word vector and a one-dimensional context vector for every word and stores them in the vector storage unit 40.
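As a rough illustration of this step, the sketch below adds vectors for unseen words while leaving existing entries untouched. The function name `add_new_words`, the dictionary-of-lists storage, and the small random initial values are assumptions; the text does not say how new vectors are initialized.

```python
import random


def add_new_words(word_vecs, ctx_vecs, new_words, D, rng=None):
    """Give each unseen word a D-dimensional word vector and context vector.

    Existing entries are left untouched so that earlier learning is kept.
    The small random initial values are an assumption; the text does not
    specify how new vectors are initialized.
    """
    rng = rng or random.Random(0)
    added = []
    for w in new_words:
        if w in word_vecs:
            continue  # already known: keep its learned vectors
        word_vecs[w] = [rng.uniform(-0.1, 0.1) for _ in range(D)]
        ctx_vecs[w] = [rng.uniform(-0.1, 0.1) for _ in range(D)]
        added.append(w)
    return added
```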

The dimension number increasing unit 28 runs when it receives a dimension increase signal from the input unit 10. To accommodate the increase, it appends one dimension to every word vector and every context vector and initializes the new element with an arbitrary value. It then updates the number of dimensions as D = D + 1.
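A minimal sketch of the dimension increase, assuming vectors are stored as Python lists keyed by word. The name `add_dimension` and the default initial value are hypothetical; the text only says the new element is initialized with an arbitrary value.

```python
def add_dimension(word_vecs, ctx_vecs, D, init=0.1):
    """Append one element to every vector and return the new D.

    `init` is the arbitrary initial value for the new dimension.
    """
    for vecs in (word_vecs, ctx_vecs):
        for vec in vecs.values():
            if len(vec) != D:
                raise ValueError("vector length does not match D")
            vec.append(init)
    return D + 1
```

A nonzero default is used here because, if the later alternating updates divide by a sum of squared values in the new dimension, an all-zero start could leave nothing to divide by; whether that applies depends on the actual form of the objective.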

The iterative optimization unit 30 includes a context vector optimization unit 32, a word vector optimization unit 34, and an iteration determination unit 36.

First, the principle of the iterative optimization unit 30 will be described.

The iterative optimization unit 30 takes each dimension of the word vectors and context vectors in turn as the target dimension. For each target dimension, it repeatedly estimates the target-dimension values of the word vector and context vector of every word so as to optimize an objective function for that dimension, based on the values of all other dimensions of the word vectors and context vectors and on the number of times one word of each word pair appeared as the context of the other.

Specifically, the iterative optimization unit 30 performs the following processing. First, it receives the word list and the context co-occurrence frequencies from the document data update unit 26 as input. It then sets the initial value of the variable d, which denotes the dimension being processed, to 1. To simplify notation, let r_{i,d} denote the element of dimension d of the word vector r'_i of the i-th word, and likewise let c_{j,d} denote the element of dimension d of the context vector c'_j of the j-th word. Let r'_i^(d) denote the vector obtained from r'_i by fixing its element of dimension d to 0, and let c'_j^(d) denote the vector obtained from c'_j by fixing its element of dimension d to 0. The optimization variables are then only r_{i,d} for all words i and c_{j,d} for all words j; all r'_i^(d) and c'_j^(d) are constants. To simplify the notation further, the terms that are constant during optimization are collected as b_{i,j,d} = -r'_i^(d) · c'_j^(d) + log(X_{i,j}).
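The constant term b_{i,j,d} can be computed directly from its definition. The sketch below is illustrative only: the function name `constant_terms`, the dictionary-based storage, and the restriction to observed pairs with X_{i,j} > 0 (where the logarithm is defined) are assumptions.

```python
import math


def constant_terms(word_vecs, ctx_vecs, counts, d):
    """Compute b[(i, j)] = -r'_i^(d) . c'_j^(d) + log(X_ij) for observed pairs.

    d is a 1-based dimension index, as in the text. Fixing element d to 0
    is the same as skipping it in the inner product.
    """
    k = d - 1  # 0-based position of the target dimension
    b = {}
    for (i, j), x in counts.items():
        if x <= 0:
            continue  # log(0) is undefined; unobserved pairs are skipped
        dot = sum(rv * cv
                  for pos, (rv, cv) in enumerate(zip(word_vecs[i], ctx_vecs[j]))
                  if pos != k)
        b[(i, j)] = -dot + math.log(x)
    return b
```

When D = 1 the inner product is empty, so b_{i,j,1} reduces to log(X_{i,j}).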

The optimization problem for the target dimension d, given by equation (6) below, is then solved.

Compared with equation (2) above, equation (6) simply means that only the elements of dimension d of all word vectors and context vectors are optimization variables, while the elements of all other dimensions are held fixed during optimization. Hence, if D = 1, equations (2) and (6) are exactly equivalent.

When optimizing equation (6), r_{i,d} is fixed for all words i and the variables c_{j,d} are optimized for all words j; conversely, c_{j,d} is fixed for all words j and the variables r_{i,d} are optimized for all words i. These two steps are repeated alternately until a solution to the optimization problem of equation (6) is obtained. Each subproblem has an analytical solution in its own variables, so it is solved exactly. The processing of the context vector optimization unit 32, the word vector optimization unit 34, and the iteration determination unit 36 of the iterative optimization unit 30 is described below.
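Equations (6) to (9) are not reproduced in this text. Purely as an illustration of why each half-step has an analytical solution, suppose the per-dimension objective is an unweighted sum of squared residuals over the observed pairs (i, j). This is an assumption; the actual objective may include weights or other terms. Because r'_i · c'_j - log(X_{i,j}) = r_{i,d} c_{j,d} - b_{i,j,d}, setting each partial derivative to zero then gives:

```latex
\begin{aligned}
J_d &= \sum_{(i,j)} \left( r_{i,d}\, c_{j,d} - b_{i,j,d} \right)^{2}, \\
\frac{\partial J_d}{\partial c_{j,d}}
  &= 2 \sum_{i} r_{i,d} \left( r_{i,d}\, c_{j,d} - b_{i,j,d} \right) = 0
  \;\Longrightarrow\;
  \hat{c}_{j,d} = \frac{\sum_{i} r_{i,d}\, b_{i,j,d}}{\sum_{i} r_{i,d}^{2}}, \\
\frac{\partial J_d}{\partial r_{i,d}}
  &= 2 \sum_{j} c_{j,d} \left( r_{i,d}\, c_{j,d} - b_{i,j,d} \right) = 0
  \;\Longrightarrow\;
  \hat{r}_{i,d} = \frac{\sum_{j} c_{j,d}\, b_{i,j,d}}{\sum_{j} c_{j,d}^{2}}.
\end{aligned}
```

Here each inner sum runs over the pairs observed together with the fixed index. Under this assumed form, every half-step is a one-dimensional least-squares fit per word, which is why it cannot increase the objective.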

The context vector optimization unit 32 fixes the target-dimension-d value of the word vector r'_i of every word i to the value r_{i,d} stored in the vector storage unit 40. Then, based on the values of all dimensions other than the target dimension of the word vector r'_i of each word i and the context vector c'_j of each word j, and on the number of times X_{i,j} that one word i of a word pair appeared as the context of the other word j, it estimates the target-dimension-d value of the context vector c'_j of each word j so as to optimize the objective function for the target dimension d given by equation (7) above, and stores the result in the vector storage unit 40. When the values r_{i,d} are fixed for all words i and the element c_{j,d} is optimized for each word j, it suffices to find, for every word j, the value at which the partial derivative with respect to c_{j,d} is zero, as in equation (8) below.

The word vector optimization unit 34 fixes the target-dimension-d value of the context vector c'_j of every word j to the value c_{j,d} stored in the vector storage unit 40. Then, based on the values of all dimensions other than the target dimension of the word vector r'_i of each word i and the context vector c'_j of each word j, and on the number of times X_{i,j} that one word i of a word pair appeared as the context of the other word j, it estimates the target-dimension-d value of the word vector r'_i of each word i so as to optimize the objective function for the target dimension d given by equation (7) above, and stores the result in the vector storage unit 40. When the values c_{j,d} are fixed for all words j and the element r_{i,d} is optimized for each word i, it suffices to find, for every word i, the value at which the partial derivative with respect to r_{i,d} is zero, as in equation (9) below.
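The two half-steps can be sketched as follows. Equations (7) to (9) are not available in this text, so the sketch assumes an unweighted squared-error objective, the sum over observed pairs of (r_{i,d} c_{j,d} - b_{i,j,d})^2, for which the zero-derivative condition has a simple closed form. The function names and the dictionary layout are hypothetical.

```python
def update_context_dim(r_d, b, contexts):
    """Closed-form c_{j,d} for each context j with every r_{i,d} held fixed.

    r_d maps word i -> r_{i,d}; b maps (i, j) -> b_{i,j,d}.
    Assumes an unweighted squared-error objective (not from the source).
    """
    num = {j: 0.0 for j in contexts}
    den = {j: 0.0 for j in contexts}
    for (i, j), b_ij in b.items():
        num[j] += r_d[i] * b_ij
        den[j] += r_d[i] ** 2
    # A zero denominator means no usable information for j; set it to 0.0.
    return {j: (num[j] / den[j] if den[j] > 0.0 else 0.0) for j in contexts}


def update_word_dim(c_d, b, words):
    """Closed-form r_{i,d} for each word i with every c_{j,d} held fixed."""
    num = {i: 0.0 for i in words}
    den = {i: 0.0 for i in words}
    for (i, j), b_ij in b.items():
        num[i] += c_d[j] * b_ij
        den[i] += c_d[j] ** 2
    return {i: (num[i] / den[i] if den[i] > 0.0 else 0.0) for i in words}
```

Each update touches only dimension d, so all other dimensions of the stored vectors stay as they were.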

The iteration determination unit 36 alternates the estimation by the context vector optimization unit 32 and the estimation by the word vector optimization unit 34 until a predetermined termination condition is satisfied. For example, because computing equations (8) and (9) alternately never increases the value of the objective function of equation (7), the processing can be repeated until the objective value stops changing or changes only negligibly. If the target dimension d then satisfies d = D, the unit waits for the next learning data update signal or dimension increase signal. If d ≠ D, the target dimension is advanced as d = d + 1, and the processing of the context vector optimization unit 32 and the word vector optimization unit 34 is repeated for the next target dimension. As an example, suppose there is a request to expand the word vectors to about 10 dimensions for use in reputation analysis or the like, and dimension increase signals are received accordingly. Because the word vector learning device of this embodiment always adds exactly one dimension at a time, the sequence of processing by the dimension number increasing unit 28, the context vector optimization unit 32, and the word vector optimization unit 34 is repeated 10 times.
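The termination test described above can be sketched generically. In this hedged example, `alternate_until_converged`, `tol`, and `max_iters` are hypothetical names, and the two update steps and the objective are passed in as callables so that the sketch does not depend on the exact form of equations (7) to (9).

```python
def alternate_until_converged(update_context, update_word, objective,
                              tol=1e-9, max_iters=1000):
    """Alternate two update steps until the objective stops changing.

    update_context, update_word: zero-argument callables that modify shared
    state in place. objective: zero-argument callable returning the current
    objective value. Returns (iterations_run, final_objective).
    """
    prev = objective()
    for it in range(1, max_iters + 1):
        update_context()   # context-vector half-step
        update_word()      # word-vector half-step
        cur = objective()
        if abs(prev - cur) <= tol:
            return it, cur
        prev = cur
    return max_iters, prev
```

The `max_iters` cap is a safety net added for the sketch; the text itself only requires that the loop stop once the objective value no longer changes appreciably.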

FIG. 3 shows an example of one-dimensional word vectors estimated by the iterative optimization unit 30, and FIGS. 4 and 5 show examples of word vectors obtained by increasing the number of dimensions and repeating the estimation by the iterative optimization unit 30.

<Configuration of the Natural Language Processing Device According to the Embodiment of the Present Invention>

Next, the configuration of the natural language processing device according to the embodiment of the present invention will be described. As shown in FIG. 6, the natural language processing device 200 can be implemented as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the natural language processing routine described later. Functionally, as shown in FIG. 6, the natural language processing device 200 includes an input unit 210, a calculation unit 220, and an output unit 250. In this embodiment, the natural language processing device 200 is described using the example of translating text after replacing unknown words with highly similar words, based on the word vectors learned by the word vector learning device 100. The device is not limited to this use, and the replaced words may also be used for summarization, proofreading, and the like.

The input unit 210 receives the text to be translated.

The calculation unit 220 includes a natural language processing unit 230 and a vector storage unit 240.

The vector storage unit 240 stores the same contents as the vector storage unit 40.

The natural language processing unit 230 includes a replacement unit 232 and a translation unit 234.

The replacement unit 232 extracts, from the words of the text received by the input unit 210, unknown words that are not in an existing dictionary of words (not shown). Based on the context vectors of the words stored in the vector storage unit 40, it estimates, for each unknown word, the dictionary word with the highest similarity to it. It then generates text in which each unknown word is replaced with the estimated dictionary word.
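The replacement step can be illustrated with a small nearest-neighbor lookup. This sketch makes several assumptions: the similarity measure is taken to be cosine similarity (the similarity of equation (1) is not reproduced here and may differ), and the names `cosine` and `replace_unknown` and the dictionary-based vector store are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def replace_unknown(tokens, dictionary, vectors):
    """Replace each out-of-dictionary token by its most similar dictionary word.

    Tokens without a learned vector are left unchanged.
    """
    candidates = [w for w in dictionary if w in vectors]
    out = []
    for tok in tokens:
        if tok in dictionary or tok not in vectors or not candidates:
            out.append(tok)
            continue
        best = max(candidates, key=lambda w: cosine(vectors[tok], vectors[w]))
        out.append(best)
    return out
```

For instance, with vectors in which "kitty" lies close to "cat", the token list `["the", "kitty", "car"]` and the dictionary `["cat", "car"]` would yield `["the", "cat", "car"]`, which can then be handed to an existing translation method.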

The translation unit 234 translates the text in which the replacement unit 232 has replaced words, using an existing method, outputs the result to the output unit 250, and ends the processing.

Note that when the natural language processing device 200 performs other natural language processing, words similar to the words appearing in a particular document can be extracted from the dictionary and included in the processing, which increases the available information and can improve accuracy. In this case, equation (1) above is computed for each word that appears, and words with high similarity are included in the processing.

<Operation of the Word Vector Learning Device According to the Embodiment of the Present Invention>

Next, the operation of the word vector learning device 100 according to the embodiment of the present invention will be described. When the input unit 10 receives document data together with a document data update signal, or a dimension increase signal, the word vector learning device 100 executes the word vector learning processing routine shown in FIG. 7.

First, in step S100, it is determined whether the signal received by the input unit 10 is a document data update signal or a dimension increase signal. If it is a document data update signal, the process proceeds to step S102; if it is a dimension increase signal, the process proceeds to step S108.

In step S102, the word list and the word-context co-occurrence information are updated based on the document data received by the input unit 10.

In step S104, it is determined whether the document data received by the input unit 10 has added any new words. If so, the process proceeds to step S106; if not, the process proceeds to step S110.

In step S106, for each word added by the document data received by the input unit 10, a word vector and a context vector for that word are added to the word vectors and context vectors stored in the vector storage unit 40. Then, in step S107, the target dimension d is set to 1, and the process proceeds to step S114.

In step S108, in response to the dimension increase signal received by the input unit 10, one dimension is appended to each of the word vectors and context vectors of all words stored in the vector storage unit 40, and the element of the added dimension is initialized with an arbitrary value.

In step S110, the number of dimensions D is updated to D + 1.

In step S112, the target dimension d is set to D so that the dimension that has not yet been learned becomes the learning target.

In step S114, based on the word vectors and context vectors of all words stored in the vector storage unit 40, the element c_{j,d} of the target dimension d of the context vector of each word is optimized according to equation (8) above, and the context vector of each word stored in the vector storage unit 40 is updated.

In step S116, based on the word vectors and context vectors of all words stored in the vector storage unit 40, the element r_{i,d} of the target dimension d of the word vector of each word is optimized according to equation (9) above, and the word vector of each word stored in the vector storage unit 40 is updated.

In step S118, it is determined whether, after the optimization in steps S114 and S116, the value of the objective function of equation (7) above satisfies the iteration termination condition defined according to equation (6) above. If it does, the process proceeds to step S120; if not, the process returns to step S114 and the optimization in steps S114 and S116 is repeated.

In step S120, it is determined whether d = D. If d = D, the process proceeds to step S122; if d ≠ D, the process proceeds to step S118.

In step S122, the target dimension is changed by setting d = d + 1, the process proceeds to step S114, and the optimization in steps S114 to S116 is repeated.

In step S124, the word vectors and context vectors obtained by the processing of steps S114 to S118 are output, the word vector learning processing routine ends, and the device waits until the next signal is received.

As described above, the word vector learning device according to the embodiment of the present invention takes each dimension of the word vectors and context vectors in turn as the target dimension and repeatedly estimates the target-dimension values of the word vector and context vector of each word so as to optimize the objective function of equation (7) above for that dimension, based on the values of the dimensions other than the target dimension and on the number of times one word of each word pair appeared as the context of the other. Word vectors can thereby be learned efficiently.

<Operation of the Natural Language Processing Device According to the Embodiment of the Present Invention>

Next, the operation of the natural language processing device 200 according to the embodiment of the present invention will be described. When the input unit 210 receives the text to be translated, the natural language processing device 200 executes the natural language processing routine shown in FIG. 8.

In step S200, unknown words are extracted from the text to be translated received by the input unit 210.

In step S202, based on the word vector of each word stored in the vector storage unit 240, the dictionary word with the highest similarity to each unknown word extracted in step S200 is estimated, and text is generated in which each unknown word in the text to be translated is replaced with the estimated dictionary word.

In step S204, the text generated in step S202 is translated, the result is output to the output unit 250, and the processing ends.

As described above, the natural language processing device according to the embodiment of the present invention can use the learned word vectors to perform translation based on the semantic similarity of words.

In addition, the word vector learning device according to the embodiment of the present invention optimizes the dimensions in order from d = 1 to d = D so that each dimension of the vectors carries meaning. Processing the dimensions in order from the first means that the first element of each vector expresses the strongest similarity, and each subsequent dimension (second, third, and so on) learns finer-grained similarity that the earlier dimensions could not capture, taking the similarity learned so far as given.

Also, when the document data grows or is updated and already-learned word vectors are retrained, continuing learning with the learned word vectors as initial values allows the difference information to be added dimension by dimension. The change in a word vector between before and after retraining is therefore the change in value attributable to the difference information. In particular, a value that is unaffected by the difference in the data does not change. Because of this property, compared with the conventional approach of retraining from scratch whenever the document data grows, the word vectors are updated by the difference information of the document data and continue to retain the previous information.

Increasing the number of dimensions of the word vectors is also easy, because the processing is decomposed into per-dimension optimizations. The number of dimensions can be increased from D to D + 1 after the fact, d can be set to D, and learning can continue while dimensions are added one at a time. Furthermore, when learning starts with no word vectors at all, it suffices to set D = 1 and start from one-dimensional vectors, so learning can begin without fixing the number of dimensions in advance.

Thus, unlike conventional methods, the present invention makes it possible to learn distributed semantic representations even when the number of words and the number of dimensions keep growing. In addition, the difference between the obtained word vectors and the previous ones preserves their dependency relationship. As a result, when a system that uses the distributed semantic representations is rebuilt, the system parameters and the like need not all be re-estimated from scratch, which has the advantage of greatly reducing system maintenance cost.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the embodiment above describes the case in which a dimension increase signal is received, the number of dimensions D is updated to D + 1, and the word vectors and context vectors are learned with the added dimension as the target dimension. The invention is not limited to this: a signal that increases the number of dimensions by any number of two or more may be received, and the word vectors and context vectors may be learned by taking the added dimensions in turn as the target dimension.

10, 210 Input unit
20, 220 Calculation unit
26 Document data update unit
28 Dimension number increasing unit
30 Iterative optimization unit
32 Context vector optimization unit
34 Word vector optimization unit
36 Iteration determination unit
40 Vector storage unit
50, 250 Output unit
100 Word vector learning device
200 Natural language processing device
230 Natural language processing unit
232 Replacement unit
234 Translation unit
240 Vector storage unit

Claims

A word vector learning device that learns from document data, for each word, a word vector for the word and a context vector representing that the word appears as the context of another word, the device comprising:
an iterative optimization unit that takes each dimension of the word vector and the context vector in turn as a target dimension and repeatedly estimates the target-dimension values of the word vector and the context vector of each word so as to optimize an objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and the context vector of each word and on the number of times one word of a word pair appeared as the context of the other word.

The word vector learning device according to claim 1, wherein the iterative optimization unit includes:
a context vector optimization unit that fixes the target-dimension value of the word vector of each word and estimates the target-dimension value of the context vector of each word so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and the context vector of each word and on the number of times one word of a word pair appeared as the context of the other word;
a word vector optimization unit that fixes the target-dimension value of the context vector of each word and estimates the target-dimension value of the word vector of each word so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and the context vector of each word and on the number of times one word of a word pair appeared as the context of the other word; and
an iteration determination unit that alternately repeats the estimation by the context vector optimization unit and the estimation by the word vector optimization unit until a predetermined iteration termination condition is satisfied,
wherein each dimension of the word vector and the context vector is taken in turn as the target dimension, and the processing by the context vector optimization unit, the word vector optimization unit, and the iteration determination unit is repeated.

The word vector learning device according to claim 1, wherein, when learning the word vector and the context vector with an increased number of dimensions relative to the already learned word vector and context vector, the iterative optimization unit sets each not-yet-learned dimension of the word vector and the context vector as the target dimension in turn and, based on the values of the dimensions other than the target dimension of the word vector and the context vector for each of the words, and on the number of times one word of each word pair has appeared as the context of the other word, repeatedly estimates the value of the target dimension of the word vector and the context vector for each of the words so as to optimize the objective function for the target dimension.
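
Increasing the number of dimensions while keeping the learned ones can be sketched as below. This is an illustrative sketch only: `grow_and_fit` is a hypothetical helper, `fit_dimension` is a caller-supplied routine (assumed here to refit a single target dimension in place), and the small random initialization with `seed` and `scale` is an assumption, not something stated in the source.

```python
import random


def grow_and_fit(R, C, extra, fit_dimension, seed=0, scale=0.1):
    """Append `extra` untrained dimensions and fit only those, in order.

    R, C: lists of per-word vectors (lists of floats), extended in place.
    fit_dimension: callable (R, C, d) that refits target dimension d (hypothetical).
    Returns the indices of the newly added dimensions.
    """
    rng = random.Random(seed)
    old = len(R[0])
    # Already learned dimensions 0 .. old-1 keep their values.
    for vec in R + C:
        vec.extend(rng.uniform(-scale, scale) for _ in range(extra))
    new_dims = list(range(old, old + extra))
    # Only the dimensions that have not been learned become target dimensions.
    for d in new_dims:
        fit_dimension(R, C, d)
    return new_dims
```

The new entries are initialized away from zero on purpose: with a multiplicative model, an update rule that alternates closed-form solutions may stay stuck at zero if both the word and context entries of the target dimension start at exactly zero.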

The word vector learning device according to claim 1, wherein the iterative optimization unit repeatedly estimates the value of the target dimension $d$ of the word vector $r'_i$ for each word $i$ and of the context vector $c'_j$ for each word $j$ so as to optimize, according to the following equation (1), the objective function for the target dimension $d$ represented by the following equation (2).

Here, $r_{i,d}$ is the element of dimension $d$ of the word vector $r'_i$ of word $i$, $c_{j,d}$ is the element of dimension $d$ of the context vector $c'_j$ of word $j$, $\hat{r}_{i,d}$ and $\hat{c}_{j,d}$ are the estimation results of $r_{i,d}$ and $c_{j,d}$, $b_{i,j,d} = -r'^{(d)}_i \cdot c'^{(d)}_j + \log(X_{i,j})$, $r'^{(d)}_i$ is the word vector obtained by replacing the element of dimension $d$ of the word vector $r'_i$ of word $i$ with 0, $c'^{(d)}_j$ is the context vector obtained by replacing the element of dimension $d$ of the context vector $c'_j$ of word $j$ with 0, and $X_{i,j}$ is the number of times that word $j$ appears as a context of word $i$.
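
As a small worked instance of the definition of $b_{i,j,d}$, consider three-dimensional vectors with target dimension $d = 2$. All numeric values are chosen purely for illustration, and the natural logarithm is assumed, since the base of the logarithm is not stated here.

```latex
\begin{align*}
r'_i &= (1,\ 2,\ 3), \quad c'_j = (0.5,\ -1,\ 2), \quad X_{i,j} = 10, \quad d = 2 \\
r'^{(d)}_i &= (1,\ 0,\ 3), \quad c'^{(d)}_j = (0.5,\ 0,\ 2) \\
r'^{(d)}_i \cdot c'^{(d)}_j &= 1 \cdot 0.5 + 0 \cdot 0 + 3 \cdot 2 = 6.5 \\
b_{i,j,d} &= -6.5 + \log 10 \approx -6.5 + 2.3026 = -4.1974
\end{align*}
```

The only term left out of $b_{i,j,d}$ is the product of the target-dimension elements, $r_{i,d}\, c_{j,d} = 2 \cdot (-1) = -2$. Accordingly, $r_{i,d}\, c_{j,d} - b_{i,j,d} \approx 2.1974$ equals $r'_i \cdot c'_j - \log(X_{i,j}) = 4.5 - 2.3026$, so $b_{i,j,d}$ isolates the part of the log count that dimension $d$ alone has to explain.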

A natural language processing device including a natural language processing unit that, for an input document, uses the word vector of each word learned by the word vector learning device according to any one of claims 1 to 4 to perform natural language processing based on the degree of semantic similarity between words determined from the word vectors.
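
One common way to obtain a degree of similarity between words from their word vectors is cosine similarity. The sketch below assumes that measure, which the claim itself does not specify; `cosine`, `most_similar`, and the toy vectors in the usage note are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`, most similar first."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

For example, with `vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}`, calling `most_similar("cat", vectors, topn=1)` ranks `"dog"` first, because its vector points in nearly the same direction as that of `"cat"`.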

A word vector learning method in a word vector learning device that learns, based on document data and for each word, a word vector for the word and a context vector representing the word appearing as the context of another word,
the word vector learning method including a step in which an iterative optimization unit sets each dimension of the word vector and the context vector as a target dimension in turn and, based on the values of the dimensions other than the target dimension of the word vector and the context vector for each of the words, and on the number of times one word of each word pair has appeared as the context of the other word, repeatedly estimates the value of the target dimension of the word vector and the context vector for each of the words so as to optimize an objective function for the target dimension.

The word vector learning method according to claim 6, wherein the step of repeating the estimation by the iterative optimization unit includes:
a step in which a context vector optimization unit fixes the value of the target dimension of the word vector for each of the words and estimates the value of the target dimension of the context vector for each of the words so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and the context vector for each of the words and on the number of times one word of each word pair has appeared as the context of the other word;
a step in which a word vector optimization unit fixes the value of the target dimension of the context vector for each of the words and estimates the value of the target dimension of the word vector for each of the words so as to optimize the objective function for the target dimension, based on the values of the dimensions other than the target dimension of the word vector and the context vector for each of the words and on the number of times one word of each word pair has appeared as the context of the other word; and
a step in which an iteration determination unit alternately repeats the estimation by the context vector optimization unit and the estimation by the word vector optimization unit until a predetermined iteration end condition is satisfied,
and wherein each dimension of the word vector and the context vector is set as the target dimension in turn, and the estimation by the context vector optimization unit, the estimation by the word vector optimization unit, and the repetition by the iteration determination unit are repeated.

A program for causing a computer to function as each unit of the word vector learning device according to any one of claims 1 to 4 or of the natural language processing device according to claim 5.