JP2009075795A - Machine translation apparatus, machine translation method, and program

Info

Publication number: JP2009075795A
Application number: JP2007243251A
Authority: JP
Inventors: Hiroshi Yamamoto (山本 博史); Hideo Okuma (大熊 英男); Eiichiro Sumida (隅田 英一郎)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2007-09-20
Filing date: 2007-09-20
Publication date: 2009-04-09

Abstract

[Problem] To provide a machine translation apparatus that performs machine translation taking syntactic information into account without training on that information.
[Solution] The apparatus includes: a translation target text data storage unit 11 that stores a translation target text in a source language; a translation model information storage unit 12 that stores a translation model; a machine translation unit 13 that statistically machine-translates the translation target text using the translation model; a post-translation text data storage unit 14 that accumulates the post-translation text; a syntax analysis unit 15 that parses the translation target text to obtain tree structure information; a tree structure information storage unit 16 that accumulates the tree structure information; a determination unit 18 that determines whether the post-translation text can be realized by converting the leaves of the tree structure indicated by the tree structure information from the source language into the target language and by swapping nodes of that tree structure; a selection unit 19 that selects the post-translation text that the determination unit 18 has determined can be realized by the leaf conversion and node swapping; and an output unit 20 that outputs the selection result.
[Selected drawing] Figure 1

Description

The present invention relates to a machine translation apparatus and the like that perform statistical machine translation.

In recent years, statistical machine translation (SMT), and phrase-based statistical machine translation (PBSMT) in particular, has come into wide use (see, for example, Non-Patent Document 1). One of the biggest problems in PBSMT is the reordering of phrases, especially global (long-range) reordering. The reason is that the reordering model in PBSMT depends only on distance, that is, on how many words ahead or behind a phrase is moved during reordering.

To solve this problem, many attempts have been made to introduce syntactic information into statistical translation. These attempts fall into three broad classes according to which syntactic information is used: that of the source language, that of the target language, or both. The first class uses the syntactic information of both the source and target languages and is called tree-to-tree translation. The second uses the syntactic information of the target language only and is called string-to-tree translation. The third, called tree-to-string translation, uses the syntactic information of the source language only. As these methods show, using syntactic information can improve the performance of statistical translation.
Non-Patent Document 1: Daniel Marcu, William Wong, "A phrase-based, joint probability model for statistical machine translation", Proc. EMNLP-2002, pp. 133-139, 2002

However, introducing syntactic information into the model increases the number of parameters to be learned. This is particularly pronounced in tree-to-tree translation. The increase in the number of parameters means greater demands, in both quality and quantity, on the parallel corpus that serves as training data for statistical translation. In terms of quantity, it further aggravates the data sparseness problem, which is already severe in PBSMT. In terms of quality, the translations in the parallel corpus need to reflect the structure of the source language, yet the same meaning can be translated into several different syntactic constructions. This means that the way the structure is reflected is not necessarily consistent, which again aggravates the data sparseness problem.

Generally speaking, introducing syntactic information into word-based or phrase-based statistical translation increases the expressive power of the model, but it causes a serious data sparseness problem when the model parameters are learned.

The present invention has been made to solve the above problem. Its object is to provide a machine translation apparatus and the like that avoid the data sparseness problem by introducing a tree structure constraint model, which is a syntactic information model that requires no parameter learning.

To achieve the above object, a machine translation apparatus according to the present invention comprises: a translation target text data storage unit that stores translation target text data, which is text data in a source language to be translated; a translation model information storage unit that stores translation model information used for translation from the source language into a target language; a machine translation unit that statistically machine-translates the translation target text data using the translation model information; a post-translation text data storage unit that accumulates post-translation text data, which is text data in the target language obtained when the machine translation unit machine-translates the translation target text data; a syntax analysis unit that parses the translation target text data to obtain tree structure information, which is information indicating the tree structure of the translation target text data; a tree structure information storage unit that accumulates the tree structure information; a determination unit that determines whether the post-translation text data accumulated by the post-translation text data storage unit can be realized by converting the leaves of a tree structure from the source language into the target language and by swapping nodes of that tree structure, the tree structure being the one indicated by the tree structure information accumulated by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data; a selection unit that selects the post-translation text data that the determination unit has determined can be realized by the leaf conversion and node swapping; and an output unit that outputs the result of the selection by the selection unit.

With this configuration, a determination is made using the tree structure information, and post-translation text data is selected according to the result. Machine translation with constraints on syntactic information can thus be realized without parameter learning for the syntactic information, so the data sparseness problem associated with training data does not arise. In addition, because no learning of syntactic information is needed, the processing load for generating the translation model information is lighter than when such learning is performed, and the amount of translation model information can also be reduced. Furthermore, making the determination using the tree structure information realizes machine translation that incorporates the tree structure constraint model, which can improve the accuracy of the translation result.
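The determination described above can be illustrated with a minimal sketch. It assumes a simplified setting that the text does not prescribe: each source word maps to exactly one target word, the tree is given as nested Python lists whose leaves are source word indices, and `target_pos[i]` is the position of source word `i` in the translation. Under these assumptions, a translation can be reached by leaf conversion plus node swapping exactly when the leaves of every subtree occupy a contiguous span of target positions. The names `leaf_indices` and `realizable_by_node_swaps` are hypothetical.

```python
from typing import List, Sequence, Union

# A tree is either a leaf (the index of a source word) or a sequence of subtrees.
Tree = Union[int, Sequence["Tree"]]


def leaf_indices(tree: Tree) -> List[int]:
    """Return the source word indices under `tree`, in source order."""
    if isinstance(tree, int):
        return [tree]
    indices: List[int] = []
    for child in tree:
        indices.extend(leaf_indices(child))
    return indices


def realizable_by_node_swaps(tree: Tree, target_pos: Sequence[int]) -> bool:
    """True if the target order can be obtained by reordering children at nodes.

    Assumes target_pos is a permutation (one target slot per source word).
    """
    if isinstance(tree, int):
        return True
    positions = sorted(target_pos[i] for i in leaf_indices(tree))
    # The subtree's words must stay together in the target sentence.
    contiguous = positions[-1] - positions[0] + 1 == len(positions)
    return contiguous and all(realizable_by_node_swaps(c, target_pos) for c in tree)


# Tree shape of "This is a pen ." with leaves 0..4: (This (is (a pen)) .)
tree = [0, [1, [2, 3]], 4]
print(realizable_by_node_swaps(tree, [0, 3, 1, 2, 4]))  # True: "is" moved after "a pen"
print(realizable_by_node_swaps(tree, [0, 2, 1, 3, 4]))  # False: "is" lands between "a" and "pen"
```

A selection step could then keep only the candidate translations for which this check returns True.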

In the machine translation apparatus according to the present invention, the machine translation unit may perform word-based statistical machine translation, and the determination unit may determine whether the post-translation text data accumulated by the post-translation text data storage unit can be realized by word-by-word conversion of the leaves of the tree structure from the source language into the target language and by swapping nodes of that tree structure, the tree structure being the one indicated by the tree structure information accumulated by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data.
With this configuration, constraints based on the tree structure information can be introduced appropriately in word-based statistical machine translation.

In the machine translation apparatus according to the present invention, the machine translation unit may perform phrase-based statistical machine translation, and the determination unit may determine whether the post-translation text data accumulated by the post-translation text data storage unit can be realized by a conversion, including phrase-by-phrase conversion, of the leaves of the tree structure from the source language into the target language and by swapping nodes of that tree structure without splitting any phrase, a phrase being the unit translated in phrase-based statistical machine translation. Here again, the tree structure is the one indicated by the tree structure information accumulated by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data.
With this configuration, constraints based on the tree structure information can be introduced appropriately in phrase-based statistical machine translation.

In the machine translation apparatus according to the present invention, the machine translation unit may perform target-side left-to-right machine translation, and the post-translation text data may include text data that is still in the middle of translation. The apparatus may further comprise a classification unit that classifies each subtree of the tree structure indicated by the tree structure information accumulated by the tree structure information storage unit into one of three classes: "untranslated", indicating that the subtree contains only untranslated leaves or consists only of untranslated subtrees; "translated", indicating that the subtree contains only translated leaves or consists only of translated subtrees; and "in translation", indicating that the subtree contains only translated and untranslated leaves, consists only of untranslated and translated subtrees, or contains exactly one subtree that is in translation. Using the classification result from the classification unit, the determination unit determines, when a subtree containing two or more subtrees in translation appears, that the post-translation text data cannot be realized by converting the leaves of the tree structure from the source language into the target language and swapping nodes of that tree structure. When adding new translated target-language text to the post-translation text data, the machine translation unit may add it in such a way that the post-translation text data after the addition will be selected by the selection unit.

With this configuration, machine translation that appropriately incorporates constraints based on the tree structure information can be performed during target-side left-to-right machine translation.
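The three-way classification and the rejection rule can be sketched as follows. This is only an illustration under stated assumptions: the tree is a nested list whose leaves are source word indices, `covered` is the set of source words translated so far, and the extra `VIOLATION` state is a hypothetical device for propagating the "two or more subtrees in translation" rejection up to the root; it is not a class named in the text.

```python
from enum import Enum
from typing import Sequence, Set, Union

Tree = Union[int, Sequence["Tree"]]  # leaf = index of a source word


class State(Enum):
    UNTRANSLATED = "untranslated"
    TRANSLATED = "translated"
    IN_TRANSLATION = "in translation"
    VIOLATION = "violation"  # hypothetical marker: constraint broken below this node


def classify(tree: Tree, covered: Set[int]) -> State:
    """Classify a subtree given the set of source words already translated."""
    if isinstance(tree, int):
        return State.TRANSLATED if tree in covered else State.UNTRANSLATED
    states = [classify(child, covered) for child in tree]
    if State.VIOLATION in states:
        return State.VIOLATION
    open_children = states.count(State.IN_TRANSLATION)
    if open_children >= 2:
        # Two or more subtrees in translation: not reachable by node swaps.
        return State.VIOLATION
    if open_children == 1:
        return State.IN_TRANSLATION
    if all(s is State.TRANSLATED for s in states):
        return State.TRANSLATED
    if all(s is State.UNTRANSLATED for s in states):
        return State.UNTRANSLATED
    return State.IN_TRANSLATION  # mix of translated and untranslated children


tree = [[0, 1], [2, 3]]
print(classify(tree, {0, 1}))  # State.IN_TRANSLATION
print(classify(tree, {0, 2}))  # State.VIOLATION: both halves left unfinished
```

In a left-to-right decoder, a check of this kind could be run on each partial hypothesis so that an extension producing `VIOLATION` is discarded before it is expanded further.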

In the machine translation apparatus according to the present invention, the machine translation unit may generate a plurality of pieces of post-translation text data corresponding to the translation target text data, and the selection unit may select, from among them, one or more pieces of post-translation text data that the determination unit has determined can be realized by converting the leaves of the tree structure from the source language into the target language and swapping nodes of that tree structure.

With this configuration, appropriate candidates that satisfy the constraints of the tree structure information can be selected from a plurality of pieces of post-translation text data translated in advance, and the selected candidate can be used as the final translation result.

The machine translation apparatus and the like according to the present invention make it possible to realize machine translation with constraints on syntactic information without parameter learning for the syntactic information.

A machine translation apparatus according to the present invention is described below by way of embodiments. In the following embodiments, components and steps denoted by the same reference numerals are identical or equivalent, and their description may not be repeated.

(Embodiment 1)
A machine translation apparatus according to Embodiment 1 of the present invention is described with reference to the drawings. The machine translation apparatus according to this embodiment introduces constraints based on a tree structure into statistical machine translation.

FIG. 1 is a block diagram showing the configuration of a machine translation apparatus 1 according to this embodiment. The machine translation apparatus 1 includes a translation target text data storage unit 11, a translation model information storage unit 12, a machine translation unit 13, a post-translation text data storage unit 14, a syntax analysis unit 15, a tree structure information storage unit 16, a classification unit 17, a determination unit 18, a selection unit 19, and an output unit 20.

The translation target text data storage unit 11 stores translation target text data, which is text data in the source language to be translated. This translation target text data is machine-translated by the machine translation unit 13 described later. The data to be machine-translated is therefore stored in the translation target text data storage unit 11 as translation target text data. The translation target text data may be, for example, the text data of a single sentence to be translated, or the text data of a coherent series of sentences (for example, the text data of a business letter or a book).

The process by which information comes to be stored in the translation target text data storage unit 11 does not matter. For example, the translation target text data may be stored in the translation target text data storage unit 11 via a recording medium, after being transmitted over a communication line or the like, or after being entered through an input device. Storage in the translation target text data storage unit 11 may be temporary storage, in RAM or the like, of translation target text data read from an external storage device or the like, or it may be long-term storage. The translation target text data storage unit 11 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

The translation model information storage unit 12 stores translation model information used for translation from the source language into the target language. Here, translation model information is information that associates a source-language translation unit contained in a bilingual corpus, a target-language translation unit, and a probability. The target-language translation unit is a translation unit in the target language that is in a translation relationship with the source-language translation unit it is associated with, and it is also contained in the bilingual corpus. The probability is a probability concerning the source-language translation unit and the target-language translation unit with which it is associated. A translation unit is, for example, a word, a morpheme, or a phrase. A phrase here is not a phrase in the linguistic sense but a word string consisting of several words. Source-language A and target-language B are said to be in a translation relationship when translating A into the target language yields B, or vice versa. The probability concerning the source-language and target-language translation units is, for example, the probability of the target-language translation unit given the source-language translation unit, or the probability of the source-language translation unit given the target-language translation unit. Methods for generating translation model information are already known, and their description is omitted. The translation model information is generated using a bilingual corpus.
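As an illustration of the kind of record described above, the following sketch stores (source unit, target unit, probability) triples and indexes them by source unit. The class and function names are hypothetical, and the entries and probability values are invented purely for illustration; they are not taken from any actual translation model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TranslationEntry:
    source_unit: str    # word, morpheme, or phrase in the source language
    target_unit: str    # corresponding unit in the target language
    probability: float  # e.g. P(target_unit | source_unit)


def build_index(entries: Iterable[TranslationEntry]) -> Dict[str, List[TranslationEntry]]:
    """Group entries by source unit, most probable candidate first."""
    index: Dict[str, List[TranslationEntry]] = defaultdict(list)
    for entry in entries:
        index[entry.source_unit].append(entry)
    for candidates in index.values():
        candidates.sort(key=lambda e: e.probability, reverse=True)
    return dict(index)


# Made-up entries for illustration only.
entries = [
    TranslationEntry("pen", "筆", 0.2),
    TranslationEntry("pen", "ペン", 0.8),
    TranslationEntry("a pen", "ペン", 0.9),
]
index = build_index(entries)
print([e.target_unit for e in index["pen"]])  # ['ペン', '筆']
```

A word-based model would hold only single-word units, while a phrase-based model would also hold multi-word units such as "a pen".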

The process by which information comes to be stored in the translation model information storage unit 12 does not matter. For example, the translation model information may be stored in the translation model information storage unit 12 via a recording medium, after being transmitted over a communication line or the like, or after being entered through an input device. Storage in the translation model information storage unit 12 may be temporary storage, in RAM or the like, of translation model information read from an external storage device or the like, or it may be long-term storage. The translation model information storage unit 12 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

The machine translation unit 13 statistically machine-translates the translation target text data read from the translation target text data storage unit 11, using the translation model information stored in the translation model information storage unit 12. Methods of statistical machine translation are already known, and their detailed description is omitted. The machine translation unit 13 may perform, for example, word-based statistical machine translation or phrase-based statistical machine translation. In word-based statistical machine translation, source-language strings are replaced with target-language strings word by word (or in units similar to words, such as morphemes). In phrase-based statistical machine translation, source-language strings are replaced with target-language strings phrase by phrase, although word-by-word replacement may also take place. It is preferable to store in the translation model information storage unit 12 translation model information suited to whichever of the two the machine translation unit 13 performs, and to use that translation model information.

The machine translation unit 13 may or may not perform target-side left-to-right machine translation. This embodiment mainly describes the former case. When performing target-side left-to-right machine translation, the machine translation unit 13 adds new translated target-language text to the post-translation text data in such a way that the post-translation text data after the addition will be selected by the selection unit 19. Selection by the selection unit 19 is described later. In this case the post-translation text data may include text data that is still in the middle of translation. In target-side left-to-right machine translation, translated words are appended one at a time on the target (target-language) side, so even text data in the middle of translation can count as post-translation text data at that point in the translation. Here, text data in the middle of translation is text data that does not yet fully correspond to the translation target text data; that is, it is the translation of only part of the translation target text data. When not performing target-side left-to-right machine translation, the machine translation unit 13 may generate a plurality of pieces of post-translation text data corresponding to the translation target text data (for example, a plurality of target-language sentences corresponding to one source-language sentence).

The machine translation unit 13 may also perform machine translation using information other than the translation model information. For example, it may use language model information together with the translation model information. An N-gram language model, for example, may be used as the language model information. The language model information may be stored in a recording medium (not shown), and the machine translation unit 13 may use it by reading it from that recording medium.

The post-translation text data storage unit 14 accumulates post-translation text data in a predetermined recording medium. Post-translation text data is the target-language text data obtained when the machine translation unit 13 machine-translates the translation target text data. The translation target text data and the post-translation text data are therefore in a translation relationship. The recording medium in which the post-translation text data storage unit 14 accumulates the post-translation text data is, for example, a semiconductor memory, an optical disk, or a magnetic disk. It may be part of the post-translation text data storage unit 14, or it may exist outside the post-translation text data storage unit 14 (possibly outside the machine translation apparatus 1). The recording medium may or may not store the post-translation text data only temporarily. The processing of the machine translation unit 13 and that of the post-translation text data storage unit 14 may be performed as one. For example, at the point when the machine translation unit 13 generates post-translation text data, that data may already be held in a recording medium such as a memory, and this may constitute the accumulation by the post-translation text data storage unit 14.

The syntax analysis unit 15 parses the translation target text data read from the translation target text data storage unit 11 to obtain tree structure information, which is information indicating the tree structure of the translation target text data. The tree structure information need not include the labels obtained in ordinary parsing (for example, parts of speech, or subject and predicate). Tree structure information is generally constructed for each sentence of the translation target text data, but this is not required. In other words, tree structure information is information indicating the structure obtained by organizing the words, morphemes, and so on contained in a sentence under labels such as subject, predicate, noun phrase, and verb phrase (as noted above, the labels themselves need not be included in the tree structure information). This tree structure information is generally known as a syntax tree, and its detailed description is omitted. FIG. 2 shows an example of the tree structure information for "This is a pen.". As shown in FIG. 2, "a" and "pen" form a noun phrase and are therefore grouped under one node. Likewise, "is" and "a pen" form a verb phrase and are grouped under one node, and "This", "is a pen", and "." form a sentence and are grouped under one node. In a syntax tree, each node in FIG. 2 would normally carry a label such as "noun phrase (NP)" or "verb phrase (VP)". As noted above, however, the machine translation apparatus 1 according to this embodiment does not use these labels, so the tree structure information need not include information about them. In a tree structure such as that shown in FIG. 2, the elements at the lowest level (those with no children below them) are sometimes called leaves. In FIG. 2, "This", "is", and so on are leaves.

Although tree structure information has been described with reference to FIG. 2, its data structure is not limited. For example, parsing the above “This is a pen.” with a syntax analyzer (parser) yields (S1 (S (NP (DT This)) (VP (AUX is) (NP (DT a) (NN pen))) (. .))). Removing the labels (for example, S for a sentence, NP for a noun phrase, and VP for a verb phrase) gives the bracketed sentence ((This) ((is) ((a) (pen))) (.)). This represents the same structure as the tree structure information of FIG. 2. The data structure of the tree structure information may therefore be that of such a bracketed sentence. Needless to say, any other data structure may be used as long as it can represent the tree structure indicated by the tree structure information.
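The label removal described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `parse_labeled` and `bracketed` are hypothetical, the input is assumed to be a well-formed labelled bracketing in which every opening parenthesis is followed by a label, and a node with a single child is collapsed into that child so that the result matches the bracketed sentence given in the text.

```python
import re

def parse_labeled(text):
    """Parse a labelled bracketing into nested lists, dropping every label."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 2                      # skip "(" and the label (S, NP, DT, ...)
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word of the sentence
                pos += 1
        pos += 1                      # consume ")"
        # Assumption: a node with one child, e.g. (NP (DT This)), collapses to it.
        return children[0] if len(children) == 1 else children

    return node()

def bracketed(tree):
    """Render the unlabelled tree as a bracketed sentence."""
    if isinstance(tree, str):
        return f"({tree})"
    return "(" + " ".join(bracketed(child) for child in tree) + ")"

parsed = parse_labeled("(S1 (S (NP (DT This)) (VP (AUX is) (NP (DT a) (NN pen))) (. .)))")
print(bracketed(parsed))
```

The nested-list form returned by `parse_labeled` is one possible data structure for the tree structure information; the bracketed string is another rendering of the same structure.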

The tree structure information storage unit 16 stores the tree structure information on a predetermined recording medium. The recording medium is, for example, a semiconductor memory, an optical disk, or a magnetic disk; it may be part of the tree structure information storage unit 16 or may reside outside it (including outside the machine translation device 1). The recording medium may hold the tree structure information only temporarily, or need not be temporary. The processing of the syntax analysis unit 15 and that of the tree structure information storage unit 16 may also be performed as one. For example, the tree structure information may already be held on a recording medium such as a memory at the moment the syntax analysis unit 15 generates it, and that storing may be regarded as having been done by the tree structure information storage unit 16.

The classification unit 17 classifies each subtree of the tree structure indicated by the tree structure information stored by the tree structure information storage unit 16. This classification serves the determination made by the determination unit 18, described later. More specifically, the classification unit 17 classifies each subtree as untranslated, translated, or being translated. “Untranslated” means that the subtree contains only untranslated leaves or consists only of untranslated subtrees. “Translated” means that the subtree contains only translated leaves or consists only of translated subtrees. “Being translated” means that the subtree contains only translated and untranslated leaves, consists only of untranslated and translated subtrees, or contains exactly one subtree that is being translated. The tree structure information to be classified is the one corresponding to the sentence on which the machine translation unit 13 is performing left-to-right machine translation. A subtree that cannot be classified (for example, one containing two or more subtrees that are being translated) may be left unclassified or may be classified as NG. The classification results may be held temporarily on a recording medium (not shown). For example, the classification result (information indicating untranslated, being translated, or translated) may be stored in association with each subtree, or information identifying each subtree may be stored in association with the classification result.
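The classification rules above can be illustrated with a short sketch. It assumes a tree given as nested tuples whose leaves are distinct strings and a set of leaves already translated; the names `classify` and `table` are hypothetical. Propagating NG from a subtree to its ancestors is an assumption of this sketch, not something the text specifies, and the sketch reproduces only the classification, not the determination that later uses it.

```python
UNTRANSLATED = "untranslated"
TRANSLATED = "translated"
IN_PROGRESS = "being translated"
NG = "NG"

def classify(tree, translated, table):
    """Classify `tree` and record the status of every subtree in `table`."""
    if isinstance(tree, str):                        # a leaf
        return TRANSLATED if tree in translated else UNTRANSLATED
    kinds = [classify(child, translated, table) for child in tree]
    if NG in kinds or kinds.count(IN_PROGRESS) >= 2:
        status = NG                                  # two or more children being translated
    elif all(kind == UNTRANSLATED for kind in kinds):
        status = UNTRANSLATED
    elif all(kind == TRANSLATED for kind in kinds):
        status = TRANSLATED
    else:
        status = IN_PROGRESS                         # mixed children, or one child being translated
    table[tree] = status                             # result stored per subtree
    return status

tree = (("a", "b"), ("c", "d"))
table = {}
print(classify(tree, {"a"}, table), table)
```

Storing the status in a dictionary keyed by subtree corresponds to the option of keeping the classification result in association with each subtree.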

The determination unit 18 determines whether the post-translation text data stored by the post-translation text data storage unit 14 can be realized by a predetermined process. The predetermined process consists of converting the leaves of a tree structure from the source language into the target language and swapping the nodes of that tree structure, where the tree structure is the one indicated by the tree structure information stored by the tree structure information storage unit 16 and obtained from the translation target text data corresponding to the post-translation text data. “Can be realized” means that the post-translation text data can be obtained by such a process. Within the predetermined process, the leaf conversion and the node swapping may be performed in either order. This determination is also a determination of whether the post-translation text data satisfies the constraint imposed by the tree structure indicated by the tree structure information. Accordingly, when the determination unit 18 determines that the post-translation text data can be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure, the post-translation text data has been determined to satisfy that constraint (the syntactic constraint imposed by the tree structure). The determination may be made, for example, by generating every result obtainable through leaf conversion and node swapping and checking whether the post-translation text data is among them. If it is, the determination unit 18 determines that the post-translation text data can be realized by the predetermined process. Needless to say, any other equivalent criterion may be used as long as it gives the same result.
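The generate-and-check form of the determination can be sketched as below. The sketch assumes a tree of nested tuples with distinct string leaves and a one-to-one word dictionary; `reachable_orders`, `is_realizable`, and `lexicon` are hypothetical names, and the toy tree and dictionary are illustrative data only.

```python
from itertools import permutations, product

def reachable_orders(tree):
    """All leaf orders obtainable by reordering the children of every node."""
    if isinstance(tree, str):
        return {(tree,)}
    orders = set()
    for perm in permutations(tree):                       # every order of the children
        for parts in product(*(reachable_orders(child) for child in perm)):
            orders.add(tuple(leaf for part in parts for leaf in part))
    return orders

def is_realizable(tree, lexicon, target_words):
    """True if target_words arises from leaf conversion plus node swapping."""
    target = tuple(target_words)
    return any(tuple(lexicon[leaf] for leaf in order) == target
               for order in reachable_orders(tree))

tree = (("a", "b"), ("c", "d"))
lexicon = {"a": "A", "b": "B", "c": "C", "d": "D"}        # assumed one-to-one dictionary
print(len(reachable_orders(tree)))
print(is_realizable(tree, lexicon, ["D", "C", "A", "B"]))
print(is_realizable(tree, lexicon, ["A", "C", "B", "D"]))
```

Exhaustive generation is practical only for short sentences; as the text notes, any equivalent criterion may replace it.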

When the machine translation unit 13 performs word-based statistical machine translation, the determination unit 18 may determine whether the post-translation text data stored by the post-translation text data storage unit 14 can be realized by word-by-word conversion of the leaves of the tree structure from the source language into the target language together with swapping of the nodes of that tree structure. Here the tree structure is the one indicated by the tree structure information stored by the tree structure information storage unit 16 and obtained from the translation target text data that corresponds to the post-translation text data (that is, the translation target text data of which the post-translation text data is a translation).

When the machine translation unit 13 performs phrase-based statistical machine translation, the determination unit 18 may determine whether the post-translation text data stored by the post-translation text data storage unit 14 can be realized by a conversion of the leaves of the tree structure from the source language into the target language that includes phrase-by-phrase conversion, together with swapping of the nodes of that tree structure that does not split any phrase, a phrase being the unit translated in phrase-based statistical machine translation. Here again the tree structure is the one indicated by the tree structure information stored by the tree structure information storage unit 16 and obtained from the translation target text data corresponding to the post-translation text data. “A conversion that includes phrase-by-phrase conversion” means that conversions in units other than phrases, for example word-by-word conversion, may also be included.

When, as in the present embodiment, the machine translation unit 13 performs left-to-right machine translation and the classification unit 17 performs classification, the determination unit 18 may use the classification result and, when a subtree containing two or more subtrees that are being translated appears, determine that the post-translation text data cannot be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure. Making the determination in this way amounts to making the determination described above; the reason is given later. The present embodiment describes the case where the determination unit 18 makes its determination using the classification result in this way.
The determination unit 18 may pass the determination result to the selection unit 19 or may hold it temporarily on a recording medium (not shown).

The selection unit 19 selects post-translation text data that the determination unit 18 has determined can be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure. In other words, post-translation text data determined to satisfy the constraint imposed by the tree structure indicated by the tree structure information is selected. The selection may take place after the machine translation unit 13 has finished a series of translation (for example, one sentence when translating sentence by sentence), or it may be a selection, made while the machine translation unit 13 is still translating, of the next translated word, morpheme, phrase, or the like. In the former case, because the selection is made after the fact, the machine translation unit 13 generates a plurality of pieces of post-translation text data that are translations of the translation target text data, and the selection unit 19 selects from them the piece or pieces that the determination unit 18 has determined can be realized by the leaf conversion and node swapping. One piece may be selected, or two or more. In the latter case, that is, when the next translation unit of the target language (word, morpheme, phrase, or the like) is selected while the machine translation unit 13 is translating, the selection is made word by word, morpheme by morpheme, or phrase by phrase as the translation target text data is translated, so only one piece of post-translation text data is generated for the translation target text data.

The output unit 20 outputs the result of the selection by the selection unit 19. The selection result may be the post-translation text data selected by the selection unit 19, or information indicating which post-translation text data the selection unit 19 selected. In the latter case, for example, the selection result may be a flag or the like attached to the selected post-translation text data on the recording medium on which the post-translation text data storage unit 14 stored it, or may be information identifying the selected post-translation text data. When the machine translation unit 13 performs left-to-right machine translation, selection is made several times before one sentence is fully translated; the post-translation text data that is the translation of the translation target text data can then serve as information indicating the result of those selections.

The output may be, for example, display on a display device (such as a CRT or a liquid crystal display), transmission to a predetermined device over a communication line, printing by a printer, audio output through a speaker, storage on a recording medium, or delivery to another component. The output unit 20 may or may not include the device that performs the output (for example, a display device or a printer). The output unit 20 may be realized by hardware, or by software such as a driver that drives such a device.

Any two or more of the following may be realized by the same recording medium or by separate recording media: the translation target text data storage unit 11, the translation model information storage unit 12, the recording medium on which the post-translation text data storage unit 14 stores post-translation text data, and the recording medium on which the tree structure information storage unit 16 stores tree structure information. When they share a recording medium, for example, the area storing the translation target text data serves as the translation target text data storage unit 11, and the area storing the translation model information serves as the translation model information storage unit 12.

Next, the operation of the machine translation device 1 according to the present embodiment is described with reference to the flowchart of FIG. 3. The flowchart of FIG. 3 shows the operation of the machine translation device 1 when the machine translation unit 13 performs left-to-right machine translation, and it describes the processing for machine-translating one sentence. To machine-translate several sentences in succession, the series of steps shown in FIG. 3 is simply repeated once per sentence.

(Step S101) The syntax analysis unit 15 reads the translation target text data from the translation target text data storage unit 11 and parses it to generate tree structure information. The translation target text data read by the syntax analysis unit 15 is the data that the machine translation unit 13 will subsequently machine-translate.

(Step S102) The tree structure information storage unit 16 stores the tree structure information generated by the syntax analysis unit 15 on a predetermined recording medium.
(Step S103) The machine translation unit 13 sets the counter i to 1.

(Step S104) The machine translation unit 13 machine-translates the i-th translation unit (for example, a word, morpheme, or phrase) on the target side. That is, the i-th translation unit is translated so as to extend the post-translation text data in the target language from the beginning of the sentence toward its end. A plurality of candidates are produced when the i-th translation unit is translated. The number of candidates is not limited: for example, the number of candidates that the machine translation unit 13 produces may be fixed in advance, or every translation result whose likelihood is at or above a predetermined value may be adopted as a candidate.

(Step S105) The post-translation text data storage unit 14 stores the candidates produced by the machine translation unit 13 in association with the post-translation text data translated and stored so far. It is preferable to store the candidates in descending order of likelihood, so that a smaller value of the counter j, described later, corresponds to a higher likelihood.

(Step S106) The determination unit 18 sets the counter j to 1.
(Step S107) The classification unit 17 classifies the tree structure indicated by the tree structure information with respect to the j-th candidate for the i-th translation unit translated by the machine translation unit 13. That is, it classifies each subtree of the tree structure as untranslated, being translated, or translated. If a subtree fits none of these, the classification unit 17 may classify it as NG.

(Step S108) Using the classification result from the classification unit 17, the determination unit 18 determines whether the j-th candidate for the i-th translation unit translated by the machine translation unit 13 can be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure.

(Step S109) If the determination unit 18 has determined that the post-translation text data so far with the j-th candidate appended can be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure, the selection unit 19 selects the j-th candidate. The j-th candidate is thus appended to the end of the post-translation text data, and the process proceeds to step S110. If, on the other hand, the determination unit 18 has determined that the post-translation text data so far with the j-th candidate appended cannot be realized in that way, the selection unit 19 does not select the j-th candidate, and the process proceeds to step S113.

(Step S110) The machine translation unit 13 determines whether translation has reached the end of the sentence. In left-to-right machine translation, for example, a bit may be set for each source-language word whose translation has already been produced. In that case, translation may be judged to have reached the end of the sentence when a bit is set for every word of the source-language sentence being translated. If translation has reached the end of the sentence, the process proceeds to step S111; otherwise it proceeds to step S112.

(Step S111) The output unit 20 reads the post-translation text data built up by the repeated selections of the selection unit 19 from the recording medium on which the post-translation text data storage unit 14 stored it, and outputs it. This ends the series of steps for machine-translating the translation target text data.

(Step S112) The machine translation unit 13 increments the counter i by 1. The process then returns to step S104.
(Step S113) The determination unit 18 increments the counter j by 1. The process then returns to step S107.
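The control flow of steps S101 to S113 can be sketched roughly as follows. This is a simplified illustration, not the embodiment itself: it assumes one-to-one word translation, replaces the classification-based determination of steps S107 and S108 with a brute-force prefix check against all reachable leaf orders, and raises an error when no candidate passes (a case the flowchart does not describe). All names (`reachable_orders`, `translate_sentence`, `by_preference`, `lexicon`) are hypothetical, and `by_preference` merely stands in for the likelihood ranking produced by the machine translation unit 13.

```python
from itertools import permutations, product

def reachable_orders(tree):
    """Leaf orders obtainable by reordering the children of every node."""
    if isinstance(tree, str):
        return [[tree]]
    orders = []
    for perm in permutations(tree):
        for parts in product(*(reachable_orders(child) for child in perm)):
            orders.append([leaf for part in parts for leaf in part])
    return orders

def translate_sentence(source, tree, propose, lexicon):
    allowed = reachable_orders(tree)              # S101/S102: tree obtained and kept
    consumed = []                                 # source words translated so far, in output order
    while len(consumed) < len(source):            # S110: stop once every source word is covered
        candidates = propose(source, consumed)    # S104/S105: candidates, most likely first
        for word in candidates:                   # S106/S113: j = 1, 2, ...
            trial = consumed + [word]
            # S107/S108 stand-in: the partial output must be a prefix of a reachable order.
            if any(order[:len(trial)] == trial for order in allowed):
                consumed = trial                  # S109: candidate j is selected
                break
        else:
            raise ValueError("no candidate satisfies the tree constraint")
    return [lexicon[word] for word in consumed]   # S111: output

def by_preference(preference):
    """Toy candidate generator: untranslated words in a fixed preference order."""
    def propose(source, consumed):
        return [word for word in preference if word not in consumed]
    return propose

tree = (("a", "b"), ("c", "d"))
lexicon = {"a": "A", "b": "B", "c": "C", "d": "D"}
print(translate_sentence(list("abcd"), tree, by_preference(list("dacb")), lexicon))
```

In this toy run the second-step candidate "a" is rejected because no reachable order begins with d followed by a, so the next candidate "c" is taken instead, mirroring the move from step S109 to step S113.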

The determination method in word-based statistical machine translation, the determination method in phrase-based statistical machine translation, and the method of left-to-right machine translation on the target side are each described briefly below.

[Determination method in word-based statistical machine translation]
First, consider the simplest case, in which every word of the translation target text data corresponds one-to-one to a word of the post-translation text data. As noted above, “word” here also covers word-like units such as morphemes. If each word s_i of the translation target text data is translated into S_i on the target side, the translation target text data s_1, s_2, ..., s_N is translated as a reordering of the word set S_1, S_2, ..., S_N. The number of possible word orders of the post-translation text data is then N! (N factorial). The purpose of the tree structure constraint model, which uses the tree structure information of the present embodiment, is to impose a constraint that shrinks this search space of N! orders. The model rests on the assumption that translation is possible while obeying the following two rules, and it requires no parameter training.

Rule 1: If the word s_i of the translation target text data has a relation (such as a dependency relation) with s_j, then the word S_i of the post-translation text data also has a relation with S_j.
Rule 2: The arcs of the tree structure information, which represent relations between words, do not cross.

Determining whether these rules are satisfied is the same as determining whether the post-translation text data can be realized by converting the leaves of the tree structure obtained from the corresponding translation target text data (the tree structure indicated by the tree structure information) from the source language (s_i) into the target language (S_i) and swapping the nodes of that tree structure. The determination unit 18 makes this determination.

The case of making this determination for the tree structure of FIG. 4 is described next. Written as a bracketed sentence, the tree structure of FIG. 4 is ((a b) (c d)). For this tree structure, the post-translation text data that can be determined to be realizable by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure is limited to the following: [A B C D], [A B D C], [B A C D], [B A D C], [C D A B], [C D B A], [D C A B], [D C B A]. Here the words A, B, C, and D of the post-translation text data are the translations of the words a, b, c, and d of the translation target text data, respectively.

Consider, for example, [A C B D] as post-translation text data. The bracketed sentence of the translation target text data shows that a and b are related and that c and d are related. Applying Rule 1, A and B are likewise related, as are C and D. When these relations are laid over the word string [A C B D], however, the arcs representing them cross, so Rule 2 cannot be satisfied. The post-translation text data [A C B D] is therefore found to be inappropriate.

Next, the determination by the determination unit 18 described above is applied to this post-translation text data. Converting the leaves of the tree structure ((a b) (c d)), obtained from the translation target text data corresponding to the post-translation text data [A C B D], into the target language gives ((A B) (C D)). It is then determined whether [A C B D] can be realized by swapping the nodes of this structure. In ((A B) (C D)), A and B form a pair (they belong to one node) and can never be separated by swapping nodes, whereas in [A C B D] they are separated. It is therefore determined that the post-translation text data [A C B D] cannot be realized by converting the leaves of the tree structure from the source language into the target language and swapping the nodes of the tree structure. The determination unit 18 carries out this kind of determination. It may do so, for example, by constructing every piece of post-translation text data reachable from ((A B) (C D)) by swapping nodes and checking whether the post-translation text data under examination is among them, or it may use another method.
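One possible instance of "another method" is a contiguity check: a word order is reachable by node swapping exactly when the leaves of every subtree occupy a contiguous span of the output. The sketch below assumes a tree of nested tuples with distinct string leaves and takes the output as the source words listed in the order in which their translations appear; the names `leaves` and `satisfies_tree_constraint` are hypothetical.

```python
def leaves(tree):
    """Leaves of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def satisfies_tree_constraint(tree, order):
    """True if `order` is reachable from `tree` by reordering children of nodes."""
    position = {leaf: index for index, leaf in enumerate(order)}
    if len(position) != len(order) or sorted(position) != sorted(leaves(tree)):
        return False                       # not a permutation of the tree's leaves

    def contiguous(node):
        if isinstance(node, str):
            return True
        spots = [position[leaf] for leaf in leaves(node)]
        # The subtree's leaves must fill one unbroken span of the output.
        return (max(spots) - min(spots) + 1 == len(spots)
                and all(contiguous(child) for child in node))

    return contiguous(tree)

tree = (("a", "b"), ("c", "d"))
print(satisfies_tree_constraint(tree, list("acbd")))   # a and b are separated by c
print(satisfies_tree_constraint(tree, list("dcab")))
```

Unlike full enumeration, this check does not generate the reachable orders, so its cost does not grow with their number.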

Consider another example, in which the post-translation text data is [D C A B]. As above, converting the leaves of the tree structure ((a b) (c d)), obtained from the translation target text data corresponding to [D C A B], into the target language gives ((A B) (C D)). It is then determined whether [D C A B] can be realized by swapping the nodes of this structure. Here, swapping C and D (the swap at node 2 in FIG. 4) and then swapping (A B) and (D C) (the swap at node 3 in FIG. 4) yields [D C A B]. The determination unit 18 therefore determines that the post-translation text data [D C A B] satisfies the constraint of the tree structure constraint model.

As an example with actual sentences, take the English sentence (He (eats (large bread) quickly).) and its Japanese translation （彼は（（大きなパンを）早く食べる）。）. Although English (SVO) and Japanese (SOV) differ greatly in word order, the translation satisfies the above rules. This suggests that the tree structure constraint model is appropriate.

Without the tree structure constraint model, the number of possible orderings of the post-translation text data for ((ab)(cd)) is 4! = 24. With the model, the number falls to 8. For a binary tree structure of N words, the number of orderings under the model is 2^(N-1), because such a tree has N-1 nodes and each node offers two choices: swap or do not swap. This is far smaller than the N! orderings available without the model: the ratio is about 1/7,000 for N = 10 and on the order of 1/10^12 for N = 20. More generally, for a tree that is not binary, the number of orderings under the model is Π_{i=1}^{n} (B_i!), where n is the number of nodes in the tree and B_i is the number of branches of the i-th node.

The following expression (1) represents statistical translation without this model, where P(f|e) and P(e) denote the translation model and the language model, respectively.
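
Assuming the usual noisy-channel formulation of statistical machine translation, with f the translation target text and e a candidate translation, expression (1) can be written as below. This is a reconstruction from the definitions given in the text, not a copy of the original equation.

```latex
\hat{e} = \operatorname*{argmax}_{e} P(e \mid f)
        = \operatorname*{argmax}_{e} P(f \mid e)\, P(e) \tag{1}
```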

In contrast, when the proposed model of the present embodiment is used, a new term P(e|T) is added, and the translation is expressed by the following equation.
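
Under the same noisy-channel assumption, adding the term P(e|T) as a further factor gives an equation of the following form. It is a reconstruction based on the surrounding description, and the exact layout of the original equation is not known.

```latex
\hat{e} = \operatorname*{argmax}_{e} P(f \mid e)\, P(e)\, P(e \mid T)
```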

Here, P(e|T) is the tree structure constraint model, and T denotes the tree structure of the translation target text data. The value of P(e|T) is 1 if e satisfies the constraints of the model and 0 otherwise. e satisfies the constraints of the model when it is determined that the post-translation text data can be realized by converting the leaves of the tree structure, obtained from the corresponding translation target text data, from the source language to the target language and swapping the nodes of that tree.

[Method of determination in phrase-based statistical machine translation]
Next, the determination method for word-based statistical machine translation is extended so that it can be applied to the phrase-based model. Word alignment is in general n-to-m (including 0-to-m and n-to-0). In the phrase-based model, however, phrase-to-phrase alignment is always one-to-one, even when the two phrases contain different numbers of words. The rules described earlier for one-to-one word correspondence can therefore be applied largely unchanged to phrase-to-phrase correspondence. Suppose phrase ph_i contains word s_n and phrase ph_j contains word s_m. If word s_n and word s_m are related, phrase ph_i and phrase ph_j are defined to be related as well. With this definition, the earlier rules extend to phrases as follows.

Rule 1: If a source-language phrase ph_i has a relation, such as a dependency relation, with ph_j, then the target-language phrase PH_i also has a relation with PH_j.
Rule 2: Arcs representing relations between phrases do not cross.

Here, PH_n denotes the translation of the source-language phrase ph_n. A source-language phrase is a phrase contained in the translation target text data, and a target-language phrase is a phrase contained in the post-translation text data.

Determining whether the above rules are satisfied is equivalent to determining whether the post-translation text data can be realized by converting, phrase by phrase, the leaves of the tree structure obtained from the corresponding translation target text data (the tree structure indicated by the tree structure information) from the source language to the target language and swapping the nodes of that tree, without splitting any phrase. The determination unit 18 makes this determination.

When these phrase-extended rules are applied, the target-language sentence is obtained, as in the one-to-one word case, by reordering the subtrees (or leaf words) immediately below each node of the tree structure that represents the bracketed source-language sentence. Not every node may be reordered, however: a node that contains only part of a phrase cannot be reordered. For example, as in FIG. 5, consider the tree structure ((abc)((de)(fg))) in which bcd forms a phrase ph. Node 1 contains bc, which is part of ph, so it cannot be reordered. Likewise, nodes 2 and 4 cannot be reordered. Node 3 contains no part of the phrase and node 5 contains the whole phrase, so both can be reordered. If, for example, node 2 were reordered, the phrase ph would be split into two places in the target language, contradicting the one-to-one phrase correspondence of the phrase-based model. As a result, only [A PH E F G], [A PH E G F], [G F E PH A], and [F G E PH A] are allowed as translations of this tree structure, where PH denotes the translation of ph.

A concrete example of applying this determination to post-translation text data follows. Suppose the post-translation text data is [DEFGABC]. Converting the leaves of the tree structure ((abc)((de)(fg))), obtained from the corresponding translation target text data, into the target language yields ((ABC)((DE)(FG))). The sequence [DEFGABC] can be produced by swapping nodes of this tree, since it suffices to swap at node 5 in FIG. 5. In [DEFGABC], however, the phrase PH is split into D and BC. The determination unit 18 therefore determines that [DEFGABC] cannot be realized by phrase-by-phrase conversion of the leaves of the tree structure from the source language to the target language and swapping of the nodes of that tree without splitting the phrase. Consequently, the post-translation text data [DEFGABC] is never selected by the selection unit 19.

If, on the other hand, the post-translation text data is [GFEDCBA], the determination unit 18 determines that it can be obtained by swapping the nodes of ((ABC)((DE)(FG))) and that the phrase PH is not split. The post-translation text data [GFEDCBA] can therefore be selected by the selection unit 19.
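
The phrase-level check can be illustrated by enumerating the reorderings of ((ABC)((DE)(FG))) and keeping those in which the words of the phrase stay adjacent. This is one reading of the rule, chosen because it reproduces the four permitted translations listed above; the nested-tuple representation and the function names are assumptions made for this sketch.

```python
from itertools import permutations, product

def reorderings(tree):
    """Yield every leaf sequence reachable by reordering the children of each node."""
    if isinstance(tree, str):
        yield (tree,)
        return
    for order in permutations(tree):
        for parts in product(*(list(reorderings(child)) for child in order)):
            yield tuple(word for part in parts for word in part)

def phrase_translations(tree, phrase, label="PH"):
    """Collect reorderings in which the words of `phrase` remain adjacent.

    The phrase is written as the single unit `label` in each result.
    """
    members = set(phrase)
    allowed = set()
    for seq in reorderings(tree):
        positions = [i for i, word in enumerate(seq) if word in members]
        if positions[-1] - positions[0] + 1 != len(members):
            continue  # the phrase would be split
        collapsed = seq[:positions[0]] + (label,) + seq[positions[-1] + 1:]
        allowed.add(" ".join(collapsed))
    return allowed

tree = (("A", "B", "C"), (("D", "E"), ("F", "G")))
result = phrase_translations(tree, phrase=("B", "C", "D"))
print(sorted(result))
```

The sequence [DEFGABC] appears among the plain reorderings but is filtered out because D is separated from B and C, while [GFEDCBA] survives as [G F E PH A].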

[Method for left-to-right machine translation on the target side]
Next, an algorithm for introducing the tree structure constraint model into left-to-right machine translation (a decoder) on the target side is described. In this kind of machine translation, the post-translation text data (the target-language sentence) is generated from left to right, that is, from the beginning of the sentence toward its end. Each time a translation unit that extends the post-translation text data to the right is generated, a bit is set for the words of the translation target text data that correspond to that unit. When the bits for all words of the translation target text data have been set, the post-translation text data is judged to have reached the end of the sentence. To incorporate the tree structure constraint model into this flow, each time a new target-language translation unit is about to be appended to the post-translation text data, it must be checked whether the constraints of the model are satisfied, and only units that satisfy them may be selected.

Before the check algorithm is described, the subtrees of the tree structure of the translation target text data are classified into four types: "untranslated", "translated", "in translation", and "NG".

If a subtree consists only of leaf words and all of those words are untranslated (their bits are not set), the subtree is untranslated.
If a subtree consists only of untranslated subtrees, the subtree is also untranslated.

If a subtree consists only of leaf words and all of those words are translated (their bits are set), the subtree is translated.
If a subtree consists only of translated subtrees, the subtree is also translated.

If a subtree consists only of leaf words and contains both untranslated and translated words, the subtree is in translation.
If a subtree contains both translated and untranslated subtrees, the subtree is in translation.
If a subtree contains exactly one in-translation subtree, the subtree is in translation.

If a subtree contains two or more in-translation subtrees, the subtree is NG. This is because, in left-to-right machine translation on the target side, if the above rules are satisfied there is exactly one boundary in the tree structure where untranslated and translated material meet, so two or more in-translation subtrees cannot exist.
If a subtree contains an NG subtree, the subtree is also NG.
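
The classification rules above can be sketched as a recursive function. The nested-tuple tree, the set of translated source words, and the label strings are assumptions of this illustration; nodes that mix leaf words and subtrees are handled by treating each leaf as a translated or untranslated child.

```python
def classify(tree, translated):
    """Classify a subtree as 'untranslated', 'translated', 'in_translation', or 'NG'.

    `tree` is a nested tuple of source words; `translated` is the set of
    source words whose bit has been set.
    """
    states = []
    for child in tree:
        if isinstance(child, str):  # a leaf word
            states.append("translated" if child in translated else "untranslated")
        else:
            states.append(classify(child, translated))
    if "NG" in states or states.count("in_translation") >= 2:
        return "NG"
    if all(s == "untranslated" for s in states):
        return "untranslated"
    if all(s == "translated" for s in states):
        return "translated"
    return "in_translation"

tree = (("a", "b"), (("c", "d"), ("e", "f")))
print(classify(tree, {"a", "b"}))  # one boundary between translated and untranslated
print(classify(tree, {"a", "c"}))  # (ab) and (cd) are both in translation
```

In the second call, the root ends up with two in-translation descendants, one on each side, so it is classified as NG.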

In the present embodiment, translation unit candidates are selected so that no NG subtree arises, and the classification unit 17 therefore classifies subtrees only as "untranslated", "in translation", or "translated". Alternatively, as in the description above, the classification unit 17 may use all four classes, "untranslated", "in translation", "translated", and "NG", and when it classifies a subtree as NG, the determination unit 18 may determine that the post-translation text data cannot be realized by converting the leaves of the tree structure from the source language to the target language and swapping the nodes of that tree. This is substantially the same as the case described earlier, in which the determination unit 18 uses the classification result and makes that determination when a subtree containing two or more in-translation subtrees appears.

During left-to-right machine translation on the target side, if appending a new target-language translation unit to the post-translation text data produces an NG subtree, the resulting post-translation text data cannot satisfy the tree structure constraint model. The reason is as follows. Suppose a subtree on the translation target side consists of the word sequence [x_1, x_2, ..., x_n]. The translation of this subtree is then obtained as a reordering of the set of translated words {X_1, X_2, ..., X_n}, because Rule 2 would be violated if any other word intervened. Consequently, the word that follows the last translated word of an in-translation subtree must be an untranslated word of that same subtree. In other words, the next word to be translated must be chosen from the untranslated words of the in-translation subtree. For example, in the tree structure ((ab)((cd)(ef))), if a and b are translated and c, d, e, and f are untranslated, the next word to be translated must be one of c, d, e, and f. If two or more in-translation subtrees are present, this condition can no longer be met.

Next, consider appending the translation PH of a source-language phrase ph to the translation hypothesis being generated. Let the post-translation text data at that point be (S_1, S_2, ..., S_i), and assume that it satisfies the tree structure constraint model. For the post-translation text data extended with PH not to produce two or more in-translation subtrees (that is, to satisfy the tree structure constraint model), one of the following two conditions must hold. Here, T denotes the smallest in-translation subtree of the tree structure that contains any of (S_1, S_2, ..., S_i).

(Condition 1) T is still in translation after PH is appended, and no other in-translation subtree is produced.
(Condition 2) T becomes a translated subtree when PH is appended.

Condition 1 holds if and only if the phrase ph is contained in the untranslated part of T. Condition 2 holds if and only if the phrase ph contains the entire untranslated part of T. It follows that whether existing post-translation text data extended with a new target-language translation unit satisfies the tree structure constraint model can be determined by the following procedure.

(1) For the existing post-translation text data, store the smallest in-translation subtree contained in the corresponding tree structure.

(2) When a new phrase PH is to be appended to the existing post-translation text data, compare the corresponding source-language phrase ph with the untranslated part of the smallest in-translation subtree stored in (1). If ph is contained in that untranslated part, or if ph contains that untranslated part, generate new post-translation text data by appending PH to the existing post-translation text data and update the smallest in-translation subtree accordingly. Otherwise, discard the new phrase PH and make the same determination for the next phrase. The determination unit 18 may determine whether the restriction of the tree structure constraint model is satisfied in this way.
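
Steps (1) and (2) can be sketched with plain set operations. The helper names `leaves`, `smallest_in_translation`, and `try_append` are hypothetical, and the sketch assumes that any phrase is acceptable while nothing has been translated yet, a case the procedure above does not spell out.

```python
def leaves(tree):
    """Set of source words below a subtree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def smallest_in_translation(tree, translated):
    """Smallest subtree holding both translated and untranslated words, or None."""
    if isinstance(tree, str):
        return None
    words = leaves(tree)
    if not (words & translated) or words <= translated:
        return None  # entirely untranslated or entirely translated
    for child in tree:
        inner = smallest_in_translation(child, translated)
        if inner is not None:
            return inner
    return tree

def try_append(tree, translated, ph):
    """Return the updated set of translated words, or None if ph must be discarded."""
    ph = set(ph)
    t = smallest_in_translation(tree, translated)
    if t is not None:
        rest = leaves(t) - translated       # untranslated part of T
        if not (ph <= rest or ph >= rest):  # neither condition 1 nor condition 2
            return None
    return translated | ph

tree = ((("a", "b"), ("c", "d")), (("e", "f", "g"), ("h", "i")))
done = try_append(tree, set(), "ef")  # first unit: e and f
print(try_append(tree, done, "c"))    # discarded: c lies outside (efg)
print(try_append(tree, done, "g"))    # accepted: g completes (efg)
```

Recomputing the smallest in-translation subtree on every call keeps the sketch short. A decoder would more likely store it with each hypothesis, as step (1) describes.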

Next, the operation of the machine translation apparatus 1 according to the present embodiment is described using a concrete example.
In this example, the translation target text data is "abcdefghi", where a, b, c, and so on are words or morphemes of the source language. The translation target text data "abcdefghi" is assumed to be stored in the translation target text data storage unit 11.

Suppose the user operates the machine translation apparatus 1 to start machine translation of the translation target text data "abcdefghi". The syntax analysis unit 15 reads "abcdefghi" from the translation target text data storage unit 11, parses it, constructs the tree structure information "(((ab)(cd))((efg)(hi)))", and passes it to the tree structure information storage unit 16 (step S101). The tree structure information storage unit 16 stores the bracketed sentence "(((ab)(cd))((efg)(hi)))" on a recording medium (not shown) as tree structure information (step S102). FIG. 6 depicts this tree structure information. As shown in FIG. 6, each node is given a node ID (for example, N001). Node IDs are assigned in order, N001, N002, and so on, from left to right, starting with the level closest to the leaves.

Next, the machine translation unit 13 reads the translation target text data from the translation target text data storage unit 11 and translates the first translation unit (steps S103 and S104). Suppose this machine translation produces a plurality of translation result candidates. The post-translation text data storage unit 14 stores these candidates on a recording medium (not shown) (step S105), in descending order of machine translation likelihood (probability value). The stored candidates are therefore sorted by descending likelihood. Suppose the first candidate is the phrase "FE", the translation of the phrase "ef".

When a translation result candidate is classified and judged, the translated flags in the translation target text data storage unit 11 that correspond to the translated translation units, including that candidate, are assumed to be set to "1", as described earlier. FIG. 7 shows an example of the correspondence between translated flags set in this way and the translation target text data. In FIG. 7, source-language words and morphemes with translated flag "1" have been translated, and those with translated flag "0" are untranslated. The translated flags may be set by, for example, the machine translation unit 13, the determination unit 18, or another component.

Next, the determination unit 18 instructs the classification unit 17 to classify the nodes for the first candidate. The classification unit 17 classifies each node using the tree structure information stored by the tree structure information storage unit 16 and the relationship between the translation target text data and the translated flags shown in FIG. 7. For example, the node with node ID "N001" is classified as untranslated, because neither a nor b has been translated (their translated flags are "0"). The node with node ID "N003" is classified as in translation, because e and f have been translated but g has not. The classification result is as shown in FIG. 8 (steps S106 and S107). The classification unit 17 stores the classification result shown in FIG. 8 on a recording medium (not shown).

The determination unit 18 refers to the classification result of FIG. 8 stored by the classification unit 17 and determines whether any subtree (node) contains two or more in-translation subtrees (nodes). Here, the nodes with node IDs "N003", "N006", and "N007" are in translation, but each of them has at most one in-translation node directly below it, so the determination unit 18 determines that no subtree contains two or more in-translation subtrees (step S108). The selection unit 19 therefore selects the first translation result candidate (step S109) and, on the recording medium on which the post-translation text data storage unit 14 stored the candidates, fixes the candidate "FE" as post-translation text data. All candidates other than the selected one, that is, the second and subsequent candidates, are discarded.

The machine translation unit 13 refers to the information of FIG. 7 on the relationship between the translation target text data and the translated flags, determines that the end of the sentence has not been reached because not all translated flags are "1", and translates the next, that is, the second, translation unit (steps S110, S112, and S104).

Suppose the translation yields "C" (for "c") as the first candidate, "G" (for "g") as the second candidate, "H" (for "h") as the third candidate, and further candidates from the fourth onward. As with the first translation unit, these candidates are stored by the post-translation text data storage unit 14 (step S105).

Then, as with the first translation unit, the classification unit 17 performs classification for the first translation result candidate (steps S106 and S107), and based on the result the determination unit 18 determines whether any subtree contains two or more in-translation subtrees (step S108). In this case the nodes with node IDs "N002", "N003", "N005", and "N006" are classified as in translation, so the determination unit 18 determines that a subtree, "N007", containing two or more in-translation subtrees (N005 and N006) exists (the subtree "N007" may be classified as NG). The candidate "C" is therefore not selected (step S109), and classification and determination proceed to the second candidate (step S113).

As before, the classification unit 17 performs classification for the second translation result candidate (step S107), and based on the result the determination unit 18 determines whether any subtree contains two or more in-translation subtrees (step S108). The classification result differs from that of FIG. 8 only in that the entry for node ID "N003" changes from "in translation" to "translated", so the determination unit 18 determines that no subtree contains two or more in-translation subtrees. The candidate "G" is therefore selected (step S109) and appended to the existing post-translation text data, which at this point becomes "FEG". The next translation unit is then translated in the same way (steps S110, S112, and S104). As candidates for translation units are selected and appended one after another, the post-translation text data grows until it finally covers the whole sentence. The output unit 20 outputs this final post-translation text data (step S111), for example by showing it on a display, so that the user can see the translation result of the translation target text data "abcdefghi".
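
The candidate selection traced above (steps S104 to S113) can be simulated in a few lines. This is a sketch under stated assumptions: candidates are given as hypothetical (source words, target text) pairs already sorted by likelihood, and `state` and `extend` are illustrative names, not components of the apparatus.

```python
def state(tree, done):
    """Classify a subtree given the set `done` of translated source words."""
    if isinstance(tree, str):
        return "translated" if tree in done else "untranslated"
    kids = [state(child, done) for child in tree]
    if "NG" in kids or kids.count("in_translation") >= 2:
        return "NG"
    if len(set(kids)) == 1 and kids[0] != "in_translation":
        return kids[0]  # all children translated, or all untranslated
    return "in_translation"

def extend(tree, done, output, candidates):
    """Append the first candidate that does not create an NG subtree."""
    for source_words, target in candidates:  # highest likelihood first
        trial = done | set(source_words)
        if state(tree, trial) != "NG":
            return trial, output + target
    return done, output  # no candidate satisfied the constraint

tree = ((("a", "b"), ("c", "d")), (("e", "f", "g"), ("h", "i")))
done, out = extend(tree, set(), "", [("ef", "FE")])
done, out = extend(tree, done, out, [("c", "C"), ("g", "G"), ("h", "H")])
print(out)  # "C" is skipped and "G" is appended
```

Here "C" is rejected because it would leave both halves of the root in translation, which matches the NG classification of N007 described above.
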
Next, an experimental example of the machine translation apparatus 1 according to this embodiment will be described.

First, the evaluation measures used in the experiments are described. Four measures were used: WER, PER, BLEU, and NIST. The expected effectiveness of the machine translation apparatus 1 according to the present embodiment under each measure is briefly considered below.

WER: This measure can take global word reordering into account. The machine translation apparatus 1 according to the present embodiment is therefore expected to be effective under this measure.
PER: This measure basically cannot take word order into account. The proposed method of the present embodiment is therefore not expected to be effective under this measure.

ＢＬＥＵ：この尺度は、ｎｇｒａｍに着目するため、中距離の単語順序の入れ替えを考慮することができる。例えば、レファレンス翻訳ｔｒａｎｓｌａｔｉｏｎ（ｗ_１、ｗ_２、…、ｗ_ｎ）に対し、翻訳結果が（ｗ_１、ｗ_２、…、ｗ_ｊ−１、Ｘ、ｗ_ｊ＋１、…、ｗ_ｎ）である場合に、ＷＥＲ、ＢＬＥＵは共に高い値を示す。しかしながら、翻訳結果（ｗ_ｊ＋１、…、ｗ_ｎ、Ｘ、ｗ_１、ｗ_２、…、ｗ_ｊ−１）に対しＢＬＥＵは同じく高い値を示すのに対し、ＷＥＲの値は０となる。したがって、本実施の形態による機械翻訳装置１はＢＬＥＵに対し有効ではあるが、ＷＥＲほどではないと予想される。 BLEU: Since this scale focuses on ngram, it is possible to consider the replacement of the middle distance word order. For example, when the reference translation translation (w ₁ , w ₂ ,..., W _n ) is a translation result (w ₁ , w ₂ ,..., W _j−1 , X, w _{j + 1} ,..., W _n ). Both WER and BLEU show high values. However, BLEU shows the same high value for the translation results (w _{j + 1} ,..., W _n , X, w ₁ , w ₂ ,..., W _j−1 ), whereas the WER value is 0. Therefore, the machine translation apparatus 1 according to this embodiment is effective for BLEU, but is not expected to be as high as WER.

ＮＩＳＴ：この尺度もＢＬＥＵと同様に、ｎｇｒａｍに着目する。しかしながら、低次のｎｇｒａｍに対する重みがＢＬＥＵより大きいため、本実施の形態による機械翻訳装置１の有効性はＢＬＥＵよりも低いと予想される。 NIST: This scale also pays attention to ngram as in BLEU. However, since the weight for the low-order ngram is larger than BLEU, the effectiveness of the machine translation apparatus 1 according to the present embodiment is expected to be lower than that of BLEU.
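
The contrast drawn above between WER, PER, and n-gram-based measures can be checked on a toy example. The following Python sketch is illustrative only: the function names (`wer`, `per`, `ngram_precision`) and the nine-token sentences are assumptions, it computes plain error rates and a single-order n-gram precision rather than full BLEU or NIST, and it is not the evaluation tooling used in the experiments. The sketch also reports WER as an error rate (lower is better), whereas the text above appears to treat the measures as scores where higher is better; under that reading, an error rate of 1.0 corresponds to the value of 0 mentioned above.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,             # delete r
                           cur[j - 1] + 1,          # insert h
                           prev[j - 1] + (r != h))) # match or substitute
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: order-sensitive edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


def ngram_precision(ref, hyp, n):
    """Fraction of hypothesis n-grams that also occur in the reference (clipped)."""
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    total = sum(hyp_ngrams.values())
    matched = sum((ref_ngrams & hyp_ngrams).values())
    return matched / total if total else 0.0


# Toy data (hypothetical): n = 9, j = 5, and "X" replaces w5.
ref = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9"]
local = ["w1", "w2", "w3", "w4", "X", "w6", "w7", "w8", "w9"]
swapped = ["w6", "w7", "w8", "w9", "X", "w1", "w2", "w3", "w4"]

for name, hyp in (("local", local), ("swapped", swapped)):
    print(name, wer(ref, hyp), per(ref, hyp), ngram_precision(ref, hyp, 2))
```

On these toy inputs the two hypotheses have the same PER and the same bigram precision, and only WER separates them: the error rate is 1/9 for the single substitution and 1.0 for the block swap.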

[English-Japanese News Translation Experiment]
First, an English-Japanese news translation experiment was performed to evaluate the performance of the machine translation apparatus 1 according to the present embodiment. The method according to the present embodiment requires tree structure information (bracketed sentences) for the translation target text data, so the syntax analysis unit 15 must parse the translation target text data. Parsing errors at this stage are expected to degrade performance. The purpose of this experiment is therefore both to evaluate the performance of the machine translation apparatus 1 and to evaluate the performance degradation caused by parsing errors. To evaluate that degradation, the method according to the present embodiment was evaluated in two ways: using the result of automatic parsing (that is, parsing by the syntax analysis unit 15) and using correct (manually provided) parses.

For the experiment, the Yomiuri Shimbun, Reuters, and the Wall Street Journal were used as training corpora. Their data sizes (numbers of sentences) are 145K, 57K, and 14K, respectively. In addition, 1,787 sentences from the Wall Street Journal were used as the development set, and likewise 1,787 sentences as the evaluation set. The Wall Street Journal sentences used in the experiment are included in the Penn Treebank corpus and come with manually created parse trees. Details of these corpora are shown in the table of FIG. 9.

A phrase-based translation model was used as the translation model information in this experimental example, and GIZA++ was used to train it (see Reference 1 below). The machine translation unit 13 also uses a language model when performing machine translation. The SRI language model toolkit was used to train the language model (see Reference 2 below). The language model is a word trigram model smoothed with Kneser-Ney discounting (see Reference 3 below). The decoding parameters were optimized with minimum error rate training (see Reference 4 below), with BLEU as the optimization target. The Charniak parser (see Reference 5 below) was used as the syntax analysis unit 15 that extracts tree structure information from the translation target text data.

Reference 1: F. J. Och, H. Ney, "A Systematic Comparison of Various Statistical Alignment Models", Computational Linguistics, No. 1, Vol. 29, pp. 19-51, 2003.

Reference 2: A. Stolcke, "SRILM - An Extensible Language Model Toolkit", Proc. ICSLP '02, 2002. (http://www.speech.sri.com/projects/srilm/)

Reference 3: R. Kneser, H. Ney, "Improved backing-off for m-gram language model", Proceedings of the IEEE International Conference of Acoustic, Speech, and Signal Processing, Vol. 1, pp. 181-184, 1995.

Reference 4: F. J. Och, "Minimum error rate training for statistical machine translation", Proc. ACL, 2003.

Reference 5: E. Charniak, "A Maximum-Entropy-Inspired Parser", Proc. NAACL-2000, pp. 132-139, 2000.

The experiment compared three conditions. "Base-line" is the case where the method according to the present embodiment is not used (that is, plain phrase-based statistical machine translation), "Chariniak" is the case where the Charniak parser is used to extract the tree structure information, and "Oracle" is the case where the Penn Treebank tree structures (that is, manually created tree structures) are used. The decoder was CleopATRa, a Pharaoh-compatible decoder (see Reference 6 below) developed independently by the inventors and modified for the tree structure constraint model according to the present embodiment. All conditions shared the same parameters, namely those optimized under the Base-line condition. FIG. 10 shows the evaluation results for each condition. Under the condition using the Charniak parser (Chariniak), WER improved by about 4% and BLEU improved by about 0.6. The size of the improvement for each evaluation measure is consistent with the predictions above: WER improved the most, followed by BLEU. There was no large difference between the Oracle results, which use the correct tree structures, and the Chariniak results, so the accuracy of the Charniak parser can be said to be sufficient for the machine translation apparatus 1 according to the present embodiment. For the evaluation sentence set, 60% of the tree structures were identical between Chariniak and Oracle.

Reference 6: P. Koehn, "PHARAOH: A beam search decoder for phrase-based statistical machine translation models", Proc. AMTA, 2004. (http://www.isi.edu/publications/licensed-sw/pharaoh/)

[English-Japanese Patent Translation Experiment]
Next, a translation experiment was performed on patents, a domain different from that of the previous experiment. Details of the patent translation experiment corpus are shown in FIG. 11. Model training, parameter optimization, and decoding were the same as in the news translation experiment.

FIG. 12 shows the experimental results. In FIG. 12, "Proposed" is the result for the machine translation apparatus 1 according to the present embodiment. As in the news translation experiment, the improvement in WER was the largest at 4.9%, followed by BLEU at 1.5. These results confirm that the machine translation apparatus 1 according to the present embodiment is also effective for a different domain.

[English-Chinese Translation Experiment]
Finally, an English-Chinese translation experiment was performed as a different language pair. The corpus used in the experiment is the one used in the SSMT2007 English-Chinese limited track, and its details are shown in FIG. 13. Conditions such as model training were the same as in the English-Japanese experiments, except that the evaluation set itself was used for parameter optimization, so the condition is closed with respect to the parameters (that is, no development set separate from the evaluation data was prepared).

FIG. 14 shows the experimental results. The number of Chinese references in this experiment is 4 (the number of Japanese references is 1), and the unit of evaluation is the character (Chinese character). As in the English-Japanese experiments, the improvement in WER was the largest at 4.9%, followed by BLEU at 1.9. These results confirm that the machine translation apparatus 1 according to the present embodiment is also effective for a different language pair.

As described above, the machine translation apparatus 1 according to the present embodiment generates tree structure information for the translation target text data and introduces a constraint based on that tree structure information, which makes it possible to realize machine translation that uses syntax information without any parameter learning on syntax information. The problem of data sparseness in the training data can therefore be avoided. Moreover, this machine translation method can be incorporated directly into left-to-right machine translation on the target side, where it acts as a new constraint on word reordering. In the experimental examples above, when phrase-based statistical machine translation was adopted in the machine translation apparatus 1 according to the present embodiment, the English-Chinese translation experiment showed improvements of 1.9 in BLEU and 4.9% in WER, and the 4.9% improvement in WER confirms that the method works effectively for global word reordering.

The description of the machine translation apparatus 1 according to the present embodiment has mainly covered left-to-right machine translation on the target side. As mentioned earlier, however, the machine translation unit 13 may instead generate a plurality of post-translation text data in advance, and those that satisfy the tree structure constraint may be selected from among them. In that case, classification by the classification unit 17 is unnecessary, so the machine translation apparatus need not include the classification unit 17. FIG. 15 is a block diagram showing the configuration of a machine translation apparatus 2 that does not include the classification unit 17 and in which the selection unit 19 selects, from the plurality of post-translation text data generated by the machine translation unit 13, those that satisfy the tree structure constraint. The components of the machine translation apparatus 2 are the same as those described above. In this configuration, however, the selection unit 19 selects, from the plurality of post-translation text data generated by the machine translation unit 13, one or more post-translation text data that the determination unit 18 has determined can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure.

FIG. 16 is a flowchart showing the operation of the machine translation apparatus 2 shown in FIG. 15. In the flowchart of FIG. 16, the processes in steps S101 and S102 are the same as in the flowchart of FIG. 3, and their description is omitted.

(Step S201) The machine translation unit 13 machine-translates the translation target text data. In this machine translation, the machine translation unit 13 generates a plurality of post-translation text data, each of which is a translation of the translation target text data. Each generated post-translation text data is, for example, a sentence.

(Step S202) The post-translation text data storage unit 14 stores the plurality of post-translation text data generated by the machine translation unit 13 on a predetermined recording medium.

(Step S203) The determination unit 18 sets a counter i to 1.

(Step S204) The determination unit 18 refers to the tree structure information stored by the tree structure information storage unit 16 and determines whether the i-th post-translation text data stored by the post-translation text data storage unit 14 can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure. When the machine translation unit 13 performs word-based statistical machine translation, the determination unit 18 may determine whether the i-th post-translation text data can be realized by word-by-word conversion, from the source language to the target language, of the leaves of the tree structure obtained from the translation target text data corresponding to that post-translation text data, together with swapping of nodes of that tree structure. When the machine translation unit 13 performs phrase-based statistical machine translation, the determination unit 18 may determine whether the i-th post-translation text data can be realized by conversion of the leaves of that tree structure from the source language to the target language, including phrase-by-phrase conversion, together with swapping of nodes of that tree structure that does not split any phrase, a phrase being the unit translated in phrase-based statistical machine translation.

(Step S205) If the determination unit 18 has determined that the i-th post-translation text data can be realized, the selection unit 19 selects it, and the process proceeds to step S206. If the determination unit 18 has determined that it cannot be realized, the selection unit 19 does not select the i-th post-translation text data, and the process proceeds to step S207.

(Step S206) The output unit 20 outputs the i-th post-translation text data selected by the selection unit 19. As mentioned earlier, the output by the output unit 20 may instead be information indicating the selection result (for example, information identifying the selected post-translation text data). The series of machine translation processes then ends.

(Step S207) The determination unit 18 increments the counter i by 1. The process then returns to step S204.
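
The loop formed by steps S203 to S207 can be sketched as follows. This is a minimal Python illustration, not the implementation of the machine translation apparatus 2: the function name `select_first_realizable` and the `is_realizable` callback (standing in for the determination unit 18) are hypothetical. The steps as written do not say what happens when the counter passes the last candidate, so returning `None` in that case is an assumption made here to keep the loop finite.

```python
from typing import Callable, Optional, Sequence


def select_first_realizable(
    candidates: Sequence[str],
    is_realizable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate accepted by the tree-structure check."""
    i = 0  # S203 (the flowchart counts from 1; Python indexes from 0)
    while i < len(candidates):
        if is_realizable(candidates[i]):  # S204: determination
            return candidates[i]          # S205 -> S206: select and output
        i += 1                            # S207: try the next candidate
    return None  # assumption: no candidate satisfied the constraint


if __name__ == "__main__":
    # Hypothetical candidates; pretend only those ending in "G" pass the check.
    result = select_first_realizable(["FEC", "FEG", "GEF"], lambda c: c.endswith("G"))
    print(result)
```

Because the loop returns as soon as one candidate passes, it matches the single-selection behaviour of the flowchart; the order of `candidates` therefore decides which acceptable translation is output.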

Like the flowchart of FIG. 3, the flowchart of FIG. 16 describes the processing for machine-translating a single sentence. To machine-translate a plurality of sentences in succession, the series of processes shown in the flowchart of FIG. 16 is simply repeated once per sentence.

The flowchart of FIG. 16 describes the case in which, once the selection unit 19 has made a selection, no determination is made for the remaining post-translation text data. Accordingly, only one post-translation text data is selected in the flowchart of FIG. 16. Alternatively, even after the selection unit 19 has made a selection, the determination may continue for the subsequent post-translation text data, and any further post-translation text data that can be selected may also be selected. In that case, two or more post-translation text data may be selected.

In the flowchart of FIG. 16, the selection unit 19 may also select, from among the post-translation text data that the determination unit 18 has determined can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure, a predetermined number or proportion of post-translation text data in descending order of the likelihood calculated during machine translation, or it may select the post-translation text data whose likelihood is higher than a predetermined threshold. Thus, as long as the selection unit 19 selects only from post-translation text data that the determination unit 18 has determined can be realized in this way, there is some freedom in the other aspects of the selection.
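
The likelihood-based variants described above can be sketched as a filter followed by a sort. The Python code below is a hedged illustration: the tuple layout `(text, likelihood, realizable)`, the function name `select_by_likelihood`, and the parameters `top_n` and `threshold` are assumptions introduced for this example, and the likelihood values are made up.

```python
from typing import List, Optional, Tuple

# (post-translation text, likelihood from machine translation, passed the tree check?)
Candidate = Tuple[str, float, bool]


def select_by_likelihood(
    candidates: List[Candidate],
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Keep realizable candidates, ranked by likelihood, then apply the limits."""
    # Only candidates accepted by the determination step are eligible.
    ranked = sorted(
        ((text, score) for text, score, ok in candidates if ok),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [text for text, _ in ranked]


if __name__ == "__main__":
    cands = [("A", 0.2, True), ("B", 0.9, False), ("C", 0.5, True), ("D", 0.7, True)]
    print(select_by_likelihood(cands, top_n=2))
    print(select_by_likelihood(cands, threshold=0.6))
```

Candidate "B" has the highest likelihood in this toy data but is never returned, because it fails the tree-structure check; the likelihood only ranks candidates that already satisfy the constraint.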

The flowchart of FIG. 16 describes the case in which one of the plurality of post-translation text data produced by the machine translation unit 13 is selected. If, however, no selectable post-translation text data is found even after all of the post-translation text data have been examined, the series of processes may be terminated as an error without outputting any post-translation text data. In that case, an indication of the error may or may not be output.

The specific method for determining whether post-translation text data satisfies the tree structure constraint is the same as in the specific examples given above under "Determination method in word-based statistical machine translation" and "Determination method in phrase-based statistical machine translation", and its description is omitted.
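
As a rough illustration of what satisfying the tree structure constraint means in the word-based case, the following Python sketch checks whether a target-side ordering of source leaves can be produced by reordering children of tree nodes. It is not the determination procedure of the sections referred to above. It rests on several assumptions: leaves are distinct integer source positions, each leaf yields exactly one target unit, "swapping nodes" allows any reordering of the children of a node, and the names `leaves` and `realizable` are hypothetical. Under those assumptions, an ordering is reachable exactly when the leaves of every subtree stay contiguous on the target side.

```python
from typing import List, Union

# A tree is either a leaf (an int source position) or a list of child trees.
Tree = Union[int, list]


def leaves(tree: Tree) -> List[int]:
    """Collect the leaf positions under a node, in source order."""
    if isinstance(tree, int):
        return [tree]
    collected: List[int] = []
    for child in tree:
        collected.extend(leaves(child))
    return collected


def realizable(tree: Tree, target_order: List[int]) -> bool:
    """True if target_order (source leaves listed in target-side order)
    can be obtained by reordering the children of nodes in the tree."""
    if sorted(target_order) != sorted(leaves(tree)):
        return False  # every leaf must appear exactly once
    position = {leaf: k for k, leaf in enumerate(target_order)}

    def contiguous(node: Tree) -> bool:
        if isinstance(node, int):
            return True
        spots = [position[leaf] for leaf in leaves(node)]
        # The leaves of this subtree must form one unbroken block.
        if max(spots) - min(spots) + 1 != len(spots):
            return False
        return all(contiguous(child) for child in node)

    return contiguous(tree)


if __name__ == "__main__":
    tree = [[0, 1], [2, [3, 4]]]  # hypothetical bracketing of a 5-word sentence
    print(realizable(tree, [3, 4, 2, 0, 1]))  # children reordered at two nodes
    print(realizable(tree, [0, 2, 1, 3, 4]))  # splits the subtree [0, 1]
```

A phrase-based variant would additionally have to treat each translated phrase as an indivisible unit, which this sketch does not attempt.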

Although the above embodiment describes the case where the machine translation apparatus is stand-alone, the machine translation apparatus may be either a stand-alone apparatus or a server apparatus in a server-client system. In the latter case, the output unit and the like output information via, for example, a communication line.

In the above embodiment, each process or function may be realized by centralized processing on a single apparatus or a single system, or by distributed processing across a plurality of apparatuses or a plurality of systems.

In the above embodiment, when two or more components included in the machine translation apparatus have a communication device, an input device, or the like, those components may share a physically single device or may have separate devices.

In the above embodiment, each component may be configured by dedicated hardware, or components that can be realized by software may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. The software that realizes the machine translation apparatus in the above embodiment is the following program. That is, this program causes a computer to function as: a machine translation unit that statistically machine-translates translation target text data, which is source-language text data to be translated and is stored in a translation target text data storage unit, using translation model information that is used in translation from the source language to the target language and is stored in a translation model information storage unit; a post-translation text data storage unit that stores post-translation text data, which is target-language text data obtained by the machine translation unit machine-translating the translation target text data; a syntax analysis unit that obtains tree structure information, which is information indicating the tree structure of the translation target text data, by parsing the translation target text data; a tree structure information storage unit that stores the tree structure information; a determination unit that determines whether the post-translation text data stored by the post-translation text data storage unit can be realized by converting, from the source language to the target language, the leaves of the tree structure that is indicated by the tree structure information stored by the tree structure information storage unit and is obtained from the translation target text data corresponding to that post-translation text data, and by swapping nodes of that tree structure; a selection unit that selects the post-translation text data that the determination unit has determined can be realized by the conversion of the leaves of the tree structure from the source language to the target language and the swapping of nodes of the tree structure; and an output unit that outputs the result of the selection by the selection unit.

The functions realized by the above program do not include functions that can be realized only by hardware. For example, functions that can be realized only by hardware such as a modem or an interface card in an output unit that outputs information are, at least, not included in the functions realized by the program.

This program may be executed after being downloaded from a server or the like, or by reading a program recorded on a predetermined recording medium (for example, an optical disc such as a CD-ROM, a magnetic disk, or a semiconductor memory).

The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.

FIG. 17 is a schematic diagram showing an example of the external appearance of a computer that executes the above program to realize the machine translation apparatus according to the above embodiment. The above embodiment is realized by computer hardware and a computer program executed on it.

In FIG. 17, a computer system 100 includes a computer 101 that contains a CD-ROM (Compact Disk Read Only Memory) drive 105 and an FD (Flexible Disk) drive 106, a keyboard 102, a mouse 103, and a monitor 104.

FIG. 18 is a diagram showing the computer system. In FIG. 18, the computer 101 includes, in addition to the CD-ROM drive 105 and the FD drive 106, a CPU (Central Processing Unit) 111; a ROM (Read Only Memory) 112 for storing programs such as a boot-up program; a RAM (Random Access Memory) 113 that is connected to the CPU 111, temporarily stores the instructions of application programs, and provides temporary storage space; a hard disk 114 that stores application programs, system programs, and data; and a bus 115 that interconnects the CPU 111, the ROM 112, and the other components. The computer 101 may also include a network card (not shown) that provides a connection to a LAN.

The program that causes the computer system 100 to execute the functions of the machine translation apparatus according to the above embodiment may be stored on a CD-ROM 121 or an FD 122, which is inserted into the CD-ROM drive 105 or the FD drive 106, and then transferred to the hard disk 114. Alternatively, the program may be transmitted to the computer 101 via a network (not shown) and stored on the hard disk 114. The program is loaded into the RAM 113 when it is executed. The program may also be loaded directly from the CD-ROM 121, the FD 122, or the network.

The program need not include an operating system (OS), a third-party program, or the like that causes the computer 101 to execute the functions of the machine translation apparatus according to the above embodiment. The program may contain only those instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the computer system 100 operates is well known, and a detailed description is omitted.

The present invention is not limited to the above embodiment; various modifications are possible, and it goes without saying that these are also included within the scope of the present invention.

As described above, the machine translation apparatus and the like according to the present invention make it possible to introduce a constraint based on tree structure information into machine translation without learning from syntax information, so that the problem of data sparseness does not arise. They are therefore useful as a machine translation apparatus and the like that performs machine translation.

FIG. 1 is a block diagram showing the configuration of a machine translation apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of a tree structure in the embodiment.
FIG. 3 is a flowchart showing the operation of the machine translation apparatus according to the embodiment.
FIG. 4 is a diagram showing an example of a tree structure in the embodiment.
FIG. 5 is a diagram showing an example of a tree structure in the embodiment.
FIG. 6 is a diagram showing an example of a tree structure in the embodiment.
FIG. 7 is a diagram showing an example of the correspondence between translation target text data and translated flags in the embodiment.
FIG. 8 is a diagram showing an example of a classification result produced by the classification unit in the embodiment.
FIG. 9 is a diagram for explaining the English-Japanese corpus in the experimental example of the embodiment.
FIG. 10 is a diagram showing an example of experimental results in the embodiment.
FIG. 11 is a diagram for explaining the English-Japanese corpus in the experimental example of the embodiment.
FIG. 12 is a diagram showing an example of experimental results in the embodiment.
FIG. 13 is a diagram for explaining the English-Chinese corpus in the experimental example of the embodiment.
FIG. 14 is a diagram showing an example of experimental results in the embodiment.
FIG. 15 is a block diagram showing another configuration of the machine translation apparatus in the embodiment.
FIG. 16 is a flowchart showing the operation of the machine translation apparatus in the embodiment.
FIG. 17 is a schematic diagram showing an example of the external appearance of the computer system in the embodiment.
FIG. 18 is a diagram showing an example of the configuration of the computer system in the embodiment.

Explanation of symbols

1, 2 Machine translation apparatus
11 Translation target text data storage unit
12 Translation model information storage unit
13 Machine translation unit
14 Post-translation text data storage unit
15 Syntax analysis unit
16 Tree structure information storage unit
17 Classification unit
18 Determination unit
19 Selection unit
20 Output unit

Claims

A machine translation apparatus comprising:
A translation target text data storage unit that stores translation target text data, which is text data in a source language to be translated;
A translation model information storage unit for storing translation model information used in translation from the source language to the target language;
Using the translation model information, a machine translation unit that statistically machine translates the text data to be translated,
A post-translation text data storage unit that stores post-translation text data that is text data of a target language obtained by machine-translating the text data to be translated by the machine translation unit;
A syntax analysis unit that obtains tree structure information that is information indicating a tree structure of the translation target text data by parsing the translation target text data;
A tree structure information storage unit for storing the tree structure information;
A determination unit that determines whether the post-translation text data stored by the post-translation text data storage unit can be realized by converting the leaves of a tree structure from the source language to the target language and swapping nodes of that tree structure, the tree structure being the one indicated by the tree structure information stored by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data;
A selection unit that selects the post-translation text data that the determination unit has determined can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure;
An output unit that outputs a selection result by the selection unit.
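
The determination described in this claim can be illustrated with a minimal Python sketch. It relies on assumptions the claim does not state: the tree is given as nested lists whose leaves are source word indices, a one-to-one word alignment is available as `order` (the source leaf indices in the sequence in which their translations appear in the target text), and the node operation named in the claim is read as freely reordering the children of a node. Under those assumptions, a translation is realizable exactly when the leaves of every subtree occupy one contiguous block of target positions. The names `leaves` and `realizable` are hypothetical.

```python
from typing import List, Sequence, Union

# A leaf is a source word index (int); an internal node is a list of children.
Tree = Union[int, Sequence["Tree"]]


def leaves(tree: Tree) -> List[int]:
    """Collect the source word indices under a subtree, left to right."""
    if isinstance(tree, int):
        return [tree]
    collected: List[int] = []
    for child in tree:
        collected.extend(leaves(child))
    return collected


def realizable(tree: Tree, order: Sequence[int]) -> bool:
    """Check whether `order` can be produced by reordering children of nodes.

    `order` lists source leaf indices in target-side sequence (an assumed
    one-to-one alignment). The check passes when every subtree's leaves
    form a contiguous block of target positions.
    """
    if sorted(order) != sorted(leaves(tree)):
        return False  # `order` must be a permutation of the tree's leaves
    position = {leaf: i for i, leaf in enumerate(order)}

    def contiguous(node: Tree) -> bool:
        if isinstance(node, int):
            return True
        if not all(contiguous(child) for child in node):
            return False
        spots = [position[leaf] for leaf in leaves(node)]
        return max(spots) - min(spots) + 1 == len(spots)

    return contiguous(tree)
```

For the tree `[[0, 1], [2, 3]]`, the order `[2, 3, 1, 0]` passes because both two-word subtrees stay together, while `[0, 2, 1, 3]` fails because the translations of words 0 and 1 are separated by word 2.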

The machine translation apparatus according to claim 1, wherein
the machine translation unit performs word-based statistical machine translation, and
the determination unit determines whether the post-translation text data stored by the post-translation text data storage unit can be realized by word-by-word conversion, from the source language to the target language, of the leaves of the tree structure indicated by the tree structure information stored by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data, and by swapping nodes of the tree structure.

The machine translation apparatus according to claim 1, wherein
the machine translation unit performs phrase-based statistical machine translation, and
the determination unit determines whether the post-translation text data stored by the post-translation text data storage unit can be realized by phrase-by-phrase conversion, from the source language to the target language, of the leaves of the tree structure indicated by the tree structure information stored by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data, and by swapping nodes of the tree structure without splitting any phrase, a phrase being the unit translated by the phrase-based statistical machine translation.

The machine translation apparatus according to any one of claims 1 to 3, wherein
the machine translation unit performs machine translation from left to right on the target side,
the post-translation text data includes text data that is in the course of translation,
the apparatus further comprises a classification unit that classifies each subtree of the tree structure indicated by the tree structure information stored by the tree structure information storage unit as one of: untranslated, indicating that the subtree contains only untranslated leaves or consists only of untranslated subtrees; translated, indicating that the subtree contains only translated leaves or consists only of translated subtrees; and in translation, indicating that the subtree contains only translated and untranslated leaves, contains only untranslated and translated subtrees, or contains only one subtree that is in translation,
the determination unit, using the classification result of the classification unit, determines that the post-translation text data cannot be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure when a subtree containing two or more in-translation subtrees appears, and
the machine translation unit, when adding new target-language text to the post-translation text data, adds the text such that the post-translation text data after the addition is selected by the selection unit.
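
The classification and the violation condition in this claim can be sketched in Python as follows. This is an illustration under assumptions: the tree is a nested list whose leaves are source word indices, and the translated flags are given as a set of leaf indices already covered by the partial translation. The sketch implements only the condition stated in the claim (a subtree with two or more in-translation children); it makes no claim to cover any further checks an actual decoder might apply. The name `classify` and the state constants are hypothetical.

```python
from typing import Sequence, Set, Tuple, Union

Tree = Union[int, Sequence["Tree"]]

UNTRANSLATED = "untranslated"
TRANSLATED = "translated"
IN_TRANSLATION = "in translation"


def classify(node: Tree, translated: Set[int]) -> Tuple[str, bool]:
    """Return (state, violation) for the subtree rooted at `node`.

    `violation` is True when this subtree, or any subtree below it,
    has two or more children classified as in translation.
    """
    if isinstance(node, int):
        state = TRANSLATED if node in translated else UNTRANSLATED
        return state, False

    results = [classify(child, translated) for child in node]
    states = [state for state, _ in results]
    violation = (
        any(flag for _, flag in results)
        or states.count(IN_TRANSLATION) >= 2
    )

    if all(state == UNTRANSLATED for state in states):
        return UNTRANSLATED, violation
    if all(state == TRANSLATED for state in states):
        return TRANSLATED, violation
    # Mixed children, or a single in-translation child: partially translated.
    return IN_TRANSLATION, violation
```

With the tree `[[0, 1], [2, 3]]`, translating only word 0 leaves one subtree in translation and is accepted, whereas translating words 0 and 2 puts both subtrees in translation at once and is flagged, so a left-to-right decoder following this scheme would not extend the hypothesis that way.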

The machine translation apparatus according to claim 1, wherein
the machine translation unit generates a plurality of pieces of post-translation text data corresponding to the translation target text data, and
the selection unit selects, from the plurality of pieces of post-translation text data, one or more pieces of post-translation text data that the determination unit has determined can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure.
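
Selection from several candidates, as described in this claim, amounts to filtering the candidate list with the determination result. The Python sketch below assumes a hypothetical candidate layout, a pair of target text and source leaf order, and takes the determination as a caller-supplied predicate; neither detail comes from the claim.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: (target text, source leaf indices in target order).
Candidate = Tuple[str, Sequence[int]]


def select_candidates(
    candidates: Sequence[Candidate],
    is_realizable: Callable[[Sequence[int]], bool],
) -> List[str]:
    """Keep the candidate texts that the determination predicate accepts.

    The input order is preserved, so a list already sorted by model score
    stays sorted after filtering.
    """
    return [text for text, order in candidates if is_realizable(order)]
```

Passing the determination as a parameter keeps the selection independent of how the tree constraint is checked.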

A machine translation method carried out using a translation target text data storage unit that stores translation target text data, which is text data in a source language to be translated, a translation model information storage unit that stores translation model information used in translation from the source language to a target language, a machine translation unit, a post-translation text data storage unit, a syntax analysis unit, a tree structure information storage unit, a determination unit, a selection unit, and an output unit, the method comprising:
A machine translation step in which the machine translation unit statistically machine translates the text data to be translated using the translation model information;
A post-translation text data storage step in which the post-translation text data storage unit stores post-translation text data, which is text data in the target language obtained by machine-translating the translation target text data in the machine translation step;
A syntax analysis step in which the syntax analysis unit obtains tree structure information which is information indicating a tree structure of the translation target text data by parsing the translation target text data;
A tree structure information storage step in which the tree structure information storage unit stores the tree structure information;
A determination step in which the determination unit determines whether the post-translation text data stored in the post-translation text data storage step can be realized by converting the leaves of a tree structure from the source language to the target language and swapping nodes of that tree structure, the tree structure being the one indicated by the tree structure information stored in the tree structure information storage step and obtained from the translation target text data corresponding to the post-translation text data;
A selection step in which the selection unit selects the post-translation text data determined in the determination step to be realizable by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure;
An output step in which the output unit outputs a selection result in the selection step.

A program for causing a computer to function as:
A machine translation unit that, using translation model information stored in a translation model information storage unit that stores translation model information used in translation from a source language to a target language, statistically machine-translates translation target text data, which is text data in the source language to be translated and is stored in a translation target text data storage unit;
A post-translation text data storage unit that stores post-translation text data, which is text data in the target language obtained by the machine translation unit machine-translating the translation target text data;
A syntax analysis unit that obtains tree structure information, which is information indicating a tree structure of the translation target text data, by parsing the translation target text data;
A tree structure information storage unit that stores the tree structure information;
A determination unit that determines whether the post-translation text data stored by the post-translation text data storage unit can be realized by converting the leaves of a tree structure from the source language to the target language and swapping nodes of that tree structure, the tree structure being the one indicated by the tree structure information stored by the tree structure information storage unit and obtained from the translation target text data corresponding to the post-translation text data;
A selection unit that selects the post-translation text data that the determination unit has determined can be realized by converting the leaves of the tree structure from the source language to the target language and swapping nodes of the tree structure; and
An output unit that outputs the selection result of the selection unit.