JP2022002090A

JP2022002090A - Information processing method, apparatus, device, and computer readable storage media

Info

Publication number: JP2022002090A
Application number: JP2021101296A
Authority: JP
Inventors: スーマンヂャン; si man Zhang; シュホングオ; Shuhong Guo; ウェイリィウ; Wei Liu; アンシンリ; An-Shin Lee; ランチェン; Lan Chen; 聡一朗村上; Soichiro Murakami
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2020-06-19
Filing date: 2021-06-18
Publication date: 2022-01-06
Also published as: CN113822082A

Abstract

PROBLEM TO BE SOLVED: To provide an information processing method, apparatus, device, and computer-readable storage medium that maintain consistency in the translation of words within a text. SOLUTION: The information processing method acquires first information to be processed (S101); when the first information to be processed contains a first substitute term, selects, from one or more candidate word sets, a first candidate word set that contains the first substitute term (S102), where each of the one or more candidate word sets contains at least two candidate words that have the same meaning but different expressions; obtains, by a constraint model, a first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set (S103); and generates second information to be processed by correcting the translation result of the first information to be processed according to the first constraint result (S104). SELECTED DRAWING: FIG. 1

Description

The present application relates to the field of information processing and, more specifically, to an information processing method, apparatus, device, and computer-readable storage medium.

In language translation, words with the same semantics may be expressed differently within a document, so it is very important to keep the translation of every word in a text consistent. This matters most when translating long documents, legal documents in particular.

In view of the above problem, the present disclosure provides an information processing method, apparatus, device, and computer-readable storage medium.

According to one aspect of the present disclosure, there is provided an information processing method including: acquiring first information to be processed; when the first information to be processed contains a first substitute term, selecting, from one or more candidate word sets, a first candidate word set that contains the first substitute term, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions; obtaining, by a constraint model, a first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set; and generating second information to be processed by correcting the translation result of the first information to be processed according to the first constraint result.

According to one aspect of the present disclosure, obtaining, by the constraint model, the first constraint result corresponding to the first substitute term in the first candidate word set includes: pairing each candidate word in the first candidate word set with one or more translation results of that candidate word to form translation pairs; and obtaining the first constraint result based on the features of each translation pair and/or the representation of each translation pair in a semantic space.
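The pairing step can be illustrated with a minimal sketch. The function name `build_translation_pairs` and the example words are hypothetical and are not taken from the disclosure; the sketch only assumes that each candidate word already has a list of translation results.

```python
def build_translation_pairs(candidate_translations):
    """Pair every candidate word with each of its translation results."""
    return [
        (word, translation)
        for word, translations in candidate_translations.items()
        for translation in translations
    ]


# Hypothetical candidate word set: same meaning, different expressions.
candidate_translations = {
    "agreement": ["契約", "合意"],
    "contract": ["契約"],
}
pairs = build_translation_pairs(candidate_translations)
```

Each resulting tuple is one translation pair that a constraint model could then score.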

According to one aspect of the present disclosure, the constraint model is a supervised first constraint model, and obtaining the first constraint result based on the features of each translation pair and/or the representation of each translation pair in the semantic space includes obtaining, by the first constraint model, the first constraint result based on the features of each translation pair. The supervised first constraint model is obtained by training on training data.

According to one aspect of the present disclosure, the constraint model is an unsupervised second constraint model, and obtaining the first constraint result based on the features of each translation pair and/or the representation of each translation pair in the semantic space includes obtaining, by the second constraint model, the first constraint result based on the representation of each translation pair in the semantic space. The unsupervised second constraint model does not need to be trained in advance.

According to one aspect of the present disclosure, obtaining, by the second constraint model, the first constraint result based on the representation of each translation pair in the semantic space includes obtaining the first constraint result based on the distance between the central representation of all translation pairs in the semantic space and the representation of each translation pair in the semantic space.
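A minimal sketch of such a centroid-based selection follows. It makes several assumptions that the text does not state: each translation pair already has a fixed-length vector in the semantic space, the central representation is the arithmetic mean of those vectors, the distance is Euclidean, and the pair nearest the centre is chosen. The name `select_by_centroid` and the vectors are hypothetical.

```python
import math


def select_by_centroid(pair_vectors):
    """Return the translation pair whose vector lies closest to the centroid."""
    vectors = list(pair_vectors.values())
    # Assumed central representation: component-wise mean of all pair vectors.
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    return min(pair_vectors, key=lambda pair: math.dist(pair_vectors[pair], centroid))


# Hypothetical two-dimensional semantic-space vectors for three pairs.
pair_vectors = {
    ("agreement", "契約"): [1.0, 0.0],
    ("contract", "契約"): [0.8, 0.2],
    ("agreement", "合意"): [0.0, 1.0],
}
best_pair = select_by_centroid(pair_vectors)
```

No training step is involved, which matches the statement that the unsupervised model does not need to be trained in advance.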

According to one aspect of the present disclosure, the constraint model is a third constraint model that includes a supervised first constraint model and an unsupervised second constraint model, and obtaining the first constraint result based on the features of each translation pair and/or the representation of each translation pair in the semantic space includes: obtaining, by the second constraint model, a second candidate word set containing N translation pairs (N being an integer of 2 or more) based on the representation of each translation pair in the semantic space; and obtaining, by the first constraint model, the first constraint result based on the features of each translation pair in the second candidate word set. The supervised first constraint model is obtained by training on training data, and the unsupervised second constraint model does not need to be trained in advance.
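The two-stage arrangement can be sketched as a shortlist followed by a re-ranking. In this illustration the unsupervised stage keeps the N pairs nearest the mean vector, and the supervised stage is stood in for by an arbitrary scoring function supplied by the caller; both choices, and all names and values, are assumptions rather than details given in the disclosure.

```python
import math


def shortlist_by_centroid(pair_vectors, n):
    """Unsupervised stage: keep the n pairs nearest the mean vector."""
    if n < 2:
        raise ValueError("N must be an integer of 2 or more")
    vectors = list(pair_vectors.values())
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    ranked = sorted(pair_vectors, key=lambda p: math.dist(pair_vectors[p], centroid))
    return ranked[:n]


def pick_with_scorer(pairs, score):
    """Supervised stage (stand-in): return the pair with the highest score."""
    return max(pairs, key=score)


pair_vectors = {
    ("agreement", "契約"): [1.0, 0.0],
    ("contract", "契約"): [0.8, 0.2],
    ("agreement", "合意"): [0.0, 1.0],
}
# Hypothetical scores; a trained model would produce these from pair features.
learned_scores = {
    ("agreement", "契約"): 5.0,
    ("contract", "契約"): 3.0,
    ("agreement", "合意"): 9.0,
}
shortlist = shortlist_by_centroid(pair_vectors, n=2)
result = pick_with_scorer(shortlist, learned_scores.get)
```

Note that the highest-scoring pair overall is never considered here because the first stage has already excluded it, which is the point of shortlisting before the trained model is applied.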

According to one aspect of the present disclosure, the distance is a Euclidean distance.

According to one aspect of the present disclosure, the features of a translation pair include the frequency with which one or more translations have been selected, whether the translation was selected recently, the length of the translation, whether it is included in a formal vocabulary, and its semantic relevance.
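For illustration, the listed features could be carried in a small record and flattened into a numeric vector before being passed to a supervised model. The class name, field names, and encoding below are assumptions; the disclosure names the features but not how they are represented.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    selection_count: int         # how often this translation has been selected
    recently_selected: bool      # whether it was selected recently
    translation_length: int      # length of the translation
    in_formal_vocabulary: bool   # whether it appears in a formal vocabulary
    semantic_relevance: float    # semantic relevance score

    def as_vector(self):
        """Flatten the features into a list of floats (booleans become 0/1)."""
        return [
            float(self.selection_count),
            1.0 if self.recently_selected else 0.0,
            float(self.translation_length),
            1.0 if self.in_formal_vocabulary else 0.0,
            self.semantic_relevance,
        ]


features = PairFeatures(4, True, 2, True, 0.9)
vector = features.as_vector()
```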

According to one aspect of the present disclosure, the first information to be processed is part of a first processing text, and the one or more candidate word sets are generated based on the first processing text, or based on one or more pieces of information to be processed that precede the first information to be processed.

According to one aspect of the present disclosure, a neural network is trained and used to generate the one or more candidate word sets based on the first processing text, or based on one or more pieces of information to be processed that precede the first information to be processed.

According to one aspect of the present disclosure, the second information to be processed is generated by correcting the translation result of the first information to be processed, according to the first constraint result, using constrained decoding.

According to one aspect of the present disclosure, before the first constraint result is obtained by the constraint model, words that do not appear in user personalization information are deleted from the first candidate word set.

According to one aspect of the present disclosure, the user personalization information is constructed based on one or more of the user's translation history, the user's stylistic tendencies, and the field of the translation.
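The filtering step can be sketched as follows, assuming the user personalization information has already been reduced to a set of words. The function name, the example words, and that reduction are hypothetical; the text does not say what happens if every candidate is removed, so the sketch simply returns an empty list in that case.

```python
def filter_by_user_vocabulary(candidate_set, user_vocabulary):
    """Drop candidate words that do not appear in the user's vocabulary."""
    return [word for word in candidate_set if word in user_vocabulary]


candidate_set = ["agreement", "contract", "deal"]
# Hypothetical vocabulary built from a user's translation history.
user_vocabulary = {"agreement", "contract"}
filtered = filter_by_user_vocabulary(candidate_set, user_vocabulary)
```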

According to one aspect of the present disclosure, there is provided an information processing method including: translating third information to be processed to generate fourth information to be processed; when the fourth information to be processed contains a third substitute term, selecting, from one or more candidate word sets, a third candidate word set that contains the third substitute term, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions; obtaining, by a constraint model, a third constraint result in the third candidate word set; and generating fifth information to be processed by correcting the fourth information to be processed according to the third constraint result.

According to one aspect of the present disclosure, obtaining, by the constraint model, the third constraint result in the third candidate word set includes obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the representation of each candidate word in a semantic space.

According to one aspect of the present disclosure, the constraint model is a supervised first constraint model, and obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the representation of each candidate word in the semantic space includes obtaining, by the first constraint model, the third constraint result based on the features of each candidate word in the third candidate word set. The supervised first constraint model is obtained by training on training data.

According to one aspect of the present disclosure, the constraint model is an unsupervised second constraint model, and obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the representation of each candidate word in the semantic space includes obtaining, by the second constraint model, the third constraint result based on the representation of each candidate word in the semantic space. The unsupervised second constraint model does not need to be trained in advance.

According to one aspect of the present disclosure, obtaining, by the second constraint model, the third constraint result based on the representation of each candidate word in the semantic space includes obtaining the third constraint result based on the distance between the central representation of all candidate words in the semantic space and the representation of each candidate word in the semantic space.

According to one aspect of the present disclosure, the constraint model is a third constraint model that includes a supervised first constraint model and an unsupervised second constraint model, and obtaining the third constraint result based on the features of each candidate word and/or the representation of each candidate word in the semantic space includes: obtaining, by the second constraint model, a fourth candidate word set containing M candidate words (M being an integer of 2 or more) based on the representation of each candidate word in the semantic space; and obtaining, by the first constraint model, the third constraint result based on the features of each candidate word in the fourth candidate word set. The supervised first constraint model is obtained by training on training data, and the unsupervised second constraint model does not need to be trained in advance.

According to one aspect of the present disclosure, the third information to be processed is part of a second processing text, and the one or more candidate word sets are generated based on the translated text of the second processing text, or based on the translated text of one or more pieces of information to be processed that precede the third information to be processed.

According to one aspect of the present disclosure, there is provided an information processing apparatus including: a first to-be-processed information acquisition unit for acquiring first information to be processed; a first candidate word set selection unit for selecting, when the first information to be processed contains a first substitute term, a first candidate word set that contains the first substitute term from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions; a first constraint result acquisition unit for obtaining, by a constraint model, a first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set; and a second to-be-processed information generation unit for generating second information to be processed by correcting the translation result of the first information to be processed according to the first constraint result.

According to one aspect of the present disclosure, there is provided an information processing apparatus including: a fourth to-be-processed information generation unit for translating third information to be processed to generate fourth information to be processed; a third candidate word set selection unit for selecting, when the fourth information to be processed contains a third substitute term, a third candidate word set that contains the third substitute term from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions; a third constraint result acquisition unit for obtaining, by a constraint model, a third constraint result in the third candidate word set; and a fifth to-be-processed information generation unit for generating fifth information to be processed by correcting the fourth information to be processed according to the third constraint result.

According to one aspect of the present disclosure, there is provided an information processing device including a processor and a memory storing computer-readable instructions that, when executed by the processor, cause an information processing method to be performed. The method includes: acquiring first information to be processed; when the first information to be processed contains a first substitute term, selecting, from one or more candidate word sets, a first candidate word set that contains the first substitute term, each of the one or more candidate word sets containing at least two words that have the same meaning but different expressions; obtaining, by a constraint model, a first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set; and generating second information to be processed by correcting the translation result of the first information to be processed according to the first constraint result.

According to one aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer-readable program, the program causing a computer to execute the information processing method according to any one of the above aspects.

In the above aspects of the present disclosure, substitute terms are constrained by the constraint model and their translations in the information are corrected, so that every occurrence of a given substitute term in a document is translated the same way, which improves the accuracy and professionalism of the translation.

The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description of its embodiments in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments, form part of the specification, and serve together with the embodiments to explain the present disclosure; they do not limit it. In the drawings, the same reference signs generally denote the same components or steps.

Brief description of the drawings (figure numbers other than FIGS. 2a and 2b are not given in the source):
Flowchart of a source-end-based information processing method according to an embodiment of the present disclosure.
FIG. 2a: Exemplary schematic diagram of obtaining one or more candidate word sets based on a dictionary according to an embodiment of the present disclosure.
FIG. 2b: Exemplary schematic diagram of obtaining one or more candidate word sets based on a dictionary according to an embodiment of the present disclosure.
Exemplary schematic diagram showing one or more candidate word sets based on a neural network according to an embodiment of the present disclosure.
Flowchart of a method of obtaining the first constraint result by the constraint model according to an embodiment of the present disclosure.
Exemplary schematic diagram of obtaining the first constraint result by the first constraint model according to an embodiment of the present disclosure.
Flowchart of a method of obtaining the first constraint result by the second constraint model according to an embodiment of the present disclosure.
Exemplary schematic diagram of obtaining the first constraint result by the second constraint model according to an embodiment of the present disclosure.
Flowchart of a method of obtaining the first constraint result by the third constraint model according to an embodiment of the present disclosure.
Exemplary schematic diagram of a source-end-based information processing method according to an embodiment of the present disclosure.
Another exemplary schematic diagram of a source-end-based information processing method according to an embodiment of the present disclosure.
Flowchart of another, target-end-based information processing method according to an embodiment of the present disclosure.
Flowchart of a method of obtaining the third constraint result by the second constraint model according to an embodiment of the present disclosure.
Flowchart of a method of obtaining the third constraint result by the third constraint model according to an embodiment of the present disclosure.
Exemplary schematic diagram of a target-end-based information processing method according to an embodiment of the present disclosure.
Another exemplary schematic diagram of a target-end-based information processing method according to an embodiment of the present disclosure.
Schematic diagram of translation results according to an embodiment of the present disclosure.
Functional block diagram of an information processing apparatus according to an embodiment of the present disclosure.
Functional block diagram of another information processing apparatus according to an embodiment of the present disclosure.
Schematic diagram of an information processing device according to an embodiment of the present disclosure.
Diagram of an example hardware configuration of an electronic device according to an embodiment of the present disclosure.

The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art can obtain from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present application.

The present application uses flowcharts to describe the steps of the methods according to its embodiments. It should be understood that the steps are not necessarily performed exactly in the order shown. Steps may instead be performed in reverse order or simultaneously. Other operations may also be added to these processes, and one or more steps may be removed from them.

First, an information processing method 100 for realizing an embodiment of the present disclosure is described with reference to FIG. 1. In the present disclosure, the first substitute term is constrained by a constraint model and its translation in the information is corrected, so that every occurrence of the first substitute term in a text is translated the same way, which improves the accuracy and professionalism of the translation.

Embodiments of the present disclosure and examples thereof are described in detail below with reference to the drawings.

At least one embodiment of the present disclosure provides an information processing method, apparatus, device, and computer-readable storage medium. The information processing provided by at least one embodiment of the present disclosure is described below, without limitation, through several examples and embodiments. As described below, where they do not contradict each other, different features of these examples and embodiments may be combined to obtain new examples and embodiments, which also fall within the scope protected by the present disclosure.

A source-end-based information processing method according to an embodiment of the present disclosure is described below with reference to FIGS. 1 to 10.

First, a source-end-based information processing method according to an embodiment of the present disclosure is described with reference to FIG. 1. The method can be carried out automatically by a computer or the like. For example, it can be applied to translating information or text. The information processing method can be implemented in software, hardware, firmware, or any combination thereof, and can be loaded and executed by the processor of a device such as a mobile phone, tablet, laptop, desktop computer, or web server.

As shown in FIG. 1, the information processing method includes the following steps S101 to S104.

In step S101, first information to be processed is acquired.

In step S102, when the first information to be processed contains a first substitute term, a first candidate word set that contains the first substitute term is selected from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions.

In step S103, a first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set, is obtained by a constraint model.

In step S104, second information to be processed is generated by correcting the translation result of the first information to be processed according to the first constraint result.
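Steps S101 to S104 can be sketched end to end as follows. This is an illustration under stated assumptions, not the disclosed implementation: the translator, the constraint model, and the per-word translation table are all passed in as hypothetical stand-ins, terms are detected by naive substring matching, and the correction is a plain string replacement rather than constrained decoding.

```python
def process(sentence, candidate_sets, word_translations, constrain, translate):
    """Translate a sentence and make substitute-term translations consistent."""
    translation = translate(sentence)  # translation result of the S101 input
    for candidate_set in candidate_sets:
        # S102: does the source sentence contain a term from this set?
        found = [word for word in candidate_set if word in sentence]
        if not found:
            continue
        # S103: the constraint model yields one agreed translation for the set.
        constraint = constrain(candidate_set)
        # S104: overwrite each found term's own translation with that result.
        for word in found:
            translation = translation.replace(word_translations[word], constraint)
    return translation


# Toy stand-ins (hypothetical): a word-substitution "translator" and a fixed
# constraint result in place of a real constraint model.
word_translations = {"agreement": "ACCORD", "contract": "CONTRAT"}


def toy_translate(text):
    for source, target in word_translations.items():
        text = text.replace(source, target)
    return text


output = process(
    "the agreement is signed",
    [("agreement", "contract")],
    word_translations,
    constrain=lambda candidate_set: "CONTRAT",
    translate=toy_translate,
)
```

In the toy run, "agreement" would normally be rendered as "ACCORD", but the constraint result forces the same rendering that "contract" receives.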

Regarding step S101, the first information to be processed may be, for example, a paragraph or a whole piece of text to be translated, or a single sentence to be translated. It may be in any language (for example, Chinese, English, or Japanese), without limitation. With the information processing method of the present disclosure, the first information to be processed can be translated into a desired text (for example, Chinese into English or Japanese, without limitation), while substitute terms that have the same or similar meaning are translated consistently.

Regarding step S102, for the first substitute term there may exist, for example, nouns or pronouns that have the same meaning as it but a different expression. For example, the first substitute term is included in the first candidate word set, and the first candidate word set may contain at least two candidate words that have the same meaning but different expressions. Selecting, from one or more candidate word sets, the first candidate word set that contains the first substitute term is called a co-reference query.
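A co-reference query of this kind reduces to a membership lookup, as in the minimal sketch below. The function name and example sets are hypothetical, and returning `None` when no set contains the term is an assumption about a case the text does not address.

```python
def coreference_query(term, candidate_sets):
    """Return the first candidate word set that contains the term, else None."""
    for candidate_set in candidate_sets:
        if term in candidate_set:
            return candidate_set
    return None


candidate_sets = [
    {"agreement", "contract"},
    {"buyer", "purchaser"},
]
selected = coreference_query("purchaser", candidate_sets)
```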

The above description of the first information to be processed, the first substitute term, and the first candidate word set is only an example. It should be understood that the first information to be processed may include a plurality of candidate word sets, and no limitation is intended here.

For example, the first information to be processed is a part of a first processing text, and the one or more candidate word sets may be generated based on the first processing text, or based on one or more pieces of information to be processed that precede the first information to be processed.

As one example, the first processing text may be a document to be translated; when the first information to be processed is a sentence in that document, the one or more candidate word sets can be generated based on the whole document.

Alternatively, as another example, the first processing text may be a document to be translated; when the first information to be processed is a sentence in that document, the one or more candidate word sets can be generated based on one or more sentences preceding that sentence.

For example, one or more candidate word sets can be obtained based on word translations in a dictionary or the like. FIGS. 2(a) and 2(b) are schematic diagrams of an illustrative example of obtaining one or more candidate word sets based on word translations in a dictionary according to an embodiment of the present disclosure.

As shown in FIGS. 2(a) and 2(b), in Chinese-English translation one English term may correspond to several Chinese terms, and one Chinese term may likewise correspond to several English terms. For example, when translating English into Chinese, one English term (for example, "Docomo Beijing Labs" in FIG. 2(a)) may correspond to several Chinese translations (for example, "都科摩北京研究所", "DOCOMO北京研究所", and "DOCOMO北京研" in FIG. 2(a)). Conversely, when translating Chinese into English, one Chinese term (for example, "都科摩北京研究所" in FIG. 2(b)) may correspond to several English translations (for example, "Docomo Beijing Labs", "Docomo beijing communications laboratory co. Ltd.", and "DBL" in FIG. 2(b)). This leads to inconsistent translation results. Moreover, information processing based on a dictionary or the like can handle only the entity words (for example, nouns) in the dictionary and cannot handle pronouns contained in the information.

The present disclosure provides a method of obtaining candidate word sets based on a neural network. With this method, an existing neural network can obtain candidate word sets for all words (including nouns and pronouns) contained in the information. It should be understood that candidate word sets can also be obtained by other methods (for example, statistical methods), and no limitation is imposed here.

As one example, a trained neural network can generate the one or more candidate word sets based on the first processing text, or based on one or more pieces of information to be processed that precede the first information to be processed.

For example, based on semantic relatedness, entailment relations, and the like, the neural network may extract from its input the terms and/or pronouns that match in meaning but differ in expression, and form them into a candidate word set. As one example, the neural network may take the word vector (that is, the semantic space representation) of a word, pair and score it against the word vectors of other words, and take the highest-scoring word as a candidate word in that word's candidate word set.
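The pair-and-score step can be illustrated with a small sketch. Cosine similarity is used here only as a stand-in for the learned scoring function, which the text does not specify; the function names and the three-dimensional toy vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_coreferent(word, vectors):
    """Score `word` against every other word and return the top-scoring one."""
    scores = {
        other: cosine(vectors[word], vec)
        for other, vec in vectors.items()
        if other != word
    }
    return max(scores, key=scores.get)


# Toy vectors, invented for illustration only.
vectors = {
    "北京研究所": [0.9, 0.1, 0.2],
    "都科摩北京研究所": [0.8, 0.2, 0.2],
    "天气": [0.0, 0.9, 0.4],
}
match = best_coreferent("北京研究所", vectors)
```

In a real system the vectors would come from a trained encoder, and a score threshold would likely be needed so that unrelated words are not forced into a set.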

FIG. 3 is a schematic diagram of an illustrative example of obtaining one or more candidate word sets based on a neural network according to an embodiment of the present disclosure. Candidate words that match in meaning but differ in expression are called co-referent words, and the method of obtaining a candidate word set can be called co-referent word discovery (or co-reference resolution). In general, co-referent word discovery pairs a given expression with every preceding word (noun or noun phrase) and scores each pair.

It is readily understood that the neural network may adopt different network structures, including but not limited to convolutional neural networks and recurrent neural networks (RNNs). The convolutional neural networks include, but are not limited to, U-Net, ResNet, and DenseNet.

The neural-network-based method of obtaining one or more candidate word sets shown in FIG. 3 is only an example; one or more candidate word sets can also be obtained by other methods (for example, statistical methods), and no limitation is imposed here.

After the one or more candidate word sets have been obtained, the first candidate word set that contains the first substitute term can be selected from them. Next, in step S103, the constraint model is used to obtain the first constraint result, namely the translation corresponding to the first substitute term in the first candidate word set.

Methods of obtaining the first constraint result according to embodiments of the present disclosure are described below with reference to FIGS. 4 to 8.

FIG. 4 is a flowchart of a method 200 of obtaining the first constraint result with a constraint model according to an embodiment of the present disclosure. For example, as shown in FIG. 4, obtaining the first constraint result with the constraint model can include: pairing each candidate word in the first candidate word set with one or more translation results of each candidate word to form translation pairs (S201); and obtaining the first constraint result based on the features of each translation pair and/or the semantic space representation of each translation pair (S202).
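Step S201 can be sketched as a Cartesian product of candidate words and translation results, which matches the way the FIG. 5 example pairs every candidate with every translation. The helper name `build_translation_pairs` is hypothetical; the terms are those given for FIG. 5.

```python
from itertools import product


def build_translation_pairs(candidates, translations):
    """Pair every candidate word with every translation result (S201)."""
    return list(product(candidates, translations))


candidates = ["都科摩北京研究所", "北京研究所", "DOCOMO北京研"]
translations = [
    "Docomo Beijing Labs",
    "Docomo beijing communications laboratory co. Ltd.",
    "DBL",
]
translation_pairs = build_translation_pairs(candidates, translations)
```

Three candidates and three translations give nine pairs; step S202 then ranks those pairs.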

FIG. 5 is a schematic diagram of an illustrative example of obtaining the first constraint result with a first constraint model according to an embodiment of the present disclosure.

For example, the constraint model is a supervised first constraint model. In that case, obtaining the first constraint result based on the features of each translation pair and/or the semantic space representation of each translation pair can include obtaining the first constraint result with the first constraint model based on the features of each translation pair. The supervised first constraint model is obtained by training on training data.

For example, the features of a translation pair may include one or more of: how often the translation has been selected, whether it was selected recently, the length of the translation, whether it is included among formal terms, the semantic relatedness, and the word embedding of the expression. These features may be obtained from a user dictionary, big-data statistics, and the like, without limitation. It should be understood that the features of a translation pair are not limited to those listed above, and other features may be added as needed.

As shown in FIG. 5, when translating Chinese into English, suppose the first substitute term is "都科摩北京研究所" and the corresponding first candidate word set is {都科摩北京研究所, 北京研究所, DOCOMO北京研}. The translation results corresponding to the candidate word set may then be {Docomo Beijing Labs, Docomo beijing communications laboratory co. Ltd., DBL}.

First, each of {都科摩北京研究所, 北京研究所, DOCOMO北京研} is paired with each of {Docomo Beijing Labs, Docomo beijing communications laboratory co. Ltd., DBL} to form translation pairs 71. Next, feature extraction 72 is performed on every translation pair 71 using any suitable existing method. For example, the following features may be extracted for each translation pair: (1) how often the translation has been selected, (2) whether it was selected recently, (3) the length of the translation, (4) whether it is included among formal terms, and (5) the semantic relatedness. As shown in FIG. 5, the features extracted for the translation pair "都科摩北京研究所 - Docomo Beijing Labs" are (1) selection frequency: 28, (2) recently selected: 0, (3) translation length: 3, (4) included among formal terms: 0, and (5) semantic relatedness: 3.12. A score 74 is then computed for each translation pair from its features. The scores are passed through, for example, a neural network 75 that includes a softmax to obtain the probability 76 corresponding to each score. Finally, the translation of the translation pair with the highest selection probability 76 is taken as the first constraint result. For example, if the translation pair "都科摩北京研究所 - Docomo beijing communications laboratory co. Ltd." has the highest probability, "Docomo beijing communications laboratory co. Ltd." is taken as the first constraint result for the first substitute term "都科摩北京研究所".
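The scoring procedure of FIG. 5 can be sketched as a weighted sum of features followed by a softmax. This is an illustrative reading, not the trained model: the weights (labelled k1 to k5 in FIG. 5) and the feature vectors of the second and third pairs are invented, and only the first feature vector comes from the text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_translation(pairs, features, weights):
    """Return the translation of the most probable pair, plus all probabilities."""
    scores = [sum(w * f for w, f in zip(weights, features[p])) for p in pairs]
    probs = softmax(scores)
    best = max(range(len(pairs)), key=probs.__getitem__)
    return pairs[best][1], probs


term = "都科摩北京研究所"
pairs = [
    (term, "Docomo Beijing Labs"),
    (term, "Docomo beijing communications laboratory co. Ltd."),
    (term, "DBL"),
]
# Feature order: frequency, recently selected, length, formal term, relatedness.
features = {
    pairs[0]: [28, 0, 3, 0, 3.12],  # values given for FIG. 5
    pairs[1]: [40, 1, 6, 1, 3.50],  # invented
    pairs[2]: [5, 0, 1, 0, 2.00],   # invented
}
weights = [0.01, 0.5, -0.1, 1.0, 0.5]  # stand-ins for learned k1..k5
constraint, probabilities = select_translation(pairs, features, weights)
```

With these assumed numbers the second pair scores highest, so its translation is returned as the constraint result; different learned weights would of course change the ranking.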

For example, the supervised first constraint model may be obtained by training in advance on labeled training data. The supervised first constraint model may be a neural network model, a statistical model, or the like, without limitation.

For example, in FIG. 5, k1, k2, k3, k4, and k5 may be the weights that the supervised first constraint model assigns to the respective features, and may be obtained by training the neural network with a loss function. It is readily understood that the supervised first constraint model according to embodiments of the present disclosure may adopt different network structures, including but not limited to convolutional neural networks and recurrent neural networks (RNNs). The convolutional neural networks include, but are not limited to, U-Net, ResNet, and DenseNet. It should also be understood that the present disclosure is not limited to translating Chinese into English and can be applied to any desired translation, such as Chinese-Japanese, English-Japanese, or German-Chinese translation.

FIG. 6 is a flowchart of a method 300 of obtaining the first constraint result with a second constraint model according to an embodiment of the present disclosure. FIG. 7 is a schematic diagram of an illustrative example of obtaining the first constraint result with the second constraint model according to an embodiment of the present disclosure.

For example, the constraint model may be an unsupervised second constraint model. In that case, obtaining the first constraint result based on the features of each translation pair and/or the semantic space representation of each translation pair can include obtaining the first constraint result with the second constraint model based on the semantic space representation of each translation pair. The unsupervised second constraint model does not need to be trained in advance.

For example, the unsupervised second constraint model may be TextRank, an embedding center (EmbeddingCenter) method, or the like, without limitation.

As one example, obtaining the first constraint result with the second constraint model based on the semantic space representation of each translation pair can include obtaining the first constraint result based on the distance between the central representation of the semantic space of all translation pairs and the semantic space representation of each translation pair.

As shown in FIG. 6, this can include: obtaining the central representation of the semantic space of all translation pairs (S301); obtaining the distance between that central representation and the semantic space representation of each translation pair (S302); and selecting the translation result of the translation pair with the smallest distance as the first constraint result (S303).

For example, as shown in FIG. 7, when translating Chinese into English, suppose the first substitute term is "都科摩北京研究所" and the corresponding first candidate word set is {都科摩北京研究所, 北京研究所, DOCOMO北京研}. The translation results corresponding to the candidate word set may then be {Docomo Beijing Labs, Docomo beijing communications laboratory co. Ltd., DBL}.

First, each of {都科摩北京研究所, 北京研究所, DOCOMO北京研} is paired with each of {Docomo Beijing Labs, Docomo beijing communications laboratory co. Ltd., DBL} to form translation pairs 81. Next, for every translation pair 81, semantic space representation extraction 82 is performed using any suitable existing method (for example, an existing neural network, without limitation) to obtain the semantic space representation 83 of each translation pair. As shown in FIG. 7, the semantic space representation extracted for the translation pair "都科摩北京研究所 - Docomo Beijing Labs" is X1 = [4.511, 0.768, 3.886, 4.008, 1.604]. Next, the central representation 84 of the semantic space of all translation pairs is obtained, for example by averaging the semantic space representations of all translation pairs. The distance between the semantic space representation of each translation pair and the central representation is then obtained (the distance may be, for example, the Euclidean, Manhattan, or Mahalanobis distance). Finally, the translation result of the translation pair with the smallest distance is selected as the first constraint result. As shown in FIG. 7, the distance corresponding to X3 is the smallest, so "Docomo beijing communications laboratory co. Ltd." is taken as the first constraint result 85 for the first substitute term "都科摩北京研究所".
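Steps S301 to S303 can be sketched with plain averaging and Euclidean distance, two of the options the text names. Only X1 is taken from FIG. 7; X2 and X3 are invented so that the example has a clear nearest point, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors (S301)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def closest_to_centre(embeddings):
    """Return the key nearest the centroid and all distances (S302, S303)."""
    centre = centroid(list(embeddings.values()))
    distances = {
        key: math.dist(vec, centre) for key, vec in embeddings.items()
    }
    return min(distances, key=distances.get), distances


embeddings = {
    "X1": [4.511, 0.768, 3.886, 4.008, 1.604],  # value given for FIG. 7
    "X2": [1.0, 2.0, 1.0, 1.0, 3.0],            # invented
    "X3": [3.0, 1.5, 2.5, 2.5, 2.2],            # invented
}
nearest, distances = closest_to_centre(embeddings)
```

`math.dist` requires Python 3.8 or later. Swapping in Manhattan or Mahalanobis distance would only change the distance call.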

It should be understood that the way of obtaining the central representation of the semantic space and the distances is not limited to that described with reference to FIG. 7; other suitable methods can also be adopted.

FIG. 8 is a flowchart of a method 400 of obtaining the first constraint result with a third constraint model according to an embodiment of the present disclosure. The third constraint model includes the supervised first constraint model and the unsupervised second constraint model; the supervised first constraint model is obtained by training on training data, whereas the unsupervised second constraint model does not need to be trained in advance. The method includes the following steps S401 and S402.

In step S401, the second constraint model is used to obtain, based on the semantic space representation of each translation pair, a second candidate word set containing N translation pairs, where N is an integer of 2 or more.

In step S402, the first constraint model is used to obtain the first constraint result based on the features of each translation pair in the second candidate word set.

For example, in step S401, the distance between the central representation of the semantic space of all translation pairs and the semantic space representation of each translation pair may be obtained by the method described with reference to FIG. 6, the translation pairs may be sorted in ascending order of distance, and the first N translation pairs may be selected as the second candidate word set. Next, for the second candidate word set, the first constraint result is obtained with, for example, the first constraint model shown in FIG. 5.

For example, after sorting in ascending order of distance, the translation pairs corresponding to the two smallest distances are selected as the second candidate word set (for example, the translation pairs "都科摩北京研究所 - Docomo beijing communications laboratory co. Ltd." and "都科摩北京研究所 - Docomo Beijing Labs"). The two selected translation pairs are then input to the first constraint model described with reference to FIG. 5, which obtains the first constraint result based on the features of each translation pair in the second candidate word set.
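The two-stage method of FIG. 8 can be sketched as an unsupervised shortlist followed by a supervised re-ranking. All numbers below are invented, the function names are hypothetical, and the softmax is omitted because it is monotonic and does not change which pair ranks first.

```python
import math


def shortlist_by_centre_distance(embeddings, n):
    """S401: keep the n pairs whose embeddings lie closest to the centroid."""
    vectors = list(embeddings.values())
    centre = [sum(column) / len(vectors) for column in zip(*vectors)]
    ranked = sorted(embeddings, key=lambda k: math.dist(embeddings[k], centre))
    return ranked[:n]


def rerank(shortlist, features, weights):
    """S402: pick the shortlisted pair with the highest weighted feature score."""
    return max(
        shortlist,
        key=lambda k: sum(w * f for w, f in zip(weights, features[k])),
    )


LABS = "Docomo Beijing Labs"
COMM = "Docomo beijing communications laboratory co. Ltd."
# Toy two-dimensional embeddings and features, invented for illustration.
embeddings = {LABS: [1.0, 1.0], COMM: [1.2, 0.9], "DBL": [4.0, 3.0]}
features = {LABS: [0.2, 1.0], COMM: [0.9, 0.4], "DBL": [5.0, 5.0]}
weights = [1.0, 0.5]

second_candidate_set = shortlist_by_centre_distance(embeddings, n=2)
constraint = rerank(second_candidate_set, features, weights)
```

In this toy data "DBL" has the highest feature score but is an outlier in embedding space, so the shortlist removes it before the supervised model is consulted.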

Methods of obtaining the first constraint result according to embodiments of the present disclosure have been described above with reference to FIGS. 4 to 8. Before the first constraint result is obtained with the constraint model, words that do not appear in user personalization information may be deleted from the first candidate word set.
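The pre-filtering step can be sketched as a set intersection, assuming the user personalization information has already been reduced to a vocabulary of known words. That reduction, the function name, and the sample vocabulary are all assumptions made for illustration.

```python
def filter_by_user_vocabulary(candidate_set, user_vocabulary):
    """Keep only candidate words that also occur in the user's vocabulary."""
    return {word for word in candidate_set if word in user_vocabulary}


first_candidate_set = {"都科摩北京研究所", "北京研究所", "DOCOMO北京研"}
# Hypothetical vocabulary derived from a user's translation history.
user_vocabulary = {"都科摩北京研究所", "DOCOMO北京研", "专利"}
filtered = filter_by_user_vocabulary(first_candidate_set, user_vocabulary)
```

The text does not say what should happen if every word is removed; a practical system would probably fall back to the unfiltered set.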

For example, the user personalization information may be built from one or more of the user's translation history, the user's stylistic preferences, and the field of translation. For example, before information processing (for example, translation) is performed with this method, a desired rule can be selected automatically according to the user's stylistic preferences (for example, written or spoken style, gender, age, or identity) and the field of translation (for example, law, news, medicine, or intellectual property), after which the constraint result is optimized according to that rule. Alternatively, before information processing is performed with this method, the user's preferred translation style may be learned from the user's translation history and user information, after which the constraint result is optimized according to that style. Optimizing the constraint result based on the user personalization information is only an example; other effective methods can also be adopted to optimize the constraint result, and they are not described further here.

Returning to FIG. 1, in step S104 the translation result of the first information to be processed is revised according to the first constraint result, thereby generating the second information to be processed.

For example, the translation result of the first information to be processed can be revised according to the first constraint result by constrained decoding, thereby generating the second information to be processed. Constrained decoding can apply the first constraint result to the first information to be processed while keeping the sentence fluent. It should be understood that the way of revising the translation is not limited to this; other known techniques can also be used, and they are not described further here.
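True constrained decoding operates inside the translation model's search and is beyond a short example. As a deliberately simpler stand-in, the sketch below shows one of the "other known techniques": post-editing a finished translation by string replacement. The function name and the sample sentence are invented, and unlike constrained decoding this approach does nothing to preserve fluency.

```python
import re


def enforce_constraint(translation, variants, constraint):
    """Replace every non-preferred variant in `translation` with `constraint`."""
    # Longest variants first, so a short variant never pre-empts a longer one.
    others = sorted(
        (v for v in variants if v != constraint), key=len, reverse=True
    )
    if not others:
        return translation
    pattern = re.compile("|".join(re.escape(v) for v in others))
    return pattern.sub(constraint, translation)


variants = [
    "Docomo Beijing Labs",
    "Docomo beijing communications laboratory co. Ltd.",
    "DBL",
]
constraint = variants[1]
draft = "Docomo Beijing Labs released a report. DBL also hosted a workshop."
revised = enforce_constraint(draft, variants, constraint)
```

This naive version assumes that no variant is a substring of the chosen constraint; otherwise existing correct occurrences would be rewritten as well.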

Examples of the source-end information processing method according to embodiments of the present disclosure are described below with reference to FIGS. 9 and 10.

As described above, the first processing text may be a document to be translated; when the first information to be processed is a sentence in that document, the one or more candidate word sets can be generated based on the whole document. Alternatively, the one or more candidate word sets may be generated based on one or more sentences preceding that sentence.

FIG. 9 is an illustrative schematic diagram of generating one or more candidate word sets based on the whole document to be translated, and FIG. 10 is an illustrative schematic diagram of generating one or more candidate word sets based on one or more sentences preceding the sentence.

As shown in FIGS. 9 and 10, for the first information to be processed (that is, the current sentence 21), when the first information to be processed contains the first substitute term, the first candidate word set that contains the first substitute term (that is, the relevant candidate word set 23) is first selected by a co-reference query. Next, the constraint model is used to obtain the first constraint result 25, namely the translation corresponding to the first substitute term in the first candidate word set, and the translation result 27 of the first information to be processed is revised 26 according to the first constraint result 25, thereby generating the second information to be processed (that is, the revised translation result 28).

In FIG. 9, the one or more candidate word sets 31 are generated from the whole document to be translated (that is, the full text 29) by the method described with reference to FIG. 3 (for example, co-referent word discovery 30). Alternatively, in FIG. 10, the one or more candidate word sets 31 are generated from one or more sentences 32 preceding the sentence by the method described with reference to FIG. 3 (for example, co-referent word discovery 33).

The source-end information processing method according to embodiments of the present disclosure has been described above with reference to FIGS. 1 to 10. The target-end information processing method according to embodiments of the present disclosure is described below with reference to FIGS. 11 to 15.

First, a target-end information processing method 500 according to an embodiment of the present disclosure is described with reference to FIG. 11. The method can be carried out automatically by a computer or the like. For example, the method can be applied to translating information or text. For example, the information processing method can be implemented in software, hardware, firmware, or any combination thereof, and can be loaded and executed by the processor of a device such as a mobile phone, tablet, laptop, desktop computer, or web server.

図１１に示すように、当該情報処理方法は以下のステップＳ５０１〜Ｓ５０４を含む。 As shown in FIG. 11, the information processing method includes the following steps S501 to S504.

ステップＳ５０１において、第３の処理すべき情報を翻訳して、第４の処理すべき情報を生成する。 In step S501, the third information to be processed is translated to generate the fourth information to be processed.

ステップＳ５０２において、前記第４の処理すべき情報に第３の代用語が含まれると、１つ又は複数の候補単語集合から前記第３の代用語を含む第３の候補単語集合を選択し、前記１つ又は複数の候補単語集合のそれぞれは少なくとも２つの意味が一致し、表現が一致しない候補単語を含む。 In step S502, when the third substitute word is included in the fourth information to be processed, the third candidate word set containing the third substitute word is selected from one or more candidate word sets. Each of the one or more candidate word sets includes candidate words whose meanings match at least two and whose expressions do not match.

In step S503, a third constraint result for the third candidate word set is obtained by a constraint model.

In step S504, the fourth information to be processed is modified according to the third constraint result to generate fifth information to be processed.
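The flow of steps S501 to S504 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `target_end_process` and its parameters are hypothetical, the translator and the constraint model are passed in as plain callables, and step S504 is approximated by simple string substitution rather than by the constrained decoding that the disclosure mentions later.

```python
import re
from typing import Callable, Iterable, Set


def target_end_process(
    source: str,
    translate: Callable[[str], str],
    candidate_sets: Iterable[Set[str]],
    constraint_model: Callable[[Set[str]], str],
) -> str:
    """Sketch of steps S501 to S504 for one piece of information."""
    # S501: translate the third information into the fourth information.
    draft = translate(source)
    for candidate_set in candidate_sets:
        # S502: select a set only if one of its terms occurs in the draft.
        present = [term for term in candidate_set if term in draft]
        if not present:
            continue
        # S503: the constraint model returns one term of the selected set.
        chosen = constraint_model(candidate_set)
        # S504 (stand-in): replace every occurring term with the chosen one.
        # Longest terms come first so that a long term is not split by a
        # shorter term that happens to be contained in it.
        pattern = "|".join(
            re.escape(term) for term in sorted(present, key=len, reverse=True)
        )
        draft = re.sub(pattern, lambda match: chosen, draft)
    return draft
```

Passing the translator and the constraint model as callables keeps the sketch independent of any particular translation engine or model.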

Regarding step S501, the third information to be processed may be, for example, a paragraph or a whole document to be translated, or a single sentence to be translated. It may be in any language (for example, Chinese, English, or Japanese), without limitation here. With the information processing method of the present disclosure, the third information to be processed can be translated into the desired text (for example, Chinese into English or Japanese, without limitation here), and the translations of substitute terms that have the same or similar meanings are made consistent.

Regarding step S502, for example, there may exist nouns or pronouns whose meaning matches that of the third substitute term but whose expression differs from it. For example, the third substitute term is contained in the third candidate word set, and the third candidate word set may contain at least two candidate words that have the same meaning but different expressions. The third substitute term and the third candidate word set have the same meanings as the first substitute term and the first candidate word set described above for FIG. 1, so they are not described again here.

For example, the third information to be processed is a part of a second processing text, and the one or more candidate word sets are generated based on the translated text of the second processing text, or based on the translated text of one or more pieces of information to be processed that precede the third information to be processed.

As one example, the second processing text may be a document to be translated and the third information to be processed may be a sentence in that document; the one or more candidate word sets can then be generated based on the translated text of one or more pieces of information to be processed that precede the third information to be processed.

Alternatively, as another example, the second processing text may be a document to be translated and the third information to be processed may be a sentence in that document; the one or more candidate word sets can then be generated based on the translated text of the second processing text.

The present disclosure provides a method of obtaining candidate word sets based on a neural network. With a conventional neural network, the method can obtain candidate word sets for all the words (nouns and pronouns) contained in the information.

As one example, the one or more candidate word sets can be generated by a trained neural network.

For example, based on semantic relatedness, entailment relations, and the like, the neural network can extract from its input information terms and/or pronouns that have the same semantic meaning but different expressions, and build candidate word sets from them. As one example, the neural network may pair the word vector of a word with the word vectors of other words, score each pair, and take the highest-scoring word as a candidate word in the candidate word set of that word.
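The pairing-and-scoring idea can be illustrated with a small sketch. It assumes that word vectors are already available and uses cosine similarity as the pair score; the disclosure does not name a scoring function, so cosine similarity, the function names, and the toy vectors used with them are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, used here as a stand-in for the pair score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_partner(word: str, vectors: Dict[str, Vector]) -> Tuple[str, float]:
    """Pair `word` with every other word, score each pair, keep the best."""
    scored = [
        (other, cosine(vectors[word], vector))
        for other, vector in vectors.items()
        if other != word
    ]
    return max(scored, key=lambda pair: pair[1])
```

A trained network would replace `cosine` with its own learned scoring; the surrounding pair-then-select logic stays the same.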

It is readily understood that the neural network according to the embodiments of the present disclosure can adopt different neural network architectures, including but not limited to convolutional neural networks and recurrent neural networks (RNNs). The convolutional neural networks include but are not limited to U-Net, ResNet, and DenseNet.

Obtaining the one or more candidate word sets based on a neural network is only one example; the one or more candidate word sets can also be obtained by other methods, without limitation here.

After the one or more candidate word sets are obtained, the third candidate word set containing the third substitute term can be selected from them. Next, in step S503, the third constraint result for the third candidate word set is obtained by the constraint model.

Methods of obtaining the third constraint result according to the embodiments of the present disclosure are described below with reference to FIGS. 12 to 15.

For example, obtaining the third constraint result for the third candidate word set by the constraint model can include obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the semantic-space representation of each candidate word.

For example, the constraint model is a supervised first constraint model, and obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the semantic-space representation of each candidate word can include obtaining the third constraint result by the first constraint model based on the features of each candidate word in the third candidate word set. The supervised first constraint model is obtained by training on training data.

For example, the features of a candidate word may include one or more of the following: the frequency with which the translation has been selected, whether it was selected recently, the length of the translation, whether it is included among formal words, its semantic relatedness, and the word embedding of the phrase. These features can be obtained from a user dictionary, big-data statistics, and the like, without limitation here. It should be understood that the features of a translation pair are not limited to the one or more features above; other features can be added as needed, without limitation here.

For example, when translating Chinese into English, suppose that the third substitute term contained in the fourth information to be processed generated by translation is "Docomo Beijing Labs". The corresponding third candidate word set is then {Docomo Beijing Labs, Docomo Beijing communications laboratory co. Ltd., DBL}.

First, features are extracted for each member of the translation set {Docomo Beijing Labs, Docomo Beijing communications laboratory co. Ltd., DBL} using any suitable existing method. For example, the following features may be extracted for each translation: the frequency with which the translation has been selected, whether it was selected recently, the length of the translation, whether it is included among formal words, and its semantic relatedness. Next, a score is calculated for each translation based on its features. The scores are then passed through a neural network that includes softmax to obtain the probability corresponding to each score. Finally, the translation with the highest probability is selected as the third constraint result. For example, if "Docomo Beijing communications laboratory co. Ltd." has the highest probability, it is taken as the third constraint result.
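The score-then-softmax selection can be sketched as below. The weights in `WEIGHTS` are hypothetical hand-set values standing in for what a supervised model would learn from labeled data, the feature names are illustrative, and a plain softmax replaces the "neural network that includes softmax" of the description.

```python
import math
from typing import Dict, List

# Hypothetical weights; a supervised first constraint model would learn these.
WEIGHTS = {
    "frequency": 1.0,
    "recent": 0.5,
    "length": -0.01,
    "formal": 1.5,
    "relevance": 1.0,
}


def score(features: Dict[str, float]) -> float:
    """Weighted sum of the extracted features of one translation."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_constraint(candidates: Dict[str, Dict[str, float]]) -> str:
    """Return the candidate translation with the highest probability."""
    names = list(candidates)
    probabilities = softmax([score(candidates[name]) for name in names])
    best_index = max(range(len(names)), key=probabilities.__getitem__)
    return names[best_index]
```

Because softmax is monotonic, the candidate with the highest score is also the one with the highest probability; the probabilities matter mainly if a downstream step wants a confidence value.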

It should be understood that obtaining the third constraint result by the first model differs from obtaining the first constraint result by the first model, described above, only in what is processed: the former processes the third candidate word set containing the third substitute term, whereas the latter processes all translation pairs.

For example, the supervised first constraint model may be obtained by training in advance on labeled training data. The supervised first constraint model may be a neural network model, a statistical model, or the like, without limitation here. It is readily understood that the supervised first constraint model according to the embodiments of the present disclosure can adopt different neural network architectures, including but not limited to convolutional neural networks and recurrent neural networks (RNNs). The convolutional neural networks include but are not limited to U-Net, ResNet, and DenseNet. It should be understood that the present disclosure is not limited to translating Chinese into English and can be applied to any desired translation, for example Chinese-Japanese, English-Japanese, or German-Chinese translation.

FIG. 12 is a flowchart of a method 600 of obtaining the third constraint result by a second constraint model according to an embodiment of the present disclosure. The constraint model may be an unsupervised second constraint model, and obtaining the third constraint result based on the features of each candidate word in the third candidate word set and/or the semantic-space representation of each candidate word can include obtaining the third constraint result by the second constraint model based on the semantic-space representation of each candidate word. The unsupervised second constraint model does not need to be trained in advance.

For example, the unsupervised second constraint model may be TextRank, Embedding Center, or the like, without limitation here.

As one example, obtaining the third constraint result by the second constraint model based on the semantic-space representation of each candidate word includes obtaining the third constraint result based on the distance between the central semantic-space representation of all the candidate words and the semantic-space representation of each candidate word.

As shown in FIG. 12, obtaining the third constraint result based on the distance between the central semantic-space representation of all the candidate words and the semantic-space representation of each candidate word can include: obtaining the central semantic-space representation of all the candidate words (S601); obtaining the distance between the central semantic-space representation of all the candidate words and the semantic-space representation of each candidate word (S602); and selecting the candidate word with the smallest distance as the third constraint result (S603).
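Steps S601 to S603 can be sketched in a few lines. The sketch assumes that each candidate word already has an embedding vector, takes the arithmetic mean as the "central representation", and uses the Euclidean distance, which the disclosure names elsewhere as one example of the distance; the function names are hypothetical.

```python
import math
from typing import Dict, List


def center(vectors: List[List[float]]) -> List[float]:
    """S601: mean of all candidate embeddings, used as the central representation."""
    dimension = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)]


def closest_to_center(embeddings: Dict[str, List[float]]) -> str:
    """S602 and S603: measure each distance to the center, keep the smallest."""
    central = center(list(embeddings.values()))
    distances = {
        word: math.dist(vector, central) for word, vector in embeddings.items()
    }
    return min(distances, key=distances.get)
```

No training is involved: the result depends only on the embeddings of the candidates themselves, which matches the description of the second constraint model as unsupervised.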

It should be understood that obtaining the third constraint result by the second model differs from obtaining the first constraint result by the second model, described above, only in what is processed: the former processes the third candidate word set containing the third substitute term, whereas the latter processes all translation pairs. Therefore, it is not described again here.

FIG. 13 is a flowchart of a method 700 of obtaining the third constraint result by a third constraint model according to an embodiment of the present disclosure. The third constraint model includes a supervised first constraint model and an unsupervised second constraint model; the supervised first constraint model is obtained by training on training data, and the unsupervised second constraint model does not need to be trained in advance. The method includes the following steps S701 and S702.

In step S701, a fourth candidate word set containing M candidate words is obtained by the second constraint model based on the semantic-space representation of each candidate word, where M is an integer of 2 or more.

In step S702, the third constraint result is obtained by the first constraint model based on the features of each candidate word in the fourth candidate word set.

For example, in step S701, based on the method described above for FIG. 6, the distance between the central semantic-space representation of all the candidate words and the semantic-space representation of each candidate word may be obtained, the candidate words may be sorted in ascending order of distance, and the first M candidate words may be selected as the fourth candidate word set. Next, for the fourth candidate word set, the third constraint result is obtained by, for example, the first constraint model shown in FIG. 5.
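The two-stage selection of steps S701 and S702 can be sketched as follows. As assumptions for illustration, the center is the mean embedding, the distance is Euclidean, and the supervised first constraint model is represented by an arbitrary scoring callable `supervised_score`; all names are hypothetical.

```python
import math
from typing import Callable, Dict, List


def top_m_by_center(embeddings: Dict[str, List[float]], m: int) -> List[str]:
    """S701: keep the M candidates closest to the mean embedding."""
    vectors = list(embeddings.values())
    dimension = len(vectors[0])
    central = [sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)]
    ranked = sorted(embeddings, key=lambda word: math.dist(embeddings[word], central))
    return ranked[:m]


def two_stage_select(
    embeddings: Dict[str, List[float]],
    m: int,
    supervised_score: Callable[[str], float],
) -> str:
    """S702: let the supervised scorer choose among the shortlisted candidates."""
    shortlist = top_m_by_center(embeddings, m)
    return max(shortlist, key=supervised_score)
```

One consequence of this design is that a candidate far from the center is never chosen, even if the supervised scorer would rate it highest, because it does not survive the first stage.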

It should be understood that steps S701 and S702 are similar to the method described above for FIG. 8. Obtaining the third constraint result by the third model differs from obtaining the first constraint result by the third model, described above, only in what is processed: the former processes the third candidate word set containing the third substitute term, whereas the latter processes all translation pairs. Therefore, it is not described again here.

Methods of obtaining the third constraint result according to the embodiments of the present disclosure have been described above with reference to FIGS. 12 and 13. Returning to FIG. 11, in step S504 the fourth information to be processed is modified according to the third constraint result to generate the fifth information to be processed.

For example, the fourth information to be processed may be modified according to the third constraint result using constrained decoding to generate the fifth information to be processed. Constrained decoding can modify the fourth information to be processed using the third constraint result while keeping the sentence fluent. It should be understood that the method of modifying the fourth information to be processed is not limited to this; other known techniques can also be used to modify the translation result, and they are not described here.

Examples of the target-end-based information processing method according to the embodiments of the present disclosure are described below with reference to FIGS. 14 and 15.

As described above, as one example, the second processing text may be a document to be translated and the third information to be processed may be a sentence in that document, and the one or more candidate word sets may be generated based on the translated text of one or more pieces of information to be processed that precede the third information to be processed. Alternatively, as another example, the one or more candidate word sets may be generated based on the translated text of the second processing text.

FIG. 14 is a schematic diagram of an illustrative example of generating one or more candidate word sets based on the translated text of the second processing text, and FIG. 15 is a schematic diagram of an illustrative example of generating one or more candidate word sets based on the translated text of one or more pieces of information to be processed that precede the third information to be processed.

As shown in FIG. 14, the second processing text may be a document to be translated, and the third information to be processed is a sentence in that document. For the third information to be processed (the current sentence 41), translation 42 is first performed to generate the fourth information to be processed 43. Next, the third candidate word set of the third substitute term (that is, the related candidate word set 45) is obtained by a coreference query. Then the third constraint result 47 for the third candidate word set is obtained by the constraint model 46. Finally, the fourth information to be processed 48 is modified according to the third constraint result 47 to generate the fifth information to be processed 49.

In FIG. 14, the one or more candidate word sets 52 are generated by the method described above (for example, coreferent word discovery 51) based on the initial translation 50 of the document to be translated (that is, the fourth information to be processed).

Alternatively, as shown in FIG. 15, the second processing text may be a document to be translated, and the third information to be processed is a sentence in that document. The one or more candidate word sets may be generated based on the translated text of one or more pieces of information to be processed that precede the third information to be processed. In this case, the one or more candidate word sets 56 are generated by the method described above (for example, coreferent word discovery 55) based on the one or more sentences 54 that precede the current sentence 53 being processed.
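The difference between the two variants lies only in which translated text is handed to candidate word set generation. A minimal sketch, in which the function name, the `mode` values, and the list-of-sentences representation are all assumptions made for illustration:

```python
from typing import List


def context_for_candidates(
    translated_sentences: List[str], current: int, mode: str
) -> List[str]:
    """Select the translated text from which candidate word sets are built.

    `current` is the index of the sentence being processed.
    """
    if mode == "document":
        # FIG. 14 variant: use the initial translation of the whole document.
        return list(translated_sentences)
    if mode == "preceding":
        # FIG. 15 variant: use only the sentences before the current one.
        return translated_sentences[:current]
    raise ValueError(f"unknown mode: {mode}")
```

The "preceding" variant can run sentence by sentence without a first pass over the whole document, while the "document" variant also sees terms that first appear later in the text.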

The source-end-based information processing method according to the embodiments of the present disclosure has been described above with reference to FIGS. 1 to 10, and the target-end-based information processing method according to the embodiments of the present disclosure has been described with reference to FIGS. 11 to 15. With these methods, translations can be corrected and the translations of all substitute terms in a document made consistent, improving the accuracy and professionalism of the translation.

FIG. 16 is a schematic diagram of translation results according to an embodiment of the present disclosure. As FIG. 16 shows, for the substitute terms (underlined) contained in the information to be processed 60, the translation results obtained with the method of the present disclosure are more consistent than those obtained with other methods (for example, a conventional translation method or dictionary replacement). The method can also handle pronouns, improving the accuracy and professionalism of the translation.

The information processing method according to the embodiments of the present invention has been described above with reference to the drawings. An information processing apparatus according to the embodiments of the present disclosure is described below.

FIG. 17 is a functional block diagram of an information processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 17, the information processing apparatus 1300 according to the embodiment of the present disclosure includes a first to-be-processed information acquisition unit 1301, a first candidate word set selection unit 1302, a first constraint result acquisition unit 1303, and a second to-be-processed information generation unit 1304. These modules respectively execute the steps of the information processing method according to the embodiments of the present disclosure described above with reference to FIGS. 1 to 11. Those skilled in the art will understand that these unit modules can be implemented in various ways, by hardware alone, by software alone, or by a combination of the two, and the present disclosure is not limited to any of them. For example, it should be understood that these units may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit having data processing and/or instruction execution capability, together with corresponding computer instructions.

For example, the first to-be-processed information acquisition unit 1301 is used to acquire first information to be processed.

For example, when the first information to be processed contains a first substitute term, the first candidate word set selection unit 1302 selects, from one or more candidate word sets, a first candidate word set containing the first substitute term, where each of the one or more candidate word sets contains at least two candidate words that have the same meaning but different expressions.

For example, the first constraint result acquisition unit 1303 is used to obtain, by a constraint model, a first constraint result, which is the translation corresponding to the first substitute term in the first candidate word set.

For example, the second to-be-processed information generation unit 1304 is used to generate second information to be processed by modifying the translation result of the first information to be processed according to the first constraint result.

For example, the first constraint result acquisition unit 1303 may pair each candidate word in the first candidate word set with one or more translation results of that candidate word to form translation pairs, and obtain the first constraint result based on the features of each translation pair and/or the semantic-space representation of each translation pair.

For example, the constraint model is a supervised first constraint model, and the first constraint result acquisition unit 1303 may obtain the first constraint result by the first constraint model based on the features of each translation pair. The supervised first constraint model is obtained by training on training data.

For example, the constraint model is an unsupervised second constraint model, and the first constraint result acquisition unit 1303 may obtain the first constraint result by the second constraint model based on the semantic-space representation of each translation pair. The unsupervised second constraint model does not need to be trained in advance.

For example, the first constraint result acquisition unit 1303 may obtain the first constraint result based on the distance between the central semantic-space representation of all the translation pairs and the semantic-space representation of each translation pair.

For example, the constraint model is a third constraint model that includes a supervised first constraint model and an unsupervised second constraint model. The first constraint result acquisition unit 1303 may obtain, by the second constraint model and based on the semantic-space representation of each translation pair, a second candidate word set containing N translation pairs (N is an integer of 2 or more), and obtain, by the first constraint model, the first constraint result based on the features of each translation pair in the second candidate word set. The supervised first constraint model is obtained by training on training data, and the unsupervised second constraint model does not need to be trained in advance.

For example, the distance is the Euclidean distance.

For example, the features of a translation pair include one or more of the following: the frequency with which the translation has been selected, whether it was selected recently, the length of the translation, whether it is included among formal words, and its semantic relatedness.

For example, the first information to be processed is a part of a first processing text, and the one or more candidate word sets are generated based on the first processing text, or based on one or more pieces of information to be processed that precede the first information to be processed.

For example, a neural network is trained to generate the one or more candidate word sets based on the first processed text, or based on one or more pieces of information to be processed that precede the first information to be processed.

For example, the second information to be processed is generated by modifying the translation result of the first information to be processed using constrained decoding, according to the first constraint result.

For example, before the first constraint result is acquired by the constraint model, words that do not exist in the user personalization information are deleted from the first candidate word set.

For example, the user personalization information is constructed based on one or more of the user's translation history, the user's stylistic tendencies, and the field of translation.
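The filtering step can be illustrated with a short sketch. It assumes the user personalization information reduces to a vocabulary of terms gathered from the translation history; the function names and the fallback to the unfiltered set when nothing survives are hypothetical choices, not taken from the source.

```python
def build_user_vocabulary(translation_history):
    """Collect every term that appears in the user's past translations."""
    return {term for entry in translation_history for term in entry}

def filter_candidates(candidate_set, user_vocabulary):
    """Drop candidate words that do not exist in the user's vocabulary."""
    kept = [w for w in candidate_set if w in user_vocabulary]
    # Assumption: keep the original set if filtering would remove everything.
    return kept or list(candidate_set)

history = [["mobile phone", "battery"], ["handset"]]
vocabulary = build_user_vocabulary(history)
filtered = filter_candidates(["cell phone", "mobile phone", "handset"], vocabulary)
```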

FIG. 18 is a functional block diagram of another information processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 18, the information processing apparatus 1200 according to the embodiment of the present disclosure includes a fourth to-be-processed information generation unit 1201, a third candidate word set selection unit 1202, a third constraint result acquisition unit 1203, and a fifth to-be-processed information generation unit 1204. These modules respectively perform the steps of the information processing method according to the embodiments of the present disclosure described above with reference to FIGS. 12 to 16. Those skilled in the art will understand that these unit modules can be implemented in various ways by hardware alone, software alone, or a combination thereof, and the present disclosure is not limited to any of them. For example, it should be understood that these units may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit having data processing capability and/or instruction execution capability, together with corresponding computer instructions.

For example, the fourth to-be-processed information generation unit 1201 is used to translate third information to be processed and thereby generate fourth information to be processed.

For example, when the fourth information to be processed includes a third substitute term, the third candidate word set selection unit 1202 selects, from one or more candidate word sets, a third candidate word set containing the third substitute term, where each of the one or more candidate word sets contains at least two candidate words that have the same meaning but different expressions.
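As a rough illustration of the selection step, the sketch below scans a translated sentence for any member of the known candidate word sets and returns the matching set. Plain substring matching and the name `find_substitute_term` are assumptions made for brevity; the source does not specify how substitute terms are detected.

```python
def find_substitute_term(text, candidate_sets):
    """Return (term, candidate set) for the first set with a member in text.

    Each candidate set holds words with the same meaning but different
    expressions. Longer members are tried first within a set so that a
    multi-word term wins over a shorter one it contains.
    """
    for candidate_set in candidate_sets:
        for word in sorted(candidate_set, key=len, reverse=True):
            if word in text:
                return word, candidate_set
    return None

sets = [
    {"cell phone", "mobile phone"},
    {"laptop", "notebook computer"},
]
found = find_substitute_term("Charge the notebook computer overnight.", sets)
```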

For example, the third constraint result acquisition unit 1203 is used to acquire, by the constraint model, a third constraint result in the third candidate word set.

For example, the fifth to-be-processed information generation unit 1204 is used to generate fifth information to be processed by modifying the fourth information to be processed according to the third constraint result.
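One simple way to picture the modification step is a post-edit that rewrites every other member of the candidate word set to the chosen constraint result. The sketch below uses plain string substitution, which is an assumption made for illustration and not the constrained decoding mentioned elsewhere in the description; `apply_constraint` is a hypothetical name.

```python
import re

def apply_constraint(text, candidate_set, chosen):
    """Replace every non-chosen member of candidate_set in text with chosen."""
    # Try longer expressions first so overlapping terms are handled sensibly.
    others = sorted((w for w in candidate_set if w != chosen),
                    key=len, reverse=True)
    if not others:
        return text
    pattern = re.compile("|".join(r"\b" + re.escape(w) + r"\b" for w in others))
    # A function replacement avoids escape processing in the chosen string.
    return pattern.sub(lambda match: chosen, text)

translated = "The cell phone rang. He picked up the mobile phone."
unified = apply_constraint(
    translated, {"cell phone", "mobile phone", "cellphone"}, "mobile phone"
)
```

After the substitution, the same expression is used throughout the sentence, which is the consistency the method aims for.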

An information processing apparatus according to an embodiment of the present disclosure is described below with reference to FIG. 19. FIG. 19 is a schematic diagram of an information processing apparatus 2000 according to an embodiment of the present disclosure. Because the information processing apparatus according to this embodiment corresponds in detail to the method described above with reference to FIG. 1, a detailed description of the same content is omitted here for brevity.

An information processing device 1100 according to an embodiment of the present disclosure is described below with reference to FIG. 19. FIG. 19 is a schematic diagram of the information processing device according to an embodiment of the present disclosure. Because the functions of the information processing device of this embodiment correspond in detail to the method described above with reference to FIG. 1, a detailed description of the same content is omitted here for brevity.

As shown in FIG. 19, the information processing device 1100 includes a memory 1103 and a processor 1101. Although FIG. 19 shows the information processing device 1100 as including only two components, this is merely an example, and the information processing device 1100 may include one or more other components. Note that such components are omitted here because they are not related to the inventive concept.

The neural-network-based information processing device 1100 according to the present disclosure includes the processor 1101 and the memory 1103. The memory stores computer-readable instructions that, when executed by the processor 1101, cause an information processing method to be performed. The method includes: acquiring first information to be processed; when the first information to be processed includes a first substitute term, selecting, from one or more candidate word sets, a first candidate word set containing the first substitute term, each of the one or more candidate word sets containing at least two words that have the same meaning but different expressions; acquiring, by a constraint model, a first constraint result translated in correspondence with the first substitute term in the first candidate word set; and generating second information to be processed by modifying a translation result of the first information to be processed according to the first constraint result.

For the technical effects of the information processing apparatuses 1200 and 1300 and the information processing device 1100 in the different embodiments, reference may be made to the technical effects of the information processing methods provided in the embodiments of the present disclosure; they are not described again here.

The information processing apparatuses 1200 and 1300 and the information processing device 1100 can be used in various suitable electronic devices.

The present disclosure further includes a computer-readable storage medium storing computer-readable instructions that, when executed by a computer, cause the computer to perform an information processing method. The method includes: acquiring first information to be processed; when the first information to be processed includes a first substitute term, selecting, from one or more candidate word sets, a first candidate word set containing the first substitute term, each of the one or more candidate word sets containing at least two words that have the same meaning but different expressions; acquiring, by a constraint model, a first constraint result translated in correspondence with the first substitute term in the first candidate word set; and generating second information to be processed by modifying a translation result of the first information to be processed according to the first constraint result.

<Hardware configuration>
The block diagrams used in the description of the above embodiments show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

For example, the electronic device in one embodiment of the present invention may function as a computer that performs the processing of the attribute identification method of the present invention. FIG. 20 is a diagram showing an example of the hardware configuration of an electronic device according to one embodiment of the present invention. The electronic device 10 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

In the following description, the word "device" can be read as a circuit, an apparatus, a unit, or the like. The hardware configuration of the electronic device 10 may include one or more of each of the devices shown in the figure, or may omit some of them.

For example, although only one processor 1001 is shown, there may be a plurality of processors. Processing may be executed by a single processor, or by one or more processors simultaneously, sequentially, or in some other manner. The processor 1001 may be implemented on one or more chips.

Each function of the electronic device 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computations and controls communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.

The processor 1001, for example, runs an operating system to control the entire computer. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control unit, an arithmetic unit, registers, and the like.

The processor 1001 also reads programs (program code), software modules, and data from the storage 1003 and/or the communication device 1004 into the memory 1002, and executes various kinds of processing in accordance with them. The program used is one that causes a computer to execute at least part of the operations described in the above embodiments. For example, the control unit 401 of the electronic device 10 may be realized by a control program that is stored in the memory 1002 and runs on the processor 1001, and other functional blocks may be realized in the same way.

The memory 1002 is a computer-readable recording medium and may be composed of, for example, at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and other suitable recording media. The memory 1002 may be referred to as a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store programs (program code), software modules, and the like that are executable to implement the wireless communication method according to one embodiment of the present invention.

The storage 1003 is a computer-readable recording medium and may be composed of, for example, at least one of a flexible disk, a floppy (registered trademark) disk, a magneto-optical disk (for example, a read-only optical disc such as a CD-ROM (Compact Disc ROM), a digital versatile disc, or a Blu-ray (registered trademark) disc), a removable disk, a hard disk drive, a smart card, a flash memory (for example, a card, a stick, or a key drive), a magnetic strip, a database, a server, and other suitable recording media. The storage 1003 may be referred to as an auxiliary storage device.

The communication device 1004 is hardware (a transmitting and receiving device) for communication between computers over a wired and/or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, or a communication module.

The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor) that accepts input from the outside. The output device 1006 is an output device (for example, a display, a speaker, or an LED lamp) that performs output to the outside. The input device 1005 and the output device 1006 may be integrated into a single component (for example, a touch panel).

The devices such as the processor 1001 and the memory 1002 are connected by the bus 1007 for communicating information. The bus 1007 may be a single bus, or different buses may be used between different devices.

The electronic device 10 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by that hardware. For example, the processor 1001 may be implemented with at least one of these types of hardware.

Software, whether it is called software, firmware, middleware, microcode, a hardware description language, or something else, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), and the like) and/or wireless technology (infrared, microwave, and the like), those wired and/or wireless technologies are included within the definition of a transmission medium.

Each aspect or embodiment described in this specification may be used alone, used in combination, or switched during execution. The order of the processing procedures, sequences, flowcharts, and the like of each aspect or embodiment described in this specification may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

As used in this specification, the phrase "according to" does not mean "according only to" unless otherwise specified. In other words, the phrase "according to" covers both "according only to" and "according at least to".

Any reference in this specification to units using designations such as "first" and "second" does not generally limit the quantity or order of those units. These designations may be used in this specification as a convenient way to distinguish between two or more units. Accordingly, a reference to first and second units does not mean that only two units can be employed, or that the first unit must in some way precede the second unit.

When the terms "including", "comprising", and variations thereof are used in this specification or the claims, they are intended to be inclusive, in the same manner as the term "provided with". Furthermore, the term "or" as used in this specification or the claims is intended not to be an exclusive OR.

Those skilled in the art will understand that the various aspects of the present application may be illustrated and described in terms of a number of patentable categories or contexts, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, the various aspects of the present application may be carried out entirely by hardware, entirely by software (including firmware, resident software, microcode, and the like), or by a combination of hardware and software. Such hardware or software may be referred to as a "data block", "module", "engine", "unit", "element", or "system". The various aspects of the present application may also be embodied as a computer product located on one or more computer-readable media, the product including computer-readable program code.

The present application uses specific terms to describe its embodiments. For example, "one embodiment", "an embodiment", and/or "some embodiments" refer to a certain feature, configuration, or characteristic related to at least one embodiment of the present application. It should therefore be emphasized and noted that "an embodiment", "one embodiment", or "an alternative embodiment" mentioned two or more times at different places in this specification does not necessarily refer to the same embodiment. In addition, certain features, configurations, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those skilled in the art to which the present disclosure belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined.

Although the present invention has been described in detail above, it is clear to those skilled in the art that the present invention is not limited to the embodiments described in this specification. The present invention can be practiced with modifications and variations without departing from the spirit and scope of the present invention as defined by the claims. Accordingly, the description in this specification is for illustrative purposes only and has no limiting meaning with respect to the present invention.

Claims

An information processing method comprising:
acquiring first information to be processed;
when the first information to be processed includes a first substitute term, selecting, from one or more candidate word sets, a first candidate word set containing the first substitute term, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions;
acquiring, by a constraint model, a first constraint result translated in correspondence with the first substitute term in the first candidate word set; and
generating second information to be processed by modifying a translation result of the first information to be processed according to the first constraint result.

The information processing method according to claim 1, wherein acquiring, by the constraint model, the first constraint result translated in correspondence with the first substitute term in the first candidate word set comprises:
pairing each candidate word in the first candidate word set with one or more translation results of that candidate word to form translation pairs; and
acquiring the first constraint result based on features of each translation pair and/or a semantic-space representation of each translation pair.

The information processing method according to claim 2, wherein the constraint model is a supervised first constraint model, and acquiring the first constraint result based on the features of each translation pair and/or the semantic-space representation of each translation pair comprises:
acquiring, by the first constraint model, the first constraint result based on the features of each translation pair, the supervised first constraint model being obtained by training on training data.

The information processing method according to claim 2, wherein the constraint model is an unsupervised second constraint model, and acquiring the first constraint result based on the features of each translation pair and/or the semantic-space representation of each translation pair comprises:
acquiring, by the second constraint model, the first constraint result based on the semantic-space representation of each translation pair, the unsupervised second constraint model not needing to be trained in advance.

The information processing method according to claim 4, wherein acquiring, by the second constraint model, the first constraint result based on the semantic-space representation of each translation pair comprises:
acquiring the first constraint result based on the distance between the central representation of the semantic space over all translation pairs and the semantic-space representation of each translation pair.

The information processing method according to claim 2, wherein the constraint model is a third constraint model including a supervised first constraint model and an unsupervised second constraint model, and acquiring the first constraint result based on the features of each translation pair and/or the semantic-space representation of each translation pair comprises:
acquiring, by the second constraint model, a second candidate word set containing N translation pairs (N being an integer of 2 or more) based on the semantic-space representation of each translation pair; and
acquiring, by the first constraint model, the first constraint result based on the features of each translation pair in the second candidate word set,
wherein the supervised first constraint model is obtained by training on training data, and the unsupervised second constraint model does not need to be trained in advance.

The information processing method according to claim 5, wherein the distance is a Euclidean distance.

The information processing method according to claim 3 or 6, wherein the features of the translation pair include one or more of: the frequency with which the translation has been selected, whether it was selected recently, the length of the translation, whether it is included among the formal words, and its semantic relevance.

The information processing method according to any one of claims 1 to 8, wherein the first information to be processed is part of a first processed text, and the one or more candidate word sets are generated based on the first processed text, or based on one or more pieces of information to be processed that precede the first information to be processed.

The information processing method according to claim 9, wherein a neural network is trained to generate the one or more candidate word sets based on the first processed text, or based on one or more pieces of information to be processed that precede the first information to be processed.

The information processing method according to any one of claims 1 to 8, wherein the second information to be processed is generated by modifying the translation result of the first information to be processed using constrained decoding, according to the first constraint result.

The information processing method according to any one of claims 1 to 8, wherein words that do not exist in the user personalized information are deleted from the first candidate word set before the first constraint result is acquired by the constraint model.

The information processing method according to claim 12, wherein the user personalized information is constructed based on one or more of a user's translation history, a user's style tendency, and a field of translation.

An information processing method comprising:
translating third information to be processed to generate fourth information to be processed;
when the fourth information to be processed includes a third substitute term, selecting, from one or more candidate word sets, a third candidate word set containing the third substitute term, each of the one or more candidate word sets containing at least two candidate words that have the same meaning but different expressions;
acquiring, by a constraint model, a third constraint result in the third candidate word set; and
generating fifth information to be processed by modifying the fourth information to be processed according to the third constraint result.

The information processing method according to claim 14, wherein acquiring, by the constraint model, the third constraint result in the third candidate word set comprises:
acquiring the third constraint result based on features of each candidate word in the third candidate word set and/or a semantic-space representation of each candidate word.

The information processing method according to claim 15, wherein the constraint model is a supervised first constraint model, and acquiring the third constraint result based on the features of each candidate word in the third candidate word set and/or the semantic-space representation of each candidate word comprises:
acquiring, by the first constraint model, the third constraint result based on the features of each candidate word in the third candidate word set, the supervised first constraint model being obtained by training on training data.

The information processing method according to claim 15, wherein the constraint model is an unsupervised second constraint model,
acquiring the third constraint result based on the characteristics of each candidate word in the third candidate word set and/or the representation of the semantic space of each candidate word includes acquiring, by the second constraint model, the third constraint result based on the representation of the semantic space of each candidate word, and
the unsupervised second constraint model does not need to be trained in advance.

The information processing method according to claim 17, wherein acquiring, by the second constraint model, the third constraint result based on the representation of the semantic space of each candidate word includes acquiring the third constraint result based on the distance between the central representation of the semantic space of all the candidate words and the semantic space representation of each candidate word.
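One possible reading of the distance-based selection can be sketched as follows: compute the mean of the candidate word vectors as the central representation, then rank candidates by their Euclidean distance to it. The 2-dimensional vectors, the use of Euclidean distance, and the choice of the nearest candidate as the constraint result are assumptions; the claim states only that the result is based on the distance.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_by_centroid_distance(embeddings):
    """Sort candidate words by ascending distance to the centroid.

    embeddings: dict mapping each candidate word to its (hypothetical)
    semantic-space vector.
    """
    center = centroid(list(embeddings.values()))
    return sorted(embeddings, key=lambda w: math.dist(embeddings[w], center))


# Hypothetical toy embeddings; real ones would come from a language model.
embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "motorcar": [0.0, 1.0],
}

ranked = rank_by_centroid_distance(embeddings)
constraint_result = ranked[0]  # assumption: nearest to the centroid wins
print(ranked, constraint_result)
```

No training step appears anywhere in this sketch, which is consistent with the unsupervised model not needing to be learned in advance.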

The information processing method according to claim 15, wherein the constraint model is a third constraint model including a supervised first constraint model and an unsupervised second constraint model,
acquiring the third constraint result based on the characteristics of each candidate word in the third candidate word set and/or the representation of the semantic space of each candidate word includes:
acquiring, by the second constraint model, a fourth candidate word set containing M candidate words (M is an integer of 2 or more) based on the representation of the semantic space of each candidate word; and
acquiring, by the first constraint model, the third constraint result based on the characteristics of each candidate word in the fourth candidate word set, and
the supervised first constraint model is obtained by learning from training data, and the unsupervised second constraint model does not need to be trained in advance.
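The two-stage arrangement of this claim might look roughly like the sketch below: an unsupervised step shortlists M candidates by distance to the centroid of the embeddings, and a supervised step scores the shortlist on word features. The feature names (`frequency`, `length`), the fixed linear weights standing in for a trained model, and the toy vectors are all hypothetical; in practice the weights would be learned from training data.

```python
import math


def top_m_by_centroid(embeddings, m):
    """Unsupervised step: keep the M words nearest the embedding centroid."""
    vectors = list(embeddings.values())
    dim = len(vectors[0])
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    ranked = sorted(embeddings, key=lambda w: math.dist(embeddings[w], center))
    return ranked[:m]


def supervised_pick(candidates, features, weights):
    """Supervised step (stand-in): linear score over per-word features."""
    def score(word):
        return sum(weights[name] * value
                   for name, value in features[word].items())
    return max(candidates, key=score)


# Hypothetical inputs for illustration only.
embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "motorcar": [0.0, 1.0],
}
features = {
    "car": {"frequency": 0.7, "length": 3},
    "automobile": {"frequency": 0.2, "length": 10},
    "motorcar": {"frequency": 0.1, "length": 8},
}
weights = {"frequency": 1.0, "length": -0.01}  # pretend these were learned

shortlist = top_m_by_centroid(embeddings, m=2)        # fourth candidate word set
result = supervised_pick(shortlist, features, weights)  # third constraint result
print(shortlist, result)
```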

The information processing method according to any one of claims 14 to 19, wherein the third information to be processed is a part of the second processed text, and the one or more candidate word sets are generated based on the translated text of the second processed text, or are generated based on the translated text of one or more pieces of information to be processed that precede the third information to be processed.

An information processing apparatus comprising:
a first to-be-processed information acquisition unit for acquiring first information to be processed;
a first candidate word set selection unit for selecting, when a first substitute word is included in the first information to be processed, a first candidate word set containing the first substitute word from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words whose meanings match and whose expressions do not match;
a first constraint result acquisition unit for acquiring, by a constraint model, a first constraint result translated corresponding to the first substitute word in the first candidate word set; and
a second to-be-processed information generation unit for generating second information to be processed by modifying the translation result of the first information to be processed according to the first constraint result.

An information processing apparatus comprising:
a fourth to-be-processed information generation unit for translating third information to be processed to generate fourth information to be processed;
a third candidate word set selection unit for selecting, when a third substitute word is included in the fourth information to be processed, a third candidate word set containing the third substitute word from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words whose meanings match and whose expressions do not match;
a third constraint result acquisition unit for acquiring, by a constraint model, a third constraint result in the third candidate word set; and
a fifth to-be-processed information generation unit for generating fifth information to be processed by modifying the fourth information to be processed according to the third constraint result.

An information processing apparatus comprising:
a processor; and
a memory for storing computer-readable instructions,
wherein, when the computer-readable instructions are executed by the processor, an information processing method is executed, the method including:
acquiring first information to be processed;
when a first substitute word is included in the first information to be processed, selecting a first candidate word set containing the first substitute word from one or more candidate word sets, each of the one or more candidate word sets containing at least two candidate words whose meanings match and whose expressions do not match;
acquiring, by a constraint model, a first constraint result translated corresponding to the first substitute word in the first candidate word set; and
generating second information to be processed by modifying a translation result of the first information to be processed according to the first constraint result.

A computer-readable storage medium storing a computer-readable program, wherein the program causes a computer to execute the information processing method according to any one of claims 1 to 20.