JP2001117922A

JP2001117922A - Translation apparatus, translation method, and recording medium

Info

Publication number: JP2001117922A
Application number: JP29387599A
Authority: JP
Inventors: Atsuo Hiroe; 厚夫廣江; Hironaga Tsutsumi; 洪長包; Hideki Kishi; 秀樹岸; Koji Asano; 康治浅野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-10-15
Filing date: 1999-10-15
Publication date: 2001-04-27

Abstract

(57) [Abstract]

[Problem] To perform highly accurate translation and to output the translated sentence as natural-sounding synthesized speech.

[Solution] A speech recognition unit 1 recognizes input speech, extracts prosody information from the input speech, and supplies that information, together with the speech recognition result, to a machine translation unit 2. The machine translation unit 2 has a conversion table in which language conversion data for Japanese-to-English and English-to-Japanese translation is described together with prosody information for English and Japanese. Based on this conversion table, the speech recognition result is translated into a translated sentence with prosody information. The translated sentence with prosody information is supplied to a speech synthesis unit 3, which performs rule-based speech synthesis, and the result is output as synthesized speech from a speaker 5.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a translation apparatus, a translation method, and a recording medium, and more particularly to performing translation that takes prosody information into account, so that highly accurate translation can be performed and the translated sentence can be output as highly accurate synthesized speech.

[0002]

2. Description of the Related Art

A speech translation system is a tool that lets users who speak different languages, such as Japanese and English, communicate with each other. In such a system, for example, a Japanese utterance is speech-recognized, the recognition result is translated into English, and the translation is output as synthesized speech; likewise, an English utterance is speech-recognized, the recognition result is translated into Japanese, and the translation is output as synthesized speech. An English speaker (user) can therefore hear a Japanese speaker's utterance in English, and a Japanese speaker can hear an English speaker's utterance in Japanese, so that the two can understand each other and carry on a dialogue.

[0003] FIG. 1 shows the configuration of an example of a conventional speech translation system.

[0004] Speech in the source language (the language before translation) is input to a speech recognition unit 201, where it is recognized. The speech recognition unit 201 outputs the recognition result for the source-language speech to a machine translation unit 202 as text. In some cases, the speech recognition unit 201 outputs a plurality of candidate recognition results to the machine translation unit 202 rather than a single result.

[0005] In the machine translation unit 202, the text of the source-language recognition result is translated into text in the target language (the language after translation), that is, a translated sentence, which is supplied to a speech synthesis unit 203. The speech synthesis unit 203 generates phonemic information from the translated sentence supplied by the machine translation unit 202, adds suitable intonation, and thereby generates and outputs synthesized speech corresponding to the translated sentence. The translated sentence output by the machine translation unit 202 may also be shown on a display (not shown).

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

In a speech translation system, the accuracy with which source-language speech is output as synthesized speech corresponding to a correct target-language translation depends on the processing accuracy of each of the speech recognition unit 201, the machine translation unit 202, and the speech synthesis unit 203. A further problem is that information is lost in the course of converting speech (in the source language) into text and then back into speech (synthesized speech in the target language).

[0007] Specifically, in addition to phonemic information that can be represented by written characters, speech contains prosody information, that is, information on prosodic features such as accent, intonation, rhythm, and pauses. When speech recognition is performed in the speech recognition unit and the recognition result is output as text, the prosody information contained in the speech is lost.

[0008] However, utterances that are identical when written out may differ in meaning depending on the prosody information they carry. The loss of prosody information that occurs when translation passes through text, as described above, can therefore affect the translation result.

[0009] Suppose, for example, that Japanese-to-English and English-to-Japanese translation is performed so that a Japanese speaker and an English speaker (users) can converse, and that a word or phrase carrying an accent or rising intonation is written enclosed in asterisks (*). Then "*I* have a pen." indicates that "I" is pronounced with emphasis, and "I have a *pen*." indicates that "pen" is pronounced with emphasis.
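The asterisk notation can be read mechanically. The following is a minimal sketch, not part of the original disclosure: the function name `parse_accent_markup` and the choice to drop punctuation are assumptions made for illustration only.

```python
import re


def parse_accent_markup(sentence):
    """Split a sentence into (word, accented) pairs.

    A span wrapped in asterisks, such as *pen*, is flagged as accented.
    Punctuation outside the asterisks is dropped (an illustrative choice).
    """
    tokens = []
    # First alternative: an asterisk-delimited span; second: a plain word.
    for match in re.finditer(r"\*([^*]+)\*|(\w+)", sentence):
        accented_text, plain_word = match.groups()
        if accented_text is not None:
            for word in accented_text.split():
                tokens.append((word, True))
        else:
            tokens.append((plain_word, False))
    return tokens


print(parse_accent_markup("*I* have a pen."))
print(parse_accent_markup("I have a *pen*."))
```

The two calls differ only in which word carries the `True` flag, which is exactly the information that plain text loses.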

[0010] For example, when a sentence contains both a phrase that is new information (information appearing in the dialogue for the first time) and a phrase that is old information (information that has already appeared in the dialogue), and the new information precedes the old information, the new-information phrase is generally uttered with an accent in English. Translating without regard to accent can therefore yield an unnatural (that is, incorrect) translation result.

[0011] Specifically, suppose it is a fact that the speaker's younger brother took a photograph. In the Japanese utterance 「この写真は、弟が撮りました。」 (roughly, "As for this picture, my brother took it."), 「この写真」 ("this picture") is old information and 「弟」 ("younger brother") is new information. The English translation of this utterance is "My *brother* took this picture." Because "My brother", the counterpart of the new information 「弟」, precedes "this picture", the counterpart of the old information 「この写真」, "brother" is uttered with an accent in the English translation.

[0012] In the Japanese utterance 「弟は、この写真を撮りました。」 ("My brother took this picture."), on the other hand, 「この写真」 is new information and 「弟」 is old information. The English translation is "My brother took this picture." Because "this picture", the counterpart of the new information 「この写真」, does not precede "My brother", the counterpart of the old information 「弟」, the translation is uttered without an accent.
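The rule illustrated by these two examples (a new-information phrase is accented when it precedes an old-information phrase) can be sketched as follows. The function name, the `"new"`/`"old"` labels, and the phrase-level granularity are illustrative assumptions; the text itself places the accent on the word "brother" within "My brother".

```python
def accent_new_information(phrases):
    """Mark which phrases receive an accent.

    phrases: list of (text, status) in spoken order, where status is
    "new", "old", or None for phrases that are neither.
    Returns a list of (text, accented) pairs. A phrase is accented only
    if it is new information and some old-information phrase follows it.
    """
    result = []
    for index, (text, status) in enumerate(phrases):
        old_follows = any(s == "old" for _, s in phrases[index + 1:])
        result.append((text, status == "new" and old_follows))
    return result


# "This picture" already known, "brother" new: accent on "My brother".
case_a = [("My brother", "new"), ("took", None), ("this picture", "old")]
# "Brother" already known, "this picture" new: no accent.
case_b = [("My brother", "old"), ("took", None), ("this picture", "new")]

print(accent_new_information(case_a))
print(accent_new_information(case_b))
```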

[0013] As text, the English translations of the Japanese utterances 「この写真は、弟が撮りました。」 and 「弟は、この写真を撮りました。」 are both "My brother took this picture." and are therefore identical, but, as described above, they differ in accent when spoken. Consequently, performing Japanese-to-English or English-to-Japanese translation without regard to accent gives rise to the following problems.

[0014] When Japanese-to-English translation is performed and the result is output as synthesized speech, the synthesized speech for "My *brother* took this picture.", the translation of 「この写真は、弟が撮りました。」, should carry an accent on "brother", whereas the synthesized speech for "My brother took this picture.", the translation of 「弟は、この写真を撮りました。」, should carry no accent.

[0015] If accent is not taken into account, however, both translations are output as unaccented synthesized speech. That is, the translations of 「この写真は、弟が撮りました。」 and 「弟は、この写真を撮りました。」 are not distinguished, and both are output as the unaccented synthesized speech "My brother took this picture." From the English user's point of view, the synthesized speech output as the translation of 「この写真は、弟が撮りました。」 is semantically unnatural.

[0016] Conversely, when English-to-Japanese translation is performed, the English utterances "My *brother* took this picture." and "My brother took this picture." should be translated as 「この写真は、弟が撮りました。」 and 「弟は、この写真を撮りました。」, respectively, and thus be distinguished, as described above. If accent is not taken into account, however, the two English utterances are not distinguished, and both are translated as 「弟は、この写真を撮りました。」. From the Japanese user's point of view, the translated sentence output for "My *brother* took this picture." is semantically unnatural.
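To make the distinction concrete, here is a minimal sketch of a lookup keyed on both the word sequence and the accented word. The table contents, the name `translate_en_to_ja`, and the fallback to the unaccented entry are illustrative assumptions, not the conversion table of the invention.

```python
# Toy table: (lower-cased text, accented word or None) -> Japanese translation.
EN_TO_JA = {
    ("my brother took this picture", "brother"): "この写真は、弟が撮りました。",
    ("my brother took this picture", None): "弟は、この写真を撮りました。",
}


def translate_en_to_ja(words):
    """Translate a list of (word, accented) pairs using the toy table.

    Falls back to the unaccented entry when no accent-specific entry
    exists; returns None when the sentence is not in the table at all.
    """
    text = " ".join(word.lower() for word, _ in words)
    accented = next((w.lower() for w, is_accented in words if is_accented), None)
    exact = EN_TO_JA.get((text, accented))
    if exact is not None:
        return exact
    return EN_TO_JA.get((text, None))


accented_input = [("My", False), ("brother", True), ("took", False),
                  ("this", False), ("picture", False)]
plain_input = [(word, False) for word, _ in accented_input]

print(translate_en_to_ja(accented_input))
print(translate_en_to_ja(plain_input))
```

A text-only lookup would collapse both inputs onto one key, which is the failure described above.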

[0017] Furthermore, in Japanese the particles 「は」 (wa) and 「が」 (ga) are used differently, and a noun placed immediately before the particle 「が」 is sometimes uttered with an accent. Translating without regard to accent can yield an unnatural translation result here as well.

[0018] Specifically, if the user's English reply to the question "What happened to Henry?" is "Henry has arrived." or "Henry has *arrived*.", its Japanese translation is 「ヘンリーは到着しました。」 (with the particle 「は」).

[0019] If the user's English reply to the question "Who arrived?" or "Who is it that arrived?" is "*Henry* has arrived.", its Japanese translation is 「*ヘンリーが*到着しました。」 (with an accented 「ヘンリーが」) or 「到着したのはヘンリーです。」 ("The one who arrived is Henry.").

[0020] Further, if the user's English reply to the question "What happened?" is "Henry has arrived.", its Japanese translation is 「ヘンリーが到着しました。」 (with the particle 「が」 and no accent).

[0021] In Japanese-to-English translation, therefore, 「ヘンリーは到着しました。」, 「*ヘンリーが*到着しました。」, 「到着したのはヘンリーです。」, and 「ヘンリーが到着しました。」 are all translated as "Henry has arrived." if accent is not taken into account. When this translation result is output as synthesized speech, the English user may hear synthesized speech that is semantically unnatural.

[0022] In English-to-Japanese translation, on the other hand, if accent is not taken into account, "Henry has arrived.", "*Henry* has arrived.", and "Henry has *arrived*." are all translated either as 「ヘンリーは到着しました。」, the reply to "What happened to Henry?", or as 「ヘンリーが到着しました。」, the reply to "What happened?". The translations cannot be differentiated according to the accent of the English utterance, and as a result the Japanese user may be given a translated sentence that is semantically unnatural.

[0023] Further, in English an utterance may be accented according to the positional relationship between an adverb and the element it semantically modifies. Translating without regard to accent can yield an unnatural translation result in this case too.

[0024] In English, adverbs such as "also" and "only" are in principle placed immediately before the word or phrase they modify ("only" may also be placed immediately after it). In spoken English, however, these adverbs are sometimes placed immediately before the verb, and the modified word or phrase is given an accent (stress).

[0025] Accordingly, the English utterance "I also like *her*." should be translated as 「私は彼女も好きです。」 ("I like her, too", that is, her in addition to others), and the English utterance "*I* also like her." should be translated as 「私も彼女が好きです。」 ("I, too, like her").

[0026] Similarly, the English utterance "I only saw *her* yesterday." should be translated as 「昨日、私は彼女だけに会いました。」 ("Yesterday I met only her.") or 「昨日私が会った人は、彼女だけです。」 ("She is the only person I met yesterday."), and the English utterance "I only saw her *yesterday*." should be translated as 「昨日だけ、私は彼女に会いました。」 ("I met her only yesterday.") or 「私が彼女にあったのは、昨日だけです。」 ("Yesterday is the only day I met her.").

[0027] Further, in English, when several words are joined into a compound, the accent moves toward the front.

[0028] Accordingly, the noun phrase 「イギリス人の先生」 ("a teacher who is English") should be translated as "(an) English *teacher*", and the compound noun 「英語の先生」 ("a teacher of English") should be translated as "(an) *English* teacher".

[0029] Thus, if translation is performed without regard to accent, then, as in the cases described above, Japanese-to-English translation always outputs the English translation as unaccented synthesized speech, so the English user may hear synthesized speech that is semantically unnatural. In English-to-Japanese translation, likewise, the translations cannot be differentiated according to the accent of the English utterance, so the Japanese user may be given a translated sentence that is semantically unnatural.

[0030] In speech, moreover, even a sentence that is declarative in form can generally be intended as a question if the intonation at the end of the sentence is raised (rising intonation). If intonation is not taken into account, every such utterance is translated as a declarative sentence even when it is intended as a question, and the translation is output as synthesized speech accordingly. As a result, a translated sentence that does not match the user's intention is output.
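One simple way to picture the use of sentence-final intonation is to compare the pitch at the end of an utterance with the pitch of the rest. The sketch below is an assumption made for illustration, not the method of this document: the per-frame pitch list `f0_hz`, the 20% tail, and the 1.15 ratio are all hypothetical choices.

```python
def has_rising_ending(f0_hz, tail_fraction=0.2, rise_ratio=1.15):
    """Return True if the final part of a pitch contour is clearly higher.

    f0_hz: per-frame fundamental frequency in Hz; 0 marks unvoiced frames.
    The mean pitch of the last tail_fraction of voiced frames is compared
    with the mean pitch of the preceding voiced frames.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 5:
        return False  # too little voiced speech to judge
    tail_len = max(1, int(len(voiced) * tail_fraction))
    body, tail = voiced[:-tail_len], voiced[-tail_len:]
    body_mean = sum(body) / len(body)
    tail_mean = sum(tail) / len(tail)
    return tail_mean > body_mean * rise_ratio


def punctuate(text, f0_hz):
    """Append '?' for a rising ending and '.' otherwise."""
    return text + ("?" if has_rising_ending(f0_hz) else ".")


flat_contour = [120] * 10
rising_contour = [120] * 8 + [150, 170]

print(punctuate("Henry has arrived", flat_contour))
print(punctuate("Henry has arrived", rising_contour))
```

The same word sequence yields a statement or a question depending only on the contour, which is the information a text-only pipeline discards.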

[0031] Further, from the standpoint of the man-machine interface of a speech translation system, it is sometimes desirable to output the synthesized speech of the translation result in a way that matches the individuality of the user. A user's individuality includes, for example, voice pitch (for example, whether the voice is male or female), speaking rate (for example, whether the user speaks quickly or slowly), age, tone (for example, whether the user is angry or pleased), voice power, and other non-verbal sound information (for example, laughter, sneezing, and tongue clicking). If the translation of a user's utterance is output as synthesized speech that reflects this individuality, the user's intention can be conveyed to the other party more accurately.

[0032] In a conventional speech translation system, however, the user's utterance is first converted into text, and the prosody information reflecting the user's individuality is lost at that point, so it is difficult to output synthesized speech that reflects the user's individuality.

[0033] Also, when the translation of a user's utterance is to be varied according to the user's gender, for example, the gender can be identified relatively easily from the speech itself, but it is difficult to identify once the speech has been converted into text as the speech recognition result.

[0034] Some conventional speech recognition and speech translation techniques therefore perform processing using the prosody information of speech.

[0035] For example, Japanese Unexamined Patent Application Publication No. H8-50498 discloses a method that uses not only phonemic information but also intonation information (phonological information) to perform speech recognition while distinguishing homophones that differ in accent, such as 「橋」 (hashi, "bridge") and 「箸」 (hashi, "chopsticks").

[0036] However, the method disclosed in Japanese Unexamined Patent Application Publication No. H8-50498 cannot distinguish between identical word sequences that differ in which word is accented (for example, "(an) English *teacher*" and "(an) *English* teacher" above). Moreover, that method can do no more than perform speech recognition while distinguishing homophones that differ in accent; prosody information is not taken into account when the text output as the recognition result is translated and synthesized speech corresponding to the translation is generated. It therefore cannot prevent the unnatural translated sentences and synthesized speech described above.

[0037] Also, Japanese Unexamined Patent Application Publication No. H6-332494 discloses a translation apparatus that extracts accented words and phrases from input speech in the source language and places an accent on the corresponding words and phrases in the target language.

[0038] However, the apparatus of Japanese Unexamined Patent Application Publication No. H6-332494 can handle cases in which the target-language phrase corresponding to an accented source-language phrase is also accented (for example, when "*Henry* has arrived." is translated as 「*ヘンリーが*到着しました。」, or when "Henry has arrived." is translated as 「ヘンリーが到着しました。」), but it has difficulty handling cases in which the target-language phrase corresponding to an accented source-language phrase is unaccented (for example, when "Henry has *arrived*." is translated as 「ヘンリーは到着しました。」), and cases in which the target-language phrase corresponding to an unaccented source-language phrase is accented.

[0039] The present invention has been made in view of these circumstances, and makes it possible to perform highly accurate translation and to output the translated sentence as natural-sounding synthesized speech.

[0040]

MEANS FOR SOLVING THE PROBLEMS

A translation apparatus according to the present invention comprises: input means for inputting an input sentence; translation means for translating the input sentence into a corresponding translated sentence on the basis of a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information for at least one of a first language and a second language; and output means for outputting the translated sentence.

[0041] The input means may be provided with speech recognition means for recognizing speech, and the speech recognition result from the speech recognition means may be output as the input sentence.

[0042] The speech recognition means may be provided with extraction means for extracting prosody information from the speech, so that the speech recognition result is output together with the prosody information of the speech, and the translation means may translate the speech recognition result using that prosody information.

[0043] The translation apparatus of the present invention may further be provided with request means for requesting prosody information from the speech recognition means; in this case, the speech recognition means may output the prosody information when so requested by the request means.
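The request-driven arrangement can be pictured as a translator that asks the recognizer for prosody only when it needs it. The sketch below is purely illustrative: `StubRecognizer`, `PROSODY_SENSITIVE`, and `gather_input` are hypothetical names, and the recognizer returns canned values rather than performing recognition.

```python
class StubRecognizer:
    """Stands in for a speech recognizer; its values are canned, not recognized."""

    def __init__(self, text, accented_word):
        self._text = text
        self._accented_word = accented_word
        self.prosody_requests = 0  # counts how often prosody was requested

    def text(self):
        return self._text

    def request_prosody(self):
        # Prosody is handed over only when explicitly requested.
        self.prosody_requests += 1
        return {"accented_word": self._accented_word}


# Hypothetical set of sentences whose translation depends on accent.
PROSODY_SENSITIVE = {"Henry has arrived."}


def gather_input(recognizer):
    """Fetch the recognized text and, only if needed, its prosody."""
    text = recognizer.text()
    prosody = None
    if text in PROSODY_SENSITIVE:
        prosody = recognizer.request_prosody()
    return text, prosody


ambiguous = StubRecognizer("Henry has arrived.", "Henry")
simple = StubRecognizer("Hello.", None)

print(gather_input(ambiguous))
print(gather_input(simple))
```

Requesting prosody on demand keeps the common path text-only while still making the extra information available for sentences that need it.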

[0044] The output means may be provided with speech synthesis means for generating synthesized speech corresponding to the translated sentence, so that the translated sentence is output as synthesized speech.

[0045] The translation means may output the translated sentence together with its prosody information, and the speech synthesis means may generate the synthesized speech corresponding to the translated sentence using that prosody information.

[0046] The speech synthesis means may be provided with request means for requesting prosody information from the translation means; in this case, the translation means may output the prosody information when so requested by the request means.

[0047] The translation apparatus of the present invention may further be provided with storage means storing the table in which the correspondences for translating the input sentence into the translated sentence are described together with prosody information for at least one of the first and second languages.

[0048] A translation method according to the present invention comprises: an input step of inputting an input sentence; a translation step of translating the input sentence into a corresponding translated sentence on the basis of a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information for at least one of a first language and a second language; and an output step of outputting the translated sentence.

[0049] A recording medium according to the present invention has recorded thereon a program comprising: an input step of inputting an input sentence; a translation step of translating the input sentence into a corresponding translated sentence on the basis of a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information for at least one of a first language and a second language; and an output step of outputting the translated sentence.

[0050] In the translation apparatus, translation method, and recording medium of the present invention, an input sentence is translated into a corresponding translated sentence on the basis of a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information for at least one of the first and second languages.

[0051]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 2 shows an example of the electrical configuration of an embodiment of a speech translation system to which the present invention is applied (here, a system means a logical assembly of a plurality of devices, regardless of whether the constituent devices are housed in the same enclosure), and FIG. 3 shows an example of the external configuration of that speech translation system.

[0052] In this speech translation system, when speech in a source language such as Japanese or English is input, a translated sentence obtained by translating that speech into a target language such as English or Japanese is output.

[0053] That is, speech in, for example, Japanese or English is input to a microphone 11 and supplied to a speech recognition unit 1. The speech recognition unit 1 recognizes the speech from the microphone 11 and outputs text representing the recognition result, along with other accompanying information, to a machine translation unit 2, a display unit 4, and so on.

[0054] The machine translation unit 2 analyzes the speech recognition result output by the speech recognition unit 1, machine-translates it from the language of the input speech (the source language) into a target language such as English or Japanese, and outputs text representing the translation result, along with other accompanying information, to a speech synthesis unit 3, the display unit 4, and so on. The speech synthesis unit 3 performs speech synthesis processing based on the output of the machine translation unit 2 and other units, and outputs synthesized speech such as the translation of the input speech into the other language.

[0055] The display unit 4 comprises, for example, a liquid crystal display, and displays the speech recognition result from the speech recognition unit 1, the machine translation result from the machine translation unit 2, and so on.

[0056] An operation unit 6 comprises a cursor key 6A, operated for example to move a cursor; an enter key 6B, operated for example to confirm a selection; and a cancel key 6C, operated for example to cancel a selection. An operation signal corresponding to an operation of the operation unit 6 is supplied to a control unit 7, which performs various kinds of processing in accordance with the operation signal. In addition to the uses described above, the operation unit 6 can also be used for entering characters, performing kana-kanji conversion, and the like.

【００５７】以上のように構成される音声翻訳システム
においては、原言語による音声が入力されると、その音
声が、音声認識部１で音声認識され、機械翻訳部２に供
給される。機械翻訳部２では、音声認識部１による音声
認識結果が、目的言語に機械翻訳され、音声合成部３に
供給される。音声合成部３では、機械翻訳部２からの翻
訳結果に対応する合成音が生成されて出力される。In the speech translation system configured as described above, when speech in the source language is input, the speech is recognized by the speech recognition unit 1 and supplied to the machine translation unit 2. In the machine translation unit 2, the speech recognition result by the speech recognition unit 1 is machine-translated into a target language and supplied to the speech synthesis unit 3. The speech synthesis unit 3 generates and outputs a synthesized speech corresponding to the translation result from the machine translation unit 2.

【００５８】次に、図４は、図２の音声認識部１の構成
例を示している。Next, FIG. 4 shows an example of the configuration of the speech recognition section 1 of FIG.

【００５９】ユーザによる原言語の発話は、マイク１１
に入力され、マイク１１では、その発話が、電気信号と
しての音声信号に変換される。この音声信号は、ＡＤ(A
nalog Digital)変換部１２に供給される。ＡＤ変換部１
２では、マイク１１からのアナログ信号である音声信号
がサンプリング、量子化され、ディジタル信号である音
声データに変換される。この音声データは、特徴抽出部
１３およびバッファ部１４に供給される。An utterance in the source language by the user is input to the microphone 11, which converts the utterance into an audio signal as an electric signal. This audio signal is supplied to the AD (Analog Digital) conversion unit 12. In the AD conversion unit 12, the audio signal, which is an analog signal from the microphone 11, is sampled and quantized, and thereby converted into audio data, which is a digital signal. This audio data is supplied to the feature extraction unit 13 and the buffer unit 14.
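
The sampling and quantization performed by the AD conversion unit 12 can be illustrated with a minimal sketch. The 16 kHz sampling rate, the 16-bit depth, and the function name `sample_and_quantize` are assumptions chosen for illustration; the specification does not state them.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    `signal` is a function of time (seconds) returning a value in [-1.0, 1.0].
    The sampling rate and bit depth are illustrative assumptions.
    """
    max_level = 2 ** (bits - 1) - 1            # 32767 for 16 bits
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling: discrete time points
        x = max(-1.0, min(1.0, signal(t)))     # clip to the converter's input range
        samples.append(round(x * max_level))   # quantization: discrete amplitudes
    return samples


def tone(t):
    """A 440 Hz sine wave standing in for the analog signal from the microphone."""
    return math.sin(2 * math.pi * 440 * t)


pcm = sample_and_quantize(tone, duration_s=0.01)
```

The resulting list of integers plays the role of the audio data that is passed on to the feature extraction unit 13 and the buffer unit 14.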

【００６０】特徴抽出部１３は、ＡＤ変換部１２からの
音声データについて、適当なフレームごとに、例えば、
スペクトルや、線形予測係数、ケプストラム係数、線ス
ペクトル対等の特徴パラメータを抽出し、バッファ部１
４およびマッチング部１５に供給する。For each appropriate frame of the audio data from the AD conversion unit 12, the feature extraction unit 13 extracts feature parameters such as a spectrum, linear prediction coefficients, cepstrum coefficients, and line spectrum pairs, and supplies them to the buffer unit 14 and the matching unit 15.

【００６１】マッチング部１５は、特徴抽出部１３から
の特徴パラメータに基づき、音響モデルデータベース１
６、辞書データベース１７、および文法データベース１
８を必要に応じて参照しながら、マイク１１に入力され
た音声（入力音声）を認識する。Based on the feature parameters from the feature extraction unit 13, the matching unit 15 recognizes the speech input to the microphone 11 (input speech) while referring, as necessary, to the acoustic model database 16, the dictionary database 17, and the grammar database 18.

【００６２】即ち、音響モデルデータベース１６は、音
声認識する音声の言語における個々の音素や音節などの
音響的な特徴を表す音響モデルを記憶している。ここ
で、音響モデルとしては、例えば、ＨＭＭ(Hidden Mark
ov Model)などを用いることができる。辞書データベー
ス１７は、認識対象の各単語（語句）について、その発
音に関する情報が記述された単語辞書を記憶している。
文法データベース１８は、辞書データベース１７の単語
辞書に登録されている各単語が、どのように連鎖する
（つながる）かを記述した文法規則を記憶している。こ
こで、文法規則としては、例えば、文脈自由文法（ＣＦ
Ｇ）や、統計的な単語連鎖確率（Ｎ−ｇｒａｍ）などに
基づく規則を用いることができる。That is, the acoustic model database 16 stores acoustic models representing the acoustic features of individual phonemes, syllables, and the like in the language of the speech to be recognized. As the acoustic model, for example, an HMM (Hidden Markov Model) can be used. The dictionary database 17 stores a word dictionary describing pronunciation information for each word (phrase) to be recognized. The grammar database 18 stores grammar rules describing how the words registered in the word dictionary of the dictionary database 17 are chained (connected). As the grammar rules, for example, rules based on a context-free grammar (CFG) or on statistical word chain probabilities (N-gram) can be used.
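
As an illustration of the statistical word chain probabilities (N-gram) mentioned above, the following sketch estimates bigram probabilities from a tiny corpus. The corpus, the `<s>` start marker, and the function name `train_bigram` are hypothetical; the specification does not describe how the probabilities in the grammar database 18 are obtained.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(prev, cur) estimating P(cur | prev) by counting."""
    history_counts = Counter()   # how often each word appears as a history
    pair_counts = Counter()      # how often each (prev, cur) pair appears
    for words in sentences:
        padded = ["<s>"] + words             # "<s>" marks the sentence start
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, cur)] += 1

    def prob(prev, cur):
        if history_counts[prev] == 0:
            return 0.0                       # unseen history: no estimate
        return pair_counts[(prev, cur)] / history_counts[prev]

    return prob


# Hypothetical training sentences, already split into words.
corpus = [
    ["is", "there", "a", "restaurant"],
    ["is", "there", "a", "station"],
    ["there", "is", "a", "restaurant"],
]
prob = train_bigram(corpus)
```

A recognizer would multiply (or add, in the log domain) such probabilities along a candidate word sequence to prefer likely word chains.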

【００６３】マッチング部１５は、辞書データベース１
７の単語辞書を参照することにより、音響モデルデータ
ベース１６に記憶されている音響モデルを接続すること
で、単語の音響モデル（単語モデル）を構成する。さら
に、マッチング部１５は、幾つかの単語モデルを、文法
データベース１８に記憶された文法規則を参照すること
により接続し、そのようにして接続された単語モデルを
用いて、特徴パラメータに基づき、例えば、ＨＭＭ法等
によって、マイク１１に入力された音声を認識する。The matching unit 15 constructs an acoustic model of each word (word model) by connecting the acoustic models stored in the acoustic model database 16 with reference to the word dictionary of the dictionary database 17. The matching unit 15 further connects several word models with reference to the grammar rules stored in the grammar database 18 and, using the word models connected in this way, recognizes the speech input to the microphone 11 based on the feature parameters by, for example, the HMM method.

【００６４】そして、マッチング部１５による音声認識
結果は、例えば、原言語によるテキストで出力される。The result of speech recognition by the matching unit 15 is output, for example, as a text in the source language.

【００６５】ここで、音響モデルデータベース１６に
は、汎用的な音響モデルの他、必要に応じて、男性の声
用や、女性の声用、非言語的な音用（例えば、笑い声
や、くしゃみ、舌打ち等用）等の、いわば特殊な音響モ
デルも記憶させておくことができる。この場合、マッチ
ング部１５において、汎用的な音響モデルだけでなく、
特殊な音響モデルをも用いて音声認識を行うことで、各
音響モデルによる音声認識結果の尤度に基づき、入力さ
れた音声が、男性の発話であるか、または女性の発話で
あるかや、舌打ち、あるいは笑い声であるか等を判定す
ることが可能となる。Here, in addition to general-purpose acoustic models, the acoustic model database 16 can also store, as necessary, what may be called special acoustic models, such as models for male voices, for female voices, and for non-verbal sounds (for example, laughter, sneezing, and tongue clicking). In this case, by performing speech recognition in the matching unit 15 using not only the general-purpose acoustic models but also the special acoustic models, it becomes possible to determine, based on the likelihood of the speech recognition result obtained with each acoustic model, whether the input speech is a male utterance or a female utterance, or whether it is a tongue click, laughter, or the like.
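
The likelihood-based determination described above amounts to picking the acoustic model whose recognition result scores highest. The sketch below assumes each model has already produced a log likelihood for the same input; the model labels and score values are made up for illustration.

```python
def classify_by_likelihood(log_likelihoods):
    """Return the label of the acoustic model with the highest log likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical log likelihoods of one input under four acoustic models.
scores = {
    "general": -120.5,
    "male": -98.2,
    "female": -131.0,
    "laughter": -210.4,
}
label = classify_by_likelihood(scores)
```

With these illustrative scores the input would be judged a male utterance, since that model's log likelihood is the largest.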

【００６６】なお、マッチング部１５では、通常は、汎
用的な音響モデルだけを用いて音声認識を行い、機械翻
訳部２や音声合成部３から、特殊な音響モデルを用いて
の音声認識を行うように要求するリクエスト信号を受信
した場合にのみ、特殊な音響モデルをも用いて音声認識
を行うようにすることが可能である。Note that the matching unit 15 can be made to normally perform speech recognition using only the general-purpose acoustic models, and to use the special acoustic models as well only when it receives, from the machine translation unit 2 or the speech synthesis unit 3, a request signal requesting speech recognition using the special acoustic models.

【００６７】一方、バッファ部１４は、音声データバッ
ファ１４Ａおよび特徴量バッファ１４Ｂで構成され、音
声データバッファ１４Ａは、ＡＤ変換部１２が出力する
音声データを、特徴量バッファ１４Ｂは、特徴抽出部１
３が出力する特徴パラメータを、それぞれ一時記憶す
る。On the other hand, the buffer unit 14 comprises an audio data buffer 14A and a feature buffer 14B. The audio data buffer 14A temporarily stores the audio data output by the AD conversion unit 12, and the feature buffer 14B temporarily stores the feature parameters output by the feature extraction unit 13.

【００６８】そして、音声データバッファ１４Ａに記憶
された音声データや、特徴量バッファ１４Ｂに記憶され
た特徴パラメータは、必要に応じて、プロソディ情報抽
出部１９によって読み出され、プロソディ情報抽出部１
９は、その音声データや特徴パラメータを用いて、入力
された音声（入力音声）のプロソディ情報を抽出し、そ
のプロソディ情報を表すプロソディデータを、マッチン
グ部１５が出力する音声認識結果としてのテキストに付
随する情報として出力する。The audio data stored in the audio data buffer 14A and the feature parameters stored in the feature buffer 14B are read out by the prosody information extraction unit 19 as necessary. Using that audio data and those feature parameters, the prosody information extraction unit 19 extracts the prosody information of the input speech and outputs prosody data representing that prosody information as information accompanying the text that the matching unit 15 outputs as the speech recognition result.

【００６９】なお、プロソディ情報抽出部１９には、常
に、プロソディ情報を抽出させて、プロソディデータを
出力させることも可能であるし、機械翻訳部２や音声合
成部３から、プロソディ情報を要求するリクエスト信号
を受信した場合にのみ、プロソディ情報を抽出させるよ
うにすることも可能である。The prosody information extracting unit 19 can always extract prosody information and output prosody data, and requests the prosody information from the machine translation unit 2 or the speech synthesis unit 3. It is also possible to extract prosody information only when a request signal is received.

【００７０】また、プロソディ情報抽出部１９が出力す
るプロソディデータは、マッチング部１５に出力するよ
うにすることができ、この場合、マッチング部１５は、
音声認識結果としてのテキストの中に、プロソディ情報
抽出部１９からのプロソディデータを含めた形のデータ
を生成して出力する。The prosody data output by the prosody information extraction unit 19 can also be output to the matching unit 15. In this case, the matching unit 15 generates and outputs data in which the prosody data from the prosody information extraction unit 19 is included in the text of the speech recognition result.

【００７１】次に、図５は、図２の機械翻訳部２の構成
例を示している。FIG. 5 shows an example of the configuration of the machine translation unit 2 shown in FIG.

【００７２】テキスト解析部２１には、音声認識部１が
出力する音声認識結果としてのテキストが、機械翻訳の
対象として入力されるようになっており、テキスト解析
部２１は、辞書データベース２４や解析用文法データベ
ース２５を参照しながら、そのテキストを解析する。The text output by the speech recognition unit 1 as the speech recognition result is input to the text analysis unit 21 as the target of machine translation, and the text analysis unit 21 analyzes that text with reference to the dictionary database 24 and the analysis grammar database 25.

【００７３】即ち、辞書データベース２４には、各単語
（語句）の表記や、解析用文法を適用するために必要な
品詞情報などが記述された単語辞書が記憶されている。
また、解析用文法データベース２５には、単語辞書に記
述された各単語の情報に基づいて、単語連鎖に関する制
約等が記述された解析用文法規則が記憶されている。そ
して、テキスト解析部２１は、その単語辞書や解析用文
法規則に基づいて、そこに入力されるテキスト（入力テ
キスト）の形態素解析や、構文解析等を行い、その入力
テキストを構成する単語や構文の情報等の言語情報を抽
出する。ここで、テキスト解析部２１における解析方法
としては、例えば、正規文法や、文脈自由文法、統計的
な単語連鎖確率を用いたものなどがある。That is, the dictionary database 24 stores a word dictionary describing the notation of each word (phrase), the part-of-speech information needed to apply the analysis grammar, and the like. The analysis grammar database 25 stores analysis grammar rules describing constraints on word chaining and the like, based on the information on each word described in the word dictionary. Based on the word dictionary and the analysis grammar rules, the text analysis unit 21 performs morphological analysis, syntactic analysis, and the like on the text input to it (input text), and extracts linguistic information such as information on the words and syntax that make up the input text. Analysis methods usable in the text analysis unit 21 include, for example, those using a regular grammar, a context-free grammar, or statistical word chain probabilities.

【００７４】テキスト解析部２１で得られた入力テキス
トの解析結果としての言語情報は、言語変換部２２に供
給される。言語変換部２２は、言語変換データベース２
６を参照し、入力テキストの言語（原言語）の言語情報
を、翻訳結果の言語（目的言語）の言語情報に変換す
る。The linguistic information obtained by the text analysis unit 21 as the analysis result of the input text is supplied to the language conversion unit 22. The language conversion unit 22 refers to the language conversion database 26 and converts the linguistic information of the language of the input text (source language) into linguistic information of the language of the translation result (target language).

【００７５】即ち、言語変換データベース２６には、原
言語（言語変換部２２への入力の言語）の言語情報か
ら、目的言語（言語変換部２２からの出力の言語）の言
語情報への変換パターンや、原言語と目的言語との対訳
用例およびその対訳用例と原言語との間の類似度の計算
に用いられるシソーラス等の、言語情報を変換するため
の、原言語と目的言語との対応関係を記述した言語変換
データが記憶されている。そして、言語変換部２２で
は、このような言語変換データに基づいて、入力テキス
トの言語の言語情報が、目的言語の言語情報に変換され
る。That is, the language conversion database 26 stores language conversion data that describes the correspondence between the source language and the target language and is used to convert linguistic information. This data includes conversion patterns from linguistic information of the source language (the language input to the language conversion unit 22) to linguistic information of the target language (the language output from the language conversion unit 22), bilingual translation examples in the source and target languages, and a thesaurus used to calculate the similarity between those translation examples and the source-language input. Based on such language conversion data, the language conversion unit 22 converts the linguistic information of the language of the input text into linguistic information of the target language.

【００７６】言語変換部２２で得られた目的言語の言語
情報は、テキスト生成部２３に供給される。テキスト生
成部２３は、辞書データベース２７および生成用文法デ
ータベース２８を参照することにより、言語変換部２２
からの目的言語の言語情報から、入力テキストを目的言
語に翻訳したテキストを生成する。The linguistic information of the target language obtained by the language conversion unit 22 is supplied to the text generation unit 23. By referring to the dictionary database 27 and the generation grammar database 28, the text generation unit 23 generates, from the target-language linguistic information supplied by the language conversion unit 22, text in which the input text is translated into the target language.

【００７７】即ち、辞書データベース２７には、目的言
語の文を生成するのに必要な単語（語句）の品詞や活用
形等の情報が記述された単語辞書が記憶されており、ま
た、生成用文法データベース２８には、目的言語の文を
生成するのに必要な単語の活用規則や語順の制約等の生
成用文法規則が記憶されている。そして、テキスト生成
部２３は、これらの単語辞書および生成用文法規則に基
づいて、言語変換部２２からの言語情報を、テキストに
変換して出力する。That is, the dictionary database 27 stores a word dictionary describing information such as the parts of speech and inflected forms of the words (phrases) needed to generate sentences in the target language, and the generation grammar database 28 stores generation grammar rules, such as the inflection rules and word-order constraints needed to generate sentences in the target language. Based on this word dictionary and these generation grammar rules, the text generation unit 23 converts the linguistic information from the language conversion unit 22 into text and outputs it.

【００７８】ここで、テキスト解析部２１、言語変換部
２２、およびテキスト生成部２３は、必要に応じて、音
声認識部１が出力するプロソディデータを用いて処理を
行うようになっている。テキスト解析部２１や、言語変
換部２２、テキスト生成部２３がプロソディデータを用
いて処理を行った場合には、必要に応じてプロソディデ
ータを含んだ処理結果が出力される。なお、音声認識部
１が、常に、プロソディデータを出力するようになって
いない場合には、テキスト解析部２１、言語変換部２
２、およびテキスト生成部２３は、プロソディ情報が必
要なときに、それを要求するリクエスト信号を、音声認
識部１に出力するようになっている。Here, the text analysis unit 21, the language conversion unit 22, and the text generation unit 23 perform processing using the prosody data output by the speech recognition unit 1 as necessary. When they perform processing using prosody data, a processing result that includes prosody data is output as necessary. If the speech recognition unit 1 does not always output prosody data, the text analysis unit 21, the language conversion unit 22, and the text generation unit 23 output a request signal to the speech recognition unit 1 when prosody information is needed.

【００７９】また、テキスト解析部２１には、処理中に
参照した、辞書データベース２４および解析用文法デー
タベース２５の情報を保持させておくことができる。同
様に、言語変換部２２や、テキスト生成部２３にも、処
理中に参照した、言語変換データベース２６の情報や、
辞書データベース２７および生成用文法データベース２
８の情報を保持させておくことができる。この場合、テ
キスト解析部２１や、言語変換部２２、テキスト生成部
２３において、どの情報を参照して処理を行ったかの問
い合わせるリクエスト信号が、後で処理を行うブロック
（例えば、音声合成部３）から送信されてきたときに、
そのブロックに対して、リクエスト信号が要求する情報
（例えば、プロソディデータ等）を返信することが可能
となる。Further, the text analysis unit 21 can retain the information of the dictionary database 24 and the analysis grammar database 25 that it referred to during processing. Similarly, the language conversion unit 22 and the text generation unit 23 can retain the information of the language conversion database 26, and of the dictionary database 27 and the generation grammar database 28, that they referred to during processing. In this case, when a block that performs processing later (for example, the speech synthesis unit 3) sends a request signal inquiring which information the text analysis unit 21, the language conversion unit 22, or the text generation unit 23 referred to in its processing, the information requested by the request signal (for example, prosody data) can be returned to that block.

【００８０】次に、図６は、図５の言語変換データベー
ス２６に記憶されている言語変換データを示している。Next, FIG. 6 shows language conversion data stored in the language conversion database 26 of FIG.

【００８１】本実施の形態では、言語変換データとし
て、原言語を目的言語に翻訳するための対応関係が、原
言語または目的言語のうちの少なくとも一方のプロソデ
ィ情報とともに記述されており、そのような言語変換デ
ータが、変換テーブルに登録されている。In this embodiment, the correspondence for translating the source language into the target language is described as the language conversion data together with the prosody information of at least one of the source language and the target language. Language conversion data is registered in the conversion table.

【００８２】即ち、図６は、日本語または英語のうちの
一方を、原言語とするとともに、他方を、目的言語とし
た場合の変換テーブルを示している。That is, FIG. 6 shows a conversion table when one of Japanese or English is used as a source language and the other is used as a target language.

【００８３】図６（Ａ）の変換テーブルでは、原言語に
よるテキストとしては、いずれも"English teacher"と
表現される目的言語によるテキスト「イギリス人の先
生」および「英語の先生」の対訳が、プロソディ情報と
しての強調されている部分を表す記号*を用いて記述さ
れている。即ち、図６（Ａ）では、英語"English *teac
her*"の対訳として、日本語「イギリス人の先生」が対
応付けられており、英語"*English* teacher"の対訳と
して、日本語「英語の先生」が対応付けられている。In the conversion table of FIG. 6(A), the translations 「イギリス人の先生」 (a teacher who is English by nationality) and 「英語の先生」 (a teacher of the English language), both of which correspond to the same source-language text "English teacher", are described using the symbol *, which marks the emphasized part as prosody information. That is, in FIG. 6(A), the Japanese 「イギリス人の先生」 is associated as the translation of the English "English *teacher*", and the Japanese 「英語の先生」 is associated as the translation of the English "*English* teacher".

【００８４】この場合、言語変換部２２は、変換テーブ
ルを参照することで、英語による発話"English *teache
r*"を、その音声認識結果とプロソディデータを用い
て、日本語「イギリス人の先生」に正確に翻訳すること
ができる。また、言語変換部２２は、英語による発話"*
English* teacher"を、その音声認識結果とプロソディ
データを用いて、日本語「英語の先生」に正確に翻訳す
ることができる。In this case, by referring to the conversion table, the language conversion unit 22 can accurately translate the English utterance "English *teacher*" into the Japanese 「イギリス人の先生」 using its speech recognition result and prosody data. Likewise, the language conversion unit 22 can accurately translate the English utterance "*English* teacher" into the Japanese 「英語の先生」 using its speech recognition result and prosody data.
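
The lookup described for FIG. 6(A) can be sketched as a table keyed on the source text with its emphasis marks. The dictionary layout, the names `CONVERSION_TABLE_A`, `mark_emphasis`, and `translate`, and the use of a word index to represent the prosody data are all assumptions for illustration, not the format used in the specification.

```python
# Entries of FIG. 6(A): source text with * marking the emphasized word.
CONVERSION_TABLE_A = {
    "English *teacher*": "イギリス人の先生",   # a teacher who is English
    "*English* teacher": "英語の先生",         # a teacher of English
}


def mark_emphasis(words, emphasized_index):
    """Wrap the emphasized word in asterisks and join the words with spaces."""
    return " ".join(
        f"*{word}*" if i == emphasized_index else word
        for i, word in enumerate(words)
    )


def translate(words, emphasized_index):
    """Look up the translation for recognized words plus emphasis position.

    Returns None when the table has no matching entry.
    """
    return CONVERSION_TABLE_A.get(mark_emphasis(words, emphasized_index))


stressed_second = translate(["English", "teacher"], emphasized_index=1)
stressed_first = translate(["English", "teacher"], emphasized_index=0)
```

The same recognized text yields different translations depending only on which word the prosody data marks as emphasized.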

【００８５】図６（Ｂ）の変換テーブルでは、英語"NP
VP"の対訳として、日本語「NPはVP」が対応付けられて
おり、英語"*NP* VP"の対訳として、日本語「NPがVP」
が対応付けられている。ここで、NPは、名詞句(Noun Ph
rase)を、VPは、動詞句(VerbPhrase)を、それぞれ表
す。この場合、言語変換部２２は、英語による発話"NPV
P"を、その音声認識結果とプロソディデータを用いて、
日本語「NPはVP」に正確に翻訳することができ、また、
英語による発話"*NP* VP"も、日本語「NPがVP」に正確
に翻訳することができる。In the conversion table of FIG. 6(B), the Japanese 「NPはVP」 is associated as the translation of the English "NP VP", and the Japanese 「NPがVP」 is associated as the translation of the English "*NP* VP". Here, NP denotes a noun phrase and VP denotes a verb phrase. In this case, the language conversion unit 22 can accurately translate the English utterance "NP VP" into the Japanese 「NPはVP」 using its speech recognition result and prosody data, and can likewise accurately translate the English utterance "*NP* VP" into the Japanese 「NPがVP」.

【００８６】図６（Ｃ）の変換テーブルでは、日本語
「NPはVP」および「NPがVP」の対訳として、いずれも、
英語"NP VP"が対応付けられており、日本語「*NP*がV
P」の対訳として、英語"*NP* VP"が対応付けられてい
る。この場合、言語変換部２２では、日本語「NPはVP」
や「NPがVP」の対訳として、英語"NP VP"が出力され、
音声合成部３では、その英語"NP VP"に対応する合成
音、即ち、アクセントなしの合成音が生成される。ま
た、言語変換部２２では、日本語「*NP*がVP」の対訳と
して、英語"*NP* VP"が出力され、音声合成部３では、
その英語"*NP* VP"に対応する合成音、即ち、名詞句NP
にアクセントを付加した合成音が生成される。従って、
翻訳結果として、違和感のない合成音が出力される。In the conversion table of FIG. 6(C), the English "NP VP" is associated as the translation of both the Japanese 「NPはVP」 and 「NPがVP」, and the English "*NP* VP" is associated as the translation of the Japanese 「*NP*がVP」. In this case, the language conversion unit 22 outputs the English "NP VP" as the translation of the Japanese 「NPはVP」 or 「NPがVP」, and the speech synthesis unit 3 generates the synthesized sound corresponding to that English "NP VP", that is, a synthesized sound without accent. The language conversion unit 22 also outputs the English "*NP* VP" as the translation of the Japanese 「*NP*がVP」, and the speech synthesis unit 3 generates the synthesized sound corresponding to that English "*NP* VP", that is, a synthesized sound in which an accent is added to the noun phrase NP. Accordingly, a natural-sounding synthesized sound is output as the translation result.

【００８７】図６（Ｄ）の変換テーブルでは、英語"NP1
also V *NP2*."の対訳として、日本語「NP1はNP2も
V。」が対応付けられており、英語"*NP1* also V NP2."
の対訳として、日本語「NP1もNP2をV。」が対応付けら
れている。ここで、Vは動詞を表す。この場合も、言語
変換部２２は、英語による発話"NP VP""NP1 also V *NP
2*."を、日本語「NP1はNP2もV。」に正確に翻訳するこ
とができ、また、英語による発話"*NP1* also V NP2."
を、日本語「NP1もNP2をV。」に正確に翻訳することが
できる。In the conversion table of FIG. 6(D), the Japanese 「NP1はNP2もV。」 is associated as the translation of the English "NP1 also V *NP2*.", and the Japanese 「NP1もNP2をV。」 is associated as the translation of the English "*NP1* also V NP2.". Here, V denotes a verb. In this case as well, the language conversion unit 22 can accurately translate the English utterance "NP1 also V *NP2*." into the Japanese 「NP1はNP2もV。」, and the English utterance "*NP1* also V NP2." into the Japanese 「NP1もNP2をV。」.

【００８８】なお、図６の実施の形態では、強調を表す
（強く発音される語句を表す）のに、アスタリスクを用
いるようにしたが、強調その他のプロソディ情報を表す
のには、例えば、ＨＴＭＬ(Hyper Text Markup Languag
e)等で採用されているタグを用いることも可能である。
但し、プロソディ情報のうちの、例えば、アクセントだ
けに注目した場合でも、言語によって、高低型や強弱型
等が存在し、そのような型を区別したタグによってプロ
ソディ情報を記述することは煩雑である。そこで、変換
テーブルの作成の際には、そのような型を区別しないア
スタリスク等の記号を用いて、いわば簡略化された変換
テーブルの記述を行い、後で、変換ツール等を用いて、
簡略化された変換テーブルを、タグを用いて記述された
変換テーブルにコンバートするようにすることができ
る。In the embodiment of FIG. 6, an asterisk is used to represent emphasis (to mark a phrase that is pronounced strongly), but tags such as those adopted in HTML (Hyper Text Markup Language) can also be used to represent emphasis and other prosody information. However, even when attention is paid only to accent among the prosody information, accents come in different types depending on the language, such as the pitch (high-low) type and the stress (strong-weak) type, and describing prosody information with tags that distinguish such types is cumbersome. Therefore, when creating a conversion table, a simplified conversion table can be written using a symbol such as an asterisk that does not distinguish such types, and the simplified conversion table can later be converted, using a conversion tool or the like, into a conversion table described using tags.
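
A conversion tool of the kind mentioned above could be as simple as a regular-expression substitution. The sketch below is one possible form; the tag name `emph` is hypothetical, since the specification does not name the tags, and a real tool would presumably choose a type-specific tag per language.

```python
import re


def asterisks_to_tags(text, tag="emph"):
    """Rewrite each *emphasized* span as <tag>emphasized</tag>.

    The default tag name is a placeholder chosen for illustration.
    """
    return re.sub(r"\*([^*]+)\*", rf"<{tag}>\1</{tag}>", text)


simplified_entries = ["*English* teacher", "NP1 also V *NP2*."]
tagged_entries = [asterisks_to_tags(entry) for entry in simplified_entries]
```

Passing a different `tag` argument (for example, one name for pitch accent and another for stress accent) would let the same simplified table be expanded differently for each language.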

【００８９】次に、図７は、図２の音声合成部３の構成
例を示している。Next, FIG. 7 shows an example of the configuration of the speech synthesizer 3 in FIG.

【００９０】テキスト解析部３１には、機械翻訳部２が
出力する翻訳結果としてのテキストが、音声合成処理の
対象として入力されるようになっており、テキスト解析
部３１は、辞書データベース３４や解析用文法データベ
ース３５を参照しながら、そのテキストを解析する。The text output by the machine translation unit 2 as the translation result is input to the text analysis unit 31 as the target of speech synthesis processing, and the text analysis unit 31 analyzes that text with reference to the dictionary database 34 and the analysis grammar database 35.

【００９１】即ち、辞書データベース３４には、各単語
（語句）の品詞情報や、読み、アクセント等の情報が記
述された単語辞書が記憶されており、また、解析用文法
データベース３５には、辞書データベース３４の単語辞
書に記述された単語について、単語連鎖に関する制約等
の解析用文法規則が記憶されている。そして、テキスト
解析部３１は、この単語辞書および解析用文法規則に基
づいて、そこに入力されるテキストの形態素解析や構文
解析等の解析を行い、後段の規則合成部３２で行われる
規則音声合成に必要な情報を抽出する。ここで、規則音
声合成に必要な情報としては、例えば、ポーズの位置
や、アクセントおよびイントネーションを制御するため
の情報その他の韻律情報や、各単語の発音等の音韻情報
などがある。That is, the dictionary database 34 stores a word dictionary describing part-of-speech information for each word (phrase) along with information such as its reading and accent, and the analysis grammar database 35 stores analysis grammar rules, such as constraints on word chaining, for the words described in the word dictionary of the dictionary database 34. Based on this word dictionary and these analysis grammar rules, the text analysis unit 31 performs analyses such as morphological analysis and syntactic analysis on the text input to it, and extracts the information needed for the rule-based speech synthesis performed by the rule synthesis unit 32 in the subsequent stage. The information needed for rule-based speech synthesis includes, for example, prosodic information such as pause positions and information for controlling accent and intonation, and phonological information such as the pronunciation of each word.

【００９２】テキスト解析部３１で得られた情報は、規
則合成部３２に供給され、規則合成部３２では、音素片
データベース３６を用いて、テキスト解析部３１に入力
されたテキストに対応する合成音の音声データ（ディジ
タルデータ）が生成される。The information obtained by the text analysis unit 31 is supplied to the rule synthesis unit 32, which uses the phoneme segment database 36 to generate audio data (digital data) of the synthesized sound corresponding to the text input to the text analysis unit 31.

【００９３】即ち、音素片データベース３６には、例え
ば、ＣＶ(Consonant, Vowel)や、ＶＣＶ、ＣＶＣ等の形
で音素片データが記憶されており、規則合成部３２は、
テキスト解析部３１からの情報に基づいて、必要な音素
片データを接続し、さらに、ポーズ、アクセント、イン
トネーション等を適切に付加することで、テキスト解析
部３１に入力されたテキストに対応する合成音の音声デ
ータを生成する。That is, the phoneme segment database 36 stores phoneme segment data in forms such as CV (Consonant, Vowel), VCV, and CVC. Based on the information from the text analysis unit 31, the rule synthesis unit 32 connects the necessary phoneme segment data and appropriately adds pauses, accents, intonation, and the like, thereby generating audio data of the synthesized sound corresponding to the text input to the text analysis unit 31.

【００９４】この音声データは、ＤＡ変換部３３に供給
され、そこで、アナログ信号としての音声信号に変換さ
れる。この音声信号は、図示せぬスピーカに供給され、
これにより、テキスト解析部３１に入力されたテキスト
に対応する合成音が出力される。This audio data is supplied to the DA conversion unit 33, where it is converted into an audio signal as an analog signal. This audio signal is supplied to a speaker (not shown), so that the synthesized sound corresponding to the text input to the text analysis unit 31 is output.

【００９５】ここで、テキスト解析部３１、規則合成部
３２、およびＤＡ変換部３３は、必要に応じて、音声認
識部１が出力するプロソディデータや、機械翻訳部２が
出力する翻訳結果に含まれるプロソディデータを用いて
処理を行うようになっている。なお、音声認識部１が、
常に、プロソディデータを出力するようになっていない
場合には、テキスト解析部３１、規則合成部３２、およ
びＤＡ変換部３３は、プロソディ情報が必要なときに、
それを要求するリクエスト信号を、音声認識部１に出力
するようになっている。同様に、機械翻訳部２が、常
に、プロソディデータを含む処理結果を出力するように
なっていない場合には、テキスト解析部３１、規則語末
井部３２、およびＤＡ変換部３３は、プロソディ情報が
必要なときに、それを要求するリクエスト信号を、機械
翻訳部２に出力するようになっている。Here, the text analysis unit 31, the rule synthesis unit 32, and the DA conversion unit 33 perform processing using, as necessary, the prosody data output by the speech recognition unit 1 and the prosody data included in the translation result output by the machine translation unit 2. If the speech recognition unit 1 does not always output prosody data, the text analysis unit 31, the rule synthesis unit 32, and the DA conversion unit 33 output a request signal to the speech recognition unit 1 when prosody information is needed. Similarly, if the machine translation unit 2 does not always output a processing result including prosody data, the text analysis unit 31, the rule synthesis unit 32, and the DA conversion unit 33 output a request signal to the machine translation unit 2 when prosody information is needed.

【００９６】なお、本実施の形態では、音声合成部３
は、プロソディ情報を表すタグ（以下、適宜、プロソデ
ィタグという）を用いて記述されたテキストを処理する
ことができるようになっている。例えば、強調を開始タ
グおよび終了タグで表すとすると、
音声合成部３では、プロソディタグを用いて記述され
た、例えば「一部を強調して喋りま
す」等のテキストに対応する合成音として、「強調し
て」の部分を強調したものが出力される。In this embodiment, the speech synthesis unit 3 can process text described using tags representing prosody information (hereinafter referred to as prosody tags where appropriate). For example, if emphasis is represented by a start tag and an end tag, then for text described using prosody tags, such as 「一部を強調して喋ります」 ("I will speak with one part emphasized"), the speech synthesis unit 3 outputs a synthesized sound in which the 「強調して」 ("emphasized") portion is emphasized.

【００９７】また、ＤＡ変換部３３では、プロソディデ
ータが、例えば、合成音の発話速度や、音量等を調整す
るために用いられる。In the DA converter 33, the prosody data is used to adjust, for example, the utterance speed and volume of the synthesized sound.

【００９８】次に、図８のフローチャートを参照して、
図２の音声翻訳システムの動作について、さらに説明す
るする。Next, referring to the flowchart of FIG.
The operation of the speech translation system in FIG. 2 will be further described.

【００９９】図２の音声翻訳システムに対して、原言語
の音声が入力されると、その音声が、音声認識部１にお
いて音声認識される。When a speech in the source language is input to the speech translation system of FIG. 2, the speech is recognized by the speech recognition unit 1.

【０１００】即ち、音声認識部１（図４）において、原
言語の音声は、マイク１１に入力され、さらに、ＡＤ変
換部１２を介することで、ディジタルの音声データとさ
れる。この音声データは、音声データバッファ１４Ａに
供給されて一時記憶されるとともに、特徴量抽出部１３
に供給される。特徴量抽出部１３では、ステップＳ１に
おいて、ＡＤ変換部１２からの音声データから、特徴パ
ラメータが抽出され、特徴量バッファ１４Ｂに供給され
て一時記憶されるとともに、マッチング部１５に供給さ
れ、ステップＳ２に進む。That is, in the speech recognition unit 1 (FIG. 4), speech in the source language is input to the microphone 11 and converted into digital audio data via the AD conversion unit 12. This audio data is supplied to the audio data buffer 14A, where it is temporarily stored, and to the feature extraction unit 13. In step S1, the feature extraction unit 13 extracts feature parameters from the audio data from the AD conversion unit 12; the feature parameters are supplied to the feature buffer 14B, where they are temporarily stored, and to the matching unit 15, and the process proceeds to step S2.

【０１０１】ステップＳ２では、マッチング部１５にお
いて、特徴抽出部１３からの特徴パラメータに基づい
て、原言語の音声（入力音声）が音声認識され、その音
声認識結果を表す原言語によるテキスト等が生成され
る。In step S2, the matching unit 15 recognizes the source-language speech (input speech) based on the feature parameters from the feature extraction unit 13, and generates text or the like in the source language representing the speech recognition result.

【０１０２】なお、マッチング部１５には、音声認識結
果を１つに特定せずに、複数の音声認識結果の候補を出
力させるようにすることができる。また、マッチング部
１５には、例えば、多義性をもったワードグラフの形式
で、音声認識結果を出力させるようにすることができ
る。ここで、例えば、日本語による音声「この近くに、
おいしいレストランはありますか」についてのワードグ
ラフの一例を、図９および図１０に示す。図９および図
１０の実施の形態においては、発話が開始された時刻を
基準とした開始時刻および終了時刻、その開始時刻から
終了時刻の間における音声の認識結果の候補（単語候
補）、並びに、その単語候補の対数尤度が、時系列に並
んでいる。なお、図１０は、図９に続く図である。The matching unit 15 can be made to output a plurality of candidate speech recognition results rather than a single result. The matching unit 15 can also be made to output the speech recognition result in, for example, the form of a word graph that retains ambiguity. FIGS. 9 and 10 show an example of a word graph for the Japanese utterance 「この近くに、おいしいレストランはありますか」 ("Is there a good restaurant near here?"). In the embodiment of FIGS. 9 and 10, start and end times measured from the time the utterance began, the candidate recognition results (word candidates) for the speech between each start time and end time, and the log likelihoods of those word candidates are arranged in time series. FIG. 10 is a continuation of FIG. 9.
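
A word graph of this kind can be represented as a list of edges, each carrying a start time, an end time, a word candidate, and a log likelihood. The sketch below also shows one way a downstream stage might pick the highest-scoring path. The `Edge` type, the `best_path` function, the time unit, and every edge value are assumptions for illustration; they are not taken from FIGS. 9 and 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int              # start time, measured from the utterance start
    end: int                # end time (must be greater than start)
    word: str               # word candidate for this time span
    log_likelihood: float   # log likelihood of the candidate


def best_path(edges, final_time):
    """Return (total log likelihood, words) of the best path from 0 to final_time.

    Returns None if no chain of edges spans the whole utterance.
    """
    best = {0: (0.0, [])}   # best[t]: best score and words for paths ending at t
    # Processing edges by end time guarantees best[edge.start] is already final.
    for edge in sorted(edges, key=lambda e: e.end):
        if edge.start not in best:
            continue
        score = best[edge.start][0] + edge.log_likelihood
        if edge.end not in best or score > best[edge.end][0]:
            best[edge.end] = (score, best[edge.start][1] + [edge.word])
    return best.get(final_time)


# Hypothetical competing candidates for the opening of the utterance.
graph = [
    Edge(0, 30, "この", -10.0),
    Edge(0, 30, "子の", -14.0),
    Edge(30, 70, "近くに", -20.0),
    Edge(30, 70, "地下に", -26.0),
    Edge(70, 120, "おいしい", -15.0),
]
result = best_path(graph, final_time=120)
```

Keeping the whole graph rather than only the best path lets later stages, such as the machine translation unit 2, reconsider lower-ranked candidates when prosody or syntax favors them.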

【０１０３】図８に戻り、ステップＳ２では、さらに、
プロソディ情報抽出部１９において、音声データバッフ
ァ１４に記憶された音声データ、および特徴量バッファ
１４Ｂに記憶された特徴量に基づいて、入力音声のプロ
ソディ情報が抽出され、そのプロソディ情報を表すプロ
ソディデータが、マッチング部１５に出力される。Returning to FIG. 8, in step S2 the prosody information extraction unit 19 further extracts the prosody information of the input speech based on the audio data stored in the audio data buffer 14A and the feature parameters stored in the feature buffer 14B, and outputs prosody data representing that prosody information to the matching unit 15.

【０１０４】ここで、本実施の形態では、プロソディ情
報として、例えば、プリミディブなものと、ユーザ（入
力音声を発話したユーザ）の個人性を反映したもの、あ
るいは抽象化したものとが抽出されるようになってい
る。プリミディブなプロソディ情報としては、例えば、
入力音声のピッチや、パワー、発話速度、発話時間、ポ
ーズの長さ等がある。また、ユーザの個人性を反映した
プロソディ情報、あるいは抽象化したプロソディ情報と
しては、強弱アクセント／イントネーションや、高低ア
クセント／イントネーション、男性または女性のいずれ
の声であるか、どのような感情の口調であるか（例え
ば、怒っている口調であるとか、笑っている口調である
とかなど）、非言語的な音情報（例えば、舌打ちや、く
しゃみの情報）、平叙文であるか、または疑問文である
か、語句の区切り位置等がある。Here, in the present embodiment, two kinds of prosody information are extracted: primitive information, and information that reflects the individuality of the user (the user who uttered the input speech) or that has been abstracted. Primitive prosody information includes, for example, the pitch, power, speech rate, utterance duration, and pause length of the input speech. Prosody information that reflects the user's individuality, or abstracted prosody information, includes stress accent/intonation, pitch accent/intonation, whether the voice is male or female, the emotional tone (for example, an angry tone or a laughing tone), non-verbal sound information (for example, tongue clicking or sneezing), whether the sentence is declarative or interrogative, and the positions of phrase boundaries.

【０１０５】プロソディ情報抽出部１９では、プリミデ
ィブなプロソディ情報は、音声データや、その特徴パラ
メータを用いて演算を行うことで求められ、ユーザの個
人性を反映したプロソディ情報、あるいは抽象化したプ
ロソディ情報は、プリミディブなプロソディ情報の１つ
以上を用いて求められる。In the prosody information extracting section 19, the primitive prosody information is obtained by performing an operation using audio data and its characteristic parameters, and the prosody information reflecting the personality of the user or the abstracted prosody information is obtained. Is obtained using one or more pieces of primitive prosody information.

【０１０６】即ち、例えば、語句の区切り位置は、その
位置におけるポーズの長さや、高低イントネーションの
変化を用いて求められる。具体的には、例えば、「全員
ロッカー支給」という日本語は、そのテキストだけを見
ると、「全員に、ロッカーを支給する」という意味にも
とれるし、「全員ロッカーという物があって、それを支
給する」という意味にもとれる。この場合、「全員ロッ
カー支給」という日本語が、例えば、「全員に、ロッカ
ーを支給する」という意味であるとすると、「全員ロッ
カー支給」という日本語の「全員」と「ロッカー」との
間の位置は、語句の区切り位置である。That is, for example, the position of a phrase boundary is obtained using the length of the pause at that position and the change in pitch intonation. Specifically, the Japanese 「全員ロッカー支給」, judged from its text alone, can be read either as "lockers are issued to everyone" (「全員に、ロッカーを支給する」) or as "there is a thing called an 'everyone locker', and it is issued". If 「全員ロッカー支給」 means "lockers are issued to everyone", then the position between 「全員」 ("everyone") and 「ロッカー」 ("locker") is a phrase boundary.

【０１０７】一方、「全員ロッカー支給」という発話が
行われた場合において、それが、「全員に、ロッカーを
支給する」という意味であれば、発話「全員ロッカー支
給」における「全員」と「ロッカー」との間には、一般
に、ポーズが挿入されたり、また、その間のイントネー
ションが低から高に変化する。On the other hand, when 「全員ロッカー支給」 is actually uttered with the meaning "lockers are issued to everyone", a pause is generally inserted between 「全員」 and 「ロッカー」, or the intonation between them changes from low to high.

【０１０８】従って、この場合、発話「全員ロッカー支
給」における「全員」と「ロッカー」との間のポーズの
有無やイントネーション等のプリミディブなプロソディ
情報を用いることにより、その間の位置が、語句の区切
り位置かどうかを判定することができる。Therefore, in this case, primitive prosody information such as the presence or absence of a pause between 「全員」 and 「ロッカー」 in the utterance 「全員ロッカー支給」, and the intonation there, can be used to determine whether that position is a phrase boundary.
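The boundary decision described above can be sketched as follows. This is a minimal illustration only: the `Gap` record, the function name, and both threshold values are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """Acoustic measurements at the junction between two adjacent words."""
    pause_sec: float        # length of silence between the two words
    pitch_before_hz: float  # pitch at the end of the first word
    pitch_after_hz: float   # pitch at the start of the second word


def is_phrase_boundary(gap, min_pause_sec=0.15, min_pitch_rise=1.10):
    """Return True if the gap looks like a phrase boundary.

    A boundary is assumed when a pause is present or when the intonation
    rises from low to high across the gap. Both thresholds are hypothetical.
    """
    has_pause = gap.pause_sec >= min_pause_sec
    rises = gap.pitch_after_hz >= gap.pitch_before_hz * min_pitch_rise
    return has_pause or rises
```

For example, a gap between 「全員」 and 「ロッカー」 with a clear pause would be classified as a boundary, which selects the reading "lockers are issued to everyone".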

【０１０９】なお、音声認識結果についての語句の区切
り位置の情報を、機械翻訳部２において用いることによ
り、その音声認識結果の構文解析等を行う際に、その解
析を誤ること等を防止することができる。When the machine translation unit 2 uses the phrase boundary information attached to the speech recognition result, it can avoid errors when parsing that result.

【０１１０】ここで、図１１に、プロソディ情報抽出部
１９が出力するプロソディデータを示す。Here, FIG. 11 shows prosody data output by the prosody information extracting unit 19.

【０１１１】図１１（Ａ）においては、プロソディ情報
としての高低アクセント、強弱アクセント、および語句
区切りが、それぞれの開始時刻および終了時刻とともに
記述され、さらに、高低アクセントおよび強弱アクセン
トについては、その度合いも記述されている。なお、高
低アクセントや強弱アクセントの度合いは、０を基準と
して、高いまたは強い場合をプラスの数字で、低いまた
は弱い場合をマイナスの数字で、それぞれ表してある。In FIG. 11A, pitch (high/low) accents, stress (strong/weak) accents, and phrase breaks are described as prosody information together with their respective start and end times; for pitch accents and stress accents, the degree is also described. The degree is expressed relative to 0, with a positive number for high or strong and a negative number for low or weak.

【０１１２】図１１（Ｂ）においては、プロソディ情報
としての高アクセントの開始時刻および終了時刻だけが
記述されている。後段で処理を行う機械翻訳部２や音声
合成部３が、上述したようなプロソディ情報のすべてを
必要とせず、ある高さ以上の高アクセントの有無だけを
必要としている場合には、プロソディ情報抽出部１９に
おいて、すべてのプロソディ情報を抽出するのは無駄で
あり、必要なもののみ抽出すればよい。さらに、必要な
プロソディ情報が、上述のように、高アクセントのみで
あり、しかも、ある高さ以上の高アクセントの有無だけ
である場合には、図１１（Ａ）に示したように、プロソ
ディ情報の種類や、度合いは不要である。そこで、この
ような場合には、図１１（Ｂ）に示したような、ある高
さ以上の高アクセントというプロソディ情報が存在する
開始時刻および終了時刻だけが記述されたプロソディデ
ータを用いることが可能である。In FIG. 11B, only the start and end times of high accents are described as prosody information. If the machine translation unit 2 and the speech synthesis unit 3, which perform the subsequent processing, do not need all of the prosody information described above and need only to know whether a high accent above a certain height is present, it is wasteful for the prosody information extracting unit 19 to extract everything; it need only extract what is required. Furthermore, when the only prosody information required is the presence or absence of such a high accent, the type and degree fields shown in FIG. 11A are unnecessary. In that case, prosody data that records only the start and end times of intervals containing a high accent above a certain height, as shown in FIG. 11B, can be used.
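The two forms of prosody data described for FIG. 11A and FIG. 11B can be pictured with the following sketch. The record layout, the `kind` strings, and the function name are assumptions made for illustration; the figures themselves are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProsodyEvent:
    """One entry of full prosody data in the style described for FIG. 11A."""
    kind: str                     # hypothetical labels: "pitch_accent", "stress_accent", "phrase_break"
    start: float                  # start time in seconds
    end: float                    # end time in seconds
    degree: Optional[int] = None  # signed degree relative to 0; None for phrase breaks


def high_pitch_spans(events, min_degree):
    """Reduce full prosody data to the compact form described for FIG. 11B.

    Only the (start, end) times of pitch accents whose degree is at least
    min_degree are kept; type and degree are dropped.
    """
    return [
        (e.start, e.end)
        for e in events
        if e.kind == "pitch_accent" and e.degree is not None and e.degree >= min_degree
    ]
```

The compact form carries less information, which is acceptable only when the later stages need nothing more than the presence of a sufficiently high accent.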

【０１１３】なお、プロソディデータは、次に説明する
ように、マッチング部１５において、その音声認識結果
に含められるが、その際、プロソディタグに変換され
る。プロソディタグの例を、図１１（Ｃ）に示す。Note that the prosody data is included in the speech recognition result in the matching unit 15 as described below, but is converted into a prosody tag at that time. FIG. 11C shows an example of a prosody tag.

【０１１４】再び、図８に戻り、マッチング部１５は、
プロソディ情報抽出部１９からプロソディデータを受信
すると、ステップＳ３において、そのプロソディデータ
を、音声認識結果に含めたプロソディタグ付き認識結果
を生成し、機械翻訳部２に出力する。Returning to FIG. 8, when the matching unit 15 receives the prosody data from the prosody information extracting unit 19, it generates, in step S3, a prosody-tagged recognition result in which the prosody data is embedded in the speech recognition result, and outputs it to the machine translation unit 2.

【０１１５】機械翻訳部２（図５）では、ステップＳ４
において、音声認識部１からのプロソディタグ付き認識
結果を用いて、目的言語への機械翻訳が行われる。In step S4, the machine translation unit 2 (FIG. 5) performs machine translation into the target language using the prosody-tagged recognition result from the speech recognition unit 1.

【０１１６】即ち、機械翻訳部２では、テキスト解析部
２１において、音声認識部１からのプロソディタグ付き
認識結果を用いて、原言語による発話の音声認識結果の
テキスト解析が行われ、その解析結果が、言語変換部２
２に供給される。ここで、テキスト解析部２１におい
て、プロソディタグを参照することで、上述したような
誤った翻訳が行われるような解析が行われることを防止
することができる。That is, in the machine translation unit 2, the text analysis unit 21 analyzes the text of the speech recognition result of the source-language utterance using the prosody-tagged recognition result from the speech recognition unit 1, and supplies the analysis result to the language conversion unit 22. By referring to the prosody tags, the text analysis unit 21 can avoid the kind of analysis that leads to an erroneous translation, as described above.

【０１１７】言語変換部２２は、言語変換データを参照
することにより、原言語のプロソディタグ付き認識結果
を翻訳し、目的言語のプロソディタグ付き翻訳結果に変
換する。このプロソディタグ付き翻訳結果は、テキスト
生成部２３に供給され、そこで処理された後、音声合成
部３および表示部４に出力される。The language conversion unit 22 translates the recognition result with the prosody tag of the source language by referring to the language conversion data, and converts it into the translation result with the prosody tag of the target language. The translation result with the prosody tag is supplied to the text generation unit 23, where it is processed and then output to the speech synthesis unit 3 and the display unit 4.

【０１１８】音声合成部３（図７）では、ステップＳ５
において、機械翻訳部２からのプロソディタグ付き翻訳
結果が解析される。即ち、音声合成部３のテキスト解析
部３１は、機械翻訳部２からのプロソディタグ付き翻訳
結果を、そのプロソディタグを参照することにより、高
い精度で解析し、規則音声合成に必要な音韻情報や、ア
クセントを表すアクセント記号等で構成される、規則音
声合成を行うための記号列に変換する。即ち、プロソデ
ィタグ付き翻訳結果の文字部分は音韻情報に変換され、
また、例えば、強調して発音することを表すプロソディ
タグ等は、所定の記号列に変換される。In step S5, the speech synthesis unit 3 (FIG. 7) analyzes the prosody-tagged translation result from the machine translation unit 2. That is, the text analysis unit 31 of the speech synthesis unit 3 analyzes the prosody-tagged translation result with high accuracy by referring to its prosody tags, and converts it into a symbol string for rule-based speech synthesis, composed of the phonological information needed for rule-based synthesis, accent marks representing accents, and the like. The character portion of the prosody-tagged translation result is converted into phonological information, and prosody tags such as one indicating emphasized pronunciation are converted into predetermined symbol strings.

【０１１９】そして、テキスト解析部３１が出力する記
号列は、規則合成部３２に供給され、規則合成部３２で
は、ステップＳ６において、その記号列に基づいて、規
則音声合成が行われる。ここで、テキスト解析部３１が
出力する記号列には、プロソディタグ付き翻訳結果に含
まれていた、テキスト解析部３１で変換されなかったプ
ロソディタグが含まれており、規則音声合成部３２は、
その記号列に含まれるプロソディタグを、必要に応じて
用いて処理を行う。これにより、規則音声合成部３２で
は、例えば、入力音声をほぼ同一の発話速度による合成
音や、入力音声を発話したユーザの性別、あるいは口調
を反映した合成音等のデータを生成する。このデータ
は、ＤＡ変換部３３においてアナログ信号に変換され、
これにより、目的言語による翻訳文に対応する合成音
が、スピーカ５から出力される。また、表示部４では、
機械翻訳部２から供給される目的言語による翻訳文が表
示され、処理を終了する。The symbol string output by the text analysis unit 31 is supplied to the rule synthesis unit 32, which performs rule-based speech synthesis on it in step S6. The symbol string still contains any prosody tags from the prosody-tagged translation result that the text analysis unit 31 did not convert, and the rule synthesis unit 32 uses them as needed. As a result, the rule synthesis unit 32 generates data for, for example, synthesized speech with almost the same speech rate as the input speech, or synthesized speech reflecting the gender or tone of the user who uttered the input speech. This data is converted into an analog signal by the DA conversion unit 33, and the synthesized speech corresponding to the target-language translation is output from the speaker 5. The display unit 4 displays the target-language translation supplied from the machine translation unit 2, and the processing ends.

【０１２０】以上の処理によれば、精度の高い翻訳を行
い、また、翻訳文を、違和感のない合成音で出力するこ
とが可能となる。According to the above-described processing, it is possible to perform high-accuracy translation and to output a translated sentence as a synthesized sound without a sense of incongruity.

【０１２１】即ち、例えば、原言語による発話が、"I a
lso like *her*."であった場合に、音声認識部１のマッ
チング部１５において、例えば、図１２（Ａ）に示すよ
うなワードグラフが、音声認識結果として得られるとと
もに、プロソディ情報抽出部１９において、例えば、図
１２（Ｂ）に示すようなプロソディデータが得られたと
する。なお、図１２（Ｂ）の実施の形態では、図１１
（Ｂ）で説明した場合と同様に、プロソディ情報のうち
の強アクセントが存在する開始時刻および終了時刻だけ
が記述されたプロソディデータを用いている。That is, suppose for example that the source-language utterance is "I also like *her*.", that the matching unit 15 of the speech recognition unit 1 obtains a word graph such as that shown in FIG. 12A as the speech recognition result, and that the prosody information extracting unit 19 obtains prosody data such as that shown in FIG. 12B. As in the case described for FIG. 11B, the embodiment of FIG. 12B uses prosody data that records only the start and end times at which a strong accent is present.

【０１２２】この場合、マッチング部１５では、ワード
グラフとプロソディデータから、図１２（Ｃ）に示すよ
うなプロソディタグ付き認識結果が生成される。ここ
で、図１２（Ｃ）に示したプロソディタグ付き認識結
果"I also like <stress>her</stress>."は、強アクセ
ントを表すプロソディタグの開始タグ<stress>と終了タ
グ</stress>との間にある単語"her"に、強アクセントが
あることを表している。In this case, the matching unit 15 generates a prosody-tagged recognition result such as that shown in FIG. 12C from the word graph and the prosody data. The prosody-tagged recognition result "I also like <stress>her</stress>." in FIG. 12C indicates that the word "her", placed between the start tag <stress> and the end tag </stress> of the prosody tag for a strong accent, carries a strong accent.
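How a recognized word sequence and compact prosody data might be combined into a tagged result can be sketched as below. The word timings, the overlap rule, and the function name are assumptions for illustration; the specification does not state how the matching unit 15 aligns the two inputs.

```python
def tag_recognition_result(words, stress_spans, tag="stress"):
    """Build a prosody-tagged recognition result.

    words        -- list of (text, start_sec, end_sec) for each recognized word
    stress_spans -- list of (start_sec, end_sec) where a strong accent was found
    A word is tagged when its time interval overlaps any stress span.
    """
    tagged = []
    for text, w_start, w_end in words:
        stressed = any(s < w_end and w_start < e for s, e in stress_spans)
        tagged.append(f"<{tag}>{text}</{tag}>" if stressed else text)
    # The sentence-final period is appended after the last (possibly tagged) word.
    return " ".join(tagged) + "."
```

With hypothetical timings in which only "her" overlaps the accented interval, the result has the same form as the tagged sentence quoted above.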

【０１２３】機械翻訳部２には、図１２（Ｃ）に示した
ようなプロソディタグ付き認識結果が供給され、テキス
ト解析部２１において解析が行われた後、言語変換部２
２に供給される。言語変換部２２では、プロソディタグ
付き認識結果と、言語変換データベース２６における各
言語変換データとのマッチングが行われ、これにより、
プロソディタグ付き認識結果に最も合致する言語変換デ
ータが検出される。The prosody-tagged recognition result shown in FIG. 12C is supplied to the machine translation unit 2, analyzed by the text analysis unit 21, and then supplied to the language conversion unit 22. The language conversion unit 22 matches the prosody-tagged recognition result against each item of language conversion data in the language conversion database 26, and thereby detects the language conversion data that best matches the prosody-tagged recognition result.

【０１２４】即ち、いま、言語変換部２２が参照する言
語変換データベース２６に、図６に示した変換テーブル
が記憶されているとするとともに、テキスト解析部２１
によるプロソディタグ付き認識結果"I also like <stre
ss>her</stress>."の解析結果として、"NP1 also V <st
ress>NP2</stress>."が得られたとする。That is, it is assumed that the conversion table shown in FIG. 6 is stored in the language conversion database 26 referred to by the language conversion unit 22, and the text analysis unit 21
obtains "NP1 also V <stress>NP2</stress>." as the analysis result of the prosody-tagged recognition result "I also like <stress>her</stress>.".

【０１２５】この場合、プロソディタグ付き認識結果"I
also like <stress>her</stress>."の解析結果"NP1 al
so V <stress>NP2</stress>."に最も合致する言語変換
データは、図６（Ｄ）の変換テーブルの１行目に記述さ
れている"NP1 also V *NP2*."であり、言語変換部２２
では、この言語変換データが検出される。即ち、解析結
果の最初の単語"NP1"と言語変換データの最初の単語"NP
1"、解析結果の２番目の単語"also"と言語変換データの
２番目の単語"also"、解析結果の３番目の単語"V"と言
語変換データの３番目の単語"V"、および解析結果の４
番目（最後）の単語"NP2"と言語変換データの４番目
（最後）の単語"NP2"は、いずれも一致しており、解析
結果の４番目の単語"NP2"を囲むプロソディタグ<stress
>および</stress>と、言語変換データの４番目の単語"N
P"を囲むアスタリスクは、いずれも強調を表すから、こ
れらも一致する。従って、言語変換部２２では、図６
（Ｄ）の変換テーブルの１行目に記述されている"NP1 a
lso V *NP2*."が、プロソディタグ付き認識結果"I also
like <stress>her</stress>."の解析結果"NP1 also V
<stress>NP2</stress>."に最も合致する言語変換データ
として検出される。In this case, the language conversion data that best matches the analysis result "NP1 also V <stress>NP2</stress>." of the prosody-tagged recognition result "I also like <stress>her</stress>." is "NP1 also V *NP2*.", described in the first row of the conversion table in FIG. 6D, and the language conversion unit 22 detects it. That is, the first word "NP1", the second word "also", the third word "V", and the fourth (last) word "NP2" of the analysis result each match the corresponding word of the language conversion data. In addition, the prosody tags <stress> and </stress> surrounding the fourth word "NP2" of the analysis result and the asterisks surrounding the fourth word "NP2" of the language conversion data both denote emphasis, so they match as well. The language conversion unit 22 therefore detects "NP1 also V *NP2*.", described in the first row of the conversion table in FIG. 6D, as the language conversion data that best matches the analysis result.

【０１２６】そして、言語変換部２２は、検出した言語
変換データ"NP1 also V *NP2*."と、その対訳「NP1はNP
2もV。」を、プロソディタグ付き音声認識結果"I also
like <stress>her</stress>."とともに、テキスト生成
部２３に出力する。The language conversion unit 22 then outputs the detected language conversion data "NP1 also V *NP2*." and its translation 「NP1はNP2もV。」, together with the prosody-tagged speech recognition result "I also like <stress>her</stress>.", to the text generation unit 23.

【０１２７】なお、プロソディタグ付き認識結果のプロ
ソディタグを無視した場合、即ち、プロソディタグのな
い音声認識結果を解析して、その解析結果に最も合致す
る言語変換データを検出する場合には、図６（Ｄ）の"N
P1 also V *NP2*."と、"*NP1* also V NP2."の両方が検
出されることとなり、その優劣をつけることが困難であ
るため、以降の処理を続行するのが困難となる。If the prosody tags of the prosody-tagged recognition result are ignored, that is, if a speech recognition result without prosody tags is analyzed and the best-matching language conversion data is searched for, both "NP1 also V *NP2*." and "*NP1* also V NP2." in FIG. 6D are detected. Because it is difficult to rank one above the other, it becomes difficult to continue the subsequent processing.

【０１２８】また、プロソディタグ付き認識結果と言語
変換データとのマッチングの方法としては、一般の文章
検索で用いられている、例えば、論理式による検索方法
や、ベクトル空間法による検索方法等も採用することが
できる。論理式による検索方法を採用した場合には、プ
ロソディタグ付き認識結果を構成する語句を用いて、論
理式（例えば、キーワードとなる語句を、ANDや、OR、N
OT等を用いて接続したもの）が作成され、その論理式と
言語変換データとを比較することにより、プロソディタ
グ付き認識結果に最も合致する言語変換データが検索さ
れる。また、ベクトル空間法による検索方法を採用した
場合には、プロソディタグ付き認識結果を構成する語句
から、クエリーベクトル(Query Vector)が作成されると
ともに、言語変換データに基づいてベクトルが作成さ
れ、その２つのベクトルがなす角度を最も小さくする言
語変換データが、プロソディタグ付き認識結果に最も合
致する言語変換データとして検索される。The prosody-tagged recognition result can also be matched against the language conversion data using methods from general text retrieval, such as Boolean (logical expression) search or vector space search. With Boolean search, a logical expression (for example, keywords joined with AND, OR, NOT, and so on) is built from the words and phrases of the prosody-tagged recognition result, and comparing that expression with the language conversion data retrieves the entry that best matches. With vector space search, a query vector is built from the words and phrases of the prosody-tagged recognition result, a vector is built from each item of language conversion data, and the entry whose vector forms the smallest angle with the query vector is retrieved as the best match.

【０１２９】テキスト生成部２３は、言語変換データ"N
P1 also V *NP2*."、その対訳「NP1はNP2もV。」、およ
びプロソディタグ付き音声認識結果"I also like <stre
ss>her</stress>."を受信すると、言語変換データにお
けるNP1に対応する"I"、Vに対応する"like"、NP2に対応
する"her"を、それぞれ目的言語である日本語に翻訳
し、これにより、NP1に対して「私」、Vに対して「好き
です」、NP2に対して「彼女」を、それぞれ得る。そし
て、テキスト生成部２３は、それらの日本語訳を、対訳
「NP1はNP2もV。」にあてはめ、翻訳結果として、図１
２（Ｄ）に示すように、「私は彼女も好きです。」を得
る。On receiving the language conversion data "NP1 also V *NP2*.", its translation 「NP1はNP2もV。」, and the prosody-tagged speech recognition result "I also like <stress>her</stress>.", the text generation unit 23 translates "I" (corresponding to NP1), "like" (corresponding to V), and "her" (corresponding to NP2) into Japanese, the target language, obtaining 「私」 for NP1, 「好きです」 for V, and 「彼女」 for NP2. The text generation unit 23 then substitutes these into the translation 「NP1はNP2もV。」 and obtains 「私は彼女も好きです。」 ("I like her, too.") as the translation result, as shown in FIG. 12D.

【０１３０】この翻訳結果「私は彼女も好きです。」
は、音声合成部３に供給され、これにより、"I also li
ke *her*."に対する翻訳結果として、合成音「私は彼女
も好きです。」が出力される。This translation result 「私は彼女も好きです。」 is supplied to the speech synthesis unit 3, which outputs the synthesized speech 「私は彼女も好きです。」 as the translation of "I also like *her*.".

【０１３１】なお、図２の音声翻訳システムでは、例え
ば、前述の特開平６−３３２４９４号公報に記載されて
いるように、原言語による入力音声においてアクセント
のある語句に対応する目的言語の語句にアクセントを付
して出力することも可能である。The speech translation system of FIG. 2 can also, as described for example in the above-mentioned Japanese Patent Application Laid-Open No. 6-332494, output the target-language phrase corresponding to an accented phrase in the source-language input speech with an accent attached.

【０１３２】即ち、例えば、上述の入力音声"I also li
ke *her*."については、そのプロソディタグ付き認識結
果"I also like <stress>her</stress>."から、"her"に
アクセントがあることを検出することができる。さら
に、その機械翻訳時に、アクセントのある"her"に対応
する日本語訳が「彼女」であることも検出することがで
きる。従って、翻訳結果「私は彼女も好きです。」の
「彼女」に、強調を表すプロソディタグを付加し、プロ
ソディタグ付き翻訳結果を、音声合成部３に出力するこ
とにより、原言語による入力音声においてアクセントの
ある語句に対応する目的言語の語句に対してアクセント
を付した合成音を出力することができる。That is, for the input speech "I also like *her*." above, the prosody-tagged recognition result "I also like <stress>her</stress>." shows that "her" is accented. During machine translation it can also be determined that the Japanese translation of the accented "her" is 「彼女」. Therefore, by adding a prosody tag denoting emphasis to 「彼女」 in the translation result 「私は彼女も好きです。」 and outputting the prosody-tagged translation result to the speech synthesis unit 3, synthesized speech can be output in which the target-language phrase corresponding to the accented source-language phrase is itself accented.

【０１３３】但し、英語におけるアクセントは、強弱ア
クセントであるのに対して、日本語におけるアクセント
は、高低アクセントであるから、翻訳結果に付加するプ
ロソディタグは、プロソディタグ付き認識結果"I also
like <stress>her</stress>."に含まれる強弱アクセン
トを表すプロソディタグではなく、高低アクセントを表
すプロソディタグとするのが好ましい。いま、高低アク
セントを表すプロソディタグを、図１１（Ｃ）に示した
ように、<hi-picth>および</hi-picth>で表すと、プロ
ソディタグ付き翻訳結果は、図１２（Ｅ）に示すよう
に、「私は<hi-picth>彼女</hi-picth>も好きです。」
となる。However, English accent is a stress (strong/weak) accent, whereas Japanese accent is a pitch (high/low) accent. The prosody tag added to the translation result should therefore preferably be one denoting a pitch accent, rather than the stress-accent tag contained in the prosody-tagged recognition result "I also like <stress>her</stress>.". If the prosody tag for a pitch accent is written <hi-picth> and </hi-picth>, as in FIG. 11C, the prosody-tagged translation result becomes 「私は<hi-picth>彼女</hi-picth>も好きです。」, as shown in FIG. 12E.

【０１３４】以上のように、機械翻訳部２において、原
言語を目的言語に翻訳するための翻訳情報が、原言語ま
たは目的言語のうちの少なくとも一方のプロソディ情報
とともに記述されている変換テーブルに基づいて、翻訳
を行うようにしたので、精度の高い翻訳を行い、また、
翻訳文を、違和感のない合成音で出力することが可能と
なる（翻訳結果としてふさわしい合成音を出力すること
が可能となる）。As described above, the machine translation unit 2 translates on the basis of a conversion table in which the translation information for translating the source language into the target language is described together with prosody information for at least one of the source language and the target language. This makes it possible to translate with high accuracy and to output the translation as natural-sounding synthesized speech (that is, synthesized speech appropriate to the translation result).

【０１３５】即ち、例えば、図１３に示すように、英語
のユーザが、"I also like *her*."と発話した場合に
は、図１４に示すように、「私は彼女も好きです。」と
いう正しい翻訳結果の合成音が出力される。なお、図１
３においては（次に説明する図１５においても同様）、
強調する単語には、下線を付してある。That is, for example, when an English-speaking user utters "I also like *her*." as shown in FIG. 13, synthesized speech of the correct translation 「私は彼女も好きです。」 is output, as shown in FIG. 14. In FIG. 13 (and likewise in FIG. 15, described next), emphasized words are underlined.

【０１３６】また、例えば、図１５に示すように、英語
のユーザが、"*Henry* has arrived."と発話した場合に
は、「*ヘンリーが*到着しました。」という正しい翻訳
結果（英語の発話のアクセントから考えて、適切な助詞
である「が」を用いた翻訳結果）の合成音が、正しい位
置にアクセントが付されて出力される。Likewise, when an English-speaking user utters "*Henry* has arrived." as shown in FIG. 15, synthesized speech of the correct translation 「*ヘンリーが*到着しました。」 (a translation that uses 「が」, the particle appropriate to the accent of the English utterance) is output with the accent in the correct position.

【０１３７】さらに、例えば、図１６に示すように、英
語のユーザが、"Henry has arrived."と発話した場合に
は、「ヘンリーは到着しました。」という正しい翻訳結
果（英語の発話にアクセントがないことから考えて、適
切な助詞である「は」を用いた翻訳結果）の合成音が、
正しい位置にアクセントが付されて出力される。Further, when an English-speaking user utters "Henry has arrived." as shown in FIG. 16, synthesized speech of the correct translation 「ヘンリーは到着しました。」 (a translation that uses 「は」, the particle appropriate given that the English utterance carries no accent) is output with the accent in the correct position.

【０１３８】なお、例えば、"English *teacher*"を正
しく「イギリス人の先生」に翻訳するとともに、"*Engl
ish* teacher"を正しく「英語の先生」に翻訳すること
は、音声認識部１（図４）における辞書データベース１
７等に、"English *teacher*"と"*English* teacher"と
を、別の単語（語句）として記述した辞書を登録してお
き、"English *teacher*"と"*English* teacher"の音声
認識結果を区別して出力するようにするとともに、機械
翻訳部２においても、"English *teacher*"と"*English
* teacher"の音声認識結果を区別して取り扱うことで行
うことが可能である。Note that translating "English *teacher*" correctly as 「イギリス人の先生」 and "*English* teacher" correctly as 「英語の先生」 could also be achieved by registering, in the dictionary database 17 or the like of the speech recognition unit 1 (FIG. 4), a dictionary that lists "English *teacher*" and "*English* teacher" as separate words (phrases), so that their speech recognition results are output as distinct, and by having the machine translation unit 2 likewise handle the two recognition results separately.

【０１３９】しかしながら、同一の単語列ではあるが、
アクセントのある単語が異なる音声を区別して音声認識
を行うための辞書を作成するには、例えば、複合語化す
ることによりアクセントのある単語が移動する単語列
で、かつその翻訳結果が、アクセントのある単語が移動
していない場合と異なるものを調査して、辞書に記述す
る必要がある。従って、そのような辞書の作成にあたっ
ては、音声認識部１を製作する製作者に、音声認識処理
のための知識以外の言語学的な知識等も要求されるた
め、その作成は、現実には困難であると考えられる。However, to create a dictionary for speech recognition that distinguishes utterances consisting of the same word string but with the accent on different words, one must, for example, survey word strings in which compounding moves the accented word and whose translation differs from the case where the accent has not moved, and describe them in the dictionary. Creating such a dictionary therefore demands linguistic knowledge beyond what speech recognition processing requires from whoever builds the speech recognition unit 1, and is considered difficult in practice.

【０１４０】ところで、上述した場合においては、音声
認識部１（図４）のプロソディ情報抽出部１９におい
て、常時、プロソディ情報を抽出し、プロソディデータ
を出力するようにしたが、そのプロソディデータが、機
械翻訳部２や音声合成部３においてすべて用いられると
は限らず、その一部しか用いられないことがある（前述
の特開平６−３３２４９４号公報に記載されている、原
言語による入力音声から、アクセントのある語句を抽出
し、その語句に対応する目的言語の語句にアクセントを
付す翻訳装置においても、アクセントの抽出を、全音声
区間に亘って行っているが、翻訳結果の合成音に反映さ
れるのは、抽出されたアクセントの一部である）。In the case described above, the prosody information extracting unit 19 of the speech recognition unit 1 (FIG. 4) always extracts prosody information and outputs prosody data. That prosody data, however, is not necessarily all used by the machine translation unit 2 and the speech synthesis unit 3; sometimes only part of it is used. (The translation apparatus described in the above-mentioned Japanese Patent Application Laid-Open No. 6-332494, which extracts accented phrases from source-language input speech and attaches accents to the corresponding target-language phrases, also extracts accents over the entire speech interval, yet only some of the extracted accents are reflected in the synthesized speech of the translation.)

【０１４１】具体的には、例えば、図１５と図１６とに
示したように、同一単語列で構成される発話"*Henry* h
as arrived."と"Henry has arrive."との訳し分けを行
う場合において、それぞれの発話のプロソディ情報のう
ち、翻訳結果に影響を与えるのは、"Henry"にアクセン
トがあるかどうかという点だけである。Specifically, when distinguishing the translations of the utterances "*Henry* has arrived." and "Henry has arrived.", which consist of the same word string as shown in FIGS. 15 and 16, the only prosody information in each utterance that affects the translation result is whether "Henry" is accented.

【０１４２】従って、常時、全音声区間に亘ってプロソ
ディ情報を抽出することは、音声認識部１の後に処理を
行う機械翻訳部２や音声合成部３において用いられない
プロソディ情報を抽出することがあり、処理効率が良い
とはいえない。Therefore, always extracting prosody information over the entire speech interval can mean extracting prosody information that is never used by the machine translation unit 2 or the speech synthesis unit 3, which process the data after the speech recognition unit 1, and this is not efficient.

【０１４３】そこで、音声認識部１（図４）のプロソデ
ィ情報抽出部１９には、その後段で処理を行う機械翻訳
部２や音声合成部３から要求があった場合にのみ、必要
な音声区間だけを対象に、必要なプロソディ情報だけを
抽出させるようにすることができる。この場合、処理効
率を向上させる（演算量を少なくする）ことができる。The prosody information extracting unit 19 of the speech recognition unit 1 (FIG. 4) can therefore be made to extract only the required prosody information, from only the required speech interval, and only when the machine translation unit 2 or the speech synthesis unit 3 in the later stages requests it. This improves processing efficiency (reduces the amount of computation).

【０１４４】なお、同様の観点から、機械翻訳部２に
も、その後段で処理を行う音声合成部３から要求があっ
た場合にのみ所定の処理を行わせるようにして、機械翻
訳部２における処理効率を向上させることが可能であ
る。From the same point of view, the machine translation unit 2 is also made to perform predetermined processing only when requested by the speech synthesis unit 3 which performs processing in the subsequent stage. It is possible to improve processing efficiency.

【０１４５】この場合、音声認識部１、機械翻訳部２、
音声合成部３では、相互に、図１７に示すようなやりと
りが行われる。なお、図１７において、縦方向は、時間
の経過を表す。In this case, the speech recognition unit 1, the machine translation unit 2, and the speech synthesis unit 3 exchange signals with one another as shown in FIG. 17. In FIG. 17, the vertical direction represents the passage of time.

【０１４６】図１７に示した場合においては、音声が入
力されると、音声認識部１において音声認識が行われ、
その音声認識結果Ｄ₁が、機械翻訳部２に出力される。
このとき音声認識部１のプロソディ情報抽出部１９で
は、プロソディ情報の抽出は行われず、従って、音声認
識部１は、プロソディデータを含んでいない音声認識結
果Ｄ₁を出力する。なお、ここでは、プロソディ情報抽
出部１９において、一切のプロソディ情報を抽出しない
こととしたが、機械翻訳部２や音声合成部３において処
理に用いられる頻度の高いプロソディ情報だけは常時抽
出するようにし、音声認識結果Ｄ₁に含めるようにする
ことが可能である。In the case shown in FIG. 17, when speech is input, the speech recognition unit 1 performs speech recognition and outputs the speech recognition result D₁ to the machine translation unit 2. At this point the prosody information extracting unit 19 of the speech recognition unit 1 does not extract prosody information, so the speech recognition result D₁ contains no prosody data. Although no prosody information at all is extracted here, it is also possible to always extract just the prosody information that the machine translation unit 2 and the speech synthesis unit 3 use most often, and to include it in the speech recognition result D₁.

【０１４７】機械翻訳部２は、音声認識部１から音声認
識結果Ｄ₁を受信すると、その音声認識結果Ｄ₁を用いて
処理を行い、その処理中に、プロソディ情報が必要とな
った場合には、処理を中断して、必要なプロソディ情報
を要求するリクエスト信号Ｄ₂を、音声認識部１に出力
する。On receiving the speech recognition result D₁ from the speech recognition unit 1, the machine translation unit 2 processes it. If prosody information becomes necessary during that processing, the machine translation unit 2 suspends processing and outputs a request signal D₂ for the required prosody information to the speech recognition unit 1.

【０１４８】ここで、リクエスト信号Ｄ₂によって要求
するプロソディ情報としては、例えば、ある単語に対応
する音声に、強弱アクセントや高低アクセントが付され
ているかどうか（付されていれば、どの程度の度合いの
アクセントか）や、ユーザ（発話者）の性別は男性また
は女性のいずれであるか、ある単語とその直後の単語と
の間に音声上の区切りがあるかどうか、文末のイントネ
ーションは上り調子または下り調子のいずれであるか等
といった情報がある。The prosody information requested by the request signal D₂ includes, for example, whether the speech corresponding to a given word carries a stress accent or a pitch accent (and, if so, to what degree), whether the user (speaker) is male or female, whether there is an audible break between a given word and the word that follows it, and whether the sentence-final intonation is rising or falling.

【０１４９】音声認識部１のプロソディ情報抽出部１９
は、機械翻訳部２から、リクエスト信号Ｄ₂を受信する
と、そのリクエスト信号Ｄ₂によって要求されているプ
ロソディ情報を、音声データバッファ１４Ａおよび特徴
量バッファ１４Ｂを参照することで求め、リクエスト信
号Ｄ₂に対する応答Ｄ₃として、機械翻訳部２に出力す
る。On receiving the request signal D₂ from the machine translation unit 2, the prosody information extracting unit 19 of the speech recognition unit 1 obtains the requested prosody information by referring to the audio data buffer 14A and the feature quantity buffer 14B, and outputs it to the machine translation unit 2 as the response D₃ to the request signal D₂.

【０１５０】機械翻訳部２は、応答Ｄ₃を受信すると、
その応答Ｄ₃に含まれるプロソディ情報を用いて処理を
続行し、以下、処理中に、プロソディ情報が必要となっ
た場合には、処理を中断して、そのプロソディ情報を要
求するリクエスト信号を、音声認識部１に出力し、音声
認識部１から、必要なプロソディ情報としての応答を得
て、処理を再開することを繰り返す。On receiving the response D₃, the machine translation unit 2 resumes processing using the prosody information it contains. Thereafter, whenever prosody information becomes necessary during processing, it suspends processing, outputs a request signal for that prosody information to the speech recognition unit 1, receives the required prosody information in response, and resumes processing.

【０１５１】そして、機械翻訳部２は、音声認識部１か
らの音声認識結果Ｄ₁の翻訳を完了すると、その翻訳結
果Ｄ₄を、音声合成部３に出力する。When the translation of the speech recognition result D₁ from the speech recognition unit 1 is completed, the machine translation unit 2 outputs the translation result D₄ to the speech synthesis unit 3.

【０１５２】音声合成部２は、機械翻訳部２から翻訳結
果Ｄ₄を受信すると、その翻訳結果Ｄ₄を用いて処理を行
い、その処理中に、プロソディ情報や、その他の情報が
必要となった場合には、処理を中断して、必要なプロソ
ディ情報やその他の情報を要求するリクエスト信号Ｄ₅
を機械翻訳部２に出力し、あるいは同様のリクエスト信
号Ｄ₇を音声認識部１に出力する。On receiving the translation result D₄ from the machine translation unit 2, the speech synthesis unit 3 processes it. If prosody information or other information becomes necessary during that processing, the speech synthesis unit 3 suspends processing and outputs a request signal D₅ for the required information to the machine translation unit 2, or a similar request signal D₇ to the speech recognition unit 1.

【０１５３】ここで、リクエスト信号Ｄ₅によって、機
械翻訳部２に要求する情報としては、例えば、同表記異
音語の発音（例えば、"read"は、現在形と過去形とで発
音が異なるが、その発音）や、同表記異アクセント語の
アクセント（例えば、"increase"は、品詞によってアク
セントが異なるが、そのアクセント位置）、複合語化し
ている可能性のある単語列のイントネーション（例え
ば、"English teacher"のイントネーション）、文末の
イントネーションは上り調子または下り調子のいずれで
あるか等といった情報がある。なお、これらの情報は、
常時、機械翻訳部２が出力する翻訳結果としてのテキス
トに、プロソディタグと同様のタグを用いて含めるよう
にすることも可能である。The information requested from the machine translation unit 2 by the request signal D₅ includes, for example, the pronunciation of heteronyms (for example, "read" is pronounced differently in the present and past tenses), the accent of words whose accent varies despite identical spelling (for example, the accent position of "increase" depends on its part of speech), the intonation of word strings that may have formed a compound (for example, "English teacher"), and whether the sentence-final intonation is rising or falling. This information can also always be included in the translated text output by the machine translation unit 2, using tags similar to prosody tags.

【０１５４】また、リクエスト信号Ｄ₇によって、音声
認識部１に要求する情報としては、入力音声の発話速度
や、ピッチ、ユーザの性別、感情、口調、文末のイント
ネーションは上り調子または下り調子のいずれであるか
等といった情報がある。The information requested from the speech recognition unit 1 by the request signal D₇ includes the speech rate and pitch of the input speech, the user's gender, emotion, and tone, and whether the sentence-final intonation is rising or falling.

【０１５５】機械翻訳部２は、音声合成部３から、リク
エスト信号Ｄ₅を受信すると、そのリクエスト信号Ｄ₅に
よって要求されている情報を、上述したように、テキス
ト解析部２１や、言語変換部２２、テキスト生成部２３
が処理中に保持しておいた情報を参照することで求め、
リクエスト信号Ｄ₅に対する応答Ｄ₆として、音声合成部
３に出力する。Upon receiving the request signal D₅ from the speech synthesis unit 3, the machine translation unit 2 obtains the information requested by the request signal D₅ by referring, as described above, to the information that the text analysis unit 21, the language conversion unit 22, and the text generation unit 23 retained during processing, and outputs it to the speech synthesis unit 3 as a response D₆ to the request signal D₅.

【０１５６】また、音声認識部１は、音声合成部３か
ら、リクエスト信号Ｄ₇を受信すると、そのリクエスト
信号Ｄ₇によって要求されている、例えばプロソディ情
報を、音声データバッファ１４Ａおよび特徴量バッファ
１４Ｂを参照することで求め、リクエスト信号Ｄ₇に対
する応答Ｄ₈として、音声合成部３に出力する。Similarly, upon receiving the request signal D₇ from the speech synthesis unit 3, the speech recognition unit 1 obtains the information requested by the request signal D₇, for example prosody information, by referring to the speech data buffer 14A and the feature buffer 14B, and outputs it to the speech synthesis unit 3 as a response D₈ to the request signal D₇.

【０１５７】音声合成部３は、応答Ｄ₆やＤ₈を受信する
と、その応答Ｄ₆やＤ₈に含まれる情報を用いて処理を続
行し、以下、処理中に、プロソディ情報、その他の必要
な情報が必要となった場合には、処理を中断して、その
必要な情報を要求するリクエスト信号Ｄ₅やＤ₇を、機械
翻訳部２や音声認識部１に出力し、機械翻訳部２や音声
認識部１から、必要な情報としての応答を得て、処理を
再開することを繰り返す。Upon receiving the response D₆ or D₈, the speech synthesis unit 3 continues processing using the information included in that response. Thereafter, whenever prosody information or other information becomes necessary during processing, it repeats the following: it interrupts the processing, outputs a request signal D₅ or D₇ requesting the necessary information to the machine translation unit 2 or the speech recognition unit 1, obtains the necessary information as a response from that unit, and resumes the processing.

【０１５８】なお、音声合成部３から、音声認識部１に
対しては、直接、リクエスト信号を供給するのではな
く、機械翻訳部２を介して、間接的に、リクエスト信号
を供給するようにし、また、音声認識部１から、音声合
成部３に対しても、直接、応答を供給するのではなく、
機械翻訳部２を介して、間接的に、応答を供給するよう
にすることが可能である。Note that the speech synthesis unit 3 need not supply a request signal directly to the speech recognition unit 1; it may instead supply the request signal indirectly via the machine translation unit 2. Likewise, the speech recognition unit 1 need not supply a response directly to the speech synthesis unit 3; it may instead supply the response indirectly via the machine translation unit 2.

【０１５９】即ち、例えば、入力音声"I also like *he
r*."に対して、その日本語訳「私は彼女も好きです。」
の合成音を出力する場合に、「私は*彼女*も好きで
す。」のように、入力音声で強調されている単語"her"
に対応する「彼女」を強調するときには、音声合成部３
は、機械翻訳部２から、「私は彼女も好きです。」とい
う翻訳文を受け取った後に、その中の「彼女」に対応す
る入力音声にアクセントがあるかどうかの調査を要求す
るリクエスト信号を、機械翻訳部２に出力する。機械翻
訳部２は、そのリクエスト信号による要求に関して、
「彼女」に対応する入力音声における単語が"her"であ
ることを認識しているので、その"her"に対応する入力
音声にアクセントがあるかどうかの調査を要求するリク
エスト信号を、音声認識部１に出力する。音声認識部１
では、入力音声"I also like *her*."における"her"
に、アクセントがあるかどうかが調査され、いまの場
合、"her"にアクセントがあるため、その旨の応答が、
リクエスト信号を出力してきた機械翻訳部２に供給され
る。機械翻訳部２は、音声認識部１から、"her"にアク
セントがある旨の応答を受信すると、「彼女」に対応す
る入力音声にアクセントがある旨の応答を、リクエスト
信号を出力してきた音声合成部３に出力する。音声合成
部３は、このようにして、機械翻訳部２から、「彼女」
に対応する入力音声にアクセントがある旨の応答を受信
すると、翻訳文「私は彼女も好きです。」における「彼
女」を強調した合成音「私は*彼女*も好きです。」を生
成して出力する。That is, suppose, for example, that for the input speech "I also like *her*." a synthesized sound of its Japanese translation 「私は彼女も好きです。」 is to be output, and that 「彼女」, which corresponds to the word "her" emphasized in the input speech, is to be emphasized, as in 「私は*彼女*も好きです。」. In this case, after receiving the translated sentence 「私は彼女も好きです。」 from the machine translation unit 2, the speech synthesis unit 3 outputs to the machine translation unit 2 a request signal asking it to check whether the input speech corresponding to 「彼女」 is accented. The machine translation unit 2 knows that the word in the input speech corresponding to 「彼女」 is "her", so it outputs to the speech recognition unit 1 a request signal asking it to check whether the input speech corresponding to "her" is accented. The speech recognition unit 1 checks whether "her" in the input speech "I also like *her*." is accented. Since "her" is accented in this case, a response to that effect is supplied to the machine translation unit 2, which output the request signal. Upon receiving from the speech recognition unit 1 the response that "her" is accented, the machine translation unit 2 outputs to the speech synthesis unit 3, which output the original request signal, a response that the input speech corresponding to 「彼女」 is accented. Upon receiving this response from the machine translation unit 2, the speech synthesis unit 3 generates and outputs the synthesized sound 「私は*彼女*も好きです。」, in which 「彼女」 in the translated sentence 「私は彼女も好きです。」 is emphasized.
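
The relayed exchange in this example can be sketched in Python as follows. This is only an illustrative sketch: the class names, the `has_accent` method, the word alignment table, and the use of "*" as an emphasis marker are all assumptions made for the illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Recognizer:
    """Stand-in for speech recognition unit 1 (hypothetical interface)."""
    # Source words that carried emphasis in the input speech (toy data).
    accented_words: set = field(default_factory=set)

    def has_accent(self, source_word: str) -> bool:
        return source_word in self.accented_words


@dataclass
class Translator:
    """Stand-in for machine translation unit 2; relays requests to unit 1."""
    recognizer: Recognizer
    # Target-to-source word alignment kept from the translation step.
    alignment: dict = field(default_factory=dict)

    def has_accent(self, target_word: str) -> bool:
        source_word = self.alignment.get(target_word)
        if source_word is None:
            return False
        # Forward the question to the recognizer in terms of the source word.
        return self.recognizer.has_accent(source_word)


def synthesize(words, translator: Translator) -> str:
    """Stand-in for speech synthesis unit 3: mark emphasized words with '*'."""
    return "".join(f"*{w}*" if translator.has_accent(w) else w for w in words)


recognizer = Recognizer(accented_words={"her"})
translator = Translator(recognizer, alignment={"彼女": "her"})
result = synthesize(["私は", "彼女", "も好きです。"], translator)
print(result)
```

The synthesizer stand-in never talks to the recognizer directly; every question passes through the translator, which converts the target word into the corresponding source word before forwarding it.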

【０１６０】次に、図１８のフローチャートを参照し
て、機械翻訳部２や音声合成部３がリクエスト信号を出
力するとともに、音声認識部１や機械翻訳部２が、その
リクエスト信号に対する応答を出力する場合の、音声認
識部１の動作について説明する。Next, referring to the flowchart of FIG. 18, machine translation unit 2 and speech synthesis unit 3 output a request signal, and speech recognition unit 1 and machine translation unit 2 output a response to the request signal. The operation of the voice recognition unit 1 in such a case will be described.

【０１６１】音声認識部１では、ステップＳ１１におい
て、音声が入力されたか、またはリクエスト信号が、機
械翻訳部２や音声合成部３から送信されてきたかといっ
たイベントが生じるまで、待ち時間がおかれ、イベント
が生じると、ステップＳ１２に進み、どのようなイベン
トが生じたかが判定される。ステップＳ１２において、
音声が入力されたというイベント生じたと判定された場
合、ステップＳ１３に進み、その入力された音声が認識
され、ステップＳ１４に進む。ステップＳ１４では、ス
テップＳ１３における音声認識結果が、機械翻訳部２に
出力され、ステップＳ１１に戻り、以下、同様の処理を
繰り返す。なお、ステップＳ１４においては、音声認識
結果を、機械翻訳部２に出力するとともに、必要に応じ
て、表示部５に供給して表示させるようにすることも可
能である。In step S11, the speech recognition unit 1 waits until an event occurs, namely, speech being input or a request signal being transmitted from the machine translation unit 2 or the speech synthesis unit 3. When an event occurs, the process proceeds to step S12, where the type of event is determined. If it is determined in step S12 that the event is speech input, the process proceeds to step S13, where the input speech is recognized, and then to step S14. In step S14, the speech recognition result from step S13 is output to the machine translation unit 2, the process returns to step S11, and the same processing is repeated thereafter. In step S14, the speech recognition result may also be supplied to the display unit 5 for display as necessary, in addition to being output to the machine translation unit 2.

【０１６２】一方、ステップＳ１２において、リクエス
ト信号が送信されてきたというイベントが生じたと判定
された場合、ステップＳ１５に進み、そのリクエスト信
号による要求に応じて、プロソディ情報が抽出される。
即ち、ステップＳ１５では、プロソディ情報抽出部１９
において、例えば、アクセントの抽出や、発話速度の計
算、ユーザが男性または女性のいずれであるかの識別、
非言語的な音情報の抽出等が行われる。そして、ステッ
プＳ１６に進み、抽出されたプロソディ情報を表すプロ
ソディデータが、リクエスト信号に対する応答として、
そのリクエスト信号を送信してきたブロック（ここで
は、機械翻訳部２または音声合成部３のうちのいずれ
か）に供給され、ステップＳ１１に戻り、以下、同様の
処理が繰り返される。On the other hand, if it is determined in step S12 that the event is the transmission of a request signal, the process proceeds to step S15, where prosody information is extracted as requested by the request signal. That is, in step S15, the prosody information extraction unit 19 performs, for example, accent extraction, calculation of the speech rate, identification of whether the user is male or female, and extraction of non-linguistic sound information. The process then proceeds to step S16, where prosody data representing the extracted prosody information is supplied, as a response to the request signal, to the block that transmitted the request signal (here, either the machine translation unit 2 or the speech synthesis unit 3). The process then returns to step S11, and the same processing is repeated thereafter.
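
The event loop of FIG. 18 (steps S11 to S16) can be sketched roughly as below. The event dictionaries, the queue, and the two placeholder functions are hypothetical; a real recognizer would decode audio and measure prosody, whereas this sketch only returns toy values so that the control flow is visible.

```python
from collections import deque


def recognize(audio: str) -> str:
    """Placeholder for step S13; a real recognizer would decode audio."""
    return audio.upper()


def extract_prosody(request: dict) -> dict:
    """Placeholder for step S15; returns only the requested kind of data."""
    toy_values = {"speech_rate": 5.2, "gender": "female"}
    kind = request["kind"]
    return {kind: toy_values.get(kind)}


def run_recognizer(events: deque) -> list:
    """Process queued events as in steps S11 to S16; return what was sent."""
    sent = []
    while events:                       # S11: wait for (here: take) an event
        event = events.popleft()
        if event["type"] == "speech":   # S12: which event occurred?
            text = recognize(event["audio"])            # S13
            sent.append(("translator", text))           # S14
        elif event["type"] == "request":
            prosody = extract_prosody(event)            # S15
            sent.append((event["sender"], prosody))     # S16
    return sent


events = deque([
    {"type": "speech", "audio": "hello"},
    {"type": "request", "sender": "synthesizer", "kind": "speech_rate"},
])
outputs = run_recognizer(events)
```

The response in step S16 goes back to whichever block sent the request, which is why the sender is carried inside the request event.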

【０１６３】次に、図１９のフローチャートを参照し
て、機械翻訳部２や音声合成部３がリクエスト信号を出
力するとともに、音声認識部１や機械翻訳部２が、その
リクエスト信号に対する応答を出力する場合の、機械翻
訳部２の動作について説明する。Next, referring to the flowchart of FIG. 19, the machine translation unit 2 and the speech synthesis unit 3 output a request signal, and the speech recognition unit 1 and the machine translation unit 2 output a response to the request signal. In this case, the operation of the machine translation unit 2 will be described.

【０１６４】機械翻訳部２では、ステップＳ２１におい
て、音声認識部１から音声認識結果が送信されてきた
か、または音声合成部３から、リクエスト信号が送信さ
れてきたかといったイベントが生じるまで、待ち時間が
おかれ、イベントが生じると、ステップＳ２２に進み、
どのようなイベントが生じたかが判定される。ステップ
Ｓ２２において、音声認識結果が送信されてきたという
イベント生じたと判定された場合、ステップＳ２３に進
み、その音声認識結果が、テキスト解析部２１において
解析され、その解析結果が、言語変換部２２に供給され
る。言語変換部２２では、ステップＳ２４において、テ
キスト解析部２１が出力する解析結果に基づいて、原言
語による音声認識結果が、目的言語の言語情報に変換さ
れ、テキスト生成部２３に出力される。テキスト生成部
２３では、ステップＳ２５において、言語変換部２２か
らの言語情報に基づいて、原言語による音声認識結果
が、目的言語のテキストに翻訳され、ステップＳ２６に
進み、その翻訳結果が、音声合成部３に出力される。そ
して、ステップＳ２１に戻り、以下、同様の処理を繰り
返す。In step S21, the machine translation unit 2 waits until an event occurs, namely, a speech recognition result being transmitted from the speech recognition unit 1 or a request signal being transmitted from the speech synthesis unit 3. When an event occurs, the process proceeds to step S22, where the type of event is determined. If it is determined in step S22 that the event is the transmission of a speech recognition result, the process proceeds to step S23, where the text analysis unit 21 analyzes the speech recognition result and supplies the analysis result to the language conversion unit 22. In step S24, the language conversion unit 22 converts the speech recognition result in the source language into language information of the target language on the basis of the analysis result output by the text analysis unit 21, and outputs it to the text generation unit 23. In step S25, the text generation unit 23 translates the speech recognition result in the source language into text in the target language on the basis of the language information from the language conversion unit 22. The process then proceeds to step S26, where the translation result is output to the speech synthesis unit 3. The process then returns to step S21, and the same processing is repeated thereafter.

【０１６５】なお、ステップＳ２６においては、翻訳結
果を、音声合成部３に出力するとともに、必要に応じ
て、表示部５に供給して表示させるようにすることも可
能である。また、テキスト解析部２１、言語変換部２
２、およびテキスト生成部２３は、後述するように、そ
の処理中に、音声認識部１において求められる情報（こ
こでは、プロソディ情報）が必要となった場合は、処理
を中断し、その情報を要求するリクエスト信号を、音声
認識部１に送信する。そして、そのリクエスト信号に対
応して、音声認識部１から情報が送信されてくるのを待
って、処理を再開する。In step S26, the translation result may also be supplied to the display unit 5 for display as necessary, in addition to being output to the speech synthesis unit 3. Further, as described later, if the text analysis unit 21, the language conversion unit 22, or the text generation unit 23 needs information obtained by the speech recognition unit 1 (here, prosody information) during its processing, it interrupts the processing and transmits a request signal requesting that information to the speech recognition unit 1. It then waits for the information to be transmitted from the speech recognition unit 1 in response to the request signal, and resumes the processing.

【０１６６】一方、ステップＳ２２において、音声合成
部３から、リクエスト信号が送信されてきたというイベ
ントが生じたと判定された場合、ステップＳ２７に進
み、そのリクエスト信号によって要求されている情報が
求められる。即ち、上述したように、機械翻訳部２は、
処理中に参照した、辞書データベース２４、解析用文法
データベース２５、言語変換データベース２６、辞書デ
ータベース２７、および生成用文法データベース２８の
情報を保持しており、その保持している情報を参照する
ことで、例えば、入力音声やその翻訳結果におけるアク
セントの位置、単語の発音、非言語的な情報等の、リク
エスト信号によって要求されている情報が求められる。
そして、ステップＳ２８に進み、求められた情報が、リ
クエスト信号に対する応答として、そのリクエスト信号
を送信してきた音声合成部３に供給され、ステップＳ２
１に戻り、以下、同様の処理が繰り返される。On the other hand, if it is determined in step S22 that the event is the transmission of a request signal from the speech synthesis unit 3, the process proceeds to step S27, where the information requested by the request signal is obtained. That is, as described above, the machine translation unit 2 retains the information from the dictionary database 24, the analysis grammar database 25, the language conversion database 26, the dictionary database 27, and the generation grammar database 28 that it referred to during processing. By referring to this retained information, it obtains the information requested by the request signal, such as the position of an accent in the input speech or its translation result, the pronunciation of a word, or non-linguistic information. The process then proceeds to step S28, where the obtained information is supplied, as a response to the request signal, to the speech synthesis unit 3 that transmitted the request signal. The process then returns to step S21, and the same processing is repeated thereafter.
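
The idea of answering a request from information retained during translation (steps S22, S27, and S28) can be illustrated with the following sketch. The class, the lexicon format, and the accent values are hypothetical toy data, and steps S23 to S26 are collapsed into a word-by-word lookup purely for brevity.

```python
class MachineTranslator:
    """Toy stand-in for machine translation unit 2 (names are illustrative)."""

    def __init__(self, lexicon):
        self.lexicon = lexicon      # source word -> (target word, accent position)
        self.retained = {}          # information kept while translating

    def translate(self, source_words):
        """Steps S23 to S26 collapsed into a word-by-word lookup."""
        target_words = []
        for word in source_words:
            target, accent_position = self.lexicon[word]
            # Keep what was looked up so later requests can be answered.
            self.retained[target] = {"source": word, "accent": accent_position}
            target_words.append(target)
        return target_words

    def answer(self, target_word, kind):
        """Steps S27 and S28: answer a request from retained information."""
        return self.retained.get(target_word, {}).get(kind)

    def handle(self, event):
        """Step S22: dispatch on the kind of event that occurred."""
        if event["type"] == "recognition_result":
            return self.translate(event["words"])
        if event["type"] == "request":
            return self.answer(event["word"], event["kind"])
        raise ValueError(f"unknown event type: {event['type']}")


mt = MachineTranslator({"I": ("私", 0), "increase": ("増加", 1)})
translation = mt.handle({"type": "recognition_result", "words": ["I", "increase"]})
reply = mt.handle({"type": "request", "word": "増加", "kind": "accent"})
```

Because the lookup results are stored at translation time, the request in the second call is served without consulting the lexicon again.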

【０１６７】次に、図２０を参照して、図１９のステッ
プＳ２３乃至Ｓ２５の処理それぞれの詳細について説明
する。Next, with reference to FIG. 20, the details of each of the processes in steps S23 to S25 in FIG. 19 will be described.

【０１６８】まず最初に、図２０（Ａ）のフローチャー
トを参照して、機械翻訳部２のテキスト解析部２１が行
うステップＳ２３の処理の詳細について説明する。First, the details of the processing in step S23 performed by the text analysis unit 21 of the machine translation unit 2 will be described with reference to the flowchart in FIG. 20(A).

【０１６９】テキスト解析部２１は、ステップＳ３１に
おいて、音声認識部１からの入力、即ち、音声認識結果
の全部または一部を読み込み、ステップＳ３２に進み、
その音声認識結果を処理するのに、何らかのプロソディ
情報が必要かどうかを判定する。ステップＳ３２におい
て、特に、プロソディ情報が必要でないと判定された場
合、ステップＳ３３およびＳ３４をスキップして、ステ
ップＳ３５に進み、音声認識結果の解析が行われる。In step S31, the text analysis unit 21 reads the input from the speech recognition unit 1, that is, all or part of the speech recognition result, and proceeds to step S32, where it determines whether any prosody information is needed to process that speech recognition result. If it is determined in step S32 that no prosody information is needed, steps S33 and S34 are skipped, and the process proceeds to step S35, where the speech recognition result is analyzed.

【０１７０】また、ステップＳ３２において、何らかの
プロソディ情報が必要であると判定された場合、ステッ
プＳ３３に進み、テキスト解析部２１は、その必要なプ
ロソディ情報を要求するリクエスト信号（どの音声区間
の、どのような種類のプロソディ情報が必要なのかを含
む信号）を、音声認識部１に出力する。そして、そのリ
クエスト信号に対応する応答としての必要なプロソディ
情報が、音声認識部１から送信されてくるのを待って、
ステップＳ３４に進む。ステップＳ３４では、音声認識
部１から送信されてくる、リクエスト信号に対応する応
答としての必要なプロソディ情報が受信され、ステップ
Ｓ３５に進み、そのプロソディ情報を用いて、音声認識
結果が解析される。If it is determined in step S32 that some prosody information is needed, the process proceeds to step S33, where the text analysis unit 21 outputs to the speech recognition unit 1 a request signal requesting the necessary prosody information (a signal that specifies which speech segment and what kind of prosody information is needed). After waiting for the necessary prosody information to be transmitted from the speech recognition unit 1 as a response to the request signal, the process proceeds to step S34. In step S34, the necessary prosody information transmitted from the speech recognition unit 1 as a response to the request signal is received, and the process proceeds to step S35, where the speech recognition result is analyzed using that prosody information.

【０１７１】ステップＳ３５の処理後は、ステップＳ３
６に進み、音声認識部１が出力する音声認識結果のすべ
ての解析を終了したかどうかが判定される。ステップＳ
３６において、音声認識結果のすべての解析を、まだ終
了していないと判定された場合、ステップＳ３１に戻
り、まだ解析していない音声認識結果が読み込まれ、以
下、同様の処理が繰り返される。After the processing in step S35, the process proceeds to step S36, where it is determined whether the analysis of the entire speech recognition result output by the speech recognition unit 1 has been completed. If it is determined in step S36 that the analysis has not yet been completed, the process returns to step S31, the part of the speech recognition result that has not yet been analyzed is read, and the same processing is repeated thereafter.

【０１７２】一方、ステップＳ３６において、音声認識
結果のすべての解析を終了したと判定された場合、リタ
ーンする。On the other hand, if it is determined in step S36 that all the analysis of the speech recognition result has been completed, the process returns.
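
The loop of FIG. 20(A) (steps S31 to S36) can be sketched as a small Python function. The callback names and the toy rule that only the bare word "really" needs intonation are assumptions made for this illustration; the embodiment does not specify how the need for prosody information is decided.

```python
def analyze_all(chunks, needs_prosody, request_prosody, analyze):
    """Run steps S31 to S36 over every chunk of the recognition result."""
    results = []
    for chunk in chunks:                          # S31: read the next part
        prosody = None
        if needs_prosody(chunk):                  # S32: is prosody needed?
            prosody = request_prosody(chunk)      # S33/S34: request, then receive
        results.append(analyze(chunk, prosody))   # S35: analyze
    return results                                # S36: all parts done, return


requests_made = []


def needs_prosody(chunk):
    # Toy rule: only the bare word "really" is ambiguous without intonation.
    return chunk == "really"


def request_prosody(chunk):
    # Stand-in for the round trip to speech recognition unit 1.
    requests_made.append(chunk)
    return "rising"


def analyze(chunk, prosody):
    return (chunk, "question" if prosody == "rising" else "statement")


analysis = analyze_all(["I see", "really"], needs_prosody, request_prosody, analyze)
```

The same skeleton would apply to the loops of steps S41 to S46 and S51 to S56, with the per-chunk operation swapped for language conversion or text generation.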

【０１７３】次に、図２０（Ｂ）のフローチャートを参
照して、機械翻訳部２の言語変換部２２が行うステップ
Ｓ２４の処理の詳細について説明する。Next, the details of the processing in step S24 performed by the language conversion unit 22 of the machine translation unit 2 will be described with reference to the flowchart in FIG. 20(B).

【０１７４】言語変換部２２は、ステップＳ４１におい
て、テキスト解析部２１からの音声認識結果の解析結果
の全部または一部を読み込み、ステップＳ４２に進み、
その解析結果を処理するのに、何らかのプロソディ情報
が必要かどうかを判定する。ステップＳ４２において、
特に、プロソディ情報が必要でないと判定された場合、
ステップＳ４３およびＳ４４をスキップして、ステップ
Ｓ４５に進み、テキスト解析部２１からの解析結果が言
語変換処理される。In step S41, the language conversion unit 22 reads all or part of the analysis result of the speech recognition result from the text analysis unit 21, and proceeds to step S42, where it determines whether any prosody information is needed to process that analysis result. If it is determined in step S42 that no prosody information is needed, steps S43 and S44 are skipped, and the process proceeds to step S45, where the analysis result from the text analysis unit 21 is subjected to language conversion processing.

【０１７５】また、ステップＳ４２において、何らかの
プロソディ情報が必要であると判定された場合、ステッ
プＳ４３，Ｓ４４に順次進み、言語変換部２２では、図
２０（Ａ）のステップＳ３３，Ｓ３４における場合とそ
れぞれ同様の処理が行われ、これにより、必要なプロソ
ディ情報が、音声認識部１から取得される。そして、ス
テップＳ４５に進み、そのプロソディ情報を用いて、ス
テップＳ４１で読み込んだ解析結果が言語変換処理され
る。If it is determined in step S42 that some prosody information is needed, the process proceeds to steps S43 and S44 in sequence, where the language conversion unit 22 performs the same processing as in steps S33 and S34 of FIG. 20(A), respectively, thereby acquiring the necessary prosody information from the speech recognition unit 1. The process then proceeds to step S45, where the analysis result read in step S41 is subjected to language conversion processing using that prosody information.

【０１７６】ステップＳ４５の処理後は、ステップＳ４
６に進み、テキスト解析部２１が出力する解析結果のす
べての言語変換を終了したかどうかが判定される。ステ
ップＳ４６において、解析結果のすべての言語変換を、
まだ終了していないと判定された場合、ステップＳ４１
に戻り、まだ言語変換していない解析結果が読み込ま
れ、以下、同様の処理が繰り返される。After the processing in step S45, the process proceeds to step S46, where it is determined whether the language conversion of the entire analysis result output by the text analysis unit 21 has been completed. If it is determined in step S46 that the language conversion has not yet been completed, the process returns to step S41, the part of the analysis result that has not yet been converted is read, and the same processing is repeated thereafter.

【０１７７】一方、ステップＳ４６において、解析結果
のすべての言語変換を終了したと判定された場合、リタ
ーンする。On the other hand, if it is determined in step S46 that all the language conversions of the analysis result have been completed, the process returns.

【０１７８】次に、図２０（Ｃ）のフローチャートを参
照して、機械翻訳部２のテキスト生成部２３が行うステ
ップＳ２５の処理の詳細について説明する。Next, the details of the processing in step S25 performed by the text generation unit 23 of the machine translation unit 2 will be described with reference to the flowchart in FIG. 20(C).

【０１７９】テキスト生成部２３は、ステップＳ５１に
おいて、言語変換部２２からの言語変換結果の全部また
は一部を読み込み、ステップＳ５２に進み、その言語変
換結果を処理するのに、何らかのプロソディ情報が必要
かどうかを判定する。ステップＳ５２において、特に、
プロソディ情報が必要でないと判定された場合、ステッ
プＳ５３およびＳ５４をスキップして、ステップＳ５５
に進み、ステップＳ５１で読み込んだ言語変換結果を対
象に、目的言語のテキストが生成される。In step S51, the text generation unit 23 reads all or part of the language conversion result from the language conversion unit 22, and proceeds to step S52, where it determines whether any prosody information is needed to process that language conversion result. If it is determined in step S52 that no prosody information is needed, steps S53 and S54 are skipped, and the process proceeds to step S55, where text in the target language is generated from the language conversion result read in step S51.

【０１８０】また、ステップＳ５２において、何らかの
プロソディ情報が必要であると判定された場合、ステッ
プＳ５３，Ｓ５４に順次進み、テキスト生成部２３で
は、図２０（Ａ）のステップＳ３３，Ｓ３４における場
合とそれぞれ同様の処理が行われ、これにより、必要な
プロソディ情報が、音声認識部１から取得される。そし
て、ステップＳ５５に進み、そのプロソディ情報を用い
て、ステップＳ５１で読み込んだ言語変換結果を対象
に、目的言語のテキストが生成される。If it is determined in step S52 that some prosody information is needed, the process proceeds to steps S53 and S54 in sequence, where the text generation unit 23 performs the same processing as in steps S33 and S34 of FIG. 20(A), respectively, thereby acquiring the necessary prosody information from the speech recognition unit 1. The process then proceeds to step S55, where text in the target language is generated from the language conversion result read in step S51, using that prosody information.

【０１８１】ステップＳ５５の処理後は、ステップＳ５
６に進み、言語変換部２２が出力する言語変換結果のす
べてについて、テキストの生成を終了したかどうかが判
定される。ステップＳ５６において、言語変換結果のす
べてについてのテキストの生成を、まだ終了していない
と判定された場合、ステップＳ５１に戻り、まだテキス
トを生成していない言語変換結果が読み込まれ、以下、
同様の処理が繰り返される。After the processing in step S55, the process proceeds to step S56, where it is determined whether text generation has been completed for the entire language conversion result output by the language conversion unit 22. If it is determined in step S56 that text generation has not yet been completed, the process returns to step S51, the part of the language conversion result for which text has not yet been generated is read, and the same processing is repeated thereafter.

【０１８２】一方、ステップＳ５６において、言語変換
結果のすべてについてのテキストの生成を終了したと判
定された場合、リターンする。On the other hand, if it is determined in step S56 that the generation of texts for all of the language conversion results has been completed, the process returns.

【０１８３】次に、図２１のフローチャートを参照し
て、機械翻訳部２や音声合成部３がリクエスト信号を出
力するとともに、音声認識部１や機械翻訳部２が、その
リクエスト信号に対する応答を出力する場合の、音声合
成部３の動作について説明する。Next, with reference to the flowchart of FIG. 21, the operation of the speech synthesis unit 3 will be described for the case where the machine translation unit 2 and the speech synthesis unit 3 output request signals and the speech recognition unit 1 and the machine translation unit 2 output responses to those request signals.

【０１８４】音声合成部３では、ステップＳ６１におい
て、機械翻訳部２からの翻訳結果の送信というイベント
が生じるまで、待ち時間がおかれ、そのイベントが生じ
ると、ステップＳ６２に進み、機械翻訳部２からの翻訳
結果が、テキスト解析部３１において解析され、その解
析結果が、規則合成部３２に供給される。規則合成部３
２では、ステップＳ６３において、テキスト解析部３１
が出力する解析結果に基づいて、規則音声合成が行わ
れ、合成音のディジタルデータが生成される。このディ
ジタルデータは、ＤＡ変換部３３に供給され、ＤＡ変換
部３３では、ステップＳ６４において、規則音声合成部
３２からのディジタルデータがＤ／Ａ変換され、これに
より、翻訳結果に対応する合成音が、スピーカ５から出
力される。そして、ステップＳ６１に戻り、以下、同様
の処理を繰り返す。In step S61, the speech synthesis unit 3 waits until the event of a translation result being transmitted from the machine translation unit 2 occurs. When that event occurs, the process proceeds to step S62, where the text analysis unit 31 analyzes the translation result from the machine translation unit 2 and supplies the analysis result to the rule synthesis unit 32. In step S63, the rule synthesis unit 32 performs rule-based speech synthesis on the basis of the analysis result output by the text analysis unit 31, generating digital data of the synthesized sound. This digital data is supplied to the D/A conversion unit 33. In step S64, the D/A conversion unit 33 performs D/A conversion on the digital data from the rule synthesis unit 32, whereby the synthesized sound corresponding to the translation result is output from the speaker 5. The process then returns to step S61, and the same processing is repeated thereafter.

【０１８５】なお、テキスト解析部３１、規則合成部３
２、およびＤＡ変換部３３は、その処理中に、音声認識
部１や機械翻訳部２において求められる情報が必要とな
った場合は、処理を中断し、その情報を要求するリクエ
スト信号を、音声認識部１や機械翻訳部２に送信する。
そして、そのリクエスト信号に対応して、音声認識部１
や機械翻訳部２から情報が送信されてくるのを待って、
処理を再開する。If the text analysis unit 31, the rule synthesis unit 32, or the D/A conversion unit 33 needs information obtained by the speech recognition unit 1 or the machine translation unit 2 during its processing, it interrupts the processing and transmits a request signal requesting that information to the speech recognition unit 1 or the machine translation unit 2. It then waits for the information to be transmitted from the speech recognition unit 1 or the machine translation unit 2 in response to the request signal, and resumes the processing.
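
One way to picture "interrupt the processing, send a request, wait for the response, and resume" is a Python generator that yields a request and is resumed with the answer. This is a sketch of the control flow only: the generator, the request dictionary, and the toy pronunciations "red" and "reed" are illustrative assumptions, not part of the embodiment.

```python
def synthesis_stage(words):
    """Generator that yields a request whenever it needs information."""
    output = []
    for word in words:
        if word == "read":
            # Suspend here until the caller sends back the requested tense.
            tense = yield {"to": "translator", "word": word, "kind": "tense"}
            output.append("red" if tense == "past" else "reed")
        else:
            output.append(word)
    return output


def drive(stage, responder):
    """Run a stage to completion, answering each request it yields."""
    try:
        request = next(stage)
        while True:
            request = stage.send(responder(request))
    except StopIteration as finished:
        # The generator's return value is carried by StopIteration.
        return finished.value


phonemes = drive(synthesis_stage(["I", "read", "it"]), lambda request: "past")
```

The stage keeps all of its local state across the suspension, so processing resumes exactly where it stopped once the response arrives.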

【０１８６】次に、図２２を参照して、図２１のステッ
プＳ６２，Ｓ６３の処理それぞれの詳細について説明す
る。Next, with reference to FIG. 22, the details of the processing in steps S62 and S63 in FIG. 21 will be described.

【０１８７】まず最初に、図２２（Ａ）のフローチャー
トを参照して、音声合成部３のテキスト解析部３１が行
うステップＳ６２の処理の詳細について説明する。First, the details of the processing in step S62 performed by the text analysis unit 31 of the speech synthesis unit 3 will be described with reference to the flowchart in FIG. 22(A).

【０１８８】テキスト解析部３１は、ステップＳ７１に
おいて、機械翻訳部２からの入力、即ち、翻訳結果の全
部または一部を読み込み、ステップＳ７２に進み、その
翻訳結果を処理するのに、何らかのプロソディ情報（さ
らには、その他の情報）が必要かどうかを判定する。ス
テップＳ７２において、特に、プロソディ情報が必要で
ないと判定された場合、ステップＳ７３およびＳ７４を
スキップして、ステップＳ７５に進み、翻訳結果の解析
が行われる。In step S71, the text analysis unit 31 reads the input from the machine translation unit 2, that is, all or part of the translation result, and proceeds to step S72, where it determines whether any prosody information (or other information) is needed to process that translation result. If it is determined in step S72 that no prosody information is needed, steps S73 and S74 are skipped, and the process proceeds to step S75, where the translation result is analyzed.

【０１８９】また、ステップＳ７２において、何らかの
プロソディ情報が必要であると判定された場合、ステッ
プＳ７３に進み、テキスト解析部３１は、その必要なプ
ロソディ情報を要求するリクエスト信号（どの音声区間
の、どのような種類のプロソディ情報が必要なのかを含
む信号）を、音声認識部１または機械翻訳部２に出力す
る。そして、そのリクエスト信号に対応する応答として
の必要なプロソディ情報が、音声認識部１または機械翻
訳部２から送信されてくるのを待って、ステップＳ７４
に進む。ステップＳ７４では、音声認識部１または機械
翻訳部２から送信されてくる、リクエスト信号に対応す
る応答としての必要なプロソディ情報が受信され、ステ
ップＳ７５に進み、そのプロソディ情報を用いて、翻訳
結果が解析される。If it is determined in step S72 that some prosody information is needed, the process proceeds to step S73, where the text analysis unit 31 outputs to the speech recognition unit 1 or the machine translation unit 2 a request signal requesting the necessary prosody information (a signal that specifies which speech segment and what kind of prosody information is needed). After waiting for the necessary prosody information to be transmitted from the speech recognition unit 1 or the machine translation unit 2 as a response to the request signal, the process proceeds to step S74. In step S74, the necessary prosody information transmitted from the speech recognition unit 1 or the machine translation unit 2 as a response to the request signal is received, and the process proceeds to step S75, where the translation result is analyzed using that prosody information.

【０１９０】ステップＳ７５の処理後は、ステップＳ７
６に進み、機械翻訳部２が出力する翻訳結果のすべての
解析を終了したかどうかが判定される。ステップＳ７６
において、翻訳結果のすべての解析を、まだ終了してい
ないと判定された場合、ステップＳ７１に戻り、まだ解
析していない翻訳結果が読み込まれ、以下、同様の処理
が繰り返される。After the processing in step S75, the process proceeds to step S76, where it is determined whether the analysis of the entire translation result output by the machine translation unit 2 has been completed. If it is determined in step S76 that the analysis has not yet been completed, the process returns to step S71, the part of the translation result that has not yet been analyzed is read, and the same processing is repeated thereafter.

【０１９１】一方、ステップＳ７６において、翻訳結果
のすべての解析を終了したと判定された場合、リターン
する。On the other hand, if it is determined in step S76 that all the translation results have been analyzed, the process returns.

【０１９２】次に、図２２（Ｂ）のフローチャートを参
照して、音声合成部３の規則合成部３２が行うステップ
Ｓ６３の処理の詳細について説明する。Next, the details of the processing in step S63 performed by the rule synthesis unit 32 of the speech synthesis unit 3 will be described with reference to the flowchart in FIG. 22(B).

【０１９３】規則合成部３２は、ステップＳ８１におい
て、テキスト解析部３１からの翻訳結果の解析結果の全
部または一部を読み込み、ステップＳ８２に進み、その
解析結果を処理するのに、何らかのプロソディ情報が必
要かどうかを判定する。ステップＳ８２において、特
に、プロソディ情報が必要でないと判定された場合、ス
テップＳ８３およびＳ８４をスキップして、ステップＳ
８５に進み、ステップＳ８１で読み込んだ解析結果にし
たがった規則音声合成が行われる。In step S81, the rule synthesis unit 32 reads all or part of the analysis result of the translation result from the text analysis unit 31, and proceeds to step S82, where it determines whether any prosody information is needed to process that analysis result. If it is determined in step S82 that no prosody information is needed, steps S83 and S84 are skipped, and the process proceeds to step S85, where rule-based speech synthesis is performed in accordance with the analysis result read in step S81.

【０１９４】また、ステップＳ８２において、何らかの
プロソディ情報（さらには、その他の情報）が必要であ
ると判定された場合、ステップＳ８３，Ｓ８４に順次進
み、規則合成部３２では、図２２（Ａ）のステップＳ７
３，Ｓ７４における場合とそれぞれ同様の処理が行わ
れ、これにより、必要なプロソディ情報が、音声認識部
１または機械翻訳部２から取得される。そして、ステッ
プＳ８５に進み、そのプロソディ情報を用いながら、ス
テップＳ８１で読み込んだ解析結果にしたがった規則音
声合成が行われる。If it is determined in step S82 that some prosody information (or other information) is needed, the process proceeds to steps S83 and S84 in sequence, where the rule synthesis unit 32 performs the same processing as in steps S73 and S74 of FIG. 22(A), respectively, thereby acquiring the necessary prosody information from the speech recognition unit 1 or the machine translation unit 2. The process then proceeds to step S85, where rule-based speech synthesis is performed in accordance with the analysis result read in step S81, using that prosody information.

【０１９５】ステップＳ８５の処理後は、ステップＳ８
６に進み、テキスト解析部３１が出力する解析結果のす
べてについての規則音声合成を終了したかどうかが判定
される。ステップＳ８６において、解析結果のすべてに
ついての規則音声合成を、まだ終了していないと判定さ
れた場合、ステップＳ８１に戻り、まだ規則音声合成し
ていない解析結果が読み込まれ、以下、同様の処理が繰
り返される。After the processing in step S85, the process proceeds to step S86, where it is determined whether rule-based speech synthesis has been completed for the entire analysis result output by the text analysis unit 31. If it is determined in step S86 that rule-based speech synthesis has not yet been completed, the process returns to step S81, the part of the analysis result that has not yet been synthesized is read, and the same processing is repeated thereafter.

【０１９６】一方、ステップＳ８６において、解析結果
のすべてについての規則音声合成を終了したと判定され
た場合、リターンする。On the other hand, if it is determined in step S86 that the ruled speech synthesis has been completed for all the analysis results, the process returns.

【０１９７】以上のように、音声認識部１や機械翻訳部
２において、機械翻訳部２や音声合成部３からのリクエ
スト信号に応じて、プロソディ情報を提供するようにし
た場合には、そのリクエスト信号によって要求されてい
るプロソディ情報だけを抽出すれば済むので、処理の効
率化を図ることができる。As described above, when the speech recognition unit 1 and the machine translation unit 2 provide prosody information in response to the request signal from the machine translation unit 2 and the speech synthesis unit 3, the request Since only the prosody information required by the signal needs to be extracted, the processing efficiency can be improved.
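
The efficiency argument above can be illustrated with a small sketch in which an extractor runs only when its kind of information is actually requested. The class and its method names are hypothetical, the numeric values are toy data rather than measurements, and the result cache is an extra assumption that the embodiment does not mention.

```python
class OnDemandProsody:
    """Runs an extractor only when its kind of information is requested."""

    def __init__(self, extractors):
        self.extractors = extractors   # kind -> zero-argument function
        self.cache = {}                # results already computed (an added assumption)
        self.executed = []             # which extractors actually ran

    def respond(self, kind):
        if kind not in self.cache:
            self.executed.append(kind)
            self.cache[kind] = self.extractors[kind]()
        return self.cache[kind]


unit = OnDemandProsody({
    "speech_rate": lambda: 5.2,        # toy values, not measured data
    "pitch": lambda: 180.0,
    "gender": lambda: "female",
})
first = unit.respond("pitch")
second = unit.respond("pitch")
```

Here only the pitch extractor runs, and it runs once; the speech-rate and gender extractors are never invoked because nothing asked for them.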

【０１９８】なお、本実施の形態では、音声認識部１か
ら機械翻訳部２に対して、音声認識結果を、常に送信す
るようにしたが、音声認識部１から機械翻訳部２に対し
ては、機械翻訳部２から必要な音声認識結果を要求する
リクエスト信号が送信されてきた場合にのみ、その必要
な音声認識結果を送信するようにすることが可能であ
る。In the present embodiment, the speech recognition unit 1 always transmits the speech recognition result to the machine translation unit 2. However, the speech recognition unit 1 may instead transmit the necessary speech recognition result to the machine translation unit 2 only when the machine translation unit 2 transmits a request signal requesting it.

【０１９９】即ち、例えば、機械翻訳部２では、言語変
換部２２において、音声認識結果と言語変換データとの
マッチングが行われ、これにより音声認識結果に最も合
致する言語変換データが検出されるが、このマッチング
においては、発話に含まれる一部の語句しか必要としな
い場合がある。具体的には、例えば、言語変換データ
が、述語（動詞など）ごとに分類されている場合におい
ては（例えば、「○○を下さい」や「○○を食べます」
など）、マッチングにおいて最初に必要となる情報は、
発話中の述語と、その述語に係る語句であり、また、例
えば、言語変換データが助詞（例えば、「○○から×
×」や、「○○へ××」など）ごとに分類されている場
合においては、マッチングにおいて最初に必要となる情
報は、発話中の助詞と、その前後に配置されている語句
であり、発話に含まれる一部の語句しか必要としないこ
とがある。なお、発話の内の一部の語句と言語変換デー
タとのマッチングには、上述した論理式やベクトル空間
法などの手法を用いることができる。That is, for example, in the machine translation unit 2, the language conversion unit 22 matches the speech recognition result against the language conversion data and thereby detects the language conversion data that best fits the speech recognition result; in some cases this matching needs only some of the phrases in the utterance. Specifically, if the language conversion data is classified by predicate (such as a verb; for example, 「○○を下さい」 ("please give me ○○") or 「○○を食べます」 ("I eat ○○")), the information needed first in the matching is the predicate in the utterance and the phrases that depend on it. If the language conversion data is classified by particle (for example, 「○○から××」 ("×× from ○○") or 「○○へ××」 ("×× to ○○")), the information needed first is the particle in the utterance and the phrases placed before and after it. In either case, only some of the phrases in the utterance are needed. Methods such as the logical expressions and the vector space method described above can be used to match some of the phrases in the utterance against the language conversion data.

【０２００】この場合、音声認識部１から機械翻訳部２
に対しては、最初は、発話に含まれる一部の語句の音声
認識結果だけを供給し、使用する言語変換データが検出
されてから、発話全体の音声認識結果を供給すれば十分
である。In this case, it suffices for the speech recognition unit 1 to supply the machine translation unit 2 initially with the speech recognition results for only some of the phrases in the utterance, and to supply the speech recognition result for the entire utterance after the language conversion data to be used has been detected.

【０２０１】そこで、音声認識部１および機械翻訳部２
の間では、音声認識結果について、図１７で説明した場
合と同様のやりとりを行わせることが可能である。Therefore, the speech recognition unit 1 and the machine translation unit 2 can exchange speech recognition results in the same manner as described with reference to FIG. 17.

【０２０２】即ち、この場合、音声認識部１において
は、発話が音声認識され、その認識結果が得られると、
その旨を示すイベントが、機械翻訳部２に送信される。
機械翻訳部２は、音声認識部１から、音声認識結果が得
られた旨のイベントを受信すると、その音声認識結果を
処理するのに、最初に必要となる語句を要求するリクエ
スト信号を、音声認識部１に送信する。音声認識部１
は、機械翻訳部２からリクエスト信号を受信すると、そ
のリクエスト信号によって要求されている語句の音声認
識結果を、機械翻訳部２に送信する。That is, in this case, when the speech recognition unit 1 has recognized the utterance and obtained the recognition result, it transmits an event indicating this to the machine translation unit 2. Upon receiving from the speech recognition unit 1 the event indicating that a speech recognition result has been obtained, the machine translation unit 2 transmits to the speech recognition unit 1 a request signal requesting the phrases needed first to process that speech recognition result. Upon receiving the request signal from the machine translation unit 2, the speech recognition unit 1 transmits the speech recognition results of the phrases requested by the request signal to the machine translation unit 2.

【０２０３】機械翻訳部２では、音声認識部１から、リ
クエスト信号に対応して送信されてくる語句の音声認識
結果の解析が行われ、さらに、その解析結果に基づい
て、適切な言語変換データが検出される。そして、機械
翻訳部２は、その言語変換データを用いて、音声認識結
果の言語変換（翻訳）を行うのに必要な語句を求め、そ
の語句を要求するリクエスト信号を、音声認識部１に送
信する。The machine translation unit 2 analyzes the speech recognition results of the phrases transmitted from the speech recognition unit 1 in response to the request signal and, based on the analysis result, detects the appropriate language conversion data. Using that language conversion data, the machine translation unit 2 then determines the phrases needed to perform language conversion (translation) of the speech recognition result and transmits a request signal requesting those phrases to the speech recognition unit 1.

【０２０４】音声認識部１は、このようにして機械翻訳
部２から送信されてくるリクエスト信号によって要求さ
れている語句を、発話の音声認識結果から抽出し、機械
翻訳部２に送信する。そして、機械翻訳部２では、音声
認識部１から送信されてくる音声認識結果の語句を用い
て、音声認識結果を目的言語に変換した翻訳文を生成す
る。The speech recognition unit 1 extracts the phrase requested by the request signal transmitted from the machine translation unit 2 from the speech recognition result of the utterance, and transmits the extracted phrase to the machine translation unit 2. Then, the machine translation unit 2 uses the words of the speech recognition result transmitted from the speech recognition unit 1 to generate a translated sentence obtained by converting the speech recognition result into the target language.
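
The event and request exchange described in paragraphs [0202] to [0204] can be sketched roughly as below. The class names, method names, role labels (`predicate`, `object`, `filler`), and sample phrases are all assumptions made for illustration; the specification does not define a programming interface. For brevity the first request asks only for the predicate, whereas the text also allows the phrases depending on it to be requested at that point.

```python
class SpeechRecognizer:
    """Holds a finished recognition result and serves phrase requests."""

    def __init__(self, phrases):
        # phrases: mapping from a (hypothetical) role label to a phrase.
        self._phrases = dict(phrases)
        self.sent = []  # log of the roles actually transmitted

    def notify(self, translator):
        # Event: a recognition result is available; no phrases are sent yet.
        return translator.on_result_available(self)

    def request(self, roles):
        # Reply only with the phrases that were asked for.
        reply = {r: self._phrases[r] for r in roles if r in self._phrases}
        self.sent.extend(reply)
        return reply


class MachineTranslator:
    def __init__(self, conversion_data):
        # conversion_data: predicate -> (required roles, target template)
        self._data = conversion_data

    def on_result_available(self, recognizer):
        # First request: only what is needed to select the conversion data.
        first = recognizer.request(["predicate"])
        roles, template = self._data[first["predicate"]]
        # Second request: only the phrases the selected data needs.
        slots = recognizer.request(list(roles))
        return template.format(**slots)


recognizer = SpeechRecognizer(
    {"predicate": "kudasai", "object": "coffee", "filler": "eeto"}
)
translator = MachineTranslator(
    {"kudasai": (("object",), "Please give me {object}.")}
)
result = recognizer.notify(translator)
print(result)
```

In this sketch the filler phrase is never transmitted, which mirrors the point that phrases not needed for the translation do not have to cross from the speech recognition unit to the machine translation unit.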

【０２０５】以上のように、音声認識部１から機械翻訳
部２に対して、機械翻訳部２から要求のあった音声認識
結果の語句だけを送信するようにした場合には、処理効
率を向上させ、また、音声の認識率の低下を防止するこ
とが可能となる。As described above, when only the words of the speech recognition result requested by the machine translation unit 2 are transmitted from the speech recognition unit 1 to the machine translation unit 2, the processing efficiency is improved. It is also possible to prevent a reduction in the voice recognition rate.

【０２０６】即ち、機械翻訳部２では、発話に含まれて
いる語句であっても、その語句が、翻訳結果に反映され
ない場合があり、そのような語句が、音声認識部１から
機械翻訳部２に供給されなくなる結果、処理効率が向上
する。また、音声認識部１では、音声認識結果の候補が
複数得られることがあるが、機械翻訳部２において、そ
のような複数の音声認識結果の候補を、必要に応じて要
求するようにすることで、音声認識部１が、その複数の
音声認識結果の候補の中から、ある候補を、最終的な音
声認識結果として決定して出力することに起因する音声
の認識率の低下を防止することが可能となる。That is, some phrases contained in an utterance are not reflected in the translation result produced by the machine translation unit 2. Because such phrases are no longer supplied from the speech recognition unit 1 to the machine translation unit 2, processing efficiency improves. In addition, the speech recognition unit 1 may obtain a plurality of candidates for the speech recognition result. By having the machine translation unit 2 request such candidates as necessary, it is possible to prevent the drop in recognition rate that would result from the speech recognition unit 1 selecting one of the candidates and outputting it as the final speech recognition result.
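
One way to picture the benefit of requesting several recognition candidates is the following sketch, in which the translation side selects among candidates using its own knowledge of which predicates have conversion data. The function name `choose_candidate`, the candidate scores, and the fallback rule are assumptions for illustration only and are not taken from the specification.

```python
def choose_candidate(candidates, known_predicates):
    """Pick the best-scoring candidate that the translator can actually use.

    candidates: list of (phrase, score) pairs from the recognizer, any order.
    known_predicates: predicates for which conversion data exists.
    Falls back to the overall top-scoring candidate when none is usable,
    and returns None when there are no candidates at all.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for phrase, _score in ranked:
        if phrase in known_predicates:
            return phrase
    return ranked[0][0] if ranked else None


# Hypothetical scores: the top candidate is a misrecognition.
n_best = [("kudasi", 0.48), ("kudasai", 0.45), ("kurasai", 0.07)]
chosen = choose_candidate(n_best, {"kudasai", "tabemasu"})
print(chosen)
```

If the recognizer had committed to its single top-scoring candidate before the translator was consulted, the usable second candidate in this example would have been lost.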

【０２０７】次に、上述した一連の処理は、ハードウェ
アにより行うこともできるし、ソフトウェアにより行う
こともできる。一連の処理をソフトウェアによって行う
場合には、そのソフトウェアを構成するプログラムが、
専用のハードウェアとしての音声翻訳システムに組み込
まれているコンピュータや、汎用のコンピュータ等にイ
ンストールされる。The series of processes described above can be performed by hardware or by software. When the series of processes is performed by software, the program constituting the software is installed on a computer built into a speech translation system serving as dedicated hardware, on a general-purpose computer, or the like.

【０２０８】そこで、図２３を参照して、上述した一連
の処理を実行するプログラムをコンピュータにインスト
ールし、コンピュータによって実行可能な状態とするた
めに用いられる、そのプログラムが記録されている記録
媒体について説明する。With reference to FIG. 23, the following describes a recording medium on which the program for executing the series of processes described above is recorded and which is used to install the program on a computer and make it executable by the computer.

【０２０９】プログラムは、図２３（Ａ）に示すよう
に、コンピュータ１０１に内蔵されている記録媒体とし
てのハードディスク１０２や半導体メモリ１０３に予め
記録しておくことができる。As shown in FIG. 23A, the program can be recorded in advance on a hard disk 102 or a semiconductor memory 103 as a recording medium built in the computer 101.

【０２１０】あるいはまた、プログラムは、図２３
（Ｂ）に示すように、フロッピーディスク１１１、CD-R
OM(Compact Disc Read Only Memory)１１２，MO(Magnet
o optical)ディスク１１３，DVD(Digital Versatile Di
sc)１１４、磁気ディスク１１５、半導体メモリ１１６
などの記録媒体に、一時的あるいは永続的に格納（記
録）しておくことができる。このような記録媒体は、い
わゆるパッケージソフトウエアとして提供することがで
きる。Alternatively, as shown in FIG. 23B, the program can be stored (recorded) temporarily or permanently on a recording medium such as a floppy disk 111, a CD-ROM (Compact Disc Read Only Memory) 112, an MO (Magneto optical) disc 113, a DVD (Digital Versatile Disc) 114, a magnetic disk 115, or a semiconductor memory 116. Such a recording medium can be provided as so-called packaged software.

【０２１１】なお、プログラムは、上述したような記録
媒体からコンピュータにインストールする他、図２３
（Ｃ）に示すように、ダウンロードサイト１２１から、
ディジタル衛星放送用の人工衛星１２２を介して、コン
ピュータ１０１に無線で転送したり、LAN(Local Area N
etwork)、インターネットといったネットワーク１３１
を介して、コンピュータ１０１に有線で転送し、コンピ
ュータ１０１において、内蔵するハードディスク１０２
などにインストールすることができる。In addition to being installed on the computer from a recording medium as described above, the program can, as shown in FIG. 23C, be transferred wirelessly from a download site 121 to the computer 101 via an artificial satellite 122 for digital satellite broadcasting, or transferred by wire to the computer 101 via a network 131 such as a LAN (Local Area Network) or the Internet, and then installed on the built-in hard disk 102 or the like of the computer 101.

【０２１２】ここで、本明細書において、コンピュータ
に各種の処理を行わせるためのプログラムを記述する処
理ステップは、必ずしもフローチャートとして記載され
た順序に沿って時系列に処理する必要はなく、並列的あ
るいは個別に実行される処理（例えば、並列処理あるい
はオブジェクトによる処理）も含むものである。In this specification, the processing steps that describe a program for causing a computer to perform various kinds of processing need not be processed in time series in the order shown in the flowcharts; they also include processes executed in parallel or individually (for example, parallel processing or object-based processing).

【０２１３】また、プログラムは、１のコンピュータに
より処理されるものであっても良いし、複数のコンピュ
ータによって分散処理されるものであっても良い。さら
に、プログラムは、遠方のコンピュータに転送されて実
行されるものであっても良い。The program may be processed by one computer, or may be processed by a plurality of computers in a distributed manner. Further, the program may be transferred to a remote computer and executed.

【０２１４】次に、図２４は、図２３のコンピュータ１
０１の構成例を示している。Next, FIG. 24 shows a configuration example of the computer 101 of FIG. 23.

【０２１５】コンピュータ１０１は、図２４に示すよう
に、CPU(Central Processing Unit)１４２を内蔵してい
る。CPU１４２には、バス１４１を介して、入出力イン
タフェース１４５が接続されており、CPU１４２は、入
出力インタフェース１４５を介して、ユーザによって、
キーボードやマウス等で構成される入力部１４７が操作
されることにより指令が入力されると、それにしたがっ
て、図２３（Ａ）の半導体メモリ１０３に対応するROM
(Read Only Memory)１４３に格納されているプログラム
を実行する。あるいは、また、CPU１４２は、ハードデ
ィスク１０２に格納されているプログラム、衛星１２２
若しくはネットワーク１３１から転送され、通信部１４
８で受信されてハードディスク１０２にインストールさ
れたプログラム、またはドライブ１４９に装着されたフ
ロッピディスク１１１、CD-ROM１１２、MOディスク１１
３、DVD１１４、若しくは磁気ディスク１１５から読み
出されてハードディスク１０２にインストールされたプ
ログラムを、RAM(Random Access Memory)１４４にロー
ドして実行する。そして、CPU１４２は、その処理結果
を、例えば、入出力インタフェース１４５を介して、LC
D(Liquid CryStal Display)やスピーカ等で構成される
出力部１４６から、必要に応じて出力させる。As shown in FIG. 24, the computer 101 has a built-in CPU (Central Processing Unit) 142. An input/output interface 145 is connected to the CPU 142 via a bus 141. When the user inputs a command via the input/output interface 145 by operating an input unit 147 comprising a keyboard, a mouse, and the like, the CPU 142 executes, in accordance with the command, a program stored in a ROM (Read Only Memory) 143, which corresponds to the semiconductor memory 103 in FIG. 23A. Alternatively, the CPU 142 loads into a RAM (Random Access Memory) 144 and executes one of the following: a program stored on the hard disk 102; a program transferred from the satellite 122 or the network 131, received by the communication unit 148, and installed on the hard disk 102; or a program read from the floppy disk 111, CD-ROM 112, MO disc 113, DVD 114, or magnetic disk 115 mounted in the drive 149 and installed on the hard disk 102. The CPU 142 then outputs the processing result as needed, for example via the input/output interface 145, from an output unit 146 comprising an LCD (Liquid Crystal Display), a speaker, and the like.

【０２１６】なお、本実施の形態では、音声を認識し、
その音声認識結果を翻訳するようにしたが、本発明は、
キーボード等を操作して入力された文を翻訳する場合に
も適用可能である。In this embodiment, speech is recognized and the speech recognition result is translated; however, the present invention is also applicable to translating a sentence entered by operating a keyboard or the like.

【０２１７】また、本実施の形態では、翻訳結果を合成
音で出力するようにしたが、本発明は、翻訳結果を、テ
キストで表示する場合にも適用可能である。In the present embodiment, the translation result is output as a synthesized sound. However, the present invention can be applied to a case where the translation result is displayed as a text.

【０２１８】さらに、本発明は、日英や英日以外の翻訳
にも適用可能である。Furthermore, the present invention is also applicable to translation between language pairs other than Japanese-to-English and English-to-Japanese.

【０２１９】[0219]

【発明の効果】本発明の翻訳装置および翻訳方法、並び
に記録媒体によれば、入力文を、翻訳文に翻訳するため
の対応関係が、第１または第２の言語のうちの少なくと
も一方のプロソディ情報とともに記述されているテーブ
ルに基づいて、入力文が、その入力文に対応する翻訳文
に翻訳される。従って、精度の高い翻訳を行い、また、
翻訳文を、違和感のない合成音で出力することが可能と
なる。According to the translation apparatus, translation method, and recording medium of the present invention, an input sentence is translated into the corresponding translated sentence based on a table in which the correspondences for translating the input sentence into the translated sentence are described together with prosody information of at least one of the first and second languages. Consequently, highly accurate translation can be performed, and the translated sentence can be output as synthesized speech that sounds natural.
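
As a rough illustration of a table that stores translation correspondences together with prosody information, consider the sketch below. The field names, the prosody labels (`rising`, `falling`), and the sample entries are hypothetical and chosen only to show why prosody can disambiguate a translation; the actual table format of the embodiment is the one shown in FIG. 6 and FIG. 11, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionEntry:
    source: str          # source-language sentence (romanized here)
    source_prosody: str  # assumed prosody label expected on the input
    target: str          # target-language sentence
    target_prosody: str  # assumed prosody label handed to speech synthesis


# Two entries share the same words and differ only in prosody.
TABLE = [
    ConversionEntry("sou desu ka", "rising", "Is that so?", "rising"),
    ConversionEntry("sou desu ka", "falling", "I see.", "falling"),
]


def translate(text, prosody):
    """Return (translation, prosody for synthesis), or None if no entry matches."""
    for entry in TABLE:
        if entry.source == text and entry.source_prosody == prosody:
            return entry.target, entry.target_prosody
    return None


print(translate("sou desu ka", "rising"))
print(translate("sou desu ka", "falling"))
```

The same word sequence yields different translations depending on the prosody label, and the label returned with the translation is what a synthesis stage could use to shape the output speech.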

[Brief description of the drawings]

【図１】従来の音声翻訳システムの一例の構成を示すブ
ロック図である。FIG. 1 is a block diagram showing a configuration of an example of a conventional speech translation system.

【図２】本発明を適用した音声翻訳システムの一実施の
形態の構成例を示すブロック図である。FIG. 2 is a block diagram illustrating a configuration example of an embodiment of a speech translation system to which the present invention has been applied.

【図３】本発明を適用した音声翻訳システムの外観構成
例を示す平面図である。FIG. 3 is a plan view showing an example of an external configuration of a speech translation system to which the present invention is applied.

【図４】音声認識部１の構成例を示すブロック図であ
る。FIG. 4 is a block diagram illustrating a configuration example of a voice recognition unit 1;

【図５】機械翻訳部２の構成例を示すブロック図であ
る。FIG. 5 is a block diagram illustrating a configuration example of a machine translation unit 2.

【図６】言語変換データを示す図である。FIG. 6 is a diagram showing language conversion data.

【図７】音声合成部３の構成例を示すブロック図であ
る。FIG. 7 is a block diagram illustrating a configuration example of a speech synthesis unit 3.

【図８】図２の音声翻訳システムの動作を説明するため
のフローチャートである。FIG. 8 is a flowchart illustrating the operation of the speech translation system in FIG. 2;

【図９】ワードグラフを示す図である。FIG. 9 is a diagram showing a word graph.

【図１０】図９に続く図である。FIG. 10 is a view following FIG. 9;

【図１１】プロソディデータおよびプロソディタグを示
す図である。FIG. 11 is a diagram showing prosody data and prosody tags.

【図１２】図２の音声翻訳システムの動作を説明するた
めの図である。FIG. 12 is a diagram for explaining the operation of the speech translation system in FIG. 2;

【図１３】図３の音声翻訳システムの動作を説明するた
めの平面図である。FIG. 13 is a plan view for explaining the operation of the speech translation system in FIG. 3;

【図１４】図３の音声翻訳システムの動作を説明するた
めの平面図である。FIG. 14 is a plan view for explaining the operation of the speech translation system in FIG. 3;

【図１５】図３の音声翻訳システムの動作を説明するた
めの平面図である。FIG. 15 is a plan view for explaining the operation of the speech translation system in FIG. 3;

【図１６】図３の音声翻訳システムの動作を説明するた
めの平面図である。FIG. 16 is a plan view for explaining the operation of the speech translation system in FIG. 3;

【図１７】音声認識部１、機械翻訳部２、および音声合
成部３の間でのやりとりを説明するための図である。FIG. 17 is a diagram for explaining exchanges among the speech recognition unit 1, the machine translation unit 2, and the speech synthesis unit 3.

【図１８】音声認識部１の動作を説明するためのフロー
チャートである。FIG. 18 is a flowchart for explaining the operation of the speech recognition unit 1.

【図１９】機械翻訳部２の動作を説明するためのフロー
チャートである。FIG. 19 is a flowchart for explaining the operation of the machine translation unit 2;

【図２０】図１９のステップＳ２３乃至Ｓ２５それぞれ
の処理を説明するためのフローチャートである。FIG. 20 is a flowchart for explaining processing in steps S23 to S25 in FIG. 19;

【図２１】音声合成部３の動作を説明するためのフロー
チャートである。FIG. 21 is a flowchart for explaining the operation of the speech synthesizer 3;

【図２２】図２１のステップＳ６２，Ｓ６３それぞれの
処理を説明するためのフローチャートである。FIG. 22 is a flowchart for explaining processing in steps S62 and S63 in FIG. 21;

【図２３】本発明を適用した記録媒体を説明するための
図である。FIG. 23 is a diagram illustrating a recording medium to which the present invention has been applied.

【図２４】図２３のコンピュータ１０１の構成例を示す
ブロック図である。FIG. 24 is a block diagram illustrating a configuration example of the computer 101 in FIG. 23.

[Explanation of symbols]

１音声認識部，２機械翻訳部，３音声合成
部，４表示部，５スピーカ，６操作部，６
Ａカーソルキー，６Ｂ決定キー，６Ｃキャンセ
ルキー，７制御部，１１マイク（マイクロフォ
ン），１２ＡＤ変換部，１３特徴抽出部，１４
バッファ部，１４Ａ音声データバッファ，１４
Ｂ特徴量バッファ，１５マッチング部，１６
音響モデルデータベース，１７辞書データベース，
１８文法データベース，１９プロソディ情報抽出
部，２１テキスト解析部，２２言語変換部，
２３テキスト生成部，２４辞書データベース，
２５解析用文法データベース，２６言語変換デー
タベース，２７辞書データベース，２８生成用
文法データベース，３１テキスト解析部，３２
規則合成部，３３ＤＡ変換部，３４辞書データベ
ース，３５解析用文法データベース，３６音素
片データベース，１０１コンピュータ，１０２
ハードディスク，１０３半導体メモリ，１１１
フロッピーディスク，１１２ CD-ROM，１１３ MO
ディスク，１１４ DVD，１１５磁気ディスク，
１１６半導体メモリ，１２１ダウンロードサイ
ト，１２２衛星，１３１ネットワーク，１４１
バス，１４２ CPU，１４３ ROM，１４４ RA
M，１４５入出力インタフェース，１４６出力
部，１４７入力部，１４８通信部，１４９ド
ライブ 1 speech recognition unit, 2 machine translation unit, 3 speech synthesis unit, 4 display unit, 5 speaker, 6 operation unit, 6A cursor key, 6B enter key, 6C cancel key, 7 control unit, 11 microphone, 12 AD conversion unit, 13 feature extraction unit, 14 buffer unit, 14A speech data buffer, 14B feature buffer, 15 matching unit, 16 acoustic model database, 17 dictionary database, 18 grammar database, 19 prosody information extraction unit, 21 text analysis unit, 22 language conversion unit, 23 text generation unit, 24 dictionary database, 25 grammar database for analysis, 26 language conversion database, 27 dictionary database, 28 grammar database for generation, 31 text analysis unit, 32 rule synthesis unit, 33 DA conversion unit, 34 dictionary database, 35 grammar database for analysis, 36 phoneme segment database, 101 computer, 102 hard disk, 103 semiconductor memory, 111 floppy disk, 112 CD-ROM, 113 MO disc, 114 DVD, 115 magnetic disk, 116 semiconductor memory, 121 download site, 122 satellite, 131 network, 141 bus, 142 CPU, 143 ROM, 144 RAM, 145 input/output interface, 146 output unit, 147 input unit, 148 communication unit, 149 drive

フロントページの続き (51)Int.Cl.⁷ 識別記号 ＦＩ テーマコード(参考） Ｇ１０Ｌ 3/00 ５５１Ｃ (72)発明者 岸 秀樹 東京都品川区北品川６丁目７番35号 ソニー株式会社内 (72)発明者 浅野 康治 東京都品川区北品川６丁目７番35号 ソニー株式会社内 Ｆターム(参考） 5B091 AA05 AA06 AA15 BA03 CA21 CB12 CB32 CC01 EA00 5D015 CC13 CC14 KK02 KK04 5D045 AA07 AB03 9A001 HH14 HZ17 HZ18 Continued from the front page (51) Int. Cl.⁷ Identification code FI Theme code (reference) G10L 3/00 551C (72) Inventor: Hideki Kishi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo (72) Inventor: Koji Asano, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo F-term (reference) 5B091 AA05 AA06 AA15 BA03 CA21 CB12 CB32 CC01 EA00 5D015 CC13 CC14 KK02 KK04 5D045 AA07 AB03 9A001 HH14 HZ17 HZ18

Claims

[Claims]

1. A translation device for translating an input sentence in a first language and outputting a translated sentence in a second language, comprising: input means for inputting the input sentence; translation means for translating the input sentence into the translated sentence corresponding to the input sentence, based on a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information of at least one of the first and second languages; and output means for outputting the translated sentence.

2. The translation device according to claim 1, wherein the input means includes speech recognition means for recognizing speech, and outputs the speech recognition result produced by the speech recognition means as the input sentence.

3. The translation device according to claim 2, wherein the speech recognition means includes extraction means for extracting prosody information of the speech and outputs the speech recognition result of the speech together with the prosody information of the speech, and the translation means translates the speech recognition result using the prosody information.

4. The translation device according to claim 3, further comprising request means for requesting the prosody information from the speech recognition means, wherein the speech recognition means outputs the prosody information when requested by the request means.

5. The translation device according to claim 1, wherein the output means includes speech synthesis means for generating synthesized speech corresponding to the translated sentence, and outputs the translated sentence as synthesized speech.

6. The translation device according to claim 5, wherein the translation means outputs the translated sentence together with its prosody information, and the speech synthesis means generates the synthesized speech corresponding to the translated sentence using the prosody information.

7. The translation device according to claim 6, wherein the speech synthesis means includes request means for requesting the prosody information from the translation means, and the translation means outputs the prosody information when requested by the request means.

8. The translation device according to claim 1, further comprising storage means for storing the table.

9. A translation method for translating an input sentence in a first language and outputting a translated sentence in a second language, comprising: an input step of inputting the input sentence; a translation step of translating the input sentence into the translated sentence corresponding to the input sentence, based on a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information of at least one of the first and second languages; and an output step of outputting the translated sentence.

10. A recording medium on which is recorded a program for causing a computer to perform translation processing of translating an input sentence in a first language and outputting a translated sentence in a second language, the program comprising: an input step of inputting the input sentence; a translation step of translating the input sentence into the translated sentence corresponding to the input sentence, based on a table in which correspondences for translating the input sentence into the translated sentence are described together with prosody information of at least one of the first and second languages; and an output step of outputting the translated sentence.