JPH10116093A

JPH10116093A - Voice recognition device

Info

Publication number: JPH10116093A
Application number: JP8268683A
Authority: JP
Inventors: Atsuko Motoki; 敦子元木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-10-09
Filing date: 1996-10-09
Publication date: 1998-05-06

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device that always recognizes speech, even when the device is used by a person whose mother tongue is other than the language recognized by the device. SOLUTION: When speech is input from outside through a voice input section 1, a Japanese voice recognition processing section 2 obtains the recognition likelihood of the input speech data as Japanese, and an English voice recognition processing section 3 obtains the recognition likelihood of the same speech data as English. A recognition likelihood comparison processing section 5 then compares the recognition likelihoods obtained by sections 2 and 3, and the language with the highest recognition likelihood is output as the recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech recognition device, and more particularly to a speech recognition device that can recognize speech in the recognizable language even when it is uttered by a person who is not a native speaker of that language.

[0002]

[Related art] Various techniques have been proposed for speech recognition, and various devices have been developed that can recognize input speech at a certain recognition rate.

[0003] In a conventional speech recognition device, a speech model of the recognizable language is stored in advance. When speech to be recognized is input, the input speech is compared with the speech model, and speech recognition is performed based on that comparison.

[0004] As an example of a conventional speech recognition device, Japanese Patent Laid-Open No. 6-27988 discloses a speech recognition device that, when speech to be recognized is input, obtains its likelihood against the HMM (Hidden Markov Model) of each keyword to be recognized and performs speech recognition on that basis.

[0005]

[Problems to be solved by the invention] In conventional speech recognition devices such as those described above, the speech model is generated using the utterances of native speakers of the recognizable language as the basic pattern. Consequently, when the device is used by a person whose native language is other than the recognizable language and whose pronunciation of the recognizable language is not close to the basic pattern, that person's speech may not be recognized.

[0006] For example, in a speech recognition device that recognizes English, the speech model is generated using the utterances of native English speakers as the basic pattern. When Japanese people who are not native English speakers use the device, some of them pronounce English much as a native English speaker would, while others pronounce it in a manner close to Japanese (katakana loanword pronunciation). The speech of those whose English pronunciation is close to Japanese may not be recognized.

[0007] The problem described above arises, for example, in a cockpit. People with various pronunciations are present in a cockpit: some pronounce English much as a native English speaker would, and others pronounce it in a manner close to Japanese (katakana loanword pronunciation).

[0008] The problem of individual differences described above can be solved by using an English speech recognition method for people whose English pronunciation is close to that of native English speakers, and a Japanese speech recognition method for people whose pronunciation is close to Japanese (katakana loanword pronunciation). However, because the switching is not performed automatically, it is necessary to determine in advance which group each person's speech belongs to and to switch methods each time.

[0009] Furthermore, when a Japanese person speaks English, even the same person does not always pronounce words in the same way; an utterance may sound English-like at one time and Japanese-like at another. Therefore, even if the setting is switched for each person who uses the device, some portions of the speech may still go unrecognized.

[0010] The present invention has been made in view of the problems of the conventional technology described above, and its object is to provide a speech recognition device that can always recognize speech, even when it is used by a person whose native language is other than the recognizable language.

[0011]

[Means for solving the problems] To achieve the above object, the present invention is a speech recognition device that recognizes input speech as a first language and as a second language and outputs the language with the highest recognition likelihood as the recognition result, the device comprising: a speech input unit that receives speech and encodes the input speech; a first speech recognition processing unit that obtains a recognition likelihood as the first language for the data encoded by the speech input unit; a second speech recognition processing unit that obtains a recognition likelihood as the second language for the data encoded by the speech input unit; a recognition likelihood normalization processing unit that normalizes the recognition likelihoods obtained by the first and second speech recognition processing units; a recognition likelihood comparison processing unit that compares the recognition likelihoods normalized by the recognition likelihood normalization processing unit; and a recognition result output unit that outputs, as the recognition result, the language found by the comparison in the recognition likelihood comparison processing unit to have the highest recognition likelihood.

[0012] Further, each of the first and second speech recognition processing units stores in advance a speech model to be compared with the input data in order to obtain the recognition likelihood.

[0013] Further, the first language is Japanese and the second language is English.

[0014] (Operation) In the present invention configured as described above, when speech is input from outside via the speech input unit, the first and second speech recognition processing units obtain the recognition likelihoods of the input speech data as the first language and as the second language, respectively. The recognition likelihood comparison processing unit then compares the recognition likelihoods obtained by the first and second speech recognition processing units, and the language with the highest recognition likelihood is output as the recognition result via the recognition result output unit 6.

[0015] In this way, recognition likelihoods as each of a plurality of languages are obtained for the input speech, and the language with the highest recognition likelihood is output as the recognition result. Therefore, even when the second language is spoken by a native speaker of the first language, there is no risk that differences in pronunciation will leave portions of the speech unrecognized.

[0016] When English speech uttered by a Japanese person is input, and the first language is Japanese and the second language is English, there is no risk that the speech will fail to be recognized, whether the input pronunciation is close to that of a native English speaker or close to Japanese (katakana loanword pronunciation).

[0017]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings.

[0018] FIG. 1 is a block diagram showing one embodiment of the speech recognition device of the present invention: a speech recognition device for Japanese users whose recognition target is English.

[0019] As shown in FIG. 1, this embodiment comprises: a speech input unit 1 that receives speech and encodes the input speech; a Japanese speech recognition processing unit 2 that obtains the recognition likelihood, as Japanese, of the data encoded by the speech input unit 1; an English speech recognition processing unit 3 that obtains the recognition likelihood, as English, of the data encoded by the speech input unit 1; a recognition likelihood normalization processing unit 4 that normalizes the recognition likelihoods obtained by the Japanese speech recognition processing unit 2 and the English speech recognition processing unit 3; a recognition likelihood comparison processing unit 5 that compares the Japanese and English recognition likelihoods normalized by the recognition likelihood normalization processing unit 4; and a recognition result output unit 6 that outputs, as the recognition result, the language found by the comparison in the recognition likelihood comparison processing unit 5 to have the highest recognition likelihood.

[0020] The operation of the speech recognition device configured as described above is explained below.

[0021] First, when a speech signal (input speech) is input from outside to the speech input unit 1 via a microphone (not shown), the speech input unit 1 converts the input speech signal into a digital signal and outputs it as speech code data to the Japanese speech recognition processing unit 2 and the English speech recognition processing unit 3.
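The patent says only that the speech signal is converted into a digital signal and output as speech code data; it does not name a coding scheme. As a minimal sketch, assuming 16-bit linear PCM (an assumption made purely for illustration), the conversion could look like the following. The function name `encode_pcm16` is hypothetical.

```python
def encode_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit integer codes.

    Assumption: 16-bit linear PCM. The patent does not specify the encoding
    used by the speech input unit 1.
    """
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))           # clip out-of-range input
        codes.append(int(round(x * 32767)))  # scale to the 16-bit range
    return codes


# The same code sequence would be handed to both recognizers.
speech_code = encode_pcm16([0.0, 0.25, -1.0, 2.0])
print(speech_code)
```

Because both recognition units receive the identical code sequence, any difference in their results comes from the speech models rather than from the input.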

[0022] Here, the Japanese speech recognition processing unit 2 stores in advance speech models in half-syllable units, and the English speech recognition processing unit 3 stores in advance speech models in diphone units.

[0023] When the speech code data output from the speech input unit 1 is input to the Japanese speech recognition processing unit 2, the Japanese speech recognition processing unit 2 uses half-syllable HMMs to match the speech code data against the stored half-syllable speech models, and calculates the word with the maximum similarity and the Japanese recognition likelihood at that point.

[0024] Likewise, when the speech code data output from the speech input unit 1 is input to the English speech recognition processing unit 3, the English speech recognition processing unit 3 uses diphone HMMs to match the speech code data against the stored diphone speech models, and calculates the word with the maximum similarity and the English recognition likelihood at that point.
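Each recognition unit matches the speech code data against its stored HMMs and reports the best-matching word together with its likelihood, but the patent does not spell out the scoring computation. The sketch below is one possible reading, assuming discrete observation symbols and a single small HMM per word (both simplifications; the device described builds words from half-syllable or diphone unit models). It scores an observation sequence with the forward algorithm in log space. All names and the toy models are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (forward algorithm)."""
    n = len(start)
    alpha = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + _log(trans[p][s]) for p in range(n)])
            + _log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def best_word(obs, word_models):
    """Return (word, log-likelihood) for the best-scoring word model."""
    scores = {w: forward_log_likelihood(obs, *m) for w, m in word_models.items()}
    word = max(scores, key=scores.get)
    return word, scores[word]


# Toy one-state models given as (start, transition, emission) tables.
toy_models = {
    "a": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "b": ([1.0], [[1.0]], [[0.2, 0.8]]),
}
print(best_word([0, 0, 1], toy_models))
```

Working in log space avoids numerical underflow on long observation sequences, which is why the returned likelihood is a negative number rather than a probability.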

[0025] Here, the Japanese recognition likelihood calculated by the Japanese speech recognition processing unit 2 and the English recognition likelihood calculated by the English speech recognition processing unit 3 are measured on different scales, so the two recognition likelihoods cannot be compared as they are.

[0026] Therefore, once the Japanese speech recognition processing unit 2 and the English speech recognition processing unit 3 have calculated the Japanese recognition likelihood and the English recognition likelihood, respectively, both likelihoods are input to the recognition likelihood normalization processing unit 4, which normalizes them so that they can be compared with each other.
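The patent does not state how the normalization is carried out. One common way to put scores from two differently built recognizers on a shared scale, shown here purely as an assumed illustration, is to divide each log-likelihood by the number of frames and then standardize it with per-recognizer statistics gathered beforehand. The `Calibration` values below are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical per-recognizer statistics measured on calibration data."""
    mean: float  # mean per-frame log-likelihood
    std: float   # standard deviation of per-frame log-likelihood


def normalize(log_likelihood, num_frames, cal):
    """Map a raw log-likelihood to a z-score comparable across recognizers."""
    per_frame = log_likelihood / num_frames
    return (per_frame - cal.mean) / cal.std


# Invented numbers: the raw English score is higher, yet after normalization
# the Japanese hypothesis is the stronger one.
japanese = normalize(-120.0, 40, Calibration(mean=-3.5, std=0.5))
english = normalize(-50.0, 40, Calibration(mean=-1.0, std=0.25))
print(japanese, english)
```

The example shows why the raw values cannot be compared directly: ranking by raw score and ranking by normalized score can disagree.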

[0027] Once the recognition likelihood normalization processing unit 4 has normalized the Japanese and English recognition likelihoods, the recognition likelihood comparison processing unit 5 compares the two normalized recognition likelihoods.

[0028] The recognition result output unit 6 then outputs, as the recognition result, the word with the higher recognition likelihood, based on the result of the comparison in the recognition likelihood comparison processing unit 5.
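The flow from recognition through normalization, comparison, and output can be sketched end to end as follows. This is an illustrative outline only: the recognizers are stubs returning fixed values, the normalizers are arbitrary linear rescalings, and every name (`Hypothesis`, `recognize`, the stub functions) is hypothetical rather than taken from the patent.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Hypothesis(NamedTuple):
    language: str
    word: str
    likelihood: float  # raw score on the recognizer's own scale


Recognizer = Callable[[Sequence[int]], Hypothesis]
Normalizer = Callable[[float], float]


def recognize(
    speech_code: Sequence[int],
    branches: Sequence[Tuple[Recognizer, Normalizer]],
) -> Tuple[str, str, float]:
    """Run every branch and return the hypothesis with the top normalized score."""
    scored = []
    for recognizer, normalizer in branches:
        hyp = recognizer(speech_code)                    # units 2 and 3
        scored.append((normalizer(hyp.likelihood), hyp)) # unit 4
    best_score, best = max(scored, key=lambda pair: pair[0])  # unit 5
    return best.language, best.word, best_score               # unit 6


def japanese_stub(speech_code):
    # Stand-in for the Japanese recognizer; values are invented.
    return Hypothesis("ja", "chekku", -120.0)


def english_stub(speech_code):
    # Stand-in for the English recognizer; values are invented.
    return Hypothesis("en", "check", -50.0)


branches = [
    (japanese_stub, lambda s: (s + 140.0) / 20.0),
    (english_stub, lambda s: (s + 40.0) / 20.0),
]
result = recognize([0, 1, 2], branches)
print(result)
```

Pairing each recognizer with its own normalizer keeps the comparison step independent of how any individual recognizer scores its hypotheses.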

[0029] In the embodiment described above, half-syllable HMMs are used in the Japanese speech recognition processing unit 2 and diphone HMMs in the English speech recognition processing unit 3 as the speech models stored in advance. However, the present invention is not limited to this, and each recognition may be performed using other models.

[0030] Also, although this embodiment describes the combined use of Japanese speech recognition processing and English speech recognition processing, the present invention is not limited to this; depending on the language to be recognized, speech recognition processing for other combinations of languages may be used together.

[0031]

[Effects of the invention] As described above, in the present invention, when speech is input from outside via the speech input unit, the first and second speech recognition processing units obtain the recognition likelihoods of the input speech data as the first language and as the second language, respectively; the recognition likelihood comparison processing unit then compares the recognition likelihoods obtained by the first and second speech recognition processing units; and the language with the highest recognition likelihood is output as the recognition result via the recognition result output unit. With this configuration, a high recognition rate can be obtained without switching the recognition method according to the input speech.

[0032] For example, when English speech uttered by a Japanese person is input, and the first language is Japanese and the second language is English, a high recognition rate can be obtained whether the input pronunciation is close to that of a native English speaker or close to Japanese (katakana loanword pronunciation).

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the speech recognition device of the present invention.

[Explanation of symbols]

1 speech input unit
2 Japanese speech recognition processing unit
3 English speech recognition processing unit
4 recognition likelihood normalization processing unit
5 recognition likelihood comparison processing unit
6 recognition result output unit

Claims

[Claims]

1. A speech recognition device that recognizes input speech as a first language and as a second language and outputs the language with the highest recognition likelihood as the recognition result, comprising: a speech input unit that receives speech and encodes the input speech; a first speech recognition processing unit that obtains a recognition likelihood as the first language for the data encoded by the speech input unit; a second speech recognition processing unit that obtains a recognition likelihood as the second language for the data encoded by the speech input unit; a recognition likelihood normalization processing unit that normalizes the recognition likelihoods obtained by the first and second speech recognition processing units; a recognition likelihood comparison processing unit that compares the recognition likelihoods normalized by the recognition likelihood normalization processing unit; and a recognition result output unit that outputs, as the recognition result, the language found by the comparison in the recognition likelihood comparison processing unit to have the highest recognition likelihood.

2. The speech recognition device according to claim 1, wherein each of the first and second speech recognition processing units stores in advance a speech model to be compared with the input data in order to obtain the recognition likelihood.

3. The speech recognition device according to claim 1, wherein the first language is Japanese and the second language is English.