JPH04156600A

JPH04156600A - Voice recognizing device

Info

Publication number: JPH04156600A
Application number: JP2282328A
Authority: JP
Inventors: Mitsugi Matsushita; 貢松下
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-10-19
Filing date: 1990-10-19
Publication date: 1992-05-29

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a speech recognition device.

Prior Art

In a conventional speech recognition device, an acoustic collecting section collects acoustic information including speech and sends the resulting signal to a speech recognition section, which performs the recognition. With such a device, recognizing speech uttered in noise is extremely difficult. One reason is that noise is added to the speech input to the device. Another is that the utterance itself is deformed by the Lombard effect, described by Lombard in 1911. The Lombard effect arises because a speaker in a noisy environment has difficulty hearing his or her own voice and therefore tries to speak louder and more clearly.

Problems to Be Solved by the Invention

In speech recognition under noise, the problem of noise being added to the speech has received most of the attention, and well-known countermeasures such as the adaptive noise canceling method have been proposed. For the Lombard effect, methods that correct the spectrum have been proposed, such as the one described in Proceedings of the Acoustical Society of Japan ■, 1-3-9, pp. 17-18, March 1990, but methods that make the Lombard effect less likely to occur have received little attention. Japanese Patent Application Laid-open No. 61-221928 proposes having the speaker hear noise at all times so that the speaker always speaks loudly, but it likewise does not consider how to make the Lombard effect less likely to occur.

Means for Solving the Problems

To solve these problems, the invention of claim 1 is a speech recognition device in which an acoustic collecting section collects acoustic information including speech and sends the signal of the collected acoustic information to a speech recognition section to perform speech recognition, wherein the acoustic collecting section is provided with high-efficiency speech collecting means that collects speech with a good signal-to-noise (SN) ratio even when the surrounding environment is highly noisy, and a speech feedback section is provided that feeds the speech input to that acoustic collecting section back to the speaker.

In the invention of claim 2, a noise estimating section that estimates the surrounding noise is further provided, together with a noise removing section that uses the noise estimated by the noise estimating section to remove noise from the input speech.

Operation

In the invention of claim 1, the speech input through the acoustic collecting section used for speech recognition is fed back to the speaker, which limits the drop in recognition rate caused by the Lombard effect. This reduces the speech variation caused by the Lombard effect under high noise and makes it possible to provide a speech recognition device that can recognize speech even under high noise.

The invention of claim 2 additionally combines the speech recognition device of claim 1 with a noise removal device, so noise components other than speech can be reduced further. This makes it possible to realize a speech recognition device that can be used even under high noise.

Embodiments

First, an embodiment of the invention of claim 1 is described with reference to Fig. 1. As Fig. 1 shows, the speech recognition device collects acoustic information including speech with an acoustic collecting section 1 and sends the speech signal a of the collected acoustic information to a speech recognition section 2, which performs speech recognition. The acoustic collecting section 1 is provided with high-efficiency speech collecting means (not shown), which collects speech with a good SN ratio even when the surrounding environment is highly noisy.

A speech feedback section 3 is connected to the acoustic collecting section 1 equipped with this high-efficiency speech collecting means. The speech feedback section 3 feeds the speech input to the acoustic collecting section 1 back to the speaker.

The acoustic collecting section 1 can be, for example, a microphone that picks up the speech uttered by the speaker. A close-talking microphone or a directional microphone is preferable.

A close-talking microphone sits close to the speaker's mouth, so it can pick up speech with a good SN ratio. A directional microphone can do the same when its direction of sensitivity is aimed at the speaker's mouth. Note that if speech is input with any other kind of microphone, the SN ratio of the collected speech deteriorates. A large amount of ambient noise then gets mixed into the speech that is fed back, and the speaker may find it hard to hear his or her own voice.
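The SN ratio that this passage relies on can be made concrete with a short calculation. The following is a minimal sketch, not part of the patent: the function names `mean_power` and `snr_db` are hypothetical, and it assumes the speech and noise components can be measured separately as sample sequences.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_power: float, noise_power: float) -> float:
    """SN ratio in decibels, given speech and noise powers."""
    return 10.0 * math.log10(speech_power / noise_power)
```

Under these assumptions, if the ambient noise power at the microphone stays the same while the speech amplitude is four times larger (sixteen times the power), the SN ratio rises by about 12 dB, which illustrates why a microphone that favors the speaker's mouth gives a cleaner signal to both the recognizer and the feedback path.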

The speech feedback section 3 can be, for example, headphones worn by the speaker or a loudspeaker placed near the speaker. Either feeds the collected speech back to the speaker and makes the speaker's own voice easier to hear.

The speech recognition section 2 can be implemented with, for example, a known speech recognition system.

In this configuration, the acoustic collecting section 1 collects acoustic information that contains both the speech and the noise of the surrounding environment. Because the high-efficiency speech collecting means provided in the acoustic collecting section 1 collects speech with a good SN ratio even when the surrounding environment is highly noisy, the speech signal a collected in this way is sent to the speech recognition section 2, where speech recognition is performed.

At the same time, the speech signal a collected with a good SN ratio is fed back to the speaker through the speech feedback section 3. This makes it easier for the speaker to hear his or her own voice and limits the drop in recognition rate caused by the Lombard effect.

As described above, the speech signal a obtained by the high-efficiency speech collecting means of the acoustic collecting section 1 is fed back to the speaker by the speech feedback section 3. This reduces the speech variation caused by the Lombard effect under high noise and, as a result, makes it possible to provide a speech recognition device that can recognize speech even under high noise.
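The signal flow of this first embodiment can be sketched in a few lines. This is only an illustration under stated assumptions: `run_device`, `recognize`, and `play_to_speaker` are hypothetical names, the recognizer and the headphone or loudspeaker output are passed in as plain callables, and real-time concerns such as latency are ignored.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def run_device(
    collected: Sequence[float],
    recognize: Callable[[Signal], str],
    play_to_speaker: Callable[[Signal], None],
) -> str:
    """Route speech signal a to both the feedback path and the recognizer."""
    signal_a = list(collected)      # output of acoustic collecting section 1
    play_to_speaker(signal_a)       # speech feedback section 3 (assumed callable)
    return recognize(signal_a)      # speech recognition section 2 (assumed callable)
```

The point of the sketch is that the same signal a feeds both paths, so the quality of what the speaker hears depends on the same microphone that serves the recognizer.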

Next, an embodiment of the invention of claim 2 is described with reference to Fig. 2. This embodiment adds the elements described below to the configuration of the embodiment of claim 1 (see Fig. 1).

In a noisy environment, the speech recognition device of the embodiment of claim 1 still has a certain amount of noise added to the speech signal, and in some cases the recognition rate drops. In this embodiment, therefore, a noise estimating section 4 that estimates the surrounding noise is provided ahead of the speech recognition section 2, together with a noise removing section 5 that uses the noise estimated by the noise estimating section 4 to remove noise from the input speech.

In this configuration, the speech signal a collected by the acoustic collecting section 1 is input to the noise removing section 5. Meanwhile, the noise estimating section 4 estimates the noise of the surrounding environment from the input signal of the acoustic collecting section 1 during intervals in which no speech signal a is present, and thereby obtains an estimated noise amount. For example, it computes the average of the spectral pattern of the noise input over a fixed period of time. The estimated noise amount is sent to the noise removing section 5, which removes it from the input speech signal a. The noise-removed speech signal a is then sent to the speech recognition section 2, where speech recognition is performed.
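The estimate-then-remove procedure described here can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each frame is already a spectral pattern (a list of per-band magnitudes), that the caller has identified which frames contain no speech, and that removal means per-band subtraction with a floor at zero. The function names and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence

Spectrum = List[float]


def estimate_noise(noise_frames: Sequence[Sequence[float]]) -> Spectrum:
    """Average the spectral patterns of frames known to contain no speech."""
    if not noise_frames:
        raise ValueError("at least one noise-only frame is required")
    n_bands = len(noise_frames[0])
    return [
        sum(frame[band] for frame in noise_frames) / len(noise_frames)
        for band in range(n_bands)
    ]


def remove_noise(
    frame: Sequence[float], noise: Sequence[float], floor: float = 0.0
) -> Spectrum:
    """Subtract the estimated noise per band, never going below the floor."""
    return [max(x - n, floor) for x, n in zip(frame, noise)]
```

In use, `estimate_noise` would play the role of noise estimating section 4 and `remove_noise` that of noise removing section 5, with the cleaned spectrum passed on to the recognizer. The floor is an added assumption that keeps band magnitudes non-negative when the estimate exceeds the observed value.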

As described above, the speech recognition device is combined with a noise removal device comprising the noise estimating section 4 and the noise removing section 5, so noise components other than speech can be reduced further. This makes it possible to realize a speech recognition device that can be used even under high noise.

Next, a modification of the invention of claim 2 is described with reference to Fig. 3. This modification does not use the noise estimating section 4 and the noise removing section 5. Instead, it uses a speech input microphone 6 as the acoustic collecting section for speech input, separately provides a noise input microphone 7 as an acoustic collecting section for noise input, and connects an adaptive filter 8 to the noise input microphone 7, following the adaptive noise canceling method.

In this case, the noise signal obtained from the noise input microphone 7 is passed through the adaptive filter 8 and applied to the speech signal a obtained from the speech input microphone 6. This yields a speech signal from which the noise component has been adaptively removed, and gives the same effect as the embodiment described above (see Fig. 2).
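The two-microphone arrangement can be illustrated with a small adaptive filter. The patent does not say which adaptation rule the adaptive filter 8 uses, so this sketch assumes the common least-mean-squares (LMS) update. The function name `lms_noise_cancel` and the parameters `num_taps` and `step` are hypothetical.

```python
from typing import List, Sequence


def lms_noise_cancel(
    primary: Sequence[float],
    reference: Sequence[float],
    num_taps: int = 4,
    step: float = 0.05,
) -> List[float]:
    """Remove the adaptively filtered reference noise from the primary signal.

    primary   -- samples from the speech input microphone (speech plus noise)
    reference -- samples from the noise input microphone (noise only)
    """
    weights = [0.0] * num_taps    # coefficients of the adaptive filter
    history = [0.0] * num_taps    # recent reference samples, newest first
    cleaned: List[float] = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate    # this residual is the cleaned output
        cleaned.append(error)
        # LMS update (assumed rule): move weights along error * input.
        weights = [w + step * error * h for w, h in zip(weights, history)]
    return cleaned
```

Because the filter adapts to minimize the output power and the reference contains only noise, the weights converge toward the path from the noise microphone to the speech microphone, leaving the speech largely untouched. The step size trades convergence speed against stability and would need tuning for real signals.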

Effects of the Invention

The invention of claim 1 is a speech recognition device in which an acoustic collecting section collects acoustic information including speech and sends the signal of the collected acoustic information to a speech recognition section to perform speech recognition, wherein the acoustic collecting section is provided with high-efficiency speech collecting means that collects speech with a good SN ratio even when the surrounding environment is highly noisy, and a speech feedback section is provided that feeds the speech input to that acoustic collecting section back to the speaker. Because the speech input through the acoustic collecting section used for speech recognition is fed back to the speaker, the drop in recognition rate caused by the Lombard effect is limited. This reduces the speech variation caused by the Lombard effect under high noise and makes it possible to provide a speech recognition device that can recognize speech even under high noise.

The invention of claim 2 provides a noise estimating section that estimates the surrounding noise and a noise removing section that uses the noise estimated by the noise estimating section to remove noise from the input speech. Because it thus combines the speech recognition device of claim 1 with a noise removal device, noise components other than speech can be reduced further, which makes it possible to realize a speech recognition device that can be used even under high noise.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an embodiment of the invention of claim 1, Fig. 2 is a block diagram showing an embodiment of the invention of claim 2, and Fig. 3 is a block diagram showing a modification of the invention of claim 2.

1: acoustic collecting section, 2: speech recognition section, 3: speech feedback section, 4: noise estimating section, 5: noise removing section

Claims

[Claims]

1. A speech recognition device that performs speech recognition by collecting acoustic information including speech with an acoustic collecting section and sending a signal of the collected acoustic information to a speech recognition section, characterized in that the acoustic collecting section is provided with high-efficiency speech collecting means that collects speech with a good SN ratio even when the surrounding environment is highly noisy, and in that a speech feedback section is provided that feeds the speech input to the acoustic collecting section equipped with the high-efficiency speech collecting means back to the speaker.

2. The speech recognition device according to claim 1, further comprising a noise estimating section that estimates surrounding noise, and a noise removing section that removes noise from the input speech using the noise estimated by the noise estimating section.