JP2003099100A

JP2003099100A - Speech recognition apparatus and method

Info

Publication number: JP2003099100A
Application number: JP2001289549A
Authority: JP
Inventors: Nobuyuki Kunieda; 伸行國枝; Akira Ishida; 明石田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-09-21
Filing date: 2001-09-21
Publication date: 2003-04-04

Abstract

(57) [Abstract] [Problem] To provide a speech recognition apparatus and method that can improve recognition performance even when a plurality of acoustic signal output means are provided. [Solution] The speech recognition apparatus of the invention comprises: acoustic signal generating means 10 for generating an acoustic signal; a plurality of acoustic signal output means 21 and 22 for outputting the acoustic signal as sound waves; acoustic signal input means 30 for inputting speech; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the plurality of acoustic signal output means 21 and 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition apparatus and method that, when speech recognition is performed in an environment where acoustic signals are output from loudspeakers or the like, reduce the signals output from the loudspeakers and thereby recognize speech with high performance.

[0002]

[Prior Art] A conventional speech recognition apparatus using an echo canceller is described in, for example, Japanese Patent Laid-Open No. 5-323993.

[0003] In this conventional speech recognition apparatus, when a voice response output by the spoken dialogue system itself enters the speech input microphone, processing is performed to cancel that voice response, so that speech recognition remains possible even while a voice response is being output.

[0004] In the apparatus described in, for example, Japanese Patent Laid-Open No. 7-175498, when speech is recognized over a telephone line, an echo canceller reduces the guidance voice played to the user, which keeps the guidance voice from mixing into the speech to be recognized.

[0005] Furthermore, the apparatus described in Japanese Patent Laid-Open No. 8-107375 is intended to record speech while simultaneously reproducing an acoustic signal, and is configured to cancel the acoustic signal reproduced by the apparatus itself ahead of the speech recording unit.

[0006]

[Problem to Be Solved by the Invention] However, a conventional speech recognition apparatus having echo canceller means cannot handle the case where there are a plurality of means for outputting acoustic signals.

[0007] The present invention has been made to solve this conventional problem, and its object is to provide a speech recognition apparatus that can improve recognition performance even when a plurality of acoustic signal output means are provided.

[0008]

[Means for Solving the Problem] The speech recognition apparatus of the present invention comprises: acoustic signal generating means for generating an acoustic signal; a plurality of acoustic signal output means for outputting the acoustic signal as sound waves; acoustic signal input means for inputting speech; an interfering sound removing unit that receives the output signal of the acoustic signal input means and the output signal of the acoustic signal generating means and reduces the output signal components of the plurality of acoustic signal output means mixed into the output signal of the acoustic signal input means; and speech recognition means that receives the output signal of the interfering sound removing unit and recognizes the speech.

[0009] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0010] The acoustic signal generating means is configured to receive as input the output signal of a predetermined external acoustic signal generating means and to output acoustic signals to the plurality of acoustic signal output means.

[0011] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0012] The interfering sound removing unit has a plurality of stages of echo canceller means that reduce the output signal components of the acoustic signal output means mixed into the output signal of the acoustic signal input means. The first-stage echo canceller means receives the output signal of the acoustic signal input means, and each subsequent stage receives the output signal of the preceding stage. Each echo canceller means also receives one of the output signals of the acoustic signal generating means, and reduces and outputs the corresponding output signal component of the acoustic signal output means mixed into the output signal of the acoustic signal input means.

[0013] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be reliably removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.
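The multi-stage arrangement described in [0012] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text above does not name an adaptation algorithm, so normalized LMS (NLMS) is assumed here, and the names `NlmsStage`, `cascade`, and `demo`, the tap count, the step size, and the echo paths `h1` and `h2` are all hypothetical.

```python
import random


class NlmsStage:
    """One echo-canceller stage driven by a single reference signal (NLMS assumed)."""

    def __init__(self, taps, mu=0.05, eps=1e-6):
        self.w = [0.0] * taps    # adaptive filter coefficients
        self.buf = [0.0] * taps  # recent reference samples, newest first
        self.mu = mu             # adaptation step size
        self.eps = eps           # guards against division by zero

    def process(self, ref_sample, in_sample):
        """Subtract the estimated echo of the reference from in_sample, then adapt."""
        self.buf = [ref_sample] + self.buf[:-1]
        echo_est = sum(w * x for w, x in zip(self.w, self.buf))
        err = in_sample - echo_est
        norm = sum(x * x for x in self.buf) + self.eps
        step = self.mu * err / norm
        self.w = [w + step * x for w, x in zip(self.w, self.buf)]
        return err


def cascade(stages, refs, mic):
    """First stage takes the microphone signal; each later stage takes the
    previous stage's output. Stage k uses reference refs[k]."""
    out = []
    for n, sample in enumerate(mic):
        for stage, ref in zip(stages, refs):
            sample = stage.process(ref[n], sample)
        out.append(sample)
    return out


def demo(seed=0, n=4000):
    """Two independent noise references, made-up echo paths, no near-end speech.
    Returns (microphone power, residual power) over the last 500 samples."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-1, 1) for _ in range(n)]
    x2 = [rng.uniform(-1, 1) for _ in range(n)]
    h1 = [0.6, 0.3, -0.1]  # hypothetical path: loudspeaker 1 to microphone
    h2 = [0.4, -0.2, 0.1]  # hypothetical path: loudspeaker 2 to microphone

    def echo(h, x, i):
        return sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)

    mic = [echo(h1, x1, i) + echo(h2, x2, i) for i in range(n)]
    stages = [NlmsStage(8), NlmsStage(8)]
    out = cascade(stages, [x1, x2], mic)

    def power(s):
        return sum(v * v for v in s[n - 500:]) / 500

    return power(mic), power(out)
```

In this sketch both stages adapt continuously, so each stage sees the other loudspeaker's echo as noise while it learns; a small step size is used to limit the resulting misadjustment.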

[0014] The interfering sound removing unit has: the plurality of stages of echo canceller means; a plurality of signal level detecting means that detect the levels of the output signals of the acoustic signal generating means input to the respective stages of echo canceller means; and learning timing determining means that receives the output signals of the plurality of signal level detecting means, determines from those signal levels which of the plurality of stages of echo canceller means should undergo processing to improve its echo cancelling performance, and determines the learning timing for improving echo canceller performance.

[0015] With this configuration, when speech uttered by the speaker is recognized, the performance of the interfering sound removal processing can be improved, and the mixed-in acoustic signal can be reliably removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.
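As an illustration of what a signal level detecting means might compute, the sketch below measures a frame-by-frame RMS level in decibels for one reference signal. The text does not specify how the level is measured, so the frame length, the dB scale relative to a full-scale amplitude of 1.0, the floor value, and the function name `frame_levels_db` are assumptions.

```python
import math


def frame_levels_db(signal, frame_len=160, floor_db=-120.0):
    """Return one RMS level in dB (relative to amplitude 1.0) per complete frame."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_sq = sum(v * v for v in frame) / frame_len
        if mean_sq <= 0.0:
            levels.append(floor_db)  # silent frame: report the floor
        else:
            levels.append(max(floor_db, 10.0 * math.log10(mean_sq)))
    return levels
```

One such detector would run per reference signal, and the resulting levels would be handed to the learning timing determining means.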

[0016] The interfering sound removing unit may also have: a plurality of filter coefficient calculating means, each of which receives one of the plurality of output signals from the acoustic signal generating means together with the output signal of the acoustic signal input means and calculates the filter coefficients of a filter that attenuates the output signal of one of the acoustic signal output means mixed into the output signal of the acoustic signal input means; a plurality of signal level detecting means that detect the levels of the output signals from the acoustic signal generating means input to the respective filter coefficient calculating means; learning timing determining means that receives the output signals of the plurality of signal level detecting means, determines from those signal levels which of the filter coefficient calculating means should learn its filter coefficients, and determines the learning timing; filter coefficient integrating means that integrates the filter coefficients calculated by the plurality of filter coefficient calculating means; and filtering means that receives the outputs of the filter coefficient integrating means and the acoustic signal input means and uses the filter coefficients obtained by the filter coefficient integrating means to reduce the acoustic signal output components.

[0017] With this configuration, when speech uttered by the speaker is recognized, the performance of the interfering sound removal processing can be improved and the computation can be made efficient. As a result, the mixed-in acoustic signal can be reliably removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0018] The learning timing determining means causes learning to be performed when all but one of the output signals of the plurality of signal level detecting means are below a threshold and the one remaining output signal is at or above a separately set threshold. The learning is performed by the echo canceller means or filter coefficient calculating means that receives the acoustic signal input to that signal level detecting means.

[0019] With this configuration, when speech uttered by the speaker is recognized, the performance of the interfering sound removal processing can be improved, and the mixed-in acoustic signal can be reliably removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.
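The decision rule in [0018] can be written as a short function. This is a sketch under stated assumptions: the function name `select_stage_to_train`, the parameter names, and the use of dB values in the example are hypothetical; only the rule itself (all other levels below one threshold, the remaining level at or above a second threshold) comes from the text.

```python
def select_stage_to_train(levels, quiet_threshold, active_threshold):
    """Return the index of the one canceller (or coefficient calculator) that
    should learn now, or None if the condition in [0018] is not met.

    levels[k] is the detected level of the reference signal feeding stage k.
    If more than one index qualifies (possible only when active_threshold is
    below quiet_threshold), the lowest index is returned.
    """
    for k, level in enumerate(levels):
        others_quiet = all(
            other < quiet_threshold
            for j, other in enumerate(levels)
            if j != k
        )
        if others_quiet and level >= active_threshold:
            return k
    return None
```

For example, with levels of -10 dB and -70 dB, a quiet threshold of -50 dB, and an active threshold of -30 dB, only the first stage would learn; with both references at -10 dB, no stage would learn.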

[0020] The apparatus further has noise suppressing means that receives the output signal of the acoustic signal input means and suppresses noise. The interfering sound removing unit receives the output signal of the noise suppressing means and the output signal of the acoustic signal generating means, and reduces the output signal components of the acoustic signal output means mixed into the output signal of the acoustic signal input means.

[0021] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, and performing noise suppression further improves speech recognition performance.

[0022] The apparatus further has noise suppressing means that receives the output signal of the interfering sound removing unit and suppresses noise, and the speech recognition means receives the output signal of the noise suppressing means and recognizes the speech uttered by the speaker.

[0023] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, and performing noise suppression further improves speech recognition performance.

[0024] The apparatus further has first noise suppressing means that receives the output signal of the acoustic signal input means and suppresses noise, and second noise suppressing means that receives the output signal of the interfering sound removing unit and suppresses noise. The interfering sound removing unit receives the output signal of the first noise suppressing means and the output signal of the acoustic signal generating means and reduces the output signal components of the acoustic signal output means mixed into the output signal of the first noise suppressing means, and the speech recognition means receives the output signal of the second noise suppressing means and recognizes the speech uttered by the speaker.

[0025] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, and performing noise suppression further improves speech recognition performance.
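Paragraphs [0020] to [0025] describe three placements of noise suppression: before the interfering sound removing unit, after it, or both. The sketch below shows only that ordering. The function name `build_pipeline` and its parameters are hypothetical, and the noise suppression and interference removal stages are passed in as plain callables because this section does not say how they work.

```python
def build_pipeline(remove_interference, pre_suppress=None, post_suppress=None):
    """Compose the recognition front end.

    pre_suppress  -- optional noise suppressor applied to the microphone signal
                     before interference removal ([0020], first means of [0024])
    post_suppress -- optional noise suppressor applied after interference
                     removal ([0022], second means of [0024])
    """
    def run(mic, refs):
        signal = pre_suppress(mic) if pre_suppress else mic
        signal = remove_interference(signal, refs)
        return post_suppress(signal) if post_suppress else signal

    return run
```

The returned signal is what would be handed to the speech recognition means.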

[0026] The apparatus further has acoustic signal output control means that controls the output of the acoustic signal generating means. The acoustic signal output control means receives the output signal of the acoustic signal generating means and controls the signals output to the plurality of acoustic signal output means and to the interfering sound removing unit.

[0027] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0028] The apparatus further has transfer characteristic measurement signal generating means that generates a signal suited to measuring transfer characteristics, and the acoustic signal output control means performs control so that, as needed, either the output signal of the acoustic signal generating means or the output signal of the transfer characteristic measurement signal generating means is output.

[0029] With this configuration, when speech uttered by the speaker is recognized, the performance of the interfering sound removal processing can be further improved, and the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0030] The signal output from the transfer characteristic measurement signal generating means is either white noise or an M-sequence signal.

[0031] With this configuration, when speech uttered by the speaker is recognized, the performance of the interfering sound removal processing can be further improved, and the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.
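For readers unfamiliar with the M-sequence mentioned in [0030], the sketch below generates one period of a maximal-length sequence with a linear feedback shift register and checks the property that makes it useful as a measurement signal: its circular autocorrelation is a single peak with a flat floor. The function names, the degree-4 default (primitive polynomial x^4 + x + 1), and the seed are illustrative choices, not values from the patent.

```python
def m_sequence(degree=4, tap=1):
    """One period of the bit sequence b[n + degree] = b[n + tap] XOR b[n].

    The period is maximal (2**degree - 1) only when x**degree + x**tap + 1 is
    a primitive polynomial, as it is for the defaults degree=4, tap=1.
    """
    bits = [0] * (degree - 1) + [1]  # any non-zero seed works
    period = (1 << degree) - 1
    while len(bits) < period:
        n = len(bits) - degree
        bits.append(bits[n + tap] ^ bits[n])
    return bits


def circular_autocorr(bits, lag):
    """Circular autocorrelation of the sequence mapped to +1 / -1."""
    s = [1 - 2 * b for b in bits]  # bit 0 -> +1, bit 1 -> -1
    n = len(s)
    return sum(s[i] * s[(i + lag) % n] for i in range(n))
```

For the default sequence of length 15, the autocorrelation is 15 at lag 0 and -1 at every other lag, which is close to the impulse-like autocorrelation of white noise.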

[0032] The apparatus further has noise suppressing means that suppresses noise on the basis of the output signal of the acoustic signal input means and information about the signal output from the acoustic signal output control means, and the interfering sound removing unit receives and processes the output signal of the noise suppressing means.

[0033] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, and performing noise suppression further improves speech recognition performance.

[0034] The apparatus further has noise suppressing means that suppresses noise on the basis of the output signal of the interfering sound removing unit and information about the signal output from the acoustic signal output control means, and the speech recognition means receives the output signal of the noise suppressing means and recognizes the speech uttered by the speaker.

[0035] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, and performing noise suppression further improves speech recognition performance.

[0036] The apparatus further has synthesized sound generating means that creates synthesized sound to be played to the speaker, and spoken dialogue control means that analyzes speech recognition results and performs control to realize spoken dialogue. The acoustic signal output control means receives the output signal of the acoustic signal generating means, the output signal of the synthesized sound generating means, and the output signal of the spoken dialogue control means, and outputs, as needed, either the output signal of the acoustic signal generating means or the output signal of the synthesized sound generating means.

[0037] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved. In addition, realizing dialogue with the speaker makes speech recognition easy for the speaker to use.

[0038] The apparatus further has display means that displays the content determined by the spoken dialogue control means.

[0039] With this configuration, the dialogue with the speaker also becomes easier for the user to follow.

[0040] The signal output from the acoustic signal generating means is a stereo signal.

[0041] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from stereo loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0042] The plurality of signals output from the acoustic signal generating means have identical content.

[0043] With this configuration, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when a monaural signal output from a plurality of loudspeakers enters the speech recognition input microphone and degrades the speech quality, so speech recognition performance can be improved.

[0044] Furthermore, the speech recognition method of the present invention comprises: an acoustic signal generating step of generating an acoustic signal; a plurality of acoustic signal output steps of outputting the acoustic signal as sound waves; an acoustic signal input step of inputting speech; an interfering sound removing step of receiving the output signal of the acoustic signal input step and the output signal of the acoustic signal generating step and reducing the output signal components of the plurality of acoustic signal output steps mixed into the output signal of the acoustic signal input step; and a speech recognition step of receiving the output signal of the interfering sound removing step and recognizing the speech.

[0045] With this method, when speech uttered by the speaker is recognized, the mixed-in acoustic signal can be removed even when acoustic signals output from a plurality of loudspeakers enter the speech recognition input microphone and degrade the speech quality, so speech recognition performance can be improved.

[0046]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0047] FIG. 1 is a block diagram of the speech recognition apparatus according to the first embodiment of the present invention.

[0048] As shown in FIG. 1, the speech recognition apparatus of the first embodiment comprises: acoustic signal generating means 10 that generates an acoustic signal; first acoustic signal output means 21 and second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into the space as sound waves; acoustic signal input means 30 that inputs speech uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0049] The operation of the speech recognition apparatus of the first embodiment configured as described above will now be described.

[0050] The description takes as an example the case where the acoustic signal generating means 10 generates, for example, music, the first acoustic signal output means 21 and the second acoustic signal output means 22 are outputting it into the space as sound waves, and the speaker utters speech to be recognized.

[0051] The speech signal uttered by the speaker is passed to the interfering sound removing unit 100 through the acoustic signal input means 30. At this point, the signal input to the interfering sound removing unit 100 also contains the acoustic signals (for example, music) output from the first acoustic signal output means 21 and the second acoustic signal output means 22, so recognizing it as is would not give good recognition results.

[0052] The interfering sound removing unit 100 therefore uses the outputs of the acoustic signal input means 30 and the acoustic signal generating means 10 to reduce the acoustic signals (for example, music) output from the acoustic signal output means 21 and 22. The speech processed by the interfering sound removing unit 100 is passed to the speech recognition means 40, which performs speech recognition processing.
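The situation in [0051] and [0052] can be summarized with a simple signal model. The notation is an assumption introduced here for illustration and does not appear in the patent: s(n) is the speaker's speech, x_k(n) is the signal sent to acoustic signal output means k, h_k is the acoustic path of length M from that loudspeaker to the microphone, y(n) is the output of the acoustic signal input means 30, and a hat marks an estimate held by the interfering sound removing unit 100.

```latex
% Microphone signal: speech plus the echo of each loudspeaker signal
y(n) = s(n) + \sum_{k=1}^{2} \sum_{m=0}^{M-1} h_k(m)\, x_k(n-m)

% Output of the interfering sound removing unit: estimated echoes subtracted
\hat{s}(n) = y(n) - \sum_{k=1}^{2} \sum_{m=0}^{M-1} \hat{h}_k(m)\, x_k(n-m)
```

Under this model, the better each estimate matches its true path, the closer the signal passed to the speech recognition means 40 is to the speech alone.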

[0053] Thus, according to the first embodiment, providing the interfering sound removing unit 100 in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22 makes it possible, when speech uttered by the speaker is recognized, to reduce the signal components that the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 mix into the acoustic signal input means 30, and speech recognition performance can be improved.

[0054] Although the first embodiment of the present invention has been described for the case of having the first acoustic signal output means 21 and the second acoustic signal output means 22, the same processing can be performed when there are three or more acoustic signal output means.

[0055] FIG. 2 is a block diagram of the speech recognition apparatus according to the second embodiment of the present invention.

[0056] As shown in FIG. 2, the speech recognition apparatus of the second embodiment comprises: external acoustic signal generating means 200; acoustic signal generating means 10 that receives the output signal of the external acoustic signal generating means 200 and outputs an acoustic signal; first acoustic signal output means 21 and second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into the space as sound waves; acoustic signal input means 30 that inputs speech uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0057] The operation of the speech recognition apparatus of the second embodiment configured as above will now be described. The second embodiment differs from the first embodiment only in the operation of the acoustic signal generating means 10 and the external acoustic signal generating means 200, so only these two parts are described.

[0058] First, an audio device such as a CD player or a radio is connected to the acoustic signal generating means 10 as the external acoustic signal generating means 200. The acoustic signal generating means 10 receives the output signal of the external acoustic signal generating means 200 through a connection terminal or the like. The acoustic signal generating means 10 then passes the received acoustic signal to the first acoustic signal output means 21 and the second acoustic signal output means 22, which output it into space as sound waves. In doing so, the acoustic signal generating means 10 can also change the amplitude gain applied to its input signal, that is, the output of the external acoustic signal generating means 200, or convert the number of channels of the signal it outputs.
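The amplitude scaling and channel conversion mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the acoustic signal generating means 10: the names `apply_gain` and `convert_channels`, the frame layout (one list of samples per time instant), and the down-mix by averaging are all hypothetical choices made here.

```python
from typing import List, Sequence

Frame = List[float]  # one sample per channel at a single time instant


def apply_gain(frames: Sequence[Frame], gain: float) -> List[Frame]:
    """Scale every sample by a constant amplitude factor."""
    return [[gain * sample for sample in frame] for frame in frames]


def convert_channels(frames: Sequence[Frame], out_channels: int) -> List[Frame]:
    """Change the channel count (assumed policy: duplicate mono, average to mono)."""
    converted = []
    for frame in frames:
        if len(frame) == out_channels:
            converted.append(list(frame))
        elif len(frame) == 1:
            converted.append([frame[0]] * out_channels)  # mono -> N identical channels
        elif out_channels == 1:
            converted.append([sum(frame) / len(frame)])  # N channels -> mono average
        else:
            raise ValueError("unsupported channel conversion")
    return converted


# Mono input from the external source 200, halved in amplitude, sent to two outputs.
external = [[1.0], [-2.0]]
to_speakers = convert_channels(apply_gain(external, 0.5), 2)

# Stereo input doubled in amplitude and mixed down to a single channel.
mono = convert_channels(apply_gain([[1.0, 3.0]], 2.0), 1)
```

Whatever gain and channel mapping is applied, the signals actually sent to the output means 21 and 22 are the ones the interfering sound removing unit 100 needs as its reference input.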

[0059] According to the second embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and making the apparatus connectable to the external acoustic signal generating means 200 makes it possible, when recognizing speech uttered by the speaker, to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, thereby improving speech recognition performance.

[0060] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0061] FIG. 3 is a block diagram of a speech recognition apparatus according to the third embodiment of the present invention.

[0062] As shown in FIG. 3, the speech recognition apparatus of the third embodiment comprises: an acoustic signal generating means 10 that generates an acoustic signal; a first acoustic signal output means 21 and a second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into space as sound waves; an acoustic signal input means 30 that receives the speech uttered by a speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and a speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0063] Further, the interfering sound removing unit 100 is composed of a first-stage echo canceller means 111 and a second-stage echo canceller means 112, which reduce the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30.

[0064] The operation of the speech recognition apparatus of the third embodiment configured as above will now be described. The configuration other than the interfering sound removing unit 100 is the same as in the first embodiment, so only the internal operation of the interfering sound removing unit 100 is described.

[0065] The acoustic signal input from the acoustic signal input means 30 to the interfering sound removing unit 100 is processed in succession by the first-stage echo canceller means 111 and the second-stage echo canceller means 112, and is then passed to the speech recognition means 40.

[0066] First, the first-stage echo canceller means 111 removes the signal that was output from the first acoustic signal output means 21 and mixed into the acoustic signal input means 30, and passes the result to the second-stage echo canceller means 112.

[0067] Next, the second-stage echo canceller means 112 removes the signal that was output from the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, and outputs the result.

[0068] To describe a specific example of the processing performed by the first-stage echo canceller means 111 and the second-stage echo canceller means 112, the case of using a typical echo canceller is described here with reference to the block diagram of the echo canceller in FIG. 16.

[0069] Echo cancellers are described in references such as 「音響システムとディジタル処理」 (Acoustic Systems and Digital Processing; co-authored by Oga, Yamazaki, and Kaneda; published by the Institute of Electronics, Information and Communication Engineers).

[0070] In FIG. 16, first consider the state in which no speech is being uttered. Let x(k) be the time-series signal of the acoustic signal radiated into space from the acoustic signal output means 20. x(k) is radiated into space, convolved with the spatial transfer characteristic h'(k), and input to the acoustic signal input means 30, where it becomes the signal y'(k). The adaptive filter means 170 estimates the spatial transfer characteristic as h(k) from the acoustic signal x(k) and the output y'(k) of the acoustic signal input means 30, convolves the estimate h(k) with x(k) to estimate the input signal at the acoustic signal input means 30, and outputs that estimate as y(k). Subtracting y(k) from y'(k), the output of the acoustic signal input means 30, removes the acoustic echo (the signal output from the acoustic signal output means 20 that enters the acoustic signal input means 30).

[0071] Next, with the spatial transfer characteristic estimated as h(k) as described above and the filter in a stable state, suppose the speaker utters the signal z(k). The acoustic signal entering the acoustic signal input means 30 is then s(k) = y'(k) + z(k), and the echo canceller means 110 removes the acoustic echo by computing s(k) - y(k) = y'(k) + z(k) - y(k), which leaves approximately z(k) when y(k) closely matches y'(k).
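The two phases described above (estimating h(k) while nobody speaks, then subtracting y(k) once speech z(k) is present) can be sketched as follows. The text does not name an adaptation algorithm, so the normalized LMS (NLMS) update used here is an assumption, as are the class name `NlmsEchoCanceller`, the three-tap room response, the step size, and the choice to freeze adaptation during speech.

```python
import math
import random
from typing import List


class NlmsEchoCanceller:
    """Adaptive FIR echo canceller using the NLMS update (an assumed algorithm)."""

    def __init__(self, taps: int, mu: float = 0.5, eps: float = 1e-8):
        self.h = [0.0] * taps        # h(k): estimate of the room response h'(k)
        self.x_hist = [0.0] * taps   # recent reference samples, newest first
        self.mu = mu                 # step size
        self.eps = eps               # guards against division by zero

    def process(self, x: float, mic: float, adapt: bool = True) -> float:
        self.x_hist = [x] + self.x_hist[:-1]
        y = sum(h * xi for h, xi in zip(self.h, self.x_hist))  # echo estimate y(k)
        e = mic - y                                            # echo-reduced output
        if adapt:
            norm = sum(xi * xi for xi in self.x_hist) + self.eps
            step = self.mu * e / norm
            self.h = [h + step * xi for h, xi in zip(self.h, self.x_hist)]
        return e


def room_echo(x: List[float], h_room: List[float]) -> List[float]:
    """y'(k): the reference x(k) convolved with the true room response h'(k)."""
    return [sum(h_room[i] * x[k - i] for i in range(len(h_room)) if k - i >= 0)
            for k in range(len(x))]


random.seed(0)
h_room = [0.6, -0.3, 0.1]                    # hypothetical h'(k)
n_train, n_talk = 2000, 200
x = [random.uniform(-1.0, 1.0) for _ in range(n_train + n_talk)]
y_echo = room_echo(x, h_room)
z = [0.0] * n_train + [0.5 * math.sin(0.05 * k) for k in range(n_talk)]
s = [ye + zk for ye, zk in zip(y_echo, z)]   # s(k) = y'(k) + z(k)

aec = NlmsEchoCanceller(taps=len(h_room))
# Adapt only while the speaker is silent, then keep the filter fixed.
out = [aec.process(x[k], s[k], adapt=(k < n_train)) for k in range(len(x))]
speech_error = max(abs(out[k] - z[k]) for k in range(n_train, n_train + n_talk))
```

In this noise-free toy setup the estimate converges to the simulated room response, so the output during the speech segment is essentially z(k). A real room has a much longer response and additional noise, so the residual echo would not vanish.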

[0072] This is the basic principle of the echo canceller, and the first-stage echo canceller means 111 and the second-stage echo canceller means 112 each perform this kind of processing.
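The cascade of the two stages can be sketched as below: stage 1 uses the signal driving the first acoustic signal output means 21 as its reference, and stage 2 uses the signal driving the second acoustic signal output means 22, operating on the output of stage 1. This is a sketch under several assumptions: a plain LMS update (the text leaves the algorithm open), independent random reference signals for the two outputs, two-tap room responses, and hypothetical names `make_lms_stage` and `fir`.

```python
import random
from typing import Callable, List


def make_lms_stage(taps: int, mu: float = 0.02) -> Callable[[float, float], float]:
    """Return one echo canceller stage: step(reference, input) -> echo-reduced output."""
    h = [0.0] * taps      # filter estimate for this stage
    hist = [0.0] * taps   # recent reference samples, newest first

    def step(x: float, d: float) -> float:
        hist.insert(0, x)
        hist.pop()
        e = d - sum(a * b for a, b in zip(h, hist))  # subtract the estimated echo
        for i, v in enumerate(hist):
            h[i] += mu * e * v                       # plain LMS coefficient update
        return e

    return step


def fir(x: List[float], h: List[float]) -> List[float]:
    """Causal FIR filtering, used here to simulate a room response."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]


random.seed(1)
n = 5000
x1 = [random.uniform(-1.0, 1.0) for _ in range(n)]  # signal sent to output means 21
x2 = [random.uniform(-1.0, 1.0) for _ in range(n)]  # signal sent to output means 22
mic = [a + b for a, b in zip(fir(x1, [0.5, 0.2]), fir(x2, [0.4, -0.3]))]

stage1 = make_lms_stage(taps=2)  # plays the role of echo canceller means 111
stage2 = make_lms_stage(taps=2)  # plays the role of echo canceller means 112
out = [stage2(x2[k], stage1(x1[k], mic[k])) for k in range(n)]


def power(sig: List[float]) -> float:
    return sum(v * v for v in sig) / len(sig)


residual_ratio = power(out[-1000:]) / power(mic[-1000:])
```

While stage 1 adapts, the echo from the other output acts as a disturbance on its update, so some residual remains even after both stages converge. That interaction is one motivation for controlling when each stage learns.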

[0073] According to the third embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 having the first-stage echo canceller means 111 and the second-stage echo canceller means 112 makes it possible, when recognizing speech uttered by the speaker, to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, thereby improving speech recognition performance.

[0074] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0075] In the third embodiment of the present invention, the case of having the first-stage echo canceller means 111 and the second-stage echo canceller means 112 was taken as an example, but the same processing can be performed when there are three or more echo canceller means.

[0076] Furthermore, several echo canceller algorithms exist, and the processing may use a method other than the algorithm described here.

[0077] FIG. 4 is a block diagram of a speech recognition apparatus according to the fourth embodiment of the present invention.

[0078] As shown in FIG. 4, the speech recognition apparatus of the fourth embodiment comprises: an acoustic signal generating means 10 that generates an acoustic signal; a first acoustic signal output means 21 and a second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into space as sound waves; an acoustic signal input means 30 that receives the speech uttered by a speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and a speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0079] Further, the interfering sound removing unit 100 comprises: a first-stage echo canceller means 111 and a second-stage echo canceller means 112 that reduce the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; a first signal level detection means 121 and a second signal level detection means 122 that detect the levels of the output signals of the acoustic signal generating means 10 input to the first-stage echo canceller means 111 and the second-stage echo canceller means 112; and a learning timing determining means 130 that receives the output signals of the first signal level detection means 121 and the second signal level detection means 122, determines from those signal levels which of the first-stage echo canceller means 111 and the second-stage echo canceller means 112 should undergo processing to improve its echo cancelling performance, and determines the timing of the learning that improves that performance.

[0080] The operation of the speech recognition apparatus of the fourth embodiment configured as above will now be described. The configuration other than the interfering sound removing unit 100 is the same as in the third embodiment, so only the internal operation of the interfering sound removing unit 100 is described.

[0081] The acoustic signal input from the acoustic signal input means 30 to the interfering sound removing unit 100 is processed in succession by the first-stage echo canceller means 111 and the second-stage echo canceller means 112, and is then passed to the speech recognition means 40. Here, the level of the sound output from the first acoustic signal output means 21 is detected by the first signal level detection means 121, the level of the sound output from the second acoustic signal output means 22 is detected by the second signal level detection means 122, and both results are passed to the learning timing determining means 130.
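The text does not say how a signal level is measured. As one hedged possibility, a level detector could report the mean absolute amplitude of a short frame, as in this sketch; the function name `signal_level`, the frame-based measure, and the sample values are assumptions made for illustration.

```python
from typing import Sequence


def signal_level(frame: Sequence[float]) -> float:
    """Mean absolute amplitude of one frame (an assumed level measure)."""
    if not frame:
        return 0.0
    return sum(abs(sample) for sample in frame) / len(frame)


# Hypothetical frames of the signals feeding output means 21 and 22.
level_21 = signal_level([120, -130, 110, -140])
level_22 = signal_level([10, -20, 15, -5])
```

Other measures, such as frame power or a smoothed envelope, would serve the same purpose as long as the thresholds used later are chosen on the same scale.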

[0082] The learning timing determining means 130 receives the sound levels detected by the first signal level detection means 121 and the second signal level detection means 122 and decides whether or not to perform learning for the first-stage echo canceller means 111 or the second-stage echo canceller means 112.

[0083] Table 1 shows an example of how the learning timing is set in the learning timing determining means 130.

[0084]

[Table 1]

[0085] In the example of Table 1, the rule is as follows: "Among the output signals of the first signal level detection means 121 and the second signal level detection means 122 input to the learning timing determining means 130, when all output signals except one (in this case, one output signal) are below threshold A (= 50) and the remaining one output signal is at or above threshold B (= 100), learning is performed for the first-stage echo canceller means 111 or the second-stage echo canceller means 112, or for the first filter coefficient calculation means 141 or the second filter coefficient calculation means 142 shown in FIG. 5, that takes as its input the acoustic signal input to the first signal level detection means 121 or the second signal level detection means 122."
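The rule quoted above can be sketched as a small decision function. The thresholds A = 50 and B = 100 come from the text; the function name `select_stage_to_train` and the return convention are hypothetical. The sketch also assumes that the stage to train is the one whose reference channel is the loud one, which the rule implies but does not state outright.

```python
from typing import Optional, Sequence

THRESHOLD_A = 50    # every other channel must be below this level
THRESHOLD_B = 100   # the single active channel must be at or above this level


def select_stage_to_train(levels: Sequence[float],
                          a: float = THRESHOLD_A,
                          b: float = THRESHOLD_B) -> Optional[int]:
    """Return the index of the stage to train, or None if no stage should learn.

    levels[i] is the detected level of the reference signal of stage i.
    Assumes a <= b, so a channel cannot count as both quiet and loud.
    """
    loud = [i for i, level in enumerate(levels) if level >= b]
    quiet = [i for i, level in enumerate(levels) if level < a]
    if len(loud) == 1 and len(quiet) == len(levels) - 1:
        return loud[0]
    return None


decision_first = select_stage_to_train([120, 30])    # only channel 0 is active
decision_both = select_stage_to_train([120, 120])    # both active: no learning
decision_three = select_stage_to_train([10, 150, 20])
```

Because the function works on a list of levels, the same rule carries over unchanged to three or more output means.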

[0086] According to the fourth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and further providing the learning timing determining means 130, which determines the learning timing of the echo cancellers, makes it possible, when recognizing speech uttered by the speaker, to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, thereby improving speech recognition performance.

[0087] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means, and three or more echo canceller means and signal level detection means can also be provided.

[0088] FIG. 5 is a block diagram of a speech recognition apparatus according to the fifth embodiment of the present invention.

[0089] As shown in FIG. 5, the speech recognition apparatus of the fifth embodiment comprises: an acoustic signal generating means 10 that generates an acoustic signal; a first acoustic signal output means 21 and a second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into space as sound waves; an acoustic signal input means 30 that receives the speech uttered by a speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and a speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0090] Further, the interfering sound removing unit 100 comprises: a first filter coefficient calculation means 141 and a second filter coefficient calculation means 142, each of which receives one of the output signals of the acoustic signal generating means 10 together with the output signal of the acoustic signal input means 30 and calculates the filter coefficients of a filter that attenuates the output signal of one of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; a first signal level detection means 121 and a second signal level detection means 122 that detect the levels of the output signals of the acoustic signal generating means 10 input to the first filter coefficient calculation means 141 and the second filter coefficient calculation means 142; a learning timing determining means 130 that receives the output signals of the first signal level detection means 121 and the second signal level detection means 122, determines from those signal levels which of the first filter coefficient calculation means 141 and the second filter coefficient calculation means 142 should learn its filter coefficients, and determines the learning timing; a filter coefficient integrating means 150 that integrates the filter coefficients calculated by the first filter coefficient calculation means 141 and the second filter coefficient calculation means 142; and a filtering means 160 that uses the filter coefficients obtained by the filter coefficient integrating means 150 to reduce the acoustic signal output components in the output signal of the acoustic signal input means 30.

[0091] The operation of the speech recognition apparatus of the fifth embodiment configured as above will now be described. The configuration other than the interfering sound removing unit 100 is the same as in the first embodiment, so only the internal operation of the interfering sound removing unit 100 is described.

[0092] The acoustic signal input from the acoustic signal input means 30 to the interfering sound removing unit 100 is first passed to the first filter coefficient calculation means 141, which calculates filter coefficients for reducing the acoustic signal that is output from the first acoustic signal output means 21 and mixed into the acoustic signal input means 30. These filter coefficients can be obtained, for example, by the adaptive filter principle described above for the third embodiment with reference to FIG. 16.

[0093] The acoustic signal input from the acoustic signal input means 30 to the interfering sound removing unit 100 is also passed to the second filter coefficient calculation means 142, which calculates filter coefficients for reducing the acoustic signal that is output from the second acoustic signal output means 22 and mixed into the acoustic signal input means 30.

[0094] At this time, the level of the sound output from the first acoustic signal output means 21 is detected by the first signal level detection means 121, the level of the sound output from the second acoustic signal output means 22 is detected by the second signal level detection means 122, and both results are passed to the learning timing determining means 130. The learning timing determining means 130 receives the sound levels detected by the first signal level detection means 121 and the second signal level detection means 122 and decides whether or not the filter coefficients are to be calculated. The filter coefficients calculated in this way by the first filter coefficient calculation means 141 and the second filter coefficient calculation means 142 are passed to the filter coefficient integrating means 150. The filter coefficient integrating means 150 integrates the two sets of filter coefficients into a single set of filter coefficients.

[0095] As one example of integration, the integrated filter coefficients can be obtained by convolving the filter coefficients obtained by the first filter coefficient calculation means 141 with those obtained by the second filter coefficient calculation means 142.
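Convolving two sets of FIR coefficients yields a single filter that behaves like the two filters applied one after the other, which is why convolution can serve as the integration step. The sketch below checks this on small made-up coefficient sets; `convolve`, `fir_filter`, and the numeric values are illustrative assumptions, not values from the patent.

```python
from typing import List


def convolve(a: List[float], b: List[float]) -> List[float]:
    """Full linear convolution of two coefficient sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fir_filter(signal: List[float], coeffs: List[float]) -> List[float]:
    """Causal FIR filtering with zero initial state, same length as the input."""
    return [sum(coeffs[i] * signal[k - i] for i in range(len(coeffs)) if k - i >= 0)
            for k in range(len(signal))]


f1 = [1.0, -0.5]    # hypothetical coefficients from calculation means 141
f2 = [1.0, 0.25]    # hypothetical coefficients from calculation means 142
combined = convolve(f1, f2)  # role of the filter coefficient integrating means 150

signal = [1.0, 2.0, 3.0, 4.0]
cascaded = fir_filter(fir_filter(signal, f1), f2)  # two filters in sequence
direct = fir_filter(signal, combined)              # one integrated filter
```

The integrated filter has len(f1) + len(f2) - 1 taps, so a single filtering pass replaces two passes at the cost of a somewhat longer filter.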

[0096] The filter coefficients calculated by the filter coefficient integrating means 150 are passed to the filtering means 160. The filtering means 160 filters the output of the acoustic signal input means 30 and outputs the result, thereby reducing the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30. According to the fifth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and further providing means for calculating filter coefficients and means for integrating them makes it possible, when recognizing speech uttered by the speaker, to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, thereby improving speech recognition performance.

[0097] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0098] FIG. 6 is a block diagram of a speech recognition apparatus according to the sixth embodiment of the present invention.

[0099] As shown in FIG. 6, the speech recognition apparatus of the sixth embodiment comprises: an acoustic signal generating means 10 that generates an acoustic signal; a first acoustic signal output means 21 and a second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into space as sound waves; an acoustic signal input means 30 that receives the speech uttered by a speaker; a noise suppression means 50 that receives the output signal of the acoustic signal input means 30 and performs noise suppression processing; an interfering sound removing unit 100 that receives the output signals of the noise suppression means 50 and the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and a speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the speech uttered by the speaker.

[0100] The operation of the speech recognition apparatus of the sixth embodiment configured as above will now be described. The configuration other than the noise suppression means 50 is the same as in the first embodiment, so only the operation of the noise suppression means 50 is described.

[0101] The noise suppression means 50 receives the output signal of the acoustic signal input means 30, removes the noise signal contained in it, and passes the result to the interfering sound removing unit 100. The processing performed by the noise suppression means 50 is aimed mainly at reducing the environmental noise heard at the speaker's location, and uses a technique such as the spectral subtraction method.
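As a rough illustration of the spectral subtraction method named above, the sketch below subtracts an estimated noise magnitude spectrum from the magnitude spectrum of one frame while keeping the noisy phase. It is a toy example under stated assumptions: a single 32-sample frame, a naive DFT, a noise estimate taken from a noise-only frame that exactly matches the noise in the noisy frame, and hypothetical names `dft`, `idft`, and `spectral_subtract`.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[float]:
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame: List[float], noise_mag: List[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the noise magnitude per frequency bin and keep the noisy phase."""
    cleaned = []
    for bin_value, n_mag in zip(dft(frame), noise_mag):
        mag = abs(bin_value)
        new_mag = max(mag - n_mag, floor * mag)  # never let the magnitude go negative
        cleaned.append(cmath.rect(new_mag, cmath.phase(bin_value)))
    return idft(cleaned)


n = 32
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]             # tone in bin 2
noise = [0.3 * math.cos(2 * math.pi * 5 * t / n + 0.7) for t in range(n)]  # tone in bin 5
noisy = [s + v for s, v in zip(speech, noise)]

noise_mag = [abs(v) for v in dft(noise)]  # estimate from a noise-only frame
denoised = spectral_subtract(noisy, noise_mag)
max_error = max(abs(d - s) for d, s in zip(denoised, speech))
```

Here the speech and noise occupy different frequency bins and the noise estimate is exact, so the speech is recovered almost perfectly. With real environmental noise the estimate is only an average, and practical systems add windowing, overlapping frames, and a nonzero spectral floor.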

[0102] According to the sixth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and further providing the noise suppression means 50 makes it possible, when recognizing speech uttered by the speaker, to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, and also to reduce the environmental noise present at the speaker's location, thereby improving speech recognition performance.

[0103] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0104] FIG. 7 is a block diagram of a speech recognition apparatus according to the seventh embodiment of the present invention.

[0105] As shown in FIG. 7, the speech recognition apparatus of the seventh embodiment comprises: an acoustic signal generating means 10 that generates an acoustic signal; a first acoustic signal output means 21 and a second acoustic signal output means 22 that output the output signal of the acoustic signal generating means 10 into space as sound waves; an acoustic signal input means 30 that receives the speech uttered by a speaker; an interfering sound removing unit 100 that receives the output signals of the acoustic signal input means 30 and the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; a noise suppression means 50 that receives the output signal of the interfering sound removing unit 100 and performs noise suppression processing; and a speech recognition means 40 that receives the output signal of the noise suppression means 50 and recognizes the speech uttered by the speaker.

[0106] The operation of the speech recognition apparatus of the seventh embodiment configured as above will now be described. The configuration other than the noise suppression means 50 is the same as in the first embodiment, so only the operation of the noise suppression means 50 is described.

[0107] The noise suppression means 50 receives the output signal of the interfering sound removing unit 100, removes the noise signal contained in it, and passes the result to the speech recognition means 40. The processing performed by the noise suppression means 50 is aimed mainly at reducing the environmental noise heard at the speaker's location, and uses a technique such as the spectral subtraction method.
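The specification names spectral subtraction but gives no implementation. As an illustration only, the following minimal sketch applies magnitude spectral subtraction to a single frame. The function names, the `floor` parameter, and the use of a naive DFT are assumptions made for this example and do not come from the specification; a practical system would use windowed, overlapping frames and an FFT.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(frame, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude from each frequency bin.

    frame     : time-domain samples of one frame
    noise_mag : estimated noise magnitude per bin (same length as frame)
    floor     : hypothetical spectral floor, as a fraction of the input
                magnitude, that keeps magnitudes from going negative
    """
    if len(frame) != len(noise_mag):
        raise ValueError("frame and noise_mag must have the same length")
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        mag = abs(value)
        new_mag = max(mag - noise, floor * mag)
        # Keep the phase of the noisy input, as is usual for this method.
        cleaned.append(cmath.rect(new_mag, cmath.phase(value)))
    return idft(cleaned)
```

With a noise estimate of zero the frame is returned essentially unchanged; when the noise estimate equals the input magnitude in every bin, the output is the input scaled by `floor`.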

[0108] According to the seventh embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and additionally the noise suppression means 50 makes it possible, when recognizing the voice uttered by the speaker, to reduce the signal components of the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 that are mixed into the acoustic signal input means 30, and further to reduce the environmental noise present at the speaker's location, thereby improving speech recognition performance.

[0109] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0110] FIG. 8 is a block diagram of a speech recognition apparatus according to the eighth embodiment of the present invention.

[0111] As shown in FIG. 8, the speech recognition apparatus of the eighth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal generating means 10 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; first noise suppression means 51 that receives the output signal of the acoustic signal input means 30 and performs noise suppression processing; an interfering sound removing unit 100 that receives the output signals of the first noise suppression means 51 and the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; second noise suppression means 52 that receives the output signal of the interfering sound removing unit 100 and performs noise suppression processing; and speech recognition means 40 that receives the output signal of the second noise suppression means 52 and recognizes the voice uttered by the speaker.

[0112] The operation of the speech recognition apparatus of the eighth embodiment configured as described above will now be explained. Because the configuration other than the first noise suppression means 51 and the second noise suppression means 52 is the same as in the first embodiment, only the operation of the first noise suppression means 51 and the second noise suppression means 52 is described.

[0113] First, the first noise suppression means 51 receives the output signal of the acoustic signal input means 30, removes the noise signal contained in it, and passes the result to the interfering sound removing unit 100. The second noise suppression means 52 then receives the output signal of the interfering sound removing unit 100, removes the noise signal contained in it, and passes the result to the speech recognition means 40. The processing performed by the first noise suppression means 51 and the second noise suppression means 52 is aimed mainly at reducing the environmental noise heard at the speaker's location, and uses a technique such as the spectral subtraction method. The first noise suppression means 51 and the second noise suppression means 52 may also perform identical processing.

[0114] According to the eighth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and additionally the first noise suppression means 51 and the second noise suppression means 52 makes it possible, when recognizing the voice uttered by the speaker, to reduce the signal components of the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 that are mixed into the acoustic signal input means 30, and further to reduce the environmental noise present at the speaker's location, thereby improving speech recognition performance.

[0115] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0116] FIG. 9 is a block diagram of a speech recognition apparatus according to the ninth embodiment of the present invention.

[0117] As shown in FIG. 9, the speech recognition apparatus of the ninth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; acoustic signal output control means 60 for controlling the output of the acoustic signal generating means 10; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal output control means 60 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker.

[0118] The operation of the speech recognition apparatus of the ninth embodiment configured as described above will now be explained. Because the configuration other than the acoustic signal output control means 60 is the same as in the first embodiment, only the operation of the acoustic signal output control means 60 is described.

[0119] The acoustic signal output control means 60 controls the output signal of the acoustic signal generating means 10. One example of such control is, when the output signal of the acoustic signal generating means 10 is monaural, converting it to stereo and outputting it to the first acoustic signal output means 21 and the second acoustic signal output means 22. In addition, to improve the performance of the echo canceller or similar component that constitutes the interfering sound removing unit 100, the signal may be output to only one of the first acoustic signal output means 21 and the second acoustic signal output means 22.
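The two control examples above (duplicating a monaural signal to both output means, or driving only one output means) can be sketched as a simple routing function. This is an illustrative assumption, not the specification's implementation: the function name `route_output` and the mode strings are hypothetical, and the monaural-to-stereo conversion is reduced here to plain duplication.

```python
def route_output(mono_frame, mode="both"):
    """Return (frame_for_output_21, frame_for_output_22) for one mono frame.

    mode is a hypothetical control setting:
      "both"        : duplicate the mono signal to both output means
      "first_only"  : drive only the first acoustic signal output means 21
      "second_only" : drive only the second acoustic signal output means 22
    """
    silence = [0.0] * len(mono_frame)
    if mode == "both":
        return list(mono_frame), list(mono_frame)
    if mode == "first_only":
        return list(mono_frame), silence
    if mode == "second_only":
        return silence, list(mono_frame)
    raise ValueError(f"unknown mode: {mode}")
```

Driving a single output means leaves only one acoustic path between loudspeaker and microphone for the canceller to model, which is one plausible reading of why the specification says this can improve echo canceller performance.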

[0120] According to the ninth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and additionally the acoustic signal output control means 60 makes it possible to improve the performance of the interfering sound removing unit. As a result, when recognizing the voice uttered by the speaker, the signal components of the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 that are mixed into the acoustic signal input means 30 can be reduced, and speech recognition performance can be improved.

[0121] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0122] FIG. 10 is a block diagram of a speech recognition apparatus according to the tenth embodiment of the present invention.

[0123] As shown in FIG. 10, the speech recognition apparatus of the tenth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; transfer characteristic measurement signal generating means 70 for generating a signal suitable for measuring transfer characteristics; acoustic signal output control means 60 that performs control so that either the output signal of the acoustic signal generating means 10 or the output signal of the transfer characteristic measurement signal generating means 70 is output as needed; a plurality of acoustic signal output means, namely first acoustic signal output means 21 and second acoustic signal output means 22, for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal output control means 60 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker.

[0124] The operation of the speech recognition apparatus of the tenth embodiment configured as described above will now be explained. Because the configuration other than the transfer characteristic measurement signal generating means 70 is the same as in the ninth embodiment, only the operation of the acoustic signal output control means 60 and the transfer characteristic measurement signal generating means 70 is described.

[0125] First, the acoustic signal output control means 60 controls whether the acoustic signal from the acoustic signal generating means 10 or the transfer characteristic measurement signal from the transfer characteristic measurement signal generating means 70 is output.

[0126] When the operation of the interfering sound removing unit 100 needs to be stabilized, the transfer characteristic measurement signal generating means 70 outputs a transfer characteristic measurement signal suitable for measuring transfer characteristics (for example, white noise or an M-sequence signal). The transfer characteristic measurement signal is output via the first acoustic signal output means 21 and the second acoustic signal output means 22, and is input to the interfering sound removing unit 100 via the acoustic signal input means 30. The interfering sound removing unit 100 stabilizes its operation on the basis of the input transfer characteristic measurement signal.
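The specification mentions an M-sequence (maximal-length sequence) as one possible measurement signal without describing how it is generated. The sketch below, offered only as an illustration, produces one period of an M-sequence with a linear feedback shift register. The function names are hypothetical, and the default register length and tap positions (4 bits, taps 4 and 3) are an assumption chosen because they are known to give a maximal period of 15; other tap choices do not necessarily yield a maximal-length sequence.

```python
def m_sequence(n_bits=4, taps=(4, 3)):
    """Return one period (2**n_bits - 1 bits) of an LFSR output sequence.

    taps are 1-based register positions that are XORed to form the
    feedback bit. The defaults give a maximal-length sequence of period 15.
    """
    state = [1] + [0] * (n_bits - 1)  # any non-zero start state works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return out


def to_bipolar(bits):
    """Map bits {0, 1} to samples {+1, -1} for use as a test signal."""
    return [1 - 2 * b for b in bits]


def circular_autocorrelation(x, lag):
    """Circular autocorrelation of sequence x at the given lag."""
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n))
```

The property that makes such a sequence useful for measuring a transfer characteristic is its nearly impulse-like circular autocorrelation: for the bipolar version, the value is the sequence length at lag 0 and -1 at every other lag, so cross-correlating the microphone signal with the emitted sequence approximates the impulse response of the path.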

[0127] According to the tenth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 together with the transfer characteristic measurement signal generating means 70, which generates a signal for stabilizing the operation of the interfering sound removing unit 100, makes it possible to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, and thereby to improve speech recognition performance.

[0128] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0129] FIG. 11 is a block diagram of a speech recognition apparatus according to the eleventh embodiment of the present invention.

[0130] As shown in FIG. 11, the speech recognition apparatus of the eleventh embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; acoustic signal output control means 60 for controlling the output of the acoustic signal generating means 10; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; noise suppression means 50 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal output control means 60 and performs noise suppression processing; an interfering sound removing unit 100 that receives the output signal of the noise suppression means 50 and the output signal of the acoustic signal output control means 60 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker.

[0131] The operation of the speech recognition apparatus of the eleventh embodiment configured as described above will now be explained. Because the configuration other than the noise suppression means 50 is the same as in the ninth embodiment, only the operation of the acoustic signal output control means 60 and the noise suppression means 50 is described.

[0132] First, the acoustic signal output control means 60 controls the output signal of the acoustic signal generating means 10 and passes the details of that control to the noise suppression means 50. The noise suppression means 50 performs noise suppression processing on the output signal of the acoustic signal input means 30 and outputs the result to the interfering sound removing unit 100. In doing so, the noise suppression means 50 performs noise suppression on the basis of information about the signal output by the acoustic signal output control means 60. For example, it becomes possible to estimate the noise while no acoustic signal is being output from either the first acoustic signal output means 21 or the second acoustic signal output means 22, and to perform noise suppression processing using that estimate.
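The example above, estimating noise only while neither output means is emitting sound, can be sketched as a gated update of a running noise power estimate. This is a minimal illustration under stated assumptions: the function name, the `speakers_active` flag (standing in for the control information passed from the acoustic signal output control means 60), and the exponential smoothing constant are all hypothetical and are not specified in the source.

```python
def update_noise_estimate(noise_power, frame, speakers_active, smoothing=0.9):
    """Update a running noise power estimate from one microphone frame.

    noise_power     : current noise power estimate
    frame           : microphone samples for this frame
    speakers_active : True while either output means is emitting a signal
                      (assumed to be reported by the output control means)
    smoothing       : hypothetical exponential smoothing constant in [0, 1)
    """
    if speakers_active:
        # Playback is leaking into the microphone; keep the old estimate.
        return noise_power
    frame_power = sum(s * s for s in frame) / len(frame)
    return smoothing * noise_power + (1.0 - smoothing) * frame_power
```

This sketch gates only on loudspeaker activity. In practice the estimate would presumably also need to be frozen while the speaker is talking, which the specification does not discuss in this paragraph.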

[0133] According to the eleventh embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100, the noise suppression means 50, and the acoustic signal output control means 60 makes it possible, when recognizing the voice uttered by the speaker, to reduce the signal components of the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 that are mixed into the acoustic signal input means 30, and further to reduce the environmental noise present at the speaker's location, thereby improving speech recognition performance.

[0134] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0135] FIG. 12 is a block diagram of a speech recognition apparatus according to the twelfth embodiment of the present invention.

[0136] As shown in FIG. 12, the speech recognition apparatus of the twelfth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; acoustic signal output control means 60 for controlling the output of the acoustic signal generating means 10; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal output control means 60 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; noise suppression means 50 that receives the output signal of the interfering sound removing unit 100 and the output signal of the acoustic signal output control means 60 and performs noise suppression processing; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker.

[0137] The operation of the speech recognition apparatus of the twelfth embodiment configured as described above will now be explained. Because the configuration other than the noise suppression means 50 is the same as in the ninth embodiment, only the operation of the acoustic signal output control means 60 and the noise suppression means 50 is described.

[0138] First, the acoustic signal output control means 60 controls the output signal of the acoustic signal generating means 10 and passes the details of that control to the noise suppression means 50. The noise suppression means 50 performs noise suppression processing on the output signal of the interfering sound removing unit 100 and outputs the result to the speech recognition means 40. In doing so, the noise suppression means 50 performs noise suppression on the basis of information about the signal output by the acoustic signal output control means 60. This makes it possible, for example, to adjust the amount of noise suppression according to the output level from the first acoustic signal output means 21 and the second acoustic signal output means 22.
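The specification says only that the amount of suppression can be adjusted according to the output level; it does not state the mapping or even its direction. Purely as an assumed illustration, the sketch below raises a suppression factor linearly with the playback level, on the guess that louder playback leaves more residual sound after the interfering sound removing unit 100. The function name, the normalized level range, and the factor limits are all hypothetical.

```python
def suppression_factor(output_level, min_factor=1.0, max_factor=2.0):
    """Map a playback level to a noise suppression factor.

    output_level : playback level, assumed normalized to [0.0, 1.0];
                   values outside that range are clamped
    min_factor   : hypothetical factor used when nothing is being played
    max_factor   : hypothetical factor used at full playback level
    """
    level = min(max(output_level, 0.0), 1.0)
    return min_factor + (max_factor - min_factor) * level
```

Such a factor could, for instance, scale the noise magnitude that a spectral subtraction stage removes from each frame; whether stronger or weaker suppression is appropriate at high playback levels would have to be determined for the actual system.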

[0139] According to the twelfth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100, the noise suppression means 50, and the acoustic signal output control means 60 makes it possible, when recognizing the voice uttered by the speaker, to reduce the signal components of the acoustic signals output from the first acoustic signal output means 21 and the second acoustic signal output means 22 that are mixed into the acoustic signal input means 30, and further to reduce the environmental noise present at the speaker's location, thereby improving speech recognition performance.

[0140] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0141] FIG. 13 is a block diagram of a speech recognition apparatus according to the thirteenth embodiment of the present invention.

[0142] As shown in FIG. 13, the speech recognition apparatus of the thirteenth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; synthesized sound generating means 90 for creating synthesized sound in order to realize voice dialogue; voice dialogue control means 80 that analyzes the speech recognition result and performs control for realizing voice dialogue; acoustic signal output control means 60 that receives the output signal of the acoustic signal generating means 10, the output signal of the synthesized sound generating means 90, and the output signal of the voice dialogue control means 80, and performs control so that one of these signals is output as needed; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; and speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker.

[0143] The operation of the speech recognition apparatus of the thirteenth embodiment configured as described above will now be explained. Because the configuration other than the synthesized sound generating means 90, the voice dialogue control means 80, and the acoustic signal output control means 60 is the same as in the ninth embodiment, only the operation of these three means is described.

[0144] The acoustic signal output control means 60 controls whether the acoustic signal from the acoustic signal generating means 10 or the synthesized sound generated by the synthesized sound generating means 90 is output. At this time, the voice dialogue control means 80, on the basis of the result from the speech recognition means 40, instructs the acoustic signal output control means 60 on matters such as whether to output a response as synthesized speech or to output the acoustic signal from the acoustic signal generating means 10. The acoustic signal output control means 60 then instructs the synthesized sound generating means 90 to create whatever synthesized sound is needed.

[0145] According to the thirteenth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the interfering sound removing unit 100 and additionally the synthesized sound generating means 90 and the voice dialogue control means 80 makes it possible to reduce the signal components that are output from the first acoustic signal output means 21 and the second acoustic signal output means 22 and mixed into the acoustic signal input means 30, so that speech recognition performance can be improved and voice dialogue can be realized at the same time.

[0146] As in the first embodiment, the same processing can also be performed when there are three or more acoustic signal output means.

[0147] FIG. 14 is a block diagram of a speech recognition apparatus according to the fourteenth embodiment of the present invention.

[0148] As shown in FIG. 14, the speech recognition apparatus of the fourteenth embodiment comprises: acoustic signal generating means 10 for generating an acoustic signal; synthesized sound generating means 90 for creating synthesized sound in order to realize voice dialogue; voice dialogue control means 80 that analyzes the speech recognition result and performs control for realizing voice dialogue; acoustic signal output control means 60 that receives the output signal of the acoustic signal generating means 10, the output signal of the synthesized sound generating means 90, and the output signal of the voice dialogue control means 80, and performs control so that one of these signals is output as needed; first acoustic signal output means 21 and second acoustic signal output means 22 for outputting the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30 for inputting the voice uttered by the speaker; an interfering sound removing unit 100 that receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal generating means 10 and reduces the output signal components of the first acoustic signal output means 21 and the second acoustic signal output means 22 mixed into the output signal of the acoustic signal input means 30; speech recognition means 40 that receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker; and display means 95 that provides information to the speaker for the purpose of voice dialogue.

[0149] The operation of the speech recognition apparatus of the fourteenth embodiment configured as described above will now be explained. Because the configuration other than the display means 95 is the same as in the thirteenth embodiment, only the operation of the display means 95 is described.

[0150] The display means 95 provides information to the speaker as images or text under the control of the voice dialogue control means 80, which handles the exchange with the speaker through voice recognition and speech synthesis. Examples of specific operations of the display means 95 include displaying to the speaker the result of the voice recognition performed by the voice recognition means 40, displaying a map when the speaker asks for one, and presenting as text the content of the acoustic signal generated by the acoustic signal generating means 10.

[0151] According to the fourteenth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21 and the second acoustic signal output means 22, providing the display means 95 makes it easy to realize voice dialogue.

[0152] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0153] FIG. 15 is a block diagram of the speech recognition apparatus according to the fifteenth embodiment of the present invention.

[0154] As shown in FIG. 15, the speech recognition apparatus of the fifteenth embodiment comprises: acoustic signal generating means (audio) 11; acoustic signal generating means (radio) 12; synthesized sound generating means 90, which generates synthesized sound for realizing voice dialogue; voice dialogue control means 80, which analyzes the voice recognition result and performs control for realizing voice dialogue; transfer characteristic measurement signal generating means 70, which generates a signal suitable for measuring transfer characteristics; acoustic signal output control means 60, which receives the output signals of the acoustic signal generating means 11 and 12, the synthesized sound generating means 90, the transfer characteristic measurement signal generating means 70, and the voice dialogue control means 80, and controls the output so that one of these signals is output as needed; first acoustic signal output means 21, second acoustic signal output means 22, and third acoustic signal output means 23, which output the output signal of the acoustic signal output control means 60 into space as sound waves; acoustic signal input means 30, which inputs the voice uttered by the speaker; interfering sound removing unit 100, which receives the output signal of the acoustic signal input means 30 and the output signal of the acoustic signal output control means 60 and reduces the output signal components of the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23 mixed into the output signal of the acoustic signal input means 30; voice recognition means 40, which receives the output signal of the interfering sound removing unit 100 and recognizes the voice uttered by the speaker; and display means 95, which displays information needed for the voice dialogue.

[0155] The operation of the speech recognition apparatus of the fifteenth embodiment configured as above will now be described.

[0156] The acoustic signal output control means 60 selects, as needed, the output of the acoustic signal generating means 11, the acoustic signal generating means 12, the synthesized sound generating means 90, or the transfer characteristic measurement signal generating means 70, and outputs it through the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23 so that the speaker can hear it.
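The selection performed by the acoustic signal output control means 60 can be pictured with a minimal Python sketch. The names `Source` and `select_output` are hypothetical and do not appear in the specification. The sketch also assumes that the same selected signal is fed to all three acoustic signal output means and that a copy of it is kept as the reference for the interfering sound removing unit 100; the text does not fix these details.

```python
from enum import Enum


class Source(Enum):
    AUDIO = "audio"              # acoustic signal generating means 11
    RADIO = "radio"              # acoustic signal generating means 12
    SYNTH = "synthesized"        # synthesized sound generating means 90
    MEASUREMENT = "measurement"  # transfer characteristic measurement signal generating means 70


def select_output(frames, requested, n_speakers=3):
    """Pick one source frame and route it to every loudspeaker.

    frames    : dict mapping each Source to its current frame (list of samples)
    requested : the Source chosen for output (for example by dialogue control)
    Returns (speaker_feeds, reference), where reference is the copy that an
    interference remover could use (an assumption of this sketch).
    """
    chosen = frames[requested]
    speaker_feeds = [list(chosen) for _ in range(n_speakers)]
    return speaker_feeds, list(chosen)
```

For example, calling `select_output(frames, Source.RADIO)` returns three identical radio frames, one per loudspeaker, plus the reference copy.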

[0157] The speaker provides input through the acoustic signal input means 30, and its output is passed to the noise suppressing means 50. The noise suppressing means 50 suppresses the noise arising in the speaker's environment and passes its output to the interfering sound removing unit 100. The interfering sound removing unit 100 removes from the input acoustic signal the signal components that were output by the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23 and mixed into the acoustic signal input means 30, and passes the result to the voice recognition means 40. The voice recognition means 40 performs voice recognition on the output signal of the interfering sound removing unit 100 and passes the recognition result to the voice dialogue control means 80. Based on the recognition result, the voice dialogue control means 80 instructs the acoustic signal output control means 60 which signal to output from the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23, and provides information to the speaker through the display means 95 in accordance with the voice recognition result.
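The interference removal step can be illustrated with a short Python sketch. It uses a normalized LMS (NLMS) adaptive filter, a common choice for echo cancellers, as a stand-in: this passage does not name the adaptation algorithm, so NLMS is an assumption here. The function names, the echo paths, and the tap count are all hypothetical. The reference is the single signal sent to the three loudspeakers, so the three speaker-to-microphone paths sum to one combined path that the filter estimates.

```python
import random


def fir(signal, coeffs):
    """Convolve signal with a short FIR response (stand-in for an echo path)."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]


def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return mic minus an adaptively estimated echo of ref (NLMS update)."""
    w = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first; zeros before the start.
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_estimate
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return residual


def energy(signal):
    return sum(s * s for s in signal)


def demo(n=3000, seed=0):
    """Residual-to-microphone energy ratio over the last 500 samples."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # signal from output control
    # Hypothetical speaker-to-microphone paths for the three loudspeakers.
    paths = [[0.6, 0.3, 0.1], [0.0, 0.5, 0.2], [0.0, 0.0, 0.4, 0.1]]
    echoes = [fir(ref, p) for p in paths]
    mic = [sum(vals) for vals in zip(*echoes)]        # echo only, no speech
    out = nlms_cancel(mic, ref)
    return energy(out[-500:]) / energy(mic[-500:])
```

In this echo-only demo the ratio returned by `demo()` becomes very small once the filter has converged. With speech present, the residual would carry the speech on to the recognizer. A practical system would also need to decide when to adapt, which this sketch leaves out.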

[0158] According to the fifteenth embodiment of the present invention, in a speech recognition apparatus having the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23, providing the interfering sound removing unit 100, and further providing the synthesized sound generating means 90, the voice dialogue control means 80, and the display means 95, makes it possible to reduce the signal components that are output from the first acoustic signal output means 21, the second acoustic signal output means 22, and the third acoustic signal output means 23 and mixed into the acoustic signal input means 30. This improves voice recognition performance and at the same time realizes high-quality voice dialogue.

[0159] As in the first embodiment, the same processing can be performed when there are three or more acoustic signal output means.

[0160] Although the first to fifteenth embodiments describe the speech recognition apparatus of the present invention, the processing operations may also be carried out as the steps of a speech recognition method, which likewise improves recognition performance when a plurality of acoustic signal output means are provided.

[0161]

[Effects of the Invention] As described above, the present invention is configured to reduce the output signals even when a plurality of acoustic signals are output, and thereby has the effect of improving voice recognition performance.

[Brief description of drawings]

FIG. 1 is a block diagram of the speech recognition apparatus according to the first embodiment of the present invention.

FIG. 2 is a block diagram of the speech recognition apparatus according to the second embodiment of the present invention.

FIG. 3 is a block diagram of the speech recognition apparatus according to the third embodiment of the present invention.

FIG. 4 is a block diagram of the speech recognition apparatus according to the fourth embodiment of the present invention.

FIG. 5 is a block diagram of the speech recognition apparatus according to the fifth embodiment of the present invention.

FIG. 6 is a block diagram of the speech recognition apparatus according to the sixth embodiment of the present invention.

FIG. 7 is a block diagram of the speech recognition apparatus according to the seventh embodiment of the present invention.

FIG. 8 is a block diagram of the speech recognition apparatus according to the eighth embodiment of the present invention.

FIG. 9 is a block diagram of the speech recognition apparatus according to the ninth embodiment of the present invention.

FIG. 10 is a block diagram of the speech recognition apparatus according to the tenth embodiment of the present invention.

FIG. 11 is a block diagram of the speech recognition apparatus according to the eleventh embodiment of the present invention.

FIG. 12 is a block diagram of the speech recognition apparatus according to the twelfth embodiment of the present invention.

FIG. 13 is a block diagram of the speech recognition apparatus according to the thirteenth embodiment of the present invention.

FIG. 14 is a block diagram of the speech recognition apparatus according to the fourteenth embodiment of the present invention.

FIG. 15 is a block diagram of the speech recognition apparatus according to the fifteenth embodiment of the present invention.

FIG. 16 is a block diagram of a typical echo canceller.

[Explanation of symbols]

10 Acoustic signal generating means
21 First acoustic signal output means
22 Second acoustic signal output means
23 Third acoustic signal output means
30 Acoustic signal input means
40 Voice recognition means
50 Noise suppressing means
51 First noise suppressing means
52 Second noise suppressing means
60 Acoustic signal output control means
70 Transfer characteristic measurement signal generating means
80 Voice dialogue control means
90 Synthesized sound generating means
95 Display means
100 Interfering sound removing unit
111 First-stage echo canceller means
112 Second-stage echo canceller means
121 First signal level detecting means
122 Second signal level detecting means
130 Learning timing determining means
141 First filter coefficient calculating means
142 Second filter coefficient calculating means
150 Filter coefficient integrating means
160 Filtering means
170 Adaptive filter means
200 External acoustic signal generating means

Claims

[Claims]

1. A voice recognition device comprising: acoustic signal generating means for generating an acoustic signal; a plurality of acoustic signal output means for outputting the acoustic signal as sound waves; acoustic signal input means for inputting voice; an interfering sound removing section that receives the output signal of the acoustic signal input means and the output signal of the acoustic signal generating means and reduces the output signal components of the plurality of acoustic signal output means mixed into the output signal of the acoustic signal input means; and voice recognition means that receives the output signal of the interfering sound removing section and recognizes the voice.

2. The voice recognition device according to claim 1, wherein the acoustic signal generating means receives the output signal of a predetermined external acoustic signal generating means as input and outputs an acoustic signal to the plurality of acoustic signal output means.

3. The voice recognition device according to claim 1 or 2, wherein the interfering sound removing section has a plurality of stages of echo canceller means for reducing the output signal component of the acoustic signal output means mixed into the output signal of the acoustic signal input means; the echo canceller means of the first stage receives the output signal of the acoustic signal input means; the echo canceller means of each other stage receives the output signal of the echo canceller means of the preceding stage; and each echo canceller means receives one of the output signals of the acoustic signal generating means and outputs a signal in which the output signal component of the acoustic signal output means mixed into the output signal of the acoustic signal input means is reduced.

4. The voice recognition device according to claim 3, wherein the interfering sound removing section comprises: the plurality of stages of echo canceller means; a plurality of signal level detecting means that respectively detect the levels of the output signals of the acoustic signal generating means input to the plurality of echo canceller means; and learning timing determining means that receives the output signals of the plurality of signal level detecting means, determines from the signal levels which of the plurality of stages of echo canceller means is to perform learning for improving echo canceller performance, and determines the learning timing.

5. The voice recognition device according to claim 1 or 2, wherein the interfering sound removing section comprises: a plurality of filter coefficient calculating means, each of which receives one of the plurality of output signals from the acoustic signal generating means and the output signal of the acoustic signal input means, and calculates the filter coefficients of a filter that attenuates the output signal of one acoustic signal output means mixed into the output signal of the acoustic signal input means; a plurality of signal level detecting means that respectively detect the levels of the output signals from the acoustic signal generating means that are input to the plurality of filter coefficient calculating means; learning timing determining means that receives the output signals of the plurality of signal level detecting means, determines from the signal levels which filter coefficient calculating means is to learn its filter coefficients, and determines the learning timing; filter coefficient integrating means that integrates the filter coefficients calculated by the plurality of filter coefficient calculating means; and filtering means that receives the output signals of the filter coefficient integrating means and the acoustic signal input means and reduces the output signal components of the acoustic signal output means using the filter coefficients obtained by the filter coefficient integrating means.

6. The voice recognition device according to claim 4 or 5, wherein, when all but one of the output signals of the plurality of signal level detecting means are below a threshold value and the remaining one output signal is at or above a separately set threshold value, the learning timing determining means causes learning by the echo canceller means or the filter coefficient calculating means that receives the acoustic signal input to that signal level detecting means.

7. The voice recognition device according to claim 1, further comprising noise suppressing means that receives the output signal of the acoustic signal input means and suppresses noise, wherein the interfering sound removing section receives the output signal of the noise suppressing means and the output signal of the acoustic signal generating means and reduces the output signal component of the acoustic signal output means mixed into the output signal of the acoustic signal input means.

8. The voice recognition device according to any one of claims 1 to 6, further comprising noise suppressing means that receives the output signal of the interfering sound removing section and suppresses noise, wherein the voice recognition means receives the output signal of the noise suppressing means and recognizes the voice uttered by the speaker.

9. The voice recognition device according to any one of claims 1 to 6, further comprising: first noise suppressing means that receives the output signal of the acoustic signal input means and suppresses noise; and second noise suppressing means that receives the output signal of the interfering sound removing section and suppresses noise, wherein the interfering sound removing section receives the output signal of the first noise suppressing means and the output signal of the acoustic signal generating means and reduces the output signal component of the acoustic signal output means mixed into the output signal of the first noise suppressing means, and the voice recognition means receives the output signal of the second noise suppressing means and recognizes the voice uttered by the speaker.

10. The voice recognition device according to any one of claims 1 to 9, further comprising acoustic signal output control means for controlling the output of the acoustic signal generating means, wherein the acoustic signal output control means receives the output signal of the acoustic signal generating means and controls the signals output to the plurality of acoustic signal output means and to the interfering sound removing section.

11. The voice recognition device according to claim 10, further comprising transfer characteristic measuring signal generating means for generating a signal suitable for measuring a transfer characteristic, wherein the acoustic signal output control means receives the output signal of the acoustic signal generating means and the output signal of the transfer characteristic measuring signal generating means and controls the output so that one of these signals is output.

12. The speech recognition apparatus according to claim 11, wherein the signal output from the transfer characteristic measuring signal generating means is either white noise or an M-sequence signal.

13. The apparatus further comprises noise suppressing means for suppressing noise from information on the output signal of the acoustic signal input means and the signal output from the acoustic signal output control means, and the interfering sound removing section includes the noise reducing means. 11. The output signal of the suppressing means is input and processed.
The voice recognition device according to any one of 2 above.

14. The voice recognition device according to claim 10, further comprising noise suppressing means that suppresses noise using information on the output signal of the interfering sound removing section and the signal output from the acoustic signal output control means, wherein the voice recognition means receives the output signal of the noise suppressing means and recognizes the voice uttered by the speaker.

15. The voice recognition device according to any one of claims 10 to 14, further comprising: synthetic sound generating means that creates a synthetic sound for the speaker to hear; and voice interaction control means that analyzes the voice recognition result and performs control for realizing voice dialogue, wherein the acoustic signal output control means receives the output signal of the acoustic signal generating means, the output signal of the synthetic sound generating means, and the output signal of the voice interaction control means, and outputs one of the output signal of the acoustic signal generating means and the output signal of the synthetic sound generating means.

16. The voice recognition device according to claim 15, further comprising display means for displaying the contents determined by the voice interaction control means.

17. The voice recognition device according to claim 1, wherein the signal output from the acoustic signal generating means is a stereo signal.

18. The voice recognition device according to claim 1, wherein a plurality of signals output from the acoustic signal generating means have the same content.

19. A voice recognition method comprising: an acoustic signal generation procedure for generating an acoustic signal; a plurality of acoustic signal output procedures for outputting the acoustic signal as sound waves; an acoustic signal input procedure for inputting voice; an interfering sound removal procedure that receives the output signal of the acoustic signal input procedure and the output signal of the acoustic signal generation procedure and reduces the output signal components of the plurality of acoustic signal output procedures mixed into the output signal of the acoustic signal input procedure; and a voice recognition procedure that receives the output signal of the interfering sound removal procedure and recognizes the voice.