JP2008020872A

JP2008020872A - Voice recognition device for vehicle and navigation device for vehicle

Info

Publication number: JP2008020872A
Application number: JP2006254358A
Authority: JP
Inventors: Takayuki Nishiwaki; 貴之西脇
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2006-06-14
Filing date: 2006-09-20
Publication date: 2008-01-31

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device for vehicles capable of improving the recognition rate of voice recognition processing performed in a vehicle.

SOLUTION: In a vehicle interior, sound collecting microphones 4 to 7 are provided on a steering wheel 11, on a dashboard 12, in front of a passenger-seat-side speaker 14, and behind a passenger seat 13. A voice separation processor 2 uses a blind source separation (BSS) method to separate the sound signals input through the sound collecting microphones 4 to 7 into a driver voice signal S and other noise signals N1 to N3. A voice recognition processor 3 of the vehicle navigation device 1 performs voice recognition processing on the separated driver voice signal.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a vehicle voice recognition device that uses sound collecting microphones to pick up speech uttered by a vehicle occupant in the vehicle interior and performs voice recognition processing on it, and to a vehicle navigation device equipped with that device.

As voice recognition capabilities have improved, a growing number of recent vehicle navigation devices can be operated by the driver's voice. Prior art relating to voice recognition processing includes, for example, the techniques disclosed in Patent Documents 1 to 5.

However, the vehicle interior is an environment with many noise sources, such as the engine sound and the sound that the car audio system outputs through its speakers, so further improvement is desirable in order to raise the recognition rate. Among the prior art above, the blind source separation (BSS) method disclosed in Patent Documents 4 and 5 can clearly distinguish the speech signal to be recognized from other noise signals even when there is no interval in which only noise is present.

Patent Document 1: JP 2004-069772 A
Patent Document 2: JP 2004-309536 A
Patent Document 3: JP 2004-317942 A
Patent Document 4: JP 2002-023776 A
Patent Document 5: JP 2005-227512 A

Patent Document 5 also suggests applying the BSS method to voice recognition in a vehicle interior, but it does not disclose at all what specific configuration should be used to implement the BSS method in that application.
The present invention has been made in view of the above circumstances. Its object is to provide a vehicle voice recognition device that can further improve the recognition rate of voice recognition processing performed in the vehicle interior, and a vehicle navigation device that includes that vehicle voice recognition device.

According to the vehicle voice recognition device of claim 1, sound collecting microphones are installed at positions that collect at least the driver's voice, the engine sound, and the speaker output sound. The sound signals input through these microphones are separated by the blind source separation method into a driver voice signal and other noise signals, and voice recognition processing is performed on the separated driver voice signal. In the vehicle interior, the main sound sources other than the driver's voice are the engine and the car audio system. Placing microphones so that at least the sounds from those sources are collected and then applying the BSS method therefore makes it possible to effectively separate out noise other than the driver's voice.

According to the vehicle voice recognition device of claim 2, the mixing process by which a signal source s is picked up by the sound collecting microphones is modeled as nonlinear, so that the signal y separated by the blind source separation method is expressed as $y = hs + ks^2$. Solving $ks^2 + hs - y = 0$ for s with the quadratic formula gives

$$s = -\frac{h}{2k} \pm \left(\frac{h^2}{4k^2} + \frac{y}{k}\right)^{1/2}$$

Taking this s as a nonlinear function G, the operation

$$z = G(y) = -\frac{\alpha}{2} \pm \left(\frac{\alpha^2}{4} + \frac{y}{\beta}\right)^{1/2}, \qquad \alpha = \frac{h}{k},\quad \beta = k$$

yields a linearized signal z. When the signal source s emits a periodic acoustic signal, as a speaker does, the mixing process becomes nonlinear. If the coefficients α and β are determined so as to optimize the nonlinear function G, linearizing the separated signal y with G makes it possible to separate the driver voice signal from the other noise signals with higher accuracy.

According to the vehicle voice recognition device of claim 3, the plurality of sound collecting microphones are installed on the steering wheel, on the dashboard, and in front of the passenger-side speaker, respectively. This arrangement is suited to collecting the driver's voice, the engine sound, and the car audio sound, respectively, and reliably captures the main noise in the vehicle interior.

According to the vehicle voice recognition device of claim 4, a sound collecting microphone is also installed at a position that collects the voice of a rear-seat occupant. In a vehicle that has a rear seat occupied by a passenger, that passenger can be expected to speak. Collecting the rear-seat occupant's voice as well therefore allows the driver's voice to be separated more cleanly.
According to the vehicle navigation device of claim 5, in the case of claim 4 the sound collecting microphone is installed behind the passenger seat, so the rear-seat occupant's voice can be collected well.

According to the vehicle voice recognition device of claim 6, the separated noise signals are subtracted from the signals input through the plurality of sound collecting microphones. The driver voice signal obtained by BSS separation still contains a small amount of noise. Feeding the separated noise signals back to the input side and subtracting them from the microphone input signals lowers the level of the incoming noise, so a clearer driver voice signal can be obtained.

According to the vehicle navigation device of claim 7, the device includes the vehicle voice recognition device of any one of claims 1 to 6, and can perform more reliable operation control based on the result of voice recognition processing on the separated driver voice signal.

(First embodiment)
A first embodiment, in which the present invention is applied to a vehicle navigation device, is described below with reference to FIGS. 1 to 6. FIG. 1 is a functional block diagram schematically showing the configuration of the vehicle navigation device. The vehicle navigation device (vehicle voice recognition device) 1 incorporates a voice separation processing unit 2 and a voice recognition processing unit 3. The voice separation processing unit 2 separates the sound signals input from four sound collecting microphones 4 to 7 arranged in the vehicle interior into the voice signal uttered by the driver and other noise signals, and outputs the former to the voice recognition processing unit 3. The voice recognition processing unit 3 then performs voice recognition processing on the driver voice signal it receives.
The recognition result from the voice recognition processing unit 3 is output to a navigation operation control circuit 9, which controls operation of the navigation device 1, for example setting a destination, based on that result.

FIG. 2 shows the arrangement of the sound collecting microphones 4 to 7 in the vehicle interior. The sound collecting microphone 4 is placed on the steering wheel 11 to collect the driver's voice, and the sound collecting microphone 5 is placed on the front dashboard 12 to collect the engine sound. The sound collecting microphone 6 is placed in front of the door speaker 14 on the passenger seat 13 side to collect the sound output by the car audio system, and the sound collecting microphone 7 is placed behind the passenger seat 13 to collect the voice of a rear-seat occupant.
As shown in FIG. 1, the sound collected by each of the microphones 4 to 7 is a mixture of the driver's voice and the other noise, namely the engine sound, the speaker sound, and the rear-seat occupant's voice, although the level ratios differ from microphone to microphone.

Next, the operation of this embodiment is described with reference also to FIGS. 3 to 6. FIG. 3 is a flowchart showing the part of the processing by the navigation device 1 that relates to the gist of the present invention. When the voice separation processing unit 2 starts accepting input of the sound signals collected by the microphones 4 to 7 (step S1), it waits until the input is complete (step S2). The signals collected by the microphones 4 to 7 are denoted x1(i) to x4(i). The variable i (= 1 to n) counts the samples of the input signal, and input is complete once the number of samples reaches n ("YES").

In the following step S3, the unit determines whether the input signal contains a voice signal uttered by the driver. For example, if the input level of the microphone 4 is higher than the input levels of the other microphones 5 to 7 by a certain margin, the input signal can be judged to contain the driver's speech ("YES"). The process then moves to step S4 and performs voice separation processing.
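The level comparison of step S3 could be sketched as follows. This is only an illustration under assumptions that are not in the source: the level is measured as an RMS value over the sample block, and `margin` is a hypothetical threshold factor standing in for "a certain margin".

```python
import math


def rms(samples):
    """Root-mean-square level of one microphone's sample block."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def driver_is_speaking(mic_blocks, margin=2.0):
    """Return True when microphone 4 (index 0) is louder than every other
    microphone by the hypothetical factor `margin` (an assumed threshold)."""
    driver_level = rms(mic_blocks[0])
    return all(driver_level > margin * rms(block) for block in mic_blocks[1:])


# Microphone 4 carries a strong signal, microphones 5 to 7 only weak noise.
blocks = [
    [0.8, -0.7, 0.9, -0.8],    # microphone 4 (steering wheel)
    [0.1, -0.1, 0.1, -0.1],    # microphone 5 (dashboard)
    [0.2, -0.1, 0.2, -0.2],    # microphone 6 (speaker)
    [0.05, 0.05, -0.05, 0.0],  # microphone 7 (behind passenger seat)
]
print(driver_is_speaking(blocks))
```

A real system would have to tune the threshold to the cabin and the microphone gains; the source only says the level must be "somewhat higher".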

In the voice separation processing, described in detail later, the BSS (blind source separation) method is used to separate the voice signal uttered by the driver from the other noise in the signals input to the microphones 4 to 7. Only the separated voice signal is output to the voice recognition processing unit 3, which executes voice recognition processing (step S5). If voice recognition succeeds (step S6: "YES"), the navigation operation control circuit 9 controls operation of the navigation device 1 according to the recognition result (step S7).

FIG. 4 is a flowchart showing the voice separation processing of step S4, and FIG. 5 conceptually illustrates voice separation by the BSS method. When the sound signals (Si) output from a plurality of signal sources S1 to Sn are collected by the same number of microphones X1 to Xn, the signal (Xi) input to each microphone is a linear sum of the sound signals (Si) weighted by mixing coefficients Aij that depend on the transmission environment (X = AS in vector notation). The output signals (Yi), obtained by separating from the microphone inputs (Xi) the sound signals (Si) corresponding to the sources S1 to Sn, are likewise a linear sum of the input signals (Xi) weighted by separation coefficients Wij (Y = WX).
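The two linear relations X = AS and Y = WX can be illustrated with a two-source toy case. The sketch below is an assumption-laden illustration: the mixing matrix `A` and the source samples are made-up values, and W is set to the inverse of A purely to show the ideal outcome. In actual BSS, A is unknown and W has to be learned.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.5], [0.3, 1.0]]                 # hypothetical mixing coefficients Aij
S = [[0.2, -0.4, 0.9], [1.0, 0.5, -0.5]]     # rows: sources, columns: samples
X = matmul(A, S)                             # microphone inputs, X = A S
W = inverse_2x2(A)                           # ideal separation coefficients
Y = matmul(W, X)                             # separated outputs, Y = W X
T = matmul(W, A)                             # T = W A, the identity when separation is perfect
print(T)
print(Y)
```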

Combining the two expressions gives Y = WAS, and with T = WA, Y = TS. When the matrix T is the identity matrix, as shown in FIG. 5, the signals can be said to be completely separated. To achieve this separation, BSS uses the natural gradient method. The natural gradient method is a learning algorithm that minimizes mutual information, and the learning update rule for obtaining the separation coefficients W is

$$W(t+1) = W(t) + \eta\left[\Lambda(t) - f(Y(t))\,Y^{T}(t)\right]W(t) \qquad (1)$$

where η is the learning rate, Λ(t) is a diagonal matrix,

$$f(Y) = -\frac{dp(Y)/dY}{p(Y)},$$

and p(Y) is the probability density function of the output signal Y. The variable t indexes the processing steps in time. With the BSS method, the signals can be separated even when the mixing coefficients Aij are unknown.
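A single application of update rule (1) might look like the sketch below. Two choices here are assumptions and are not specified in the source: Λ(t) is taken to be the identity matrix, and f(y) = tanh(y), which is the score function −p′(y)/p(y) of a density proportional to 1/cosh(y).

```python
import math


def natural_gradient_step(W, x, eta=0.01):
    """Apply equation (1) once for a single sample vector x.

    Assumptions (not from the source): Lambda(t) is the identity matrix
    and f(y) = tanh(y).
    Returns the updated matrix W(t+1) and the separated output y = W x.
    """
    n = len(W)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]   # Y = W X
    f = [math.tanh(v) for v in y]
    # Bracketed term of equation (1): Lambda - f(Y) Y^T
    G = [[(1.0 if i == j else 0.0) - f[i] * y[j] for j in range(n)]
         for i in range(n)]
    # W(t+1) = W(t) + eta * G * W(t)
    GW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    W_next = [[W[i][j] + eta * GW[i][j] for j in range(n)] for i in range(n)]
    return W_next, y


W0 = [[1.0, 0.0], [0.0, 1.0]]
W1, y1 = natural_gradient_step(W0, [1.0, 0.0], eta=0.1)
print(W1)
```

In practice the step is repeated over many samples so that W converges; the choice of f should match the statistics of the sources being separated.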

In FIG. 4, the voice separation processing unit 2 first initializes the variable i to "0" (step S11) and then increments it (step S12). It repeats the subsequent processing until i reaches n, the total number of samples (step S13: "NO").

In step S14, the voice separation processing unit 2 subtracts the sum of the noise components N1(i-1) to N3(i-1) separated in the previous iteration from the input signals x1(i) to x4(i), obtaining x'1(i) to x'4(i). This processing is illustrated in FIG. 6. The voice signal obtained by BSS separation still contains a small noise component. The separated noise components are therefore fed back to the input side and subtracted from the input signals before voice separation is performed, which further lowers the noise level in the output signal.
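The feedback subtraction of step S14 is simple enough to show directly. In this minimal sketch the function name and the sample values are hypothetical; only the operation itself (subtracting the summed previous noise estimates from each microphone input) comes from the text.

```python
def subtract_noise_feedback(x_now, noise_prev):
    """Subtract the sum of the previous noise estimates N1(i-1)..N3(i-1)
    from every microphone input x1(i)..x4(i), giving x'1(i)..x'4(i)."""
    total_noise = sum(noise_prev)
    return [x - total_noise for x in x_now]


x_i = [0.9, 0.4, 0.5, 0.3]          # x1(i)..x4(i), made-up sample values
noise_prev = [0.10, 0.05, 0.02]     # N1(i-1)..N3(i-1), made-up sample values
x_prime = subtract_noise_feedback(x_i, noise_prev)
print(x_prime)
```

For the very first sample no previous noise estimate exists, so an implementation would presumably start from zeros; the source does not state this.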

Referring again to FIG. 4, in the following step S15 the voice separation processing unit 2 multiplies the input signal X'(i) by the separation coefficient W(i) obtained in the previous iteration to obtain the separated output signal Y(i). Quantities written with capital letters denote matrices. The unit then evaluates equation (1) to obtain the separation coefficient W(i+1) for the next iteration (step S16). In step S16 the variable t of equation (1) is written as i.

In the following steps S17 to S20, voice identification processing is performed. With the BSS method, it is not possible to tell on which of the separated outputs shown in FIG. 5 the driver's voice signal will appear. In the configuration of FIG. 1, four source-separated outputs are obtained for the four inputs (microphones 4 to 7), but it is not known which of the four is the driver's voice. The technique disclosed in Patent Document 4 is therefore used to identify which separated output signal is the driver's voice signal. To do so, the variable i is replaced by a variable j, and the voice identification processing of step S18 is executed until j reaches the current value of i.

Briefly, the voice identification processing calculates and compares the kurtosis of the probability distribution of each separated output signal. The signal with the largest kurtosis is taken as the driver's voice signal S(i), and the remaining signals as the noise N1(i) to N3(i) (step S21). Because the noise signals N1(i) to N3(i) are used only for subtracting their sum in step S14, as described above, there is no need to identify which source produced each one. Once the voice signal has been identified, the process returns to step S12, increments i, and continues.
Having obtained the driver's voice signal S(i) in this way, the voice separation processing unit 2 outputs it to the voice recognition processing unit 3, which executes the voice recognition processing of step S5.
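Selecting the output with the largest kurtosis could be sketched as below. The kurtosis definition used here (fourth central moment divided by the squared variance) is an assumption; the source defers the details to Patent Document 4. The three sample sequences are made-up data: a square wave, a sparse spiky signal, and a ramp.

```python
def kurtosis(samples):
    """Sample kurtosis m4 / m2^2 (assumed definition, not from the source)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2) if m2 > 0.0 else 0.0


def pick_driver_output(outputs):
    """Index of the separated output with the largest kurtosis."""
    scores = [kurtosis(y) for y in outputs]
    return max(range(len(outputs)), key=lambda i: scores[i])


outputs = [
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],   # square wave
    [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, -4.0, 0.0],      # sparse, spiky signal
    [-3.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 3.0],    # ramp
]
driver_index = pick_driver_output(outputs)
noise_outputs = [y for i, y in enumerate(outputs) if i != driver_index]
print(driver_index)
```

Speech tends to have a peaked, heavy-tailed amplitude distribution, which is why a high kurtosis is a plausible marker for the voice channel.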

As described above, in this embodiment the sound collecting microphones 4 to 7 are installed in the vehicle interior on the steering wheel 11, on the dashboard 12, in front of the passenger-side speaker 14, and behind the passenger seat 13, so as to collect the driver's voice, the engine sound, the speaker output sound, and the voice of a rear-seat occupant, respectively. The voice separation processing unit 2 uses the BSS method to separate the sound signals input through the microphones 4 to 7 into the driver voice signal S and the other noise signals N1 to N3, and the voice recognition processing unit 3 of the navigation device 1 performs voice recognition processing on the separated driver voice signal.
In the vehicle interior, the main sound sources other than the driver's voice are the engine, the car audio system, and rear-seat occupants. Arranging the microphones 4 to 7 so that at least the sounds from those sources are collected and applying the BSS method therefore makes it possible to separate the noise effectively.

In addition, because the voice separation processing unit 2 subtracts the separated noise signals from the signals input through the microphones 4 to 7, the level of the noise contained in those input signals is reduced and a clearer driver voice signal can be obtained. The navigation device 1 can then perform more reliable operation control based on the result of voice recognition processing on the separated driver voice signal.

(Second embodiment)
FIGS. 7 to 9 show a second embodiment of the present invention. Parts identical to those of the first embodiment are given the same reference numerals and their description is omitted; only the differences are described below. The first embodiment assumes that the process by which the voice signal and the noise signals are mixed and input to the microphones (the mixing process) is linear. That is, the model is

$$x_i = a_{i1} s_1 + a_{i2} s_2 + \cdots \qquad (2)$$

(i = 1 to 4).

However, when, as in a vehicle interior, a noise source such as a speaker emits a periodic acoustic signal, the mixing process may be nonlinear, and the signal input to the microphone then contains nonlinear components as well. That is, the model becomes

$$x_i = a_{i1} s_1 + a_{i2} s_2 + a_{i1}^2 s_1^2 + a_{i2}^2 s_2^2 + \cdots \qquad (3)$$

In practice, terms of third order and higher are extremely small and can be neglected without any problem, so the nonlinear function $F_i(u)$ acting on the signal u, in which the signal sources s are mixed by the coefficients a, is assumed to be

$$F_i(u_i) = a_i u_i + b_i u_i^2 \qquad (4)$$

The coefficient $a_i$ in equation (4) is not the mixing coefficient shown in FIG. 7 but a coefficient that describes the nonlinear function $F_i(u_i)$.
In the second embodiment, therefore, a nonlinear function $G_i$ is applied to the signal $Y_i$ separated by the BSS method as in the first embodiment, linearizing $Y_i$ to obtain a signal $Z_i$.

FIG. 7 corresponds to FIG. 5 and models the case where the mixing process is nonlinear, including the linearization processing. To simplify the nonlinear function Gi, the signal sources are reduced to two, S1 and S2: the voice signal, and all the other noise combined into one (so the index i for the signal S, the function G, and the signal Z takes the values i = 1, 2). For example, if Y1 is the voice signal S of the first embodiment, Y2 is the sum of noises 1 to 3, that is,

$$Y_2 = N_1 + N_2 + N_3 \qquad (5)$$

FIG. 8 corresponds to FIG. 3, except that a "linearization processing" step S8 is inserted between steps S4 and S5.
The linearization processing performed in step S8 is described in more detail below. All signals are written in lower case from here on. The mixed signal $x_i$ obtained through the nonlinear function $F_i(u_i)$ in the mixing process is

$$x_i = c_i s_1 + d_i s_2 + e_i s_1^2 + f_i s_1 s_2 + g_i s_2^2 \qquad (6)$$

where c, d, e, f, and g are the coefficients of the respective terms.
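The five-term form of equation (6) follows from substituting a two-source linear mixture into equation (4). The derivation below is a sketch under one assumption: that $u_i = a_{i1}s_1 + a_{i2}s_2$, which is equation (2) restricted to two sources. The explicit expressions for the coefficients are not stated in the source.

```latex
\begin{aligned}
x_i &= F_i(u_i) = a_i u_i + b_i u_i^2, \qquad u_i = a_{i1}s_1 + a_{i2}s_2 \\
    &= a_i a_{i1}\, s_1 + a_i a_{i2}\, s_2
     + b_i a_{i1}^2\, s_1^2 + 2\, b_i a_{i1} a_{i2}\, s_1 s_2 + b_i a_{i2}^2\, s_2^2
\end{aligned}
```

Under that assumption, comparison with equation (6) gives $c_i = a_i a_{i1}$, $d_i = a_i a_{i2}$, $e_i = b_i a_{i1}^2$, $f_i = 2 b_i a_{i1} a_{i2}$, and $g_i = b_i a_{i2}^2$, which also shows where the cross term $s_1 s_2$ comes from.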

Two independent signals ($s_1$, $s_2$) can in general be said to remain independent even when their higher-order terms ($s_1^2$, $s_2^2$) are included. The signals $y_i$ obtained by separating the mixed signals $x_i$ into independent components are therefore

$$y_1 = h_1 s_1 + k_1 s_1^2 \qquad (7)$$

$$y_2 = h_2 s_2 + k_2 s_2^2 \qquad (8)$$

where h and k are arbitrary provisional coefficients. To derive equation (7), for example, the three terms in $s_2$, $s_1 s_2$, and $s_2^2$ must be eliminated from equation (6). Because there are four microphones, four simultaneous equations can be set up, and the three unwanted terms can be eliminated on that basis.
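The elimination of the three unwanted terms can be written out explicitly. This is a sketch under an assumption not stated in the source: that $y_1$ is formed as a weighted sum of the four microphone signals of equation (6), with hypothetical weights $w_1$ to $w_4$.

```latex
\begin{aligned}
y_1 = \sum_{i=1}^{4} w_i x_i
 &= \Big(\sum_i w_i c_i\Big) s_1 + \Big(\sum_i w_i d_i\Big) s_2
  + \Big(\sum_i w_i e_i\Big) s_1^2 + \Big(\sum_i w_i f_i\Big) s_1 s_2
  + \Big(\sum_i w_i g_i\Big) s_2^2 \\
\text{require}\quad & \sum_i w_i d_i = 0, \qquad \sum_i w_i f_i = 0, \qquad \sum_i w_i g_i = 0 \\
\Rightarrow\quad y_1 &= h_1 s_1 + k_1 s_1^2, \qquad
 h_1 = \sum_i w_i c_i, \quad k_1 = \sum_i w_i e_i
\end{aligned}
```

Three homogeneous linear conditions on four weights always admit a nonzero solution, which is why four microphones are enough. The overall scale of the weights remains free, consistent with h and k being described as arbitrary provisional coefficients.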

Solving the quadratic equation in $s_i$

$$k_i s_i^2 + h_i s_i - y_i = 0 \qquad (9)$$

for the original signal $s_i$ with the quadratic formula gives

$$s_i = -\frac{h_i}{2k_i} \pm \left(\frac{h_i^2}{4k_i^2} + \frac{y_i}{k_i}\right)^{1/2} \qquad (10)$$

Taking equation (10) as the nonlinear function $G_i(y_i)$, the operation

$$z_i = G_i(y_i) = -\frac{\alpha_i}{2} \pm \left(\frac{\alpha_i^2}{4} + \frac{y_i}{\beta_i}\right)^{1/2}, \qquad \alpha_i = \frac{h_i}{k_i},\quad \beta_i = k_i \qquad (11)$$

yields the linearized signal $z_i$. In other words, the nonlinear function $G_i$ acts as an inverse of the nonlinear function $F_i$.
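That equation (10) undoes the quadratic model of equations (7) and (8) can be checked numerically. The sketch below works directly with h and k. Two things are assumptions: the coefficient values are made up, and the "+" root is chosen, which recovers the original signal whenever h and k are positive and s is at least −h/(2k). The source leaves the sign open.

```python
import math


def forward_quadratic(s, h, k):
    """Quadratic model of equations (7)/(8): y = h*s + k*s^2."""
    return h * s + k * s * s


def inverse_quadratic(y, h, k):
    """Equation (10) with the '+' root (an assumed choice of sign).

    Valid for h > 0, k > 0 and original signal values s >= -h / (2k).
    """
    half_ratio = h / (2.0 * k)
    return -half_ratio + math.sqrt(half_ratio * half_ratio + y / k)


h, k = 1.0, 0.1                      # hypothetical coefficients
originals = [-0.5, 0.0, 0.5, 2.0]
recovered = [inverse_quadratic(forward_quadratic(s, h, k), h, k)
             for s in originals]
print(recovered)
```

The round trip returns the original samples up to floating-point error, which is what "inverse of the nonlinear function" means in practice.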

The signal $y_i$ separated by the BSS method contains a first-order term $s_i$ and a second-order term $s_i^2$, as equations (7) to (9) show. For a periodic signal such as speech, the mean of the first-order term is "0", whereas the mean of the second-order term has a nonzero value. Consequently, if the mean of the separated signal $y_i$ becomes "0", the second-order term is also "0", so the second-order term can be eliminated and only the first-order term extracted. The mean of the final output signal $z_i$ is therefore used as the error function $E_i(t)$, where t is a variable indexing the computation steps in time.
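The claim about the two means can be checked with a pure sine wave. This is a minimal sketch in which the test signal, the number of samples, and the coefficients h and k are all made-up values, not taken from the source.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


n = 1000
# Five full periods of a sine wave, standing in for a periodic source signal.
s = [math.sin(2.0 * math.pi * 5 * i / n) for i in range(n)]
h, k = 1.0, 0.1                                  # hypothetical coefficients
y = [h * v + k * v * v for v in s]               # y = h*s + k*s^2

mean_first = mean(s)                             # first-order term
mean_second = mean([v * v for v in s])           # second-order term
mean_y = mean(y)                                 # equals k * mean_second here
print(mean_first, mean_second, mean_y)
```

The first-order mean vanishes while the second-order mean stays near 0.5, so any residual mean in the signal is attributable to the quadratic term, which is the property the error function relies on.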

The update formulas for the coefficients α and β are as follows.

In equations (13) and (14), η is the learning rate (step size), which sets how strongly the coefficients are updated. When η is small, learning (updating) takes time but reliably reaches the optimum. When η is large, learning takes little time, but the values may diverge near the optimum and never reach it. M is the number of samples used in one computation, for example about 500 to 1000.

FIG. 9 is a flowchart showing the details of the "linearization processing" in step S8 of FIG. 8. The error function Ei(t), which is the mean of the output signal zi(t), is computed (step S31), and it is determined whether the result equals the optimum value (= 0) (step S32). If it does not ("NO"), the coefficients α and β are updated using equations (13) and (14) (step S33), and the process returns to step S31 to compute the error function Ei(t) again.

When repeating the above processing results in Ei(t) = 0 in step S32 ("YES"), the coefficients α and β are fixed at that point and the nonlinear function $G_i(y_i)$ is obtained (step S34). Evaluating $G_i(y_i)$ then yields an output signal $z_i$ that no longer contains the second-order term $s_i^2$ present in the separated signal $y_i$ and has the form of the first-order term $s_i$ (the original signal) multiplied by, for example, a predetermined coefficient q, that is,

$$z_i = G_i(y_i) = q \cdot s_i \qquad (17)$$

(step S35).

As described above, in the second embodiment the mixing process by which the signal source s is picked up by the microphones 4 to 7 is modeled as nonlinear, so that the signal y separated by the BSS method is expressed as $y = hs + ks^2$. Based on that model the nonlinear function Gi is defined as in equation (10), the coefficients α and β are optimized by learning, and the separated signal y is linearized to obtain the signal zi. The driver's voice signal and the other noise signals can therefore be separated from each other with higher accuracy.

The present invention is not limited to the embodiments described above or shown in the drawings, and the following modifications are possible.
The subtraction of the noise N1(i) to N3(i) in step S14 may be performed only when needed. When it is omitted, x'1(i) to x'4(i) are simply replaced with x1(i) to x4(i).
The arrangement of the sound collecting microphones 4 to 7 is only an example. It may be changed as appropriate for different vehicle interior layouts, provided that the driver's voice, the engine sound, the speaker output sound, and the rear-seat occupant's voice can each be collected well.

The sound collecting microphone 7 for collecting the voice of a rear-seat occupant may be installed only as necessary. For example, it may be omitted in a vehicle whose rear seat is rarely occupied. It is also unnecessary in a two-seater vehicle.
The number of sound collecting microphones arranged in the vehicle interior may also be five or more.
The invention is not limited to vehicle navigation devices. It may also be applied to systems such as car audio systems or car air conditioners, so that their operation is controlled by voice recognition.

FIG. 1 is a functional block diagram schematically showing the configuration of a navigation device according to an embodiment in which the present invention is applied to a vehicle navigation device.
FIG. 2 is a diagram showing the arrangement of the sound collecting microphones in the vehicle interior.
FIG. 3 is a flowchart showing the processing performed by the navigation device, for the portion relating to the gist of the present invention.
FIG. 4 is a flowchart showing the content of the voice separation processing in step S4 of FIG. 3.
FIG. 5 is a diagram explaining voice separation processing by the BSS method.
FIG. 6 is a diagram explaining feedback processing of the separated noise signals.
FIG. 7 is a diagram corresponding to FIG. 5, showing a second embodiment of the present invention.
FIG. 8 is a diagram corresponding to FIG. 3.
FIG. 9 is a flowchart showing the details of the "linearization process" in step S8 of FIG. 8.

Explanation of symbols

In the drawings, 1 denotes a vehicle navigation device (vehicle voice recognition device), 2 a voice separation processing unit, 3 a voice recognition processing unit, 4 to 7 sound collecting microphones, and 9 a navigation operation control circuit.

Claims

A vehicle voice recognition device that collects, with sound collecting microphones, voice uttered by a vehicle occupant in a vehicle interior and performs voice recognition processing, wherein
a plurality of the sound collecting microphones are installed at positions for collecting at least the driver's voice, engine sound, and speaker output sound, the sound signals input through the plurality of sound collecting microphones are separated into a driver voice signal and noise signals by a blind source separation method, and voice recognition processing is performed on the driver voice signal.

The vehicle voice recognition device according to claim 1, wherein the signal y separated by the blind source separation method is expressed by the following equation, obtained by nonlinearly modeling the mixing process by which a signal source s is picked up by the sound collecting microphones,
y = hs + ks² (h and k are arbitrary provisional coefficients)
and a linearized signal z is obtained by evaluating the nonlinear function G given by solving the above equation for s,
z = G(y) = −α/2 ± (α²/4 + y/β)^(1/2)
(α = h/k, β = 1/k).

The vehicle voice recognition device according to claim 1 or 2, wherein the plurality of sound collecting microphones are installed at the steering wheel, at the dashboard, and in front of the passenger-seat-side speaker, respectively.

The vehicle voice recognition device according to any one of claims 1 to 3, wherein a sound collecting microphone is also installed at a position for collecting the voice of a rear-seat occupant.

The vehicle voice recognition device according to claim 4, wherein that sound collecting microphone is installed on the back side of the passenger seat.

The vehicle voice recognition device according to any one of claims 1 to 5, wherein the separated noise signals are subtracted from the signals input through the plurality of sound collecting microphones.

A vehicle navigation device comprising the vehicle voice recognition device according to any one of claims 1 to 6, wherein operation control is performed based on the result of the voice recognition processing on the driver voice signal.