JP2008145676A

JP2008145676A - Speech recognition device and vehicle navigation device

Info

Publication number: JP2008145676A
Application number: JP2006331876A
Authority: JP
Inventors: Takashi Ishizaki; 貴士石嵜
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2006-12-08
Filing date: 2006-12-08
Publication date: 2008-06-26

Abstract

PROBLEM TO BE SOLVED: To reduce the user's burden of uttering phrases while promptly recognizing the phrase the user actually wants recognized.

SOLUTION: A speech recognition device 10 detects the position at which the user holds a steering wheel 16. From among a plurality of speech dictionaries 15a to 15c, it switches in the dictionary that corresponds to that holding position as the active dictionary. When speech uttered by the user is input, the device recognizes it by collating the speech waveform of the input speech with the approximate waveforms of the phrases registered in the active dictionary. The user therefore does not need to utter a separate phrase to select one of the speech dictionaries 15a to 15c, and needs to utter only the phrase to be recognized.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a speech recognition device and a vehicle navigation device. Both recognize speech uttered by a user by collating the speech waveform of that speech with the approximate waveforms of phrases registered in a speech dictionary. That dictionary has been switched in as the active dictionary from among a plurality of speech dictionaries.

A speech recognition device mounted in, for example, a vehicle navigation device is configured to recognize speech uttered by a user by collating the speech waveform of the utterance with the approximate waveforms of phrases registered in a speech dictionary (see, for example, Patent Document 1).

Patent Document 1: JP 2005-121966 A

In recent years, as functions have been added, speech recognition has come to be used in a wide variety of situations. Accordingly, configurations have been provided in which several speech dictionaries are prepared for different purposes, such as one for addresses, one for facility names and one for navigation function control. One of these dictionaries is first switched in as the active dictionary. The speech waveform of the speech to be recognized is then collated with the approximate waveforms of the phrases registered in that dictionary.

In this case, the user first utters a phrase that switches one of the speech dictionaries in as the active dictionary, and then utters the phrase to be recognized. For example, the user first says "jūsho" ("address") to switch in the address dictionary. The user then says "a-i-chi-ke-n" to have the phrase "Aichi-ken" (Aichi Prefecture) recognized.

However, this approach requires two phrases to be recognized: one to switch the active dictionary and one to be recognized. Uttering two phrases is troublesome for the user. It also takes longer before the desired phrase is recognized, so operability suffers.

The present invention has been made in view of the above circumstances. Its object is to provide a speech recognition device and a vehicle navigation device that reduce the user's burden of uttering phrases and promptly recognize the phrase the user wants recognized, thereby improving operability.

According to the invention of claim 1, operation position detection means detects the user's operation position from among a plurality of operation positions. These operation positions are set on an operation target and correspond to the plurality of speech dictionaries. Speech dictionary switching means then switches in, as the active dictionary, the speech dictionary corresponding to the detected operation position. When speech input means receives speech uttered by the user, speech recognition means recognizes that speech. It does so by collating the speech waveform with the approximate waveforms of the phrases registered in the speech dictionary switched in by the speech dictionary switching means.

Because one of the speech dictionaries is switched in as the active dictionary according to the user's operation position, the user no longer needs to utter a phrase to select a dictionary. The user needs to utter only the phrase to be recognized. This reduces the user's burden of uttering phrases, allows the desired phrase to be recognized promptly, and improves operability.

According to the invention of claim 2, the speech dictionary switching means switches in the speech dictionary corresponding to the detected operation position before the device enters the speech recognition standby state. One of the speech dictionaries is therefore already active when the device enters standby, so the phrase the user wants recognized can be recognized even more promptly.

According to the invention of claim 3, the operation target is the steering wheel. The operation position detection means detects the position at which the user grips the steering wheel, and the speech dictionary switching means switches in the speech dictionary corresponding to the detected grip position. The user can thus select any of the speech dictionaries simply by changing where the steering wheel is gripped.

According to the invention of claim 4, the operation target is only a part of the steering wheel. The user can select a speech dictionary by gripping the designated part with one hand, while steering with the other hand on the rest of the wheel. Dictionaries can therefore be switched easily without interfering with driving.

According to the invention of claim 5, grip pressure detection means detects the pressure with which the user grips the steering wheel. The device enters the speech recognition standby state when the detected grip pressure is equal to or greater than a threshold. The user can therefore enter standby simply by squeezing the wheel at or above the threshold, without, for example, taking a hand off the wheel to operate a separate switch. This further improves operability.

According to the invention of claim 6, speech dictionary assignment switching means changes the assignment of speech dictionaries to grip positions on the steering wheel. Operability can be further improved by, for example, assigning a frequently used dictionary to a position on the wheel that is easy for the user to grip.

An embodiment of the present invention is described below with reference to the drawings. FIG. 2 is a functional block diagram of the vehicle navigation device. The vehicle navigation device 1 comprises the following components: a control device 2, a position detector 3, a map data storage device 4, an operation switch group 5, a vehicle signal input/output unit 6, a communication device 7, a VICS receiver 8, an audio control device 9, a speech recognition device 10, a display device 11, a memory 12 and a remote control sensor 13.

The control device 2 comprises a CPU, ROM, RAM, an I/O interface and a bus connecting them (none shown), and controls the overall operation of the vehicle navigation device 1. The position detector 3 comprises a G sensor 3a, a gyroscope 3b, a distance sensor 3c and a GPS receiver 3d. The detection errors of these components differ in nature. The control device 2 therefore combines their detection signals so that each compensates for the others, and determines the current position and heading of the host vehicle.

The map data storage device 4 stores map data transferred from a recording medium 14 such as a DVD-ROM. The recording medium 14 holds the map data used for route guidance. As shown in FIG. 1, it also holds a plurality of purpose-specific speech dictionaries (three in this embodiment): an address dictionary 15a, a facility name dictionary 15b and a navigation function control dictionary 15c. The recording medium 14 may instead be, for example, an HDD or a memory card. The operation switch group 5 comprises three kinds of switch: mechanical switches arranged around the display device 11; touch switches formed on the display of the display device 11 (for example, a color liquid crystal display); and a PTT (push-to-talk) switch arranged at a predetermined position in the vehicle cabin.

The vehicle signal input/output unit 6 exchanges signals with the vehicle's various sensors and on-board devices. For example, it receives a vehicle speed signal from a vehicle speed sensor. It also receives a grip position signal from the steering wheel 16 (the operation target in the present invention), indicating where the user is gripping the steering wheel 16 (the grip position). As shown in FIG. 3, three curved pressure sensors 17a to 17c are embedded at equal intervals in the left half of the annular rim of the steering wheel 16, as seen from the front. When the user grips the part of the steering wheel 16 containing one of the pressure sensors 17a to 17c, that sensor may detect a pressure equal to or greater than a predetermined value. It then outputs a grip position signal indicating that the user is gripping that part. The grip position signal also includes a numerical value representing the grip pressure.
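The following is a minimal Python sketch of one way the grip position signal might be interpreted. It assumes that each pressure sensor reports a single numeric reading. It also assumes that when one hand covers two sensors, the sensor pressed hardest is taken as the grip position. The sensor IDs reuse the reference numerals. `detect_grip` and `GRIP_DETECT_THRESHOLD` are hypothetical names. The threshold value is arbitrary, because the source says only "a predetermined value".

```python
from typing import Mapping, Optional, Tuple

# Hypothetical value: the source only says "a predetermined value".
GRIP_DETECT_THRESHOLD = 5.0


def detect_grip(pressures: Mapping[str, float],
                threshold: float = GRIP_DETECT_THRESHOLD) -> Optional[Tuple[str, float]]:
    """Return (sensor_id, pressure) for the gripped sensor, or None if no grip.

    `pressures` maps a sensor ID ("17a", "17b", "17c") to its current reading.
    A sensor counts as gripped only when its reading reaches `threshold`.
    """
    gripped = {sid: p for sid, p in pressures.items() if p >= threshold}
    if not gripped:
        return None
    # If one hand covers two sensors, take the one pressed hardest (assumption).
    sensor_id = max(gripped, key=gripped.get)
    return sensor_id, gripped[sensor_id]


print(detect_grip({"17a": 0.0, "17b": 12.5, "17c": 3.0}))  # ('17b', 12.5)
```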

The parts in which the pressure sensors 17a to 17c are embedded correspond to the three speech dictionaries 15a to 15c recorded on the recording medium 14. Specifically, the part containing pressure sensor 17a corresponds to the address dictionary 15a, and the part containing pressure sensor 17b corresponds to the facility name dictionary 15b. The part containing pressure sensor 17c corresponds to the navigation function control dictionary 15c.
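The correspondence shown in FIG. 3 amounts to a lookup table. The sketch below models it in Python, assuming grip positions are identified by sensor ID. `Dictionary`, `DEFAULT_ASSIGNMENT` and `dictionary_for_grip` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional


class Dictionary(Enum):
    """Speech dictionaries recorded on recording medium 14."""
    ADDRESS = "15a"
    FACILITY_NAME = "15b"
    NAV_CONTROL = "15c"


# Default assignment of FIG. 3: pressure sensor (grip position) -> dictionary.
DEFAULT_ASSIGNMENT: Dict[str, Dictionary] = {
    "17a": Dictionary.ADDRESS,
    "17b": Dictionary.FACILITY_NAME,
    "17c": Dictionary.NAV_CONTROL,
}


def dictionary_for_grip(sensor_id: Optional[str],
                        assignment: Dict[str, Dictionary] = DEFAULT_ASSIGNMENT
                        ) -> Optional[Dictionary]:
    """Return the dictionary assigned to a grip position, or None if there is none."""
    if sensor_id is None:
        return None
    return assignment.get(sensor_id)


print(dictionary_for_grip("17b"))  # Dictionary.FACILITY_NAME
```

Passing the table as a parameter keeps the lookup independent of any later reassignment of dictionaries to grip positions.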

The communication device 7 communicates over a mobile communication network. The VICS receiver 8 receives VICS (Vehicle Information and Communication System) traffic information from outside the vehicle. The audio control device 9 controls the audio input through a microphone 18 and the audio output from a speaker 19. Both are arranged at predetermined positions in the vehicle cabin.

As shown in FIG. 1, the speech recognition device 10 comprises the following functional units:
a grip position detection unit 20a (the operation position detection means and grip pressure detection means in the present invention);
a speech dictionary switching unit 20b (the speech dictionary switching means and speech dictionary assignment switching means);
a speech dictionary storage unit 20c;
a speech input unit 20d (the speech input means);
a speech recognition unit 20e (the speech recognition means); and
a recognition result output unit 20f.

The grip position detection unit 20a receives the grip position signal from the steering wheel 16 via the vehicle signal input/output unit 6. It analyzes the signal to detect both the position at which the user is gripping the steering wheel 16 (the grip position) and the grip pressure. The speech dictionary switching unit 20b switches in (selects) one of the three dictionaries recorded on the recording medium 14 as the active dictionary: the address dictionary 15a, the facility name dictionary 15b or the navigation function control dictionary 15c. The user may also perform an operation, for example on the operation switch group 5, to change the dictionary assignment. The speech dictionary switching unit 20b then changes the correspondence (assignment) between the parts in which the pressure sensors 17a to 17c are embedded and the three speech dictionaries 15a to 15c. The speech dictionary storage unit 20c stores the dictionary that the speech dictionary switching unit 20b has switched in as the active dictionary.
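The reassignment performed by the speech dictionary switching unit 20b can be pictured as editing the lookup table. The minimal sketch below assumes that the user's operation swaps the dictionaries at two grip positions; the source does not specify what the operation is. `swap_assignment` is a hypothetical helper.

```python
from typing import Dict


def swap_assignment(assignment: Dict[str, str], pos_a: str, pos_b: str) -> Dict[str, str]:
    """Return a copy of `assignment` with the dictionaries at two grip positions swapped.

    `assignment` maps a grip position (sensor ID) to a dictionary ID.
    The original mapping is left unchanged.
    """
    if pos_a not in assignment or pos_b not in assignment:
        raise KeyError(f"unknown grip position: {pos_a!r} or {pos_b!r}")
    swapped = dict(assignment)
    swapped[pos_a], swapped[pos_b] = assignment[pos_b], assignment[pos_a]
    return swapped


default = {"17a": "15a", "17b": "15b", "17c": "15c"}
print(swap_assignment(default, "17a", "17c"))  # {'17a': '15c', '17b': '15b', '17c': '15a'}
```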

The speech input unit 20d receives the speech picked up by the microphone 18 via the audio control device 9. When the speech input unit 20d receives speech, the speech recognition unit 20e refers to the speech dictionary held in the speech dictionary storage unit 20c. This is the dictionary that the speech dictionary switching unit 20b has switched in as the active dictionary. The speech recognition unit 20e recognizes the user's speech by collating the speech waveform of the input speech with the approximate waveforms of the phrases registered in that dictionary. The recognition result output unit 20f outputs the result recognized by the speech recognition unit 20e to the control device 2.
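The source does not say how the waveform collation is scored. As one illustration only, the sketch below substitutes dynamic time warping (DTW) over one-dimensional feature sequences. It selects the closest registered phrase in the active dictionary and rejects the match if the distance exceeds an optional threshold. Practical recognizers normally use richer acoustic features and models. All names and template values here are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features: Sequence[float],
              active_dictionary: Dict[str, Sequence[float]],
              reject_above: float = math.inf) -> Optional[str]:
    """Return the phrase whose template is closest to `features`, or None."""
    best_phrase, best_dist = None, math.inf
    for phrase, template in active_dictionary.items():
        dist = dtw_distance(features, template)
        if dist < best_dist:
            best_phrase, best_dist = phrase, dist
    return best_phrase if best_dist <= reject_above else None


templates = {"Aichi-ken": [1, 2, 3, 2, 1], "Tokyo-to": [5, 5, 5, 5]}
print(recognize([1, 1, 2, 3, 3, 2, 1], templates))  # Aichi-ken
```

DTW tolerates differences in speaking rate: in the example, the input repeats two of the template's values and still aligns with zero cost.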

Based on the recognition result received from the speech recognition device 10, the control device 2 performs operations such as searching for a destination or changing the scale of the map display. The speech recognition device 10 enters the speech recognition standby state, for example while the PTT switch is held down, and can then perform the recognition described above.

The display device 11 is, for example, a color liquid crystal display. It superimposes a current position mark and the travel path of the host vehicle on a map drawn from the map data. The display device 11 may instead be an organic EL display, a plasma display or the like. The memory 12 is, for example, a removable flash memory card. The remote control sensor 13 receives operation signals transmitted from a remote controller 21 and outputs them to the control device 2.

Next, as the operation of the above configuration, the processing performed by the speech recognition device 10 is described with reference to the flowchart in FIG. 4.

The speech recognition device 10 enters the speech recognition standby state when the user presses the PTT switch. At that moment, the grip position detection unit 20a analyzes the grip position signal being received from the steering wheel 16 via the vehicle signal input/output unit 6. It then detects the position at which the user is gripping the steering wheel 16 (the grip position) (step S1).

Next, the speech recognition device 10 refers to the speech dictionary currently held in the speech dictionary storage unit 20c, which is the dictionary that the speech dictionary switching unit 20b has switched in as the active dictionary (step S2). It then determines whether that dictionary corresponds to the detected grip position (step S3).

If the speech recognition device 10 determines that the active dictionary corresponds to the detected grip position ("YES" in step S3), it waits for the user's speech (step S6). At the same time, it checks whether the speech recognition standby state has been cancelled (step S7). When the device determines that the speech input unit 20d has received speech uttered by the user ("YES" in step S6), the speech recognition unit 20e recognizes the speech. It does so by collating the speech waveform with the approximate waveforms of the phrases registered in the active dictionary (step S8). The recognition result is then output to the control device 2 (step S9). The process returns to step S1, and the above series of steps is repeated.

Conversely, the speech recognition device 10 may determine that the active dictionary does not correspond to the detected grip position ("NO" in step S3). In that case, it switches in the speech dictionary corresponding to the detected grip position as the active dictionary (step S4) and stores it in the speech dictionary storage unit 20c (step S5). It then waits for the user's speech (step S6) while checking whether the standby state has been cancelled (step S7). When it determines that speech has been received ("YES" in step S6), it performs steps S8 and S9 as described above.

When the speech recognition device 10 determines that the standby state has been cancelled because the PTT switch has been released ("YES" in step S7), it ends the above series of steps.
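Steps S1 to S9 can be sketched as a loop. The sketch assumes a simplified event model in which each cycle delivers the current grip position, an optional utterance and the PTT state. It keeps the previous active dictionary when no grip is detected, a case the flowchart does not cover. `Event`, `Recognizer` and the `recognize` callback are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Event:
    """One polling cycle in standby (hypothetical input model)."""
    grip_position: Optional[str]   # from grip position detection unit 20a
    utterance: Optional[str]       # from speech input unit 20d, if any
    ptt_released: bool = False     # True once the PTT switch is let go


@dataclass
class Recognizer:
    assignment: Dict[str, str]                   # grip position -> dictionary ID
    recognize: Callable[[str, str], str]         # (utterance, dictionary ID) -> result
    active: Optional[str] = None                 # role of storage unit 20c
    results: List[str] = field(default_factory=list)  # sent to control device 2

    def run(self, events: Iterable[Event]) -> None:
        for ev in events:
            # S1: detect grip position; S2/S3: compare with the active dictionary.
            wanted = self.assignment.get(ev.grip_position) if ev.grip_position else None
            if wanted is not None and wanted != self.active:
                self.active = wanted            # S4/S5: switch in and store
            if ev.ptt_released:                 # S7: standby cancelled, so end
                return
            if ev.utterance is not None and self.active is not None:   # S6
                # S8: recognize against the active dictionary; S9: output result.
                self.results.append(self.recognize(ev.utterance, self.active))
            # Loop back to S1 for the next cycle.


r = Recognizer({"17a": "15a", "17b": "15b"}, lambda u, d: f"{d}:{u}")
r.run([Event("17a", "Aichi-ken"), Event("17b", "station"),
       Event("17b", None, ptt_released=True), Event("17a", "ignored")])
print(r.results)  # ['15a:Aichi-ken', '15b:station']
```

In the flowchart, S6 and S7 are evaluated together while the device waits. Here the release is checked first in each cycle, so an utterance arriving in the same cycle as the release is discarded. This is a simplifying choice.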

The configuration described above enters the speech recognition standby state when the user presses the PTT switch. Alternatively, the device may enter standby on condition that the grip pressure detected by the grip position detection unit 20a is equal to or greater than a threshold. Likewise, the configuration described above switches the speech dictionary according to the grip position only after the PTT switch has put the device into standby. Alternatively, while the vehicle navigation device 1 is running, the speech dictionary may be switched according to the grip position even when the user is not pressing the PTT switch.
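The PTT-free variant, combined with switching the dictionary before standby (claim 2), might look like the following sketch. The threshold value is an assumption, and so is the rule that easing the grip ends standby; the source says only that standby begins when the grip pressure reaches a threshold. `GripTrigger` is a hypothetical name.

```python
from typing import Dict, Optional

# Hypothetical value: the source only refers to "a threshold".
STANDBY_PRESSURE_THRESHOLD = 20.0


class GripTrigger:
    """Pre-selects the dictionary from the grip position and enters standby on a firm squeeze."""

    def __init__(self, assignment: Dict[str, str],
                 threshold: float = STANDBY_PRESSURE_THRESHOLD) -> None:
        self.assignment = assignment    # grip position -> dictionary ID
        self.threshold = threshold
        self.active: Optional[str] = None
        self.standby = False

    def update(self, grip_position: Optional[str], pressure: float) -> None:
        # Switch the active dictionary even before standby (cf. claim 2).
        if grip_position in self.assignment:
            self.active = self.assignment[grip_position]
        # Standby while the grip pressure is at or above the threshold (cf. claim 5);
        # leaving standby when the grip eases is an assumption.
        self.standby = grip_position is not None and pressure >= self.threshold


t = GripTrigger({"17a": "15a", "17c": "15c"})
t.update("17a", 8.0)    # light grip: dictionary 15a selected, no standby yet
t.update("17c", 25.0)   # firm squeeze on 17c: dictionary 15c selected, standby
print(t.active, t.standby)  # 15c True
```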

As described above, in this embodiment the speech recognition device 10 detects the position at which the user grips the steering wheel 16. From among the speech dictionaries 15a to 15c, it switches in the dictionary corresponding to that grip position as the active dictionary. When the user's speech is input, the device recognizes it by collating the speech waveform with the approximate waveforms of the phrases registered in the active dictionary. The user therefore does not need to utter a phrase to select one of the speech dictionaries 15a to 15c, and needs to utter only the phrase to be recognized. This reduces the user's burden of uttering phrases, allows the desired phrase to be recognized promptly, and improves operability.

Furthermore, the speech recognition device 10 may be configured to switch in the dictionary corresponding to the grip position before it enters the speech recognition standby state. In that case, one of the speech dictionaries 15a to 15c is already active when standby begins, so the phrase the user wants recognized can be recognized even more promptly.

Because the grip position is detected using only the left half of the steering wheel 16, the user can select any of the speech dictionaries 15a to 15c by gripping the left half with the left hand. Meanwhile, the user steers by gripping the right half with the right hand. Dictionaries can therefore be switched easily without interfering with driving.

The device may also be configured to enter the standby state on condition that the grip pressure detected by the grip position detection unit 20a is equal to or greater than the threshold. The user can then enter standby simply by squeezing the steering wheel 16 at or above the threshold while keeping hold of it, without, for example, taking a hand off the wheel to operate a separate switch. This further improves operability.

Finally, the assignment of speech dictionaries to grip positions on the steering wheel 16 can be changed, for example when the user performs a reassignment operation on the operation switch group 5. Operability can therefore be further improved by assigning a frequently used dictionary to a position on the steering wheel 16 that is easy for the user to grip.
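One way to realize the frequency-based assignment is to rank dictionaries by how often they are used and hand them to grip positions in order of ease. This sketch assumes that usage counts are available and that the positions are already ordered from easiest to hardest to grip. In the source, the reassignment is a manual user operation. `assign_by_usage` is hypothetical.

```python
from typing import Dict, List


def assign_by_usage(positions_by_ease: List[str], usage: Dict[str, int]) -> Dict[str, str]:
    """Map grip positions to dictionaries so that easier positions get busier dictionaries.

    `positions_by_ease` lists grip positions from easiest to hardest to grip.
    `usage` maps dictionary IDs to how often each has been used.
    Ties keep the order in which dictionaries appear in `usage`.
    """
    ranked = sorted(usage, key=lambda d: usage[d], reverse=True)
    return dict(zip(positions_by_ease, ranked))


print(assign_by_usage(["17b", "17a", "17c"], {"15a": 3, "15b": 10, "15c": 7}))
# {'17b': '15b', '17a': '15c', '17c': '15a'}
```

Python's `sorted` remains stable with `reverse=True`, which is what makes the tie rule in the docstring hold.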

The present invention is not limited to the embodiment described above and may be modified or extended as follows.
The operation target is not limited to the steering wheel. It may be another device that the user can operate while driving without interfering with driving, such as a shift lever.
The means for detecting the position at which the user grips the steering wheel is not limited to pressure sensors; other sensors, such as electrodes, may be used. The sensors may also be arranged in the right half, lower part or upper part of the steering wheel.
The speech dictionaries may serve purposes other than addresses, facility names and navigation function control.
The speech recognition device may enter the speech recognition standby state in response to a user operation other than pressing the PTT switch or gripping the steering wheel at or above the threshold pressure.

FIG. 1 is a functional block diagram of the speech recognition device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the vehicle navigation device.
FIG. 3 shows the correspondence between grip positions on the steering wheel and the speech dictionaries.
FIG. 4 is a flowchart of the processing performed by the speech recognition device.

Explanation of symbols

In the drawings, the reference numerals denote the following:
1: vehicle navigation device
10: speech recognition device
15a to 15c: speech dictionaries
16: steering wheel (operation target)
20a: grip position detection unit (operation position detection means, grip pressure detection means)
20b: speech dictionary switching unit (speech dictionary switching means, speech dictionary assignment switching means)
20d: speech input unit (speech input means)
20e: speech recognition unit (speech recognition means)

Claims

1. A speech recognition device comprising:
speech dictionary switching means for switching one of a plurality of speech dictionaries in as an active dictionary;
speech input means for inputting speech uttered by a user; and
speech recognition means for recognizing the speech uttered by the user by collating the speech waveform of the speech input by the speech input means with the approximate waveforms of phrases registered in the speech dictionary switched in as the active dictionary by the speech dictionary switching means;
the speech recognition device further comprising operation position detection means for detecting the user's operation position from among a plurality of operation positions that are set on an operation target and correspond to the plurality of speech dictionaries,
wherein the speech dictionary switching means switches in, from among the plurality of speech dictionaries, the speech dictionary corresponding to the operation position detected by the operation position detection means as the active dictionary.

2. The speech recognition device according to claim 1,
wherein the speech dictionary switching means switches in, from among the plurality of speech dictionaries, the speech dictionary corresponding to the operation position detected by the operation position detection means as the active dictionary before the device enters a speech recognition standby state.

3. The speech recognition device according to claim 1 or 2,
wherein the operation target is a steering wheel,
the operation position detection means detects the position at which the user grips the steering wheel, and
the speech dictionary switching means switches in, from among the plurality of speech dictionaries, the speech dictionary corresponding to the steering wheel grip position detected by the operation position detection means as the active dictionary.

4. The speech recognition device according to claim 3,
wherein the operation target is a part of the steering wheel.

5. The speech recognition device according to claim 3 or 4, further comprising:
grip pressure detection means for detecting the pressure with which the user grips the steering wheel,
wherein the device enters a speech recognition standby state when the grip pressure detected by the grip pressure detection means is equal to or greater than a threshold.

6. The speech recognition device according to any one of claims 3 to 5, further comprising:
speech dictionary assignment switching means for changing the assignment of the speech dictionaries to grip positions on the steering wheel.

7. A vehicle navigation device comprising the speech recognition device according to any one of claims 1 to 6,
wherein the vehicle navigation device performs navigation-related processing based on the recognition result of the speech recognition device.