JP2003015688A - Voice recognition method and apparatus

Info

Publication number: JP2003015688A
Application number: JP2001201888A
Authority: JP
Inventors: Hiromi Tokifuji; 浩美時藤; Akiichi Kobayashi; 明一小林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-07-03
Filing date: 2001-07-03
Publication date: 2003-01-17

Abstract

(57) [Abstract]

[Problem] In conventional voice recognition methods with a talkback function, the recognition result is always announced with the fixed wording "Displaying ○○○," and the voice input operation has to be repeated whenever a misrecognition occurs.

[Solution] The similarity between the voice data of an input word and the voice patterns in a word dictionary is calculated, and five recognition result candidates are obtained in descending order of similarity. The candidates are output by talkback one at a time in descending order of similarity. A candidate whose similarity is higher than the reference is announced with the affirmative expression "Displaying ○○○," and a candidate whose similarity is lower than the reference is announced with the indirect question "Is it ○○○?". The user is then asked to confirm by voice whether the recognition result is correct or incorrect. If it is correct, the recognition result is confirmed; if it is incorrect, the recognition result of the next candidate is presented.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a voice recognition method and device having a talkback function for confirming input operations, and to an in-vehicle navigation device.

[0002]

[Description of the Related Art] Conventionally, in this type of in-vehicle navigation device, destination setting, route setting, facility searches and the like can be performed easily by voice input using voice recognition technology, in addition to operation of a touch panel provided on the front of the display or of a remote control. For example, simply pressing the voice input button and then saying "Go to ○○○" causes a recommended route with that place set as the destination to be displayed on the map on the display. A desired facility or residence can also be searched for by speaking the facility name, address, telephone number, postal code and so on in several steps. Voice input in such an in-vehicle navigation device contributes greatly to safe driving, because the user can enter information while watching the road ahead and without taking their hands off the steering wheel. In addition, the talkback function, which outputs speech in response to speech input to the device, is very convenient because it lets the user check whether what they entered matches what the device determined.

[0003]

[Problems to be Solved by the Invention] However, in the conventional voice recognition method with a talkback function described above, when the user says "Hachioji Station" (八王子駅), for example, the recognition result is output by talkback as "Displaying Hachioji Station," and the wording is fixed. This is fine when the user says "Hachioji Station" and it is recognized correctly, but voice recognition is not always correct. Misrecognition can occur when the speaker's pronunciation is unclear or when external noise is mixed in, and for this reason re-input is normally allowed up to three times. If "Hachioji Station" is spoken but is mistakenly recognized as "Oji Station" (王子駅), and the talkback "Displaying Oji Station" is given, the user must operate the remote control again and redo the voice input. This not only increases the burden on the user but also leaves the user dissatisfied with the reliability of the device.

[0004] The present invention solves these conventional problems. Its object is to provide a voice recognition method and device that increase the user's confidence in the device and are simple to operate and easy to use, and an in-vehicle navigation device using them.

[0005]

[Means for Solving the Problems] To achieve the above object, the voice recognition method of the present invention calculates the similarity between the voice data of an input word and the voice patterns in a word dictionary, takes a plurality of voice patterns as word recognition candidates in descending order of similarity, and, when confirming the input operation by outputting the word recognition candidates as speech, changes the wording of the output speech depending on whether the similarity is higher or lower than a reference. The similarity calculation normally yields a score from 0 to 9999, and a higher score means a higher similarity. It is known empirically that, taking 3000 points as the reference for example, the likelihood of misrecognition is high at 3000 points or below and low above 3000 points. By changing the wording of the talkback on the basis of this score, the user understands that the device is making a proper judgment, and confidence in the device increases.
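The reference-score test described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_high_confidence` and the constant names are hypothetical, and the only values taken from the text are the 0 to 9999 score range and the example reference of 3000 points.

```python
SCORE_MIN = 0
SCORE_MAX = 9999
DEFAULT_REFERENCE = 3000  # example reference value given in the text


def is_high_confidence(score: int, reference: int = DEFAULT_REFERENCE) -> bool:
    """Return True when the similarity score is above the reference.

    Hypothetical helper: the patent only states that the wording changes
    depending on whether the score is above or below the reference.
    """
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError(f"score {score} is outside {SCORE_MIN}..{SCORE_MAX}")
    # A score exactly equal to the reference counts as low confidence,
    # following the "3000 points or below" wording.
    return score > reference
```

The strict comparison mirrors the text, which treats "3000 points or below" as the range where misrecognition is likely.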

[0006] The voice recognition method of the present invention is also characterized in that the output speech is phrased in the affirmative form when the similarity is higher than the reference and in the interrogative form when the similarity is lower than the reference. For example, by saying "Displaying ○○○" when the similarity is high and asking "Is it ○○○?" when the similarity is low, the user understands that the device is making a proper judgment, and confidence in the device increases further.

[0007] The voice recognition method of the present invention is also characterized in that the output speech asks the user to confirm whether the output word is correct or incorrect; if it is correct, the output word is confirmed, and if it is incorrect, the next candidate word is output as speech. Even when a misrecognition occurs, there is no need to repeat the voice input as in the conventional method. Simply saying "No," for example, brings up the next candidate, which reduces the burden on the user and improves usability.
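The confirm-or-advance behaviour in this paragraph amounts to a short loop over the candidate list. The sketch below is a minimal illustration, not the patented implementation: `confirm_candidates` and the `ask` callback are hypothetical names, and returning `None` when every candidate is rejected is an assumption, since the text does not say what happens once the list is exhausted.

```python
from typing import Callable, Iterable, Optional


def confirm_candidates(
    candidates: Iterable[str],
    ask: Callable[[str], bool],
) -> Optional[str]:
    """Offer candidates in order and return the first one the user accepts.

    `ask(word)` stands in for the talkback prompt plus recognition of the
    user's spoken "yes" (True) or "no" (False).
    """
    for word in candidates:
        if ask(word):
            return word  # "yes": this word becomes the confirmed result
        # "no": move on to the next candidate without new voice input
    return None  # assumption: no candidate was accepted


# Usage with scripted answers: the user rejects two candidates, then accepts.
scripted = iter([False, False, True])
result = confirm_candidates(["first", "second", "third"], lambda w: next(scripted))
```

Passing the prompt as a callback keeps the loop independent of how speech is actually synthesized and recognized.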

[0008] The voice recognition device of the present invention comprises voice recognition means for confirming the input operation by calculating the similarity between the voice data of an input word and the voice patterns in a word dictionary and outputting a plurality of voice patterns as word recognition candidates in descending order of similarity, and control means for changing the wording of the output speech, when confirmation is performed by that speech, depending on whether the similarity is higher or lower than a reference. For example, by saying "Displaying ○○○" when the similarity is high and asking "Is it ○○○?" when the similarity is low, the user understands that the device is making a proper judgment, and confidence in the device increases further.

[0009] The voice recognition device of the present invention is also characterized in that the control means asks the user, by the output speech, to confirm whether the output word is correct or incorrect; if it is correct, the output word is confirmed, and if it is incorrect, the next candidate word is output as speech. Even when a misrecognition occurs, there is no need to repeat the voice input as in the conventional device. Simply saying "No," for example, brings up the next candidate, which reduces the burden on the user and improves usability.

[0010] The present invention is also an in-vehicle navigation device equipped with the voice recognition device described above, which further improves the convenience and operability of the navigation device.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. Fig. 1 shows the configuration of an in-vehicle navigation device equipped with the voice recognition device according to the embodiment. In Fig. 1, the direction sensor 1 is a 3D gyro and detects the heading of the vehicle. The vehicle speed sensor 2 is the one used by the electronic control unit of the vehicle in which the device is installed, and generates vehicle speed pulses according to the rotational speed of the wheels. The various sensors 3 are a reverse switch, a parking switch, a light switch and the like, and detect the running state of the vehicle. The sensor signal processing unit 4 calculates the vehicle's direction of travel from the signal from the direction sensor 1, calculates the distance travelled from the vehicle speed signal from the vehicle speed sensor 2, detects the running state of the vehicle from the signals from the various sensors 3, and generates the signals needed for control. The DVD-ROM 5 stores map data, voice data, voice recognition dictionary data and the like. The DVD-ROM drive 6 reads the map data, voice data, voice recognition dictionary data and the like from the DVD-ROM 5. The liquid crystal display 7 displays the map, the current vehicle position and heading, operation menus and the like, and may have an operation unit such as a touch panel on its front. The GPS receiver 8 determines the current position (latitude and longitude) of the vehicle by receiving and processing radio waves transmitted from a plurality of satellites. The GPS antenna 9 is an antenna for receiving GPS radio waves. The DVD-ROM drive 6, the liquid crystal display 7, the GPS receiver 8 and so on are arranged on the dashboard of the vehicle and are connected to the communication interface 12 of the device main body 11 through the in-vehicle LAN 10. The device main body 11 is installed in the trunk of the vehicle, in the center console, or in a similar location.

[0012] The microphone 13 is placed near the driver inside the vehicle and picks up words and phrases spoken by the user. The speaker 14 is used for search results, voice recognition results, voice guidance along the route such as intersection guidance, junction guidance, toll gate guidance and exit guidance, and for confirming remote control operations by voice. The voice recognition device 15 recognizes the words and phrases input from the microphone 13. The storage unit 16 consists of a ROM that stores programs, a RAM that temporarily stores working data, a VRAM that stores image data, and so on. The image processor 17 generates display images based on menu data, map data, current vehicle position data, building data and the like. The display control unit 18 reads the image data needed in the normal setting mode and in the voice recognition mode from the image processor 17 and passes it to the CPU 20. The voice processor 19 converts voice recognition results into audio signals, and outputs to the speaker 14 audio signals representing search results, voice guidance along the route, and remote control operations. The CPU (central processing unit) 20 controls the entire device and executes the programs that perform the control needed in the normal setting mode and in the voice recognition mode. The remote control 21 has an operation button for switching between the normal setting mode and the voice recognition mode as well as other operation buttons, and communicates with the remote control light receiving unit 22 by infrared. The remote control light receiving unit 22 is provided on the front of the liquid crystal display 7, although it may be provided elsewhere, and sends operation signals received from the remote control 21 to the CPU 20 over the in-vehicle LAN 10 and through the communication interface 12.

[0013] Next, the operation of this embodiment is described, beginning with the basic operation as a navigation device. In Fig. 1, when the vehicle's engine is started, the navigation device is powered on, a menu screen is displayed on the liquid crystal display 7, and the CPU 20 starts the current position detection program. When the vehicle starts moving, the accurate current position of the vehicle is calculated from the position information from the GPS receiver 8 and from the data obtained by processing the signals from the direction sensor 1 and the vehicle speed sensor 2 in the sensor signal processing unit 4. Based on this vehicle position information, the CPU 20 reads the corresponding map data from the DVD-ROM 5 through the DVD-ROM drive 6, has the image processor 17 convert it into image data, stores it temporarily in the VRAM of the storage unit 16, then converts it into color signals and displays it together with the vehicle position on the screen of the liquid crystal display 7 through the communication interface 12. When the name of an address such as a destination is spoken into the microphone 13, the voice recognition function of the voice recognition device 15 recognizes the address name and the destination is set. Once the destination is set, the CPU 20 starts the route search program, calculates the optimal guidance route from the current vehicle position to the set destination, and displays it overlaid on the map on the liquid crystal display 7. As the user drives along the guidance route displayed on the liquid crystal display 7, the CPU 20 successively updates the vehicle position mark on the liquid crystal display 7 based on the current position information and the road network data in the map data. When the vehicle approaches a junction or similar point on the guidance route, the voice guidance attached to the map data is output from the speaker 14.
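The text does not say how the heading and vehicle speed pulses are combined into a position. One common approach is dead reckoning, sketched below purely as an assumption: the function name, the flat local x/y coordinates in metres, the compass convention (0 degrees = north, clockwise) and the `metres_per_pulse` calibration value are all hypothetical and not taken from the patent.

```python
import math


def dead_reckon(x_m, y_m, heading_deg, pulses, metres_per_pulse):
    """Advance a local (east, north) position from heading and wheel pulses.

    Assumed model: each vehicle speed pulse corresponds to a fixed distance,
    and the heading is constant over the update interval.
    """
    distance_m = pulses * metres_per_pulse      # distance from pulse count
    heading = math.radians(heading_deg)
    east = x_m + distance_m * math.sin(heading)   # 90 degrees points east
    north = y_m + distance_m * math.cos(heading)  # 0 degrees points north
    return east, north
```

In a real device such an estimate would typically be corrected with the GPS fix; how that correction is done is outside what the text describes.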

[0014] Next, the voice recognition device 15 of the above embodiment is described. Fig. 2 shows the configuration of the voice recognition device 15. The voice analysis unit 23 performs frequency analysis on the speech input from the microphone 13 and outputs the result. The word dictionary unit 24 stores standard patterns of word speech. The voice collation unit 25 calculates the similarity between the input speech data output from the voice analysis unit 23 and the standard word patterns output from the word dictionary unit 24, and outputs recognition results in descending order of similarity. The control unit 26 is a microcomputer that controls the whole of the voice recognition device 15. It stores in the word dictionary unit 24 the word dictionary data read from the DVD-ROM 5 through the CPU 20 of the in-vehicle navigation device, reads that data out for collation in the voice collation unit 25, and outputs the recognition results from the voice collation unit 25 through the voice processor 19 as synthesized speech from the speaker 14.
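The ordering step performed by the voice collation unit 25 can be illustrated as selecting the best-scoring dictionary entries. In this sketch the similarity scores are assumed to be already computed (the text does not give the similarity formula), and the function name `top_candidates` and its parameter `n` are hypothetical.

```python
import heapq


def top_candidates(scores: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n highest-scoring (word, score) pairs, best first.

    `scores` maps each dictionary word to its similarity score for the
    current utterance; computing those scores is out of scope here.
    """
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Usage with made-up scores for six dictionary words.
example_scores = {"w1": 120, "w2": 4800, "w3": 950, "w4": 3100, "w5": 10, "w6": 2200}
best_three = top_candidates(example_scores, n=3)
```

`heapq.nlargest` avoids sorting the whole dictionary, which matters little for a few thousand words but keeps the intent ("keep only the best n") explicit.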

[0015] Next, control in the control unit 26 is described with reference to the flowchart in Fig. 3. When speech is input from the microphone 13 (step S1), the voice analysis unit 23 performs frequency analysis on it word by word and outputs it as a time series of short-time spectra (a sequence of LPC cepstrum coefficients) (step S2). The word dictionary unit 24 stores standard patterns for about 3000 words, each consisting of a phoneme symbol sequence (a sequence of LPC cepstrum coefficients). The voice collation unit 25 calculates the similarity between the time series of short-time spectra of the word output from the voice analysis unit 23 and the phoneme symbol sequences of the words output from the word dictionary unit 24, outputs as recognition results the five phoneme symbol sequences with the highest scores, that is, the highest similarity, on the scale from 0 to 9999 (step S3), and creates a list of the top five arranged in descending order of similarity score (step S4). Each time a recognition result is output from the voice collation unit 25, the control unit 26 checks whether its similarity score exceeds 3000 points (step S5). Recognition results that exceed 3000 points are classified into talkback expression pattern A (step S6), and those that do not exceed 3000 points are classified into talkback expression pattern B (step S7). As shown in Fig. 4, expression pattern A consists of the affirmative sentence "Displaying ○○○.", and expression pattern B consists of an indirect question and its confirmation: "Is it ○○○? If this is correct, say 'yes'; if it is wrong, say 'no'." After this classification, the candidates are output by talkback one at a time from the top of the list (step S8). For example, suppose that "Tokyo Dome" (東京ドーム) is spoken and the voice recognition results are, as shown in Fig. 5, "Tokyodo" (東京堂) with 2900 points, "Tokyo Dawn" (東京ドーン) with 2300 points, "Tokyo Dome" with 1800 points, "Tokyo Doll" (東京ドール) with 1500 points, and "Tokyo-to" (東京都) with 1000 points. First, the talkback "Displaying Tokyodo. If this is correct, say 'yes'; if it is wrong, say 'no'." is output as speech (step S9). If the user says "no" in response, the voice recognition device 15 recognizes it, and the control unit 26 responds to the "no" by extracting the next recognition candidate from the list and outputting the talkback "Is it Tokyo Dawn? If this is correct, say 'yes'; if it is wrong, say 'no'." If the user again says "no", the control unit 26 extracts the next recognition candidate and outputs the talkback "Is it Tokyo Dome? If this is correct, say 'yes'; if it is wrong, say 'no'." When the user says "yes" to this, the control unit 26 confirms "Tokyo Dome" as the voice recognition result (step S10). This ends the voice recognition process. As the subsequent navigation function, the three options "Destination", "Waypoint setting" and "Point registration" are displayed on the liquid crystal display 7 and the voice guidance "Would you like destination setting, waypoint setting, or point registration?" is output; when the user selects one of them, the corresponding setting or registration is made for the facility that was found.
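The "Tokyo Dome" walk-through can be traced with a short script. This is a sketch under stated assumptions, not the device's firmware: the function names are hypothetical, the English prompt wording is a translation, the user's spoken answers are replaced by a scripted list, and the 3000-point rule of steps S5 to S7 is applied uniformly to every candidate, so each of the five example candidates (all at or below 3000 points) is rendered in question form.

```python
THRESHOLD = 3000
CONFIRM = "If this is correct, say 'yes'; if it is wrong, say 'no'."

# Candidate list from the Fig. 5 example, already sorted best first.
CANDIDATES = [
    ("Tokyodo", 2900),     # 東京堂
    ("Tokyo Dawn", 2300),  # 東京ドーン
    ("Tokyo Dome", 1800),  # 東京ドーム
    ("Tokyo Doll", 1500),  # 東京ドール
    ("Tokyo-to", 1000),    # 東京都
]


def prompt_for(word, score):
    """Render the talkback prompt for one candidate (steps S5 to S7)."""
    if score > THRESHOLD:
        return f"Displaying {word}. {CONFIRM}"  # expression pattern A
    return f"Is it {word}? {CONFIRM}"           # expression pattern B


def run_talkback(candidates, answers):
    """Return (confirmed word or None, prompts spoken) for scripted answers."""
    prompts = []
    for (word, score), answer in zip(candidates, answers):
        prompts.append(prompt_for(word, score))  # steps S8 and S9
        if answer == "yes":
            return word, prompts                 # step S10
    return None, prompts


confirmed, spoken = run_talkback(CANDIDATES, ["no", "no", "yes"])
for line in spoken:
    print(line)
print("Confirmed:", confirmed)
```

With the answers "no", "no", "yes", three prompts are spoken and the third candidate is confirmed, so the user never has to repeat the original utterance.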

[0016] As described above, according to this embodiment, the similarity between the voice data of an input word and the voice patterns in the word dictionary is calculated, five recognition result candidates are held as a list in descending order of similarity, and they are output by talkback one at a time in descending order of similarity. Candidates whose similarity score exceeds 3000 points are announced with an affirmative expression and those at 3000 points or below with an indirect question, and the user is asked to confirm whether the recognition result is correct or incorrect. The user can therefore see that the device is judging correctly and does not have to repeat the voice input operation many times as before, so confidence in the device and its operability are both improved.

[0017] The talkback expression patterns in the above embodiment are only examples, and various other expressions are possible. The similarity score at which the talkback wording changes is also adjustable, and the number of talkback expression patterns and the number of words included in the voice recognition result list can be set freely.
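Because the reference score, the number of expression patterns and the list length are all described as adjustable, they can be gathered into one settings object. The sketch below is a hypothetical design, not something the patent specifies: the class name `TalkbackSettings`, its fields, and the third example phrase are illustrative assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TalkbackSettings:
    """Hypothetical container for the adjustable talkback parameters."""

    thresholds: tuple[int, ...] = (3000,)  # ascending reference scores
    patterns: tuple[str, ...] = ("Is it {word}?", "Displaying {word}.")
    list_size: int = 5                     # candidates kept per utterance

    def __post_init__(self):
        if len(self.patterns) != len(self.thresholds) + 1:
            raise ValueError("need exactly one more pattern than thresholds")
        if list(self.thresholds) != sorted(self.thresholds):
            raise ValueError("thresholds must be in ascending order")

    def phrase(self, word: str, score: int) -> str:
        # bisect_left counts the thresholds strictly below `score`, so a
        # score equal to a threshold stays in the lower band.
        band = bisect_left(self.thresholds, score)
        return self.patterns[band].format(word=word)


# Usage: three expression patterns split at two illustrative reference scores.
three_band = TalkbackSettings(
    thresholds=(2000, 5000),
    patterns=("Sorry, was that {word}?", "Is it {word}?", "Displaying {word}."),
)
```

Keeping the thresholds sorted lets a binary search pick the pattern, so adding a further expression pattern only means adding one threshold and one phrase.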

[0018]

[Effects of the Invention] As described above, the voice recognition method and device of the present invention calculate the similarity between the voice data of an input word and the voice patterns in a word dictionary, output a plurality of voice patterns as word recognition candidates in descending order of similarity, and, when the input operation is confirmed by the output speech, change the wording of that speech depending on whether the similarity is higher or lower than a reference. The user therefore judges that the device is recognizing speech properly according to the similarity, and confidence in the device increases.

[0019] In addition, the output speech asks the user to confirm whether the output word is correct or incorrect; if it is correct, the output word is confirmed, and if it is incorrect, the next candidate word is output as speech. Even when a misrecognition occurs, there is no need to repeat the voice input operation as before. Simply saying "No," for example, brings up the next candidate, which reduces the burden on the user and improves usability.

[Brief description of drawings]

[Fig. 1] Block diagram showing the configuration of the in-vehicle navigation device according to the embodiment of the present invention.

[Fig. 2] Block diagram showing the configuration of the voice recognition device according to the embodiment of the present invention.

[Fig. 3] Flowchart showing the processing of the voice recognition device according to the embodiment of the present invention.

[Fig. 4] Table showing the talkback expression patterns according to the embodiment of the present invention.

[Fig. 5] Table showing a list of voice recognition results ordered by similarity score according to the embodiment of the present invention.

[Explanation of symbols]

1 Direction sensor
2 Vehicle speed sensor
3 Various sensors
4 Sensor signal processing unit
5 DVD-ROM
6 DVD-ROM drive
7 Liquid crystal display
8 GPS receiver
9 GPS antenna
10 In-vehicle LAN
11 Device main body
12 Communication interface
13 Microphone
14 Speaker
15 Voice recognition device
16 Storage unit
17 Image processor
18 Display control unit
19 Voice processor
20 CPU
21 Remote control
22 Remote control light receiving unit
23 Voice analysis unit
24 Word dictionary unit
25 Voice collation unit
26 Control unit

Continuation of front page. F-terms (reference): 2F029 AA02 AB01 AB07 AB09 AB13 AC02 AC08 AC14 AC18; 5D015 KK01 LL06; 5H180 AA01 BB13 EE01 FF04 FF05 FF22 FF25 FF27 FF33

Claims

[Claims]

1. A voice recognition method characterized by calculating the similarity between voice data of an input word and voice patterns in a word dictionary, taking a plurality of voice patterns as word recognition candidates in descending order of similarity, and, when confirming an input operation by outputting the word recognition candidates as speech, changing the wording of the output speech depending on whether the similarity is higher or lower than a reference.

2. The voice recognition method according to claim 1, characterized in that the output speech is phrased in the affirmative form when the similarity is higher than the reference and in the interrogative form when the similarity is lower than the reference.

3. The voice recognition method according to claim 1, characterized in that the output speech asks the user to confirm whether the output word is correct or incorrect, the output word is confirmed when it is correct, and the next candidate word is output as speech when it is incorrect.

4. A voice recognition device comprising: voice recognition means for confirming an input operation by calculating the similarity between voice data of an input word and voice patterns in a word dictionary and outputting a plurality of voice patterns as word recognition candidates in descending order of similarity; and control means for changing the wording of the output speech, when confirmation is performed by the output speech, depending on whether the similarity is higher or lower than a reference.

5. The voice recognition device according to claim 4, characterized in that the control means asks the user, by the output speech, to confirm whether the output word is correct or incorrect, confirms the output word when it is correct, and outputs the next candidate word as speech when it is incorrect.

6. An in-vehicle navigation device equipped with the voice recognition device according to claim 4.