JP2004061754A - Voice controlled unit

Info

Publication number: JP2004061754A
Application number: JP2002218610A
Authority: JP
Inventors: Makoto Sakai; 坂井　誠; Kunio Yokoi; 横井　邦雄; Toru Nada; 名田　徹
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2002-07-26
Filing date: 2002-07-26
Publication date: 2004-02-26
Anticipated expiration: 2022-07-26
Also published as: JP4004885B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice controlled unit that makes it easy to erase a mistakenly input part when an address is input by voice.

SOLUTION: When a user speaks "return" twice in succession, a display control part of the voice controlled unit erases the address names shown on the display screen of a display part all at once. Consequently, when vocal input is to be redone from the prefecture name after, for example, "Aichi-prefecture", "Kariya-city", "Showa-cho", and "1-chome" have been input, the user speaks "return" twice in succession to erase the multi-word display "Aichi-prefecture, Kariya-city, Showa-cho, 1-chome" at once. The user can thus easily erase the displayed address without voicing "return" again and again.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、音声制御装置に関するものである。
【０００２】
【従来の技術】
従来、話者の発する音声を認識して住所の入力を行う音声制御装置がある。この音声制御装置において、ユーザが所望の住所を入力する場合には、都道府県、市区郡、町村字等の名称を単位として（以下、階層と呼ぶ）、この階層毎に所望の住所を区切って読みあげる。例えば、「愛知県刈谷市昭和町１丁目１番地」なる住所を入力する場合には、第１階層となる「愛知県」、第２階層となる「刈谷市」、第３階層となる「昭和町」、第４階層となる「１丁目」、及び第５階層となる「１番地」というように、階層毎に区切って読みあげる。
【０００３】
一方、誤って入力した階層部分を修正して再度入力をやり直す場合、ユーザは、１つの階層の表示を消去するコマンドに対応する「戻る」を発することで、１つの階層のみが消去される。例えば、図１５に示すように、ユーザが「愛知県刈谷市昭和町１丁目」まで発声した後に、「愛知県刈谷市」以降から再度入力し直す場合には、ユーザは、「１丁目」を発声した直後に「戻る」を２回連続して発声することで、目的とする階層まで入力した住所が消去される。その後、ユーザは、再度「愛知県刈谷市」以降の住所を入力する。
【０００４】
このように、従来の音声制御装置では、ユーザは、住所を階層毎に区切って読みあげて所望の住所を入力し、また、誤って入力した階層部分を消去する場合には、「戻る」を発声して誤り部分を消去する。
【０００５】
【発明が解決しようとする課題】
しかしながら、上述の音声制御装置における「戻る」に対応するコマンドは、消去対象である階層が１階層に限定されている。従って、例えば「愛知県刈谷市昭和町１丁目」と入力した後、再度、第１階層である都道府県名称から音声入力をやり直す場合には、ユーザは、「戻る」を４回連続して発声する必要があり、このような同一の発話内容を連続して発声することは、ユーザにとって煩わしいものであった。
【０００６】
本発明は、かかる問題を鑑みてなされたもので、音声入力した住所の消去を簡単に行うことのできる音声制御装置を提供することを目的とする。
【０００７】
【課題を解決するための手段】
請求項１に記載の音声制御装置は、音声を入力するための音声入力手段と、音声入力手段に入力されたユーザの発話音声を認識する音声認識手段と、ユーザによって音声が入力されたとき、認識された発話内容を複数の階層に分類して記憶するとともに、この発話内容を修正するための修正コマンドが入力されたとき、最下層に分類して記憶されている発話内容の一部を消去する制御手段とを備え、制御手段は、修正コマンドが２回連続して入力された場合、複数の階層に分類して記憶されている発話内容の全部を消去することを特徴とする。
【０００８】
このように、本発明の音声制御装置は、修正コマンドが２回連続して入力された場合、複数の階層に分類して記憶されている発話内容の全部を消去する。これにより、修正コマンドを何度も繰り返し発声することなく、容易に発話内容を消去することが可能となる。
【０００９】
請求項２に記載のように、複数の階層に分類して記憶される発話内容は、少なくとも住所であっても良い。例えば、住所を音声入力する場合を考えると、「愛知県」、「刈谷市」、「昭和町」、「１丁目」の４つの階層を入力した後に、再度、第１の階層である都道府県から音声入力をやり直す場合、ユーザは、音声入力した住所を修正するコマンドを２回連続して発声することで、「愛知県刈谷市昭和町１丁目」なる４つの階層に分類した記憶している住所を全部消去することが可能となる。これにより、ユーザは、修正コマンドを何度も繰り返し発声することなく、容易に入力した住所を全て消去することが可能となる。
【００１０】
請求項３に記載の音声制御装置によれば、住所は、少なくとも都道府県、市区郡、町村字、丁目及び番地の５階層に分類されることを特徴とする。これにより、ユーザは、音声による住所入力を行う場合には、所望の住所の都道府県、市区郡、町村字、丁目、及び番地の各階層を発話することで、所望の住所を入力することができる。
【００１１】
【発明の実施の形態】
以下、本発明の実施の形態における音声制御装置に関して、図面に基づいて説明する。なお、本実施形態では、本発明の音声制御装置をカーナビゲーション装置に適用した例について説明する。
【００１２】
図１は、本実施形態に係わるカーナビゲーション装置の概略構成を示すブロック図である。同図に示すように、本実施形態のカーナビゲーション装置１は、音声認識部１０、経路案内部１１、車両位置・車両向き計算部１２から構成されている。また、カーナビゲーション装置１は、図示しない道路地図描画部等を有している。さらに、カーナビゲーション装置１は、音声入力に用いられるマイク２、トークスイッチ３、表示装置４、スピーカ５、ＧＰＳ受信機６、車速センサ７、ヨーレートセンサ８、及び地図データベース９等と接続されている。
【００１３】
マイク２、及びトークスイッチ３は、音声入力に用いられる装置である。音声を入力する場合には、例えば、トークスイッチ３の押しボタンを押すことで、入力トリガ信号が後述する音声認識部１０に送信され、この音声認識部１０は、入力トリガ信号を受信すると、マイク２から音声入力を受け付けるモードに変更される。
【００１４】
この音声入力を受け付けるモードのとき、ユーザによって音声が入力されると、その音声がマイク２によって音声信号に変換され、音声認識部１０に送られる。音声認識部１０は、この音声信号を認識して、音声に対応する都道府県や市区郡等の名称に変換して経路案内部１１に与える。例えば、「あいちけん」と認識された音声は、「愛知県」という都道府県の名称に変換される。この都道府県の名称を受ける経路案内部１１は、受信した名称を記憶するとともに表示装置４に表示し、その後、市区郡、町村字、丁目、及び番地等の名称を受信する毎に、これらを階層的につなぎ合わせて記憶し、かつ表示装置４に表示させる。なお、本実施形態では、都道府県を第１階層、市区郡を第２階層、町村字を第３階層、丁目を第４階層、及び番地を第５階層と呼ぶことにする。
【００１５】
また、経路案内部１１は、都道府県、市区郡、町村字、丁目、及び番地等の名称からなる住所を全て受信した場合、この住所に対応する道路地図上の地点を検索し、検索した地点を示すマークを、その周辺の道路地図とともに表示装置４へ表示する。
【００１６】
さらに、住所の入力中に、ユーザが、例えば「戻る」と発話した場合には、この音声を認識して、音声に対応するコマンドコードに変換し、経路案内部１１等に与える。例えば、「戻る」と認識された音声は「表示消去」というコマンドコードに変換される。このコマンドコードを受けた経路案内部１１は、このコマンドコードを２回連続して受けたか否かを判断し、判断の結果、１回目である場合には、階層的につなぎ合わせて記憶・表示される住所のうち、最下位の階層に位置する名称のみ消去する。また、２回連続して受けた場合には、記憶及び表示される住所の全てを一括して消去する。
【００１７】
表示装置４は、道路地図等を表示する液晶ディスプレイによって構成される。また、表示装置４のディスプレイにタッチパネルが採用されるものであっても良い。
【００１８】
スピーカ５は、音声案内や各種警告音等の出力に使用されるものであり、例えば、車両に装備されたスピーカであっても良いし、カーナビゲーション装置１に内蔵されたものであっても良い。
【００１９】
ＧＰＳ受信機６、車速センサ７、及びヨーレートセンサ８は、周知のごとく、車両の現在位置や車両進行方向等を算出するのに必要な信号（以下、センサ信号と呼ぶ）を生成するものである。生成されたセンサ信号は、車両位置・車両向き計算部１２に送られる。
【００２０】
地図データベース９は、図示しない記憶媒体に格納されるもので、地図情報、道路情報からなる。なお、記憶媒体としては、そのデータ量からＣＤ−ＲＯＭやＤＶＤ−ＲＯＭを用いるのが一般的であるが、メモリカードやハードディスクなどの媒体を用いてもよい。また、地図情報とは、表示装置４に表示するランドマーク等を描画するために必要なデータであり、施設名称、住所、電話番号、及び地図上の座標等を関連付けたデータから構成される。
【００２１】
次に、カーナビゲーション装置１に内蔵される音声認識部１０について、図２を用いて説明する。同図に示すように音声認識部１０は、ＡＤ変換回路１０１、認識プログラム処理部１０２、音響モデル記憶部１０３、及び認識辞書記憶部１０４等によって構成される。
【００２２】
ＡＤ変換回路１０１は、マイク２を介して入力されるアナログの音声信号を受信し、この信号をデジタル化した信号に変換する。変換されたデジタル音声信号は、認識プログラム処理部１０２に送信される。
【００２３】
認識プログラム処理部１０２は、音響モデル記憶部１０３、及び認識辞書記憶部１０４を用いて、デジタル音声信号を都道府県等の名称やコマンドコードに変換するものである。まず、認識プログラム処理部１０２は、音響モデル記憶部１０３に記憶される、例えば、周知の隠れマルコフモデル（Ｈｉｄｄｅｎ　Ｍａｒｋｏｖ　Ｍｏｄｅｌ）等の手法を用いて、デジタル音声信号１０６に対応する発話内容（以後、認識語読みと呼ぶ）を解析する。
【００２４】
この解析された認識語読みは、認識辞書記憶部１０４に記憶される認識語と照合され、最も確からしい認識語、及びその認識語に対応する都道府県や市区郡等の名称、或いはコマンドコードが抽出される。
【００２５】
ここで、認識辞書記憶部１０４について説明する。この認識辞書記憶部１０４は、都道府県の名称と認識語とを関連付けて記憶する都道府県辞書、都道府県別に分割された市区郡の名称と認識語とを関連付けて記憶する市区郡辞書、都道府県別に分割され、かつ、特定の都道府県における市区郡別に分割された町村字の名称と認識語とを関連付けて記憶する町村字辞書、及び、丁目や番地の名称と認識語とを関連付けて記憶する丁目辞書と番地辞書といった、計５つの認識辞書を有している。
【００２６】
すなわち、市区郡辞書は、５０程度の都道府県別に用意されるもので、例えば、図５に示すように、都道府県が「愛知県」である市区郡辞書には、愛知県に属する市区郡の名称と認識語とが記憶されている。さらに、町村字辞書は、５０程度の都道府県別で、かつ、ある特定の都道府県の市区郡別に用意されるものである。例えば、図６に示すように、愛知県刈谷市に属する町村字の名称と認識語とが記憶されている。なお、図７に示す丁目辞書、及び図８に示す番地辞書については、数字からなる名称であるため、ある特定の場所に属するものではない。
【００２７】
さらに、認識辞書記憶部１０４には、図９に示すように、都道府県や市区郡等の名称とは異なる認識語とコマンドコードとを関連付けて記憶するコマンドコード辞書を有している。このコマンドコード辞書においては、例えば、認識語が「戻る」である場合には、これに対応するコマンドコード「Ｃ０００１」が抽出される。このコマンドコードは、後述する経路案内部１１の機能実行部１１０が認識可能なコードである。
【００２８】
また、認識プログラム処理部１０２は、都道府県の名称を認識辞書記憶部１０４から抽出した後、次回に照合／抽出する認識辞書を、都道府県の名称に対応する市区郡辞書へ自動的に切り換える。例えば、都道府県名称が「愛知県」であった場合には、次回に照合／抽出する認識辞書を、愛知県に対応する市区郡辞書へ自動的に切り換える。これは、市区郡の名称を照合／抽出した後も同様であり、次回に照合／抽出する認識辞書を、市区郡の名称に対応する町村字辞書へ自動的に切り換える。但し、図９のコマンドコード辞書は、上述の認識辞書の切り換えには該当せず、常に照合／抽出の対象となる認識辞書である。
【００２９】
そして、認識プログラム処理部１０２は、上述の処理により得られた都道府県や市区郡等の名称に対応する信号を経路案内部１１に出力する。例えば、都道府県の「あいちけん」という音声が入力された場合には、都道府県名称の「愛知県」に対応する信号を送信する。
【００３０】
続いて、カーナビゲーション装置１の経路案内部１１について、図３を用いて説明する。同図に示すように、経路案内部１１は機能実行部１１０を有している。この機能実行部１１０は、現在地周辺の道路地図を表示する機能や、住所入力による地点検索機能等を実行する。例えば、現在地周辺の道路地図を表示する機能では、車両位置・車両向き計算部１２から車両位置・車両の進行方向信号を受信し、地図データベース９から車両位置周辺の地図データを読み出し、画像信号１５に変換して表示装置４に表示したりする。
【００３１】
一方、住所入力による地点検索機能では、音声認識部１０から送信される都道府県や市区郡等の名称を表示装置４に表示したり、都道府県、市区郡、町村字、丁目、及び番地等の名称からなる住所に対応する地点のマークと、その周辺の道路地図を表示装置４に表示したりするものである。
【００３２】
例えば、「愛知県」なる都道府県の名称に対応する信号を受信した場合には、この都道府県の名称を表示装置４に表示する。また、都道府県や市区郡等の名称とは異なるコマンドコードＣ０００１を受信した場合には、階層的につなぎ合わせて記憶・表示している住所のうち、最下位の階層に位置する名称のみ消去する。さらに、このコマンドコードＣ０００１を２回連続して受信した場合には、記憶・表示される住所の名称全てを一括して消去する。なお、このコマンドコードＣ０００１を受信した場合に、スピーカ５を介して、表示の消去を実行する案内音やメッセージ等を報知しても良い。
【００３３】
また、例えば、「愛知県刈谷市昭和町１丁目１番地」なる住所の名称を全て受信し終えた場合には、地図データベース９から受信した住所に対応する地点の座標を抽出し、さらに、抽出した座標周辺の地図情報や道路情報を読み出す。その後、読み出した情報を画像信号に変換して、表示装置４に住所に対応する地点のマークや、その周辺の道路地図を表示させる。
【００３４】
次に、上述のカーナビゲーション装置１において、音声入力による住所に基づく地点検索が行われる地点検索機能の処理について、図１０〜図１３のフローチャート、及び図１４の表示装置４の表示イメージ図を用いて説明する。なお、具体的な例として、ユーザによって、「愛知県刈谷市昭和町１丁目」という住所を入力した後、再度、都道府県名称から入力し直す場面を想定して説明を進める。
【００３５】
先ず、図１０のステップＳ１は、トークスイッチ３がユーザに押されるまで待機状態を継続し、トークスイッチ３が押された場合には、ステップＳ２に処理を進める。ステップＳ２では、音声認識部１０が入力モードに切り換わり、音声の入力を受け付ける状態となる。また、音声認識部１０は、これから実行する認識語の照合と、その認識語に対応する都道府県や市区郡等の名称の抽出を、都道府県辞書に基づいて行うように、認識辞書を切り換える。
【００３６】
続いて、ステップＳ３における音声認識処理を、図１１〜図１３のフローチャートを用いて説明する。まず、図１１に示すステップＳ３０では、変数「Ｂａｃｋ」をゼロに初期化する。この変数「Ｂａｃｋ」は、コマンドコードＣ０００１に対応する認識語を照合／抽出する毎に１が加算されるものであり、すなわち、この変数「Ｂａｃｋ」を参照することで、コマンドコードＣ０００１が２回連続して抽出されたか否かを判断することができる。
【００３７】
ステップＳ３１は、ユーザによって発話されたか否かを判断し、発話があった場合には、次のステップへ処理を進め、発話がない場合には、発話があるまで待機状態となる。本実施形態では、「あいちけん」という発話があったとする。
【００３８】
ステップＳ３２では、ユーザによる発話を解析し、解析した認識語読みを都道府県辞書に記憶される認識語と照合する。そして、最も確からしい認識語と、その認識語に対応する都道府県名称を抽出する。本実施形態では、「あいちけん」という認識語に対応する「愛知県」なる県名が抽出されたとする。
【００３９】
ステップＳ３３は、ステップＳ３２において照合／抽出した認識辞書が都道府県辞書、市区郡辞書、町村字辞書、丁目辞書、及び番地辞書のうち、どの階層の認識辞書であったかを判断する。本実施形態では、第１階層の認識辞書であったと判断される。
【００４０】
さらに、ステップＳ３３においては、次回の照合／抽出する認識辞書を、現在の認識辞書の下位の階層に位置し、かつ、現在抽出した名称に対応する認識辞書に切り換える。本実施形態では、第１階層の下位の階層である第２の階層で、かつ、「愛知県」の名称に対応する市区郡の認識辞書に切り換わる。なお、現在の認識辞書が第５階層である場合には、その下位の階層に位置する認識辞書は存在しないので、認識辞書を切り換える処理を実行しない。
【００４１】
ステップＳ３４は、ステップＳ３３において判断した認識辞書の階層を記憶しておく。本実施形態では、第１階層の認識辞書であったため、変数「ｋａｉｓｏｕ」を１に設定する。
【００４２】
ステップＳ３５では、ステップＳ３２において抽出した名称を、表示装置４へ表示する。本実施形態では、図１４に示すように、表示内容２０「愛知県」が表示される。
【００４３】
ステップＳ３６は、変数「ｋａｉｓｏｕ」が５であるか、或いは、「セット」や「新規セット」など発話終了を意味するコマンドが発声された否かを判断し、変数「ｋａｉｓｏｕ」が５である、或いは、発話終了を意味するコマンドが発声された場合には、第１〜第５階層までの名称入力が終了したと判断して、本音声認識処理を終了する。一方、これに該当しない場合には、ステップＳ３７へ処理を進める。本実施形態では、変数「ｋａｉｓｏｕ」は１であるので、ステップＳ３７へ処理を進める。
【００４４】
ステップＳ３７の判断は、ステップＳ３１と同一であるので説明を省略するが、本実施形態では、「かりやし」という発話があったとする。ステップＳ３８では、ユーザによる発話を解析し、解析した認識語読みを市区郡辞書に記憶される認識語と照合し、認識語に対応する市区郡名称を抽出する。なお、解析した認識語読みが、市区郡辞書の認識語から照合しても、最も確からしい認識語が選定できない場合には、コマンドコード辞書の認識語からの照合を試みる。そして、このコマンドコード辞書から認識語が照合された場合には、その認識語に対応するコマンドコードを抽出する。なお、本実施形態では、「かりやし」という認識語に対応する「刈谷市」なる市名が抽出されたとする。
【００４５】
ステップＳ３９は、ステップＳ３８において、認識語「戻る」に対応するコマンドコードＣ０００１が抽出されたか否かを判断する。ここで、コマンドコードＣ０００１が抽出された場合には、図１３のステップＳ５０へ処理を移行し、これに該当しない場合には、図１２のステップＳ４０へ処理を進める。本実施形態では、「刈谷市」なる市名が抽出されたので、ステップＳ４０へ処理を進める。
【００４６】
図１２に示すステップＳ４０では、変数「Ｂａｃｋ」をゼロに設定する。ここで変数「Ｂａｃｋ」をゼロに設定する理由は、ユーザが２回連続して「戻る」を発話した場合と、２回連続ではないものの「戻る」を複数回発話した場合との処理を切り換えるためである。このステップＳ４０では、ステップＳ３９において、認識語「戻る」に対応するコマンドコードＣ０００１が抽出されなかったため、変数「Ｂａｃｋ」にゼロが設定される。
【００４７】
ステップＳ４１は、ステップＳ３８において照合／抽出した認識辞書が都道府県辞書、市区郡辞書、町村字辞書、丁目辞書、及び番地辞書のうち、どの階層の認識辞書であったかを判断する。本実施形態では、第２階層の認識辞書であったと判断される。
【００４８】
さらに、ステップＳ４１では、次回の照合／抽出する認識辞書を、現在の認識辞書の下位の階層に位置し、かつ、現在抽出した名称に対応する認識辞書に切り換える。本実施形態では、第２階層の下位の階層である第３の階層で、かつ、「愛知県刈谷市」の名称に対応する町村字の認識辞書に切り換わる。なお、現在の認識辞書が第５階層である場合には、その下位の階層に位置する認識辞書は存在しないので、認識辞書を切り換える処理を実行しない。
【００４９】
ステップＳ４２は、ステップＳ４１において判断した認識辞書の階層を記憶しておく。本実施形態では、第２階層の認識辞書であったため、変数「ｋａｉｓｏｕ」を２に設定して記憶する。
【００５０】
ステップＳ４３では、ステップＳ３８において抽出した名称を記憶するとともに、表示装置４へ表示する。本実施形態では、図１４に示すように、表示内容２１「愛知県刈谷市」が表示される。なお、表示内容２１のように、下位の階層に位置する名称は、上位の階層につなぎ合わせて表示装置４に表示される。その後、ステップＳ３６に処理を移行する。
【００５１】
ステップＳ３６は、上述のごとく、変数「ｋａｉｓｏｕ」が５であるか、或いは、「セット」や「新規セット」など発話終了を意味するコマンドが発声された否かを判断する。本実施形態では、変数「ｋａｉｓｏｕ」は２であるので、ステップＳ３７へ処理を進める。
【００５２】
ステップＳ３７において、本実施形態では、「しょうわちょう」という発話があったとして処理を進め、ステップＳ３８では、解析した認識語読みを町村字辞書に記憶される認識語と照合し、認識語に対応する町村字名称を抽出する。本実施形態では、「しょうわちょう」という認識語に対応する「昭和町」なる町名が抽出されたとする。ステップＳ３９は、ステップＳ３８において、「昭和町」なる町名が抽出されたので、ステップＳ４０へ処理を進める。
【００５３】
ステップＳ４０では、変数「Ｂａｃｋ」をゼロに再度設定し、ステップＳ４１は、ステップＳ３８において、第３階層の認識辞書であったと判断するとともに、次回の照合／抽出する認識辞書を、第３階層の下位の階層である第４階層の丁目辞書に切り換える。ステップＳ４２は、本実施形態では、第３階層の認識辞書であったため、変数「ｋａｉｓｏｕ」を３に設定して記憶し、ステップＳ４３では、ステップＳ３８において抽出した名称を記憶し、表示装置４へ表示する。本実施形態では、図１４に示すように、表示内容２２「愛知県刈谷市昭和町」が表示される。その後、ステップＳ３６に処理を移行する。
【００５４】
ステップＳ３６は、上述のごとく、変数「ｋａｉｓｏｕ」が５であるか、或いは、「セット」や「新規セット」など発話終了を意味するコマンドが発声された否かを判断する。本実施形態では、変数「ｋａｉｓｏｕ」は３であるので、ステップＳ３７へ処理を進める。
【００５５】
ステップＳ３７において、本実施形態では、「いちちょうめ」という発話があったとして処理を進め、ステップＳ３８では、解析した認識語読みを丁目辞書に記憶される認識語と照合し、認識語に対応する丁目名称を抽出する。本実施形態では、「いちちょうめ」という認識語に対応する「１丁目」なる丁目名称が抽出されたとする。ステップＳ３９は、ステップＳ３８において、「１丁目」なる町名が抽出されたので、ステップＳ４０へ処理を進める。
【００５６】
ステップＳ４０では、変数「Ｂａｃｋ」をゼロに設定し、ステップＳ４１は、ステップＳ３８において、第４階層の認識辞書であったと判断するとともに、次回の照合／抽出する認識辞書を、第４階層の下位の階層である第５階層の番地辞書に切り換える。ステップＳ４２は、本実施形態では、第４階層の認識辞書であったため、変数「ｋａｉｓｏｕ」を４に設定して記憶し、ステップＳ４３では、ステップＳ３８において抽出した名称を、表示装置４へ表示する。本実施形態では、図１４に示すように、表示内容２３「愛知県刈谷市昭和町１丁目」が表示される。その後、ステップＳ３６に処理を移行する。
【００５７】
ステップＳ３６は、上述のごとく、変数「ｋａｉｓｏｕ」が５であるか、或いは、「セット」や「新規セット」など発話終了を意味するコマンドが発声された否かを判断する。本実施形態では、変数「ｋａｉｓｏｕ」は４であるので、ステップＳ３７へ処理を進める。
【００５８】
ステップＳ３７において、本実施形態では、「戻る」という発話があったとして処理を進め、ステップＳ３８では、解析した認識語読みを番地辞書に記憶される認識語と照合し、認識語に対応する番地名称を抽出する。しかしながら、本実施形態の認識語読み「もどる」は、番地辞書の認識語から照合しても、最も確からしい認識語が選定されない。従って、コマンドコード辞書の認識語から照合が試される。すると、本実施形態では、「もどる」という認識語に対応するコマンドコードＣ０００１が抽出される。
【００５９】
ステップＳ３９は、ステップＳ３８において、認識語「戻る」に対応するコマンドコードＣ０００１が抽出されたか否かを判断する。本実施形態では、コマンドコードＣ０００１が抽出されたので、図１３のステップＳ５０へ処理を移行する。
【００６０】
図１３のステップＳ５０では、変数「Ｂａｃｋ」に１を加える。本実施形態では、直前に処理されたステップＳ４０において、変数「Ｂａｃｋ」がゼロに設定されているので、ここでは、変数「Ｂａｃｋ」は１となる。すなわち、直前のステップＳ３８において、コマンドコードＣ０００１が抽出されたので、１回抽出されたことを意味するように１が加えられる。
【００６１】
ステップＳ５１は、変数「Ｂａｃｋ」が２であるか否かを判断し、変数「Ｂａｃｋ」が２である場合には、ステップＳ５４に処理を移行し、これに該当しない場合には、ステップＳ５２に処理を進める。本実施形態では、変数「Ｂａｃｋ」は１であるので、ステップＳ５２に処理を進める。
【００６２】
ステップＳ５２では、変数「ｋａｉｓｏｕ」から１を減じるとともに、次回の照合／抽出する認識辞書を、現在の認識辞書の１つ上位の階層である認識辞書に切り換える。本実施形態では、第４階層の１つ上位の階層である第３の階層で、かつ、第１及び第２階層である「愛知県刈谷市」に対応する町村字辞書に切り換わる。
【００６３】
ステップＳ５３において、表示装置４に表示している住所のうち、最下位の階層に位置する名称の記憶及び表示が消去される。本実施形態では、図１４の表示内容２４に示すように、第４の階層である「１丁目」の表示が消去される。その後、ステップＳ３７に処理が移行する。
【００６４】
ステップＳ３７において、本実施形態では、ユーザによって、再度「戻る」という発話があったとして処理を進め、ステップＳ３８では、解析した認識語読みを町村字辞書に記憶される認識語と照合し、認識語に対応する町村字名称を抽出する。しかしながら、本実施形態の認識語読み「もどる」は、町村字辞書の認識語から照合しても、最も確からしい認識語が選定されない。従って、コマンドコード辞書の認識語から照合が試される。すると、本実施形態では、「もどる」という認識語に対応するコマンドコードＣ０００１が抽出される。
【００６５】
ステップＳ３９は、コマンドコードＣ０００１が抽出されたので、ステップＳ５０へ処理を移行し、ステップＳ５０では、変数「Ｂａｃｋ」に１を加える。本実施形態では、直前に処理されたステップＳ５０において、変数「Ｂａｃｋ」に１が加算されているので、ここでは、変数「Ｂａｃｋ」は２となる。すなわち、コマンドコードＣ０００１が２回連続して抽出されたことを意味する。ステップＳ５１は、変数「Ｂａｃｋ」が２であるので、ステップＳ５４に処理を移行する。
【００６６】
そして、ステップＳ５４においては、表示装置４に表示している住所のうち、全ての階層に位置する名称の記憶及び表示を消去する。本実施形態では、図１４の表示内容２５に示すように、これまでに表示した名称を全て一括して消去する。その後、ステップＳ３０へ処理を移行し、再び、都道府県名称の入力から実行される。
【００６７】
なお、第１階層から第５階層までの名称が全て入力された場合、或いは、第５階層まで入力がされていなくても、「セット」や「新規セット」等、発話終了を意味するコマンドが発声されたときは、ステップＳ３６から図１０のステップＳ４へ処理を移行する。そして、ステップＳ４において、地図データベース９から、入力された住所に対応する地点の座標が抽出され、また、抽出した座標周辺の地図情報や道路情報が読み出される。
【００６８】
その後、ステップＳ５において、ステップＳ４において読み出した情報を画像信号に変換して、表示装置４に住所に対応する地点のマークと、その周辺の道路地図を表示させる。
【００６９】
このように、本発明の音声認識装置は、「戻る」を２回連続して認識した場合、階層的につなぎ合わせて表示される住所の表示を一括して消去する。これにより、入力途中において、再度、都道府県から入力をやり直す場合に、ユーザは、「戻る」を２回連続して発声することで、住所の表示を一括して消去することができる。その結果、「戻る」を何度も繰り返し発声することなく、簡単に表示を消去することが可能となる。
【００７０】
なお、本実施形態では、コマンドコードＣ０００１に対する認識語として「戻る」を設定しているが、これに限定されるものではない。さらに、コマンドコード辞書に、例えば、「消去」、「はじめに戻る」等の認識語と、これらの認識語に対してコマンドコードＣ０００２を設け、このコマンドコードＣ０００２が抽出されたときに、階層的につなぎ合わせて表示される住所の表示を一括して消去しても良い。
【００７１】
また、本発明の適用範囲は、カーナビゲーション装置の音声による住所入力機能に限定されるものではなく、例えば、住所と同様に階層的に全体が示される電話番号等を音声入力する機能等にも適用できる。
【図面の簡単な説明】
【図１】本実施形態に係わる、カーナビゲーション装置１の概略構成を示すブロック図である。
【図２】本実施形態に係わる、音声認識部１０の構成を示すブロック図である。
【図３】本実施形態に係わる、経路案内部１１の構成を示すブロック図である。
【図４】本実施形態に係わる、都道府県辞書を示す図である。
【図５】本実施形態に係わる、愛知県に属する市区郡の辞書を示す図である。
【図６】本実施形態に係わる、愛知県刈谷市に属する町村字の辞書を示す図である。
【図７】本実施形態に係わる、丁目辞書を示す図である。
【図８】本実施形態に係わる、番地辞書を示す図である。
【図９】本実施形態に係わる、コマンドコード辞書を示す図である。
【図１０】本実施形態に係わる、カーナビゲーション装置１の全体の処理の流れを示すフローチャートである。
【図１１】本実施形態に係わる、音声認識処理の全体の流れを示すフローチャートである。
【図１２】本実施形態に係わる、音声認識処理の一部の流れを示すフローチャートである。
【図１３】本実施形態に係わる、音声認識処理の表示消去処理の流れを示すフローチャートである。
【図１４】本実施形態に係わる、表示装置４の表示イメージを示した図である。
【図１５】従来技術における、音声入力された結果を示す図である。
【符号の説明】
１　カーナビゲーション装置
２　マイク
３　トークスイッチ
４　表示装置
５　スピーカ
６　ＧＰＳ受信機
７　車速センサ
８　ヨーレートセンサ
９　地図データベース
１０　音声認識部
１１　経路案内部
１２　車両位置・車両向き計算部

[0001]
[Technical Field of the Invention]
The present invention relates to a voice control device.
[0002]
[Prior art]
Conventionally, there are voice control devices that recognize a voice uttered by a speaker and input an address. In such a voice control device, a user inputs a desired address by reading it out in units named after the prefecture, city/ward/county, town/village, and so on (hereinafter referred to as levels), pausing between levels. For example, to input the address "Aichi Prefecture, Kariya City, Showa-cho, 1-chome, 1-banchi", the user reads it out level by level: "Aichi Prefecture" as the first level, "Kariya City" as the second level, "Showa-cho" as the third level, "1-chome" as the fourth level, and "1-banchi" as the fifth level.
[0003]
On the other hand, to correct a mistakenly input level and redo the input, the user utters "return", which corresponds to a command that erases the display of one level, and only one level is erased. For example, as shown in FIG. 15, when the user has uttered up to "Aichi Prefecture, Kariya City, Showa-cho, 1-chome" and wants to re-enter everything after "Aichi Prefecture, Kariya City", the user utters "return" twice in succession immediately after uttering "1-chome", which erases the input address back to the target level. The user then re-enters the part of the address that follows "Aichi Prefecture, Kariya City".
[0004]
As described above, in the conventional voice control device, the user inputs a desired address by reading it out level by level and, to erase a mistakenly input level, utters "return" to erase the erroneous portion.
[0005]
[Problems to be solved by the invention]
However, the command corresponding to "return" in the above-described voice control device erases only one level at a time. Therefore, for example, to redo voice input from the first-level prefecture name after inputting "Aichi Prefecture, Kariya City, Showa-cho, 1-chome", the user must utter "return" four times in succession. Uttering the same content repeatedly in this way is troublesome for the user.
[0006]
The present invention has been made in view of such a problem, and an object of the present invention is to provide a voice control device capable of easily deleting an address input by voice.
[0007]
[Means for Solving the Problems]
The voice control device according to claim 1 comprises: voice input means for inputting voice; voice recognition means for recognizing a user's uttered voice input to the voice input means; and control means that, when voice is input by the user, classifies the recognized utterance content into a plurality of levels and stores it and, when a correction command for correcting the utterance content is input, erases the part of the utterance content that is classified and stored at the lowest level. The control means erases all of the utterance content classified and stored in the plurality of levels when the correction command is input twice in succession.
[0008]
As described above, when the correction command is input twice consecutively, the voice control device of the present invention deletes all of the utterance contents classified and stored in a plurality of layers. This makes it possible to easily delete the utterance content without repeatedly issuing the correction command.
[0009]
As described in claim 2, the utterance content classified and stored in the plurality of levels may be at least an address. For example, consider inputting an address by voice. After inputting the four levels "Aichi Prefecture", "Kariya City", "Showa-cho", and "1-chome", a user who wants to redo voice input from the first level, the prefecture, utters the command for correcting the voice-input address twice in succession. This erases the entire stored address "Aichi Prefecture, Kariya City, Showa-cho, 1-chome", which is classified into four levels. The user can thus easily erase the entire input address without uttering the correction command over and over.
[0010]
According to the voice control device of claim 3, the address is classified into at least five levels: prefecture, city/ward/county, town/village, chome (block), and banchi (lot number). Accordingly, when inputting an address by voice, the user can input the desired address by uttering its prefecture, city/ward/county, town/village, chome, and banchi levels.
[0011]
[Embodiments of the Invention]
Hereinafter, a voice control device according to an embodiment of the present invention will be described with reference to the drawings. In this embodiment, an example in which the voice control device of the present invention is applied to a car navigation device will be described.
[0012]
FIG. 1 is a block diagram illustrating a schematic configuration of a car navigation device according to the present embodiment. As shown in the figure, the car navigation device 1 of the present embodiment includes a voice recognition unit 10, a route guidance unit 11, and a vehicle position / vehicle direction calculation unit 12. The car navigation device 1 also has a road map drawing unit and the like (not shown). Further, the car navigation device 1 is connected to a microphone 2 and a talk switch 3 used for voice input, a display device 4, a speaker 5, a GPS receiver 6, a vehicle speed sensor 7, a yaw rate sensor 8, a map database 9, and the like.
[0013]
The microphone 2 and the talk switch 3 are devices used for voice input. To input voice, the user, for example, presses the push button of the talk switch 3, which transmits an input trigger signal to the voice recognition unit 10 described later. On receiving the input trigger signal, the voice recognition unit 10 changes to a mode in which it accepts voice input from the microphone 2.
[0014]
In the mode in which voice input is accepted, when the user inputs a voice, the microphone 2 converts it into a voice signal and sends it to the voice recognition unit 10. The voice recognition unit 10 recognizes the voice signal, converts it into the name of the prefecture, city/ward/county, or the like corresponding to the voice, and gives the name to the route guidance unit 11. For example, a voice recognized as "Aichiken" is converted into the prefecture name "Aichi Prefecture". On receiving the prefecture name, the route guidance unit 11 stores it and displays it on the display device 4. Thereafter, each time it receives the name of a city/ward/county, town/village, chome, banchi, or the like, it connects the names hierarchically, stores them, and displays them on the display device 4. In the present embodiment, the prefecture is referred to as the first level, the city/ward/county as the second level, the town/village as the third level, the chome as the fourth level, and the banchi as the fifth level.
[0015]
Further, when the route guidance unit 11 has received the entire address, consisting of the names of the prefecture, city/ward/county, town/village, chome, banchi, and the like, it searches for the point on the road map corresponding to this address and displays a mark indicating the found point on the display device 4 together with the surrounding road map.
[0016]
Further, when the user utters, for example, "return" during address input, the voice is recognized, converted into the corresponding command code, and given to the route guidance unit 11 and the like. For example, a voice recognized as "return" is converted into the command code "display erase". On receiving this command code, the route guidance unit 11 determines whether it has been received twice in succession. If this is the first time, only the name at the lowest level of the hierarchically connected, stored and displayed address is erased. If the command code has been received twice in succession, the entire stored and displayed address is erased at once.
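The erase behavior described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `AddressBuffer`, its method names, the flag `last_was_erase`, and the comma separator used for display are all hypothetical.

```python
class AddressBuffer:
    """Holds the names received so far, one per level (hypothetical helper)."""

    ERASE = "display erase"  # command code that the text associates with "return"

    def __init__(self):
        self.names = []              # names[0] is the first (prefecture) level
        self.last_was_erase = False  # True if the previous input was ERASE

    def receive_name(self, name):
        # A name arrived: connect it below the levels already stored.
        self.names.append(name)
        self.last_was_erase = False

    def receive_command(self, code):
        if code != self.ERASE:
            return
        if self.last_was_erase:
            # Second consecutive erase command: clear every level at once.
            self.names.clear()
            self.last_was_erase = False
        else:
            # First erase command: drop only the lowest level.
            if self.names:
                self.names.pop()
            self.last_was_erase = True

    def display(self):
        # Levels are shown connected, highest level first (separator is assumed).
        return ", ".join(self.names)


buf = AddressBuffer()
for name in ["Aichi Prefecture", "Kariya City", "Showa-cho", "1-chome"]:
    buf.receive_name(name)
buf.receive_command(AddressBuffer.ERASE)  # erases "1-chome" only
after_first = buf.display()
buf.receive_command(AddressBuffer.ERASE)  # erases everything that remains
after_second = buf.display()
```

In this sketch a name received between two "return" utterances resets the flag, so only back-to-back erase commands clear the whole address.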
[0017]
The display device 4 is configured by a liquid crystal display that displays a road map and the like. Further, a touch panel may be employed as the display of the display device 4.
[0018]
The speaker 5 is used for outputting voice guidance, various warning sounds, and the like. For example, it may be a speaker installed in the vehicle or one built into the car navigation device 1.
[0019]
As is well known, the GPS receiver 6, the vehicle speed sensor 7, and the yaw rate sensor 8 generate the signals (hereinafter referred to as sensor signals) needed to calculate the current position of the vehicle, its traveling direction, and the like. The generated sensor signals are sent to the vehicle position / vehicle direction calculation unit 12.
[0020]
The map database 9 is stored in a storage medium (not shown), and includes map information and road information. As a storage medium, a CD-ROM or a DVD-ROM is generally used because of its data amount, but a medium such as a memory card or a hard disk may be used. The map information is data necessary for drawing landmarks and the like to be displayed on the display device 4, and is composed of data in which facility names, addresses, telephone numbers, coordinates on a map, and the like are associated.
[0021]
Next, the voice recognition unit 10 built into the car navigation device 1 will be described with reference to FIG. 2. As shown in the figure, the voice recognition unit 10 includes an AD conversion circuit 101, a recognition program processing unit 102, an acoustic model storage unit 103, a recognition dictionary storage unit 104, and the like.
[0022]
The AD conversion circuit 101 receives an analog audio signal input via the microphone 2, and converts this signal into a digitized signal. The converted digital audio signal is transmitted to the recognition program processing unit 102.
[0023]
The recognition program processing unit 102 uses the acoustic model storage unit 103 and the recognition dictionary storage unit 104 to convert the digital voice signal into a name, such as that of a prefecture, or into a command code. First, the recognition program processing unit 102 analyzes the utterance content corresponding to the digital voice signal 106 (hereinafter referred to as the recognition word reading), using a technique stored in the acoustic model storage unit 103 such as the well-known hidden Markov model.
[0024]
The analyzed recognition word reading is collated with the recognition words stored in the recognition dictionary storage unit 104, and the most probable recognition word is extracted together with the corresponding name of a prefecture, city/ward/county, or the like, or the corresponding command code.
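The idea of picking the most probable recognition word, or none at all, can be illustrated with a short Python sketch. Note the substitution: the text scores candidates with an acoustic model such as a hidden Markov model, whereas this example uses plain string similarity from the standard `difflib` module purely for illustration. The dictionary contents (including the "Gifu Prefecture" entry), the romanized readings, the function name `most_probable`, and the cutoff value are assumptions.

```python
import difflib

# Hypothetical dictionary: recognition word -> name (romanized for readability).
PREFECTURE_DICTIONARY = {
    "aichiken": "Aichi Prefecture",
    "gifuken": "Gifu Prefecture",
}


def most_probable(reading, dictionary, cutoff=0.6):
    """Return (recognition word, entry) for the closest word, or None."""
    # String similarity stands in for the acoustic score used in the text.
    candidates = difflib.get_close_matches(
        reading, list(dictionary), n=1, cutoff=cutoff
    )
    if not candidates:
        return None  # nothing in this dictionary is a plausible match
    word = candidates[0]
    return word, dictionary[word]


match = most_probable("aichikenn", PREFECTURE_DICTIONARY)  # slightly garbled reading
no_match = most_probable("modoru", PREFECTURE_DICTIONARY)   # "return" is not a prefecture
```

Returning `None` when no candidate clears the cutoff leaves room for the caller to try another dictionary, such as the command code dictionary.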
[0025]
Here, the recognition dictionary storage unit 104 will be described. The recognition dictionary storage unit 104 has a total of five recognition dictionaries: a prefecture dictionary, which stores prefecture names in association with recognition words; a city/ward/county dictionary, divided by prefecture, which stores city/ward/county names in association with recognition words; a town/village dictionary, divided by prefecture and further by city/ward/county within each prefecture, which stores town/village names in association with recognition words; and a chome dictionary and a banchi dictionary, which store chome and banchi names in association with recognition words.
[0026]
That is, a city/ward/county dictionary is prepared for each of the roughly 50 prefectures. For example, as shown in FIG. 5, the city/ward/county dictionary for "Aichi Prefecture" stores the names and recognition words of the cities, wards, and counties belonging to Aichi Prefecture. A town/village dictionary is likewise prepared for each of the roughly 50 prefectures and, within each prefecture, for each city/ward/county. For example, as shown in FIG. 6, the dictionary for Kariya City, Aichi Prefecture stores the names and recognition words of the towns and villages belonging to that city. Note that the chome dictionary shown in FIG. 7 and the banchi dictionary shown in FIG. 8 hold names consisting of numerals and therefore do not belong to any particular place.
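One way to picture this partitioning is as nested lookup tables. The Python sketch below is only an illustration: the variable names, the function `dictionary_for`, and the romanized readings are assumptions, and each table holds just the example entries used in this description (the reading for "1-banchi" is itself assumed).

```python
# Level 1: one dictionary for all prefectures (recognition word -> name).
PREFECTURES = {"aichiken": "Aichi Prefecture"}

# Level 2: one dictionary per prefecture.
CITIES = {
    "Aichi Prefecture": {"kariyashi": "Kariya City"},
}

# Level 3: one dictionary per (prefecture, city/ward/county) pair.
TOWNS = {
    ("Aichi Prefecture", "Kariya City"): {"showacho": "Showa-cho"},
}

# Levels 4 and 5: numeric names, shared by every place.
CHOME = {"ichichome": "1-chome"}
BANCHI = {"ichibanchi": "1-banchi"}  # reading assumed for illustration


def dictionary_for(path):
    """Pick the dictionary for the next level, given the names entered so far."""
    if len(path) == 0:
        return PREFECTURES
    if len(path) == 1:
        return CITIES[path[0]]
    if len(path) == 2:
        return TOWNS[(path[0], path[1])]
    if len(path) == 3:
        return CHOME
    if len(path) == 4:
        return BANCHI
    raise ValueError("all five levels are already entered")


town_dictionary = dictionary_for(["Aichi Prefecture", "Kariya City"])
```

Keying the town/village table by the (prefecture, city/ward/county) pair keeps each active vocabulary small, while the chome and banchi tables are shared because they depend on no particular place.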
[0027]
Further, as shown in FIG. 9, the recognition dictionary storage unit 104 has a command code dictionary that stores a recognition word different from a name of a prefecture, a city, a county, or the like and a command code in association with each other. In this command code dictionary, for example, when the recognition word is “return”, the corresponding command code “C0001” is extracted. This command code is a code that can be recognized by the function execution unit 110 of the route guidance unit 11 described later.
[0028]
After extracting the name of a prefecture from the recognition dictionary storage unit 104, the recognition program processing unit 102 automatically switches the recognition dictionary used for the next collation/extraction to the city/ward/county dictionary corresponding to that prefecture. For example, if the prefecture name is "Aichi Prefecture", the dictionary for the next collation/extraction is automatically switched to the city/ward/county dictionary for Aichi Prefecture. The same applies after the name of a city/ward/county has been collated/extracted: the dictionary for the next collation/extraction is automatically switched to the town/village dictionary corresponding to that city/ward/county. However, the command code dictionary in FIG. 9 is not subject to this switching and is always a target of collation/extraction.
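The special status of the command code dictionary can be sketched as a lookup that always consults it, whichever address dictionary is currently active. This Python example is a simplified illustration: the function name `look_up`, the result tuples, the romanized readings, and the use of exact string matching in place of acoustic matching are assumptions. Trying the address dictionary first mirrors the description of step S38 in the original text.

```python
COMMAND_DICTIONARY = {"modoru": "C0001"}  # "return" -> command code C0001


def look_up(reading, active_dictionary):
    """Return ("name", value), ("command", code), or None if nothing matches."""
    if reading in active_dictionary:
        return ("name", active_dictionary[reading])
    # The command dictionary is consulted regardless of the active level.
    if reading in COMMAND_DICTIONARY:
        return ("command", COMMAND_DICTIONARY[reading])
    return None


# Dictionary that would be active after "Aichi Prefecture" has been extracted.
city_dictionary = {"kariyashi": "Kariya City"}

name_result = look_up("kariyashi", city_dictionary)
command_result = look_up("modoru", city_dictionary)
```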
[0029]
Then, the recognition program processing unit 102 outputs to the route guidance unit 11 a signal corresponding to the name of the prefecture, city/ward/county, or the like obtained by the above processing. For example, when the utterance "Aichiken" is input for the prefecture, a signal corresponding to the prefecture name "Aichi Prefecture" is transmitted.
[0030]
Next, the route guidance unit 11 of the car navigation device 1 will be described with reference to FIG. 3. As shown in the figure, the route guidance unit 11 has a function execution unit 110. The function execution unit 110 executes functions such as displaying a road map around the current location and searching for a point by address input. For example, in the function of displaying a road map around the current location, it receives the vehicle position and traveling direction signals from the vehicle position / vehicle direction calculation unit 12, reads map data around the vehicle position from the map database 9, converts the data into an image signal 15, and displays it on the display device 4.
[0031]
On the other hand, the point search function by address input displays on the display device 4 the names of prefectures, cities/wards/counties, and the like transmitted from the voice recognition unit 10. It also displays on the display device 4 a mark at the point corresponding to an address consisting of the names of the prefecture, city/ward/county, town/village, chome, banchi, and the like, together with the surrounding road map.
[0032]
For example, when a signal corresponding to the prefecture name "Aichi Prefecture" is received, this prefecture name is displayed on the display device 4. When the command code C0001, which is distinct from the names of prefectures, cities/wards/counties, and the like, is received, only the name at the lowest level of the hierarchically connected, stored and displayed address is erased. Further, when the command code C0001 is received twice in succession, all the names of the stored and displayed address are erased at once. When the command code C0001 is received, a guidance sound, a message, or the like announcing that the display is being erased may be output via the speaker 5.
[0033]
Further, for example, when all the names of the address "Aichi Prefecture, Kariya City, Showa-cho, 1-chome, 1-banchi" have been received, the coordinates of the point corresponding to the received address are extracted from the map database 9, and map information and road information around the extracted coordinates are read out. The read information is then converted into an image signal, and the display device 4 displays a mark at the point corresponding to the address together with the surrounding road map.
[0034]
Next, the processing of the point search function of the car navigation device 1 described above, in which a point is searched for based on an address input by voice, will be described with reference to the flowcharts of FIGS. 10 to 13 and the display image diagram of the display device 4 in FIG. 14. As a specific example, the description assumes a situation in which the user inputs the address "Aichi Prefecture, Kariya City, Showa-cho, 1-chome" and then re-enters it starting from the prefecture name.
[0035]
First, in step S1 of FIG. 10, the standby state continues until the user presses the talk switch 3; when the talk switch 3 is pressed, the process proceeds to step S2. In step S2, the voice recognition unit 10 switches to the input mode and becomes ready to accept voice input. The voice recognition unit 10 also switches the recognition dictionary so that the upcoming collation of recognition words, and the extraction of the corresponding names of prefectures, cities/wards/counties, and the like, are performed based on the prefecture dictionary.
[0036]
Next, the speech recognition processing in step S3 will be described with reference to the flowcharts in FIGS. 11 to 13. First, in step S30 shown in FIG. 11, the variable "Back" is initialized to zero. The variable “Back” is incremented by 1 each time the recognition word corresponding to the command code C0001 is collated / extracted. That is, by referring to the variable “Back”, it can be determined whether or not the command code C0001 has been extracted twice consecutively.
[0037]
In step S31, it is determined whether or not the user has spoken. If there is an utterance, the process proceeds to the next step. If there is no utterance, the process waits until there is one. In the present embodiment, it is assumed that there is an utterance “Aichiken”.
[0038]
In step S32, the user's utterance is analyzed, and the reading obtained by the analysis is collated with the recognition words stored in the prefecture dictionary. Then, the most likely recognition word and the prefecture name corresponding to that recognition word are extracted. In the present embodiment, it is assumed that the prefecture name “Aichi Prefecture” corresponding to the recognition word “Aichiken” has been extracted.
[0039]
In step S33, it is determined to which hierarchy the recognition dictionary used for collation / extraction in step S32 belongs, among the prefecture dictionary, the city / ward / county dictionary, the town / village dictionary, the chome dictionary, and the address dictionary. In the present embodiment, it is determined that the recognition dictionary is the first-level recognition dictionary.
[0040]
Further, in step S33, the recognition dictionary to be collated / extracted next time is switched to a recognition dictionary located at a lower hierarchy of the current recognition dictionary and corresponding to the currently extracted name. In the present embodiment, the recognition dictionary is switched to the recognition dictionary of the city, ward and county corresponding to the name of "Aichi prefecture" in the second hierarchy, which is lower than the first hierarchy. If the current recognition dictionary is at the fifth level, there is no recognition dictionary located at a lower level, and therefore, the process of switching the recognition dictionary is not executed.
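The dictionary switching described above can be pictured with the following minimal Python sketch. The table DICTIONARIES, its keying by the names already extracted, and the function name switch_dictionary are assumptions made for illustration; the actual dictionaries are those shown in FIGS. 4 to 8, and only a few sample entries are reproduced here.

```python
MAX_LEVEL = 5  # prefecture, city/ward/county, town/village, chome, address

# Hypothetical dictionaries keyed by the names already extracted.
# Each maps a recognition-word reading to the name it stands for.
DICTIONARIES = {
    (): {"aichiken": "Aichi Prefecture"},
    ("Aichi Prefecture",): {"kariyashi": "Kariya City"},
    ("Aichi Prefecture", "Kariya City"): {"showacho": "Showa-cho"},
}


def switch_dictionary(extracted_names):
    """Return the dictionary one level below the current one, or None.

    None means that no switch is performed: either the current dictionary
    is already at the fifth level, or this toy data set has no entry for
    the given names.
    """
    if len(extracted_names) >= MAX_LEVEL:
        return None
    return DICTIONARIES.get(tuple(extracted_names))
```

Keying each lower-level dictionary by the names above it reflects the statement that the next dictionary corresponds to the currently extracted name, so that only the cities, wards, and counties of "Aichi Prefecture" are candidates at the second level.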
[0041]
In step S34, the hierarchy of the recognition dictionary determined in step S33 is stored. In the present embodiment, the variable “kaisou” is set to 1 because it is the first-level recognition dictionary.
[0042]
In step S35, the name extracted in step S32 is displayed on the display device 4. In the present embodiment, as shown in FIG. 14, the display content 20 "Aichi Prefecture" is displayed.
[0043]
In step S36, it is determined whether or not the variable "kaisou" is 5, or whether a command indicating the end of the utterance, such as "set" or "new set", has been uttered. If the variable "kaisou" is 5, or if a command indicating the end of the utterance has been uttered, it is determined that the input of the names of the first to fifth levels has been completed, and the voice recognition processing ends. Otherwise, the process proceeds to step S37. In the present embodiment, since the variable “kaisou” is 1, the process proceeds to step S37.
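The determination in step S36 amounts to a simple condition, sketched below in Python. The function name input_finished and the constant names are assumptions for this sketch; "set" and "new set" are the examples of end-of-utterance commands given in the description.

```python
END_COMMANDS = {"set", "new set"}  # example end-of-utterance commands
LOWEST_LEVEL = 5                   # the fifth (address) level


def input_finished(kaisou, utterance=None):
    """Return True when step S36 would end the voice recognition processing."""
    return kaisou == LOWEST_LEVEL or utterance in END_COMMANDS
```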
[0044]
The determination in step S37 is the same as that in step S31, and a description thereof is omitted. In step S38, the user's utterance is analyzed, the reading obtained by the analysis is collated with the recognition words stored in the city / ward / county dictionary, and the city / ward / county name corresponding to the matched recognition word is extracted. If the most likely recognition word cannot be selected even after collation with the recognition words in the city / ward / county dictionary, collation with the recognition words in the command code dictionary is attempted. When a recognition word in the command code dictionary is matched, the command code corresponding to that recognition word is extracted. In the present embodiment, it is assumed that the city name “Kariya” corresponding to the recognition word “Kariyashi” is extracted.
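The two-stage collation of step S38 can be sketched in Python as follows. This is a simplification under stated assumptions: the embodiment selects the most likely recognition word, whereas this sketch uses an exact dictionary lookup in its place, and the function name collate and the tagged return values are hypothetical.

```python
COMMAND_CODE_DICTIONARY = {"return": "C0001"}


def collate(reading, level_dictionary, command_dictionary=COMMAND_CODE_DICTIONARY):
    """Return ("name", value), ("command", code), or None if nothing matches."""
    # First stage: the recognition dictionary of the current level.
    if reading in level_dictionary:
        return ("name", level_dictionary[reading])
    # Second stage: fall back to the command code dictionary.
    if reading in command_dictionary:
        return ("command", command_dictionary[reading])
    return None
```

Step S39 then corresponds to checking whether the result is the command code C0001.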
[0045]
In step S39, it is determined whether or not the command code C0001 corresponding to the recognition word "return" has been extracted in step S38. If the command code C0001 has been extracted, the process proceeds to step S50 in FIG. 13, and if not, the process proceeds to step S40 in FIG. 12. In the present embodiment, since the city name “Kariya” has been extracted, the process proceeds to step S40.
[0046]
In step S40 shown in FIG. 12, the variable “Back” is set to zero. The variable “Back” is set to zero here in order to distinguish the case where the user utters “return” twice consecutively from the case where the user utters “return” a plurality of times but not consecutively. That is, since the command code C0001 corresponding to the recognition word "return" was not extracted in step S39, zero is set to the variable “Back” in step S40.
[0047]
In step S41, it is determined to which hierarchy the recognition dictionary used for collation / extraction in step S38 belongs, among the prefecture dictionary, the city / ward / county dictionary, the town / village dictionary, the chome dictionary, and the address dictionary. In the present embodiment, it is determined that the recognition dictionary is the second-level recognition dictionary.
[0048]
Further, in step S41, the recognition dictionary to be collated / extracted next time is switched to a recognition dictionary located one level below the current recognition dictionary and corresponding to the currently extracted name. In the present embodiment, the recognition dictionary is switched to the town / village dictionary of the third level, which is below the second level, corresponding to the name “Kariya City, Aichi Prefecture”. If the current recognition dictionary is at the fifth level, there is no recognition dictionary located at a lower level, and therefore the process of switching the recognition dictionary is not executed.
[0049]
In step S42, the hierarchy of the recognition dictionary determined in step S41 is stored. In the present embodiment, since the recognition dictionary is a second-level recognition dictionary, the variable “kaisou” is set to 2 and stored.
[0050]
In step S43, the name extracted in step S38 is stored and displayed on the display device 4. In the present embodiment, as shown in FIG. 14, display content 21 "Kariya City, Aichi Prefecture" is displayed. Note that a name located at a lower hierarchy, such as the display content 21, is displayed on the display device 4 by being connected to an upper hierarchy. After that, the processing shifts to the step S36.
[0051]
In step S36, as described above, it is determined whether the variable "kaisou" is 5, or whether a command indicating the end of the utterance, such as "set" or "new set", has been uttered. In the present embodiment, since the variable “kaisou” is 2, the process proceeds to step S37.
[0052]
In step S37, in the present embodiment, the process proceeds assuming that there is an utterance of "showacho". In step S38, the reading obtained by the analysis is collated with the recognition words stored in the town / village dictionary, and the town / village name corresponding to the matched recognition word is extracted. In the present embodiment, it is assumed that the town name “Showa-cho” corresponding to the recognition word “showacho” has been extracted. In step S39, since the town name "Showa-cho" has been extracted in step S38, the process proceeds to step S40.
[0053]
In step S40, the variable “Back” is set to zero again. In step S41, the recognition dictionary used in step S38 is determined to be the third-level recognition dictionary, and the recognition dictionary to be collated / extracted next time is switched to the fourth-level chome dictionary, which is one level below the third level. In step S42, since the recognition dictionary is at the third level in the present embodiment, the variable “kaisou” is set to 3 and stored. In step S43, the name extracted in step S38 is stored and displayed on the display device 4. In the present embodiment, as shown in FIG. 14, display content 22 "Showa-cho, Kariya-shi, Aichi" is displayed. After that, the process returns to step S36.
[0054]
In step S36, as described above, it is determined whether the variable "kaisou" is 5, or whether a command indicating the end of the utterance, such as "set" or "new set", has been uttered. In the present embodiment, since the variable “kaisou” is 3, the process proceeds to step S37.
[0055]
In step S37, in the present embodiment, the process proceeds assuming that there is an utterance of "ichichome". In step S38, the reading obtained by the analysis is collated with the recognition words stored in the chome dictionary, and the chome name corresponding to the matched recognition word is extracted. In the present embodiment, it is assumed that the chome name “1-chome” corresponding to the recognition word “ichichome” has been extracted. In step S39, since the chome name "1-chome" has been extracted in step S38, the process proceeds to step S40.
[0056]
In step S40, the variable “Back” is set to zero. In step S41, the recognition dictionary used in step S38 is determined to be the fourth-level recognition dictionary, and the recognition dictionary to be collated / extracted next time is switched to the fifth-level address dictionary, which is one level below the fourth level. In step S42, since the recognition dictionary is at the fourth level in the present embodiment, the variable “kaisou” is set to 4 and stored. In step S43, the name extracted in step S38 is stored and displayed on the display device 4. In the present embodiment, as shown in FIG. 14, display content 23 "1-chome, Showa-cho, Kariya-shi, Aichi" is displayed. After that, the process returns to step S36.
[0057]
In step S36, as described above, it is determined whether the variable "kaisou" is 5, or whether a command indicating the end of the utterance, such as "set" or "new set", has been uttered. In the present embodiment, since the variable “kaisou” is 4, the process proceeds to step S37.
[0058]
In step S37, in the present embodiment, the process proceeds assuming that the user utters "return". In step S38, the reading obtained by the analysis is collated with the recognition words stored in the address dictionary in order to extract the address name corresponding to the recognition word. However, for the reading “return” of the present embodiment, the most likely recognition word cannot be selected even by collation with the recognition words in the address dictionary. Therefore, collation with the recognition words in the command code dictionary is attempted. As a result, in the present embodiment, the command code C0001 corresponding to the recognition word “return” is extracted.
[0059]
In step S39, it is determined whether or not the command code C0001 corresponding to the recognition word "return" has been extracted in step S38. In the present embodiment, since the command code C0001 has been extracted, the process proceeds to step S50 in FIG. 13.
[0060]
In step S50 of FIG. 13, 1 is added to the variable "Back". In this embodiment, since the variable “Back” is set to zero in the immediately preceding step S40, the variable “Back” is set to 1 here. That is, since the command code C0001 has been extracted in the immediately preceding step S38, 1 is added to mean that the command code has been extracted once.
[0061]
In step S51, it is determined whether or not the variable "Back" is 2. If the variable "Back" is 2, the process proceeds to step S54; otherwise, the process proceeds to step S52. In the present embodiment, since the variable “Back” is 1, the process proceeds to step S52.
[0062]
In step S52, 1 is subtracted from the variable "kaisou", and the recognition dictionary to be collated / extracted next time is switched to the recognition dictionary one level above the current recognition dictionary. In the present embodiment, the dictionary is switched to the third-level town / village dictionary, one level above the fourth level, corresponding to the first- and second-level names “Kariya City, Aichi Prefecture”.
[0063]
In step S53, the storage and display of the name located at the lowest level among the addresses displayed on the display device 4 are deleted. In the present embodiment, as shown in the display content 24 of FIG. 14, the display of the fourth-level name “1-chome” is deleted. Thereafter, the process proceeds to step S37.
[0064]
In step S37, in the present embodiment, the process proceeds assuming that the user utters "return" again. In step S38, the reading obtained by the analysis is collated with the recognition words stored in the town / village dictionary in order to extract the town / village name corresponding to the recognition word. However, for the reading “return” of the present embodiment, the most likely recognition word cannot be selected even by collation with the recognition words in the town / village dictionary. Therefore, collation with the recognition words in the command code dictionary is attempted. As a result, in the present embodiment, the command code C0001 corresponding to the recognition word “return” is extracted.
[0065]
In step S39, since the command code C0001 has been extracted, the process proceeds to step S50. In step S50, 1 is added to the variable "Back". In the present embodiment, since 1 was already added to the variable “Back” in the preceding execution of step S50, the variable “Back” is now 2. That is, the command code C0001 has been extracted twice consecutively. In step S51, since the variable “Back” is 2, the process proceeds to step S54.
[0066]
In step S54, of the addresses displayed on the display device 4, the storage and display of names located at all levels are deleted. In the present embodiment, as shown in the display content 25 of FIG. 14, all the names displayed so far are collectively deleted. Thereafter, the process proceeds to step S30, and the process is executed again from the input of the prefecture name.
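The branch through steps S50 to S54 can be summarized by the following Python sketch. The class RecognitionState and the function handle_return are hypothetical names introduced here; the fields back and kaisou stand for the variables "Back" and "kaisou". Resetting kaisou to 0 after step S54 is an assumption of this sketch, since the description states only that processing restarts from step S30.

```python
from dataclasses import dataclass, field


@dataclass
class RecognitionState:
    back: int = 0                              # variable "Back"
    kaisou: int = 0                            # variable "kaisou"
    names: list = field(default_factory=list)  # stored and displayed names


def handle_return(state):
    """Apply steps S50 to S54 once and return the label of the next step."""
    state.back += 1                  # S50: count this C0001
    if state.back == 2:              # S51: extracted twice consecutively?
        state.names.clear()          # S54: erase the names of all levels
        state.back = 0               # S30 re-initializes "Back" (done here for brevity)
        state.kaisou = 0             # assumption: no level has been input yet
        return "S30"
    state.kaisou -= 1                # S52: move up one level
    if state.names:
        state.names.pop()            # S53: erase the lowest-level name
    return "S37"
```

Starting from the four names of display content 23, the first call leaves the three names of display content 24 and returns "S37"; the second call leaves no names, as in display content 25, and returns "S30".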
[0067]
Note that when all the names from the first to fifth levels have been input, or when a command indicating the end of the utterance, such as “set” or “new set”, is uttered even though input has not reached the fifth level, the process proceeds from step S36 to step S4 in FIG. 10. Then, in step S4, the coordinates of the point corresponding to the input address are extracted from the map database 9, and map information and road information around the extracted coordinates are read.
[0068]
Then, in step S5, the information read in step S4 is converted into an image signal, and the display device 4 displays a mark of a point corresponding to the address and a road map around the mark.
[0069]
As described above, when the voice control device of the present invention recognizes "return" twice in succession, it collectively erases the display of the address names that are displayed in a hierarchically connected manner. Thus, when redoing the input from the prefecture name partway through the input, the user can erase the display of the address at once by uttering “return” twice consecutively. As a result, the display can be easily erased without saying “return” over and over.
[0070]
In the present embodiment, “return” is set as the recognition word for the command code C0001, but the present invention is not limited to this. For example, the command code dictionary may further be provided with recognition words such as “erase” and “return to the beginning” and a command code C0002 corresponding to these recognition words, and when the command code C0002 is extracted, the display of the hierarchically connected addresses may be collectively deleted.
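The variation described above, in which several recognition words share the command code C0002, can be sketched in Python as follows. The function name apply_command is an assumption made for this sketch, and the counting of consecutive C0001 extractions is deliberately omitted so that only the difference between the two command codes is shown.

```python
COMMAND_CODE_DICTIONARY = {
    "return": "C0001",
    "erase": "C0002",
    "return to the beginning": "C0002",
}


def apply_command(recognition_word, names):
    """Return the names that remain after the command is applied."""
    code = COMMAND_CODE_DICTIONARY[recognition_word]
    if code == "C0002":
        return []          # erase all connected names with one utterance
    return names[:-1]      # C0001: erase only the lowest-level name
```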
[0071]
Further, the scope of application of the present invention is not limited to the voice address input function of the car navigation device. For example, the present invention is also applicable to a function for voice input of information that is expressed hierarchically in the same way as an address, such as a telephone number.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a car navigation device 1 according to the present embodiment.
FIG. 2 is a block diagram illustrating a configuration of a speech recognition unit 10 according to the embodiment.
FIG. 3 is a block diagram illustrating a configuration of a route guide unit 11 according to the present embodiment.
FIG. 4 is a diagram showing a prefectural dictionary according to the embodiment.
FIG. 5 is a diagram showing a dictionary of municipalities belonging to Aichi prefecture according to the present embodiment.
FIG. 6 is a diagram showing a dictionary of town and village characters belonging to Kariya city, Aichi prefecture, according to the present embodiment.
FIG. 7 is a diagram showing a chome dictionary according to the embodiment.
FIG. 8 is a diagram showing an address dictionary according to the embodiment.
FIG. 9 is a diagram showing a command code dictionary according to the embodiment.
FIG. 10 is a flowchart showing a flow of overall processing of the car navigation device 1 according to the embodiment.
FIG. 11 is a flowchart illustrating an overall flow of a voice recognition process according to the embodiment.
FIG. 12 is a flowchart illustrating a partial flow of a voice recognition process according to the embodiment.
FIG. 13 is a flowchart showing a flow of a display erasing process of a voice recognition process according to the embodiment.
FIG. 14 is a diagram showing a display image of the display device 4 according to the present embodiment.
FIG. 15 is a diagram showing a result of voice input according to the related art.
[Explanation of symbols]
1 Car navigation device
2 Microphone
3 Talk switch
4 Display device
5 Speaker
6 GPS receiver
7 Vehicle speed sensor
8 Yaw rate sensor
9 Map database
10 Voice recognition unit
11 Route guide unit
12 Vehicle position / vehicle direction calculation unit

Claims

1. A voice control device comprising:
voice input means for inputting voice;
voice recognition means for recognizing a user's uttered voice input to the voice input means; and
control means for classifying the recognized utterance contents into a plurality of layers and storing them when voice is input by the user, and for erasing the part of the utterance contents classified and stored in the lowest layer when a correction command for correcting the utterance contents is input,
wherein, when the correction command is input twice consecutively, the control means erases all of the utterance contents stored in the plurality of layers.

2. The voice control device according to claim 1, wherein the utterance content classified and stored in a plurality of layers is at least an address.

3. The voice control device according to claim 2, wherein the addresses are classified into at least five levels of prefectures, municipalities, towns and villages, streets and addresses.