JP2004021028A

JP2004021028A - Voice interaction device and voice interaction program

Info

Publication number: JP2004021028A
Application number: JP2002177301A
Authority: JP
Inventors: Tsukasa Shimizu; 清水　司
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2002-06-18
Filing date: 2002-06-18
Publication date: 2004-01-22

Abstract

[Problem] To generate guidance sentences that do not make the user uncomfortable in two cases: when the user makes an invalid utterance in response to a question or confirmation given as guidance by a dialogue device, and when the device misrecognizes the utterance and guidance with the same content must be given again.
[Solution] The device has two means. A judging means judges, from an answer sentence, whether the answer targeted by the guidance sentence has been obtained. When the judging means judges that the target answer has not been obtained, a re-requesting means outputs by voice a different guidance sentence for obtaining that target answer and requests the answer sentence from the speaker again.
[Selected Drawing] Fig. 1

Description

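[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition device that operates in three steps. It outputs a guidance sentence by voice to obtain a target answer from a speaker. It then analyzes the answer sentence obtained from the speaker in response to the guidance sentence. Finally, it determines the target answer. The present invention can therefore be applied, for example, to an in-vehicle car navigation system. In particular, it suits a voice interaction device that conducts a dialogue to fill in information corresponding to so-called "slots", such as the facility name or address of a destination.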
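[0002]
[Prior Art]
One known example of a speech recognition device that outputs a guidance sentence by voice to obtain a target answer from a speaker, and analyzes the answer sentence obtained in response to determine the target answer, is the voice interaction device described in Japanese Unexamined Patent Application Publication No. H10-20884 ("Voice Interaction Device"). One way to obtain answers from the speaker more smoothly is to make guidance sentences easier to understand. The voice interaction device of JP H10-20884 instead estimates the user's skill level from three signals: the time the user takes to start speaking, the proportion of correct recognition results, and the flow of the dialogue. According to that skill level, it automatically selects voice guidance from among several different expressions.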
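[0003]
[Problems to Be Solved by the Invention]
The voice interaction device of JP H10-20884 prepares a plurality of guidance sentences for obtaining a target answer. However, it changes the guidance sentence by taking only the user's skill level into account. Guidance that must be repeated, for example after a misrecognition, is not considered. In conventional voice interaction devices, the same guidance sentence is therefore repeated whenever an utterance is misrecognized or cannot be recognized. A user's utterance may be rendered invalid by the device's misrecognition even though the user gave a correct answer. In that case the device must give guidance with the same content again.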
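[0004]
Because a conventional device repeats the identical guidance sentence, the user comes to feel that it is an "inflexible device". Moreover, if the guidance sentence is hard for the user to catch, repeating it unchanged does not make it easier to understand. The device may then never obtain the answer it needs from the user. One object of the present invention is to reduce this kind of user discomfort with a voice interaction device. A further object is to realize smooth voice dialogue between the user and the voice interaction device.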
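[0005]
The present invention was made to solve the above problems. Its object is to obtain the target answer through smooth voice dialogue with the voice interaction device without making the user uncomfortable.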
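[0006]
No single invention described above should be construed as achieving all of the above objects at once. Each individual invention should be construed as achieving its respective object.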
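[0007]
[Means for Solving the Problems]
To solve the above problems, claim 1 defines a voice interaction device that outputs a guidance sentence by voice to obtain a target answer from a speaker, then analyzes the answer sentence obtained in response to determine the target answer. The device is characterized by comprising two means:
- a judging means that judges, from the answer sentence, whether the answer targeted by the guidance sentence has been obtained; and
- an answer sentence re-requesting means that, when the judging means judges that the target answer has not been obtained, outputs by voice a different guidance sentence for obtaining that target answer and requests the answer sentence from the speaker again.

In short, when the voice interaction device fails to obtain the target answer, it outputs a different guidance sentence by voice and asks the speaker to answer again.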
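[0008]
The invention of claim 2 concerns the guidance sentence output by voice so that the device can obtain the target answer from the speaker. That guidance sentence prompts the speaker for an answer sentence in order to obtain a target answer for at least one slot provided for each different category. The invention is characterized in that a plurality of different guidance sentences is output by voice for each slot for which an answer is sought.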
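[0009]
The invention of claim 3 is characterized by a storage means that stores a plurality of guidance sentences for each slot so that different guidance sentences can be output. The answer sentence re-requesting means sequentially selects, from the storage means, the guidance sentences corresponding to each slot for which an answer is sought and outputs them by voice.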
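[0010]
The invention of claim 4 is characterized in that the set of guidance sentences provided for each slot for which an answer is sought consists of guidance sentences with different wording. Each sentence in the set is intended to elicit the target answer for that slot.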
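[0011]
The voice interaction program of claim 5 is characterized by causing a computer of a voice interaction device to execute three procedures:
- a procedure of outputting a guidance sentence by voice to obtain a target answer from a speaker;
- a judging procedure of analyzing the answer sentence obtained from the speaker in response and judging, from that answer sentence, whether the answer targeted by the guidance sentence has been obtained; and
- an answer sentence re-requesting procedure of, when the judging procedure judges that the target answer has not been obtained, generating a different guidance sentence for obtaining that target answer, outputting it by voice, and requesting the answer sentence from the speaker again.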
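[0012]
The voice interaction program of claim 6 is characterized by a further procedure within the answer sentence re-requesting procedure. This procedure outputs by voice a plurality of different guidance sentences for each slot for which an answer is sought. Its purpose is to obtain a target answer for at least one slot provided for each different category.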
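[0013]
[Actions and Effects of the Invention]
This section mainly describes the actions and effects of the invention recited in each claim. Concrete illustrative examples are used to make the invention easier to understand, but they do not limit the configuration of the claims. The concretely illustrated parts also describe embodiments of the invention.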
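[0014]
First, the invention of claim 1 has a judging means. The device outputs a guidance sentence by voice to obtain a target answer, analyzes the answer sentence obtained in response, and judges whether the answer targeted by the guidance sentence has been obtained. The device can therefore tell whether the guidance sentence produced an answer.

The invention also has an answer sentence re-requesting means. When the target answer is judged not to have been obtained, this means outputs by voice a different guidance sentence for obtaining that answer and requests the answer sentence again. As a result, the user hears a guidance sentence that differs from the previous one, which reduces user discomfort. In addition, because the wording differs, it becomes easier for the user to understand what to answer.

For example, suppose the voice interaction device asks for the "store name" and the user replies, but no valid answer is obtained because the user answered incorrectly.
Dialogue device: "Please tell me the name of the store."
User: "It's a restaurant."
Recognition result: "restaurant desu."
"Restaurant" is a business type, not the store name the guidance was seeking. The question must be asked again, so a guidance sentence different from the immediately preceding one is output.
Dialogue device: "What is the store called?"
User: "It's Tanpopo."
Recognition result: "Tanpopo desu."

Because a different guidance sentence is output, the user is less likely to feel uncomfortable when asked again. It also becomes easier to understand what must be answered. Rephrasing a question when the expected answer is not obtained is perfectly natural in conversation between people. Accordingly, even in a dialogue with a device, the conversation can proceed smoothly and without feeling unnatural.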
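[0015]
In the invention of claim 2, the purpose of the voice interaction device is to obtain target answers for at least one slot provided for each different category. The guidance sentences prompt the speaker for an answer to that end, and a plurality of different guidance sentences is output by voice for each target slot for which an answer is sought. Consequently, several different guidance sentences suited to the target slot can be output by voice.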
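[0016]
In the invention of claim 3, a plurality of guidance sentences can be stored in the storage means for each slot. The answer sentence re-requesting means can select those sentences in turn from the storage means and output them by voice. Therefore, even when the device must seek the same answer as the immediately preceding guidance sentence, it can output a different guidance sentence.

Sequential selection also allows the guidance sentences to be ranked in the order they are selected. For example:
- the first guidance sentence uses general wording;
- the next uses wording that favors clarity somewhat more than brevity;
- the one after that favors clarity even more.

Clarity and brevity are conflicting requirements for a guidance sentence; emphasizing one sacrifices the other. Because differently worded guidance sentences are selected in turn, each repetition can use a different balance between the two. As a result, even an inexperienced user can use the device without the discomfort of hearing the same sentence repeated and finding the device confusing and inflexible.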
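The ranked, sequential selection described for claim 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The three-step wording ladder and the function name `guidance_for_attempt` are hypothetical. The choice to keep reusing the most explicit wording once the ladder is exhausted is also an assumption, since the text does not address that case.

```python
# Hypothetical wording ladder for the "store name" question slot, ordered from
# the most general phrasing to the most explicit one (illustrative only).
STORE_NAME_LADDER = [
    "Please say the name of the store.",
    "What is the store called?",
    "Please say only the store's own name, not its type of business.",
]


def guidance_for_attempt(ladder: list[str], attempt: int) -> str:
    """Return the guidance sentence for a given attempt (0 = first time asked).

    Each repetition moves one step toward clarity. Assumption: once the most
    explicit wording is reached, it is reused rather than wrapping back.
    """
    if not ladder:
        raise ValueError("ladder must contain at least one guidance sentence")
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return ladder[min(attempt, len(ladder) - 1)]


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, guidance_for_attempt(STORE_NAME_LADDER, attempt))
```

Clamping at the last entry means a fourth request repeats the third wording. Keeping every repetition distinct would require a longer ladder or a different exhaustion policy.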
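[0017]
In the invention of claim 4, the guidance sentences set for each slot for which an answer is sought form a set with different wording. Each sentence is intended to elicit the target answer for that slot. Guidance with different wording can therefore be output by voice to obtain the answer for the target slot.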
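[0018]
When the program of claim 5 is installed and used in the voice interaction device described above, the device can output a guidance sentence by voice to obtain a target answer from a speaker. It can then analyze the answer sentence obtained in response and judge whether the answer targeted by the guidance sentence has been obtained. When it judges that the target answer has not been obtained, it can generate a different guidance sentence for obtaining that answer, output it by voice, and request the answer sentence from the speaker again.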
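[0019]
When the program of claim 6 is installed and used in the voice interaction device described above, the device can output by voice a plurality of different guidance sentences for each slot for which an answer is sought. This serves to obtain target answers for at least one slot provided for each different category.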
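[0020]
[Embodiments of the Invention]
The present invention is described below using a specific example. However, the present invention is not limited to the example shown below.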
[0021]
The example is a dialogue whose target task is setting a destination in a car navigation system.
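[0022]
In this task, the voice interaction device asks the user about the three target slots required for destination setting: "store name", "address", and "business type". It then confirms the user's answers. The immediately preceding recognition result is included in the next question guidance. As a result, the preceding result is confirmed (implicit confirmation) at the same time as the next question is asked.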
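[0023]
A guidance slot is a combination of a slot to be asked about (question slot) and a slot to be confirmed (confirmation slot). For each guidance slot, guidance sentence templates (Fig. 3) in several different wordings are stored in the guidance storage unit 171 for use by the guidance generation unit. A template is selected from the group matching the current question slot and confirmation slot and is used to generate the guidance.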
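A guidance storage unit organized this way amounts to a table keyed by the (confirmation slot, question slot) pair. Each entry holds a numbered group of alternative wordings. The Python sketch below assumes that shape. The slot identifiers, the `lookup_template` helper, and the template texts are illustrative; the texts are taken from the example dialogue later in this description, since Fig. 3 itself is not reproduced. Template numbers are assumed to start at 1.

```python
from typing import Optional

# A guidance slot is a (confirmation slot, question slot) pair; None = no slot.
GuidanceSlot = tuple[Optional[str], Optional[str]]

# Illustrative guidance storage unit: one group of alternative wordings per
# guidance slot. {placeholders} are filled with held slot values later.
GUIDANCE_TEMPLATES: dict[GuidanceSlot, list[str]] = {
    (None, "store_name"): [
        "Please say the name of the store.",
        "What is the store called?",
    ],
    ("store_name", "address"): [
        "Please say the address of {store_name}.",
        "Where is {store_name} located?",
    ],
    ("address", "business_type"): [
        "What kind of store in {address} is it?",
        "What type of business in {address} is it?",
    ],
}


def lookup_template(confirm: Optional[str], question: Optional[str], number: int) -> str:
    """Select template `number` (1-based) from the group for this guidance slot."""
    group = GUIDANCE_TEMPLATES[(confirm, question)]
    if not 1 <= number <= len(group):
        raise IndexError(f"template number {number} is outside 1..{len(group)}")
    return group[number - 1]
```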
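[0024]
Fig. 1 is a block diagram illustrating the logical configuration of a voice interaction device 100 according to an embodiment of the present invention.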
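[0025]
The voice interaction device 100 mainly comprises a voice input unit 110, a speech recognition unit 120, a meaning understanding unit 130, a dialogue control unit 140, a guidance generation unit 150, a voice output unit 160, a database 170, and so on. Like known voice interaction devices, its physical hardware is a computer system with a man-machine interface. That interface includes the microphone of the voice input unit 110 and the speaker of the voice output unit 160.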
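[0026]
The speech recognition unit 120 recognizes the speaker's utterance as a character string. It converts the voice information input from the microphone (voice input unit 110) into a character string by speech recognition processing. This processing uses speech recognition dictionaries (a recognition language dictionary, a recognition acoustic dictionary, and so on).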
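[0027]
The meaning understanding unit 130 mainly comprises a word extraction unit 131, a word judgment unit 132, and so on. Together these units extract the necessary keywords (words that become slot values) from the character string. They then judge whether each word is the one the guidance was seeking and, if so, hold it in the corresponding slot as its slot value.

The word extraction unit 131 extracts candidate slot-value words from the speech recognition result. It uses a word dictionary or similar resource stored in the database 170.

The word judgment unit 132 judges whether an extracted word is the target word of the guidance. For example, it identifies the word's category using the word dictionary or the slot value candidate word list, then checks whether that category is the one the guidance expects. Any attribute can be defined as a category, for example address, place name, facility type, store name, business type, facility name, landmark name, or user-defined name. If the word is judged to be a target word, it is held as the value of the slot the guidance targeted.

Some answers update slot states directly. For "yes" or "no", the state of the relevant slot is advanced or regressed. For an implicit confirmation, the value is treated as confirmed if the answer contains no word expressing negation, and the slot state is updated accordingly.

The slot states are defined as follows:
- "question": the slot holds no value and must be asked about to obtain one (question slot).
- "confirmation": the slot holds a value obtained through the dialogue that has not yet been confirmed (confirmation slot).
- "determined": the slot's value has been confirmed.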
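The three slot states and the category check performed by the word judgment unit 132 can be sketched as follows. The candidate word list is reduced, as an assumption, to a Python dict mapping each word to its category. The names `SlotState`, `Slot`, `judge_word`, and `accept_answer` are hypothetical. Only the behavior comes from the text: a word is accepted only if its category matches the one the guidance expects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SlotState(Enum):
    QUESTION = "question"          # no value held yet; the slot must be asked about
    CONFIRMATION = "confirmation"  # value held but not yet confirmed
    DETERMINED = "determined"      # value confirmed


# Hypothetical slot value candidate word list: word -> category.
CANDIDATE_WORDS = {
    "Tanpopo": "store_name",
    "Nagoya City": "address",
    "restaurant": "business_type",
}


@dataclass
class Slot:
    category: str
    value: Optional[str] = None
    state: SlotState = SlotState.QUESTION


def judge_word(word: str, expected_category: str) -> bool:
    """True if the word's category is the one the guidance asked for."""
    return CANDIDATE_WORDS.get(word) == expected_category


def accept_answer(slot: Slot, word: str) -> bool:
    """Hold `word` as the slot value only if it is a target word for this slot."""
    if not judge_word(word, slot.category):
        return False
    slot.value = word
    slot.state = SlotState.CONFIRMATION
    return True
```

With these definitions, answering "restaurant" to the store-name question is rejected because its category is business type. That is the case that triggers a re-request in the example of [0014].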
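[0028]
The dialogue control unit 140 decides which question item or confirmation item to ask next and controls the flow of the dialogue. As the dialogue progresses, the state of each held slot value changes. The dialogue continues until all slots reach the "determined" state.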
[0029]
The answer sentence re-requesting unit 141 updates the template number so that the next guidance sentence is not identical to the immediately preceding one.
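One way to realize this update is to advance a 1-based template number modulo the size of the current template group. A reset to 0 then yields template 1, and consecutive numbers within a group always differ. The wrap-around after the last template is an assumption; the text only requires that the new sentence differ from the previous one. The name `next_template_number` is hypothetical.

```python
def next_template_number(previous: int, group_size: int) -> int:
    """Advance a 1-based template number within a group of `group_size` templates.

    previous == 0 means the number was just reset after a successful answer,
    so the first template (1) is chosen. Otherwise the number moves on by one
    and wraps from the last template back to 1 (an assumed policy).
    """
    if group_size < 1:
        raise ValueError("a template group must contain at least one template")
    if not 0 <= previous <= group_size:
        raise ValueError("previous template number out of range")
    return previous % group_size + 1
```

With a group of two templates, a failed answer moves 1 to 2 and a second failure moves back to 1, so the same sentence is never produced twice in a row. A group of size 1 necessarily repeats, so each group should hold at least two wordings.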
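[0030]
The guidance generation unit 150 generates response sentences (confirmation responses, question responses, and so on) for the speaker (user). It also converts and synthesizes each response sentence (word string) into an acoustic digital signal (voice information). As in the example below, this conversion and synthesis may instead be performed by the voice output unit 160.

To generate a guidance sentence, the unit proceeds in three steps:
1. It refers to the guidance sentence slot in Fig. 2(b) to obtain the combination of confirmation slot and question slot.
2. It obtains the template number, which is chosen so that the generated sentence differs from the immediately preceding one.
3. It refers to the target slots named in the confirmation slot and question slot ("store name", "address", or "business type") and generates the guidance sentence.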
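The generation step can be sketched as follows: look up the group for the guidance slot, pick the template by number, then fill in the held slot values. The sketch assumes Python `str.format` placeholders named after the target slots. The template texts and the `generate_guidance` name are illustrative, not taken from Fig. 3.

```python
from typing import Optional

# Illustrative template groups keyed by (confirmation slot, question slot).
TEMPLATES: dict[tuple[Optional[str], Optional[str]], list[str]] = {
    ("store_name", "address"): [
        "Please say the address of {store_name}.",
        "Where is {store_name} located?",
    ],
    ("address", "business_type"): [
        "What kind of store in {address} is it?",
        "What type of business in {address} is it?",
    ],
}


def generate_guidance(
    confirm: Optional[str],
    question: Optional[str],
    number: int,
    slot_values: dict[str, str],
) -> str:
    """Build the guidance sentence for a guidance slot and 1-based template number.

    The value of the confirmation slot is embedded in the question, which is
    how the implicit confirmation of [0022] is carried out.
    """
    if number < 1:
        raise ValueError("template numbers start at 1")
    template = TEMPLATES[(confirm, question)][number - 1]
    # Extra keys in slot_values are ignored by str.format; a missing one raises KeyError.
    return template.format(**slot_values)
```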
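[0031]
The database 170 mainly comprises the following components, including the guidance sentence templates (Fig. 3) stored in the guidance storage unit 171.
(a) Speech recognition dictionaries
A recognition language dictionary, a recognition acoustic dictionary, and so on.
(b) Word dictionary
Holds each word's category, related categories, other attributes, pronunciation information, and so on.
(c) Slot value candidate word list
A table of pairs of a word and its category.
(d) Speech synthesis dictionary
Holds pronunciation rules for speech synthesis covering intonation, word connection, pauses, and so on.
(e) Guidance sentence templates (Fig. 3)
For each pair of confirmation slot and question slot, a set of guidance sentence templates, each with a template number.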
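Extracting slot-value words from the recognition result with the candidate word list can be sketched as a greedy longest-match over whitespace-separated tokens. The matching strategy, the punctuation stripping, and the helper name `extract_words` are assumptions. The text says only that a word dictionary or similar resource is used. The sample list includes "Nagoya" as a place name to show why longer candidates are tried first.

```python
def extract_words(recognized: str, candidates: dict[str, str]) -> list[tuple[str, str]]:
    """Return (word, category) pairs for candidate words found in `recognized`.

    Candidates may span several tokens (e.g. "Nagoya City"); longer candidates
    are tried first so the longest match wins at each position.
    """
    tokens = [t.strip(".,!?") for t in recognized.lower().split()]
    entries = sorted(
        ((tuple(w.lower().split()), w, c) for w, c in candidates.items() if w.split()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )
    found: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for key, word, category in entries:
            if tuple(tokens[i:i + len(key)]) == key:
                found.append((word, category))
                i += len(key)
                break
        else:
            i += 1  # no candidate starts at this token; move on
    return found


# Hypothetical slot value candidate word list (word -> category).
CANDIDATES = {"Tanpopo": "store_name", "Nagoya City": "address",
              "Nagoya": "place_name", "restaurant": "business_type"}
```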
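[0032]
Fig. 4 is a flowchart illustrating the processing executed by the voice interaction device 100 described above. Initial processing is executed first, in step 400. It initializes the target slots, the dialogue state, the guidance sentence slot, and the template number. It also includes tasks such as loading programs and data expected to be used frequently from the database 170 into memory with a relatively high access speed. If the voice interaction device 100 has a display device (not shown), other initial processing, such as displaying an initial menu screen, may also be performed.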
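[0033]
In step 402, the guidance generation unit 150 generates a guidance sentence. It refers to the guidance sentence slot shown in Fig. 2(b) to obtain the combination of "confirmation slot" and "question slot", together with the "template number". It uses these to select one template from the guidance sentence templates.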
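[0034]
In the initial state, no target slot holds a slot value yet, so there is no slot to confirm. The store name is asked first, so the "question slot" holds "store name", and the template number is set to its initial value of 1. The guidance generation unit therefore selects template number 1 for the combination in which the "confirmation slot" is empty and the "question slot" is "store name". It prepares a guidance sentence such as the following:
Dialogue device: "Please say the name of the store."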
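The initialization of step 400 and the first selection of step 402 can be sketched with a small state record. The field and slot names are hypothetical, and slot states are plain strings. Only the template group needed for the first question is included.

```python
from dataclasses import dataclass, field
from typing import Optional

TARGET_SLOTS = ("store_name", "address", "business_type")

# Only the group needed for the very first guidance is shown here (illustrative).
TEMPLATES = {(None, "store_name"): ["Please say the name of the store.",
                                    "What is the store called?"]}


@dataclass
class DialogueState:
    """State initialized in step 400: target slots, guidance slot, template number."""
    values: dict[str, Optional[str]] = field(
        default_factory=lambda: {s: None for s in TARGET_SLOTS})
    states: dict[str, str] = field(
        default_factory=lambda: {s: "question" for s in TARGET_SLOTS})
    confirm_slot: Optional[str] = None           # nothing to confirm yet
    question_slot: Optional[str] = "store_name"  # the store name is asked first
    template_number: int = 1                     # initial value, per [0034]


def select_guidance(state: DialogueState) -> str:
    """Step 402: pick a template by (confirmation slot, question slot, number)."""
    group = TEMPLATES[(state.confirm_slot, state.question_slot)]
    return group[state.template_number - 1]
```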
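[0035]
Next, in step 404, the voice output unit 160 converts and synthesizes the guidance sentence generated in step 402 into an acoustic digital signal (voice information). It then outputs the signal as sound from the speaker.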
[0036]
In step 406, the voice input unit 110 captures the speaker's voice from the microphone (voice input unit 110) as voice information.
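[0037]
In step 408, the speech recognition unit 120 converts the voice information into a character string. It uses the speech recognition dictionaries (recognition language dictionary, recognition acoustic dictionary, and so on).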
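[0038]
In step 410, the word extraction unit 131 of the meaning understanding unit 130 extracts the necessary words (words that become slot values) from the character string and identifies each word's category. In step 412, the word judgment unit 132 judges whether the word is a target word. If it is, the procedure advances to step 414:
- If the extracted word is a slot value for the target slot, it is held in that slot and the slot state is updated.
- If the target word is a confirmation word such as "yes" or "no", the slot value is deleted and the slot state is updated as necessary.

At the same time, the template number is initialized to "0" (judging means).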
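The slot-state update of step 414 can be sketched as below. It covers implicit confirmation of the slot under confirmation and the signal that resets the template number. Recognized words are assumed to arrive as (word, category) pairs. "Yes" and "no" are given the hypothetical categories "affirm" and "negate". Slot states are plain strings, and `apply_answer` is an illustrative name.

```python
from typing import Optional


def apply_answer(
    values: dict[str, Optional[str]],
    states: dict[str, str],
    confirm_slot: Optional[str],
    question_slot: Optional[str],
    words: list[tuple[str, str]],
) -> bool:
    """Update slot values/states in place; return True if a target answer was found.

    True means the caller resets the template number to 0; False means nothing
    changed and the same guidance group will be asked again.
    """
    categories = [c for _, c in words]
    if confirm_slot is not None and "negate" in categories:
        # A denial regresses the slot under confirmation back to "question".
        values[confirm_slot] = None
        states[confirm_slot] = "question"
        return True
    changed = False
    for word, category in words:
        if question_slot is not None and category == question_slot:
            values[question_slot] = word
            states[question_slot] = "confirmation"
            changed = True
            break
    if confirm_slot is not None and (changed or "affirm" in categories):
        # A valid answer (implicit confirmation) or an explicit "yes" fixes the value.
        states[confirm_slot] = "determined"
        changed = True
    return changed
```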
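[0039]
In step 416, the dialogue control unit 140 judges whether all target slots have reached the "determined" state. If they have, an ending guidance is output in step 420 and the processing of the voice interaction device ends. If not, in step 418 the guidance sentence slot is set and the template number is updated according to the slot states.

If step 412 judged that the word was not a target word, the slot states were not updated. In step 418, the combination of "question slot" and "confirmation slot" is then the same as for the preceding guidance sentence, so the same group of templates requesting the same answer is selected (answer sentence re-requesting means). However, the template number is updated in step 418, so a template with different wording is selected from that group. Steps 402 to 418 are repeated until every target slot holds a slot value and every slot state is "determined".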
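Putting steps 402 to 420 together, the loop below replays the worked example of [0041] to [0047] against scripted recognition results. It is a self-contained sketch, not the device's implementation. The following are all assumptions:
- the template texts and the candidate word list;
- the "affirm"/"negate" categories for yes/no;
- the whole-token matching in `extract`;
- the wrap-around template numbering.

Only the guidance-slot combinations this example needs are defined; any other combination raises `KeyError`.

```python
from typing import Optional

SLOTS = ("store_name", "address", "business_type")

# Hypothetical template groups keyed by (confirmation slot, question slot).
TEMPLATES: dict[tuple[Optional[str], Optional[str]], list[str]] = {
    (None, "store_name"): ["Please say the name of the store.",
                           "What is the store called?"],
    ("store_name", "address"): ["Please say the address of {store_name}.",
                                "Where is {store_name} located?"],
    ("address", "business_type"): ["What kind of store in {address} is it?",
                                   "What type of business in {address} is it?"],
    ("business_type", None): ["Is it a {business_type}?",
                              "Shall I record the business type as {business_type}?"],
}
ENDING = "Then I will set {store_name}, the {business_type} in {address}."

# Hypothetical candidate word list; "yes"/"no" get confirmation categories.
CANDIDATES = {"Tanpopo": "store_name", "Nagoya City": "address",
              "restaurant": "business_type", "yes": "affirm", "no": "negate"}


def extract(recognized: str) -> list[tuple[str, str]]:
    """Step 410: candidate words that occur as whole tokens in the recognition result."""
    text = " " + " ".join(recognized.lower().replace(".", " ").replace(",", " ").split()) + " "
    return [(w, c) for w, c in CANDIDATES.items() if f" {w.lower()} " in text]


def run_dialogue(recognition_results: list[str]) -> list[str]:
    """Replay steps 402-420 of Fig. 4; return every sentence the device speaks."""
    values: dict[str, Optional[str]] = {s: None for s in SLOTS}
    states = {s: "question" for s in SLOTS}
    confirm, question, number = None, "store_name", 1  # step 400
    spoken: list[str] = []
    for recognized in recognition_results:
        group = TEMPLATES[(confirm, question)]
        spoken.append(group[number - 1].format(**values))  # steps 402-404
        words = extract(recognized)                          # steps 406-410
        categories = [c for _, c in words]
        changed = False
        if confirm is not None and "negate" in categories:   # step 414: denial
            values[confirm], states[confirm] = None, "question"
            changed = True
        else:
            for word, category in words:                     # step 412: target word?
                if category == question:
                    values[category], states[category] = word, "confirmation"
                    changed = True
                    break
            if confirm is not None and (changed or "affirm" in categories):
                states[confirm] = "determined"                # implicit/explicit confirmation
                changed = True
        if changed:
            number = 0                                        # reset template number
        if all(state == "determined" for state in states.values()):  # step 416
            spoken.append(ENDING.format(**values))            # step 420
            break
        # Step 418: set the guidance slot from the slot states, then advance the
        # template number (wrapping around within the group is an assumption).
        confirm = next((s for s in SLOTS if states[s] == "confirmation"), None)
        question = next((s for s in SLOTS if states[s] == "question"), None)
        number = number % len(TEMPLATES[(confirm, question)]) + 1
    return spoken
```

Feeding `run_dialogue` the six recognition results of the example yields the device utterances in the order shown in (1) to (6) and [0047]. The failed recognitions in (2) and (4) each produce a differently worded re-request rather than a repetition.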
[0040]
The following concrete example traces the slot value and slot state of each target slot, and the state of the guidance slot, in detail.
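[0041]
(1) The voice interaction device asks for the "store name" and the user answers. Here, the speech recognition unit recognizes the answer correctly.
Dialogue device: "Please say the name of the store."
User: "It's Tanpopo."
Recognition result: "Tanpopo desu."
"Tanpopo" is held in the "store name slot", and that slot's state is updated to "confirmation". In the guidance sentence slot, the "confirmation slot" is set to "store name", the "question slot" to "address", and the template number to "0".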
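[0042]
(2) With "Tanpopo" held in the "store name slot", the device generates guidance that asks for the "address" while confirming that result. The user answers with the "address". Here, the utterance is misrecognized and no valid recognition result is obtained.
Dialogue device: "Please say the address of Tanpopo."
User: "It's Nagoya City."
Recognition result: "Nanko desu." (misrecognized as "how many")
No valid result for the "address slot" was obtained, so the slot values and slot states are not updated. The template number is updated, however, so the next guidance sentence template is selected and used to generate the guidance sentence.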
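[0043]
(3) The device now generates a guidance sentence worded differently from the preceding one. The user answers with the "address", and this time the speech recognition unit recognizes it correctly.
Dialogue device: "Where is Tanpopo located?"
User: "It's Nagoya City."
Recognition result: "Nagoya City desu."
"Nagoya City" is held in the "address slot", and that slot's state is updated to "confirmation". The "store name slot" is updated to "determined", and the template number is initialized to "0". In the guidance sentence slot, the "confirmation slot" is set to "address", the "question slot" to "business type", and the template number to "0". The next guidance therefore asks for the "business type" while confirming the result for the "address slot".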
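[0044]
(4) The user answers with the "business type". Here, the utterance is misrecognized and no valid recognition result is obtained.
Dialogue device: "What kind of store in Nagoya City is it?"
User: "It's a restaurant."
Recognition result: "desu, um, desu"
No valid result for the "business type" slot was obtained, so the slot values and slot states are not updated. When the guidance sentence slot is set in step 418, the question slot is the same as before. The template number is updated, however, so the guidance sentence is not the same.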
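[0045]
(5) Guidance asking for the "business type" is selected again, but this time template number "2" is used. The user answers with the "business type".
Dialogue device: "What type of business in Nagoya City is it?"
User: "It's a restaurant."
Recognition result: "restaurant desu"
"Restaurant" is held in the "business type" slot, and that slot's state is updated to "confirmation". The "address slot" ("Nagoya City") is updated to the "determined" state, and the template number is set to its initial value "0". The template number is then updated when the guidance sentence slot is set in step 418, so template number "1" is selected next.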
[0046]
(6) Finally, the device confirms the "business type slot". The user answers "yes", and the dialogue ends.
Dialogue device: "Is it a restaurant?"
User: "Yes."
Recognition result: "yes"
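[0047]
The guidance sentence output in step 420 refers to the target slots, for example:
Dialogue device: "Then I will set Tanpopo, the restaurant in Nagoya City."
Implementing the invention as described above realizes the voice interaction device.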
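[0048]
As a result, even when guidance with the same content must be repeated, the device prompts the user with differently worded guidance sentences instead of identical ones. A smoother conversation with the voice interaction device can therefore be expected without making the user uncomfortable.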
[0049]
The above example determines the three slots "store name", "address", and "business type", but at least one other slot may be determined instead. Suitable guidance sentence templates may then be prepared for those other slots.
[0050]
The guidance sentence templates in the above example are only one example; other templates may be prepared.
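[0051]
Combinations of "question slot" and "confirmation slot" other than those described above may also be prepared for the guidance sentence templates. Templates with only a "question slot" or only a "confirmation slot" may be prepared as well.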
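[0052]
In the above example, the immediately preceding recognition result is included in the next guidance sentence to perform implicit confirmation. Depending on the content of the target answer, the dialogue may instead ask about and then confirm each slot one at a time. In that case, several question guidances and several confirmation guidances are prepared as templates for each target slot. Any other guidance that is needed may also be prepared.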
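The difference between the implicit-confirmation control of the example and the one-slot-at-a-time control described here lies in how the next guidance slot is chosen from the slot states. The sketch below illustrates this with hypothetical names and plain-string slot states. In implicit mode, the confirmation and the next question share one guidance. In explicit mode, any slot awaiting confirmation gets a confirmation-only guidance first.

```python
from typing import Optional

SLOT_ORDER = ("store_name", "address", "business_type")


def next_guidance_slot(
    states: dict[str, str], implicit: bool
) -> Optional[tuple[Optional[str], Optional[str]]]:
    """Return (confirmation slot, question slot), or None once every slot is determined."""
    to_confirm = next((s for s in SLOT_ORDER if states[s] == "confirmation"), None)
    to_ask = next((s for s in SLOT_ORDER if states[s] == "question"), None)
    if to_confirm is None and to_ask is None:
        return None
    if implicit:
        return (to_confirm, to_ask)  # confirm the last answer while asking the next
    if to_confirm is not None:
        return (to_confirm, None)    # explicit mode: confirmation-only guidance
    return (None, to_ask)            # explicit mode: question-only guidance
```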
[0053]
The above example implicitly confirms the one slot whose answer was just obtained and requests an answer for the next single slot. Neither number is limited to one; each may be one or more as needed. The control may also request only confirmation for at least one slot, or only answers for at least one slot.
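[0054]
Similarly, the guidance sentences in the above example each have one slot to confirm and one slot to answer. Each may be one or several, provided the guidance sentence matches the control of the confirmation slots and question slots. Guidance sentences covering at least one slot for questions and at least one for confirmation may therefore be prepared as needed.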
[0055]
Multiple confirmation slots and question slots may also be provided in the guidance sentence slot as needed.
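[0056]
The embodiment described above is one example of the present invention. The invention is not limited to it, and various modifications are conceivable in light of its essence.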
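[Brief Description of the Drawings]
[Fig. 1] A block diagram illustrating the logical configuration of a voice interaction device 100 according to an embodiment of the present invention.
[Fig. 2] An explanatory diagram of the slot tables used by the word judgment unit 132, the answer sentence re-requesting unit 141, and the guidance generation unit 150.
[Fig. 3] An explanatory diagram of the guidance sentence templates used by the guidance generation unit 150.
[Fig. 4] A flowchart illustrating the processing executed by the voice interaction device 100.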
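[Description of Reference Numerals]
100 … Voice interaction device
110 … Voice input unit
120 … Speech recognition unit
130 … Meaning understanding unit
131 … Word extraction unit
132 … Word judgment unit
140 … Dialogue control unit
141 … Answer sentence re-requesting unit
150 … Guidance generation unit
160 … Voice output unit
170 … Database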
１７１　…　ガイダンス記憶部[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention provides a voice recognition device that outputs a guidance sentence in order to obtain a target answer from a speaker, analyzes a response sentence obtained from the speaker in response to the guidance sentence, and determines a target answer. About. Therefore, the present invention can be applied to, for example, an on-vehicle car navigation system and the like, and a voice dialogue for performing a dialogue for filling information corresponding to a so-called “slot” such as a destination facility name or address. It can be applied to devices and the like.
[0002]
[Prior art]
An example of a speech recognition device that outputs a guidance sentence in order to obtain a desired answer from a speaker and analyzes the answer sentence obtained from the speaker in response to the guidance sentence to determine a desired answer is an example. As an example, there is known a voice interactive device described in Japanese Patent Laid-Open Publication No. Hei 10-20884: Voice Interactive Device. In order to obtain a smoother answer from the speaker, the guidance sentence can be made easier to understand. In a speech dialogue device disclosed in Japanese Patent Application Laid-Open No. 10-20884, the time required for the user to input a speech, the ratio of a correct recognition result, In this method, a skill level of a user is estimated from a flow of a dialog, and a voice guidance using a plurality of different expressions is automatically selected according to the skill level.
[0003]
[Problems to be solved by the invention]
In the above-described voice dialogue apparatus disclosed in Japanese Patent Application Laid-Open No. H10-20884, a plurality of guidances are prepared in order to obtain a desired answer, but this is done by changing the guidance sentence in consideration of only the user's skill level. ing. That is, repeated guidance such as in the case of erroneous recognition is not considered. That is, in the conventional voice interaction device, the same guidance sentence is repeated when erroneous recognition is not performed or recognition is not performed. Therefore, the utterance may be invalid due to erroneous recognition of the voice interaction device even though the user has given a correct answer. In such a case, the voice interaction device needs to provide the same guidance again.
[0004]
In the conventional device, since the same guidance sentence is repeated, the user has an unpleasant sensation to the device, such as “a device that is inflexible”. Further, when the guidance sentence is difficult for the user to hear, it is difficult to understand even if the same guidance sentence is repeated, and there is also a problem that the user cannot obtain a desired answer of the voice interaction device after all. An object of the present invention is to reduce such discomfort of the user for the voice interaction device. A further object of the present invention is to realize a smooth voice dialogue between a user and a voice dialogue device.
[0005]
The present invention has been made in order to solve the above-described problems, and an object of the present invention is to realize a smooth voice dialogue with a voice dialogue device and obtain a desired answer without causing discomfort to a user. It is.
[0006]
It should be understood that one invention described above is not intended to achieve all the objects described above at the same time, and individual inventions are intended to achieve the respective objects.
[0007]
[Means for Solving the Problems]
In order to solve the above-mentioned problem, in the voice interactive device according to claim 1, a guidance sentence is output as voice to obtain a target answer from the speaker, and an answer sentence obtained from the speaker in response to the guidance sentence is provided. In the voice dialogue device that analyzes the answer and determines the target answer, the judgment means for judging whether or not the target answer of the guidance sentence is obtained from the answer sentence, and the target answer is not obtained by the judgment means If it is determined that the answer is different, another different guidance sentence for obtaining the intended answer is output as voice, and the speaker again has an answer sentence re-requesting means for requesting the answer sentence. I do. In other words, when the target answer cannot be obtained, the voice interaction apparatus outputs a different guidance sentence in voice to request the target answer, and requests the speaker's answer again.
[0008]
Further, according to the invention of claim 2, in the guidance sentence which is output by voice from the speaker in order to obtain the target answer, the target answer is obtained in at least one slot provided for each different category. Therefore, a sentence that prompts the speaker for an answer sentence, and a plurality of different guidance sentences are output as speech for at least one slot for which an answer is requested.
[0009]
Further, the invention according to claim 3 has a storage unit for storing a plurality of guidance sentences for each at least one slot in order to output different guidance sentences, and the answer sentence re-requesting unit includes at least one answer requesting answer. It is characterized in that a plurality of guidance sentences corresponding to the slots are sequentially selected from the storage means and output as voice.
[0010]
Further, according to the invention of claim 4, a set of a plurality of guidance sentences set for each of at least one slot for which an answer is required is derived from a set of guidance sentences of different expressions in order to derive a target answer of the slot. It is characterized by comprising.
[0011]
Further, according to a fifth aspect of the present invention, there is provided a voice dialogue program, wherein a computer of the voice dialogue apparatus outputs a guidance sentence by voice to obtain a desired answer from the speaker, The answer sentence is analyzed, and from the answer sentence, a judgment procedure for judging whether or not the intended answer for the guidance sentence has been obtained.If the judgment procedure indicates that the intended answer was not obtained, the The present invention is characterized in that it has another procedure for generating another different guidance sentence for obtaining a target answer, outputting the guidance sentence as voice, and requesting the speaker again for the answer sentence.
[0012]
Further, in the voice dialogue program according to the invention of claim 6, in the computer of the voice dialogue device, the answer sentence re-requesting procedure seeks an answer in order to obtain a target answer in at least one slot provided for each different category. The method further comprises a step of outputting a plurality of different guidance sentences by voice for at least one slot.
[0013]
Actions and effects of the present invention
This section mainly describes the functions and effects of the invention described in each claim. To facilitate understanding of the present invention, the present invention is exemplarily embodied and described, but does not limit the configuration of the claims. The part concretely described as an example is also an explanation of the embodiment of the invention.
[0014]
First, according to the first aspect of the present invention, a guidance sentence is output by voice in order to obtain a desired answer from a speaker, and an answer sentence obtained from the speaker is analyzed in response to the guidance sentence. Since the determination means is provided for determining whether or not an answer has been obtained, it can be determined whether or not an answer has been obtained for the guidance sentence. If it is determined that the intended answer could not be obtained, another different guidance sentence for obtaining the intended answer is output as a voice, and an answer requesting the speaker to request the answer again. Since a sentence re-requesting means is provided, when an answer is not obtained, a guidance sentence different from the previous sentence is output by the answer sentence re-requesting means. As a result, user discomfort is reduced. Furthermore, since the expressions are different, there is an effect that it is easy to understand what to answer. For example, the spoken dialogue device inquires about “store name”, and the user responds thereto. Here, it is assumed that a valid answer was not obtained because the user made an incorrect answer.
Dialogue device: "Please tell me the name of the store."
User: "Restaurant."
Recognition result: "Restaurant."
This is the type of business, and is not the store name that is the target answer of the guidance, so it is necessary to ask a question again, so that a guidance sentence different from the immediately preceding guidance sentence is output.
Dialogue device: "What kind of store is it?"
User: "It's a dandelion."
Recognition result: "It is a dandelion."
As described above, a different guidance sentence is output as a voice, so that the user is less likely to feel uncomfortable even when asked again from the device, and it is easier to understand what must be answered. In this way, changing the expression when the expected answer is not obtained is a very natural thing in a person-to-person conversation. Therefore, even in the case of a dialogue with the device, the dialogue can be smoothly advanced without feeling uncomfortable.
[0015]
Further, according to the invention of claim 2, a guidance sentence which is a sentence urging a speaker to answer in order to obtain an intended answer in at least one slot provided for each different category, which is an object of the voice interaction apparatus, is: Since a plurality of different guidance sentences are output by voice for at least one target slot for which an answer is requested, a plurality of different guidance sentences corresponding to the target slot can be output by voice.
[0016]
Further, the invention according to claim 3 enables a plurality of guidance sentences to be stored in the storage means for each at least one slot, and a plurality of guidance sentences corresponding to at least one slot for which an answer is requested by the answer sentence re-requesting means. Since it is possible to sequentially select and output voice from the storage means, even when the same answer as the previous guidance sentence is requested, a guidance sentence different from the immediately preceding guidance sentence can be output as voice. In this way, by sequentially selecting from the storage means and outputting the voice, for example, the expression of the first guidance sentence is general in the order of selecting the expression of the guidance sentence. The following guidance text uses expressions that emphasize a little more intelligibility than simplicity. It is also possible to output a guidance sentence with a ranking such as: In other words, the expression in the following guidance sentence emphasizes the comprehensibility. In other words, the simplicity and simplicity required of a guidance sentence are contradictory, and emphasis on one will cost one of them. However, since guidance sentences of different expressions can be sequentially selected and output as voice, it becomes possible to prepare guidance sentences of different specific gravity every time the guidance is repeated. As a result, even an unskilled user can use the apparatus without having to repeat the same guidance sentence and to have an uncomfortable or inflexible apparatus.
[0017]
Furthermore, in order to derive a target answer of at least one slot for which an answer is required, the invention of claim 4 is set for each at least one slot for which each answer is required so as to be composed of a set of guidance sentences having different expressions. Since a plurality of guidance sentences are configured, guidance of different expressions can be output as voice in order to obtain an answer for a target slot.
[0018]
Further, when the program according to the fifth aspect of the present invention is installed and used in the above-described voice interactive device, the above-mentioned voice interactive device can output a guidance sentence in order to obtain a desired answer from a speaker. Further, it is possible to analyze the answer sentence obtained from the speaker in response to the guidance sentence, and to judge from the answer sentence whether or not the intended answer of the guidance sentence has been obtained. Further, when it is determined that the intended answer was not obtained, another different guidance sentence for obtaining the intended answer is generated, the guidance sentence is output as a voice, and the speaker is requested again for the answer sentence. Will be able to do it.
[0019]
Further, when the program according to the invention of claim 6 is installed and used in the above-described voice interactive device, at least one slot for which an answer is required in order to obtain a target answer in at least one slot provided for each different category. A plurality of different guidance sentences can be output by voice.
[0020]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described based on specific examples. However, the present invention is not limited to the embodiments described below.
[0021]
Here, a description will be given of an example of a dialogue in which a destination setting in a car navigation system is a target task.
[0022]
In this task, the spoken dialogue apparatus asks the user about three destination slots required for setting a destination, “shop name”, “address”, and “industry type”, and confirms an answer from the user. Here, it is assumed that the immediately preceding recognition result is confirmed (implicit confirmation) simultaneously with the question by putting the immediately preceding recognition result into the next question guidance.
[0023]
In the guidance generation unit, a guidance sentence template (FIG. 3) having a plurality of different expressions is stored in the guidance storage corresponding to the guidance slot including the combination of the slot to be asked (question slot) and the slot to be confirmed (confirmation slot). It is stored in the unit 171 and is selected from a group of templates corresponding to the combination of the corresponding question slot and the confirmation slot, and is used for generating guidance.
[0024]
FIG. 1 is a configuration diagram illustrating a logical configuration of a voice interaction device 100 according to an embodiment of the present invention.
[0025]
The voice interaction device 100 mainly includes a voice input unit 110, a voice recognition unit 120, a meaning understanding unit 130, a dialog control unit 140, a guidance generation unit 150, a voice output unit 160, a database 170, and the like. ing. Of course, the voice interactive device 100 has, as a physical hardware configuration, a man-machine interface unit such as a microphone included in the voice input unit 110 and a speaker included in the voice output unit 160, similarly to a known voice interactive device. It is embodied by a computer system provided.
[0026]
The voice recognition unit 120 recognizes the voice of the speaker as a character string. That is, the voice information input from the microphone (voice input unit 110) is converted into a character string by voice recognition processing using a voice recognition dictionary (a recognition language dictionary, a recognition acoustic dictionary, or the like).
[0027]
The meaning understanding unit 130 mainly includes a word extraction unit 131, a word determination unit 132, and the like. From these, necessary keywords (words serving as slot values) are extracted from the above character strings, it is determined whether or not the word is the target word of the guidance. Stored as slot value. More specifically, the word extracting unit 131 extracts a word having a slot value from a character string output as a result of speech recognition using a word dictionary or the like stored in the database 170. The word determining unit 132 determines whether the extracted word is a target word according to the guidance. To determine, for example, the category of the extracted word is identified using a word dictionary or a slot value candidate word list or the like, and it is determined whether the category belongs to the category assumed by the guidance. Judge whether or not. As the category, for example, any attribute such as an address, a place name, a facility type, a shop name, a business type, a facility name, a landmark name, or a user setting name can be defined. As a result of the determination, when it is determined that the target word is a target word, the target word is held as a slot value in a target slot. If the answer is "yes" or "no", the slot status of the corresponding slot is updated so as to evolve or retreat. If there is no negative word for the implicit confirmation, the confirmation is made and the slot state is updated. The slot state is, for example, a state in which no slot value is held is a slot (query slot) to be queried in order to obtain a slot value. The question slot state is a question slot state. The word “slot value” is held by the dialogue, but is not confirmed and the state of the slot to be confirmed (confirmation slot) is “confirmed” state. The state in which the slot value has been confirmed is called the “determined” state.
[0028]
The dialog control unit 140 determines a question item or a confirmation item to be asked next, controls the flow of the dialog, and the progress of the dialog changes the slot state with respect to the retained slot value. Interact until they reach the “determined” state.
[0029]
The response sentence re-requesting unit 141 updates the template number so as to control the same guidance sentence as the previous guidance sentence.
[0030]
The guidance generation unit 150 generates a response sentence (a confirmation response sentence, a question response sentence, etc.) for the speaker (user), and further converts the response sentence (word string) into an acoustic digital signal (voice information). Combine. However, this conversion / synthesis processing may be performed by the audio output unit 160 as exemplified below. To generate a guidance sentence, specifically, a combination of a confirmation slot and a question slot is obtained by referring to the guidance sentence slot in FIG. Further, by obtaining the template number, a template number for generating a guidance sentence different from the immediately preceding guidance sentence is obtained. After that, a guidance sentence is generated by referring to the target slot based on the “store name”, “address”, and “business type” held in the confirmation slot and the question slot.
[0031]
The database 170 mainly includes a speech recognition dictionary, a word dictionary, a slot value candidate word list, a speech synthesis dictionary, and the guidance sentence templates (FIG. 3) stored in the guidance storage unit 171.
(A) Dictionary for speech recognition
Consists of a recognition language dictionary, a recognition acoustic dictionary, and the like.
(B) Word dictionary
Holds each word's category, related categories, other attributes, pronunciation information, and the like.
(C) Slot value candidate word list
A table of candidate words, each paired with its category.
(D) Speech synthesis dictionary
Holds the pronunciation rules used for speech synthesis, covering intonation, word concatenation, pauses, and the like.
(E) Guidance sentence template (Fig. 3)
Consists of a plurality of numbered guidance sentence templates for each pair of confirmation slot and question slot.
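The guidance sentence template store (E) might be organized as a mapping from (confirmation slot, question slot) pairs to numbered template lists. The entries below are hypothetical, assembled from the sentences of the worked example later in this description rather than from FIG. 3, and `None` stands for an empty slot.

```python
# Hypothetical template store: (confirmation slot, question slot) -> templates 1..n.
GUIDANCE_TEMPLATES = {
    (None, "store_name"): ["Please say the name of the store."],
    ("store_name", "address"): [
        "Please tell me the address of the {store_name}.",
        "Where is the {store_name}?",
    ],
    ("address", "business_type"): [
        "What shop in {address}?",
        "What kind of business is in {address}?",
    ],
    ("business_type", None): ["Is it a {business_type}?"],
}


def select_template(confirmation_slot, question_slot, template_number):
    """Return template number `template_number` (1-based) of the matching group."""
    group = GUIDANCE_TEMPLATES[(confirmation_slot, question_slot)]
    return group[(template_number - 1) % len(group)]  # wrap-around is an assumption
```

Keeping all wordings for one slot combination in a single group is what lets a re-request stay on the same question while changing its expression.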
[0032]
FIG. 4 is a flowchart showing the procedure of the process executed by the voice interaction device 100 described above. First, in step 400, initial processing is executed: the target slots, the dialogue state, the guidance sentence slot, and the template number are initialized. This step also performs other initial processing, such as loading programs and data expected to be used frequently from the database 170 into memory with a relatively high access speed. For example, if the voice interaction device 100 has a display device (not shown), other initial processing such as displaying an initial menu screen may also be performed.
[0033]
In step 402, the guidance generation unit 150 generates a guidance sentence. To do so, it obtains the combination of "confirmation slot" and "question slot", together with the "template number", by referring to the guidance sentence slot shown in FIG. It then selects a guidance sentence template based on that slot combination and the template number.
[0034]
In the initial state, no target slot holds a slot value, so there is no slot to confirm. Since the store name is asked first, "store name" is held in the "question slot" and the template number is set to 1 as its initial value. Accordingly, the guidance generation unit selects the guidance sentence template with template number "1" whose "confirmation slot" is "empty" and whose "question slot" is "store name". The following guidance sentence is prepared for this case:
Dialogue device: "Please say the name of the store."
[0035]
Next, in step 404, the voice output unit 160 converts and synthesizes the guidance sentence generated in step 402 into an acoustic digital signal (voice information) and outputs it to the speaker as voice.
[0036]
In step 406, the voice input unit 110 (a microphone) captures the speaker's voice as voice information.
[0037]
In step 408, the speech recognition unit 120 converts the speech information into a character string using a speech recognition dictionary (such as a recognition language dictionary or a recognition acoustic dictionary).
[0038]
In step 410, the word extraction unit 131 of the meaning understanding unit 130 extracts the necessary words (words serving as slot values) from the character string and identifies the category of each word. Next, in step 412, the word determination unit 132 judges whether the word is a target word. If it is, the process proceeds to step 414: if the extracted word is a slot value for a target slot, it is held in that target slot as the slot value and the slot state is updated. If the target word is a confirmation word such as "Yes" or "No", the slot state is updated and, as necessary, the slot value is deleted. At the same time, the template number is initialized to "0" (determination means).
[0039]
In step 416, the dialogue control unit 140 determines whether all target slots are in the "determined" state. If so, the end guidance is output in step 420 and the processing of the voice interaction device ends. Otherwise, in step 418, the guidance sentence slot is set and the template number is updated based on the slot states. If step 412 judged that the word was not a target word, the slot state was not updated, so step 418 obtains the same combination of "question slot" and "confirmation slot" as for the previous guidance sentence, and the same group of guidance sentence templates, requesting the same answer, is selected (answer sentence re-requesting means). However, because the template number is updated in step 418, a template with a different wording is selected even though it belongs to the same group. Steps 402 to 418 are repeated until a word serving as a slot value has been set in every target slot and every slot state is "determined".
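The loop of steps 400 to 420 can be simulated end to end. The sketch below is a hypothetical, self-contained reconstruction: the templates, candidate words, and matching are simplified assumptions, a "no" during explicit confirmation is not handled, and recognition results are scripted instead of coming from a recognizer. Fed the recognition results of the worked example that follows, it produces the same sequence of questions as that example, including the changed wording after each misrecognition.

```python
# Hypothetical templates keyed by (confirmation slot, question slot); None = empty.
TEMPLATES = {
    (None, "store_name"): ["Please say the name of the store."],
    ("store_name", "address"): ["Please tell me the address of the {store_name}.",
                                "Where is the {store_name}?"],
    ("address", "business_type"): ["What shop in {address}?",
                                   "What kind of business is in {address}?"],
    ("business_type", None): ["Is it a {business_type}?"],
}
# Hypothetical candidate words: keyword -> (category, slot value to hold).
CANDIDATES = {"dandelion": ("store_name", "dandelion"),
              "nagoya": ("address", "Nagoya"),
              "restaurant": ("business_type", "restaurant")}
ORDER = ["store_name", "address", "business_type"]


def run_dialogue(recognized_answers):
    """Return the guidance sentences output for a scripted list of recognition results."""
    values = {c: None for c in ORDER}
    states = {c: "question" for c in ORDER}
    template_number = 1                     # step 400: initial value
    answers = iter(recognized_answers)
    transcript = []
    while True:
        # Guidance sentence slot: confirm the pending slot, ask the next open one.
        confirm = next((c for c in ORDER if states[c] == "confirmation"), None)
        question = next((c for c in ORDER if states[c] == "question"), None)
        group = TEMPLATES[(confirm, question)]
        guidance = group[(template_number - 1) % len(group)].format(**values)  # step 402
        transcript.append(guidance)         # step 404: voice output
        text = next(answers).lower()        # steps 406-408: scripted recognition
        valid = False
        if question is None:                # explicit confirmation only
            if text.startswith("yes"):
                states[confirm] = "determined"
                valid = True
        else:
            for keyword, (category, value) in CANDIDATES.items():  # steps 410-412
                if keyword in text and category == question:
                    if confirm is not None:  # implicit confirmation was not denied
                        states[confirm] = "determined"
                    values[question] = value
                    states[question] = "confirmation"  # step 414
                    valid = True
                    break
        if valid:
            template_number = 0             # step 414: reset on a valid answer
        if all(s == "determined" for s in states.values()):  # step 416
            return transcript               # step 420 would output the end guidance
        template_number += 1                # step 418: next wording in the group
```

Because the counter is reset to 0 in step 414 and incremented in step 418, a new slot combination always starts at template 1, and a failed answer moves to template 2 of the same group.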
[0040]
The slot value, slot state, and guidance sentence slot state of each target slot are described in detail below using a specific example.
[0041]
(1) The voice interaction device asks for the "store name", and the user answers. Here, the speech recognition unit is assumed to recognize the answer correctly.
Dialogue device: "Please say the name of the store."
User: "It's a dandelion."
Recognition result: "It is a dandelion."
As a result, "dandelion" is held in the "store name slot", and the state of the "store name slot" is updated to "confirmation". In the guidance sentence slot, "store name" is set as the "confirmation slot", "address" as the "question slot", and "0" as the template number.
[0042]
(2) With "dandelion" held in the "store name slot", a guidance sentence asking for the "address" is generated while implicitly confirming that result. The user answers with the address. Here, misrecognition is assumed, so no valid recognition result is obtained.
Dialogue device: "Please tell me the address of the dandelion."
User: "I'm Nagoya City."
Recognition result: "How many?"
Since no valid recognition result was obtained for the "address slot", the slot values and slot states are not updated, but the template number is. The next guidance sentence template is then selected and a guidance sentence is generated.
[0043]
(3) A guidance sentence worded differently from the immediately preceding one is generated. The user again answers with the address. Here, the speech recognition unit is assumed to recognize it correctly.
Dialogue device: "Where is the dandelion?"
User: "I'm Nagoya City."
Recognition result: "Nagoya city."
"Nagoya City" is held in the "address slot", and its slot state is updated to "confirmation". The state of the "store name slot" is updated to "determined", and the template number is initialized to "0". In the guidance sentence slot, "address" is set as the "confirmation slot", "business type" as the "question slot", and "0" as the template number. A guidance sentence is therefore generated that asks for the "business type" while confirming the content of the "address slot".
[0044]
(4) The user answers with the business type. Here, misrecognition is assumed, so no valid recognition result is obtained.
Dialogue device: "What shop in Nagoya?"
User: "Restaurant."
Recognition result: "Yes, it is."
Since no valid recognition result was obtained for the "business type" slot, the slot value and slot state are not updated. When the guidance sentence slot is set again in step 418, it has the same question slot as the immediately preceding one, but because the template number is updated, the same guidance sentence is not produced.
[0045]
(5) Guidance asking for the "business type" is selected again, but this time guidance sentence template "2" is used. The user answers with the business type.
Dialogue device: "What kind of business is in Nagoya?"
User: "Restaurant."
Recognition result: "It is a restaurant"
"Restaurant" is held in the "business type" slot, and its slot state is updated to "confirmation". In addition, the "address slot" ("Nagoya City") is updated to the "determined" state. The template number is set to its initial value "0". Because the template number is updated when the guidance sentence slot is set in step 418, the guidance sentence template with template number "1" is selected here.
[0046]
(6) Finally, the "business type slot" is confirmed. The user answers "Yes" and the dialogue ends.
Dialogue device: "Is it a restaurant?"
User: "Yes"
Recognition result: "Yes"
[0047]
In step 420, the end guidance sentence is output by referring to the target slots, for example as follows.
Dialogue device: "Now setting dandelion, a restaurant in Nagoya City."
The voice interaction device described above is realized in this way.
[0048]
As a result, even when the same content must be requested repeatedly, the device prompts the user for an answer with a differently worded guidance sentence instead of repeating the same one, so the user is not made uncomfortable and a smoother dialogue with the voice interaction device can be expected.
[0049]
The embodiment above determines three slots, "store name", "address", and "business type", but one or more other slots may also be determined. Suitable guidance sentence templates may then be prepared for those other slots.
[0050]
The guidance sentence templates in the above embodiment are only examples; other templates may be prepared.
[0051]
Further, combinations of "question slot" and "confirmation slot" other than those above may be prepared in the guidance sentence templates, as may templates for a "question slot" alone or a "confirmation slot" alone.
[0052]
Furthermore, in the above embodiment, implicit confirmation is performed by including the immediately preceding recognition result in the next guidance sentence. Depending on the content of the intended answer, however, dialogue control may instead ask and confirm each slot separately. In that case, the guidance sentence templates include a plurality of question guidances and a plurality of confirmation guidances for each target slot. Any other necessary guidance may also be provided.
[0053]
Furthermore, the above embodiment shows control that implicitly confirms the one slot answered immediately before while requesting an answer for the next single slot. The number of slots for which confirmation or an answer is requested is not limited to one; either may be one or more as needed. The control may also request only confirmation for at least one slot, or only an answer for at least one slot.
[0054]
Similarly, the guidance sentences in the above embodiment show one example each for the slot to be confirmed and the slot to be answered. Any guidance sentence that matches the control of the question slot may be used, so guidance sentences covering at least one slot for each of questions and confirmations may be prepared as needed.
[0055]
Furthermore, a plurality of confirmation slots and question slots may be prepared as needed.
[0056]
The above-described embodiment is an example of the present invention, and the present invention is not limited to it. Various modifications are possible within the spirit of the present invention.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing the logical configuration of a voice interaction device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the slot table used by the word determination unit 132, the answer sentence re-requesting unit 141, and the guidance generation unit 150.
FIG. 3 is an explanatory diagram of the guidance sentence templates used by the guidance generation unit 150.
FIG. 4 is a flowchart showing the procedure of the process executed by the voice interaction device 100.
[Explanation of symbols]
100 ... voice interaction device
110 ... voice input unit
120 ... speech recognition unit
130 ... meaning understanding unit
131 ... word extraction unit
132 ... word determination unit
140 ... dialogue control unit
141 ... answer sentence re-requesting unit
150 ... guidance generation unit
160 ... voice output unit
170 ... database
171 ... guidance storage unit

Claims

In a voice recognition device that outputs a guidance sentence by voice to obtain a desired answer from a speaker, analyzes the answer sentence obtained from the speaker in response to the guidance sentence, and determines the desired answer, a voice interaction device comprising:
determination means for determining, from the answer sentence, whether the answer intended by the guidance sentence has been obtained; and
answer sentence re-requesting means for, when the determination means determines that the intended answer was not obtained, outputting by voice a different guidance sentence for obtaining the intended answer and requesting an answer sentence from the speaker again.

The voice interaction device according to claim 1, wherein the guidance sentence is output by voice as a sentence that prompts the speaker for an answer sentence in order to obtain the intended answer for at least one slot provided for each different category.

The voice interaction device according to claim 2, further comprising storage means for storing a plurality of guidance sentences for each of the at least one slot, wherein the answer sentence re-requesting means sequentially selects, from the storage means, the plurality of guidance sentences corresponding to the at least one slot for which an answer is sought, and outputs them by voice.

The voice interaction device according to claim 2 or 3, wherein the set of guidance sentences prepared for the at least one slot for which an answer is sought consists of guidance sentences with different wordings for eliciting the intended answer for that slot.

A voice interaction program that causes a computer to execute: a procedure of outputting by voice a guidance sentence to obtain a desired answer from a speaker;
a determination procedure of analyzing the answer sentence obtained from the speaker in response to the guidance sentence and determining, from the answer sentence, whether the answer intended by the guidance sentence has been obtained; and
an answer sentence re-requesting procedure of, when the determination procedure determines that the intended answer was not obtained, generating a different guidance sentence for obtaining the intended answer, outputting the guidance sentence by voice, and requesting an answer sentence from the speaker again.

The voice interaction program according to claim 5, wherein the answer sentence re-requesting procedure further includes a procedure of outputting a plurality of different guidance sentences for at least one slot for which an answer is sought, in order to obtain the intended answer for at least one slot provided for each different category.