JP2005062240A

JP2005062240A - Voice response system

Info

Publication number: JP2005062240A
Application number: JP2003207456A
Authority: JP
Inventors: Toshihiro Ide (井手敏博); Hiroshi Sugitani (杉谷浩); Hideo Ueno (上野英雄); Yayoi Nakamura (中村やよい); Shingo Suzumori (鈴森信吾); Koji Yamamoto (山本幸二); Taku Yoshida (吉田卓)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-08-13
Filing date: 2003-08-13
Publication date: 2005-03-10

Abstract

【課題】利用者の置かれている状況から生まれる利用者の求める応対を提供する。
【解決手段】利用者に対して音声応答サービスを提供する音声応答システムであって、利用者音声から音声情報を生成する音声入力部と、利用者が置かれている現在の状況を特定する状況特定部と、利用者の現在の感情を推測する感情分析部と、前記音声入力部により生成された音声情報に対して音声認識処理を実行する音声認識部と、利用者が置かれている状況ごとに、利用者の感情と応対パターンとを対応付けた個人特性情報を含む利用者特性情報が格納される格納部と、前記状況特定部により特定された利用者が置かれている現在の状況と前記感情分析部により推測された利用者の現在の感情とに対応する応対パターンを、前記個人特性情報から取得する応対パターン決定部と、前記応答パターン決定部により取得された応対パターンと前記音声認識部による認識結果とに対応する発話文を取得する応答処理部と、前記応答処理部により取得された発話文を読上げる音声出力部と、を備える。
【選択図】図６
[Problem] To provide the response a user wants, which arises from the situation the user is in.
[Solution] A voice response system that provides a voice response service to a user comprises: a voice input unit that generates voice information from the user's voice; a situation identifying unit that identifies the current situation the user is in; an emotion analysis unit that estimates the user's current emotion; a voice recognition unit that performs voice recognition processing on the voice information generated by the voice input unit; a storage unit that stores user characteristic information including personal characteristic information in which, for each situation the user may be in, the user's emotions are associated with response patterns; a response pattern determination unit that acquires, from the personal characteristic information, the response pattern corresponding to the current situation identified by the situation identifying unit and the current emotion estimated by the emotion analysis unit; a response processing unit that acquires an utterance sentence corresponding to the response pattern acquired by the response pattern determination unit and to the recognition result of the voice recognition unit; and a voice output unit that reads out the utterance sentence acquired by the response processing unit.
[Selected drawing] FIG. 6

Description

【０００１】
【発明の属する技術分野】
本発明は、音声を利用した音声応答サービス分野に関する。
【０００２】
【従来の技術】
近年、様々な分野で音声応答サービスが利用されている。このサービスは、音声認識技術を備えたシステムと利用者が対話を行うことで、システムが利用者に対して様々なサービスを提供する（例えば図２６参照）。具体的には、デジタルデバイド世代向けに、キーボードやマウスを利用することなく、音声で簡単に操作できるパソコン、利用者がパソコンや電話から音声を用いてアクセスし、サービスを受けることができるボイスポータル等がある。
【０００３】
一般に、音声応答サービスは、システムが持つシナリオに沿って提供される。
システムは、利用者から発声された音声を認識し、その認識結果とシナリオを照らし合わせ、次の発話文を生成または選択し、応答する。
【０００４】
また近年では、利用者の音声入力の状態から感情を分析し、その分析結果に応じて、応答を変化させることが可能である（特許文献１参照）。
【０００５】
また、利用者のシステム使用頻度による対話進行の変更や、利用者の個人情報を持つことで、利用者の対話進行に関する好みに合わせることが可能である対話制御方法が開示されている（特許文献２参照）。
【０００６】
【特許文献１】
特開平１０−５５１９４号公報
【特許文献２】
特開２００２−９９４０４号公報
【０００７】
【発明が解決しようとする課題】
しかしながら、利用者のシステム使用頻度と個人情報だけを用いて、音声応答を変更する場合は、常に変化する利用者の置かれている状況（利用場所、利用手段、利用時間等）を把握していないため、利用者の置かれている状況から生まれる利用者の求める応対を柔軟に提供することができないという問題がある。即ち、適切な応対を行うためには、利用者の感情を把握することだけでは難しく、また、利用者のシステム使用頻度や個人情報だけを用いても難しい。
【０００８】
また、従来技術では、利用者の音声入力に注目し、音声入力の状態（利用者の声のピッチ、音量）のみから、その利用者の感情を推測し、応答を決定している。それゆえ、例えば、元々早口な人がゆっくり話す場合と、元々ゆっくり話す人が早口で話す場合と、普通の人が通常の速度で話す場合を区別できない。この場合、元々早口な人はくつろいでいると推測するのが、元々ゆっくり話す人は焦っていると推測するのが、それぞれ妥当であると思われる。これらの違いを考慮しないと、間違った状態を把握してしまうということに繋がり、利用者に対して不適当な応答をする可能性があるという問題がある。
【０００９】
本発明の第１の課題は、利用者の置かれている状況から生まれる利用者の求める応対を提供することにある。本発明の第２の課題は、利用者の特性を考慮して、利用者に対して適切な応答をすることにある。
【００１０】
【課題を解決するための手段】
本発明は、上記課題を解決するために、利用者に対して音声応答サービスを提供するシステムであって、利用者音声から音声情報を生成する音声入力部と、利用者が置かれている現在の状況を特定する状況特定部と、利用者の現在の感情を推測する感情分析部と、前記音声入力部により生成された音声情報に対して音声認識処理を実行する音声認識部と、利用者が置かれている状況ごとに、利用者の感情と応対パターンとを対応付けた個人特性情報を含む利用者特性情報が格納される格納部と、前記状況特定部により特定された利用者が置かれている現在の状況と前記感情分析部により推測された利用者の現在の感情とに対応する応対パターンを、前記個人特性情報から取得する応対パターン決定部と、前記応答パターン決定部により取得された応対パターンと前記音声認識部による認識結果とに対応する発話文を取得する応答処理部と、前記応答処理部により取得された発話文を読上げる音声出力部と、を備える構成とした。
【００１１】
本発明によれば、音声出力部により読み上げられる発話文は、応対パターン（利用者が置かれている現在の状況と利用者の現在の感情とにより定まる。）と音声認識部による認識結果により定まる。従って、本発明によれば、利用者の置かれている状況から生まれる利用者の求める応対を提供することが可能となる。
【００１２】
例えば、利用者の置かれている現在の状況が「自宅」の場合は、その利用者は「ゆっくりと落ち着いた応答サービス」を望み、利用者の置かれている現在の状況が「会社」の場合は、その利用者は「淡々とスピーディな応答サービス」を望むという利用者の求める応答サービスに応じることが可能となる。なお、利用者の置かれている現在の状況については、例えば、発信番号を用いることで、又は音声応答サービスの初期の段階で利用者から聞き出すことで、特定（把握）できる。
【００１３】
上記音声応答システムにおいては、例えば、前記利用者の感情と応対パターンとの対応関係を更新する更新部をさらに備える。
【００１４】
このようにすれば、利用者に対する次回の音声応答サービスをより最適化することが可能となる。例えば、音声応答サービスの開始から終了までの間の利用者の応答履歴に基づいて両者の対応関係を更新することが考えられる。
【００１５】
上記音声応答システムにおいては、例えば、前記感情分析部は、予め格納されている個人音声特性情報を用いて比較分析を行うことにより、利用者の現在の感情を推測する。
【００１６】
このようにすれば、利用者の個人音声特性（音量レベルや音声ピッチ等の音声特性）を考慮して、利用者の感情を把握することが可能となるため、利用者に対してより適切な応答をすることが可能となる。
【００１７】
上記音声応答システムにおいては、例えば、前記利用者特性情報、前記音声入力部により生成された音声情報、及び前記音声応答サービス開始から終了までの間の利用者の応答履歴に基づいて利用者特性情報を追加及び更新する利用者分析部をさらに備える。
【００１８】
このようにすれば、利用者に対する次回の音声応答サービスをより最適化することが可能となる。
【００１９】
上記音声応答システムにおいては、例えば、前記応答履歴は、前記音声応答サービスによる音声出力と利用者の音声入力の開始終了のタイミング情報を含み、利用者特性情報は、その利用者の過去に音声応答サービスを利用した際の音声応答サービスによる音声出力と利用者の音声入力の開始終了のタイミング情報を含む。
【００２０】
このようにすれば、利用者に対する次回の音声応答サービスをより最適化することが可能となる。
【００２１】
また、本発明は次のように特定することができる。
【００２２】
利用者に対して音声応答サービスを提供する音声応答システムが、利用者の音声情報から推測される利用者の感情に応じた応対を行う音声応答サービスを提供する方法であって、利用者の置かれている状況、利用者の話し方、前記音声応答サービスの開始から終了までの間の利用者の感情の変化を含む応答履歴から推測される利用者特性情報、及びその利用者特性情報に基づいて得られる利用者の置かれている状況を考慮した感情情報をもとに、利用者に対する応答の順序や応答内容を変更する音声応答サービス提供方法。
【００２３】
このようにすれば、利用者の置かれている状況とその時の利用者の感情に応じて臨機応変に柔軟に応対することができる。
【００２４】
また、本発明は次のように特定することができる。
【００２５】
利用者に対して音声応答サービスを提供する音声応答システムであって、利用者の音声から音声情報を生成する音声入力部と、利用者の利用者特性情報を推測し利用者特性情報を生成する利用者分析部と、利用者の感情を推測し感情情報を生成する感情分析部と、発話文の作成に必要な文章が蓄積されているシナリオ蓄積部と、音声情報から利用者の言葉を識別し、前記利用者特性情報と前記感情情報をもとに、前記シナリオ蓄積部から発話文を作成する応答処理部と、前記発話文を読上げる音声出力部と、を備える音声応答システム。
【００２６】
このようにすれば、利用者の置かれている状況とその時の利用者の感情に応じて、応答の順序や応答内容を変更することができる。
【００２７】
【発明の実施の形態】
以下、本発明の実施形態である音声応答システムについて説明する。
【００２８】
（音声応答システムの原理）
【００２９】
図１〜図４は、本実施形態の音声応答システムの原理を説明するための図である。
【００３０】
（音声応答システムの概略システム構成）
【００３１】
図１に示すように、音声応答システム１０は、音声入力部１１０、音声出力部１２０、応答処理部１３０、利用者認証部１４０、感情分析部１５０、利用者分析部１６０、シナリオ蓄積部４１０、応答履歴格納部４２０、利用者特性情報格納部４３０等を備えている。
【００３２】
（音声応答システムの概略動作）
【００３３】
本実施形態の音声応答システムの動作について、図２、図３、及び図４を参照しながら説明する。
【００３４】
図２は、本実施形態の音声応答システムの動作を説明するためのフローチャートである。図３は、本実施形態の音声応答システムの動作原理図である。これは、データの流れも示している。図４は、利用者分析部の動作を説明するためのフローチャートである。
【００３５】
＜利用者認証（Ｓ１００）＞
【００３６】
利用者認証部１４０は、本システムの利用者とその利用者が置かれている状況を特定する。この特定は、例えば、利用者が電話によって本システムにアクセスした場合には、発信番号通知機能、音声認証技術等、その他利用者を特定できる方法を用いることによって実現可能である。特定された内容は利用者情報として応答処理部１３０へ渡される。
＜利用者特性情報設定（Ｓ１１０）＞
【００３７】
応答処理部１３０は、利用者情報をもとに、利用者特性情報格納部４３０から、利用者が置かれている状況に対応する利用者の個人特性情報を取得する。また、応答処理部１３０は、利用者情報を感情分析部１５０へ渡す。
【００３８】
感情分析部１５０は、応答処理部１３０から渡された利用者情報をもとに、利用者特性情報格納部４３０から、利用者の置かれている状況に対応する利用者の個人音声特性情報を取得する。
＜音声入力（Ｓ１２０）＞
【００３９】
音声入力部１１０は、利用者音声から音声情報を生成し、これを応答処理部１３０と感情分析部１５０に渡す。
【００４０】
＜感情分析（Ｓ１３０）＞
【００４１】
感情分析部１５０は、利用者特性情報格納部４３０から取得された個人音声特性情報を用いて、音声入力部１１０から渡された音声情報から抽出した音声特性情報と比較分析する。これにより、利用者の置かれている状況が考慮され、感情が推測され、感情情報作成される。その感情情報は、応答処理部１３０へ渡される
【００４２】
＜音声認識（Ｓ１４０）＞
【００４３】
応答処理部１３０は、音声入力部１１０から渡された音声情報に対する音声認識処理を実行する。即ち、利用者から発話された言葉の識別を行う。これにより、音声認識処理の結果（識別情報）が取得される。
【００４４】
＜発話文作成（Ｓ１５０）＞
【００４５】
応答処理部１３０は、感情分析部１５０から渡された感情情報、利用者特性情報格納部４３０から取得された個人特性情報、及び利用者の音声情報から取得された識別情報をもとに、シナリオ蓄積部４１０から最適な応答の順序で、最適な応答内容の発話文を選択する。この選択された発話文は音声出力部１２０へ渡される。
【００４６】
＜音声出力（Ｓ１６０）＞
【００４７】
音声出力部１２０は、応答処理部１３０から渡された発話文を、音声合成技術による読み上げ等の利用者にメッセージを伝える何らかの方法によって、利用者へ通知される。
【００４８】
なお、音声入力（Ｓ１２０）から音声出力（Ｓ１６０）までの処理は、音声応答サービスが終了するまで繰り返される。
【００４９】
＜応答履歴登録（Ｓ１８０）＞
【００５０】
利用者と音声応答システム１０の音声応答サービスが終了すると（Ｓ１７０：Ｙｅｓ）、応答処理部１３０は、利用者の置かれている状況（利用場所、利用時間、利用手段等）、利用者の話し方（音量情報、ピッチ情報、発話タイミング情報等）、及びその音声応答間の利用者の感情の変化を含む応答履歴情報を、応答履歴格納部４２０へ渡す（格納する）。
【００５１】
＜利用者分析（Ｓ１９０）＞
【００５２】
図４に示すように、利用者分析部１６０は、応答履歴格納部４２０に応答履歴情報が渡されると、その応答履歴情報を取得する（Ｓ２００）。また、利用者分析部１６０は、利用者特性情報の要求を行い（Ｓ２１０）、その利用者の利用者特性情報があれば（Ｓ２２０：Ｙｅｓ）、利用者特性情報格納部４３０からその利用者の利用者特性情報を取得する（Ｓ２３０）。そして、それらの情報をもとに、その利用者の利用者特性情報が再び分析（又は推測）され（Ｓ２４０）、作成される（Ｓ２５０）。
【００５３】
一方、利用者分析部１６０は、その利用者の利用者特性情報が無い場合は（Ｓ２６０）、一般的な利用者特性情報を取得する（Ｓ２６０）。そして、その利用者の利用者特性情報が新規に作成される。
【００５４】
＜利用者特性情報登録（Ｓ２００）＞
【００５５】
利用者分析部１６０は、上記Ｓ２５０で作成した利用者特性情報を、利用者特性情報格納部４３０へ渡す。これにより、利用者特性情報が追加又は更新される（Ｓ２６０）。なお、ここで作成された利用者特性情報は、利用者の次回の音声応答サービス利用時に用いられる。
【００５６】
以上説明したように、本実施形態においては、利用者の置かれている状況とその時の利用者の感情に応じて、応答の順序や応答内容を変更することができる。
従って、臨機応変に柔軟にすることができる音声応答サービスを行うことが可能となる。
【００５７】
（実施例）
【００５８】
以下、本発明を、ホテル予約用の音声応答システムに適用した例について説明する。
【００５９】
（音声応答システムの概要）
【００６０】
図５は、ホテル予約用の音声応答システム（以下単に音声応答システムという）の概略システム構成を説明するための図である。
【００６１】
音声応答システム１０は、公衆電話網４０に接続されており、電話機３０から利用者の着呼があった場合に、その利用者に対して、ホテル予約のための音声応答サービス（ガイダンス音声の読み上げ等）を行う。
【００６２】
この音声応答サービスは、利用者が置かれている現在の状況と利用者の現在の感情を考慮して行われる。この音声応答サービスを行うため、音声応答システム１０は、後述する利用者特性情報格納部４３０等を備える。
【００６３】
また、同一利用者に対する次回の音声応答サービスをより最適化するため、利用者特性情報格納部４３０の格納内容が更新される。
【００６４】
（音声応答システムの概略システム構成）
【００６５】
図６に示すように、音声応答システム１０は、音声入力部１１０、音声出力部１２０、応答処理部１３０、利用者認証部１４０、感情分析部１５０、利用者分析部１６０、シナリオ蓄積部４１０、応答履歴格納部４２０、利用者特性情報格納部４３０等を備えている。
【００６６】
音声入力部１１０は、電話機３０を介して入力される利用者音声（利用者が発した音声）から音声情報を生成するためのものである。
【００６７】
音声出力部１２０は、応答処理部１３０により取得（選択又は生成等）された発話文（応答文）等を、既存の音声合成技術により読み上げるためのものである。音声出力部１２０により読み上げられた発話文は、電話機３０を介して利用者に報知される。
【００６８】
応答処理部１３０は、図７に示す各種の処理の実行等を行うためのものである。
【００６９】
利用者認証部１４０は、電話機３０から利用者の発呼があった場合に、その利用者とその利用者が置かれている状況を特定（又は認証）するためのものである。これらの認証は、例えば発信者番号通知機能により通知される発信者番号と利用者データベースとを照合することで特定することが考えられる。
【００７０】
感情分析部１５０は、利用者の現在の感情を推測するためのものである。
【００７１】
利用者分析部１６０は、利用者特性情報、音声入力部１２０により生成された音声情報、及び音声応答サービス開始から終了までの間の利用者の応答履歴に基づいて利用者特性情報を追加及び更新するためのものである。
【００７２】
シナリオ蓄積部４１０には、音声応答システム１０のサービスの流れと発話文（シナリオ）が、シナリオの流れと利用者の感情ごとに作成されて蓄積されている。なお、シナリオは、利用者の感情ごとのテーブルとして蓄積されていてもよい。
【００７３】
応答履歴格納部４２０は、音声応答サービスの開始から終了までの間、利用者の応答履歴を格納するためのものである。応答履歴としては、利用者が置かれている状況（利用場所、利用時間、利用手段等）、利用者の話し方（音量情報、ピッチ情報、発話タイミング情報等）、その音声応答間の利用者の感情の変化、音声応答システム１０の音声出力と利用者の音声入力の開始終了のタイミング情報等がある。
【００７４】
利用者特性情報格納部４３０は、利用者が置かれている状況ごとに、利用者の感情と応対パターンとを対応付けた個人特性情報を含む利用者特性情報（図１０）を格納するためのものである。
【００７５】
（音声応答システムの動作）
【００７６】
上記構成の音声応答システムの動作について、図面を参照しながら説明する。
図７は、本実施例の音声応答システムの動作を説明するためのフローチャートである。
【００７７】
＜サービススタート（Ｓ３００）＞
【００７８】
音声応答システム１０は、電話機３０から利用者Ａの着呼があった場合に、以下のホテル予約のための音声応答サービスを開始する。
【００７９】
＜利用者認証（Ｓ３０１）＞
【００８０】
音声応答システム１０は、電話機３０から利用者Ａの着呼があると、利用者認証部１４０により、その利用者Ａとその利用者Ａが置かれている状況を特定（認識）する。特定結果は、利用者情報として応答処理部１３０へ渡される（図６▲１▼）。
【００８１】
ここでは、利用者等の認証のために、音声応答システム１０は、音声出力部１２０により、「ご利用ありがとうございます。こちらは、ホテル予約システムです。お名前よろしいですか？」を読み上げる。この読み上げられた発話文は、電話機３０を介して利用者Ａに報知される。これに対して、利用者Ａが「利用者Ａです。」と応えると、音声応答システム１０は、音声認識処理を実行すること等により、「利用者Ａ」という利用者名を特定（又は認識）する。また、音声応答システム１０は、その利用者Ａが置かれている現在の状況を特定（又は認識）する。この特定は、例えば、発信者電話番号と利用者データベース（電話番号と、利用手段及び利用場所等とを対応付けたもの）とを照合等することにより行う。
ここでは、利用者Ａが置かれている現在の状況として、「固定電話」という利用手段、「会社」という利用場所（電話がかけられている場所）が特定（又は認識）されたものとする。
【００８２】
利用者認証部１４０は、これらの特定結果（「利用者Ａ」、「会社」、「固定電話」）を、利用者情報（図８）として応答処理部１３０へ渡す（図６▲１▼）。
【００８３】
＜利用者特性情報設定（Ｓ３０２）＞
【００８４】
応答処理部１３０は、利用者情報をもとに、利用者特性情報格納部４３０（図９）から、その利用者Ａが置かれている現在の状況（「会社」、「固定電話」）に対応する個人特性情報（図１０）を取得する（図６▲２▼）。この個人特性情報は、自己のメモリ等の所定記憶部に格納（又は設定）される。
【００８５】
また、応答処理部１３０は、利用者情報を感情分析部１５０へ渡す（図６▲３▼）。感情分析部１５０は、応答処理部１３０から渡された利用者情報をもとに、利用者特性情報格納部４３０から、その利用者Ａが置かれている現在の状況（「会社」、「固定電話」）に対応する個人音声特性情報（図１１）を取得する（図６▲４▼）。この個人音声特性情報は、自己のメモリ等の所定記憶部に格納（又は設定）される。
【００８６】
＜音声入力（Ｓ３０３）＞
【００８７】
利用者Ａが発話を開始すると、音声応答システム１０は、音声入力部１１０により、電話機３０を介して入力される利用者Ａの音声から音声情報（図１２）を作成する。この音声情報は、応答処理部１３０と感情分析部１５０に渡される（図６▲５▼▲６▼）。
【００８８】
＜感情分析（Ｓ３０４）＞
【００８９】
感情分析部１５０は、音声入力部１１０から渡された音声情報から抽出した音声特性情報（図１３）から、「音量レベル：６」、「音声ピッチ：８」を抽出する。
【００９０】
次に、感情分析部１５０は、利用者Ａの現在の感情を推測する。この推測のために、利用者Ａの個人音声特性情報（図１１）と、利用者Ａの現在の音声特性情報（「音量レベル：６」、「音声ピッチ：８」）とを比較分析する。利用者Ａの個人音声特性情報（図１２）を参照すると、利用者Ａの現在の音声特性情報（「音量レベル：６」、「音声ピッチ：８」）にはいずれも「普通」が対応している。従って、感情分析部１５０は、利用者Ａの現在の感情が「普通」と推定（又は分析）する。
【００９１】
また、感情分析部１５０は、利用者Ａの現在の応答特性を推測する。この推測のために、音声応答システム１０の発話時間帯と利用者Ａの発話時間帯とが重なっているか否かを判定する。ここで、利用者Ａの音声情報（図１２）を参照すると、音声応答システム１０の発話時間帯と利用者Ａの発話時間帯とが重なっていない。従って、感情分析部１５０は、利用者Ａの応答特性が「普通」と推定（又は分析）する。
【００９２】
以上のように推定（又は分析）された結果（「現在の感情：普通」、「応答特性：普通」）は、感情情報（図１４）として応答処理部１３０へ渡される（図６▲７▼）。
【００９３】
＜音声認識＞
【００９４】
応答処理部１３０は、音声入力部１１０から渡された音声情報に対して音声認識処理を実行する。ここでは、応答処理部１３０は、音声認識の結果、利用者Ａから発話された言葉の識別が行われ、識別情報「９月３０日にホテル音田を予約したい」（図１５）を取得したものとする。
【００９５】
＜発話文作成＞
【００９６】
応答処理部１３０は、感情分析部１５０から渡された感情情報「現在の感情：普通」と音声認識の結果「識別情報：９月３０日にホテル音田を予約したい」をもとに、シナリオ蓄積部４１０から、今のシナリオの流れに合う（ここでは、ステップ３−１）感情が普通のシナリオ（図１６）を選択する。
【００９７】
次に、応答処理部１３０は、利用者Ａが置かれている現在の状況（「会社」、「固定電話」）に対応する個人特性情報から、利用者Ａの現在の感情（「普通」）に対応する応対パターンを取得する。
【００９８】
ここで、利用者Ａが置かれている現在の状況（「会社」、「固定電話」）に対応する個人特性情報（これは図６▲２▼で取得され、自己のメモリ等に格納されている。図１０参照。）を参照すると、利用者Ａの現在の感情（「普通」）に対応する応対パターンは「普通の対応」である。
【００９９】
従って、応答処理部１３０は、利用者Ａの現在の感情（「普通」）に対応する応答パターンとして「普通の対応」を取得する。
【０１００】
このため、応答処理部１３０は、最終的に、上記選択したシナリオ（図１６）から、応答パターン「普通の対応」に対応する「申し訳ございません。満室です。ホテルを条件から検索しますか？」を取得（選択）する。この発話文は音声出力部１２０へ渡される。
【０１０１】
一方、上記感情分析（Ｓ３０４）において、図１７に示すように、音声応答システム１０の発話時間帯と利用者Ａの発話時間帯とが重なっていたとする。この場合、感情分析部１５０は、利用者Ａの応答特性が「せっかち」と推定（又は分析）する。利用者Ａの応答特性が「せっかち」と推定（又は分析）された場合には、応答処理部１３０は、応答パターン「普通の対応」とは無関係に、最終的に、上記選択したシナリオ（図１６）から、応対パターン「スピーディな応対」に対応する「申し訳ございません。ありませんでした。お急ぎですか？」を選択する（図１８）。
【０１０２】
＜音声出力（Ｓ３０７）＞
【０１０３】
音声出力部１２０は、応答処理部１３０から渡された発話文を、音声合成技術により読み上げる。この読み上げられた発話文は、電話機３０を介して利用者Ａに報知（通知）される。
【０１０４】
また、上記音声入力（Ｓ３０３）から音声出力（Ｓ３０７）までの処理は、音声応答サービスが終了するまで繰り返される。
【０１０５】
＜応答履歴登録（Ｓ３０９）＞
【０１０６】
利用者Ａに対する音声応答システム１０の音声応答サービスが終了すると、応答処理部１３０は、利用者Ａが置かれている状況（利用場所、利用時間、利用手段等）、利用者の話し方（音量情報、ピッチ情報、発話タイミング情報等）、その音声応答間の利用者の感情の変化、音声応答システム１０の音声出力と利用者Ａの音声入力の開始終了のタイミング情報を含んだ応答履歴情報（図１９）を応答履歴格納部４２０へ渡す。
【０１０７】
＜利用者分析（Ｓ３１０）＞
【０１０８】
利用者分析部１６０は、応答履歴格納部４２０に応答履歴情報が渡されると、その応答履歴情報を取得する。また、利用者分析部１６０は、利用者特性情報格納部４３０から、その利用者Ａの利用者特性情報があれば、これを（図９）取得する。
【０１０９】
次に、利用者分析部１６０は、その利用者Ａの利用者特性情報を更新する。この更新のために、利用者分析部１６０は、応答履歴情報と利用者特性情報とを比較分析する。利用者Ａの応答履歴情報（図１９）を参照すると、利用者Ａの感情は最初、普通であったが、音声応答サービスの最後で、嫌悪となっている。これは、音声応答サービス中に利用者Ａを不機嫌とするなんらかの原因があったと推測できる。また、音声出力と音声入力のタイミングを比較すると、音声応答サービスの開始から終了まで、音声出力中に、音声入力を行っていることが分かる。
これは、利用者Ａは、音声出力のメッセージが長かったことが原因で、嫌悪感を感じたと考えられる。つまり、感情が「普通」である時、この利用者Ａの求める応対パターンは、「スピーディな応対」であるということが分かる（後述のルール２）。
【０１１０】
従って、利用者分析部１６０は、利用者Ａの利用者特性情報（図１０）における応対パターン「普通の応対」を、「スピーディな応対」に修正（再作成）する（図２０）。
【０１１１】
一方、利用者分析部１６０は、利用者特性情報格納部４３０にその利用者Ａの利用者特性情報が無い場合は、一般的な利用者特性情報（図２１）を取得し、上記と同様に修正を行い、その利用者Ａの利用者特性情報を新規に作成する。
【０１１２】
図２２は、利用者分析部１６０が分析に用いるテーブルの例を示している。これは、「ＩＦ〜ＴＨＥＮ」形式のルールを集めた知識ベースとなっている。同図は、３つのルールを例示する。なお、＃で始まる行はコメントである。
【０１１３】
ルール１は、新規利用者に対して適用されるルールである。即ち、利用回数が０の場合（上記説明では、利用者特性情報格納部４３０に利用者Ａの利用者特性情報が無い場合）、音量レベル等の各種パラメータを設定する。
【０１１４】
ルール２は、利用者の感情が「通常」から「嫌悪」に変化し、利用者が音声出力中に応答し、かつ、応答パターンが「普通の応対」である場合は、応答パターン「普通の対応」を「スピーディな応対」に修正する。これについては、既に説明した。
【０１１５】
ルール３は、ルール２とは逆の修正を行うためのルールである。即ち、感情の変化がなく、利用者が音声出力中に応答せず、かつ、応答パターンが「普通の応対」である場合は、応答パターン「普通の応対」を「丁寧な応対」に修正する。
【０１１６】
＜利用者特性情報登録（Ｓ３１１）＞
【０１１７】
利用者分析部１６０は、上記Ｓ３１０で作成された利用者特性情報を、利用者特性情報格納部４３０に追加又は更新する。この追加又は更新された利用者特性情報は、利用者の次回の音声応答サービス利用時に用いられる。
【０１１８】
つまり、利用者Ａが置かれている状況が次回も「会社」、「固定電話」で、かつ、利用者Ａの感情が「普通」であった場合は、修正後の応対パターン「スピーディな応対」にて、応対が行われることになる。
【０１１９】
これにより、同一利用者Ａに対する次回の音声応答サービスをより最適化することが可能となる。
【０１２０】
また、音声応答システム１０による音声応答サービスを、利用者Ａが置かれている現在の状況と利用者Ａの現在の感情を考慮して行うことが可能となる。
なお、本実施例で説明した利用者特性情報（図９）は一例であり、これに代えて、例えば、図２３に示すように規定された利用者特性情報を用いてもよい。この利用者特性情報には、音声レベルや音声ピッチが幅（例えば０〜４や０〜７）により規定されている。このため、より適切に利用者の感情を推測することが可能になる。
なお、図２２の利用者特性情報を用いた場合、この利用者特性情報は、例えば、図２４に示すように修正（再作成）される。また、この場合、一般的な利用者特性情報として、例えば、図２５に示すものが用いられる。
【０１２１】
本発明は、その精神または主要な特徴から逸脱することなく、他の様々な形で実施することができる。このため、上記の実施形態は、あらゆる点で単なる例示にすぎず、限定的に解釈されるものではない。
【０１２２】
本発明は、次のように特定することもできる。
（付記１）利用者に対して音声応答サービスを提供する音声応答システムであって、利用者音声から音声情報を生成する音声入力部と、利用者が置かれている現在の状況を特定する状況特定部と、利用者の現在の感情を推測する感情分析部と、前記音声入力部により生成された音声情報に対して音声認識処理を実行する音声認識部と、利用者が置かれている状況ごとに、利用者の感情と応対パターンとを対応付けた個人特性情報を含む利用者特性情報が格納される格納部と、前記状況特定部により特定された利用者が置かれている現在の状況と前記感情分析部により推測された利用者の現在の感情とに対応する応対パターンを、前記個人特性情報から取得する応対パターン決定部と、前記応答パターン決定部により取得された応対パターンと前記音声認識部による認識結果とに対応する発話文を取得する応答処理部と、前記応答処理部により取得された発話文を読上げる音声出力部と、を備える音声応答システム。
（付記２）前記利用者の感情と応対パターンとの対応関係を更新する更新部をさらに備える付記１に記載の音声応答システム。
（付記３）前記感情分析部は、予め格納されている個人音声特性情報を用いて比較分析を行うことにより、利用者の現在の感情を推測する付記１に記載の音声応答システム。
（付記４）前記利用者特性情報、前記音声入力部により生成された音声情報、及び前記音声応答サービス開始から終了までの間の利用者の応答履歴に基づいて利用者特性情報を追加及び更新する利用者分析部をさらに備える付記３に記載の音声応答システム。
（付記５）前記応答履歴は、前記音声応答サービスによる音声出力と利用者の音声入力の開始終了のタイミング情報を含み、利用者特性情報は、その利用者の過去に音声応答サービスを利用した際の音声応答サービスによる音声出力と利用者の音声入力の開始終了のタイミング情報を含む付記４に記載の音声応答システム。
（付記６）利用者に対して音声応答サービスを提供する音声応答システムによる音声応答方法であって、利用者音声から音声情報を生成するステップと、利用者が置かれている現在の状況を特定するステップと、利用者の現在の感情を推測するステップと、前記生成された音声情報に対して音声認識処理を実行するステップと、前記特定された利用者が置かれている現在の状況と前記推測された利用者の現在の感情とに対応する応対パターンを、利用者が置かれている状況ごとに利用者の感情と応対パターンとを対応付けた個人特性情報から取得するステップと、前記取得された応対パターンと前記音声認識結果とに対応する発話文を取得するステップと、前記取得された発話文を読み上げるステップと、を備える音声応答方法。
（付記７）利用者に対して音声応答サービスを提供する音声応答システムが、利用者の音声情報から推測される利用者の感情に応じた応対を行う音声応答サービスを提供する方法であって、利用者の置かれている状況、利用者の話し方、前記音声応答サービスの開始から終了までの間の利用者の感情の変化を含む応答履歴から推測される利用者特性情報、及びその利用者特性情報に基づいて得られる利用者の置かれている状況を考慮した感情情報をもとに、利用者に対する応答の順序や応答内容を変更する音声応答サービス提供方法。
（付記８）利用者に対して音声応答サービスを提供する音声応答システムであって、利用者の音声から音声情報を生成する音声入力部と、利用者の利用者特性情報を推測し利用者特性情報を生成する利用者分析部と、利用者の感情を推測し感情情報を生成する感情分析部と、発話文の作成に必要な文章が蓄積されているシナリオ蓄積部と、音声情報から利用者の言葉を識別し、前記利用者特性情報と前記感情情報をもとに、前記シナリオ蓄積部から発話文を作成する応答処理部と、前記発話文を読上げる音声出力部と、を備える音声応答システム。
【０１２３】
【発明の効果】
以上説明したように、本発明によれば、利用者の特性情報及び特性情報を加味して得られる感情情報に応じた応対が可能となるため、以下のような効果がある。
【０１２４】
第１に、利用者の置かれている状況を把握し、その利用者の置かれた状況に応じた臨機応変な応対をすることが可能となる。第２に、利用者の話し方に影響されずに利用者の感情を把握し、その感情に応じた臨機応変な応対をすることが可能となる。
【０１２５】
また、利用者の置かれている状況とその時の利用者の感情に応じた臨機応変な応対をすることが可能となる。また、利用者の置かれている状況による利用者特性の変化を踏まえることにより、利用者の感情をより正確に推測することができる。さらに、利用者にとって気の利いた音声応答サービスを提供することができる。
これらはサービスの顧客満足度の向上に繋がり、リピータの獲得、サービス利用者（顧客）の増加にも繋がる。
【図面の簡単な説明】
【図１】本実施形態の音声応答システムの原理を説明するための図である。
【図２】本実施形態の音声応答システムの動作を説明するためのフローチャートである。
【図３】本実施形態の音声応答システムの動作原理図である。
【図４】利用者分析部の動作を説明するためのフローチャートである。
【図５】ホテル予約用の音声応答システムの概略システム構成を説明するための図である。
【図６】ホテル予約用の音声応答システムの概略システム構成を説明するための図である。
【図７】ホテル予約用の音声応答システムの動作を説明するためのフローチャートである。
【図８】利用者情報の例である。
【図９】利用者特性情報の例である。
【図１０】個人特性情報の例である。
【図１１】個人音声特性情報の例である。
【図１２】音声情報の例である。
【図１３】音声特性情報の例である。
【図１４】感情情報の例である。
【図１５】識別情報の例である。
【図１６】シナリオの例である。
【図１７】音声情報の例である。
【図１８】発話文の例である。
【図１９】応答履歴情報の例である。
【図２０】再作成された利用者特性情報の例である。
【図２１】一般的な利用者特性情報の例である。
【図２２】利用者分析部が分析に用いるテーブルの例である。
【図２３】利用者特性情報の他の例である。
【図２４】再作成された利用者特性情報の他の例である。
【図２５】一般的な利用者特性情報の他の例である。
【図２６】従来技術を説明するための図である。
【符号の説明】
１０音声応答システム
３０電話機
４０公衆網
１１０音声入力部
１２０音声出力部
１３０応答処理部
１４０利用者認証部
１５０感情分析部
１６０利用者分析部
４１０シナリオ蓄積部
４２０応答履歴格納部
４３０利用者特性情報格納部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to the field of voice response services using voice.
[0002]
[Prior art]
In recent years, voice response services have come into use in various fields. In such a service, a user converses with a system equipped with voice recognition technology, and the system provides various services to the user (see, for example, FIG. 26). Specific examples include personal computers, aimed at users affected by the digital divide, that can be operated easily by voice without a keyboard or mouse, and voice portals that users access by voice from a personal computer or telephone to receive services.
[0003]
In general, a voice response service is provided according to a scenario held by the system.
The system recognizes the voice uttered by the user, checks the recognition result against the scenario, generates or selects the next utterance sentence, and responds.
[0004]
In recent years, it is also possible to analyze emotions from the state of voice input by the user and change the response according to the analysis result (see Patent Document 1).
[0005]
A dialog control method has also been disclosed that changes how the dialog proceeds according to how often the user uses the system and, by holding personal information about the user, adapts to the user's preferences regarding dialog progression (see Patent Document 2).
[0006]
[Patent Document 1]
Japanese Patent Laid-Open No. 10-55194
[Patent Document 2]
Japanese Patent Laid-Open No. 2002-99404
[0007]
[Problems to be solved by the invention]
However, when the voice response is varied using only the user's system usage frequency and personal information, the system has no grasp of the user's constantly changing situation (place of use, means of use, time of use, and so on). It therefore cannot flexibly provide the response the user wants, which arises from the situation the user is in. In other words, an appropriate response is difficult to achieve by grasping the user's emotion alone, and equally difficult using only the user's system usage frequency and personal information.
[0008]
The prior art also focuses on the user's voice input and estimates the user's emotion, and hence determines the response, solely from the state of that input (the pitch and volume of the user's voice). It therefore cannot distinguish, for example, among a naturally fast talker speaking slowly, a naturally slow talker speaking quickly, and an average person speaking at a normal speed. In these cases it seems reasonable to infer that the naturally fast talker is relaxed and that the naturally slow talker is in a hurry. If these differences are not taken into account, the system misjudges the user's state and may respond to the user inappropriately.
[0009]
A first object of the present invention is to provide the response a user wants, which arises from the situation the user is in. A second object of the present invention is to respond appropriately to the user in consideration of the user's characteristics.
[0010]
[Means for Solving the Problems]
To solve the above problems, the present invention provides a system that provides a voice response service to a user, comprising: a voice input unit that generates voice information from the user's voice; a situation identifying unit that identifies the current situation the user is in; an emotion analysis unit that estimates the user's current emotion; a voice recognition unit that performs voice recognition processing on the voice information generated by the voice input unit; a storage unit that stores user characteristic information including personal characteristic information in which, for each situation the user may be in, the user's emotions are associated with response patterns; a response pattern determination unit that acquires, from the personal characteristic information, the response pattern corresponding to the current situation identified by the situation identifying unit and the current emotion estimated by the emotion analysis unit; a response processing unit that acquires an utterance sentence corresponding to the response pattern acquired by the response pattern determination unit and to the recognition result of the voice recognition unit; and a voice output unit that reads out the utterance sentence acquired by the response processing unit.
[0011]
According to the present invention, the utterance sentence read out by the voice output unit is determined by the response pattern (which is itself determined by the user's current situation and current emotion) and by the recognition result of the voice recognition unit. The present invention can therefore provide the response the user wants, which arises from the situation the user is in.
[0012]
For example, when the user's current situation is “home”, the user wants a “slow, relaxed response service”, and when the user's current situation is “company” (the workplace), the user wants a “brisk, matter-of-fact response service”; the system can meet such wants. The user's current situation can be identified, for example, from the caller's number or by asking the user at an early stage of the voice response service.
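The dependence described above, in which the situation and the emotion together select a response pattern, can be sketched as a nested lookup. This is only an illustrative sketch: the table contents, the key names, and the function `decide_response_pattern` are assumptions made for this example and do not come from the patent's figures.

```python
from typing import Dict, Tuple

# Hypothetical personal characteristic table: for each situation
# (place, means of access), map an estimated emotion to a response pattern.
# All keys and pattern names here are illustrative assumptions.
PersonalCharacteristics = Dict[Tuple[str, str], Dict[str, str]]

USER_A: PersonalCharacteristics = {
    ("company", "landline"): {"normal": "speedy", "irritated": "speedy"},
    ("home", "landline"): {"normal": "relaxed", "irritated": "normal"},
}


def decide_response_pattern(
    table: PersonalCharacteristics,
    situation: Tuple[str, str],
    emotion: str,
    default: str = "normal",
) -> str:
    """Return the response pattern for the current situation and emotion.

    Falls back to `default` when the situation or emotion is not listed.
    """
    return table.get(situation, {}).get(emotion, default)
```

The same user gets a different pattern at home and at work even when the estimated emotion is identical, which is the point the paragraph above makes.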
[0013]
The voice response system further includes, for example, an update unit that updates a correspondence relationship between the user's emotion and a response pattern.
[0014]
In this way, the next voice response service for the user can be further optimized. For example, it is conceivable to update the correspondence between the two based on the response history of the user from the start to the end of the voice response service.
[0015]
In the voice response system, for example, the emotion analysis unit estimates a user's current emotion by performing a comparative analysis using personal voice characteristic information stored in advance.
[0016]
In this way, the user's emotion can be grasped in consideration of the user's personal voice characteristics (such as volume level and voice pitch), so a more appropriate response can be given to the user.
[0017]
The voice response system further includes, for example, a user analysis unit that adds and updates user characteristic information based on the user characteristic information, the voice information generated by the voice input unit, and the user's response history from the start to the end of the voice response service.
[0018]
In this way, the next voice response service for the user can be further optimized.
[0019]
In the voice response system, for example, the response history includes timing information on the start and end of the voice output by the voice response service and of the user's voice input, and the user characteristic information includes the same kind of timing information from the user's past use of the voice response service.
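One use of such start and end timing information, described in the worked example of the original text, is to check whether the system's speaking period and the user's speaking period overlap. A minimal sketch of that check follows; the `Interval` type, the time unit, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Start and end of one stretch of speech, in seconds from call start."""
    start: float
    end: float


def overlaps(system_output: Interval, user_input: Interval) -> bool:
    """True if the two speaking periods share any stretch of time."""
    return (
        user_input.start < system_output.end
        and system_output.start < user_input.end
    )
```

For instance, `overlaps(Interval(0.0, 5.0), Interval(4.0, 6.0))` is true because the user began speaking one second before the system finished.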
[0020]
In this way, the next voice response service for the user can be further optimized.
[0021]
Further, the present invention can be specified as follows.
[0022]
A method by which a voice response system that provides a voice response service to a user provides a voice response service that responds according to the user's emotion inferred from the user's voice information, wherein the order and content of responses to the user are changed based on user characteristic information inferred from a response history that includes the user's situation, the user's manner of speaking, and changes in the user's emotion from the start to the end of the voice response service, and on emotion information that is obtained based on that user characteristic information and takes the user's situation into account.
[0023]
In this way, the system can respond flexibly, as the occasion demands, according to the user's situation and the user's emotion at that time.
[0024]
Further, the present invention can be specified as follows.
[0025]
A voice response system that provides a voice response service to a user, comprising: a voice input unit that generates voice information from the user's voice; a user analysis unit that infers the user's characteristics and generates user characteristic information; an emotion analysis unit that estimates the user's emotion and generates emotion information; a scenario storage unit in which the sentences needed to create utterance sentences are stored; a response processing unit that identifies the user's words from the voice information and creates an utterance sentence from the scenario storage unit based on the user characteristic information and the emotion information; and a voice output unit that reads out the utterance sentence.
[0026]
In this way, the order of responses and the content of responses can be changed according to the situation where the user is placed and the emotion of the user at that time.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a voice response system according to an embodiment of the present invention will be described.
[0028]
(Principle of voice response system)
[0029]
FIGS. 1 to 4 are diagrams for explaining the principle of the voice response system of this embodiment.
[0030]
(Outline system configuration of voice response system)
[0031]
As shown in FIG. 1, the voice response system 10 includes a voice input unit 110, a voice output unit 120, a response processing unit 130, a user authentication unit 140, an emotion analysis unit 150, a user analysis unit 160, a scenario storage unit 410, a response history storage unit 420, a user characteristic information storage unit 430, and the like.
[0032]
(Outline operation of voice response system)
[0033]
The operation of the voice response system according to this embodiment will be described with reference to FIGS. 2, 3, and 4.
[0034]
FIG. 2 is a flowchart for explaining the operation of the voice response system of this embodiment. FIG. 3 is an operation principle diagram of the voice response system of the present embodiment. This also shows the data flow. FIG. 4 is a flowchart for explaining the operation of the user analysis unit.
[0035]
<User authentication (S100)>
[0036]
The user authentication unit 140 identifies the user of this system and the situation the user is in. For example, when the user accesses the system by telephone, this identification can be achieved using a caller number notification function, voice authentication technology, or any other method capable of identifying the user. The identified content is passed to the response processing unit 130 as user information.
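Identification by caller number can be pictured as a lookup in a user database that ties each number to a user and a situation. The sketch below assumes such a database exists; the phone numbers, the field names, and the `identify` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UserInfo:
    """User information handed to the response processing step."""
    user: str
    place: str   # e.g. "company" or "home"
    means: str   # e.g. "landline" or "mobile"


# Hypothetical user database keyed by caller number (numbers are made up).
USER_DB: Dict[str, UserInfo] = {
    "0312345678": UserInfo("User A", "company", "landline"),
    "09012345678": UserInfo("User A", "outdoors", "mobile"),
}


def identify(caller_number: str) -> Optional[UserInfo]:
    """Look up who is calling and from what situation; None if unknown."""
    return USER_DB.get(caller_number)
```

An unknown number returns `None`, in which case a real system would need another method, such as asking the user directly.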
<User characteristic information setting (S110)>
[0037]
The response processing unit 130 acquires the personal characteristic information of the user corresponding to the situation where the user is placed from the user characteristic information storage unit 430 based on the user information. In addition, the response processing unit 130 passes the user information to the emotion analysis unit 150.
[0038]
Based on the user information passed from the response processing unit 130, the emotion analysis unit 150 acquires, from the user characteristic information storage unit 430, the user's personal voice characteristic information corresponding to the situation the user is in.
<Voice input (S120)>
[0039]
The voice input unit 110 generates voice information from the user voice and passes it to the response processing unit 130 and the emotion analysis unit 150.
[0040]
<Emotion analysis (S130)>
[0041]
The emotion analysis unit 150 compares the personal voice characteristic information acquired from the user characteristic information storage unit 430 against the voice characteristic information extracted from the voice information passed from the voice input unit 110. The emotion is thereby estimated with the user's situation taken into account, and emotion information is created. The emotion information is passed to the response processing unit 130.
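A rough sketch of this comparison is given below. It assumes the personal voice characteristic information maps ranges of volume level and pitch to emotion labels for one user in one situation. The ranges, the labels other than "normal", and the rule that both features must agree are assumptions for illustration; only the sample reading (volume level 6 and pitch 8 both corresponding to "normal") follows the worked example in the original text.

```python
from typing import Dict, List, Tuple

# Hypothetical personal voice profile for one user in one situation:
# each feature maps inclusive level ranges to an emotion label.
Profile = Dict[str, List[Tuple[int, int, str]]]

PROFILE_A_COMPANY: Profile = {
    "volume": [(0, 4, "calm"), (5, 7, "normal"), (8, 10, "angry")],
    "pitch": [(0, 6, "calm"), (7, 9, "normal"), (10, 12, "angry")],
}


def label_for(profile: Profile, feature: str, level: int) -> str:
    """Map one measured level to this user's emotion label for that feature."""
    for low, high, label in profile[feature]:
        if low <= level <= high:
            return label
    return "unknown"


def estimate_emotion(profile: Profile, volume: int, pitch: int) -> str:
    """Return a label only when both features agree (an assumed rule)."""
    by_volume = label_for(profile, "volume", volume)
    by_pitch = label_for(profile, "pitch", pitch)
    return by_volume if by_volume == by_pitch else "unknown"
```

Because the ranges belong to one user and one situation, the same measured levels could map to a different emotion for another user, which is how a per-user comparison avoids misreading a naturally loud or fast speaker.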
[0042]
<Voice recognition (S140)>
[0043]
The response processing unit 130 performs voice recognition processing on the voice information passed from the voice input unit 110. That is, the words uttered by the user are identified. Thereby, the result (identification information) of the voice recognition process is acquired.
[0044]
<Spoken sentence creation (S150)>
[0045]
Based on the emotion information passed from the emotion analysis unit 150, the personal characteristic information acquired from the user characteristic information storage unit 430, and the identification information obtained from the user's voice information, the response processing unit 130 selects from the scenario storage unit 410 an utterance sentence with the most suitable response content, in the most suitable response order. The selected utterance sentence is passed to the voice output unit 120.
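The selection step can be sketched as a two-level lookup: the current scenario step and the estimated emotion pick a scenario, and the response pattern picks the sentence within it. The storage layout, the step name, the sentences, and `select_utterance` are assumptions for illustration; the recognition result is assumed to have already determined which scenario step applies.

```python
from typing import Dict, Tuple

# Hypothetical scenario store:
# (scenario step, emotion) -> {response pattern: utterance sentence}.
SCENARIOS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("3-1", "normal"): {
        "normal": (
            "We are sorry, the hotel is fully booked. "
            "Shall I search for hotels by your conditions?"
        ),
        "speedy": "Sorry, no rooms. Search other hotels?",
    },
}


def select_utterance(step: str, emotion: str, pattern: str) -> str:
    """Pick the sentence for the current step, emotion, and response pattern.

    Raises KeyError if no scenario or sentence is stored for the inputs.
    """
    by_pattern = SCENARIOS[(step, emotion)]
    return by_pattern[pattern]
```

Keeping the sentences for different patterns side by side in one scenario entry makes it easy to switch patterns mid-dialog without changing the scenario step.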
[0046]
<Voice output (S160)>
[0047]
The voice output unit 120 conveys the utterance sentence passed from the response processing unit 130 to the user by some means of delivering a message, such as reading it aloud using speech synthesis.
[0048]
Note that the processing from voice input (S120) to voice output (S160) is repeated until the voice response service ends.
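The repetition of steps S120 to S160 can be sketched as a loop over incoming voice inputs. In this sketch the four processing steps are passed in as plain functions and strings stand in for voice information; the function names and the end-of-service check are assumptions made for the example.

```python
from typing import Callable, Iterable, List


def run_dialog(
    inputs: Iterable[str],
    estimate_emotion: Callable[[str], str],
    recognize: Callable[[str], str],
    respond: Callable[[str, str], str],
    is_finished: Callable[[str], bool],
) -> List[str]:
    """Repeat input, emotion analysis, recognition, and response until done."""
    spoken: List[str] = []
    for voice_info in inputs:                    # S120: voice input
        emotion = estimate_emotion(voice_info)   # S130: emotion analysis
        text = recognize(voice_info)             # S140: voice recognition
        spoken.append(respond(emotion, text))    # S150, S160: create and output
        if is_finished(text):                    # S170: has the service ended?
            break
    return spoken
```

The loop returns every sentence the system spoke, which is convenient for building the response history after the call ends.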
[0049]
<Response history registration (S180)>
[0050]
When the voice response service between the user and the voice response system 10 ends (S170: Yes), the response processing unit 130 passes to (stores in) the response history storage unit 420 response history information that includes the user's situation (place of use, time of use, means of use, etc.), the user's manner of speaking (volume information, pitch information, utterance timing information, etc.), and the changes in the user's emotion during the voice response session.
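The response history information listed above could be held in a record like the following. The field names and the per-utterance layout are assumptions for illustration; the patent only names the categories of information.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ResponseHistory:
    """One call's history: situation, manner of speaking, emotion changes."""
    # Situation of the user during the call
    place: str
    time_of_use: str
    means: str
    # Manner of speaking, one entry per user utterance
    volume_levels: List[int] = field(default_factory=list)
    pitch_levels: List[int] = field(default_factory=list)
    # Emotion estimated for each utterance, in order
    emotions: List[str] = field(default_factory=list)

    def record(self, volume: int, pitch: int, emotion: str) -> None:
        """Append the measurements and estimated emotion for one utterance."""
        self.volume_levels.append(volume)
        self.pitch_levels.append(pitch)
        self.emotions.append(emotion)

    def emotion_change(self) -> Tuple[str, str]:
        """Return (first, last) estimated emotion; needs at least one entry."""
        return self.emotions[0], self.emotions[-1]
```

Comparing the first and last emotion gives a simple summary of how the user's mood shifted over the call, which the later user analysis can act on.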
[0051]
<User analysis (S190)>
[0052]
As shown in FIG. 4, when the response history information is passed to the response history storage unit 420, the user analysis unit 160 acquires the response history information (S200). The user analysis unit 160 also requests the user characteristic information (S210), and if user characteristic information for the user exists (S220: Yes), acquires it from the user characteristic information storage unit 430 (S230). Based on this information, the user characteristic information of the user is analyzed (or estimated) again (S240) and re-created (S250).
[0053]
On the other hand, when there is no user characteristic information for the user (S220: No), the user analysis unit 160 acquires general user characteristic information (S260) and newly creates the user characteristic information of the user.
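The branch between an existing user and a new user can be illustrated with a minimal Python sketch. The dictionary layout, the name `load_profile`, and the contents of `GENERAL_PROFILE` are assumptions made only for illustration; the specification does not define a data format.

```python
import copy

# Hypothetical general (default) user characteristic information.
# The emotion label and response pattern are taken from the embodiment;
# the dictionary layout itself is an assumption.
GENERAL_PROFILE = {"normal": "normal response"}


def load_profile(store, user_id):
    """Return (profile, is_new).

    If the store already holds a profile for the user, a copy of it is
    returned. Otherwise a copy of the general profile is returned so that
    a new profile can be created from it.
    """
    if user_id in store:
        return copy.deepcopy(store[user_id]), False
    return copy.deepcopy(GENERAL_PROFILE), True


store = {"user A": {"normal": "speedy response"}}
print(load_profile(store, "user A"))
print(load_profile(store, "user B"))
```

Copies are returned so that the later analysis step can modify a profile without altering the stored or general data until registration.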
[0054]
<User characteristic information registration (S200)>
[0055]
The user analysis unit 160 passes the user characteristic information created in S250 to the user characteristic information storage unit 430. Thereby, user characteristic information is added or updated (S260). The user characteristic information created here is used when the user uses the voice response service next time.
[0056]
As described above, in the present embodiment, the order of responses and the content of responses can be changed according to the situation where the user is placed and the emotion of the user at that time.
Therefore, it is possible to provide a voice response service that can be flexibly changed.
[0057]
(Example)
[0058]
Hereinafter, an example in which the present invention is applied to a voice response system for hotel reservation will be described.
[0059]
(Outline of voice response system)
[0060]
FIG. 5 is a diagram for explaining a schematic system configuration of a voice response system for hotel reservation (hereinafter simply referred to as a voice response system).
[0061]
The voice response system 10 is connected to a public telephone network 40, and when there is an incoming call from a user's telephone 30, it provides the user with a voice response service for hotel reservation (reading out guidance voice, etc.).
[0062]
This voice response service is performed in consideration of the current situation where the user is placed and the user's current emotion. In order to perform this voice response service, the voice response system 10 includes a user characteristic information storage unit 430 described later.
[0063]
Further, in order to further optimize the next voice response service for the same user, the stored contents of the user characteristic information storage unit 430 are updated.
[0064]
(Outline system configuration of voice response system)
[0065]
As shown in FIG. 6, the voice response system 10 includes a voice input unit 110, a voice output unit 120, a response processing unit 130, a user authentication unit 140, an emotion analysis unit 150, a user analysis unit 160, a scenario storage unit 410, a response history storage unit 420, a user characteristic information storage unit 430, and the like.
[0066]
The voice input unit 110 is for generating voice information from user voice (voice uttered by the user) input via the telephone 30.
[0067]
The voice output unit 120 is for reading out an utterance sentence (response sentence) or the like acquired (selected or generated) by the response processing unit 130 using an existing voice synthesis technique. The utterance sentence read out by the voice output unit 120 is notified to the user via the telephone 30.
[0068]
The response processing unit 130 is for performing various processes shown in FIG.
[0069]
The user authentication unit 140 is for identifying (or authenticating) the user and the situation in which the user is placed when a call is made from the telephone 30. This identification can be performed by, for example, collating the caller number notified by the caller number notification function with the user database.
[0070]
The emotion analysis unit 150 is for estimating the current emotion of the user.
[0071]
The user analysis unit 160 is for adding and updating the user characteristic information based on the user characteristic information, the voice information generated by the voice input unit 110, and the response history of the user from the start to the end of the voice response service.
[0072]
In the scenario storage unit 410, the service flow and utterance (scenario) of the voice response system 10 are created and stored for each scenario flow and user emotion. The scenario may be accumulated as a table for each user's emotion.
[0073]
The response history storage unit 420 is for storing a user's response history from the start to the end of the voice response service. The response history includes the situation in which the user is placed (use location, use time, use means, etc.), how the user speaks (volume information, pitch information, utterance timing information, etc.), changes in emotion, and timing information on the start and end of voice output by the voice response system 10 and of voice input by the user.
[0074]
The user characteristic information storage unit 430 is for storing user characteristic information (FIG. 10) including personal characteristic information in which a user's emotion is associated with a response pattern for each situation in which the user is placed.
[0075]
(Operation of voice response system)
[0076]
The operation of the voice response system configured as described above will be described with reference to the drawings.
FIG. 7 is a flowchart for explaining the operation of the voice response system according to this embodiment.
[0077]
<Service start (S300)>
[0078]
The voice response system 10 starts the following voice response service for hotel reservation when there is an incoming call from the telephone 30 of user A.
[0079]
<User authentication (S301)>
[0080]
When there is an incoming call from the telephone 30 of user A, the voice response system 10 uses the user authentication unit 140 to identify (recognize) user A and the situation in which user A is placed. The identification result is passed to the response processing unit 130 as user information ((1) in FIG. 6).
[0081]
Here, for user authentication, the voice response system 10 has the voice output unit 120 read out “Thank you for using our service. This is the hotel reservation system. May I have your name, please?” The utterance sentence read out is conveyed to user A via the telephone 30. When user A responds “user A”, the voice response system 10 identifies (or recognizes) the user name “user A” by executing voice recognition processing or the like. Further, the voice response system 10 identifies (or recognizes) the current situation in which user A is placed. This identification is performed, for example, by collating the caller telephone number with a user database (in which telephone numbers are associated with usage means, usage locations, etc.).
Here, it is assumed that the usage means “fixed phone” and the usage location “company” (the location from which the call is made) are identified (or recognized) as the current situation in which user A is placed.
[0082]
The user authentication unit 140 passes these identification results (“user A”, “company”, “fixed phone”) to the response processing unit 130 as user information (FIG. 8) ((1) in FIG. 6).
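As one possible illustration of collating a caller number with a user database, the following Python sketch looks up user information by caller number. The telephone number, the `UserInfo` fields, and the function name `identify_caller` are hypothetical; only the values “user A”, “company”, and “fixed phone” come from the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserInfo:
    user: str      # user name
    location: str  # usage location
    means: str     # usage means


# Hypothetical user database keyed by caller number (the number is made up).
USER_DB = {
    "0312345678": UserInfo("user A", "company", "fixed phone"),
}


def identify_caller(caller_number: str) -> Optional[UserInfo]:
    """Return the user information registered for the caller number,
    or None if the number is not in the database."""
    return USER_DB.get(caller_number)


print(identify_caller("0312345678"))
```

An unknown number yields `None`, in which case a system of this kind would presumably fall back to another means of identification, such as the spoken name.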
[0083]
<User characteristic information setting (S302)>
[0084]
Based on the user information, the response processing unit 130 acquires, from the user characteristic information storage unit 430 (FIG. 9), the personal characteristic information (FIG. 10) corresponding to the current situation (“company”, “fixed phone”) in which user A is placed ((2) in FIG. 6). This personal characteristic information is stored (or set) in a predetermined storage unit such as its own memory.
[0085]
The response processing unit 130 passes the user information to the emotion analysis unit 150 ((3) in FIG. 6). Based on the user information passed from the response processing unit 130, the emotion analysis unit 150 acquires, from the user characteristic information storage unit 430, the personal voice characteristic information (FIG. 11) corresponding to the current situation (“company”, “fixed phone”) ((4) in FIG. 6). This personal voice characteristic information is stored (or set) in a predetermined storage unit such as its own memory.
[0086]
<Voice input (S303)>
[0087]
When user A starts speaking, the voice response system 10 uses the voice input unit 110 to create voice information (FIG. 12) from the voice of user A input via the telephone 30. This voice information is passed to the response processing unit 130 and the emotion analysis unit 150 ((5) and (6) in FIG. 6).
[0088]
<Emotion analysis (S304)>
[0089]
The emotion analysis unit 150 extracts voice characteristic information (FIG. 13), here “volume level: 6” and “voice pitch: 8”, from the voice information passed from the voice input unit 110.
[0090]
Next, the emotion analysis unit 150 estimates the current emotion of user A. For this estimation, the personal voice characteristic information of user A (FIG. 11) and the current voice characteristic information of user A (“volume level: 6”, “voice pitch: 8”) are compared and analyzed. Referring to the personal voice characteristic information of user A (FIG. 11), “normal” corresponds to the current voice characteristic information of user A (“volume level: 6”, “voice pitch: 8”). Therefore, the emotion analysis unit 150 estimates (or analyzes) that the current emotion of user A is “normal”.
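The comparison described above could be sketched in Python as follows. The specification only states that the two sets of values are compared and analyzed; choosing the nearest table entry by squared distance is an assumption of this sketch, as are the table layout, the function name `estimate_emotion`, and the values given for “disgust”.

```python
# Hypothetical personal voice characteristic table for one situation:
# emotion -> (volume level, voice pitch).
PERSONAL_VOICE_TABLE = {
    "normal": (6, 8),    # values from the worked example in the text
    "disgust": (9, 10),  # made-up values, for illustration only
}


def estimate_emotion(volume, pitch, table=PERSONAL_VOICE_TABLE):
    """Return the emotion whose registered (volume, pitch) pair is closest
    to the current measurement. Nearest-match is an assumption; the text
    does not specify how the comparison is carried out."""
    def squared_distance(item):
        ref_volume, ref_pitch = item[1]
        return (ref_volume - volume) ** 2 + (ref_pitch - pitch) ** 2

    emotion, _ = min(table.items(), key=squared_distance)
    return emotion


print(estimate_emotion(6, 8))
```

Because the table is held per user and per situation, the same measured volume and pitch may map to different emotions for different users or for different situations.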
[0091]
The emotion analysis unit 150 also estimates the current response characteristic of user A. For this estimation, it is determined whether or not the utterance period of the voice response system 10 and the utterance period of user A overlap. Here, referring to the voice information of user A (FIG. 12), the utterance period of the voice response system 10 and the utterance period of user A do not overlap. Therefore, the emotion analysis unit 150 estimates (or analyzes) that the response characteristic of user A is “normal”.
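The overlap determination can be illustrated with a short Python sketch. Representing each utterance as a `(start, end)` pair in seconds and the function names are assumptions of this sketch. The label “impatient” is the one the embodiment uses for the overlapping case.

```python
def intervals_overlap(a, b):
    """True if the half-open intervals a = (start, end) and b = (start, end)
    share any time. Intervals that merely touch do not overlap."""
    return a[0] < b[1] and b[0] < a[1]


def response_characteristic(system_utterances, user_utterances):
    """Return "impatient" if any user utterance overlaps any system
    utterance (the user spoke during voice output), else "normal"."""
    for system_period in system_utterances:
        for user_period in user_utterances:
            if intervals_overlap(system_period, user_period):
                return "impatient"
    return "normal"


# The user starts speaking after the system has finished.
print(response_characteristic([(0.0, 4.0)], [(4.5, 7.0)]))
# The user starts speaking while the system is still talking.
print(response_characteristic([(0.0, 4.0)], [(3.0, 5.0)]))
```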
[0092]
The results estimated (or analyzed) as described above (“current emotion: normal”, “response characteristics: normal”) are passed to the response processing unit 130 as emotion information (FIG. 14) ((7) in FIG. 6).
[0093]
<Voice recognition>
[0094]
The response processing unit 130 performs voice recognition processing on the voice information passed from the voice input unit 110. Here, it is assumed that, as a result of the voice recognition, the response processing unit 130 identifies the words spoken by user A and acquires the identification information “I would like to reserve Hotel Onda on September 30” (FIG. 15).
[0095]
<Spoken sentence creation>
[0096]
Based on the emotion information “current emotion: normal” and the voice recognition result “identification information: I would like to reserve Hotel Onda on September 30”, the response processing unit 130 selects, from the scenario storage unit 410, the scenario for the emotion “normal” (FIG. 16) that matches the current scenario flow (in this case, step 3-1).
[0097]
Next, the response processing unit 130 acquires the response pattern corresponding to the current emotion (“normal”) of user A from the personal characteristic information corresponding to the current situation (“company”, “fixed phone”) in which user A is placed.
[0098]
Here, referring to the personal characteristic information corresponding to the current situation (“company”, “fixed phone”) in which user A is placed (acquired at (2) in FIG. 6 and stored in its own memory or the like; see FIG. 10), the response pattern corresponding to the current emotion (“normal”) of user A is “normal response”.
[0099]
Therefore, the response processing unit 130 acquires “normal response” as a response pattern corresponding to the current emotion (“normal”) of the user A.
[0100]
For this reason, the response processing unit 130 finally acquires (selects), from the selected scenario (FIG. 16), the utterance sentence corresponding to the response pattern “normal response”: “We are sorry. The rooms are fully booked.” This utterance sentence is passed to the voice output unit 120.
[0101]
On the other hand, suppose that, in the emotion analysis (S304), the utterance period of the voice response system 10 and the utterance period of user A overlap as shown in FIG. In this case, the emotion analysis unit 150 estimates (or analyzes) that the response characteristic of user A is “impatient”. When the response characteristic of user A is estimated (or analyzed) to be “impatient”, the response processing unit 130 finally selects, from the selected scenario (FIG. 16), the utterance sentence “I'm sorry. No, you are in a hurry?” (FIG. 18) corresponding to the response pattern “speedy response”, regardless of the response pattern “normal response”.
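The selection logic of this and the preceding paragraphs might be sketched in Python as below. The table layouts, the function names, and the placeholder utterance strings are assumptions; the actual contents of FIG. 10 and FIG. 16 are not reproduced here.

```python
# Hypothetical personal characteristic information:
# situation -> {emotion: response pattern}.
PERSONAL_CHARACTERISTICS = {
    ("company", "fixed phone"): {"normal": "normal response"},
}

# Hypothetical scenario table:
# (scenario step, emotion) -> {response pattern: utterance sentence}.
SCENARIO = {
    ("3-1", "normal"): {
        "normal response": "<utterance for normal response>",
        "speedy response": "<utterance for speedy response>",
    },
}


def choose_pattern(situation, emotion, response_characteristic):
    """An "impatient" response characteristic overrides the stored pattern
    with "speedy response"; otherwise the stored pattern is used."""
    if response_characteristic == "impatient":
        return "speedy response"
    return PERSONAL_CHARACTERISTICS[situation][emotion]


def choose_utterance(step, situation, emotion, response_characteristic):
    """Pick the utterance for the current scenario step and emotion that
    matches the chosen response pattern."""
    pattern = choose_pattern(situation, emotion, response_characteristic)
    return SCENARIO[(step, emotion)][pattern]


situation = ("company", "fixed phone")
print(choose_utterance("3-1", situation, "normal", "normal"))
print(choose_utterance("3-1", situation, "normal", "impatient"))
```

Keeping the override in `choose_pattern` separates the per-call observation (the user is interrupting) from the stored per-user preference, which is only revised later by the user analysis.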
[0102]
<Audio output (S307)>
[0103]
The voice output unit 120 reads out the utterance sentence passed from the response processing unit 130 using a voice synthesis technique. The utterance sentence read out is conveyed to user A via the telephone 30.
[0104]
The processing from the voice input (S303) to the voice output (S307) is repeated until the voice response service is completed.
[0105]
<Response history registration (S309)>
[0106]
When the voice response service of the voice response system 10 for user A ends, the response processing unit 130 passes to the response history storage unit 420 response history information (FIG. 19) including the situation in which user A is placed (use location, use time, use means, etc.), how the user speaks (volume information, pitch information, utterance timing information, etc.), changes in the user's emotion during the voice response, and timing information on the start and end of voice output by the voice response system 10 and of voice input by user A.
[0107]
<User analysis (S310)>
[0108]
When the response history information is passed to the response history storage unit 420, the user analysis unit 160 acquires the response history information. In addition, if user characteristic information for user A exists in the user characteristic information storage unit 430, the user analysis unit 160 acquires it (FIG. 9).
[0109]
Next, the user analysis unit 160 updates the user characteristic information of user A. For this update, the user analysis unit 160 compares and analyzes the response history information and the user characteristic information. Referring to the response history information of user A (FIG. 19), the emotion of user A is “normal” at first but is “disgust” at the end of the voice response service. It can be inferred that something during the voice response service displeased user A. Further, comparing the timing of voice output and voice input shows that voice input is performed during voice output from the start to the end of the voice response service.
This suggests that user A felt disgust because the voice output messages were long. That is, when the emotion is “normal”, the response pattern desired by user A can be understood to be “speedy response” (rule 2 described later).
[0110]
Therefore, the user analysis unit 160 corrects (recreates) the response pattern “normal response” in the user characteristic information (FIG. 10) of the user A to “speedy response” (FIG. 20).
[0111]
On the other hand, when there is no user characteristic information for user A in the user characteristic information storage unit 430, the user analysis unit 160 acquires general user characteristic information (FIG. 21), applies the same correction as above, and newly creates user characteristic information for user A.
[0112]
FIG. 22 shows an example of a table used by the user analysis unit 160 for analysis. This is a knowledge base that collects rules in “IF ... THEN” format. The figure illustrates three rules. A line starting with # is a comment.
[0113]
Rule 1 is a rule applied to a new user. That is, when the number of uses is 0 (in the above description, when there is no user characteristic information of the user A in the user characteristic information storage unit 430), various parameters such as a volume level are set.
[0114]
Rule 2 is a rule that, when the user's emotion changes from “normal” to “disgust”, the user responds during voice output, and the response pattern is “normal response”, corrects the response pattern “normal response” to “speedy response”. This has already been explained.
[0115]
Rule 3 is a rule for performing the reverse correction of rule 2. That is, if there is no change in emotion, the user does not respond during voice output, and the response pattern is “normal response”, the response pattern “normal response” is corrected to “polite response”.
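Rules 2 and 3 can be expressed as a small Python sketch. The `SessionSummary` fields and the function name `revise_pattern` are hypothetical, the pattern names follow the “normal / speedy / polite response” wording, and rule 1 (parameter initialization for new users) is omitted because its parameters are not detailed in the text.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    first_emotion: str  # emotion at the start of the service
    last_emotion: str   # emotion at the end of the service
    barged_in: bool     # True if the user responded during voice output
    pattern: str        # response pattern used in this session


def revise_pattern(session: SessionSummary) -> str:
    """Apply rules 2 and 3 to decide the response pattern for next time."""
    if session.pattern != "normal response":
        return session.pattern
    # Rule 2: normal -> disgust, with responses during voice output.
    if (session.first_emotion == "normal"
            and session.last_emotion == "disgust"
            and session.barged_in):
        return "speedy response"
    # Rule 3: no emotion change and no responses during voice output.
    if session.first_emotion == session.last_emotion and not session.barged_in:
        return "polite response"
    return session.pattern


print(revise_pattern(SessionSummary("normal", "disgust", True, "normal response")))
print(revise_pattern(SessionSummary("normal", "normal", False, "normal response")))
```

Sessions that match neither rule leave the pattern unchanged, which keeps the stored characteristics stable when the evidence is mixed.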
[0116]
<User characteristic information registration (S311)>
[0117]
The user analysis unit 160 registers the user characteristic information created in S310 in the user characteristic information storage unit 430, adding or updating it as appropriate. This added or updated user characteristic information is used when the user uses the voice response service next time.
[0118]
In other words, the next time the situation in which user A is placed is “company” and “fixed phone” and the emotion of user A is “normal”, the response is made with the corrected response pattern “speedy response”.
[0119]
As a result, the next voice response service for the same user A can be further optimized.
[0120]
Further, the voice response service by the voice response system 10 can be performed in consideration of the current situation where the user A is placed and the current emotion of the user A.
Note that the user characteristic information (FIG. 9) described in the present embodiment is merely an example; instead, for example, user characteristic information defined as shown in FIG. 23 may be used. In this user characteristic information, the volume level and the voice pitch are each defined as a range (for example, 0 to 4 or 0 to 7). This makes it possible to estimate the user's emotion more appropriately.
When the user characteristic information of FIG. 23 is used, it is corrected (re-created) as shown in FIG. 24, for example. In this case, for example, the general user characteristic information shown in FIG. 25 is used.
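A range-based definition of this kind could be matched as in the following Python sketch. The specific ranges, the emotion assigned to each range, and the function name `estimate_emotion_by_range` are assumptions for illustration; the actual values of the figures are not reproduced.

```python
# Hypothetical range-based voice characteristic table:
# emotion -> ((min volume, max volume), (min pitch, max pitch)),
# with inclusive bounds.
RANGE_TABLE = {
    "normal": ((0, 4), (0, 7)),
    "disgust": ((5, 10), (8, 10)),
}


def estimate_emotion_by_range(volume, pitch, table=RANGE_TABLE):
    """Return the first emotion whose volume and pitch ranges both contain
    the measured values, or None if no entry matches."""
    for emotion, ((v_lo, v_hi), (p_lo, p_hi)) in table.items():
        if v_lo <= volume <= v_hi and p_lo <= pitch <= p_hi:
            return emotion
    return None


print(estimate_emotion_by_range(3, 5))
print(estimate_emotion_by_range(6, 9))
```

Compared with single reference values, ranges tolerate small call-to-call variation in the same user's voice, which is presumably why the text describes them as giving a more appropriate estimate.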
[0121]
The present invention can be implemented in various other forms without departing from its spirit or main features. The above embodiment is therefore merely an illustration in all respects and should not be interpreted restrictively.
[0122]
The present invention can also be specified as follows.
(Supplementary note 1) A voice response system that provides a voice response service to a user, comprising: a voice input unit that generates voice information from user voice; a situation specifying unit that specifies the current situation in which the user is placed; an emotion analysis unit that estimates the user's current emotion; a voice recognition unit that performs voice recognition processing on the voice information generated by the voice input unit; a storage unit that stores user characteristic information including personal characteristic information in which a user's emotion and a response pattern are associated with each other for each situation in which the user is placed; a response pattern determination unit that acquires, from the personal characteristic information, a response pattern corresponding to the current situation of the user specified by the situation specifying unit and the current emotion of the user estimated by the emotion analysis unit; a response processing unit that acquires an utterance sentence corresponding to the response pattern acquired by the response pattern determination unit and the recognition result of the voice recognition unit; and a voice output unit that reads out the utterance sentence acquired by the response processing unit.
(Supplementary note 2) The voice response system according to supplementary note 1, further comprising an update unit that updates the correspondence between the user's emotion and the response pattern.
(Supplementary note 3) The voice response system according to supplementary note 1, wherein the emotion analysis unit estimates a user's current emotion by performing comparative analysis using personal voice characteristic information stored in advance.
(Supplementary note 4) The voice response system according to supplementary note 3, further comprising a user analysis unit that adds and updates user characteristic information based on the user characteristic information, the voice information generated by the voice input unit, and the user's response history from the start to the end of the voice response service.
(Supplementary note 5) The voice response system according to supplementary note 4, wherein the response history includes timing information on the start and end of voice output by the voice response service and of voice input by the user, and the user characteristic information includes timing information on the start and end of voice output by the voice response service and of voice input by the user from when the user used the voice response service in the past.
(Supplementary note 6) A voice response method performed by a voice response system that provides a voice response service to a user, comprising: a step of generating voice information from user voice; a step of specifying the current situation in which the user is placed; a step of estimating the user's current emotion; a step of performing voice recognition processing on the generated voice information; a step of acquiring a response pattern corresponding to the specified current situation of the user and the estimated current emotion of the user from personal characteristic information in which a user's emotion and a response pattern are associated with each other for each situation in which the user is placed; a step of acquiring an utterance sentence corresponding to the acquired response pattern and the voice recognition result; and a step of reading out the acquired utterance sentence.
(Supplementary note 7) A method of providing a voice response service in which a voice response system that provides a voice response service to a user responds according to the user's emotion estimated from the user's voice information, wherein the response order and response content for the user are changed based on user characteristic information estimated from the response history, which includes changes in the user's emotion from the start to the end of the voice response service, and from the situation in which the user is placed, and on emotion information obtained, based on the user characteristic information, in consideration of the situation in which the user is placed.
(Supplementary note 8) A voice response system that provides a voice response service to a user, comprising: a voice input unit that generates voice information from the user's voice; a user analysis unit that estimates user characteristic information of the user and generates the user characteristic information; an emotion analysis unit that estimates the user's emotion and generates emotion information; a scenario storage unit that stores sentences required to create utterance sentences; a response processing unit that creates an utterance sentence from the scenario storage unit based on the voice information, the user characteristic information, and the emotion information; and a voice output unit that reads out the utterance sentence.
[0123]
[Effect of the invention]
As described above, according to the present invention, since it is possible to respond according to emotion information obtained by taking into account user characteristic information and characteristic information, the following effects are obtained.
[0124]
First, it is possible to grasp the situation where the user is placed and respond flexibly according to the situation where the user is placed. Second, it is possible to grasp the user's emotions without being influenced by the user's way of speaking and to respond flexibly according to the emotions.
[0125]
It is also possible to respond flexibly according to the situation in which the user is placed and the emotion of the user at that time. In addition, the user's emotion can be estimated more accurately by taking into account changes in user characteristics that depend on the situation in which the user is placed. Furthermore, it is possible to provide a voice response service that is considerate of the user.
These effects lead to improved customer satisfaction with the service, the acquisition of repeat customers, and an increase in service users (customers).
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the principle of a voice response system according to an embodiment.
FIG. 2 is a flowchart for explaining the operation of the voice response system according to the embodiment.
FIG. 3 is an operation principle diagram of the voice response system according to the present embodiment.
FIG. 4 is a flowchart for explaining an operation of a user analysis unit.
FIG. 5 is a diagram for explaining a schematic system configuration of a voice response system for hotel reservation.
FIG. 6 is a diagram for explaining a schematic system configuration of a voice response system for hotel reservation.
FIG. 7 is a flowchart for explaining the operation of the hotel reservation voice response system.
FIG. 8 is an example of user information.
FIG. 9 is an example of user characteristic information.
FIG. 10 is an example of personal characteristic information.
FIG. 11 is an example of personal voice characteristic information.
FIG. 12 is an example of voice information.
FIG. 13 is an example of voice characteristic information.
FIG. 14 is an example of emotion information.
FIG. 15 is an example of identification information.
FIG. 16 is an example of a scenario.
FIG. 17 is an example of voice information.
FIG. 18 is an example of an utterance sentence.
FIG. 19 is an example of response history information.
FIG. 20 is an example of re-created user characteristic information.
FIG. 21 is an example of general user characteristic information.
FIG. 22 is an example of a table used by the user analysis unit for analysis.
FIG. 23 is another example of user characteristic information.
FIG. 24 is another example of re-created user characteristic information.
FIG. 25 is another example of general user characteristic information.
FIG. 26 is a diagram for explaining the prior art.
[Explanation of symbols]
10 Voice response system
30 Telephone
40 Public telephone network
110 Voice input unit
120 Voice output unit
130 Response processing unit
140 User authentication unit
150 Emotion analysis unit
160 User analysis unit
410 Scenario storage unit
420 Response history storage unit
430 User characteristic information storage unit

Claims

A voice response system that provides a voice response service to a user, comprising:
a voice input unit that generates voice information from user voice;
a situation specifying unit that specifies the current situation in which the user is placed;
an emotion analysis unit that estimates the user's current emotion;
a voice recognition unit that performs voice recognition processing on the voice information generated by the voice input unit;
a storage unit that stores user characteristic information including personal characteristic information in which a user's emotion and a response pattern are associated with each other for each situation in which the user is placed;
a response pattern determination unit that acquires, from the personal characteristic information, a response pattern corresponding to the current situation of the user specified by the situation specifying unit and the current emotion of the user estimated by the emotion analysis unit;
a response processing unit that acquires an utterance sentence corresponding to the response pattern acquired by the response pattern determination unit and the recognition result of the voice recognition unit; and
a voice output unit that reads out the utterance sentence acquired by the response processing unit.

The voice response system according to claim 1, further comprising an update unit that updates a correspondence relationship between the user's emotion and a response pattern.

The voice response system according to claim 1, wherein the emotion analysis unit estimates a user's current emotion by performing comparative analysis using personal voice characteristic information stored in advance.

The voice response system according to claim 3, further comprising a user analysis unit that adds and updates user characteristic information based on the user characteristic information, the voice information generated by the voice input unit, and the response history of the user from the start to the end of the voice response service.

The voice response system according to claim 4, wherein the response history includes timing information on the start and end of voice output by the voice response service and of voice input by the user, and
the user characteristic information includes timing information on the start and end of voice output by the voice response service and of voice input by the user from when the user used the voice response service in the past.