
JP2020064168A - Guidance robot system and guidance method

Info

Publication number: JP2020064168A
Application number: JP2018195515A
Authority: JP
Inventors: 晋資大竹; Shinsuke Otake; 弘光本橋; Hiromitsu Motohashi; 安司高野; Yasushi Takano
Original assignee: Hitachi Building Systems Co Ltd
Current assignee: Hitachi Building Systems Co Ltd
Priority date: 2018-10-17
Filing date: 2018-10-17
Publication date: 2020-04-23
Anticipated expiration: 2038-10-17
Also published as: JP7117970B2; CN111055291B; CN111055291A

Abstract

PROBLEM TO BE SOLVED: To provide a technique for changing the language used for a guidance service to the language used by a user, based on the content of the user's utterance. SOLUTION: A guidance robot system that uses conversation in a plurality of languages includes a voice acquisition unit that acquires voice, a voice recognition unit that performs voice recognition in the plurality of languages on the acquired voice, and a reliability calculation unit that calculates a reliability for each of the plurality of languages for the acquired voice. The system further includes a keyword matching unit that collates the voice recognition results of the plurality of languages with keywords registered in advance to obtain the matching language, a language selection unit that identifies the language of the voice acquired by the voice acquisition unit based on the reliabilities of the plurality of languages, and a conversation processing unit that switches the conversation content based on the reliability. SELECTED DRAWING: FIG. 6

Description

The present invention relates to a guidance robot system and a guidance method.

Conventionally, when a robot provides a guidance service, the guidance robot speaks and recognizes the language that has been set in it. A user who converses in a language different from the set language has therefore had difficulty using the robot's guidance service.

As a general language switching method, a technique has been proposed in which a device such as an electronic dictionary receives voice input, recognizes the input voice and converts it into text, collates the result with text registered in advance for each language, and switches to the matching language (see, for example, Patent Document 1).
As a language switching method for robots, a technique has been proposed in which voice recognition units supporting a plurality of languages recognize the voice, a reliability is calculated for each recognition result, and the robot switches to the language with the highest reliability (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. 2001-282788
Patent Document 2: Japanese Patent Laid-Open No. 2018-087945

However, with the technique described in Patent Document 1, the robot can switch languages only on the basis of data registered in advance. When the user utters content that is not registered, the language cannot be switched, and a user who converses in a different language has difficulty using the robot's guidance service.

The technique described in Patent Document 2 has the problem that, when the voice recognition reliabilities of all of the languages are low, the language is more likely to be identified incorrectly. That is, the reliability of voice recognition drops when there is a lot of ambient noise, when the utterance volume is low, or when the utterance is unclear.

An object of the present invention is to provide a guidance robot system and a guidance method that, even when there is a lot of ambient noise or the like, can collate the content of the user's utterance with data registered in advance and accurately switch the language used for the guidance service to the language used by the user.

To solve the above problems, the configurations described in the claims, for example, are adopted. The present application includes a plurality of means for solving the above problems. As one example, the guidance robot system of the present invention is a guidance robot system that provides a guidance service using conversation in a plurality of languages, and includes a voice acquisition unit that acquires voice, a voice recognition unit that performs voice recognition in the plurality of languages on the voice acquired by the voice acquisition unit, and a reliability calculation unit that calculates a reliability for each of the plurality of languages for the voice acquired by the voice acquisition unit.
The system further includes a keyword matching unit that collates the voice recognition results of the plurality of languages obtained by the voice recognition unit with keywords registered in advance to obtain the matching language, a language selection unit that identifies the language of the voice acquired by the voice acquisition unit based on the reliabilities of the plurality of languages obtained by the reliability calculation unit, and a conversation processing unit that switches the conversation content based on the reliability obtained by the reliability calculation unit.

According to the present invention, even when the voice recognition reliabilities of all of the languages are below the threshold, the language can be switched smoothly on the basis of a match with data registered in advance.
Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiment.

FIG. 1 is a configuration diagram of the entire guidance robot system according to the first embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the robot used in the first embodiment of the present invention.
FIG. 3 is a diagram showing a configuration example of the robot management server used in the first embodiment of the present invention.
FIG. 4 is a diagram showing a configuration example of the robot control device used in the first embodiment of the present invention.
FIG. 5 is a diagram showing an example of the conversation function including language selection in the first embodiment of the present invention.
FIG. 6 is an example of a flowchart illustrating the process of switching the language and carrying out a conversation in the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of the keyword table used in the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of the closed question conversation table used in the first embodiment of the present invention.
FIG. 9 is a diagram showing an example of the open question conversation table used in the first embodiment of the present invention.

<Overall configuration of guidance robot system>
Hereinafter, a guidance robot system according to an embodiment of the present invention (hereinafter referred to as "this example") and its language selection method will be described with reference to the drawings.
FIG. 1 is a diagram showing a configuration example of the entire guidance robot system. The guidance robot system 1 includes a robot 100, a robot control device 200, and a robot management server 300 connected to the robot control device 200 via a network.

The guidance robot system 1 of this example is a system in which the robot 100 provides a guidance service using a plurality of languages. The robot 100 and the robot control device 200 are connected by wireless communication and are located on the premises of the building 2 where the guidance service is provided. The robot 100 receives control commands from the robot control device 200 and provides a guidance service that introduces users to the facilities in the building 2, the locations of tenants, the products and services offered by tenants, the facilities around the building 2, and the like.

FIG. 1 shows an example in which one robot control device 200 controls one robot 100 for the building 2, but one robot control device 200 may control a plurality of robots 100, and a plurality of robot control devices 200 may be installed inside the building 2.
When a plurality of robots 100 are placed in the building 2, each robot 100 may provide a different guidance service.

The robot control device 200 is connected to the robot management server 300 via the network 3. In FIG. 1, only the robot control device 200 of one building 2 is connected to the robot management server 300, but robot control devices 200 located in a plurality of buildings 2 may be connected to the robot management server 300.

The robot management server 300 manages which robot 100 is placed in which building 2, and also manages the state of each robot 100, such as whether it is operating normally and whether it needs maintenance. Because the robot management server 300 manages the robots 100 in this way, an administrator can respond promptly when, for example, a robot 100 needs maintenance.

<Robot configuration example>
FIG. 2 is a diagram showing a configuration example of the robot 100. The robot 100 includes a CPU (Central Processing Unit) 110, a storage device 120, an input/output device 130, and a communication interface 140.
The CPU 110 controls the processing of each unit of the robot 100. The storage device 120 stores various software modules and data.

The storage device 120 includes a drive control unit 121 that controls the drive mechanism, a conversation control unit 122 that controls conversation, and an input/output unit 123 that exchanges data with the input/output device 130.
The input/output device 130 includes a camera 131 that captures video and still images of the surroundings and a microphone 132 that picks up ambient sound. The input/output device 130 also includes a gyro sensor 133 that detects the posture of the robot 100, such as its tilt and rotation, a range sensor 134 that measures the distance to surrounding objects, a speaker 135 that emits sound, and a drive mechanism 136 that moves the robot 100 and actuates its joints.

The communication interface 140 acquires video from the camera 131 and audio from the microphone 132 of the input/output device 130 and transmits them to the robot control device 200. The communication interface 140 also receives control commands from the robot control device 200.
Based on the control commands from the robot control device 200 received by the communication interface 140, the robot 100 controls the drive control unit 121, the conversation control unit 122, and the input/output unit 123 to provide the guidance service.

When the robot 100 receives a movement instruction from the robot control device 200, it moves within the building 2 by means of the drive mechanism 136. While moving, the robot 100 detects obstacles based on the signal from the range sensor 134, and the drive control unit 121 autonomously stops the movement or avoids the obstacle.

<Management server configuration example>
FIG. 3 is a diagram showing a configuration example of the robot management server 300. The robot management server 300 includes a CPU 310, a storage device 320 containing a robot placement management unit 321, and a communication interface 330.
The robot management server 300 is connected to the robot control device 200 via the communication interface 330, and the robot placement management unit 321 manages the state of each robot through the robot control device 200.

<Robot control device configuration example>
FIG. 4 is a diagram showing a configuration example of the robot control device 200. The robot control device 200 includes a CPU 210 that controls the processing of each unit, a storage device 220 that stores software modules and data such as tables, and a communication interface 211 that communicates with the robot 100 and the robot management server 300.

The CPU 210 executes various control functions by reading out the programs stored in the storage device 220. That is, by reading out these programs, the CPU 210 implements the functions shown as the input/output data processing unit 230, the service flow processing unit 240, the face-to-face detection unit 250, the voice processing unit 260, the language selection unit 270, the conversation processing unit 280, and the movement instruction unit 290.

The input/output data processing unit 230 includes a voice acquisition unit 231, a voice output unit 232, an image acquisition unit 233, a motion output unit 234, a range data acquisition unit 235, and an error input/output unit 236.
The input/output data processing unit 230 processes the data received from the robot 100 and the data to be transmitted to the robot 100 and the robot management server 300.

The voice acquisition unit 231 processes the voice data received from the robot 100, and the voice output unit 232 processes the voice data transmitted to the robot 100 to make it speak.
The image acquisition unit 233 processes the image data received from the robot 100, and the motion output unit 234 outputs the data used to operate the robot 100.
The range data acquisition unit 235 processes the range sensor output received from the robot 100, and the error input/output unit 236 processes the error log data to be transmitted to the robot management server 300.

The service flow processing unit 240 executes the guidance service based on the service flow described later with reference to FIG. 6.
The face-to-face detection unit 250 detects that a person is facing the robot 100. That is, the face-to-face detection unit 250 acquires the image information and obstacle information obtained from the camera 131 and the range sensor 134 of the robot 100 and, based on this information, detects whether or not the robot 100 is face-to-face with a user.

In the guidance robot system of this example, the guidance service is provided while the robot 100 is face-to-face with the user, and it is stopped when the user moves away from the robot 100 and the face-to-face state ends. If the robot 100 starts the guidance service in the wrong language, the user will move away from the front of the robot 100 and the face-to-face state will end, so the guidance service by the robot 100 is stopped.

<Example of conversation function including language selection>
FIG. 5 is a diagram showing an example of the conversation function including language selection in the guidance robot system of this example, and shows the functions of the voice processing unit 260, the language selection unit 270, and the conversation processing unit 280.
The voice processing unit 260 includes a first language voice processing unit 261, a second language voice processing unit 262, and a third language voice processing unit 263.

The first language voice processing unit 261 processes, for example, Japanese, and includes a first language voice recognition unit 2611 and a first language reliability calculation unit 2612.
The second language voice processing unit 262 processes, for example, English, and includes a second language voice recognition unit 2621 and a second language reliability calculation unit 2622.
The third language voice processing unit 263 processes, for example, Chinese, and includes a third language voice recognition unit 2631 and a third language reliability calculation unit 2632.

The voice data acquired from the robot 100 by the voice acquisition unit 231 of the robot control device 200 (FIG. 4) is supplied to the first language voice processing unit 261, the second language voice processing unit 262, and the third language voice processing unit 263, and the three languages are processed in parallel.

In the first language voice processing unit 261, the first language voice recognition unit 2611 recognizes the voice data and converts it into text in the first language, Japanese, and the first language reliability calculation unit 2612 calculates the reliability of the result.
In the second language voice processing unit 262, the second language voice recognition unit 2621 recognizes the voice data and converts it into text in the second language, English, and the second language reliability calculation unit 2622 calculates the reliability of the result.
In the third language voice processing unit 263, the third language voice recognition unit 2631 recognizes the voice data and converts it into text in the third language, Chinese, and the third language reliability calculation unit 2632 calculates the reliability of the result.
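The parallel per-language processing described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Recognizer` callable, the `RecognitionResult` type, the language codes, and the stub recognizers are hypothetical names introduced only for illustration, and a thread pool is just one possible way to run the three recognizers side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class RecognitionResult:
    language: str       # hypothetical language code such as "ja", "en", "zh"
    text: str           # recognized text in that language
    reliability: float  # reliability in the range 0.0 to 1.0


# Hypothetical interface: a recognizer takes raw audio and returns (text, reliability).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_all(audio: bytes,
                  recognizers: Dict[str, Recognizer]) -> Dict[str, RecognitionResult]:
    """Feed the same audio to every language recognizer in parallel."""
    results: Dict[str, RecognitionResult] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(recognizers))) as pool:
        futures = {lang: pool.submit(rec, audio) for lang, rec in recognizers.items()}
        for lang, future in futures.items():
            text, reliability = future.result()
            results[lang] = RecognitionResult(lang, text, reliability)
    return results


def make_stub(text: str, reliability: float) -> Recognizer:
    """Build a stand-in recognizer that ignores the audio (for demonstration only)."""
    return lambda audio: (text, reliability)


if __name__ == "__main__":
    stubs = {
        "ja": make_stub("トイレはどこですか", 0.9),
        "en": make_stub("toy ray", 0.2),
        "zh": make_stub("", 0.1),
    }
    for result in recognize_all(b"\x00\x01", stubs).values():
        print(result)
```

Each language keeps its own text and reliability, so later stages can use the texts for keyword matching and the reliabilities for comparison independently.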

The reliability is a numerical value between 0 and 1, where "0" indicates the lowest degree of matching and "1" the highest. For example, if the user speaks Japanese, the reliability calculated by the first language reliability calculation unit 2612 is close to "1", and the reliabilities calculated by the second language reliability calculation unit 2622, which processes English, and the third language reliability calculation unit 2632, which processes Chinese, are close to "0". In practice, however, the user's speech often cannot be clearly recognized as one specific language such as Japanese, English, or Chinese, and the reliability often takes an intermediate value between 0 and 1.

Reliability measures for voice recognition results have been studied as an utterance verification problem, that is, as post-processing of voice recognition that decides whether to accept or reject a recognition result. Because the recognizer outputs the word sequence with the highest likelihood for the input voice, a threshold is needed as a criterion for distinguishing correct recognition results from recognition errors. For example, when the reliability is expressed in the range of 0 to 1, the threshold may be set to a value midway between 0 and 1, such as 0.5.
Several methods of calculating the reliability are conceivable. One known example is Komatani and Kawahara, "Dialogue processing that performs efficient confirmation and guidance using the reliability of speech recognition results" (IPSJ Journal, Vol. 43, No. 10, pp. 3078-3086).

The language selection unit 270 includes a keyword matching unit 271, a reliability comparison unit 272, a selected language storage unit 273, and a keyword table 274.
The keyword matching unit 271 collates the text of the voice recognition result for each language with the keywords of each language registered in the keyword table 274, and obtains the matching keyword and its language. The reliability comparison unit 272 compares the reliabilities of the languages and obtains the language with the highest reliability. The selected language storage unit 273 stores the language whose keyword matched as a result of the collation by the keyword matching unit 271, and also stores the language with the highest reliability obtained by the reliability comparison unit 272.
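The two lookups performed by the language selection unit can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's method: the contents of `KEYWORD_TABLE` are invented placeholders (the actual keyword table of FIG. 7 is not reproduced here), and case-insensitive substring matching is an assumption, since the text only says that recognition results are collated with registered keywords.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword table: language code -> keywords registered for that language.
KEYWORD_TABLE: Dict[str, List[str]] = {
    "ja": ["こんにちは", "すみません"],
    "en": ["hello", "excuse me"],
    "zh": ["你好", "请问"],
}


def match_keyword(texts: Dict[str, str],
                  table: Dict[str, List[str]]) -> Optional[Tuple[str, str]]:
    """Return (language, keyword) for the first registered keyword found, else None.

    Each language's recognition text is checked only against that language's keywords.
    Substring matching is an assumption made for this sketch.
    """
    for language, text in texts.items():
        normalized = text.strip().lower()
        for keyword in table.get(language, []):
            if keyword.lower() in normalized:
                return language, keyword
    return None


def most_reliable_language(reliabilities: Dict[str, float]) -> str:
    """Return the language whose recognition result has the highest reliability."""
    if not reliabilities:
        raise ValueError("at least one language reliability is required")
    return max(reliabilities, key=lambda language: reliabilities[language])


if __name__ == "__main__":
    texts = {"ja": "", "en": "Hello there", "zh": ""}
    print(match_keyword(texts, KEYWORD_TABLE))
    print(most_reliable_language({"ja": 0.2, "en": 0.45, "zh": 0.3}))
```

Keeping the keyword lookup separate from the reliability comparison mirrors the split between the keyword matching unit 271 and the reliability comparison unit 272.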

The conversation processing unit 280 includes a first language conversation creation unit 281, a second language conversation creation unit 282, a third language conversation creation unit 283, a closed question conversation table 284, and an open question conversation table 285.
Here, a closed-question conversation is a conversation format that asks questions with a limited range of answers, and an open-question conversation is a conversation format that asks questions the other party can answer freely, with no constraints on the answer. By conducting a closed-question conversation, the user is expected to speak in the language stored in the selected language storage unit 273 of the language selection unit 270.

The first language conversation creation unit 281 creates a system utterance in the first language (for example, Japanese) for the input text, based on the closed question conversation table 284 or the open question conversation table 285. A system utterance is the speech produced by the robot.

The second language conversation creation unit 282 creates a system utterance in the second language (for example, English) for the input text, based on the closed question conversation table 284 or the open question conversation table 285.
The third language conversation creation unit 283 creates a system utterance in the third language (for example, Chinese) for the input text, based on the closed question conversation table 284 or the open question conversation table 285.

As described later with reference to FIG. 8, the closed question conversation table 284 registers a system utterance for each selected language. That is, the closed question conversation table 284 registers, as system utterances, questions that limit the user's response (closed questions).

As described later with reference to FIG. 9, the open question conversation table 285 registers a system utterance and a conversation end flag for each combination of selected language and user utterance. That is, the open question conversation table 285 registers, as system utterances, questions that do not limit the user's response (open questions). The conversation end flag indicates whether the conversation continues or ends: FALSE means the conversation continues, and TRUE means it ends. The conversation end flags for the first open question are all FALSE, because the conversation needs to continue.
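One possible in-memory shape for the two conversation tables is sketched below. Every entry is an invented placeholder, since the actual contents of FIG. 8 and FIG. 9 are not reproduced here. The names `CLOSED_QUESTIONS`, `OPEN_QUESTIONS`, `OpenEntry`, and `lookup_open` are hypothetical, and using an empty user utterance as the key for the first open question is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Closed question table: selected language -> system utterance (placeholder entries).
CLOSED_QUESTIONS: Dict[str, str] = {
    "ja": "日本語でご案内しますか？",
    "en": "Would you like guidance in English?",
    "zh": "需要中文导览吗？",
}


@dataclass(frozen=True)
class OpenEntry:
    system_utterance: str
    end_conversation: bool  # False: continue the conversation, True: end it


# Open question table: (selected language, user utterance) -> entry (placeholder entries).
# The empty user utterance stands for the first question, whose flag is always False.
OPEN_QUESTIONS: Dict[Tuple[str, str], OpenEntry] = {
    ("en", ""): OpenEntry("How may I help you?", False),
    ("en", "thank you"): OpenEntry("You are welcome. Goodbye.", True),
    ("ja", ""): OpenEntry("ご用件をお話しください。", False),
}


def lookup_open(language: str, user_utterance: str) -> Optional[OpenEntry]:
    """Look up the system utterance and end flag; None if nothing is registered."""
    key = (language, user_utterance.strip().lower())
    return OPEN_QUESTIONS.get(key)


if __name__ == "__main__":
    print(CLOSED_QUESTIONS["en"])
    print(lookup_open("en", ""))
    print(lookup_open("en", "Thank you"))
```

Storing the end flag next to the utterance lets the caller decide whether to keep listening after each response without a separate lookup.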

The guidance robot system of this example supports three languages, Japanese, English, and Chinese, but it can also be configured to support two languages, or four or more languages.
The voice output unit 232 transmits the system utterance created by the conversation processing unit 280 to the robot 100 and causes the robot 100 to speak in the first, second, or third language.

<Flowchart of conversation including language selection>
FIG. 6 is a flowchart showing an example of the conversation process including language selection for users of an international airport, among the processes of the guidance robot system 1 of this example. The conversation process starts when a user faces the robot 100 and the robot control device 200 detects the face-to-face state.

First, in the robot control device 200, the voice acquisition unit 231 waits to acquire voice from the robot 100 (S1). When voice is acquired in step S1 (YES in S1), the first language voice recognition unit 2611 of the first language voice processing unit 261 performs voice recognition in the first language (for example, Japanese) (S2), and the first language reliability calculation unit 2612 calculates the reliability for the first language (S3). If the robot control device 200 recognizes that the user is speaking the first language (Japanese), the reliability calculated by the first language reliability calculation unit 2612 is a high value close to "1".

At the same time, the second language voice recognition unit 2621 of the second language voice processing unit 262 performs voice recognition in the second language (for example, English) (S4), and the second language reliability calculation unit 2622 calculates the reliability for the second language (S5). Similarly, the third language voice recognition unit 2631 of the third language voice processing unit 263 performs voice recognition in the third language (for example, Chinese) (S6), and the third language reliability calculation unit 2632 calculates the reliability for the third language (S7).

When the robot control device 200 recognizes that the user is speaking the first language (Japanese), the reliabilities calculated by the second language reliability calculation unit 2622 and the third language reliability calculation unit 2632 take low values close to 0.
If no voice is acquired in step S1 (NO in S1), the process waits until voice is input to the robot 100.
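The simultaneous recognition in steps S2 to S7 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `RecognitionResult`, `Recognizer`, and `recognize_all` are hypothetical, and each per-language recognizer is assumed to be a callable that returns the recognized text together with a reliability between 0 and 1.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RecognitionResult:
    text: str           # text recognized in one language (S2, S4, S6)
    reliability: float  # reliability in [0, 1] for that language (S3, S5, S7)


# Hypothetical interface: one recognizer per language, taking raw audio bytes.
Recognizer = Callable[[bytes], RecognitionResult]


def recognize_all(audio: bytes,
                  recognizers: Dict[str, Recognizer]) -> Dict[str, RecognitionResult]:
    """Run every language's recognizer on the same audio at the same time."""
    with ThreadPoolExecutor(max_workers=max(1, len(recognizers))) as pool:
        futures = {lang: pool.submit(rec, audio) for lang, rec in recognizers.items()}
        # Collect one result per language once all recognizers have finished.
        return {lang: fut.result() for lang, fut in futures.items()}
```

A thread pool is used here only as one possible way to express "at the same time"; the text does not state how the three voice processing units are scheduled.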

After voice recognition and reliability calculation have been performed for the first to third languages in steps S2 to S7, the keyword matching unit 271 checks whether the voice recognition result of each language matches a keyword registered in the keyword table 274 (S8).
If the keyword matching unit 271 finds in step S8 that no recognition result matches a keyword in any of the first to third languages (NO in S8), the reliability comparison unit 272 determines whether any of the first to third languages has a reliability equal to or higher than a threshold (S9).

If it is determined in step S9 that no language has a reliability equal to or higher than the predetermined threshold (NO in S9), the reliability comparison unit 272 compares the reliabilities of the languages, and the language with the highest reliability is stored in the selected language storage unit 273 (S10).
The threshold can be set to any value between 0 and 1. For example, if the threshold is set to 0.5 and the reliabilities of the first to third languages are all below 0.5, the language with the highest reliability among them (for example, 0.45) is stored in the selected language storage unit 273.

Next, in the conversation processing unit 280, the conversation creation unit for the selected language, among the first language conversation creation unit 281, the second language conversation creation unit 282, and the third language conversation creation unit 283, creates a system utterance based on the closed question conversation table 284 (S11).
When the closed question system utterance has been created in step S11, the voice output unit 232 of the robot control device 200 transmits the voice data to the robot 100, and the robot 100 plays the voice data through the speaker 135 to speak (S12).

The robot 100 then waits until voice from the user's response to the closed question is acquired (S13). If no voice is acquired in step S13 (NO in S13), the process returns to step S1 and waits for the next voice acquisition.
If voice is acquired in step S13 (YES in S13), the robot control device 200 retries establishing a conversation between the robot 100 and the user, and determines whether the number of retries has exceeded a predetermined number (S14).

If it is determined in step S14 that the number of retries has not exceeded the preset number (NO in S14), the robot control device 200 creates another closed question system utterance in the conversation creation unit for the selected language and increments the retry count (S15). The voice output unit 232 of the robot control device 200 then causes the robot 100 to play the system utterance again (S12). If it is determined in step S14 that the number of retries has exceeded the preset number (YES in S14), the robot control device 200 abandons having the robot 100 speak to this user, returns to step S1, and waits for voice from another user.

As for judging whether the number of retries has exceeded the predetermined number, the number of retries for continuing the dialogue is decided in advance (for example, three). If a conversation between the robot 100 and the user is still not established after three retries, the robot control device 200 controls the robot 100 to stop the dialogue with that user.
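The closed question retry loop of steps S11 to S15 can be sketched as follows. This is a minimal sketch under stated assumptions: `closed_question_dialogue`, `speak`, `listen`, and `is_established` are hypothetical names, `listen` is assumed to return `None` when no voice is acquired, and the test for whether the conversation has been established (here a caller-supplied predicate, for example a keyword match) is an assumption, since the description of this loop does not spell it out.

```python
from typing import Callable, Optional, Sequence


def closed_question_dialogue(
    questions: Sequence[str],
    speak: Callable[[str], None],
    listen: Callable[[], Optional[str]],
    is_established: Callable[[str], bool],
    max_retries: int = 3,
) -> bool:
    """Ask closed questions until the conversation is established or retries run out.

    Returns True if the conversation was established, False if the robot
    should give up and go back to waiting for voice (step S1).
    """
    retries = 0
    while True:
        # S11 / S15: pick a closed question; cycle through the registered ones.
        question = questions[retries % len(questions)]
        speak(question)                 # S12: the robot speaks the question
        response = listen()             # S13: wait for the user's response
        if response is None:            # S13 NO: back to S1
            return False
        if is_established(response):    # assumed success check (not in the text)
            return True
        if retries >= max_retries:      # S14 YES: retry limit reached, give up
            return False
        retries += 1                    # S15: increment the retry count
```

With `max_retries=3`, the robot asks the first question and then retries up to three times, which corresponds to the example of stopping the dialogue after three unsuccessful retries.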

If, as a result of the keyword matching unit 271 comparing the keywords registered in the keyword table 274 with the voice recognition results from the robot 100 in step S8, the recognition result of some language matches a keyword (YES in S8), the language that matched the keyword is stored in the selected language storage unit 273 (S16).

Likewise, if the reliability comparison unit 272, comparing the reliabilities of the first to third languages in step S9, determines that a language has a reliability equal to or higher than the preset threshold (YES in S9), the language whose reliability is equal to or higher than the threshold is stored in the selected language storage unit 273 (S16).
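The language selection of steps S8, S9, S10, and S16 can be summarized in a short sketch. The function name `select_language`, the language codes, and the English and Chinese keywords are hypothetical; only the Japanese keywords come from the description. Exact matching after trimming and lowercasing, and choosing the most reliable language when several reach the threshold, are also assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Example keyword table. The "ja" entries follow the description;
# the "en" and "zh" entries are assumed for illustration only.
KEYWORDS: Dict[str, Set[str]] = {
    "ja": {"はい", "よろしいです"},
    "en": {"yes", "that is fine"},
    "zh": {"是", "可以"},
}


def select_language(
    texts: Dict[str, str],
    reliabilities: Dict[str, float],
    keywords: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Return (selected language, question style), where style is "open" or "closed"."""
    # S8: does any language's recognition result match a registered keyword?
    for lang, text in texts.items():
        if text.strip().lower() in keywords.get(lang, set()):
            return lang, "open"        # S8 YES -> S16, then open questions

    # S9: is there a language whose reliability reaches the threshold?
    best = max(reliabilities, key=lambda lang: reliabilities[lang])
    if reliabilities[best] >= threshold:
        return best, "open"            # S9 YES -> S16, then open questions

    # S10: no language is reliable enough; keep the best guess, ask closed questions.
    return best, "closed"
```

For example, with reliabilities of 0.45, 0.30, and 0.10 and a threshold of 0.5, no keyword match leads to the first language being selected together with the closed question format.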

Next, the conversation creation unit for the selected language creates a system utterance based on the open question conversation table 285 (S17).
When the open question system utterance has been created in step S17, the voice output unit 232 transmits the voice data to the robot 100, and the robot 100 plays the voice data through the speaker 135 to speak (S18).

When the open question system utterance has been output, the conversation creation unit for the selected language in the conversation processing unit 280 checks whether the conversation end flag is "TRUE" or "FALSE" (S19). As described later with reference to FIG. 9, for an utterance in the open question format the exchange must continue, so the conversation end flag is "FALSE".

If the conversation end flag is "FALSE" in step S19 (NO in S19), the process waits for new voice to be acquired (S20). If new voice is acquired in step S20 (YES in S20), the voice recognition unit for the selected language among the first to third languages performs voice recognition (S21), the conversation creation unit for the selected language again creates an open question system utterance in step S17, and the robot 100 speaks it in step S18. If no new voice is acquired in step S20 (NO in S20), the process waits until new voice is acquired.
If it is determined in step S19 that the conversation between the robot 100 and the user has been established and the conversation end flag is "TRUE" (YES in S19), the conversation process ends and the process returns to step S1.

In the flowchart of FIG. 6, the condition in step S9 is "whether there is a language whose reliability is equal to or higher than the threshold", but this may be replaced by "whether the difference between the reliability of the most reliable language and the reliabilities of the other languages is equal to or higher than a threshold". If the difference between the reliability of the most reliable language and the reliabilities of the other languages is small, a closed question is asked in step S11.

For example, when it is hard to tell whether the user is speaking the first language, the second language, or some other language, the reliability calculated by the first language reliability calculation unit 2612 may be the highest and yet be close to the reliability calculated by the second language reliability calculation unit 2622.
That is, if the highest reliability, that of the first language (Japanese), is 0.8 and the reliability of the second language (English) is 0.7, the difference is only 0.1. In this case, it is difficult to determine whether the user is speaking the first language (Japanese) or the second language (English).
On the other hand, if the highest reliability, that of the first language (Japanese), is 0.5 and the reliability of the second language (English) is 0.1, the difference is 0.4, which is no longer small. In such a case, it can be determined that the user is speaking the first language (Japanese).

Thus, when the difference between the reliability of the most reliable language and the reliabilities of the other languages is equal to or greater than the threshold, conversation content in the open question format is created, and when the difference is below the threshold, conversation content in the closed question format is created.
In other words, when the voice recognition reliabilities of the languages are all lower than the threshold, the system asks the user a question in the closed question format so as to restrict what the user says.
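The difference-based variant of step S9 can be sketched as follows. The function name `choose_question_style` and the margin threshold of 0.2 are assumptions for illustration; the description gives example differences of 0.1 and 0.4 but no concrete threshold. Comparing against the runner-up is sufficient, because the runner-up has the smallest difference from the most reliable language.

```python
from typing import Dict, Tuple


def choose_question_style(reliabilities: Dict[str, float],
                          margin_threshold: float = 0.2) -> Tuple[str, str]:
    """Pick the most reliable language and decide "open" or "closed" by its margin."""
    ranked = sorted(reliabilities.items(), key=lambda kv: kv[1], reverse=True)
    best_lang, best_score = ranked[0]
    # With a single language there is no competitor, so treat the runner-up as 0.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = best_score - runner_up
    style = "open" if margin >= margin_threshold else "closed"
    return best_lang, style
```

With the values from the description, reliabilities of 0.8 and 0.7 give a margin of about 0.1 and hence the closed question format, while 0.5 and 0.1 give a margin of 0.4 and hence the open question format.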

<Explanation of the tables>
FIG. 7 shows an example of the keyword table 274 of the language selection unit 270 for use at an international airport, with Japanese as the first language, English as the second language, and Chinese as the third language. The keyword table 274 stores "はい" ("yes") and "よろしいです" ("that is fine") as first language (Japanese) keywords, together with the corresponding words in the second language (English) and the third language (Chinese).

FIG. 8 shows an example of the closed question conversation table 284 of the conversation processing unit 280 for use at an international airport, with Japanese as the first language, English as the second language, and Chinese as the third language.
The system utterances registered in the closed question conversation table 284 are ones that lead the user to say a keyword registered in the keyword table 274. For the first language (Japanese), for example, fixed question sentences such as "日本語でよかったでしょうか？" ("Is Japanese all right?") and "使用したい言語を教えてください" ("Please tell me the language you would like to use") are registered, in the expectation that the user will say "はい" ("yes"), which is registered in the keyword table. Fixed question sentences with the same meaning are likewise registered for the second language (English) and the third language (Chinese).

FIG. 9 shows an example of the open question conversation table 285 for use at an international airport, with Japanese as the first language, English as the second language, and Chinese as the third language. For each of the first language (Japanese), the second language (English), and the third language (Chinese), the open question conversation table 285 has a user utterance column, a system utterance column, and a conversation end flag column.

For a system utterance that the robot 100 speaks in question form, such as "何がしたいですか？" ("What would you like to do?") or "何が食べたいですか？" ("What would you like to eat?"), the conversation end flag is set to "FALSE" and the conversation continues. For an utterance in which the robot 100 gives guidance, such as "トイレはここから左に行くとあります。" ("The restroom is to the left from here."), the conversation end flag is set to "TRUE" and the conversation ends. In this way, the guidance the user wants can be narrowed down over several exchanges before the final guidance is given.
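The open question loop of steps S17 to S21, driven by a table with a conversation end flag, can be sketched as follows. This is an illustrative sketch only: `OpenEntry`, `JA_TABLE`, `FALLBACK`, and `open_question_dialogue` are hypothetical names, the user utterance used as a table key is an assumed example, and `listen_and_recognize` is assumed to block until voice is acquired and to return the text recognized in the selected language (S20 and S21 combined).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class OpenEntry:
    system_utterance: str   # what the robot says in reply
    conversation_end: bool  # True ("TRUE") ends the conversation


# Assumed first-language rows: the key (user utterance) is hypothetical,
# the system utterances are the examples given in the description.
JA_TABLE: Dict[str, OpenEntry] = {
    "トイレに行きたい": OpenEntry("トイレはここから左に行くとあります。", True),
}
# Assumed default row: keep asking when the user utterance is not in the table.
FALLBACK = OpenEntry("何がしたいですか？", False)


def open_question_dialogue(
    table: Dict[str, OpenEntry],
    fallback: OpenEntry,
    speak: Callable[[str], None],
    listen_and_recognize: Callable[[], str],
    first_utterance: str,
) -> None:
    """Continue the exchange until a row with the conversation end flag set is spoken."""
    user_text = first_utterance
    while True:
        entry = table.get(user_text, fallback)  # S17: create the system utterance
        speak(entry.system_utterance)           # S18: the robot speaks it
        if entry.conversation_end:              # S19: flag is TRUE, finish
            return
        user_text = listen_and_recognize()      # S20, S21: next user utterance
```

Keeping the end flag in the table row, rather than in the control logic, means that adding a new guidance answer only requires a new row.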

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment above has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one having all of the described configurations.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented in hardware, for example by designing them as an integrated circuit. The configurations, functions, and the like may also be implemented in software, with a processor interpreting and executing programs that implement the respective functions. Information such as programs, tables, and files that implement the functions can be stored in a memory, a recording device such as a hard disk or an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the configurations may be considered to be connected to one another.

1 ... guidance robot system, 2 ... building,
100 ... (guidance) robot,
110 ... CPU, 120 ... storage device, 121 ... drive control unit, 122 ... conversation control unit, 123 ... input/output unit, 130 ... input/output device, 131 ... camera, 132 ... microphone, 133 ... gyro sensor, 134 ... range sensor, 135 ... speaker, 136 ... drive mechanism, 140 ... communication interface,
200 ... robot control device,
210 ... CPU, 211 ... communication interface, 220 ... storage device, 230 ... input/output data processing unit, 231 ... voice acquisition unit, 232 ... voice output unit, 233 ... image acquisition unit, 234 ... operation output unit, 235 ... range data acquisition unit, 236 ... error output unit, 240 ... service flow processing unit, 250 ... face-to-face detection unit, 260 ... voice processing unit, 261 ... first language voice processing unit, 2611 ... first language voice recognition unit, 2612 ... first language reliability calculation unit, 262 ... second language voice processing unit, 2621 ... second language voice recognition unit, 2622 ... second language reliability calculation unit, 263 ... third language voice processing unit, 2631 ... third language voice recognition unit, 2632 ... third language reliability calculation unit, 270 ... language selection unit, 271 ... keyword matching unit, 272 ... reliability comparison unit, 273 ... selected language storage unit, 274 ... keyword table, 280 ... conversation processing unit, 281 ... first language conversation creation unit, 282 ... second language conversation creation unit, 283 ... third language conversation creation unit, 284 ... closed question conversation table, 285 ... open question conversation table,
300 ... robot management server,
310 ... CPU, 320 ... storage device, 321 ... robot placement management unit, 330 ... communication interface

Claims

A guidance robot system that provides guidance services using conversations in a plurality of languages, comprising:
a voice acquisition unit that acquires voice;
a voice recognition unit that performs voice recognition in the plurality of languages on the voice acquired by the voice acquisition unit;
a reliability calculation unit that calculates a reliability for each of the plurality of languages for the voice acquired by the voice acquisition unit;
a keyword matching unit that obtains a matching language by comparing the voice recognition results of the plurality of languages obtained by the voice recognition unit with keywords registered in advance;
a language selection unit that identifies the language of the voice acquired by the voice acquisition unit based on the reliabilities of the plurality of languages obtained by the reliability calculation unit; and
a conversation processing unit that switches conversation content based on the reliabilities obtained by the reliability calculation unit.

The guidance robot system according to claim 1, wherein the conversation processing unit switches between a closed question format conversation, in which the user's utterance is restricted to the keywords matched by the keyword matching unit, and an open question format conversation, in which guidance is provided.

The guidance robot system according to claim 2, wherein the conversation processing unit uses the closed question format conversation when, among the reliabilities obtained by the reliability calculation unit, the reliability of the most reliable language is lower than a threshold, and switches to the open question format conversation when the reliability of the most reliable language is higher than the threshold.

The guidance robot system according to claim 2, wherein the conversation processing unit uses the closed question format conversation when, among the reliabilities obtained by the reliability calculation unit, the difference between the reliability of the most reliable language and the reliabilities of the other languages is smaller than a threshold, and switches to the open question format conversation when the difference is larger than the threshold.

The guidance robot system according to claim 1, wherein the language selection unit compares the reliabilities of the plurality of languages obtained by the reliability calculation unit and identifies the language with the highest reliability as the language of the voice acquired by the voice acquisition unit.

A guidance method in which a guidance robot provides guidance services using conversations in a plurality of languages, comprising:
a step of acquiring voice with a voice acquisition unit of the guidance robot;
a step of performing, with a voice recognition unit, voice recognition in the plurality of languages on the voice acquired by the voice acquisition unit;
a step of calculating, with a reliability calculation unit, a reliability for each of the plurality of languages for the voice acquired by the voice acquisition unit;
a step of obtaining, with a keyword matching unit, a matching language by comparing the voice recognition results of the plurality of languages obtained by the voice recognition unit with keywords registered in advance;
a step of identifying, with a language selection unit, the language of the voice acquired by the voice acquisition unit based on the reliabilities of the plurality of languages obtained by the reliability calculation unit; and
a step of switching conversation content in a conversation processing unit based on the reliabilities obtained by the reliability calculation unit.