JP2005202076A - Uttering control device and method and robot apparatus

Info

Publication number: JP2005202076A
Application number: JP2004007306A
Authority: JP
Inventors: Hideki Shimomura; 秀樹下村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2004-01-14
Filing date: 2004-01-14
Publication date: 2005-07-28

Abstract

PROBLEM TO BE SOLVED: To provide an utterance control device and method, and a robot apparatus, that improve the entertainment value of a robot having a dialogue control function. SOLUTION: The utterance form used by a device or robot apparatus when conversing with a user is changed as necessary according to the distance between the device or robot apparatus and the user. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an utterance control apparatus and method, and a robot apparatus, and is suitably applied to, for example, an entertainment robot.

In recent years, humanoid robots have been under development at many companies and at research institutions such as universities. Such a humanoid robot is equipped with external sensors such as a CCD (Charge Coupled Device) camera, a microphone, and a touch sensor, and internal sensors such as a battery sensor and an acceleration sensor. It recognizes external and internal states based on the outputs of these sensors and can act autonomously based on the recognition results (see, for example, Non-Patent Document 1).

In recent years, many entertainment robots have also appeared that are equipped with voice recognition and dialogue control functions and can hold simple everyday conversations with a user.
Japanese Patent Application No. 2003-270835

A conventional robot equipped with such voice recognition and dialogue control functions, however, is built to converse with the user at a fixed, preset utterance volume regardless of the physical distance between the robot and the user.

Consequently, depending on the utterance volume setting, a volume that suits a user near the robot may be too quiet for a user slightly farther away, making the utterance hard to hear. Conversely, a volume that suits a user slightly farther away may be too loud for a user near the robot, again making the utterance hard to hear.

One conceivable way to solve this problem is to let the user freely change the utterance volume of the entertainment robot by operating a switch. With this method, however, the naturalness of the interaction between the user and the robot is impaired, and having to set the utterance volume each time is extremely inconvenient for the user.

Moreover, when the surrounding environment is taken into account, such as a room with heavy reverberation, simply raising the utterance volume does not necessarily make the robot's speech easier to hear for a user who is far from the robot.

A situation in which the user has difficulty hearing what the robot says hinders smooth and natural dialogue between the user and the robot and impairs the entertainment value of a robot having a dialogue control function, so some solution is desired.

The present invention has been made in view of the above points, and proposes an utterance control apparatus and method, and a robot apparatus, that can improve the entertainment value of a robot having a dialogue control function.

To solve this problem, in the present invention, an utterance control apparatus that controls the utterances of a device during dialogue between the device and a user is provided with utterance form changing means that changes, as necessary, the utterance form of the device during dialogue with the user according to the distance between the device and the user.

As a result, this utterance control apparatus can always speak in an utterance form that is easy for the user to hear, and can therefore converse smoothly with the user.

Further, in the present invention, in an utterance control method for controlling the utterances of a device during dialogue between the device and a user, the utterance form of the device during dialogue with the user is changed as necessary according to the distance between the device and the user.

As a result, this utterance control method makes it possible to always speak in an utterance form that is easy for the user to hear, and therefore to converse smoothly with the user.

Furthermore, in the present invention, a robot apparatus is provided with utterance form changing means that changes, as necessary, the utterance form during dialogue with the user according to the distance to the user.

As a result, this robot apparatus can always speak in an utterance form that is easy for the user to hear, and can therefore converse smoothly with the user.

According to the present invention, in an utterance control apparatus and method for controlling the utterances of a device during dialogue between the device and a user, the utterance form of the device during dialogue with the user is changed as necessary according to the distance between the device and the user. The device can therefore always speak in an utterance form that is easy for the user to hear and can converse smoothly with the user, which realizes an utterance control apparatus and method that can improve entertainment value.

Also according to the present invention, a robot apparatus is provided with utterance form changing means that changes, as necessary, the utterance form during dialogue with the user according to the distance to the user. The robot apparatus can therefore always speak in an utterance form that is easy for the user to hear and can converse smoothly with the user, which realizes a robot apparatus that can improve entertainment value.

An embodiment of the present invention is described in detail below with reference to the drawings.

(1) Configuration of the Robot According to this Embodiment
(1-1) Hardware Configuration of Robot 1
In FIG. 1, reference numeral 1 denotes the robot according to this embodiment as a whole. A head unit 4 is attached to the top of a torso unit 2 via a neck joint 3, arm units 5A and 5B are attached to the upper left and right of the torso unit 2 via shoulder joints 4A and 4B, respectively, and a pair of leg units 7A and 7B are attached to the bottom of the torso unit 2 via hip joints 6A and 6B, respectively.

FIG. 2 schematically shows the functional configuration of the robot 1. As shown in FIG. 2, the robot 1 consists of a control unit 10 that performs overall control of the robot's operation and other data processing, an input/output unit 11, a drive unit 12, and a power supply unit 13.

As input devices, the input/output unit 11 includes a pair of CCD (Charge Coupled Device) cameras 20 corresponding to the eyes of the robot 1, a pair of microphones 21 corresponding to the ears, touch sensors 22 arranged on parts such as the head, hands, and soles to sense physical contact from the user, contact between a hand and an external object, grounding of the soles, and the like, and various other sensors corresponding to the five senses.

As output devices, the input/output unit 11 is equipped with a speaker 23 corresponding to the mouth of the robot 1 and LEDs (eye lamps) 24 that form facial expressions through combinations of blinking and lighting timing. These output devices let the robot 1 express feedback to the user in forms other than mechanical motion patterns of the legs and other limbs, such as voice and blinking lamps.

The drive unit 12 is a functional block that realizes the body motion of the robot 1 according to a predetermined motion pattern commanded by the control unit 10, and is the object controlled by behavior control. The drive unit 12 is a functional module for realizing the degrees of freedom at each joint of the robot 1, and consists of a plurality of drive units 25_1 to 25_n provided for each axis (roll, pitch, yaw, and so on) at each joint. Each of the drive units 25_1 to 25_n is a combination of a motor 26_1 to 26_n that rotates about a given axis, an encoder 27_1 to 27_n that detects the rotational position of the motor 26_1 to 26_n, and a driver 28_1 to 28_n that adaptively controls the rotational position and rotational speed of the motor 26_1 to 26_n based on the output of the encoder 27_1 to 27_n.

As its name suggests, the power supply unit 13 is a functional module that supplies power to the electric circuits and other components in the robot 1. The robot 1 according to this embodiment is an autonomously driven type using a battery, and the power supply unit 13 consists of a rechargeable battery 29 and a charge/discharge control unit 30 that manages the charge/discharge state of the rechargeable battery 29.

The rechargeable battery 29 takes the form of, for example, a "battery pack" in which a plurality of lithium-ion secondary cells are packaged as a cartridge.

The charge/discharge control unit 30 determines the remaining capacity of the battery 29 by measuring the terminal voltage, the charge/discharge current, the ambient temperature of the battery 29, and so on, and decides when charging should start and end. The start and end times decided by the charge/discharge control unit 30 are notified to the control unit 10 and serve as triggers for the robot 1 to start and end its charging operation.
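The decision logic described above might be sketched as follows. This is only an illustration: the text gives no numeric thresholds or estimation method, so `LOW_CAPACITY`, `FULL_CAPACITY`, `V_EMPTY`, `V_FULL`, and the linear voltage-to-capacity estimate are all assumptions, as are the names `BatteryReading` and `charge_trigger`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatteryReading:
    terminal_voltage: float  # volts
    current: float           # amperes, positive while charging
    ambient_temp: float      # degrees Celsius


# Hypothetical values; the patent text does not specify any of these.
LOW_CAPACITY = 0.2
FULL_CAPACITY = 0.95
V_EMPTY, V_FULL = 6.0, 8.4


def estimate_remaining(reading: BatteryReading) -> float:
    """Crude linear voltage-to-capacity estimate, clamped to [0, 1].

    A real controller would also use the current and temperature
    readings; they are carried here but ignored for simplicity.
    """
    frac = (reading.terminal_voltage - V_EMPTY) / (V_FULL - V_EMPTY)
    return max(0.0, min(1.0, frac))


def charge_trigger(reading: BatteryReading, charging: bool) -> Optional[str]:
    """Return "start", "stop", or None; the result is what would be
    notified to the control unit as a trigger."""
    remaining = estimate_remaining(reading)
    if not charging and remaining <= LOW_CAPACITY:
        return "start"
    if charging and remaining >= FULL_CAPACITY:
        return "stop"
    return None
```

Returning `None` when no transition is needed keeps the control unit from being notified on every measurement cycle.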

The control unit 10 corresponds to the "brain" of the robot 1 and is mounted in, for example, the head unit 4 or the torso unit 2. As shown in FIG. 3, the control unit 10 has a configuration in which a CPU (Central Processing Unit) 31 serving as the main controller is connected by a bus to memory, other circuit components, and peripheral devices. The bus 37 is a common signal transmission path that includes a data bus, an address bus, a control bus, and so on. Each device on the bus 37 is assigned a unique address (a memory address or an I/O address), and the CPU 31 can communicate with a specific device on the bus 37 by specifying its address.

The RAM (Random Access Memory) 32 is a writable memory made up of volatile memory such as DRAM (Dynamic RAM). It is used to load the program code executed by the CPU 31 and to temporarily store working data of the running program.

The ROM (Read Only Memory) 33 is a read-only memory that permanently stores programs and data. The program code stored in the ROM 33 includes a self-diagnostic test program executed when the robot 1 is powered on and a control program that defines the operation of the robot 1.

The control program of the robot 1 includes a "sensor input and recognition processing program" that processes inputs from sensors such as the CCD cameras 20 and the microphones 21 and recognizes them as symbols, a "behavior control program" that controls the behavior of the robot 1 based on sensor inputs and a predetermined behavior control model while managing memory operations such as short-term memory, and a "drive control program" that controls the driving of each joint motor, the audio output of the speaker 23, and so on according to the behavior control model.

The non-volatile memory 34 is made up of electrically erasable and rewritable memory elements such as EEPROM (Electrically Erasable and Programmable ROM), and is used to hold, in non-volatile form, data that must be updated from time to time. Such data includes encryption keys and other security information, and device control programs to be installed after shipment.

The interface 35 is a device for interconnecting with equipment outside the control unit 10 and enabling data exchange. For example, the interface 35 inputs and outputs data to and from the CCD cameras 20, the microphones 21, and the speaker 23 in the input/output unit 11. The interface 35 also inputs and outputs data and commands to and from the drivers 28_1 to 28_n in the drive unit 12.

The interface 35 may also include general-purpose interfaces for connecting computer peripherals, such as a serial interface like RS (Recommended Standard)-232C, a parallel interface like IEEE (Institute of Electrical and Electronics Engineers) 1284, a USB (Universal Serial Bus) interface, an i-Link (IEEE 1394) interface, a SCSI (Small Computer System Interface) interface, or a memory card interface (card slot) that accepts a PC card or a Memory Stick, so that programs and data can be transferred to and from locally connected external devices. As another example, the interface 35 may include an infrared communication (IrDA) interface and communicate wirelessly with external devices.

Furthermore, the control unit 10 includes a wireless communication interface 36, a network interface card (NIC) 38, and the like, and can exchange data with various external host computers via short-range wireless data communication such as Bluetooth, a wireless network such as IEEE 802.11b, or a wide-area network such as the Internet.

Through such data communication between the robot 1 and a host computer, remote computing resources can be used to compute complex motion control for the robot 1 or to control it remotely.

(1-2) Software Configuration of Robot 1
FIG. 4 schematically shows the functional configuration of the behavior control system 40 of the robot 1, which is made up of the group of control programs stored in the ROM.

The behavior control system 40 is implemented using object-oriented programming. Each piece of software is handled as a module called an "object", which integrates data with the procedures that operate on that data. Objects can pass data to one another and invoke one another through an inter-object communication method that uses message passing and shared memory.

The behavior control system 40 has an image recognition unit 41, a voice recognition unit 42, and a contact recognition unit 43 for recognizing the external environment based on the outputs of sensors such as the CCD cameras 20 (FIG. 2), the microphones 21 (FIG. 2), and the touch sensors 22 (FIG. 2).

The image recognition unit 41 performs image recognition processing, such as face recognition and color recognition, and feature extraction based on the image signal S1 supplied from the CCD cameras 20. The image recognition unit 41 then sends to the short-term storage unit 44 the image signal S1 together with the various image recognition results: face recognition information, namely the face ID (identifier) unique to the recognized person and the position and size of the face image region, and color recognition information, namely the position, size, and feature values of each color region. The image recognition unit 41 also detects the distance to the imaged object by the so-called stereo vision method based on the image signal S1 from the CCD cameras 20, and sends the detection result to the short-term storage unit 44.
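The text does not describe how the stereo vision method is implemented. As a minimal sketch, the standard relation for a rectified, parallel camera pair is Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity of the same point in the left and right images. The function name and parameter values below are hypothetical.

```python
def stereo_distance(focal_length_px: float, baseline_m: float,
                    disparity_px: float) -> float:
    """Distance to a point seen by a rectified stereo pair: Z = f * B / d.

    focal_length_px: focal length in pixels (assumed equal for both cameras)
    baseline_m:      spacing between the two camera centers, in meters
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Example with made-up values: 500 px focal length, 6 cm baseline.
distance = stereo_distance(500.0, 0.06, 20.0)
```

Because distance is inversely proportional to disparity, the estimate becomes coarser as the user moves farther away, which is worth keeping in mind if the distance drives threshold-based decisions.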

The voice recognition unit 42 performs various sound-related recognition processes, such as speech recognition, speaker recognition, and sound source direction recognition, based on the audio signal S2 supplied from the microphones 21. The voice recognition unit 42 then sends the various recognition results to the short-term storage unit 44: the character string of the recognized words (the speech recognition result), the speaker ID unique to the speaker (the result of speaker recognition based on acoustic features and the like), and the sound source direction (the sound source direction recognition result). Together with these results, the voice recognition unit 42 also sends the audio signal S2 to the short-term storage unit 44.

The contact recognition unit 43 recognizes physical contact with the outside, such as "stroked", "hit", "gripped an object", or "sole touched the ground", based on the pressure detection signals S3 supplied from the touch sensors 22 arranged on the top of the head unit 4 (FIG. 1), the hands at the tips of the arm units 5A and 5B (FIG. 1), the soles at the bottom of the leg units 7A and 7B (FIG. 1), and so on, and sends the resulting contact recognition results to the short-term storage unit 44. Together with these results, the contact recognition unit 43 also sends the pressure detection signal S3 from each touch sensor 22 to the short-term storage unit 44.

The short-term storage unit 44 is an object that holds information about the external environment of the robot 1 for a relatively short time. It receives the image recognition results and image signal S1 from the image recognition unit 41, the voice recognition results and audio signal S2 from the voice recognition unit 42, and the contact recognition results and pressure detection signals S3 from the contact recognition unit 43, and stores them for a short period only.

The short-term storage unit 44 also uses the received image, voice, and contact recognition results together with the image signal S1, the audio signal S2, and the pressure detection signals S3 to associate face image regions, person IDs, speaker IDs, character string information, and so on. In this way it generates target information and event information describing which person is currently where, which person spoke which words, and what dialogue has taken place with that person so far, and sends this information to the action selection control unit 45.

Based on the target information and event information supplied from the short-term storage unit 44 and on the contents stored in the short-term storage unit 44, the action selection control unit 45 determines the next action of the robot 1 from among a plurality of actions prepared in advance. The action may be one selected depending on the robot's current situation and previous actions (situation-dependent behavior), a reflexive action in response to an external stimulus (reflexive behavior), or an action based on a relatively long-term plan made in response to a given situation or a command from the user (deliberative behavior). The action selection control unit 45 then notifies the output management unit 46 of the action thus determined.

In response to the notification from the action selection control unit 45, the output management unit 46 drives the motors 26_1 to 26_n of the corresponding drive units 25_1 to 25_n and blinks the LEDs 24 in a predetermined pattern, while arbitrating between competing actions, such as a situation-dependent behavior and a reflexive behavior, and synchronizing motion, voice, and the blinking of the LEDs 24.

When the action selection control unit 45 decides on dialogue with the user as the next action, it thereafter continuously monitors the speech recognition results for the user's utterances, which the voice recognition unit 42 stores one after another in the short-term storage unit 44, and based on these results determines in turn what the robot 1 should say. Based on this decision, the action selection control unit 45 reads the necessary character string from the utterance character string database 47 stored in advance in the ROM 33 (FIG. 3) and sends it to the output management unit 46.

The output management unit 46 then passes the character string supplied from the action selection control unit 45 to the speech synthesis unit 48, which generates an audio signal S4 of synthesized speech from the character string and sends it to the speaker 23 (FIG. 2). As a result, speech based on the audio signal S4 is output from the speaker 23.

In this way, the robot 1 can act autonomously based on the external conditions and other information recognized from the outputs of sensors such as the CCD cameras 20, the microphones 21, and the touch sensors 22.

(2) Utterance Control Function of Robot 1
Next, the utterance control function of the robot 1 is described.

The robot 1 has an utterance control function that controls its utterance form, such as utterance volume, speaking speed, intonation, and the pauses between phrases, according to the physical distance to its conversation partner. Because the perception of utterance volume in particular varies greatly from person to person, the robot 1 also has an utterance control function that changes, for each user and in response to requests from that user, the utterance volume to be used as a baseline (hereinafter called the reference volume).

In practice, as the means for changing the reference volume for each user, the behavior control system 40 of the robot 1 is provided with a reference volume storage unit 49 that stores the reference volume of each user. The reference volume storage unit 49 is implemented in the non-volatile memory 34 (FIG. 3).

Each time the short-term storage unit 44 detects a new user based on the image recognition results of the image recognition unit 41, the action selection control unit 45 associates that user's person ID with a reference volume for the user (the initial value is "3") and stores the person ID and reference volume in the reference volume storage unit 49, as shown in FIG. 5. Thereafter, during dialogue with that user, it changes the user's reference volume held in the reference volume storage unit 49 as necessary, following the reference volume change procedure RT1 shown in FIG. 6.

That is, when the action selection control unit 45 starts a dialogue with the user, it starts the reference volume change procedure RT1 at step SP0. In the following step SP1, it waits for words that ask the robot to raise its utterance volume, such as "speak a little louder" or "I can't hear you", or words that ask the robot to lower its utterance volume, such as "speak a little more quietly" or "too loud".

When the action selection control unit 45 recognizes, from the speech recognition results of the voice recognition unit 42 held in the short-term storage unit 44, that the user has spoken words asking the robot 1 to raise or lower its utterance volume, it proceeds to step SP2. There it judges, based on the target information and event information supplied from the short-term storage unit 44, whether the words have been identified as coming from the current conversation partner.

If the result at step SP2 is negative, the action selection control unit 45 returns to step SP1. If the result is affirmative, it proceeds to step SP3 and adjusts that user's reference volume held in the reference volume storage unit 49 according to the meaning of the user's words: it raises the reference volume by a specified value (for example, "1") if the words asked for a higher utterance volume, and lowers it by the specified value if the words asked for a lower one. The volume values represent the volume level of the robot 1's utterances; each value is associated with a predetermined volume level, and a larger value corresponds to a louder level.

The action selection control unit 45 then returns to step SP1 and repeats steps SP1 to SP3. In this way, it changes the reference volume for each user in response to that user's requests to change the utterance volume.
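The loop of steps SP1 to SP3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary standing in for the reference volume storage unit 49, the function names, and the phrase lists are all assumptions, and real speech recognition and speaker identification are replaced by plain strings and IDs.

```python
DEFAULT_REFERENCE_VOLUME = 3  # initial value given in the description
STEP = 1                      # the "specified value" given as an example

# Hypothetical trigger phrases; the description only lists examples.
RAISE_PHRASES = ("speak a little louder", "can't hear")
LOWER_PHRASES = ("speak a little more quietly", "too loud")

# person ID -> reference volume (stands in for storage unit 49)
reference_volumes = {}


def register_user(person_id):
    """Store the initial reference volume when a new user is detected."""
    reference_volumes.setdefault(person_id, DEFAULT_REFERENCE_VOLUME)


def classify_request(utterance):
    """Return +1 for a raise request, -1 for a lower request, else 0."""
    text = utterance.lower()
    if any(phrase in text for phrase in RAISE_PHRASES):
        return 1
    if any(phrase in text for phrase in LOWER_PHRASES):
        return -1
    return 0


def handle_utterance(speaker_id, partner_id, utterance):
    """Apply steps SP1 to SP3 to one recognized utterance."""
    register_user(partner_id)
    direction = classify_request(utterance)  # SP1: is it a volume request?
    if direction != 0 and speaker_id == partner_id:  # SP2: from the partner?
        reference_volumes[partner_id] += direction * STEP  # SP3
    return reference_volumes[partner_id]
```

A request from someone other than the current conversation partner leaves the stored value unchanged, which mirrors the negative branch at step SP2.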

Meanwhile, as a technique for changing the pauses between phrases according to the physical distance to the conversation partner, so that the boundaries of meaning are clearly audible, the robot 1 controls its utterances so that "ne" (ね) is inserted after each phrase as necessary. For this purpose, the behavior control system 40 is provided with an utterance character string transformation unit 51.

The utterance character string transformation unit 51 holds, in the ROM 33 (FIG. 3), a character string transformation determination table 50 which, as shown in FIG. 7, specifies for each distance to the conversation partner whether the character string is to be transformed by inserting "ne" after each phrase.

During dialogue with a user, the action selection control unit 45 reads the character string corresponding to what the robot 1 is to say from the utterance character string database 47 and sends it successively to the utterance character string transformation unit 51 as character string information. Together with this, it sends the unit 51 distance information representing the distance to the conversation partner, as detected by the image recognition unit 41 and held in the short-term storage unit 44.

On receiving the character string and the distance information from the action selection control unit 45, the utterance character string transformation unit 51 determines, on the basis of the distance information and the character string transformation determination table 50 and in accordance with the character string transformation processing procedure RT2 shown in FIG. 8, whether the character string should be transformed, and transforms it when necessary.

That is, on receiving the character string information and the distance information from the action selection control unit 45, the utterance character string transformation unit 51 starts the character string transformation processing procedure RT2 at step SP10. In the following step SP11, it determines whether the character string should be transformed, on the basis of the distance to the conversation partner recognized from the distance information and the character string transformation determination table 50 (FIG. 7).

Specifically, in accordance with the character string transformation determination table 50, the utterance character string transformation unit 51 determines that the character string should not be transformed when the distance to the conversation partner recognized by the image recognition unit 41 is less than 350 cm, and that it should be transformed when the distance is 350 cm or more.

If the result at step SP11 is negative, the utterance character string transformation unit 51 proceeds to step SP14 without transforming the character string and ends the character string transformation processing procedure RT2. It then sends the character string information unchanged to the speech synthesis unit 48 via the action selection control unit 45 and the output management unit 46 in that order.

If the result at step SP11 is affirmative, the utterance character string transformation unit 51 proceeds to step SP12, performs morphological analysis on the character string, and detects the auxiliary word at the end of each phrase in the string.

The utterance character string transformation unit 51 then proceeds to step SP13 and appends "ne" (ね) after each of the phrase-final auxiliary words detected at step SP12. For example, given the character string 「今日の東京の天気は晴れだよ」 ("Today's weather in Tokyo is sunny") from the action selection control unit 45, it appends "ne" after the phrase-final auxiliary words of 「今日の」, 「東京の」 and 「天気は」, and generates the character string 「今日のね、東京のね、天気はね、晴れだよ」.

The utterance character string transformation unit 51 then proceeds to step SP14 and ends the character string transformation processing procedure RT2. It sends character string information representing the new, transformed character string to the speech synthesis unit 48 via the action selection control unit 45 and the output management unit 46 in that order.
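The procedure RT2 can be illustrated with the short sketch below. It assumes the morphological analysis of step SP12 has already been done elsewhere and that its result arrives as a list of phrases (bunsetsu); the function name and the choice to join phrases with a Japanese comma are assumptions made for the illustration, based only on the example string in the description.

```python
NE_DISTANCE_CM = 350  # threshold described for determination table 50


def transform_phrases(phrases, distance_cm, filler="ね"):
    """Insert the filler after every phrase except the last when far away.

    `phrases` is assumed to be the output of a morphological analyser,
    already split into phrases (step SP12 is not implemented here).
    """
    phrases = list(phrases)
    if distance_cm < NE_DISTANCE_CM:
        # Step SP11 negative: pass the string through unchanged.
        return "".join(phrases)
    # Step SP13: append the filler and a comma to each non-final phrase.
    head = [phrase + filler + "、" for phrase in phrases[:-1]]
    return "".join(head + phrases[-1:])
```

With the example from the description, `transform_phrases(["今日の", "東京の", "天気は", "晴れだよ"], 400)` yields the transformed string, while the same call with a distance of 100 returns the original sentence.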

The speech synthesis unit 48 holds four tables in the ROM 33 (FIG. 3): a volume change amount definition table 52 (FIG. 9), which specifies the amount by which the utterance volume of the robot 1 is changed for each distance to the conversation partner; a speed change amount definition table 53 (FIG. 10), which specifies the amount by which the utterance speed of the robot 1 is changed for each distance; a pause length change amount definition table 54 (FIG. 11), which specifies the amount by which the pause between phrases (the pause length) is changed for each distance; and a phrase-end emphasis determination table 55 (FIG. 12), which specifies for each distance whether the intonation of the utterance is changed to emphasize the ends of phrases. The short-term storage unit 44 supplies the speech synthesis unit 48 with the person ID of the current conversation partner as person ID information and, successively, with the distance to that partner as distance information.

When the speech synthesis unit 48 receives, via the output management unit 46, the character string information for what the robot 1 is to say, it changes the utterance volume, utterance speed, and so on of the robot 1 as necessary in accordance with the utterance change processing procedure RT3 shown in FIG. 13. It does so on the basis of the person ID information and distance information supplied by the short-term storage unit 44, the tables 52 to 55 stored in the ROM 33, and the user's reference volume held in the reference volume storage unit 49.

That is, on receiving the character string information, the speech synthesis unit 48 starts the utterance change processing procedure RT3 at step SP20. In the following step SP21, it reads the reference volume of the conversation partner from the reference volume storage unit 49 on the basis of the person ID information supplied by the short-term storage unit 44. It then proceeds to step SP22 and, on the basis of the distance to the conversation partner recognized from the distance information, refers to the volume change amount definition table 52 (FIG. 9) and changes the value of the volume parameter, which sets the utterance volume of the robot 1, as necessary.

Specifically, when the distance to the conversation partner given as the distance information is, for example, less than 50 cm, the speech synthesis unit 48 takes the reference volume read from the reference volume storage unit 49 as the utterance volume. When the distance is in the range of 50 to 80 cm, it takes a volume that is a predetermined specified value (for example, "1") above the reference volume. Likewise, when the distance to the user is 80 to 150 cm, 150 to 250 cm, or 250 cm or more, it takes a volume that is "2", "3", or "5" above the reference volume, respectively. The speech synthesis unit 48 then changes the value of the volume parameter as necessary according to this decision.
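The lookup in the volume change amount definition table 52 amounts to finding the distance band and adding its offset to the reference volume. A minimal sketch follows; the band edges and offsets are the example values quoted above, while the function name and the treatment of each lower band edge as inclusive (50 cm falls in the 50 to 80 cm band) are assumptions.

```python
from bisect import bisect_right

# Upper edges of the distance bands, in cm, and the offset for each band.
VOLUME_BOUNDS_CM = [50, 80, 150, 250]
VOLUME_OFFSETS = [0, 1, 2, 3, 5]


def utterance_volume(reference_volume, distance_cm):
    """Return the reference volume plus the offset for the distance band."""
    band = bisect_right(VOLUME_BOUNDS_CM, distance_cm)
    return reference_volume + VOLUME_OFFSETS[band]
```

Because the offset is added to a per-user reference volume, two users standing at the same distance can still be addressed at different volumes.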

Next, the speech synthesis unit 48 proceeds to step SP23 and determines whether the volume decided at step SP22 exceeds a preset threshold.

The utterance volume of the robot 1 has a fixed upper limit. If, for example, the reference volume is already set at or near that limit, then when the volume is decided at step SP22 it may be impossible to select the louder volume that the distance to the user would otherwise call for. In such a case, the user may find the robot 1's utterances hard to hear.

Therefore, when the speech synthesis unit 48 obtains an affirmative result at step SP23, it performs normal change processing on the other utterance forms, such as utterance speed, in steps SP24 to SP26; when it obtains a negative result at step SP23, it performs special change processing on those other utterance forms in steps SP27 to SP29.

In practice, on obtaining an affirmative result at step SP23, the speech synthesis unit 48 proceeds to step SP24. On the basis of the distance to the conversation partner recognized from the distance information supplied by the short-term storage unit 44, it refers to the speed change amount definition table 53 (FIG. 10) stored in the ROM 33 and changes the value of the speed parameter, which sets the utterance speed of the robot 1, as necessary.

Specifically, when the distance to the conversation partner given as the distance information is, for example, less than 200 cm, the speech synthesis unit 48 takes a predetermined initial value as the utterance speed. When the distance is in the range of 200 to 350 cm, it takes a speed 20% slower than the initial value, and when the distance is 350 cm or more, it takes a speed 50% slower than the initial value. The speech synthesis unit 48 then changes the value of the speed parameter as necessary according to this decision.

Next, the speech synthesis unit 48 proceeds to step SP25. On the basis of the distance to the conversation partner recognized from the distance information supplied by the short-term storage unit 44, it refers to the pause length change amount definition table 54 (FIG. 11) stored in the ROM 33 and changes, as necessary, the pause length parameter, which sets the length of the pause between phrases (the pause length) when the character string is uttered.

Specifically, when the distance to the conversation partner given as the distance information is, for example, less than 100 cm, the speech synthesis unit 48 takes the value set in advance as the initial pause length. When the distance is in the range of 100 to 350 cm, it takes a time 30% longer than the initial value, and when the distance is 350 cm or more, it takes a time 60% longer than the initial value. The speech synthesis unit 48 then changes the value of the pause length parameter as necessary according to this decision.

Next, the speech synthesis unit 48 proceeds to step SP26. On the basis of the distance to the conversation partner recognized from the distance information supplied by the short-term storage unit 44, it refers to the phrase-end emphasis determination table 55 (FIG. 12) stored in the ROM 33 and changes, as necessary, the value of the corresponding phrase-end emphasis parameter so that the character string is uttered with an intonation that emphasizes the end of each phrase.

Specifically, when the distance to the conversation partner given as the distance information is, for example, less than 200 cm, the speech synthesis unit 48 decides not to alter the phrase ends when the character string is uttered. When the distance is 200 cm or more, it decides to raise the intonation at the end of each phrase. The speech synthesis unit 48 then changes the value of the phrase-end emphasis parameter as necessary according to this decision.
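The normal processing of steps SP24 to SP26 is three independent table lookups on the same distance. The sketch below uses the example values from the description; expressing "20% slower" as a speed factor of 0.8 and "30% longer" as a pause factor of 1.3, the inclusive lower band edges, and all names are assumptions made for the illustration.

```python
from bisect import bisect_right

# Each table: (upper band edges in cm, value for each band).
SPEED_TABLE = ([200, 350], [1.0, 0.8, 0.5])     # factor on the initial speed
PAUSE_TABLE = ([100, 350], [1.0, 1.3, 1.6])     # factor on the initial pause
EMPHASIS_TABLE = ([200], [False, True])         # raise phrase ends or not


def lookup(table, distance_cm):
    """Return the table value for the band containing the distance."""
    bounds, values = table
    return values[bisect_right(bounds, distance_cm)]


def normal_settings(distance_cm):
    """Steps SP24 to SP26: choose speed, pause and emphasis by distance."""
    return {
        "speed_factor": lookup(SPEED_TABLE, distance_cm),
        "pause_factor": lookup(PAUSE_TABLE, distance_cm),
        "emphasize_phrase_ends": lookup(EMPHASIS_TABLE, distance_cm),
    }
```

Keeping each utterance form in its own table lets the band edges differ, as they do in the description (pauses lengthen from 100 cm, speech slows from 200 cm).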

In contrast, on obtaining a negative result at step SP23, the speech synthesis unit 48 proceeds to step SP27. Regardless of the distance to the conversation partner recognized from the distance information supplied by the short-term storage unit 44, it always sets the utterance speed of the robot 1 to the speed specified for the maximum distance in the speed change amount definition table 53 (FIG. 10). In this embodiment, therefore, a speed 50% slower than the initial value is selected as the utterance speed. The speech synthesis unit 48 then changes the value of the speed parameter as necessary according to this decision.

Next, the speech synthesis unit 48 proceeds to step SP28 and always sets the pause length for the robot 1's utterance to the pause length specified for the maximum distance in the pause length change amount definition table 54 (FIG. 11). In this embodiment, therefore, a time 60% longer than the initial value is selected as the pause length. The speech synthesis unit 48 then changes the value of the pause length parameter as necessary according to this decision.

Next, the speech synthesis unit 48 proceeds to step SP29 and always sets the phrase-end emphasis parameter for uttering the character string to the state specified for the maximum distance in the phrase-end emphasis determination table 55 (FIG. 12). In this embodiment, therefore, it always decides that the end of each phrase is to be emphasized. The speech synthesis unit 48 then changes the value of the phrase-end emphasis parameter as necessary so that phrase ends are emphasized.
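The choice between normal and special processing can be sketched as follows. This is only an illustration under stated assumptions: `MAX_VOLUME` is a hypothetical limit (the description gives no number), the requested volume is clamped to it, and the maximum-distance settings are applied when the requested volume exceeds that limit, which is the reading given in the claims. The table values are the examples from the description, with "20% slower" written as a speed factor of 0.8.

```python
from bisect import bisect_right

MAX_VOLUME = 10  # hypothetical upper limit of the utterance volume

# name -> (upper band edges in cm, value for each band)
TABLES = {
    "speed_factor": ([200, 350], [1.0, 0.8, 0.5]),
    "pause_factor": ([100, 350], [1.0, 1.3, 1.6]),
    "emphasize_phrase_ends": ([200], [False, True]),
}


def utterance_settings(distance_cm, requested_volume):
    """Pick the other utterance forms, compensating for a saturated volume."""
    saturated = requested_volume > MAX_VOLUME
    settings = {"volume": min(requested_volume, MAX_VOLUME)}
    for name, (bounds, values) in TABLES.items():
        if saturated:
            # Special processing: use the entry for the maximum distance.
            settings[name] = values[-1]
        else:
            # Normal processing: use the entry for the actual distance.
            settings[name] = values[bisect_right(bounds, distance_cm)]
    return settings
```

The idea is that when loudness can no longer be raised, slower speech, longer pauses and emphasized phrase ends carry the burden of keeping the utterance intelligible.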

Having set the volume parameter, speed parameter, and other parameters in this way, the speech synthesis unit 48 proceeds to step SP30. It generates a speech waveform corresponding to the character string given by the character string information, modifies the waveform according to the volume, speed, pause length, and phrase-end emphasis parameters set as described above, and sends the resulting audio signal S4 to the speaker 23.

In this way, the robot 1 controls its utterance forms, such as utterance volume, utterance speed, intonation, and pauses, according to the physical distance to the conversation partner, and additionally controls the utterance volume for each user, so that it can always speak in a way that is easy for the conversation partner to hear.

(3) Operation and effects of the present embodiment
In the above configuration, the farther away the conversation partner is, the louder and more slowly the robot 1 speaks and the longer the pauses it leaves between phrases. Depending on the distance to the user, it also speaks with an intonation that emphasizes the ends of phrases, or inserts the word "ne" after each phrase.

Accordingly, the robot 1 can of course always speak at a volume that is easy for the user to hear, regardless of the distance to the user. In addition, even when the dialogue takes place in, for example, a highly reverberant room, it can always speak in a way that is easy for the user to hear. As a result, it can hold a smooth dialogue with the user at a natural volume.

Moreover, because the robot 1 changes the reference volume for each user in response to that user's requests, it can talk with each user at a volume suited to that user's hearing. It can therefore hold an even smoother dialogue without causing the user discomfort.

According to the above configuration, utterance forms such as utterance volume, utterance speed, intonation, and pauses are controlled according to the physical distance to the conversation partner, so the robot can always speak in a form that is easy for the user to hear. As a result, it can hold a smooth dialogue with the user, and a robot with improved entertainment value can thus be realized.

(4) Other embodiments
In the embodiment described above, the present invention is applied to the humanoid robot 1. The present invention is not limited to this and can be widely applied to robot apparatuses of various other forms and to various forms of equipment other than robot apparatuses that have a dialogue function.

In the embodiment described above, the utterance volume, utterance speed, intonation, and pauses are changed according to the distance to the conversation partner. The present invention is not limited to this; only some of these utterance forms may be changed, or other utterance forms may be changed in addition to them.

In the embodiment described above, when the robot 1 changes its intonation, it speaks with an intonation that strengthens the end of each phrase. The present invention is not limited to this, and the intonation may be changed so as to emphasize something other than phrase ends, for example particles.

Furthermore, in the embodiment described above, the utterance volume, utterance speed, intonation, and pauses are changed in steps according to the distance to the conversation partner. The present invention is not limited to this, and these may instead be changed continuously according to the distance, for example by calculation using a function. In that case, for a change of intonation (emphasis of phrase ends), the degree of emphasis at each phrase end may be varied continuously according to the distance to the conversation partner. As for the pauses created by inserting "ne" after phrase ends, instead of inserting it after every phrase end, the number of phrase ends after which "ne" is inserted may be varied in steps according to the distance. Needless to say, words other than "ne" can be inserted after phrase ends to lengthen the pauses between phrases.
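As one illustration of such a continuous change, the stepwise pause length table could be replaced by a clamped linear ramp. The description does not specify any particular function, so the linear form, the end points (taken from the example table values of 100 cm and 350 cm), and the function names below are all assumptions.

```python
def ramp(distance_cm, near_cm, far_cm, near_value, far_value):
    """Interpolate linearly between two distances, clamping outside them."""
    if distance_cm <= near_cm:
        return near_value
    if distance_cm >= far_cm:
        return far_value
    fraction = (distance_cm - near_cm) / (far_cm - near_cm)
    return near_value + fraction * (far_value - near_value)


def pause_factor(distance_cm):
    """Pause length factor: 1.0 up to 100 cm, rising to 1.6 at 350 cm."""
    return ramp(distance_cm, 100, 350, 1.0, 1.6)
```

The same helper could be reused for utterance speed or for a graded degree of phrase-end emphasis by supplying different end points.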

Furthermore, in the embodiment described above, the distance detecting means for detecting the distance to the conversation partner consists of the pair of CCD cameras 20 and the short-term storage unit 44, which detects the distance from their output by stereo vision. The present invention is not limited to this, and a wide range of other means, such as a distance sensor, can be used.

Furthermore, in the embodiment described above, the function of the utterance form changing means, which changes the utterance form of the robot 1 as necessary during dialogue according to the distance to the conversation partner, is divided between the utterance character string transformation unit 51 and the speech synthesis unit 48. The present invention is not limited to this, and all of these functions may be implemented in the speech synthesis unit 48.

Furthermore, in the embodiment described above, the short-term storage unit 44 serves as the user identification means for identifying the conversation partner. The present invention is not limited to this, and the action selection control unit 45 may provide this function instead.

The present invention can be widely applied not only to entertainment robots but also to robot apparatuses for other uses that have a dialogue function, and to equipment other than robot apparatuses.

FIG. 1 is a perspective view showing the external configuration of the robot according to the present embodiment.
FIG. 2 is a block diagram showing the internal configuration of the robot according to the present embodiment.
FIG. 3 is a block diagram showing the configuration of the control unit.
FIG. 4 is a block diagram showing the specific configuration of the behavior control system of the robot according to the present embodiment.
FIG. 5 is a conceptual diagram explaining the per-user reference volume held in the reference volume storage unit.
FIG. 6 is a flowchart showing the reference volume change processing procedure.
FIG. 7 is a conceptual diagram showing the character string transformation determination table.
FIG. 8 is a flowchart showing the character string transformation processing procedure.
FIG. 9 is a conceptual diagram showing the volume change amount table.
FIG. 10 is a conceptual diagram showing the speed change amount table.
FIG. 11 is a conceptual diagram showing the pause length change amount table.
FIG. 12 is a conceptual diagram showing the phrase-end emphasis determination table.
FIG. 13 is a flowchart showing the utterance change processing procedure.

Explanation of symbols

1 ... robot, 40 ... behavior control system, 41 ... image recognition unit, 42 ... speech recognition unit, 44 ... short-term storage unit, 45 ... action selection control unit, 47 ... utterance character string database, 48 ... speech synthesis unit, 49 ... reference volume storage unit, 50 ... character string transformation determination table, 51 ... utterance character string transformation unit, 52 ... volume change amount table, 53 ... speed change amount table, 54 ... pause length change amount table, 55 ... phrase-end emphasis determination table, S1 ... image signal, S2, S4 ... audio signals, RT1 ... reference volume change processing procedure, RT2 ... character string transformation processing procedure, RT3 ... utterance change processing procedure.

Claims

An utterance control device for controlling the utterances, during dialogue with a user, of equipment having a function for dialogue with the user, the device comprising:
distance detecting means for detecting the distance between the equipment and the user who is the conversation partner; and
utterance form changing means for changing, as necessary, the utterance form of the equipment during dialogue with the user according to the distance between the equipment and the user detected by the distance detecting means.

The utterance control apparatus according to claim 1, wherein the utterance form changing means changes the utterance volume as the utterance form.

The utterance control apparatus according to claim 1, wherein the utterance form changing means changes the utterance speed as the utterance form.

The utterance control apparatus according to claim 1, wherein the utterance form changing means changes the intonation of the utterance as the utterance form.

The utterance control apparatus according to claim 1, wherein the utterance form changing means changes the pause between phrases as the utterance form.

The utterance control apparatus according to claim 5, further comprising:
character string output means for outputting a character string corresponding to the content to be uttered by the device;
voice generation means for generating an audio signal of synthesized speech corresponding to the character string; and
a speaker for outputting sound based on the audio signal,
wherein the utterance form changing means transforms the character string output from the character string output means as a method of changing the pause between phrases.

The utterance control apparatus according to claim 2, further comprising:
user identification means for identifying the user who is the conversation partner;
storage means for storing a reference volume for each user; and
reference volume changing means for changing the reference volume of the user stored in the storage means in response to a request from the user during dialogue,
wherein the utterance form changing means changes the utterance volume based on the reference volume of the user who is the conversation partner stored in the reference volume changing means.

The utterance control apparatus according to claim 2, wherein, when the utterance volume after the change exceeds a predetermined threshold, the utterance form changing means changes another utterance form by the maximum amount.
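
The combination recited in the apparatus claims above (distance detection, a per-user reference volume, and a threshold on the changed volume) can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the distance bands, volume increments, threshold, and maximum change amounts are hypothetical values chosen for illustration, as are all function and variable names. Clamping the volume at the threshold is also an assumption; the claims state only that another utterance form is changed by the maximum amount.

```python
from dataclasses import dataclass

# All numeric values are hypothetical; the claims do not specify them.
MAX_VOLUME = 100          # assumed threshold for the changed volume
MAX_SPEED_CHANGE = -0.3   # assumed maximum slow-down (fraction of normal speed)
MAX_PAUSE_CHANGE = 0.5    # assumed maximum extra pause between phrases (seconds)

# Assumed volume change table: (upper distance bound in metres, volume change).
VOLUME_CHANGE_TABLE = [(1.0, 0), (2.0, 10), (4.0, 25), (float("inf"), 40)]


@dataclass
class UtteranceForm:
    volume: int
    speed_change: float = 0.0
    pause_change: float = 0.0


def volume_change_for(distance_m: float) -> int:
    """Look up the volume change for the detected distance."""
    for upper_bound, change in VOLUME_CHANGE_TABLE:
        if distance_m <= upper_bound:
            return change
    return VOLUME_CHANGE_TABLE[-1][1]


def change_utterance_form(reference_volumes, user_id, distance_m,
                          default_volume=50) -> UtteranceForm:
    """Derive the utterance form from the user's reference volume and distance."""
    base = reference_volumes.get(user_id, default_volume)
    volume = base + volume_change_for(distance_m)
    if volume > MAX_VOLUME:
        # Assumption: clamp the volume and change the other forms by their maximum.
        return UtteranceForm(MAX_VOLUME, MAX_SPEED_CHANGE, MAX_PAUSE_CHANGE)
    return UtteranceForm(volume)
```

For example, with reference volumes `{"alice": 50, "bob": 70}`, a call for `"alice"` at 3.0 m yields a volume of 75 with no other change, while a call for `"bob"` at 5.0 m exceeds the assumed threshold and therefore also applies the maximum speed and pause changes.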

An utterance control method for controlling the utterance of a device having a function for dialogue with a user during dialogue with the user, the utterance control method comprising:
a first step of detecting the distance between the device and the user who is the conversation partner; and
a second step of changing, as needed, the utterance form of the device during dialogue with the user according to the detected distance between the device and the user.

The utterance control method according to claim 9, wherein, in the second step, the utterance volume is changed as the utterance form.

The utterance control method according to claim 9, wherein, in the second step, the utterance speed is changed as the utterance form.

The utterance control method according to claim 9, wherein, in the second step, the intonation of the utterance is changed as the utterance form.

The utterance control method according to claim 9, wherein, in the second step, the pause between phrases is changed as the utterance form.

The utterance control method according to claim 13, wherein the second step includes:
a character string output step of outputting a character string corresponding to the content to be uttered by the device;
a voice generation step of generating an audio signal of synthesized speech corresponding to the character string; and
an audio output step of outputting sound based on the audio signal,
and wherein, in the voice generation step, the output character string is transformed as a method of changing the intonation of the utterance.

The utterance control method according to claim 10, further comprising:
a storage step of storing a reference volume for each user;
a user identification step of identifying the user who is the conversation partner; and
a reference volume changing step of changing the stored reference volume of the user in response to a request from the user during dialogue,
wherein, in the second step, the utterance volume is changed based on the stored reference volume of the user who is the conversation partner.
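
The storage step and reference volume changing step recited above can be sketched as a small per-user table that is adjusted when the identified user asks for a change. The following Python sketch is illustrative only: the request words `"louder"` and `"quieter"`, the step size, the default volume, and the volume range are hypothetical assumptions that do not appear in the claims.

```python
# Hypothetical values; the claims do not specify how much a request changes the volume.
STEP = 5
DEFAULT_REFERENCE_VOLUME = 50
MIN_VOLUME, MAX_VOLUME = 0, 100


def change_reference_volume(reference_volumes: dict, user_id: str, request: str) -> int:
    """Update the stored reference volume of the identified user and return it."""
    current = reference_volumes.get(user_id, DEFAULT_REFERENCE_VOLUME)
    if request == "louder":      # assumed form of a recognized user request
        current += STEP
    elif request == "quieter":   # assumed form of a recognized user request
        current -= STEP
    # Any other request leaves the reference volume unchanged.
    reference_volumes[user_id] = max(MIN_VOLUME, min(MAX_VOLUME, current))
    return reference_volumes[user_id]
```

Because the table is keyed by user, a request from one user changes only that user's reference volume, and later utterances to that user start from the updated value.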

The utterance control method according to claim 10, wherein, in the second step, when the utterance volume after the change exceeds a predetermined threshold, another utterance form is changed by the maximum amount.

A robot apparatus having a function for dialogue with a user, the robot apparatus comprising:
distance detecting means for detecting the distance to the user who is the conversation partner; and
utterance form changing means for changing, as needed, the utterance form during dialogue with the user according to the distance to the user detected by the distance detecting means.

The robot apparatus according to claim 17, wherein the utterance form changing means changes the utterance volume as the utterance form.

The robot apparatus according to claim 17, wherein the utterance form changing means changes the utterance speed as the utterance form.

The robot apparatus according to claim 17, wherein the utterance form changing means changes the intonation of the utterance as the utterance form.

The robot apparatus according to claim 20, further comprising:
character string output means for outputting a character string corresponding to the content to be uttered by the robot apparatus;
voice generation means for generating an audio signal of synthesized speech corresponding to the character string; and
a speaker for outputting sound based on the audio signal,
wherein the utterance form changing means transforms the character string output from the character string output means as a method of changing the intonation of the utterance.

The robot apparatus according to claim 17, wherein the utterance form changing means changes the pause between phrases as the utterance form.

The robot apparatus according to claim 18, further comprising:
user identification means for identifying the user who is the conversation partner;
storage means for storing a reference volume for each user; and
reference volume changing means for changing the reference volume of the user stored in the storage means in response to a request from the user during dialogue,
wherein the utterance form changing means changes the utterance volume based on the reference volume of the user who is the conversation partner stored in the reference volume changing means.

The robot apparatus according to claim 18, wherein, when the utterance volume after the change exceeds a predetermined threshold, the utterance form changing means changes another utterance form by the maximum amount.