JP2024026341A

JP2024026341A - Equipment and programs etc.

Info

Publication number: JP2024026341A
Application number: JP2023208965A
Authority: JP
Inventors: 隆之水野; Takayuki Mizuno; 幹雄島津江; Mikio Shimazue; 裕一梶田; Yuichi Kajita; 勇喜清水; Yuki Shimizu; 昌浩和田; Masahiro Wada; 圭三高橋; Keizo Takahashi; 慶介高橋; Keisuke Takahashi
Original assignee: Yupiteru Corp; Yupiteru Kagoshima Corp
Current assignee: Yupiteru Corp; Yupiteru Kagoshima Corp
Priority date: 2018-01-18
Filing date: 2023-12-12
Publication date: 2024-02-28
Anticipated expiration: 2038-01-18
Also published as: JP2025061062A; JP2019124855A; JP7408105B2; JP2022169645A; JP7130201B2; JP7622956B2

Abstract

To provide a device and a program that improve convenience for a user or another device when communicating with the device.
SOLUTION: A robot 1 has a function of outputting voice and a function of communicating with a user. The robot 1 interacts with the user using a dialogue engine, converts the content of the user's most recent utterance into character string data concurrently with the interaction, and displays the character string data on a touch panel unit 7 serving as a display unit. The user can visually confirm what he or she has said, which helps in correcting or steering subsequent communication.
SELECTED DRAWING: Figure 1

Description

TECHNICAL FIELD
The present invention relates to a device, a program, and the like having, for example, a function of performing communication.

Patent Document 1 discloses a technology relating to an interactive communication robot.

Japanese Patent Application Publication No. 2011-000681

However, conventional communication robots have had the problem of not being sufficiently capable. An object of the present invention is therefore to provide a device, a program, and the like with capabilities superior to those of conventional ones.
The object of the invention of the present application is not limited to this. The applicant also intends to acquire rights, through divisional applications, amendments, and the like, to configurations whose object is to obtain the effects produced by any part of the configurations disclosed in this specification, the drawings, and the like. For example, this specification also discloses the problems obtained by reading each passage described as "can ..." as "... is a problem". Each such problem is described as an independent problem, and the applicant intends to acquire rights, through divisional applications, amendments, and the like, to the configuration for solving each problem on its own. Even where a problem is only implicitly understood from the description, the applicant intends to claim part of the configurations described in this specification by amendment or divisional application. Problems that combine these independent problems are also disclosed.

(1) A device having a function of communicating by outputting output information to at least one of a user and another device preferably has a function of controlling generation of the output information for the communication or a function of controlling the timing at which the output information for the communication is output.
In this way, at least one of the user and the other device can obtain at least one of output information whose generation is controlled and output information whose timing is controlled. A device superior to conventional ones can thus be provided.
As control of the generation of the output information for communication, the device preferably generates, for example, output information that corresponds to the communication. This makes it particularly convenient for the user or the other device to communicate with the device. The communication may be performed by outputting any kind of output information, but it is particularly preferable to store history information of past communication and to perform the communication based on that history information as well. It is also preferable to perform the communication based on history information of communication with a plurality of different users or other devices. The output information may be output from a single output means, but it is preferable to make output possible from a plurality of different output means and to select one of them for the output. The output means may be, for example, audio output means, display means, or communication means. The communication may appeal to hearing, sight, or another of the five senses, such as touch.

It is particularly preferable that the "device" has a function of outputting the output information as voice. In this way, the device can control, for example, a device that operates on voice input. The device preferably has an interface for communication and a determination means for carrying out the communication.
The parts making up the "device" may be housed in a plurality of casings, but are preferably housed in a single casing.
The device may be, for example, a system having a function of accessing a network by wire or wirelessly, such as a smartphone, a tablet terminal, a smart speaker, or a smart camera. The appearance is not limited, but a device that looks as if it communicates with others is particularly preferable. A robot is particularly preferable, especially a robot that imitates a person or an animal or that has some other anthropomorphic form.

The device preferably has an input-side interface for communication. This may be an interface with an input device such as a keyboard, or with optical character recognition (OCR), which reads characters and converts them into data, but a voice-based input interface is preferable. For voice input, the device preferably has a function of acquiring audio data based on an audio signal converted into an electrical signal by, for example, a microphone.
The device preferably has a voice-based output-side interface, such as a speaker device or earphones. It preferably also has a visual output-side interface, for example a display whose content can be changed, such as a liquid crystal display (LCD), a plasma display panel (PDP), an organic EL display, or a cathode ray tube. Output as printed matter may also be provided. It is particularly preferable to provide an output-side interface that generates actual movement, using an actuator such as a motor. In particular, the device is preferably a robot having a member that actually moves, and preferably outputs the output information as movement of that member. It is particularly preferable to provide all three kinds of output-side interface: voice-based, visual, and movement-generating.
The "user" is, for example, a person who can handle the device. There may be one user, but there are preferably several.
The "other device" may be the same as or different from a specific one of the devices described above in, for example, appearance and function. The other device need not have a voice output function, but preferably has one. Likewise, it need not have a voice input function, but preferably has one.
The other device may be a device that cannot access a network, but is preferably one that can, and in particular one that can access the Internet.
The output information is preferably configured to cause the output means to produce a certain output. A "certain output" is, for example, a notification to the outside. Even where the output does not take an obvious form of "notification", the mere fact that it results in some change can be interpreted as a "notification". A "certain output" need not be intended as a notification. For example, it may be an output of sound or light that carries some information or no information at all, a change in some physical quantity, or the movement of an object.

(2) The device preferably has a display function that changes the display mode on a display unit in accordance with the voice communication.

When communicating by voice, the display mode on the display unit is changed in relation to the voice communication, so the voice can be turned into a visible display, which makes communication more convenient.
The "display unit" is preferably a device whose display mode changes in accordance with the voice communication, for example a display device such as a liquid crystal display (LCD), a plasma display panel (PDP), an organic EL display, or a cathode ray tube. The display unit is preferably provided in the device.
"Displaying so as to change the display mode on the display unit in accordance with the voice communication" covers the case where the voice of the user or the other device is displayed on the display unit and its mode changed, and the case where the voice of the device itself is also displayed and its mode changed. Only one of these may be provided, but providing both is particularly preferable. When both are provided, either one or both may be displayed, but it is preferable to provide a state in which only one is displayed and a state in which both are displayed, and to allow switching between them. For example, only one is displayed when the face screen S1 is displayed on the robot 1 of Embodiment 1 below, and both are displayed when the chat screen S2 is displayed on the robot 1 of Embodiment 1 below.
As the display mode, the screen is preferably changed in various ways in relation to the voice. For example, an animation may be run in which an object on the display screen moves in response to the voice; an image may be varied as the voice is output; or another image may be superimposed on an image as the voice changes. Audio data may also be converted into character data and displayed, and the displayed character data is preferably updated moment by moment as the voice changes.
The voice communication is preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information for the communication is output, and particularly preferably by both functions.
The function of changing the display mode on the display unit is likewise preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information for the communication is output, and particularly preferably by both functions.
The same applies from (3) onward: any configuration that produces output from the device is preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information for the communication is output, and particularly preferably by both functions.

(3) The change in the display mode on the display unit is preferably based only on an utterance of at least one of the user and the other device.

When the device changes the display mode on the display unit based on an utterance from the user or the other device, the user or the other device can see whether its utterance is being recognized by the device, which makes communication more convenient. The unusual communication that results from combining different kinds of human interface also feels fresh and entertaining.
The utterance may come directly from the user or the other device, or may be used indirectly, for example after conversion into character data. The utterance may also be converted into some corresponding information, such as another sound or a visualized pattern, and the display mode on the display unit changed based on that information.
Because (3) is "based on an utterance of at least one of the user and the other device", the device itself may have, for example, a voice communication function.
A configuration based on an utterance of at least one of the user and the other device preferably includes, for example, a configuration that converts the voice into a character string with a speech recognition function and thereby identifies the content of the utterance.

(4) The change in the display mode on the display unit is made to alternate with voice output from the device.
Communication does not become one-sided, and the parties can communicate while reliably understanding each other. "Alternating" preferably means, for example, that communication between the device and the user or the other device basically proceeds as a dialogue, rather than one side speaking unilaterally.

(5) The display mode preferably includes character information into which an utterance of at least one of the user and the other device has been converted.

In this way, the user or the other device can tell specifically, from the displayed character information, how its utterance has been recognized by the device, and can judge from the content whether communication is proceeding correctly. The content of the utterance can also be confirmed visually. If the recognition is wrong, the speaker can repeat the utterance or rephrase it to steer the communication back on course.
"Character information" means characters based on the utterance of the user or the other device. In Japanese, for example, these are ordinary kanji, hiragana, katakana, and the like, and when the utterance forms a sentence, the character information is preferably a sentence whose phrases mix kanji, hiragana, katakana, foreign-language notation, and so on. When the utterance is in a foreign language such as English or Chinese, it is preferably displayed in the characters of that language.
For example, it is particularly preferable that the device has a speech recognition function and a voice output function, displays on the display unit the content of the utterance of at least one of the user and the other device as recognized and converted into character information by the speech recognition function, generates reply character information based on that content, converts the reply character information into voice information with a speech synthesis function, and outputs it as voice.

(6) The characters into which the utterance has been converted are preferably displayed on the display unit all at once, covering the entire content from the start to the end of the utterance.
Because the device takes in the whole of an utterance of some length from the user or the other device, a correct response from the device to that utterance can be expected. The speaker can also instantly confirm by sight what he or she has said, which helps in correcting or steering subsequent communication.
The interval from the start to the end of an utterance is preferably set taking into account how long a person can speak in one breath. The end of an utterance may be, for example, the point at which the sound level has stayed, for a predetermined time, at a level regarded as containing no speech. The start of an utterance may be conditioned, for example, on the sound level exceeding a predetermined level, or on detection of an abrupt change in sound level of at least a predetermined magnitude.
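The start and end conditions described for (6) can be illustrated with a minimal Python sketch. It is not the implementation in this specification: the function name `find_utterance`, the per-frame loudness list, and the threshold values are all assumptions chosen for illustration. The utterance starts when the level first exceeds a start threshold and ends once the level has stayed at or below a silence threshold for a set number of consecutive frames.

```python
from typing import List, Optional, Tuple


def find_utterance(levels: List[float],
                   start_level: float = 0.5,
                   silence_level: float = 0.1,
                   silence_frames: int = 3) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first utterance, or None.

    levels holds one loudness value per fixed-length audio frame (assumed).
    The utterance starts at the first frame whose level exceeds start_level
    and ends once the level has stayed at or below silence_level for
    silence_frames consecutive frames. end is exclusive and points at the
    first frame of that closing silence.
    """
    start: Optional[int] = None
    quiet_run = 0
    for i, level in enumerate(levels):
        if start is None:
            if level > start_level:
                start = i
            continue
        if level <= silence_level:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return start, i - silence_frames + 1
        else:
            # A short pause inside the utterance does not end it.
            quiet_run = 0
    # Input ended before the closing silence was long enough.
    return (start, len(levels)) if start is not None else None
```

With these assumed thresholds, a brief dip in level in the middle of speech is kept inside the utterance, so the whole utterance can be converted and displayed at once.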

(7) The device preferably has a function of displaying, on the display unit as character information, a dialogue history of the voice communication between the device and at least one of the user and the other device.

This makes it easy to see how the dialogue between the device and the user or the other device went, which makes voice communication more convenient.
The dialogue history is preferably displayed so that it is clear which party said what. To that end, speech bubbles may be used, for example, to distinguish whose utterance each piece of character information is based on. Past dialogue should preferably be viewable by scrolling the screen. To show that the exchange is a dialogue, different avatar characters are preferably displayed for the device side and for the user or other device side. Displaying the date and time of each utterance alongside it is also preferable, since it prompts the user to recall the past from the dialogue history.
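As a rough illustration of a dialogue history that records who spoke and when, the following Python sketch renders each turn as one text line. The `Turn` class, the `render_history` function, and the speaker labels "user" and "device" are hypothetical names introduced here. A real display would use speech bubbles and avatars rather than plain text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Turn:
    speaker: str         # "user" or "device" (labels assumed for this sketch)
    text: str            # character information for the utterance
    spoken_at: datetime  # date and time of the utterance


def render_history(turns: List[Turn]) -> List[str]:
    """Render each turn as '[date time] speaker: text', oldest first."""
    lines = []
    for turn in turns:
        stamp = turn.spoken_at.strftime("%Y-%m-%d %H:%M")
        # The speaker label stands in for a speech bubble or avatar.
        lines.append(f"[{stamp}] {turn.speaker:>6}: {turn.text}")
    return lines


history = [
    Turn("user", "Good morning", datetime(2018, 1, 18, 9, 0)),
    Turn("device", "How are you?", datetime(2018, 1, 18, 9, 0)),
]
for line in render_history(history):
    print(line)
```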

(8) When displaying the dialogue history as character information, the device preferably has a function of indicating on the display unit that the user or the other device has been replaced and the device's dialogue partner has changed.

When the device's dialogue partner changes, the content of the dialogue changes as well. Displaying an indication that the dialogue partner has changed lets a reader who later views the dialogue history see that the content changes before and after that indication, and therefore where the dialogue breaks.
The device preferably has a function of detecting that the user or the other device has been replaced. The replacement may be detected from a change in the characteristics of the voice, but a configuration that detects it by using a camera to acquire the state of surrounding people or devices is desirable, and a configuration that detects it based on both is particularly preferable.
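The following Python sketch shows one way a marker could be placed in the dialogue history where the dialogue partner changes. It assumes that some detector (voice characteristics, camera, or both) has already assigned a partner identifier to each turn; that detection is not shown. The function name `with_partner_markers` and the marker wording are hypothetical.

```python
from typing import List, Optional, Tuple


def with_partner_markers(turns: List[Tuple[Optional[str], str]]) -> List[str]:
    """Insert a marker line wherever the device's dialogue partner changes.

    Each turn is (partner_id, text). partner_id is None for the device's own
    utterances, which never trigger a marker.
    """
    lines: List[str] = []
    current: Optional[str] = None
    for partner_id, text in turns:
        if partner_id is None:
            lines.append(f"device: {text}")
            continue
        if current is not None and partner_id != current:
            lines.append(f"--- dialogue partner changed: {current} -> {partner_id} ---")
        current = partner_id
        lines.append(f"{partner_id}: {text}")
    return lines
```

A reader scrolling back through the history then sees the marker line at the point where the content of the dialogue is likely to shift.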

(9) The device preferably has a function of displaying, on the display unit of the device, the status of recognition by the speech recognition function of the voice of at least one of the user and the other device.

For example, when the user or the other device is speaking, letting the user or the like know that the device is definitely listening lets them understand that the dialogue is proceeding smoothly.
As a visual function, for example, the color of the display screen or of an object shown on it may be changed according to the degree of speech recognition, a different object may be displayed according to the recognition status, or a quantitative indication may be given, such as a higher number when recognition is good. As an auditory function, for example, a sound may be made louder or quieter, or its timbre changed, according to the degree of speech recognition.
It is particularly preferable to show an anthropomorphic figure on the display unit and to change its facial expression.
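The mapping from recognition status to display can be sketched in Python as follows. The sketch assumes the speech recognition function reports a confidence value between 0 and 1. The thresholds, color names, expression names, and the function name `recognition_display` are illustrative assumptions, not values from this specification.

```python
def recognition_display(confidence: float) -> dict:
    """Map a recognition confidence in [0, 1] to display attributes."""
    # Clamp out-of-range values so the display never shows an invalid score.
    confidence = max(0.0, min(1.0, confidence))
    if confidence >= 0.8:
        color, expression = "green", "smiling"
    elif confidence >= 0.5:
        color, expression = "yellow", "attentive"
    else:
        color, expression = "red", "puzzled"
    # The score is the quantitative indication: higher means better recognition.
    return {"color": color, "expression": expression, "score": round(confidence * 100)}
```

The same value could equally drive an auditory cue, such as the loudness or timbre of a sound.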

(10) The device preferably has a function for performing the communication by producing the voice output using the result of recognizing voice and converting it into a character string, and a function of outputting, as voice, the output content corresponding to a character string when the result of recognizing voice and converting it into a character string contains a part that matches that character string, the character string being stored in a storage means that stores in advance the correspondence between resulting character strings and output content.

If, when the character string obtained by recognizing and converting voice contains a part that matches a character string stored in the storage means, the necessary voice output can be produced within the device alone, the device can respond quickly to an utterance from the user or the other device. In addition, because no connection to an external server is made, connection costs can be reduced.
For example, a large number of built-in scenarios are preferably prepared as the character strings stored in the storage means. A built-in scenario is a scripted dialogue, for example a simple greeting in which the device replies "How are you?" to the user's "Good morning", or a scenario for carrying out a certain process, such as "Open the settings screen" (user), "Are you sure?" (device), "Yes" (user), "OK, I'll open the settings screen" (device). The storage means may be, for example, a ROM or SSD inside a computer, an external SD card or microSD card, or a CD-ROM.

Here, "contains a part that matches a character string" covers both the case of an exact match with a stored character string and the case of a regular expression, in which some parts may differ. A regular expression is a language-processing technique for representing a set of character strings with a single string. For example, with the pattern "××× turn up the volume ××", utterances such as "Yupibo, turn up the volume", "Hey, turn up the volume", and "Please turn up the volume" differ in some parts, but because the essential part matches they are all recognized as expressions meaning "turn up the volume". "Outputting, as voice, the output content corresponding to the character string" preferably means, for example, a voice output such as "Yes, I will turn up the volume" in response to the character string "turn up the volume". Following such a voice output, the device may also perform some process. For example, after saying "Yes, I will turn up the volume", the device preferably actually raises the volume of its own utterances in the rest of the dialogue.
The processing of recognizing voice and converting it into a character string may be performed in the device, or may be performed by sending audio data to a speech recognition server connected to a network and receiving the character string converted by that server. Preferably both are provided, together with a function of deciding which result to use according to the state of the communication and the like.
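The exact-match and regular-expression cases can be illustrated with Python's standard `re` module. This is a minimal sketch under stated assumptions: the table `BUILT_IN_SCENARIOS`, the function `built_in_reply`, and the English phrases (taken from the translated examples) are hypothetical, and the recognized text is assumed to have been produced already by the speech recognition function.

```python
import re
from typing import Optional

# Hypothetical built-in scenarios: (pattern, reply). The first pattern is an
# exact match; the second lets the parts before and after the key phrase vary.
BUILT_IN_SCENARIOS = [
    (re.compile(r"^good morning$", re.IGNORECASE), "How are you?"),
    (re.compile(r".*turn up the volume.*", re.IGNORECASE),
     "Yes, I will turn up the volume."),
]


def built_in_reply(recognized_text: str) -> Optional[str]:
    """Return the stored reply for the first matching scenario, else None."""
    text = recognized_text.strip()
    for pattern, reply in BUILT_IN_SCENARIOS:
        if pattern.search(text):
            return reply
    return None
```

A return value of `None` corresponds to the case where no stored character string matches, so the caller can decide what to do next, such as performing the follow-up process or consulting another source.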

(11) The device preferably has a function of connecting to a server equipped with a dialogue engine and outputting voice data when the result of recognizing voice and converting it into a character string contains no part that matches a character string stored in the storage means that stores in advance the correspondence between resulting character strings and output content.

音声を認識して変換した文字列が音声記憶手段に記憶された文字列と一致する部分がない場合に、対話エンジンを備えるサーバーに接続するため、接続のためのコストが削減で
きる。
サーバーは、例えばインターネット回線を使用して接続する記憶部、制御部としてのコンピュータの機能を有する装置とするとよい。本発明では対話エンジンを備えていることがよい。外部サーバーの場合には例えばＩＤやパスワードや電子認証によって接続可能となる。外部サーバーはクラウドサーバーがよい。サーバーは音声認識エンジンを備え、音声認識エンジンによって音声を文字列データに変換ことができることがよい。変換された文字列データはインターネット回線を使用して装置に送信されることがよい。
対話エンジンを備えるサーバーに接続して音声データを出力する機能は、例えば、音声認識した文字列を対話エンジンに送信し、対話エンジンからその文字列に対応する対話内容を含む文字列を受信して、当該対話内容の文字列を音声合成機能で音声データに変換するとよい。
文字列の音声データへの変換は、装置に備えた音声合成エンジンで行ってもよいが、文字列を音声データに変換する音声認識サーバーに文字列を送信し、当該音声認識サーバーから変身された当該文字列に対応する音声データを受信して行うとよい。 When there is no part of the character string obtained by recognizing and converting the voice that matches the character string stored in the voice storage means, a connection is made to a server equipped with a dialogue engine, thereby reducing the cost for connection.
The server may be, for example, a device that is connected using an Internet line and has computer functions as a storage unit and a control unit. The present invention preferably includes a dialogue engine. In the case of an external server, connection can be made using, for example, an ID, password, or electronic authentication. A cloud server is recommended as the external server. Preferably, the server includes a speech recognition engine, and the speech recognition engine is capable of converting speech into character string data. The converted character string data may be sent to the device using an internet line.
The function of connecting to a server equipped with a dialogue engine and outputting voice data may, for example, send the voice-recognized character string to the dialogue engine, receive from the dialogue engine a character string containing the dialogue content corresponding to that character string, and convert the received character string into voice data using a speech synthesis function.
The conversion of character strings into voice data may be performed by a speech synthesis engine provided in the device, but it is preferably performed by sending the character string to a server that converts character strings into voice data and receiving from that server the voice data corresponding to the character string.
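The flow of item (11), answering from the local table when possible and contacting the dialogue server only when nothing matches, might look like the sketch below. The function `respond`, the table format, and the `remote_engine` callable are assumptions made for illustration; the real server interface is not described in the specification.

```python
from typing import Callable, Dict, Tuple


def respond(text: str,
            local_table: Dict[str, str],
            remote_engine: Callable[[str], str]) -> Tuple[str, str]:
    """Return (reply, source), where source is "local" or "remote"."""
    for key, reply in local_table.items():
        # A partial match with a stored character string is enough.
        if key in text:
            return reply, "local"
    # No stored string matched: only now is the dialogue server contacted,
    # which is what keeps the connection cost down.
    return remote_engine(text), "remote"
```

The returned reply string would then be passed to a speech synthesis step, either on the device or on a server.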

（１２）音声を認識して文字列に変換した結果が、予め前記結果の文字列と出力内容との対応関係を記憶した記憶手段に記憶された文字列と一致する部分があっても、ある条件を満たすことで音声認識エンジンを備えるサーバーに接続して音声データを出力する機能を備えるようにするとよい。 (12) Even if the result of recognizing speech and converting it into a character string has a part that matches a character string stored in a storage means that stores in advance the correspondence between the resulting character string and the output content, It is preferable to provide a function that connects to a server equipped with a speech recognition engine and outputs speech data when conditions are met.

音声を認識して変換した文字列が記憶手段に記憶された文字列と一致する部分がある場合に、ユーザーが予測できるような決まった音声出力をすることは対話の意欲を削ぐことにもなるため、敢えてこのよう外部サーバーに接続することが、より人間的な対話ができることとなりよい。
例えば、ユーザーから「こんにちは」と発話がされ、それを装置側が認識した場合に、本来のシナリオでは「こんにちは、ご機嫌はいかがですか」というように対話をさせるビルトインシナリオであった場合に、そのシナリオを使用せずに外部サーバーに「こんにちは」という音声データをリクエストし、外部サーバーの対話エンジンを使用してその「こんにちは」に対する返答データの作成をリクエストするようにすることがよい。ある条件は例えば何回かに一回の回数や、ランダムなタイミングとするとよい。 If there is a part of the character string that is converted by recognizing the voice that matches the character string stored in the storage device, outputting a fixed voice that the user can predict will also reduce the user's desire to interact. Therefore, connecting to an external server in this way allows for more human-like dialogue.
For example, when the user says "Hello" and the device recognizes it, and the built-in scenario would normally reply "Hello, how are you doing?", it is preferable not to use that scenario but instead to send the "Hello" voice data to an external server and request that the external server's dialogue engine create the reply to that "Hello". The certain condition may be, for example, once every several times or at random timing.
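A "once every several times" condition for item (12) can be sketched with a simple counter. The class name `ScenarioBypass` and the parameter `every_n` are hypothetical; a random condition could be substituted by comparing a random number with a probability instead.

```python
class ScenarioBypass:
    """Decide whether a locally matched utterance should still go to the server."""

    def __init__(self, every_n: int = 3):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        self.count = 0

    def should_use_server(self) -> bool:
        # Count locally matched utterances and bypass the built-in
        # scenario on every n-th one.
        self.count += 1
        return self.count % self.every_n == 0
```

With `every_n=3`, two greetings receive the built-in reply and the third is handed to the external dialogue engine.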

（１３）音声認識後に音声が途切れて無音状態となったことを検知する機能と、音声認識から無音状態となるまでの音声データを記憶する記憶手段と、前記記憶手段に記憶された音声データを無音状態となったタイミングで音声認識エンジンを備えるサーバーに接続して音声データを出力する機能を備えるとよい。 (13) The device preferably has a function of detecting that the voice has broken off into silence after speech recognition started, a storage means that stores the voice data from the start of speech recognition until the silence, and a function of connecting to a server equipped with a speech recognition engine and outputting the voice data stored in the storage means at the moment the silence occurs.

対話においてはしばしば無音状態となることがある。しかし、無音状態となっても外部の音声認識エンジンを備えるサーバーに接続したままでは無用なコストがかかってしまう。そのためこのような前もって音声認識から無音状態となるまでの音声データを記憶手段に記憶させ、リアルタイムではなくその音声データを無音状態となったタイミングで送ることで無音部分の時間分をカットできるため、コストが削減できる。 There is often silence in dialogue. However, even in a silent state, if the device remains connected to a server equipped with an external speech recognition engine, unnecessary costs will be incurred. Therefore, by storing the audio data from speech recognition to silence in advance in a storage means and sending that audio data at the timing of silence rather than in real time, it is possible to cut out the silent portion. Costs can be reduced.
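The buffering in item (13) can be illustrated as follows. This is a minimal sketch: the class `UtteranceBuffer`, the RMS threshold, the frame representation (lists of integer samples), and the `send` callable standing in for the server connection are all assumptions.

```python
from typing import Callable, List, Sequence


class UtteranceBuffer:
    """Hold voiced frames locally and hand them over once silence is detected."""

    def __init__(self, send: Callable[[List[List[int]]], None],
                 threshold: float = 500.0, silence_frames: int = 3):
        self.send = send
        self.threshold = threshold
        self.silence_frames = silence_frames
        self._frames: List[List[int]] = []
        self._quiet = 0

    @staticmethod
    def _rms(frame: Sequence[int]) -> float:
        if not frame:
            return 0.0
        return (sum(s * s for s in frame) / len(frame)) ** 0.5

    def feed(self, frame: Sequence[int]) -> None:
        if self._rms(frame) >= self.threshold:
            # Voiced frame: keep it locally, do not contact the server yet.
            self._frames.append(list(frame))
            self._quiet = 0
        elif self._frames:
            # Quiet frame after speech: it is not stored, only counted.
            self._quiet += 1
            if self._quiet >= self.silence_frames:
                self.send(self._frames)  # one connection per utterance
                self._frames = []
                self._quiet = 0
```

Only voiced frames are stored and sent, so the server never receives (and never bills for) the silent stretches.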

（１４）前記装置は録音機能を備え、所定の音圧レベルの音声の検出によって音声認識エンジンを備えるサーバーに接続して音声データを出力する機能を備えるようにするとよい。
常に外部の音声認識エンジンを備えるサーバーに接続したままでは無用なコストがかかってしまう。これによって無音や無音に近いような対話になっていない場合には接続せずに必要な対話が開始される場合にのみ外部サーバーに接続するため、接続のためのコストが削減できる。
音声認識エンジンを備えるサーバーは、装置ですでに録音済みの過去の所定期間の録音データを受信して、当該録音データに対する文字列を返信するものとしてもよいが、特に、例えばストリーミングデータとしてリアルタイムに音声データを受信して、文字列を返信するタイプのものとするとよい。音声データの受信時間当たり何円という形で従量課金等されるケースが多いが、大幅にコストを削減することが可能となる。 (14) The device preferably has a recording function and a function of connecting to a server equipped with a speech recognition engine and outputting voice data upon detecting voice at a predetermined sound pressure level.
If the device is always connected to a server equipped with an external speech recognition engine, unnecessary costs will be incurred. With this function, the device does not connect while there is silence or near silence and no dialogue is taking place, and connects to the external server only when a necessary dialogue starts, thereby reducing connection costs.
A server equipped with a speech recognition engine may receive recorded data for a predetermined past period that the device has already recorded and return a character string for that recorded data, but it is preferably of a type that receives voice data in real time, for example as streaming data, and returns a character string. Such servers are often billed on a pay-as-you-go basis, at so many yen per unit of time of voice data received, so this makes it possible to reduce costs significantly.
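The level check in item (14) might be sketched as below. Note the assumptions: the level is computed in dBFS (relative to 16-bit full scale) rather than as a calibrated sound pressure level, and the threshold of -30 dBFS and the function names are illustrative only.

```python
import math

FULL_SCALE = 32768.0  # assumed 16-bit signed PCM samples


def level_dbfs(samples):
    """Return the RMS level of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / FULL_SCALE)


def should_connect(samples, threshold_dbfs=-30.0):
    """Connect to the recognition server only when the input is loud enough."""
    return level_dbfs(samples) >= threshold_dbfs
```

The device would call `should_connect` on each recorded frame and open the streaming connection only once it returns true.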

（１５）音声認識エンジンを備えるサーバーに接続して音声データを出力した際に、前記サーバーがビジー状態である場合に、前記ユーザーに対して記憶手段に記憶された対話データから選択された対話例を音声出力する機能を備えるようにするとよい。 (15) The device preferably has a function of outputting to the user, by voice, a dialogue example selected from the dialogue data stored in the storage means when the server is busy at the time the device connects to a server equipped with a speech recognition engine and outputs voice data.

ビジー状態である場合にはその旨の報知をすることが普通であるが、例えば対話途中でそのような報知は唐突でいかにも対話とは関係ない発話であり、対話がしらけてしまう可能性もある。そのため、ビジー状態である旨の報知の代わりに例えば「もう一度いってくれる？」という呼びかけや「ほう、そうですか」などのつなぎの発話をして対話をつなぐようにすれば、その間に音声認識エンジンに接続して適切な対話を続けることが可能となるし、対話が不自然にならない。 When the server is busy, it is normal to give a notification to that effect, but in the middle of a dialogue such a notification is abrupt and unrelated to the conversation, and it may spoil the dialogue. Therefore, if the device instead bridges the gap with a prompt such as "Could you say that again?" or a filler utterance such as "Oh, is that so?", it can connect to the speech recognition engine in the meantime and continue an appropriate dialogue, and the dialogue does not become unnatural.

（１６）認識した前記ユーザーの発話が長すぎると判断した場合に、音声認識エンジンを備えるサーバーに接続することなく記憶手段に記憶された音声データから選択された対話例を音声出力する機能を備えるようにするとよい。 (16) The device preferably has a function of outputting by voice a dialogue example selected from the voice data stored in the storage means, without connecting to a server equipped with a speech recognition engine, when it determines that the recognized utterance of the user is too long.

ユーザー側の発話が長すぎると、音声認識エンジンが誤認識をする可能性がある。そして、その結果的外れな返答が返ってくることがある。そのため、一定以上のセンテンスになってしまった場合には、あえてそのような可能性を排除して対話を仕切り直しするために「うん」とか「マジ？」とか「本当ですか？」などという対話においてどのようにも取れる相づちのような対話例を選択して音声出力することがよく、それによって適切な対話を続けることが可能となる。 If the user's utterance is too long, the speech recognition engine may misrecognize it, and an irrelevant reply may come back as a result. Therefore, when the utterance exceeds a certain length, it is preferable to rule out that possibility and reset the dialogue by selecting and outputting a noncommittal backchannel response such as "Uh-huh," "Seriously?" or "Really?", which fits any conversation, so that an appropriate dialogue can continue.
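Items (15) and (16) both replace a server reply with a stored phrase. A minimal sketch is given below, assuming a hypothetical `ServerBusy` exception raised by the server client, a word-count limit as the measure of "too long" (a character count would suit Japanese text better), and illustrative phrase lists.

```python
import random

FILLERS = ["Could you say that again?", "Oh, is that so?"]   # used when the server is busy
BACKCHANNELS = ["Uh-huh", "Seriously?", "Really?"]           # used for overlong utterances


class ServerBusy(Exception):
    """Hypothetical error raised by the server client when the server is busy."""


def reply(text, ask_server, max_words=30, rng=random):
    # Item (16): an overlong utterance is answered locally, without
    # contacting the server at all.
    if len(text.split()) > max_words:
        return rng.choice(BACKCHANNELS)
    try:
        return ask_server(text)
    except ServerBusy:
        # Item (15): bridge the gap with a filler instead of announcing
        # that the server is busy.
        return rng.choice(FILLERS)
```

After a filler has been spoken, the device can retry the server while the user repeats the utterance.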

（１７）対話による前記コミュニケーションにおいて、前記装置の音声を聞き逃した際に、前記ユーザーのある発話に基づいて前記装置は直前の音声を再度出力するとよい。
例えば「もう一度言って」とか「もう一回しゃべって」のような直前に装置が話した言葉が聞き取れなかったり、うっかり聞き忘れた場合にこのような呼びかけをすることで、直前に装置が話した言葉を発話させることができる。これによって、直前まで行っていた対話を途切れさせることなくそのまま続けることが可能となる。 (17) In the communication through dialogue, when the voice of the device is missed, the device may re-output the previous voice based on a certain utterance of the user.
For example, when the user could not catch, or inadvertently missed, what the device has just said, saying something like "Say it again" or "Speak one more time" makes the device repeat what it said immediately before. This makes it possible to continue the conversation that was under way without interruption.

（１８）音声を認識できなかった場合に、前記ユーザーに対して再度の発話を促すように前記装置から音声が出力されるとよい。
これによって、直前まで行っていた対話を途切れさせることなくそのまま続けることが可能となる。 (18) When the voice cannot be recognized, it is preferable that the device outputs a voice to prompt the user to speak again.
This makes it possible to continue the conversation that was occurring just before without interruption.

（１９）認識した音声内容がある条件を満たす場合に、表示部にある表示をさせるようにするとよい。
例えば、所定の言葉が含まれた発話がされ、それを音声認識した場合に、表示部にその言葉に対応する「ある表示」をさせるようにする。「所定の言葉」とは、例えば、ユーザーの誕生日、ユーザーの子供の名前、装置の愛称、会社の名称、特定の宣伝用のキャッチフレーズ等とするとよい。所定の言葉とある表示との対応関係を予め設定しておく機能を備えるとよい。これによって、単なる対話に留まらず目視を含めたコミュニケーションをすることができ、装置との間でコミュニケーションの態様が増すこととなってコミュニケーションを図る際の利便性が高まる。 (19) When the recognized voice content satisfies a certain condition, a certain display may be displayed on the display unit.
For example, when an utterance containing a predetermined word is made and the speech is recognized, the display section is made to display "a certain display" corresponding to the word. The "predetermined word" may be, for example, the user's birthday, the name of the user's child, the nickname of the device, the name of the company, a specific advertising catchphrase, or the like. It is preferable to have a function of presetting a correspondence between a predetermined word and a certain display. As a result, it is possible to communicate not only by simple dialogue but also by visual inspection, increasing the number of modes of communication with the device, and increasing the convenience of communication.

（２０）前記装置は筐体又は筐体に接続される部分を動かす機能を備え、認識した音声内容がある条件を満たす場合に、筐体又は筐体に接続される部分がある動きをするとよい。
例えば、所定の言葉が含まれた発話がされ、それを音声認識した場合に、筐体又は筐体に接続される部分にその言葉に対応するある動き、例えばジェスチャーをさせるようにする。これによって、単なる対話に留まらず装置の動きを含めたコミュニケーションをすることができ、装置との間でコミュニケーションの態様が増すこととなってコミュニケーションを図る際の利便性が高まる。上記の「ある表示」と組み合わせると特によい。 (20) The device preferably has a function of moving the casing or the part connected to the casing, and when the recognized voice content satisfies a certain condition, the casing or the part connected to the casing may make a certain movement. .
For example, when an utterance containing a predetermined word is made and the speech is recognized, the casing or a part connected to the casing is made to make a certain movement, such as a gesture, corresponding to the word. This allows communication to include not only simple dialogue but also the movement of the device, increasing the number of modes of communication with the device, and increasing the convenience of communication. It is especially good when combined with the above “certain indication”.

（２１）前記装置は前記ユーザーが目として認識できる部分である目部と、前記ユーザーの位置を認識するユーザー位置認識機能と、前記目部を動かす機能とを備え、前記コミュニケーションとして前記位置認識機能で認識した前記ユーザーの位置方向を向くよう前記目部を動かす機能を備えるとよい。 (21) The device preferably includes an eye part, which is a part the user can recognize as eyes, a user position recognition function that recognizes the user's position, and a function of moving the eye part, and preferably has, as the communication, a function of moving the eye part so that it faces the direction of the user's position recognized by the position recognition function.

目として認識できる目部がユーザーの位置方向を向くことで、実際に人と話しているような疑似感覚を得られることとなり、装置とコミュニケーションを取りたいという欲求もますこととなり、装置の利用価値が向上する。
「ユーザーが目として認識できる部分である目部」は表示画面に表示されるオブジェクトとしての目でもよく、そのようなバーチャルな映像ではない実際に機械的に動作する目でもよい。目部と同期して装置自体もユーザーの位置方向を向くよう制御してもよい。目部をユーザーの方向に向けるための装置だけをユーザーの位置方向を向くよう制御してもよい。 When the eyes, which can be recognized as eyes, face the direction of the user's position, the user can get a pseudo-sensation of talking to a person, increasing the desire to communicate with the device, and increasing the value of using the device. will improve.
The ``eyes, which are the parts that the user can recognize as eyes'' may be eyes as objects displayed on a display screen, or may be eyes that actually operate mechanically rather than in such a virtual image. The device itself may also be controlled to face in the direction of the user's position in synchronization with the eyes. Only the device for directing the eyes in the direction of the user may be controlled to direct in the direction of the user's position.

（２２）前記装置はユーザーの顔を認識する顔認識機能を備えるとよい。
個々の人物の顔を識別できるため、個々の個性に応じたコミュニケーションをとることが可能となる。例えば個々の人物の認証された顔と名前を関連付けすることで、対話の際に顔認識した人物をその名前で呼ぶことができる。また、過去の対話履歴に基づいて顔認識した人物に特化した対話を行う構成とすると特によい。 (22) The device preferably has a face recognition function that recognizes the user's face.
Since it is possible to identify the faces of individual people, it becomes possible to communicate in a way that is tailored to each person's individuality. For example, by associating each person's recognized face with their name, it is possible to call the person whose face has been recognized during a conversation by that name. Furthermore, it is particularly preferable to have a configuration in which a dialogue is performed specifically for a person whose face has been recognized based on past dialogue history.

（２３）前記装置は表示部を備え、前記顔認識機能によってユーザーの顔の認識状況を表示部に表示させる機能を備えるとよい。
このようにすればユーザーは自身の顔の装置での認識状況を表示部を見ることで把握できる。特に認識状況として顔認識が完了しているか、それとも未だ人物の顔として認識されていないかを表示させるとよく、このようにすれば、ユーザーは未だ認識が完了していなければなるべく装置が認識しやすいように顔を動かさないようにして協力することができる。 (23) The device preferably includes a display unit, and has a function of displaying the recognition status of the user's face on the display unit using the face recognition function.
In this way, the user can see from the display unit how the device is recognizing his or her face. In particular, it is preferable to display, as the recognition status, whether face recognition has been completed or the face has not yet been recognized as a person's face. If recognition is not yet complete, the user can cooperate by keeping his or her face still so that the device can recognize it more easily.

（２４）前記位置認識機能は、三角形の頂点に配置された３つのマイクロフォンと、音源から前記３つのマイクロフォンの各々までの音の到達時間の差に基づき、前記音源の位置を、前記三角形を含む平面に垂直な方向に沿って前記三角形を含む平面に投影した位置から前記平面の前記三角形で囲まれた領域の内側にある基準位置へ向かう音源方向を特定する特定部と、を備える音源方向特定機能であるとよい。
これによって３つのマイクロフォンで音源方向を特定することができる。そして、音源方向を特定することができれば、ユーザーが発話すればその方向に装置を向けさせることができるため、対話によるコミュニケーションをしているようにユーザーは感じることができる。 (24) The position recognition function is preferably a sound source direction identification function comprising three microphones placed at the vertices of a triangle, and an identification unit that, based on the differences in arrival time of sound from a sound source to each of the three microphones, identifies the sound source direction running from the position obtained by projecting the sound source position onto the plane containing the triangle, along the direction perpendicular to that plane, toward a reference position located inside the region enclosed by the triangle in that plane.
This allows the three microphones to specify the direction of the sound source. If the direction of the sound source can be specified, the device can be directed in that direction when the user speaks, allowing the user to feel as if they are communicating through dialogue.
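One way to obtain an in-plane direction from three arrival times is sketched below. It is not the method of the specification: it assumes a far-field (plane-wave) source, a speed of sound of 343 m/s, microphone coordinates given in metres in the plane of the triangle, and it returns the bearing from the array toward the source, which is the reverse of the source-to-reference direction described in item (24).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def source_azimuth(mics, times, c=SPEED_OF_SOUND):
    """Return the in-plane bearing of the source in degrees.

    mics  : three (x, y) microphone positions in the plane of the triangle
    times : arrival time of the same sound at each microphone, in seconds
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    t0, t1, t2 = times
    # Plane-wave model: t_i - t_0 = -((p_i - p_0) . u) / c, where u is the
    # projection onto the plane of the unit vector pointing at the source.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1 = -c * (t1 - t0)
    b2 = -c * (t2 - t0)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    # Solve the 2x2 linear system for u by Cramer's rule.
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux))
```

Only the two time differences matter, so a common clock offset in the three arrival times does not affect the result; a source above the plane shortens the projected vector but leaves its bearing unchanged.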

（２５）前記装置は赤外線リモコン信号出力部を備え、前記コニュニケーションは赤外線リモコン受信機能を備える前記他の機器との間のコミュニケーションであるとよい。
これによって装置の赤外線リモコン信号出力部を介して簡単に機器との間のコミュニケーションを取ることができる。
例えば、赤外線リモコン信号受信部を備えた受信側装置、例えば、赤外線リモコン信号受信部を備えた受信側装置、例えばテレビ、オーディオ装置、エアコン装置等に対して装置から赤外線リモコン信号を出力して例えばＯＮ・０ＦＦ等の制御を実行させることが可能となる。特に装置に音声対話機能を備え、例えば「テレビつけて」とか「テレビ消して」という命令語句の発話に対し、装置はその命令に基づいて赤外線リモコン信号出力部を制御する構成とよい。 (25) The device preferably includes an infrared remote control signal output section, and the communication is preferably communication with the other device having an infrared remote control receiving function.
This makes it possible to easily communicate with the device via the infrared remote control signal output section of the device.
For example, an infrared remote control signal is output from the device to a receiving device equipped with an infrared remote control signal receiving section, such as a television, an audio device, an air conditioner device, etc. It becomes possible to execute control such as ON/OFF. In particular, it is preferable that the device is equipped with a voice interaction function, so that when a command phrase such as "turn on the TV" or "turn off the TV" is uttered, the device controls the infrared remote control signal output section based on the command.

（２６）前記装置は前記他の機器からのインターネットを介して遠隔操作されるようにするとよい。
他の機器から装置を遠隔操作できるため、装置の利便性が高まる。例えば、他の機器としてのスマートフォン等とするとよい。装置にはカメラを備えるとよい。装置にはカメラの向きを変える機構を備えるとよい。他の機器からアクセスして、例えば、装置側のカメラ動画を見たり、カメラの向きを代えたりすることがよい。これによって装置の近くにいなくとも装置の制御が可能となる。また、例えばスマートフォンからアクセスして、例えば装置の見守り機能をＯＮとして、人（物）が動いたことをスマートフォンにｅメールで通報するようにするとよい。また、例えば病人や被介護者の見守りとして、常に動いていることを前提とし、例えば一定時間以上その人が動いていない場合に通報するようにするとよい。他の機器とは、例えばタブレット端末やパソコン等でもよい。 (26) The device may be remotely controlled via the Internet from the other device.
The convenience of the device is increased because it can be remotely controlled from another device, for example a smartphone. The device may be equipped with a camera and with a mechanism for changing the direction of the camera. The other device may access the device to, for example, view video from the device's camera or change the direction of the camera. This makes it possible to control the device without being near it. In addition, the user may, for example, access the device from a smartphone, turn on the device's monitoring function, and have the device report by e-mail to the smartphone that a person (or object) has moved. Alternatively, for watching over a sick person or a person receiving care, on the premise that the person is normally moving, the device may send a report when the person has not moved for a certain period of time or longer. The other device may also be, for example, a tablet terminal or a personal computer.

（２７）装置は前記他の機器からインターネットを介して送信された文字情報を用いて前記音声出力を行うようにするとよい。
例えば受信した電子メールの文字列を読み上げる機能を備えるとよい。誰かからのｅメールが届く設定にしておくことで、そのメール内容が装置から音声出力されるため、自身の端末を目視で確認する必要がなくなる。他の機器とは、例えばスマートフォンやタブレット端末やパソコン等とするとよいが、他の「装置」としてもよい。 (27) Preferably, the device outputs the audio using text information transmitted from the other device via the Internet.
For example, it may be provided with a function to read out the character strings of received e-mails. By setting the device to receive e-mail from someone, the contents of the e-mail will be output audibly from the device, eliminating the need to visually check one's own terminal. Other devices may be, for example, smartphones, tablet terminals, personal computers, etc., but may also be other "devices."

（２８）前記音声出力は前記文字情報の内容によって前記音声出力を行う時間、時刻又は回数を変更できるとよい。
例えば「薬飲んだ」とういうメールは決まった時刻にしゃべらせたい。例えば件名の記載が合致することで、所定の時刻に装置が発話したり、例えば重要な内容を時間を空けて２回発話させるようにすれば、装置側の近くにいるユーザーにメール内容を間違いなく実行させることができる。 (28) It is preferable that the time, time, or number of times the audio output is performed can be changed depending on the content of the text information.
For example, an e-mail asking "Did you take your medicine?" should be spoken at a set time. For example, by having the device speak at a predetermined time when the subject line matches, or by having it speak important content twice with an interval in between, the user near the device can be made to act on the e-mail content without fail.
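The subject-based scheduling in item (28) could be sketched as follows. The keyword table `RULES`, its keywords, times, and intervals, and the function `speech_times` are hypothetical examples, not values from the specification.

```python
from datetime import datetime, time, timedelta

# Hypothetical rules: subject keyword -> (fixed time of day or None, repeats, gap)
RULES = {
    "medicine": (time(8, 0), 1, timedelta(0)),
    "important": (None, 2, timedelta(minutes=10)),
}


def speech_times(subject, received):
    """Return the datetimes at which a received message should be read aloud."""
    for keyword, (at, repeats, gap) in RULES.items():
        if keyword in subject.lower():
            start = received
            if at is not None:
                # Speak at the fixed time of day, today or else tomorrow.
                start = datetime.combine(received.date(), at)
                if start < received:
                    start += timedelta(days=1)
            return [start + i * gap for i in range(repeats)]
    # No rule matched: read the message out as soon as it arrives.
    return [received]
```

A message whose subject contains "important" is thus read out immediately and again ten minutes later.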

（２９）前記装置は音声認識した文字情報を前記他の機器へインターネットを介して送信する機能を有するとよい。
装置の音声コミュニケーション機能を使用して音声を文字化して他の機器に文字データとして送れば、例えばｅメールを送りたい場合に自身の端末に手入力しなくとも、送ることができる。 (29) The device preferably has a function of transmitting voice-recognized character information to the other device via the Internet.
By using the voice communication function of the device to convert voice into text and sending it as text data to another device, for example, if you want to send an e-mail, you can send it without manually inputting it into your own terminal.

（３０）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、最も長い前記出力文字列を選択して対話させる機能を有するとよい。
最も長い返答であると、いかにも対話しているように感じ、対話の単調さがなくなり、聞き手（ユーザー）は対話を楽しむことができる。 (30) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and selecting the longest output character string for use in the dialogue.
With the longest response, the exchange feels like a real conversation, the dialogue is less monotonous, and the listener (user) can enjoy it.

（３１）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、語尾に疑問符がついた前記出力文字列を選択して対話させる機能を有するとよい。
語尾に疑問符がつくと、その疑問に更に答えるような話の流れになるため、会話が続きやすくなり聞き手（ユーザー）は対話を楽しむことができる。 (31) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and selecting an output character string that ends with a question mark for use in the dialogue.
When a response ends with a question mark, the conversation naturally moves on to answering that question, so the conversation continues more easily and the listener (user) can enjoy the dialogue.

（３２）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、肯定文を組み合わせた後に疑問文を組み合わせて対話させる機能を有するとよい。
このようにアレンジすることでいかにも考えて文章を練ったような応答になるため、ユーザーは真剣に自身の発話を聞いてもらっているような感覚となり、続けて会話をしたいと思うようになるため、会話が続きやすくなり聞き手（ユーザー）は対話を楽しむことができる。また、出力尺をかせぐことができるとともに聞き手（ユーザー）への返答を求めることができる。 (32) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and combining them for the dialogue so that affirmative sentences come first and interrogative sentences follow.
Arranging the responses in this way makes the reply sound carefully thought out, so the user feels that his or her utterances are being listened to seriously and wants to keep talking; the conversation continues more easily and the listener (user) can enjoy the dialogue. It also lengthens the output and invites a reply from the listener (user).
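The selection rules of items (30) to (32) can be sketched as plain functions over the list of strings returned by the dialogue engines. The function names are hypothetical, and treating every string that does not end in a question mark as an affirmative sentence is a simplifying assumption.

```python
def is_question(text):
    """True when the text ends with a half-width or full-width question mark."""
    return text.rstrip().endswith(("?", "？"))


def pick_longest(outputs):
    """Item (30): use the longest output string."""
    return max(outputs, key=len)


def pick_question(outputs):
    """Item (31): use the first output that ends with a question mark, if any."""
    for text in outputs:
        if is_question(text):
            return text
    return None


def statements_then_questions(outputs):
    """Item (32): join the non-question outputs first, then the questions."""
    statements = [t for t in outputs if not is_question(t)]
    questions = [t for t in outputs if is_question(t)]
    return " ".join(statements + questions)
```

For the outputs "Nice.", "What did you eat?" and "That sounds like a good day.", the combined reply places both statements first and ends with the question, which hands the turn back to the user.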

（３３）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、話題転換した文字列を最後に配置するように組み合わせて対話させる機能を有するとよい。
このようにアレンジすることで話題転換したことで次の発話を誘うような対話となり、対話が続きやすくなる。 (33) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and combining them for the dialogue so that a character string that changes the topic is placed last.
By arranging the conversation in this way, changing the topic creates a conversation that invites the next utterance, making it easier to continue the conversation.

（３４）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、フレンドリーな前記出力文字列を初めに配置するように組み合わせて対話させる機能を有するとよい。
このようにアレンジすることで聞き手（ユーザー）が対話に引き込まれやすくなり、対話が続きやすくなる。 (34) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and combining them for the dialogue so that a friendly output character string is placed first.
Arranging things in this way makes it easier for the listener (user) to be drawn into the conversation, making it easier to continue the conversation.

（３５）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、それらをランダムな順で組み合わせて対話させる機能を有するとよい。
対話のバリエーションが増えることとなるため、聞き手（ユーザー）が同じ発話をした場合でもまったく同じ応答が帰ってきてしまうことがなくなり、対話に飽きることがなく対話が続きやすくなる。 (35) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and combining them in random order for the dialogue.
This increases the variety of dialogue, so even if the listener (user) makes the same utterance, they will no longer receive exactly the same response, making it easier to continue the dialogue without getting bored with it.

（３６）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、それらの内に顔文字を含む前記出力文字列がある場合には、対話対象とせず、表示部には対話対象とされた前記出力文字列と一緒に表示させる機能を有するとよい。
顔文字は音声出力できないが、表示部に敢えて顔文字を表示させることで、音声と併せて対話の一部とすることで通常にはない対話のおもしろさを創出することができる。 (36) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and, when any of them contains an emoticon, excluding that output character string from the spoken dialogue and displaying it on the display unit together with the output character strings used for the dialogue.
Although emoticons cannot be output as audio, by deliberately displaying emoticons on the display and making them part of the dialogue along with the audio, it is possible to create an unusually interesting dialogue.

（３７）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、同じ文字列が含まれる前記出力文字列同士についてはいずれか１つのみを選択して他の前記出力文字列と組み合わせて対話させる機能を有するとよい。
同じ文字列が繰り返されると対話がくどくなってしまうし、聞き手に違和感を覚えさせてしまうためである。 (37) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, and, among output character strings that contain the same character string, selecting only one and combining it with the other output character strings for the dialogue.
This is because if the same string of characters is repeated, the dialogue becomes tedious and the listener feels uncomfortable.
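Items (36) and (37) both filter the list of engine outputs before it is spoken. The sketch below makes two simplifying assumptions: emoticons are detected against a small hypothetical list `EMOTICONS`, and "the same character string" is narrowed to outputs that are identical after trimming whitespace and ignoring case.

```python
EMOTICONS = ("(^_^)", "(T_T)", ":-)", ":)")  # hypothetical, deliberately short list


def split_for_speech(outputs):
    """Item (36): return (spoken, display_only) lists of output strings."""
    spoken, display_only = [], []
    for text in outputs:
        if any(e in text for e in EMOTICONS):
            display_only.append(text)  # shown on the display unit, not spoken
        else:
            spoken.append(text)
    return spoken, display_only


def drop_repeats(outputs):
    """Item (37): keep only the first of any outputs that repeat each other."""
    seen = set()
    kept = []
    for text in outputs:
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

The display unit would show both lists together, while only the first list is passed to speech synthesis.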

（３８）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、前記出力文字列の語尾を語尾変換エンジンによって変換してから組み合わせて対話させる機能を有するとよい。
普通の対話エンジンの文章に比べて、より親しみやすい表現となるのでよい。 (38) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, converting the sentence endings of the output character strings with a sentence-ending conversion engine, and then combining them for the dialogue.
This is good because the expressions are more approachable than the sentences produced by ordinary dialogue engines.

（３９）入力文字列に対応する出力文字列を出力する対話エンジンを備える異なる複数のサーバーに対して音声認識した文字列を前記入力文字列として送信し、前記異なる複数のサーバーから前記出力文字列として出力された文字列を受信し、すべての前記出力文字列を使用せずに一部の前記出力文字列を記憶手段に記憶させておき、以後の対話で前記記憶手段から取り出して対話に使用させる機能を有するとよい。
音声認識が失敗した場合や、外部サーバーからのレスポンスがなかなか来ない場合に使用することで、対話が途切れずにつなげることができ、自然な対話に寄与する。 (39) The device preferably has a function of sending the voice-recognized character string, as an input character string, to a plurality of different servers each equipped with a dialogue engine that outputs an output character string corresponding to an input character string, receiving the character strings output by those servers as output character strings, storing some of the output character strings in a storage means instead of using all of them, and retrieving them from the storage means for use in a later dialogue.
By using it when voice recognition fails or when a response from an external server is slow to arrive, it allows the conversation to continue without interruption, contributing to natural dialogue.
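The storing and later reuse of unused outputs in item (39) can be sketched with a bounded queue. The class `SpareReplies`, the choice to use the first output immediately, and the queue size are assumptions for illustration.

```python
from collections import deque


class SpareReplies:
    """Use one engine output now and keep the others for later turns."""

    def __init__(self, maxlen: int = 20):
        # The oldest spares are discarded once the queue is full.
        self._spares = deque(maxlen=maxlen)

    def choose(self, outputs):
        """Return the reply to speak now, storing the unused outputs."""
        if not outputs:
            # Recognition failed or no server answered in time.
            return self.fallback()
        first, *rest = outputs
        self._spares.extend(rest)
        return first

    def fallback(self):
        """Return a stored reply, or None when nothing is stored."""
        return self._spares.popleft() if self._spares else None
```

When recognition fails or the servers are slow, `choose([])` falls back to a reply stored in an earlier turn, so the dialogue does not stall.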

（４０）音声認識エンジンを備えるサーバーを利用する際に料金が無料のサーバーと有料のサーバーをミックスして利用するとよい。
これによって例えば特に装置との対話のヘビーユーザーはサーバー接続料金を節約することができる。 (40) When using a server equipped with a speech recognition engine, it is recommended to use a mix of free servers and paid servers.
This allows, for example, especially heavy users of device interaction to save on server connection fees.

（４１）前記他の機器はスマートスピーカであり、前記装置は前記スマートスピーカに音声出力を行って前記スマートスピーカとコミュニケーションを行うようにするとよい。
スマートスピーカは表示部がないため、装置と組み合わせて使用することで利便性が高まる。
スマートスピーカとは、例えば無線通信接続機能と音声操作のアシスタント機能を持つスピーカーとするとよい。例えばGoogleHome、AmazonEcho、LINE Clova等とするとよい。スマートスピーカは例えば様々な機能・能力（スキル）を実現する機能を備えるものとするとよい。て音声でのコミュニケーションをする装置からスマートスピーカに対して発話することでその機能を実行させることができる。装置から発話させる際には、例えばユーザーがスマートスピーカのスキルを起動させるフレーズを発話し、装置がその発話を音声認識して文字列データとして保存し、あるタイミングでその文字列データを音声合成してスマートスピーカに対して発話してスキルを実行させる構成とするとよい。 (41) Preferably, the other device is a smart speaker, and the device communicates with the smart speaker by outputting audio to the smart speaker.
Since smart speakers do not have a display, they can be used in combination with other devices for added convenience.
The smart speaker may be, for example, a speaker that has a wireless communication connection function and a voice operation assistant function. For example, GoogleHome, AmazonEcho, LINE Clova, etc. may be used. For example, the smart speaker may be equipped with functions that realize various functions and abilities (skills). By speaking to the smart speaker from a voice communication device, the smart speaker can perform its functions. When making a speech from a device, for example, the user utters a phrase that activates a smart speaker skill, the device recognizes the speech, saves it as string data, and at a certain timing synthesizes the string data into speech. It is recommended that the skill be executed by speaking to the smart speaker.

（４２）前記他の機器はスマートスピーカであり、前記装置は前記スマートスピーカに音声出力を行って前記スマートスピーカとコミュニケーションを行うようにするとよい。
スマートスピーカを起動させたり、スキルを起動させる。あるいは問い合わせを行う。自らスマートスピーカを起動させなくとも、ある決まったタイミングや、ある予定時刻にスマートスピーカのスキルを自動的に実行させることが可能となる。 (42) Preferably, the other device is a smart speaker, and the device communicates with the smart speaker by outputting audio to the smart speaker.
The device activates the smart speaker, launches a skill, or makes an inquiry. Even if the user does not activate the smart speaker personally, a smart speaker skill can be executed automatically at a certain fixed timing or at a scheduled time.
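The flow described in (41) and (42), in which a recognized phrase is saved as character string data and spoken to the smart speaker later, could be sketched as follows. The class names `StoredPhrase` and `SkillScheduler` and the example phrase are hypothetical, and speech synthesis itself is left out: `due` only returns the texts that the device would synthesize and speak at that moment.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoredPhrase:
    text: str              # character string data obtained by speech recognition
    run_at: dt.datetime    # scheduled time at which the device should speak it
    done: bool = False


@dataclass
class SkillScheduler:
    phrases: List[StoredPhrase] = field(default_factory=list)

    def store(self, recognized_text: str, run_at: dt.datetime) -> None:
        """Save a phrase the user spoke, to be replayed at run_at."""
        self.phrases.append(StoredPhrase(recognized_text, run_at))

    def due(self, now: dt.datetime) -> List[str]:
        """Return phrases whose time has come, marking each as spoken once."""
        ready = [p for p in self.phrases if not p.done and p.run_at <= now]
        for p in ready:
            p.done = True
        return [p.text for p in ready]
```

A caller would poll `due(now)` periodically and pass each returned text to whatever speech synthesis engine the device uses.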

（４３）前記他の機器はスマートスピーカであり、前記装置は前記スマートスピーカの音声出力を翻案する翻案機能を有するとよい。
質問に対するスマートスピーカの決まった長い回答を聞くのが面倒であったりする場合や、内容をざっと再確認したい場合に便利である。 (43) Preferably, the other device is a smart speaker, and the device has an adaptation function that adapts the audio output of the smart speaker.
This is useful if you find it tedious to listen to the smart speaker's long, predetermined answers to questions, or if you want to quickly review the content.

（４４）前記装置の音声出力はＷｅｂ記事の読み上げる機能を有するとよい。
Ｗｅｂ記事を読まなくとも装置との対話のみで聞くことができる。 (44) The audio output of the device preferably has a function of reading out web articles.
Even if you don't read the web article, you can listen to it just by interacting with the device.

（４５）前記ある出力としてロボティクスプロセスオートメーションの所定の処理単位の実行が完了した時点でなされるようにするとよい。
ロボティクスプロセスオートメーションは処理単位の実行状況がわかりにくいが、装置に処理単位の実行に応じた「ある出力」をさせることで処理状況がわかりやすくなり利便性が高まる。 (45) Preferably, the certain output is produced when execution of a predetermined processing unit of robotics process automation is completed.
With robotics process automation, the execution status of each processing unit is hard to see, but having the device produce a "certain output" in accordance with the execution of a processing unit makes the processing status easier to follow and increases convenience.

（４６）前記ある出力とは報知動作とするとよい。
報知動作によってある出力がされたことがわかることとなる。
（４７）ロボティクスプロセスオートメーションの実行中のコンピュータがユーザーからの入力待ち状態となった場合に、報知動作を行うようにした。
これによって入力待ち状態となったことをユーザーに報せ、次の処理を促すことが可能となる。 (46) The certain output may be a notification operation.
The notification operation lets the user know that the certain output has been produced.
(47) When a computer that is executing robotics process automation enters a state of waiting for input from the user, a notification operation is performed.
This notifies the user that the computer is waiting for input and prompts the user to carry out the next step.

（４８）ロボティクスプロセスオートメーションを行うクライアントコンピュータと、前記クライアントコンピュータに対してロボティクスプロセスオートメーションの実行指示を与えるサーバーコンピュータとを備え、前記クライアントコンピュータに前記サーバーコンピュータからの指示があった場合、報知動作を行うようにするとよい。
これによってサーバーコンピュータからの指示があったことをユーザーに報せ、次の処理を促すことが可能となる (48) Preferably, a client computer that performs robotics process automation and a server computer that instructs the client computer to execute robotics process automation are provided, and a notification operation is performed when the client computer receives an instruction from the server computer.
This makes it possible to notify the user that there has been an instruction from the server computer and prompt the user to proceed with the next process.

（４９）ロボティクスプロセスオートメーションを実行しているコンピュータの方向を指し示す動作を行なう前記出力情報を生成するとよい。
これによってどのコンピュータにおいて実行が行われたのかをユーザーがわかることとなり、ユーザーに次の処理を促すことが可能となる。
（５０）ロボティクスプロセスオートメーションの実行状態に応じて異なる前記出力情報を生成するものとするとよい。
これによってどのような実行が行われたかを区別することができる。 (49) Preferably, the output information is generated to perform an action pointing in the direction of a computer executing robotics process automation.
This allows the user to know on which computer the execution was performed, making it possible to prompt the user to proceed with the next process.
(50) The output information may be generated differently depending on the execution state of the robotics process automation.
This allows you to distinguish what kind of execution was performed.
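Items (45) to (50) amount to mapping the execution state of robotics process automation (RPA) to a distinct notification, optionally with a pointing motion toward the computer concerned. The sketch below illustrates that mapping only; the event names, the message texts, the light colors, and the `point_to_deg` field are hypothetical and are not taken from the specification.

```python
from enum import Enum
from typing import Dict, Optional


class RpaEvent(Enum):
    UNIT_COMPLETED = "unit_completed"          # (45) a processing unit finished
    WAITING_FOR_INPUT = "waiting_for_input"    # (47) computer waits for the user
    SERVER_INSTRUCTION = "server_instruction"  # (48) server instructed the client


# Hypothetical notification outputs, one per execution state as in (50).
NOTIFICATIONS: Dict[RpaEvent, Dict[str, object]] = {
    RpaEvent.UNIT_COMPLETED: {"speech": "A processing step has finished.", "led": "green"},
    RpaEvent.WAITING_FOR_INPUT: {"speech": "The computer is waiting for your input.", "led": "yellow"},
    RpaEvent.SERVER_INSTRUCTION: {"speech": "An instruction arrived from the server.", "led": "blue"},
}


def notification_for(event: RpaEvent, computer_bearing_deg: Optional[float] = None) -> Dict[str, object]:
    """Build the output information for an RPA event.

    If the bearing of the computer is known, add a pointing target as in (49).
    """
    out = dict(NOTIFICATIONS[event])
    if computer_bearing_deg is not None:
        out["point_to_deg"] = computer_bearing_deg % 360
    return out
```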

（５１）（１）～（５０）のいずれかに記載の装置の機能をコンピュータに実現させるためのプログラム。
「ある出力」など「ある」と記載した部分は例えば「所定の」とするとよい。
上述した（１）から（５０）に示した発明は、任意に組み合わせることができる。例えば、（１）に示した発明の全てまたは一部の構成に、（２）以降の少なくとも１つの発明の少なくとも一部の構成を加える構成としてもよい。特に、（１）に示した発明に、（２）以降の少なくとも１つの発明の少なくとも一部の構成を加えた発明とするとよい。また、（１）から（５０）に示した発明から任意の構成を抽出し、抽出された構成を組み合わせてもよい。本願の出願人は、これらの構成を含む発明について権利を取得する意思を有する。また「～の場合」「～のとき」という記載があったとしてもその場合やそのときに限られる構成として記載はしているものではない。これらの場合やときでない構成についても開示しているものであり、権利取得する意思を有する。また順番を伴った記載になっている箇所もこの順番に限らない。一部の箇所を削除したり、順番を入れ替えた構成についても開示しているものであり、権利取得する意思を有する。 (51) A program for causing a computer to implement the functions of the device according to any one of (1) to (50).
For example, a part written as "certain" such as "certain output" may be written as "predetermined".
The inventions shown in (1) to (50) above can be combined arbitrarily. For example, at least part of the configuration of at least one of the inventions from (2) onward may be added to all or part of the configuration of the invention shown in (1). In particular, an invention in which at least part of the configuration of at least one of the inventions from (2) onward is added to the invention shown in (1) is preferable. Further, arbitrary configurations may be extracted from the inventions shown in (1) to (50) and the extracted configurations may be combined. The applicant of this application intends to acquire rights to inventions containing these configurations. Furthermore, even where a passage reads "in the case of..." or "when...", it is not written as a configuration limited to that case or time. Configurations that do not involve those cases or times are also disclosed, and the applicant intends to acquire rights to them. Likewise, passages written in a particular order are not limited to that order. Configurations in which some parts are deleted or the order is changed are also disclosed, and the applicant intends to acquire rights to them.

ユーザーや他の機器とコミュニケーションを取る際に、装置はコミュニケーションに応じた出力情報を生成することができる。そのため、ユーザー又は他の機器において装置とのコミュニケーションを図る際の利便性が高まる。
本願の発明の効果はこれに限定されず、本明細書および図面等に開示される構成の部分から奏する効果についても開示されており、当該効果を奏する構成についても分割出願・補正等により権利取得する意思を有する。例えば本明細書において「～できる」と記載した箇所などは奏する効果を明示する記載であり、また「～できる」と記載がなくとも効果を示す部分が存在する。またこのような記載がなくとも当該構成よって把握される効果が存在する。 When communicating with a user or other device, the device can generate output information in response to the communication. Therefore, it is more convenient for users or other devices to communicate with the device.
The effects of the invention of the present application are not limited to these. Effects obtained from parts of the configurations disclosed in this specification, the drawings, and the like are also disclosed, and the applicant intends to acquire rights to configurations that achieve those effects through divisional applications, amendments, and the like. For example, passages in this specification written as "can ..." explicitly state an effect, and there are also passages that indicate an effect without such wording. Even where there is no such description, there are effects that can be understood from the configuration itself.

本発明にかかる実施の形態１のロボットの正面図。 FIG. 1 is a front view of a robot according to Embodiment 1 of the present invention.
同じ実施の形態１のロボットの側面図。 FIG. 2 is a side view of the robot of Embodiment 1.
同じ実施の形態１のロボットの背面図。 FIG. 3 is a rear view of the robot of Embodiment 1.
ロボットの電気的構成を説明するブロック図。 FIG. 4 is a block diagram illustrating the electrical configuration of the robot.
ロボットの顔面部に表示される顔画面のある表情の一例を捉えた説明図。 FIG. 5 is an explanatory diagram showing an example of an expression on the face screen displayed on the robot's face section.
ロボットの顔面部に表示されるチャット画面のあるチャット状態の一例を捉えた説明図。 FIG. 6 is an explanatory diagram showing an example of a chat state on the chat screen displayed on the robot's face section.
ロボットの顔面部に表示される顔画面を背景とする待ち受け画像を説明する説明図。 FIG. 7 is an explanatory diagram illustrating a standby image with the face screen as its background, displayed on the robot's face section.
ロボットの顔面部に表示されるチャット画面を背景とする待ち受け画像を説明する説明図。 FIG. 8 is an explanatory diagram illustrating a standby image with the chat screen as its background, displayed on the robot's face section.
（ａ）～（ｄ）はロボットの顔面部に表示される目オブジェクトの変形パターンを説明する説明図。 FIGS. 9(a) to 9(d) are explanatory diagrams illustrating deformation patterns of the eye objects displayed on the robot's face section.
（ａ）～（ｃ）はロボットの顔面部にユーザーの発話内容が文字列として徐々に表れてくる様子を説明する説明図。 FIGS. 10(a) to 10(c) are explanatory diagrams illustrating how the content of the user's utterance gradually appears as a character string on the robot's face section.
ロボットの顔面部に表示される顔画面において目オブジェクトがユーザーの顔を追って移動している状態を説明する説明図。 FIG. 11 is an explanatory diagram illustrating a state in which the eye objects on the face screen move to follow the user's face.
スマートフォンの一例を説明する説明図。 FIG. 12 is an explanatory diagram illustrating an example of a smartphone.
ロボットの起動～ウェイクアップモード～対話モード～スリープモードの関係を説明する説明図。 FIG. 13 is an explanatory diagram illustrating the relationship among robot startup, wake-up mode, dialogue mode, and sleep mode.
実施の形態７においてロボットとスマートスピーカの関係を説明する説明図。 FIG. 14 is an explanatory diagram illustrating the relationship between the robot and a smart speaker in Embodiment 7.
実施の形態９においてスマートフォンの一例を説明する説明図。 FIG. 15 is an explanatory diagram illustrating an example of a smartphone in Embodiment 9.

＜実施の形態１＞
図１～図３に示すように、人の声に反応して動作するコミュニケーションロボットであるロボット１は、下半身となる固定部２と、固定部２上に載置される上半身となる可動部３を筐体として備えている。可動部３は固定部２に隣接配置された胴部４と、胴部４に支持された頭部５とから構成されている。固定部２は上に開いた碗状の外観に形成され、胴部４は固定部２上縁と上下方向に連続的なカーブで構成された筒体形状に形成されている。ロボット１は固定部２と胴部４の接続部分がもっとも大径に構成されて、その接続部分を境界に上下方向に窄まった外形とされている。胴部４はその筒体形状の前方が半円形形状に大きく切り欠かれている。頭部５は胴部４の上部に埋め込まれるように嵌合されている。胴部４は固定部２に対して水平方向（図１の矢印方向）に回動し、頭部５は胴部４に対して縦方向（図２の矢印方向）と左右回転方向（図３の矢印方向）の２方向に回動する。 <Embodiment 1>
As shown in FIGS. 1 to 3, a robot 1, which is a communication robot that operates in response to human voices, has as its casing a fixed part 2 forming the lower body and a movable part 3 forming the upper body, which is placed on the fixed part 2. The movable part 3 consists of a body 4 disposed adjacent to the fixed part 2 and a head 5 supported by the body 4. The fixed part 2 has the appearance of a bowl that opens upward, and the body 4 is a cylindrical shape whose curve continues vertically from the upper edge of the fixed part 2. The robot 1 has its largest diameter at the joint between the fixed part 2 and the body 4, and its outline narrows upward and downward from that joint. The front of the cylindrical body 4 has a large semicircular cutout. The head 5 is fitted into the upper part of the body 4 so as to be embedded in it. The body 4 rotates horizontally with respect to the fixed part 2 (arrow direction in FIG. 1), and the head 5 rotates with respect to the body 4 in two directions: vertically (arrow direction in FIG. 2) and in the left-right rotation direction (arrow direction in FIG. 3).

頭部５は全体として球体の一部（前面部分）が１つの平面でカットされた残余である球欠状の形状に構成されている。カット状に形成された前面部分は円形形状に現れロボット１の顔面部６を構成する。顔面部６の表面に形成された長方形部分はタッチパネル機能を備えた液晶ディスプレイ（ＬＣＤ）である表示部としてのタッチパネル部７とされている。タッチパネル部７に表示される内容については後述する。タッチパネル部７の周囲の顔面部６領域にはスモークパネルが配置され顔面部６全体が統一された濃色の背景となっている。頭部５の内部において顔面部６の上方左右の収容位置には照度センサ８と高輝度白色ＬＥＤ９がそれぞれ配設されている。顔面部６においてタッチパネル部７の上部中央位置には顔認識用カメラ１０のレンズ１１が配設されている。 The head 5 as a whole has a truncated spherical shape, which is a portion of a spherical body (front part) cut off in one plane. The cut-shaped front portion appears in a circular shape and constitutes the face portion 6 of the robot 1. A rectangular portion formed on the surface of the face portion 6 is a touch panel portion 7 serving as a display portion that is a liquid crystal display (LCD) with a touch panel function. The contents displayed on the touch panel section 7 will be described later. A smoke panel is arranged in the area of the face part 6 around the touch panel part 7, so that the entire face part 6 has a uniform dark-colored background. Inside the head 5, an illuminance sensor 8 and a high-intensity white LED 9 are disposed at housing positions on the left and right sides above the face section 6, respectively. A lens 11 of a face recognition camera 10 is disposed at the upper center of the touch panel section 7 in the face section 6 .

胴部４内部において胴部４の前方左右寄り位置と後方中央位置の１２０度ずつずれた同じ高さの３箇所の位置にはマイクロフォン１２が配設されている。固定部２内部において固定部２上には左右一対のスピーカ装置１３が配設されている。スピーカ装置１３の側方にはスピーカ装置１３で発生した音を出力するための開口部１４が形成されている。スピーカー用開口部１４に隣接した位置には電源スイッチ１５とスピーカー装置１３の音量を調整するためのアップスイッチ１６とダウンスイッチ１７がそれぞれ配設されている。固定部２の後方位置にはＵＳＢのＯＴＧ（On-The-Go）用の端子１８、ＤＣ１２Ｖ用の電源用ジャック２０、マイクロＳＤカード用ソケット（リーダー）１９が配設されている。 Microphones 12 are disposed inside the body 4 at three positions of the same height that are offset from one another by 120 degrees: toward the front left, toward the front right, and at the rear center of the body 4. A pair of left and right speaker devices 13 is disposed on the fixed part 2, inside it. An opening 14 for outputting the sound generated by the speaker devices 13 is formed beside each speaker device 13. A power switch 15, and an up switch 16 and a down switch 17 for adjusting the volume of the speaker devices 13, are disposed adjacent to the speaker opening 14. A USB OTG (On-The-Go) terminal 18, a DC 12 V power jack 20, and a microSD card socket (reader) 19 are provided at the rear of the fixed part 2.

ロボット１はインターネット回線を利用して所定の外部のクラウドサーバーに接続可能とされている。クラウドサーバーはロボット１が必要とするデータを記憶する記憶手段としての記憶領域、ロボット１が必要とする各種処理を行うための各種エンジン等を備えている。そのため、広義にはロボット１はこれらのクラウドサーバーのソフトウェア等の部分を含めた装置として解釈することができる。 The robot 1 is capable of connecting to a predetermined external cloud server using an Internet line. The cloud server includes a storage area as a storage means for storing data required by the robot 1, various engines for performing various processes required by the robot 1, and the like. Therefore, in a broad sense, the robot 1 can be interpreted as a device that includes software and other parts of these cloud servers.

次に、図４のブロック図に基づいて、実施の形態１のロボット１の電気的構成について説明する。
制御手段としてのコントローラＭＣには上記のタッチパネル部７、照度センサ８、高輝度白色ＬＥＤ９、顔認識用カメラ１０、マイクロフォン１２、スピーカ装置１３、端子１８、マイクロＳＤカード用ソケット１９が接続され、これらに加え、無線ＬＡＮ装置２１、ドップラーセンサ２２、第１～第３のモータ２３～２５等がそれぞれ接続されている。 Next, the electrical configuration of the robot 1 according to the first embodiment will be described based on the block diagram of FIG. 4.
The above touch panel section 7, illuminance sensor 8, high brightness white LED 9, face recognition camera 10, microphone 12, speaker device 13, terminal 18, and micro SD card socket 19 are connected to the controller MC as a control means. In addition, a wireless LAN device 21, a Doppler sensor 22, first to third motors 23 to 25, etc. are connected, respectively.

タッチパネル部７はその表面に接触することで入力する入力操作機能を有する。タッチパネル部７は後述する自然対話モードにおいては図５又は図６のような異なった画面を表示させることができる。コントローラＭＣは第１の画面として図５のようなロボット１の表情、特に目の周辺の変化を司る顔画面Ｓ１を変位可能にタッチパネル部７に表示させる。顔画面Ｓ１はデフォルトで表示される画面であって、ロボット１の目オブジェクト２７とほっぺオブジェクト２８と楕円領域２９が表示される。目オブジェクト２７はアニメーション画像としていくつかの目オブジェクト２７の変形パターンを備えている（図９（ａ）～（ｄ））。また、アニメーション画像として瞳オブジェクト２７ａが左右に移動する動きをする。 The touch panel section 7 has an input operation function for inputting information by touching its surface. The touch panel unit 7 can display different screens as shown in FIG. 5 or 6 in a natural dialogue mode to be described later. The controller MC causes the touch panel section 7 to display, as a first screen, a face screen S1 that controls the expression of the robot 1 as shown in FIG. 5, particularly changes in the area around the eyes, on the touch panel section 7. The face screen S1 is a screen displayed by default, and the eye object 27, cheek object 28, and elliptical area 29 of the robot 1 are displayed. The eye object 27 includes several deformation patterns of the eye object 27 as animation images (FIGS. 9(a) to 9(d)). Furthermore, the pupil object 27a moves from side to side as an animation image.

また、コントローラＭＣは第２の画面として図６のようなチャット画面Ｓ２をタッチパネル部７に表示させる。チャット画面Ｓ２は顔画面Ｓ１の状態でタッチパネル部７をタッチしてスライドさせることで顔画面Ｓ１に代えてタッチパネル部７上にチャット画面Ｓ２を表示させることができる。スライド操作によって顔画面Ｓ１とチャット画面Ｓ２は相互に表示切り替えが可能となっている。チャット画面Ｓ２については後述する。
また、タッチパネル部７は待ち受けモード（ウェイクアップモード）ではタッチパネル部７上に図７又は図８の待ち受け画像を表示させることができる。待ち受け画像については後述する。
ロボット１にはこれら以外の異なる画像として設定画面が用意され、チャット画面Ｓ２からその設定画面に移動可能である。ロボット１が初めて起動された状態では設定画面からアクセスして、例えば、ＩＤ・パスワードのサーバーへの設定登録、Wi-Fiパスワードの設定登録、ユーザー登録（例えば名前、年齢、性別等）、顔認証、データを転送する先となるｅメールアドレスの設定等の必要な初期設定項目を入力する。 Further, the controller MC causes the touch panel section 7 to display a chat screen S2 as shown in FIG. 6 as a second screen. The chat screen S2 can be displayed on the touch panel section 7 instead of the face screen S1 by touching and sliding the touch panel section 7 while the face screen S1 is displayed. The display of the face screen S1 and the chat screen S2 can be switched between each other by a slide operation. Chat screen S2 will be described later.
Further, the touch panel section 7 can display the standby image of FIG. 7 or FIG. 8 on the touch panel section 7 in the standby mode (wake-up mode). The standby image will be described later.
The robot 1 also has a setting screen, which is a separate image from those above and can be reached from the chat screen S2. When the robot 1 is started for the first time, the user opens the setting screen and enters the necessary initial settings, for example registering the ID and password with the server, registering the Wi-Fi password, user registration (for example name, age, and gender), face authentication, and setting the e-mail address to which data is to be forwarded.

照度センサ８は、ロボット１の設置された環境の明るさを認識する。高輝度白色ＬＥＤ９は照度センサ８の検出した数値に基づいて顔認識用カメラ１０による撮影に光度が足りない場合に自動的に点灯される。
マイクロフォン１２は、ユーザーとの対話においてユーザーの発話を取得する音声入力手段であると同時に、三角形の頂点に配置される３つのマイクロフォン１２を同時に使用することで、これらの間での音の到達時間の差によって音源方向を特定することができる方向検知手段でもある。コントローラＭＣは各マイクロフォン１２の取得した電気信号の位相差から到達時間差を求める。コントローラＭＣはその到達時間差に基づいて基準方向に対する音源角度を算出する。ユーザーとの対話に特化したマイクロフォンを例えば顔面部６に設けるようにしてもよい。
スピーカ装置１３は、ユーザーとの対話においてロボット１が発話（音声出力）する音声出力手段である。
マイクロＳＤカード用ソケット１９は挿入されるmicroＳＤカードのデータの読み取り及び書き換えをする。
無線ＬＡＮ装置２１は、Wi-Fi対応機器であるロボット１をインターネットに無線接続させるための機器である。本実施の形態ではIEEE802.11bの国際標準規格とされている。
ドップラーセンサ２２は、マイクロ波を使用したセンサであって、マイクロ波を発射し、反射してきたマイクロ波の周波数と、発射した電波の周波数とを比較し、物体（人）が動いているかどうかを検出する。ドップラー効果により物体（人）が動いている場合の反射波の周波数が変化することを利用するものである。例えば、ユーザー不在時の不審者の有無等のように、ロボット１の周囲の異常を検知するために使用される装置である。
第１のモータ２３は胴部４を固定部２に対して水平方向（図１の矢印方向）に回動させるためのサーボモータである。第２のモータ２４は頭部５を胴部４に対して縦方向（図２の矢印方向）に回動させるためのサーボモータである。第３のモータ２５は頭部５を胴部４に対して左右回転方向（図３の矢印方向）に回動させるためのサーボモータである。マイクロフォン１２によってユーザーの発話する方向が決定された場合にはコントローラＭＣは顔面部６がユーザーの方向に正対するように第１のモータ２３を制御して固定部２に対して胴部４（可動部３）を回動させる。 The illuminance sensor 8 recognizes the brightness of the environment in which the robot 1 is installed. The high-brightness white LED 9 is automatically turned on when the light intensity is insufficient for photographing by the face recognition camera 10 based on the value detected by the illuminance sensor 8.
The microphones 12 are voice input means for acquiring the user's utterances during dialogue with the user. At the same time, because the three microphones 12 are placed at the vertices of a triangle and used simultaneously, they also serve as direction detection means that can identify the direction of a sound source from the differences in the arrival time of the sound among them. The controller MC obtains the arrival time differences from the phase differences between the electrical signals acquired by the microphones 12, and calculates the angle of the sound source relative to a reference direction based on those arrival time differences. A microphone dedicated to dialogue with the user may also be provided, for example on the face section 6.
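The direction detection with three microphones can be illustrated with a small far-field calculation. This is a sketch under stated assumptions, not the method of the specification: the microphone radius, the speed of sound, the coordinate convention (0 degrees is the front of the robot, angles grow counterclockwise), and the function names are all hypothetical, and the arrival times are taken as given rather than derived from phase differences.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
MIC_RADIUS = 0.05        # m, hypothetical distance of each microphone from the body axis
# Hypothetical layout 120 degrees apart: front-left, front-right, rear-center.
MIC_ANGLES_DEG = (60.0, -60.0, 180.0)

MICS = [
    (MIC_RADIUS * math.cos(math.radians(a)), MIC_RADIUS * math.sin(math.radians(a)))
    for a in MIC_ANGLES_DEG
]


def arrival_times(source_deg):
    """Far-field arrival time at each microphone, relative to the array center."""
    ux = math.cos(math.radians(source_deg))
    uy = math.sin(math.radians(source_deg))
    # A microphone that lies further toward the source hears the sound earlier.
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in MICS]


def estimate_angle_deg(times):
    """Estimate the source angle from the arrival time differences.

    Solves (p_k - p_0) . u = -c * (t_k - t_0) for the direction vector u
    using the two independent microphone pairs (Cramer's rule).
    """
    (x0, y0), (x1, y1), (x2, y2) = MICS
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1 = -SPEED_OF_SOUND * (times[1] - times[0])
    b2 = -SPEED_OF_SOUND * (times[2] - times[0])
    det = a11 * a22 - a12 * a21
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux))
```

Only time differences enter the estimate, so an unknown common offset in the three arrival times does not affect the result.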
The speaker device 13 is a voice output means through which the robot 1 speaks (voice output) during dialogue with the user.
The microSD card socket 19 reads and rewrites data on the inserted microSD card.
The wireless LAN device 21 is a device for wirelessly connecting the robot 1, which is a Wi-Fi compatible device, to the Internet. In this embodiment it conforms to the international standard IEEE 802.11b.
The Doppler sensor 22 is a sensor that uses microwaves. It emits microwaves and compares the frequency of the reflected microwaves with the frequency of the emitted waves to detect whether an object (person) is moving. It relies on the fact that, because of the Doppler effect, the frequency of the reflected wave changes when the object (person) is moving. It is used to detect abnormalities around the robot 1, for example the presence of a suspicious person while the user is away.
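The frequency comparison made by a Doppler sensor can be illustrated numerically. For a target moving at radial speed v, far below the speed of light c, the reflected wave is shifted by approximately 2·v·f0/c relative to the emitted frequency f0. The sketch below assumes a 24 GHz emitter and a 5 Hz detection threshold purely for illustration; the specification gives neither value, and the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift_hz(radial_speed_mps, emitted_hz):
    """Approximate frequency shift of the reflected wave (valid for speeds far below c)."""
    return 2.0 * radial_speed_mps * emitted_hz / SPEED_OF_LIGHT


def is_moving(emitted_hz, reflected_hz, threshold_hz=5.0):
    """Report motion when emitted and reflected frequencies differ by more than the threshold."""
    return abs(reflected_hz - emitted_hz) > threshold_hz
```

Under these assumed values, a person walking toward the sensor at 1 m/s shifts a 24 GHz signal by roughly 160 Hz, which is well above the assumed threshold.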
The first motor 23 is a servo motor for rotating the body 4 horizontally with respect to the fixed part 2 (arrow direction in FIG. 1). The second motor 24 is a servo motor for rotating the head 5 vertically with respect to the body 4 (arrow direction in FIG. 2). The third motor 25 is a servo motor for rotating the head 5 in the left-right rotation direction with respect to the body 4 (arrow direction in FIG. 3). When the direction from which the user is speaking has been determined using the microphones 12, the controller MC controls the first motor 23 to rotate the body 4 (movable part 3) relative to the fixed part 2 so that the face section 6 directly faces the user.

コントローラＭＣは周知のＣＰＵやＲＯＭ及びＲＡＭ、ＳＳＤ等の記憶手段としてのメモリ、バス、リアルタイムクロック（ＲＴＣ）等から構成されている。コントローラＭＣのＲＯＭ内にはロボット１の各種機能を実行させるための各種プログラムが記憶されている。
各種プログラムとしては、例えばマイクロフォン１２とスピーカ装置１３を介したユーザーとの対話を制御するための対話プログラム、顔認識用カメラ１０を使用した顔認識に関する顔認識プログラム、タッチパネル部７や第１～第３のモータ２３～２５を制御してロボット１との対話中におけるロボット１の表情や動作を変化させるための表示変動・ジェスチャープログラム、ユーザーとの対話やタッチパネル部７の操作に基づいて異なる画面や画像をタッチパネル部７上に表示させる画面表示プログラム、他のコンピュータやスマートフォンとの間でロボット１側で取得した例えばカメラ画像やスマートフォン等からのｅメール等を処理するデータ送受信プログラム、ユーザーが不在の際の見守りのための留守設定時プログラム、ＧＵＩ機能・ネット接続機能、プロセス管理等の操作・運用・運転のためのＯＳ等が記憶されている。ＲＡＭ内には対話や顔認識における入出力データ等が一旦記憶される。各プログラムは他のプログラムと連携してあるいは独立してマルチタスクで対話、顔認識、ジェスチャー等の機能が実現される。 The controller MC is composed of a well-known CPU, ROM, RAM, memory as a storage means such as SSD, a bus, a real-time clock (RTC), and the like. Various programs for executing various functions of the robot 1 are stored in the ROM of the controller MC.
The programs include, for example: a dialogue program for controlling dialogue with the user via the microphones 12 and the speaker devices 13; a face recognition program for face recognition using the face recognition camera 10; a display variation/gesture program that controls the touch panel section 7 and the first to third motors 23 to 25 to change the expression and movement of the robot 1 during dialogue with the robot 1; a screen display program that displays different screens and images on the touch panel section 7 based on dialogue with the user or operation of the touch panel section 7; a data transmission/reception program that processes, with other computers and smartphones, data such as camera images acquired by the robot 1 and e-mails from smartphones; an away-mode program for watching over the home while the user is absent; and an OS that provides GUI and network connection functions and handles operation and management tasks such as process management. Input/output data for dialogue and face recognition are temporarily stored in the RAM. The programs run as multiple tasks, either in cooperation with one another or independently, to realize functions such as dialogue, face recognition, and gestures.

Ａ．対話時の動作内容について
上記のような構成において、コントローラＭＣは対話プログラムを実行することによってユーザーとの対話によるコミュニケーションを制御する。尚、対話の開始可能と同期して可能となる顔認識については下記「Ｂ．顔認識時の動作内容について」で後述する。
ここで、対話プログラムは、
１）マイクロフォン１２から取得したユーザーの発話データ（音声データ）をクラウドサーバーにリクエスト発行し、サーバー側の音声認識エンジンを使用してテキスト化したユーザーの発話データ（文字列データ）をレスポンスするためのサブプログラム
２）ユーザーの発話（文字列データ）に基づいてビルトインシナリオの対話を実行させるビルトインシナリオサブプログラム
３）ユーザーの発話がビルトインシナリオに対応しない場合に発話データ（文字列データ）を再びクラウドサーバーにリクエスト発行し、対話ＡＰＩ（アプリケーションプログラミングインタフェース）を利用して対話エンジンにロボット１の返答データ（文字列データ）を作成させる発話データ転送サブプログラム
４）レスポンスされた返答データ（文字列データ）を表示部としてのタッチパネル部７に表示させる文字列データ表示サブプログラム
５）レスポンスされた返答データ（文字列データ）を音声合成エンジンによって音声データに変換しスピーカ装置１３からロボット１側の発話として音声出力させるための音声データサブプログラム
６）ユーザー側文字列データやロボット１側文字列データに基づいてタッチパネル部７上の表示態様やロボット１の動作を変動させる表示態様・動作変動サブプログラム、
等を含む。
以下、主として対話プログラムに基づいたコントローラＭＣの制御内容の一例について、起動後の待ち受けモード（ウェイクアップモード）と自然対話モードとスリープモードの相互の関係と共に説明する。これらの相互の関係は図１２に示されるとおりである。
図５と図６は自然対話モードの画面であり、図７と図８は待ち受けモードの画面である。スリープモードではこれらの画面はタッチパネル部７のバックライトが消灯して暗くなった画面である。 A. Operation during dialogue
In the configuration described above, the controller MC controls communication through dialogue with the user by executing the dialogue program. Face recognition, which becomes possible at the same time as dialogue can start, is described later in "B. Operation during face recognition".
Here, the dialogue program includes, among others, the following subprograms:
1) a subprogram that issues a request to the cloud server with the user's utterance data (voice data) acquired from the microphones 12 and receives as the response the user's utterance data converted into text (character string data) by the server-side speech recognition engine;
2) a built-in scenario subprogram that executes a built-in scenario dialogue based on the user's utterance (character string data);
3) an utterance data transfer subprogram that, when the user's utterance does not match a built-in scenario, issues another request to the cloud server with the utterance data (character string data) and has the dialogue engine create reply data (character string data) for the robot 1 through the dialogue API (application programming interface);
4) a character string data display subprogram that displays the returned reply data (character string data) on the touch panel section 7 serving as the display unit;
5) a voice data subprogram that converts the returned reply data (character string data) into voice data with a speech synthesis engine and outputs it from the speaker devices 13 as the utterance of the robot 1;
6) a display mode/motion variation subprogram that changes the display mode on the touch panel section 7 and the motion of the robot 1 based on the user-side and robot-side character string data.
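The chain formed by subprograms 1) to 5) of the dialogue program can be sketched as a single function: recognize the utterance, try a built-in scenario, fall back to the server-side dialogue engine, then display and synthesize the reply. Every callable passed in is a hypothetical stand-in for an engine that the specification places on the cloud server or in the robot, and treating the built-in scenarios as an exact-match dictionary is a simplifying assumption.

```python
from typing import Callable, Dict, Optional


def handle_utterance(
    audio: bytes,
    recognize: Callable[[bytes], str],        # 1) server-side speech recognition
    builtin_scenarios: Dict[str, str],        # 2) built-in scenario replies (assumed exact match)
    dialogue_engine: Callable[[str], str],    # 3) server-side dialogue engine via the dialogue API
    synthesize: Callable[[str], object],      # 5) speech synthesis engine
    display: Callable[[str], None],           # 4) character string display on the touch panel
):
    """Run one turn of dialogue and return the synthesized reply audio."""
    text = recognize(audio)
    reply: Optional[str] = builtin_scenarios.get(text)
    if reply is None:
        # No built-in scenario matched, so ask the dialogue engine for a reply.
        reply = dialogue_engine(text)
    display(reply)
    return synthesize(reply)
```

Keeping the engines as injected callables makes it easy to swap a local stub for a remote service when testing the turn logic.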
An example of the control performed by the controller MC, mainly based on the dialogue program, is described below together with the mutual relationship among the standby mode (wake-up mode) after startup, the natural dialogue mode, and the sleep mode. Their mutual relationship is as shown in FIG. 12.
FIGS. 5 and 6 are screens in the natural dialogue mode, and FIGS. 7 and 8 are screens in the standby mode. In the sleep mode, these screens are dark because the backlight of the touch panel section 7 is turned off.

１．起動
電源スイッチ１５の投入によってロボット１は起動される（図１３の処理Ｍ０）。コントローラＭＣではブート・プログラムが実行され、次いでＯＳが起動すると、ＯＳはユーザーからの「命令（コマンド）」待ち状態、つまりウェイクアップ状態となる。この初期の待ち受けモードでは図７の待ち受け画面が表示される。
尚、以下では初期設定が完了した後の状態、つまりロボット１のＩＤとパスワードがクラウドサーバーに登録され、ユーザー登録が完了し、複数のユーザーの顔認証がされ、スマートフォンのｅメールアドレスがロボット１に登録される等以後の起動とする。
『１．起動における効果』
このように起動によって複数の待ち受け画面から選択された１つの画面（図７）がまず表示される。つまり、起動時には常に決まった画面が表示されることとなる。そしてロボット１の目が閉じている（対話ができないことを暗示している）ことから待ち受けモードにあることがユーザーに容易にわかるようになっている。 1. Startup
The robot 1 is started by turning on the power switch 15 (process M0 in FIG. 13). The controller MC executes a boot program, and once the OS has started, the OS waits for a command from the user, that is, it enters the wake-up state. In this initial standby mode, the standby screen shown in FIG. 7 is displayed.
The description below assumes a startup after the initial settings have been completed, that is, after the ID and password of the robot 1 have been registered on the cloud server, user registration has been completed, face authentication has been set up for a plurality of users, and the e-mail address of the smartphone has been registered in the robot 1.
"1. Effects of startup"
In this way, upon startup, one screen (FIG. 7) selected from a plurality of standby screens is displayed first. In other words, the same fixed screen is always displayed at startup. Because the eyes of the robot 1 are closed (suggesting that dialogue is not possible), the user can easily tell that it is in standby mode.

２．待ち受けモード（ウェイクアップモード）
待ち受けモードは自然対話モードの開始のトリガーがあるとロボット１と対話が可能となる状態である。また、自然対話モードにおいて対話が終了した場合に移行する状態でもある。また、一定時間自然対話モードが開始されないとスリープモードになってしまう状態でもある。スリープモードはロボット１と対話でコミュニケーションが取れない状態である。
ここに「自然対話」とはロボット１のビルトインシナリオやサーバ上の対話エンジン（対話ソフト）を使用してユーザーが音声合成された装置（ロボット１）側の音声と対話することをいう。自然対話モードは自然対話が可能な状態である。
待ち受けモードの画面は複数あり、本実施の形態では図７と図８の２種類が用意されている。
図７は図５の自然対話モードにおける画面から移行する待ち受け画面である（図１３の処理Ｍ２）。また、図１３の処理Ｍ０によって起動時に表示される待ち受け画面でもある。図７では、日時と曜日と、大きく現時間が表示がされた時計レイヤーの画面の背景にロボット１の目（目オブジェクト２７）が閉じた状態の顔画面Ｓ１のレイヤー画面が薄く表示されている。
図８は図６の自然対話モードにおける画面から移行する待ち受け画面である（図１３の処理Ｍ２）。図８では、日時と曜日と、大きく現時間が表示がされた時計レイヤーの背景にチャット画面Ｓ２のレイヤー画面が薄く表示されている。つまり、待ち受けモードではあるが自然対話モードではない。
『２．待ち受けモード（ウェイクアップモード）における効果』
このように、異なる待ち受け画面が用意されているので、ある待ち受け画面から自然対話モードが開始される場合にユーザーは直前にアクセスしていた画面での対話を行うことができるため利便性がよい。また、待ち受けモード特有の画面を表示させることで、ロボット１が待ち受けモードにあることがユーザーに容易にわかるようになっている。 2. Standby mode (wake-up mode)
The standby mode is a state in which it is possible to interact with the robot 1 when there is a trigger to start the natural interaction mode. It is also the state to which a transition occurs when the dialogue ends in the natural dialogue mode. Also, if the natural dialogue mode is not started for a certain period of time, the device will enter sleep mode. The sleep mode is a state in which it is not possible to communicate with the robot 1 through dialogue.
Here, "natural dialogue" means that the user interacts with the synthesized voice of the device (robot 1) using the built-in scenario of the robot 1 or the dialogue engine (dialogue software) on the server. The natural dialogue mode is a state in which natural dialogue is possible.
There are a plurality of screens in the standby mode, and in this embodiment, two types are prepared, as shown in FIG. 7 and FIG. 8.
FIG. 7 is the standby screen that is reached from the natural dialogue mode screen of FIG. 5 (process M2 in FIG. 13). It is also the standby screen displayed at startup by process M0 in FIG. 13. In FIG. 7, the face screen S1 layer, with the eyes (eye objects 27) of the robot 1 closed, is displayed faintly behind a clock layer that shows the date and day of the week together with the current time in large characters.
FIG. 8 is the standby screen that is reached from the natural dialogue mode screen of FIG. 6 (process M2 in FIG. 13). In FIG. 8, the chat screen S2 layer is displayed faintly behind a clock layer that shows the date and day of the week together with the current time in large characters. In other words, the robot is in standby mode but not in natural dialogue mode.
"2. Effects of standby mode (wake-up mode)"
Because different standby screens are provided in this way, when the natural dialogue mode is started from a given standby screen, the user can carry on the dialogue on the screen that was being used immediately before, which is convenient. Furthermore, displaying a screen specific to the standby mode makes it easy for the user to see that the robot 1 is in standby mode.
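The mode relationships described in sections 1 and 2 can be summarized as a small transition table. This is a sketch of only the transitions stated so far: startup into standby (process M0), standby into natural dialogue on a trigger, natural dialogue back into standby when the dialogue ends (process M2), and standby into sleep after a timeout. The event names and the `OFF` state are hypothetical labels added for illustration.

```python
from enum import Enum, auto


class Mode(Enum):
    OFF = auto()
    STANDBY = auto()    # standby mode (wake-up mode)
    DIALOGUE = auto()   # natural dialogue mode
    SLEEP = auto()      # sleep mode


# (current mode, event) -> next mode; event names are illustrative assumptions.
TRANSITIONS = {
    (Mode.OFF, "power_on"): Mode.STANDBY,            # process M0
    (Mode.STANDBY, "dialogue_trigger"): Mode.DIALOGUE,
    (Mode.DIALOGUE, "dialogue_end"): Mode.STANDBY,   # process M2
    (Mode.STANDBY, "timeout"): Mode.SLEEP,
}


def next_mode(mode, event):
    """Return the mode after an event; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```

Transitions out of sleep mode, which section 3 describes, could be added to the same table as further entries.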

３．自然対話モードの開始と停止
起動されて待ち受けモードやスリープモードにある状態から、コントローラＭＣは例えば次のような複数のタイミング、つまりモード移行のトリガーによって自然対話モードに移行させるよう処理する（図１３の処理Ｍ１、Ｍ５）。以下のトリガーは一例である。自然対話モードでは下記「Ｂ．顔認識時の動作内容について」で説明するような顔認識モードに切り替わる（顔認識ができるようになる）。 3. Starting and stopping the natural dialogue mode
From the standby mode or sleep mode after startup, the controller MC shifts to the natural dialogue mode at any of several timings, that is, mode transition triggers, such as those listed below (processes M1 and M5 in FIG. 13). The triggers below are examples. In the natural dialogue mode, the robot switches to the face recognition mode (face recognition becomes possible) as described in "B. Operation during face recognition" below.

1-1) In the standby mode, when the controller MC recognizes, within a certain period of time, an utterance (voice) such as "Hey, Yupibo" from the microphone 12 as an activation phrase, it uses this as a trigger to enter the natural dialogue mode (process M1 in FIG. 13). When it determines that there has been a touch operation on the touch panel section 7, it likewise uses this as a trigger to enter the natural dialogue mode (process M1 in FIG. 13).
1-2) In the sleep mode, the controller MC determines at a predetermined timing whether there has been a touch operation on the touch panel section 7 showing the standby screen. The touch operation differs depending on the display mode of the face section 6: on the standby screen of the face screen S1, a touch anywhere on the touch panel section 7 starts the dialogue, whereas on the standby screen of the chat screen S2, a touch on the dialogue start button object 36 (described later) starts it. In other words, different screens are started by different operations.
When it determines that there has been a touch operation on the touch panel section 7, the controller MC first enters the standby mode (process M3 in FIG. 13), and when it then determines that there has been another touch, it enters the natural dialogue mode (process M1 in FIG. 13).
1-3) In the sleep mode, the controller MC determines whether there has been a touch operation on the touch panel section 7, as in 1-2). When it determines that there has been a touch operation, the controller MC first enters the standby mode (process M3 in FIG. 13). If, in this state, it recognizes within a certain period of time an utterance (voice) such as "Hey, Yupibo" from the microphone 12 as an activation phrase, the controller MC uses this as a trigger to enter the natural dialogue mode (process M1 in FIG. 13).
2) In the sleep mode, the controller MC uses the RTC to determine whether a preset time has arrived, and enters the natural dialogue mode when that time arrives (process M5 in FIG. 13).
3) In the sleep mode, the controller MC outputs an utterance (audio data) from the speaker device 13 at random time intervals, for example based on random numbers generated at a predetermined timing. In other words, as a kind of soliloquy, the robot 1 outputs a voice that invites dialogue, such as "Hey, what are you doing?" or "I'm bored," enters the natural dialogue mode (process M5 in FIG. 13), and encourages the user to speak.
4) In the sleep mode, the controller MC uses the Doppler sensor 22 to determine whether an object (person) is moving, and enters the natural dialogue mode when it detects such movement (process M5 in FIG. 13).

5) In the sleep mode, when the controller MC learns of a change in conditions such as abnormal weather or an earthquake, it notifies the user and uses this as an opportunity to start a natural dialogue. An external cloud server uses an abnormal weather detection engine to acquire and store, at fixed times and according to fixed criteria, information on abnormal conditions including abnormal weather (for example, heavy snow or typhoons), earthquakes, and lightning strikes. The fixed times may all be the same, or the acquisition timing may vary with the type of event. In Embodiment 1, the abnormal weather information is delivered from the server to the device (controller MC) using, for example, a push-type distribution system. When the controller MC acquires the information, it enters the natural dialogue mode (process M5 in FIG. 13).
6) If, after entering the natural dialogue mode through any of 1) to 5) above, the controller MC detects no utterance from the user for a certain period of time, it enters the standby mode (process M2 in FIG. 13) and, after a further period of time, the sleep mode (process M4 in FIG. 13). The lengths of these mode transition times can be changed as appropriate, for example from a terminal device or by speech as a built-in scenario.
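The transitions in 1) to 6) above can be summarized as a small state table. The following is a minimal sketch only: the event names and the `next_mode` function are hypothetical labels introduced for illustration, while the process labels M1 to M5 are those of FIG. 13. Timing conditions such as "within a certain period of time" are not modeled.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "sleep"
    STANDBY = "standby"
    DIALOGUE = "natural_dialogue"


# (current mode, event) -> (next mode, process label in FIG. 13).
# Event names are illustrative assumptions, not terms from the specification.
TRANSITIONS = {
    (Mode.STANDBY, "activation_phrase"): (Mode.DIALOGUE, "M1"),
    (Mode.STANDBY, "touch"): (Mode.DIALOGUE, "M1"),
    (Mode.SLEEP, "touch"): (Mode.STANDBY, "M3"),
    (Mode.SLEEP, "scheduled_time"): (Mode.DIALOGUE, "M5"),
    (Mode.SLEEP, "random_soliloquy"): (Mode.DIALOGUE, "M5"),
    (Mode.SLEEP, "motion_detected"): (Mode.DIALOGUE, "M5"),
    (Mode.SLEEP, "weather_alert"): (Mode.DIALOGUE, "M5"),
    (Mode.DIALOGUE, "silence_timeout"): (Mode.STANDBY, "M2"),
    (Mode.STANDBY, "silence_timeout"): (Mode.SLEEP, "M4"),
}


def next_mode(mode, event):
    """Return (next mode, process label), or (mode, None) if the event is ignored."""
    return TRANSITIONS.get((mode, event), (mode, None))
```

Under these assumptions, a touch in the sleep mode followed by a second touch walks through M3 and then M1, matching trigger 1-2).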

“3. Effects of starting and stopping the natural dialogue mode”
Because many ways of starting the natural dialogue mode are provided, the user ends up interacting with the robot 1 at various timings and has more opportunities for dialogue with it. Opportunities to enjoy dialogue therefore increase naturally, and the user comes to feel the benefit of owning the robot 1. Furthermore, when the dialogue mode ends, the device first enters the standby mode and then the sleep mode, which reduces power costs.
In addition, because the device can go from the sleep mode directly to the natural dialogue mode screen, skipping the standby mode, dialogue can begin immediately and starts smoothly. Also, as long as the dialogue continues, the dialogue screen (FIGS. 5 and 7) is displayed, which motivates the user to keep talking.

4. Built-in scenario dialogue in the natural dialogue mode
In the natural dialogue mode, multiple kinds of dialogue processing are provided: built-in scenario dialogue and normal dialogue using the dialogue engine on the server.
The controller MC first determines whether the utterance data (character string data) based on the user's utterance matches a built-in scenario; if it does not, the controller MC performs dialogue using the dialogue engine via the cloud server (hereinafter referred to as normal dialogue). From the user's perspective it always appears to be a single dialogue with the robot 1, but internally the natural dialogue mode involves multiple processes.
The controller MC compares the user's utterance (character string data) produced by the speech recognition engine on the cloud server side with the text data of the built-in scenarios (scripts). In this embodiment, the text data of the built-in scenarios is stored in memory. Built-in scenarios may also be added on an SD card; with an SD card, it is easy to keep adding built-in scenarios by rewriting the card.
When the controller MC recognizes the user's utterance, it determines whether the character string data matches a predefined regular expression or non-regular expression. If it matches, the controller MC has the speech synthesis engine convert the script corresponding to that character string data into audio data and outputs it from the speaker device 13 as the utterance of the robot 1.
Many built-in scenarios are set (prepared), ranging from simple ones such as greetings like "Hello" or "Nice weather today, isn't it?" that encourage the user to speak, to scenarios that request some processing based on the user's utterance. Tables 1 to 3 disclose examples of such built-in scenarios. In practice, many built-in scenarios other than these are also set.
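The order of checks described above (built-in scenario first, normal dialogue otherwise) can be sketched as follows. This is an illustration under stated assumptions: the patterns, the scripted replies, and the `cloud_dialogue` callback are hypothetical stand-ins, not the scenarios of Tables 1 to 3 or the actual server interface.

```python
import re

# Hypothetical built-in scenarios: (compiled pattern, scripted reply).
BUILT_IN_SCENARIOS = [
    (re.compile(r"^(hello|hi)\b", re.IGNORECASE),
     "Hello! Nice weather today, isn't it?"),
    (re.compile(r"turn off the power", re.IGNORECASE),
     "May I turn off the power?"),
]


def respond(utterance, cloud_dialogue):
    """Try the built-in scenarios first; otherwise fall back to normal dialogue.

    cloud_dialogue is a callable standing in for the request to the
    server-side dialogue engine.
    """
    for pattern, reply in BUILT_IN_SCENARIOS:
        if pattern.search(utterance):
            return "built-in", reply
    return "normal", cloud_dialogue(utterance)
```

Because the scenario check runs locally, the callback is invoked only when no pattern matches.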

If the dialogue does not proceed according to the built-in scenario, the built-in scenario dialogue ends partway through. Examples of such cases are as follows.
1) When the utterance does not match the predefined regular expression or non-regular expression
This is the case where the utterance fails to match the regular expression or non-regular expression of the built-in scenario, either from the outset or from some point partway through. It also includes cases where the utterance could not be captured correctly because the user did not articulate clearly. In this case, the controller MC judges the exchange to be normal dialogue and immediately connects to the external cloud server. From then on it sends requests containing the utterance data to the external cloud server and has the dialogue engine on that server create response data as character string data. The controller MC then has the speech synthesis engine convert the response data into audio data and outputs it from the speaker device 13, thereby continuing the dialogue.

2) When the built-in scenario dialogue ends as planned
For example, the robot asks the user, in accordance with the scenario, "May I do XXX?", the user gives an affirmative reply such as "Yes" or "Please do," and the built-in scenario dialogue ends as planned so that there is no further dialogue. Alternatively, the dialogue may simply stop partway through the scenario. In these cases, the device enters the standby mode after a certain period of time.
3) When the user's utterance is negative regarding whether a certain process may proceed
When the robot asks the user, in accordance with the scenario, "May I do XXX?", and the user gives not an affirmative reply such as "Yes" or "Please do" but a negative one such as "No" or "That was a mistake," the built-in scenario also ends, and the subsequent dialogue is handled as in 1) or 2).
On such a negative utterance, the controller MC confirms whether the process may really be abandoned, for example by asking "Are you sure?". This makes it possible to handle slips of the tongue, changes of mind, and the like. For example, in the power-off scenario, when the robot asks, in accordance with the scenario, "Are you sure you don't want to turn off the power?" and the user answers "Yes," the question is repeated multiple times (for example, three times in the embodiment), and when the answer is "Yes," the built-in scenario dialogue ends.
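The repeated confirmation in the power-off example can be read in more than one way. The sketch below takes one possible reading, namely that the scenario ends only after the user has answered "Yes" to the same question the required number of times in a row. The function name and the answer normalization are assumptions for illustration.

```python
def cancellation_confirmed(answers, required_yes=3):
    """Return True once the user has said "yes" required_yes times in a row.

    answers is the sequence of replies to the repeated confirmation question.
    Any other reply stops the confirmation loop immediately.
    """
    yes_count = 0
    for answer in answers:
        if answer.strip().lower() not in {"yes", "はい"}:
            return False
        yes_count += 1
        if yes_count == required_yes:
            return True
    return False
```

With the default of three repetitions, two "Yes" replies are not enough to end the scenario under this reading.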

“Effects of using built-in scenarios”
When built-in scenarios are prepared in this way, not every exchange has to be sent to the external server as a request; exchanges can be handled inside the device. This reduces the communication cost of connecting to the server, and because no communication time or server-side computation time is needed, the reply to the user's utterance is not delayed so long that the conversation feels interrupted. In addition, when a fixed process is to be executed, providing such a built-in scenario means the user does not need to operate the touch panel section 7 or access the robot 1 from another terminal to execute it. The process can be executed through dialogue alone, which is user-friendly.

5. Requests and responses in normal dialogue
When the utterance data (character string data) does not correspond to a built-in scenario, the controller MC connects to the cloud server, sends the utterance data (character string data) to the server again, and requests that the dialogue engine create response data. If the user has been authenticated, the controller MC places the per-user authentication information (for example, an ID and password) at the beginning of the utterance data in the request. In that case, the response data is created taking past dialogue information into account. If the user has not been authenticated, the user is treated as a person who has not previously been identified and no authentication information is sent, so past dialogue information is not taken into account. On receiving a request, the cloud server uses the dialogue API (Application Programming Interface) to have the dialogue engine create response data (character string data) based on the utterance data (character string data), and returns it to the robot 1 (controller MC) as the response. If the user has a past dialogue history, response data reflecting that history is created. The controller MC converts the response data into audio data and outputs it from the speaker device 13 as the utterance of the robot 1.
As with the built-in scenarios above, if the user remains silent for a certain period of time or longer, the device enters the standby mode.
“5. Effects of requests and responses in normal dialogue”
Unlike the built-in scenarios, normal dialogue connects to an external cloud server, so advanced dialogue analysis drawing on far more data than the built-in scenarios can be carried out quickly, and sophisticated dialogue that resembles talking with a real person can be achieved.
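The difference between authenticated and unauthenticated requests can be sketched as follows. The dictionary layout and field names are assumptions for illustration; the specification only states that the authentication information is sent ahead of the utterance data when the user has been authenticated.

```python
def build_request(utterance, credentials=None):
    """Build a request payload for the dialogue engine.

    credentials is an optional (user_id, password) pair. When it is given,
    the authentication information is placed before the utterance data;
    otherwise it is omitted and the user is treated as unidentified.
    """
    request = {}
    if credentials is not None:
        user_id, password = credentials
        request["auth"] = {"id": user_id, "password": password}
    request["utterance"] = utterance
    return request
```

In this sketch the server would use the presence of the `auth` field to decide whether past dialogue history can be taken into account.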

6. Behavior of the robot 1 during dialogue
While dialogue is taking place in the natural dialogue mode, whether as a built-in scenario or as normal dialogue, the controller MC controls behaviors such as those in A. to D. below.
A. Gestures of the robot 1 from startup through the natural dialogue mode to the standby mode
The controller MC controls the first to third motors 23 to 25 at the following various timings to change the posture of the robot 1. The following are examples.
1) At startup: if the face section 6 of the head 5 is not facing forward, or if the head 5 is tilted, the head is moved to the default forward-facing position.
2) When the screen is touched: same as 1) (so that the face recognition camera 10 directly faces the user for face authentication).
3) When the trigger utterance "Hey, Yupibo" occurs: same as 1) (so that the face recognition camera 10 directly faces the user for face authentication).
4) When the direction of a voice is detected: the face section 6 of the head 5 is turned in that direction.
5) For a special emotional utterance, for example a happy one: the third motor 25 is controlled so that the head 5 rotates left and right (clockwise and counterclockwise) with the face section 6 kept toward the user.
6) For a special utterance, for example a sad one: the second motor 24 is controlled so that the head 5 is held lowered, as in a nod, for a while and then returned to the default position.
7) For an emotional utterance, for example when a greeting such as "Good morning," "Hello," "Good evening," or "Hi" is uttered: the second motor 24 is controlled so that the head 5 bows.
8) For a non-special emotional utterance, for example when a simple affirmative expression such as "I see," "Got it," "That's right," or "Yeah!" is uttered: the second motor 24 is controlled so that the head 5 nods. In 6) to 8), the speed and duration of the second motor 24 are preferably varied so that sadness, bowing, and nodding look different.
9) For a non-special emotional utterance, for example when a simple negative expression such as "No," "I can't," or "That's no good" is uttered: the first motor 23 is controlled so that the movable part 3 rotates left and right several times.
10) In response to special and non-special emotional utterances and various dialogues, various gestures may be combined, for example rotating the head 5 left and right (clockwise and counterclockwise) or back and forth several times, rotating the entire movable part 3 left and right, and making large rotations or small nod-like rotations.
These gestures of the robot 1 are preferably combined with the display on the touch panel section 7 (face section 6) described below.
“6. Effects of the behavior of the robot 1 during dialogue, part 1”
Having the robot 1 make gestures like these makes the user feel affection for the robot 1, so that the user enjoys the dialogue with the robot 1 and at the same time enjoys actively interacting with it.
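The correspondence between utterance categories and motors in items 5) to 9) can be held in a lookup table. The sketch below is illustrative only: the category names and the `gesture_for` function are hypothetical, while the motor reference numerals (23, 24, 25) follow the description above.

```python
# Reference numerals of the first to third motors in the description.
MOTOR_NUMERAL = {"first": 23, "second": 24, "third": 25}

# Utterance category -> (motor, motion). Category names are assumptions.
GESTURES = {
    "happy": ("third", "rotate head left and right while facing the user"),
    "sad": ("second", "hold head lowered, then return to default"),
    "greeting": ("second", "bow"),
    "affirmative": ("second", "nod"),
    "negative": ("first", "rotate movable part left and right"),
}


def gesture_for(category):
    """Return (motor reference numeral, motion), or None for unknown categories."""
    entry = GESTURES.get(category)
    if entry is None:
        return None
    motor, motion = entry
    return MOTOR_NUMERAL[motor], motion
```

Keeping the mapping in data rather than in branching code would make it straightforward to vary speed and duration per gesture, as item 8) suggests.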

B. Changes in display mode on the face screen S1
1) When the user is speaking
When the controller MC acquires the user's utterance from the microphone 12 and recognizes it, it displays the elliptical area 29 in blue on the face screen S1 of the touch panel section 7, as shown in FIG. 5, and animates it so that its area (that is, its size) changes according to the volume of the user's utterance. Specifically, the controller MC enlarges the elliptical area 29 while keeping its elliptical shape as the volume of the user's utterance increases, and shrinks it while keeping its elliptical shape as the volume decreases. It also displays the cheek object 28 in green.
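Scaling the ellipse with volume while preserving its shape amounts to applying one scale factor to both axes. A minimal sketch follows, assuming a volume normalized to the range 0 to 1; the base size, the scale limits, and the function name are illustrative values not taken from the specification.

```python
def ellipse_axes(volume, base=(120.0, 60.0), min_scale=0.5, max_scale=1.5):
    """Return (width, height) of the ellipse for a normalized volume in [0, 1].

    The same scale factor is applied to both axes, so the aspect ratio
    (the elliptical shape) is preserved while the area changes.
    """
    clamped = min(max(volume, 0.0), 1.0)
    scale = min_scale + (max_scale - min_scale) * clamped
    return base[0] * scale, base[1] * scale
```

The same function could drive the animation for the robot's own speech by feeding it the speaker output level instead of the microphone level.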

The controller MC also displays the user's utterance data (character string data) returned from the cloud server on the touch panel section 7 in a predetermined manner.
For example, as shown in FIG. 5, when the touch panel section 7 is showing the face screen S1 and an utterance such as "Hello" is acquired from the user, the controller MC displays the character string "Hello" based on this utterance on a layer over the face screen S1. In terms of order, this display starts before the robot 1's utterance that replies to the user's utterance.
As for the display mode, the character string layer is, for example, displayed so that it gradually changes from transparent to opaque as shown in FIG. 10(a) to FIG. 10(b), until it completely hides the face screen S1 behind it as shown in FIG. 10(c). In other words, the character string is made to appear gradually. After the state of FIG. 10(c), in which only the characters are shown brightly against a dark background, is held for a very short fixed time, the layer showing the character string is faded out in the reverse order, FIG. 10(c), FIG. 10(b), FIG. 10(a), and the display returns to the default face screen S1. During this, all the characters of one utterance appear simultaneously and disappear simultaneously. This display mode is an example, and a different display mode may be used.
“6. Effects of the behavior of the robot 1 during dialogue, part 2”
This lets the user see their own words on the robot 1, so the user can check whether the robot 1 heard them correctly, judge whether the dialogue is proceeding without error, and steer the dialogue away from odd, off-target exchanges. Off-target conversation tends to be irritating, but checking the display reveals the cause, so the user can change the way they speak and try again.
In addition, for all of the user's utterance data, whether it is destined for a built-in scenario or for normal dialogue, conversion into character string data is first requested from the cloud server, so the prerequisite step of conversion into character string data is not delayed. Creation of response data by the dialogue engine is requested only after this conversion. The user's utterance can therefore be displayed on the touch panel section 7 at least before the robot 1 speaks the response, and there is no risk of the order of the dialogue being confused.
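The fade-in, hold, and fade-out sequence of FIG. 10(a) to (c) can be expressed as an opacity curve over time. The following is a sketch under assumed durations; the function name and the default timings are illustrative and do not come from the specification.

```python
def overlay_alpha(t, fade=0.5, hold=1.0):
    """Opacity of the text layer at time t (seconds) after display starts.

    0.0 is fully transparent (face screen S1 visible) and 1.0 is fully
    opaque (face screen hidden). The layer fades in over `fade` seconds,
    holds for `hold` seconds, then fades out over `fade` seconds.
    """
    total = 2 * fade + hold
    if t < 0 or t >= total:
        return 0.0
    if t < fade:
        return t / fade
    if t < fade + hold:
        return 1.0
    return (total - t) / fade
```

Because the whole string shares one opacity value, every character of an utterance appears and disappears together.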

When an utterance from the user is converted into a character string, its length varies from utterance to utterance. When the utterance is not a single word but a sentence made up of several phrases, it can become quite long. Even for such a long sentence, the controller MC adjusts the font size of the character string so that the entire content of one utterance is displayed on the touch panel section 7 at the same time. That is, a short utterance is displayed in a large font, and the longer the utterance, the smaller the font in which it is displayed.
“6. Effects of the behavior of the robot 1 during dialogue, part 3”
As a result, whatever the user says can be checked at a single glance, so the dialogue is less likely to be held up waiting for the full text to appear and less likely to overlap with the user's next utterance. Because one utterance appears all at once, the whole sentence can be grasped at once, and the user can understand it well even if it is displayed only briefly. Furthermore, because the character string is spread over the entire touch panel section 7, each character can be displayed large and is easy to read, so the text can be checked adequately even within a very short display time.
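One simple way to pick a font size so that a whole utterance fits on screen is to try sizes from largest to smallest and take the first that provides enough character cells. The sketch below assumes square, fixed-width glyphs (a rough fit for full-width Japanese characters) and hypothetical screen dimensions and size limits; the specification does not describe the actual fitting method.

```python
def fit_font_size(text, width, height, max_size=96, min_size=12):
    """Return the largest font size (pixels) at which text fits on the screen.

    Assumes each character occupies a size x size cell and that lines wrap
    at the screen width. Falls back to min_size if nothing fits.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = width // size
        lines = height // size
        if chars_per_line * lines >= len(text):
            return size
    return min_size
```

For a 480 x 320 pixel area, a five-character greeting fits at the maximum size, while a 200-character sentence is reduced to a size of 26 under these assumptions.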

2) When the robot 1 is speaking
On the face screen S1 of the touch panel section 7, as shown in FIG. 5, the elliptical area 29 is displayed in red and animated so that its area (that is, its size) changes according to the volume of the robot's utterance. Specifically, the controller MC enlarges the elliptical area 29 while keeping its elliptical shape as the output level from the speaker device 13 increases, and shrinks it while keeping its elliptical shape as the volume decreases. It also displays the cheek object 28 in light red.
“6. Effects of the behavior of the robot 1 during dialogue, part 4”
In this way, the display mode on the face screen S1 changes as the user and the robot 1 take turns speaking. The turn-taking happens not only in the spoken dialogue but also on the screen, which is entertaining and makes the conversation livelier.

C. Changes in display mode on the chat screen S2
The display mode of the chat screen S2 of the touch panel section 7 during dialogue will be described with reference to FIGS. 6 and 8. As described above, the display changes from the face screen S1 to the chat screen S2 in response to a user operation.
First, the configuration of the chat screen S2 will be described again.
As shown in FIG. 6, a user object 31 is displayed as an avatar character at the lower left of the chat screen S2, and a robot object 32 is displayed at the lower right so as to face the user object 31. Different user objects 31 are prepared for each authenticated user recognized in the face recognition mode described later and for unauthenticated users, and the object corresponding to the user currently in dialogue is displayed. In the central area, balloon objects 33 containing the dialogue of the user and of the robot 1 converted into character strings are displayed in chronological order. A dialogue stop button object 34 is displayed at the upper left of the chat screen S2, and a setting button object 35 is displayed at the upper right.

On the chat screen S2, balloon objects 33 are added moment by moment as the dialogue between the user and the robot 1 proceeds. The user's utterances converted into character string data and the robot 1's utterances likewise converted into character string data are shown in balloon objects 33 arranged in a single column in chronological order on the chat screen S2. The most recent utterance is displayed in a new balloon object 33 at the bottom of the column of past balloon objects 33, in synchronization with the utterance.
Each balloon object 33 indicates the direction of the speaker so that it is clear whether the utterance came from the user or from the robot 1. Because the entire dialogue history cannot be shown on the screen at once, the chat screen S2 can be scrolled vertically so that past balloon objects 33 can be displayed. When the user does not scroll back, the balloon object 33 of the most recent dialogue is always displayed.
In Embodiment 1, if the dialogue ends, the device enters the standby mode, and the dialogue is then resumed with a different user recognized anew in the face recognition mode described later, a "User change" indication is displayed partway along the column of balloon objects 33, and the user object 31 is replaced with the one corresponding to the newly recognized user.
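The chat history behavior described above (chronological entries, a "User change" marker when a different user resumes, and scrolling back from the newest entry) can be sketched with a small class. The class, method names, and tuple layout are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatLog:
    """Chronological list of (speaker, text) entries with user-change markers."""

    entries: list = field(default_factory=list)
    current_user: Optional[str] = None

    def start_session(self, user):
        # Insert a marker only when earlier history exists and the user differs.
        if self.entries and user != self.current_user:
            self.entries.append(("marker", "User change"))
        self.current_user = user

    def add(self, speaker, text):
        self.entries.append((speaker, text))

    def visible(self, rows, scroll_back=0):
        """Return the window of `rows` entries ending `scroll_back` from the newest."""
        end = max(0, len(self.entries) - scroll_back)
        return self.entries[max(0, end - rows):end]
```

With `scroll_back=0` the window always ends at the most recent balloon, matching the default view; a positive value corresponds to scrolling up into the past.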

Touching the dialogue stop button object 34 on the chat screen S2 lets the user actively interrupt the dialogue and enter the standby mode. In this case, the standby screen of the chat screen S2 in FIG. 8 is displayed instead of FIG. 6, and a dialogue start button object 36 is displayed in place of the dialogue stop button object 34. To return to the natural dialogue mode, the user touches the dialogue start button object 36, which returns the display to the chat screen S2 in FIG. 6.
"Effect 5 of '6. Behavior of robot 1 during dialogue'"
The user object 31 of the user and the robot object 32 of the robot 1, which are in dialogue, are arranged facing each other, and the speech bubble objects 33 of their dialogue are lined up between them, so the chat screen S2 gives a strong impression that a dialogue is taking place.
Because the past chat history can be checked later, the chat can be used in place of a diary. Because the history also shows who had which dialogue, it can be used to check, for example, which family members use the robot most often. Since "User change" is displayed, it is clear that the dialogue was interrupted at that point, and reading the past history causes no confusion.

D. Special gestures in normal dialogue
In normal dialogue, when the cloud server determines that the user's utterance data contains a specific word, it responds with a command for executing a special gesture together with the character string data. In response to the command, the controller MC causes the robot 1 to perform specific gestures such as the following, based on the screen display program, gesture program, and dialogue program described above. The control below is only an example: other gestures may be used, and if the user's utterance yields multiple commands, the gestures may be performed consecutively or simultaneously. The special gestures below may be performed individually or in combination. The displays on the display unit described below may be used instead of the gestures of the robot 1 in A. of "6. Behavior of robot 1 during dialogue" above, and those displays may be combined as appropriate.

1) When the user utters an expression that would be taken as negative in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the robot 1's utterance, to change from the normal eyes of FIG. 9(a) to the angry-eye object of FIG. 9(b). In this embodiment, the eye objects are stored in memory.
2) When the user utters an expression that would be taken as cheerful in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the robot 1's utterance, to change from the normal eyes of FIG. 9(a) to the smiling-eye object of FIG. 9(c). In this embodiment, the eye objects are stored in memory.
3) When the user utters an expression that would be taken as sad in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the robot 1's utterance, to change from the normal eyes of FIG. 9(a) to the sad-eye object of FIG. 9(d). In this embodiment, the eye objects are stored in memory.
4) When the user utters the name of the user's child, the second motor 24 is controlled so that the head 5 of the robot 1 performs a nodding gesture. In this embodiment, the gesture program is stored in memory.
5) When the user utters the name of the company that manufactures the robot 1, the first to third motors 23 to 25 are controlled so that the head 5 of the robot 1 moves back and forth and left and right while the body 4 repeatedly swings relative to the fixed part 2. At the same time, the text data of the company name is synthesized into speech and the company name is called out repeatedly from the speaker device 13. In this embodiment, the gesture program is stored in memory.
"Effect 6 of '6. Behavior of robot 1 during dialogue'"
Because such special gestures are performed, the user can look forward to unexpected behavior from the robot 1 during the dialogue and can actively enjoy the dialogue with the robot 1.
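The command-driven gestures in items 1) to 5) can be pictured as a small dispatch table. The sketch below is an assumption-laden illustration: the source does not specify the command format, so the command names (`ANGRY_EYES` and so on), the `GESTURES` table, and `handle_response` are all hypothetical, and actions are represented by descriptive strings rather than real display or motor calls.

```python
from typing import Callable, Dict, List


def angry_eyes() -> str:
    return "display: angry eyes (FIG. 9(b))"


def smiling_eyes() -> str:
    return "display: smiling eyes (FIG. 9(c))"


def sad_eyes() -> str:
    return "display: sad eyes (FIG. 9(d))"


def nod_head() -> str:
    return "motor 24: nod head 5"


# Hypothetical command names mapped to the gestures they trigger.
GESTURES: Dict[str, Callable[[], str]] = {
    "ANGRY_EYES": angry_eyes,
    "SMILING_EYES": smiling_eyes,
    "SAD_EYES": sad_eyes,
    "NOD": nod_head,
}


def handle_response(commands: List[str], reply_text: str) -> List[str]:
    """Perform every known command in order, then speak the reply text.

    Unknown commands are ignored. A real controller would run the gesture
    and the speech output at the same time; this sketch only lists them.
    """
    performed = [GESTURES[name]() for name in commands if name in GESTURES]
    performed.append(f"speak: {reply_text}")
    return performed
```

A table like this makes it easy to add further gestures, or to combine several commands in one response, as the text allows.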

B. Operation during face recognition
1. Starting and stopping the face recognition mode
The controller MC recognizes and authenticates the user's face by executing a face recognition program. The face recognition program recognizes a human face in an acquired image by face pattern recognition, quantifies and stores various positions of the recognized face, and performs authentication by determining the degree of match with the numerical data of previously registered faces. The controller MC enters the face recognition mode in synchronization with the natural dialogue mode, and executes face authentication each time the mode shifts from the standby mode to the natural dialogue mode.

In the face recognition mode, the controller MC uses the face recognition camera 10 to recognize the user's face. Specifically:
1) The face recognition camera 10 is activated, and the user's face image captured by the camera is displayed on the touch panel unit 7 (face display mode). In other words, the user is prompted to look at his or her own face on the touch panel unit 7. This makes face recognition processing possible, and previewing the image in this way allows the controller MC to determine whether the person has been authenticated before.
2) If the face recognition camera 10 fails to capture a face in 1) and no face is recognized within a certain period of time, the third motor 25 is driven to swing the head 5 up and down, so that the face recognition camera 10 scans vertically. While the camera scans vertically in this way, the first motor 23 is driven to rotate the face recognition camera 10 through a full 360 degrees while face recognition is attempted.
3) If a face is recognized in 1) or 2), authentication is performed. If the person is an already registered user, the natural dialogue mode is entered using the data of that authenticated user in the dialogue. If the person is not registered, he or she is treated as an unspecified person and the natural dialogue mode is entered. The touch panel unit 7 returns from the face display mode to the previous screen, either the face screen S1 (FIG. 5) or the chat screen S2 (FIG. 6).
4) If no face is recognized in 2), it is judged that no person was recognized, and both the face recognition mode and the natural dialogue mode itself are ended and the standby mode is entered. The touch panel unit 7 returns from the face display mode to the previous standby screen, either the standby screen of the face screen S1 (FIG. 7) or the standby screen of the chat screen S2 (FIG. 8).
"Effect of '1. Starting and stopping the face recognition mode'"
Dialogue is basically conducted while looking at the other party's face. By not allowing dialogue when the face cannot be recognized, the user is encouraged to actively have his or her face recognized. Dialogue is therefore possible only when the user actually faces the robot 1, which gives the user the feeling of having a real conversation.
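The decision flow of steps 1) to 4) can be summarized in a short sketch. The function name `face_recognition_mode` and its parameters are hypothetical. As a simplifying assumption, the two detection callbacks return the ID of the best-matching face (or `None` when no face is found) instead of performing real image processing.

```python
from typing import Callable, Optional, Set, Tuple


def face_recognition_mode(
    detect_front: Callable[[], Optional[str]],
    detect_while_scanning: Callable[[], Optional[str]],
    registered: Set[str],
) -> Tuple[str, Optional[str]]:
    """Return (next_mode, user_id) following steps 1) to 4)."""
    face = detect_front()               # step 1: camera on, preview displayed
    if face is None:
        face = detect_while_scanning()  # step 2: tilt head 5, rotate 360 degrees
    if face is None:
        return ("standby", None)        # step 4: nobody recognized
    if face in registered:
        return ("natural_dialogue", face)  # step 3: registered user
    return ("natural_dialogue", None)      # step 3: unspecified person
```

Returning `None` as the user ID for an unregistered face mirrors the text: dialogue still starts, but without user-specific data.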

2. Behavior of the robot 1 during face recognition
1) After face recognition, the controller MC has the face recognition camera 10 acquire images and continuously performs face pattern recognition at fixed intervals. The user's face is recognized within the angle of view of the face recognition camera 10, and the state in which the center position C between the two eyes of the user's face is at a predetermined position within the angle of view, for example the origin at the center of the angle of view, is taken as the default position. When the center position C deviates from this default position, the controller MC displays an animation in which the pupil object 27a moves to the left or right, in the direction of the deviation, by an amount corresponding to the deviation. Normally the pupil object 27a appears in full as an ellipse within the white of the eye object 27, as in FIG. 9(a). When the user's face has moved, however, the pupil object 27a is displayed partly hidden within the eye object 27, as shown in FIG. 11, as if it were looking in that direction.
If the user moves so that the face leaves the angle of view of the face recognition camera 10 and can no longer be recognized, the controller MC drives the first motor 23 to rotate the entire movable part 3 in the direction in which the center position C shifted, so that the face recognition camera 10 points that way. The first motor 23 is stopped as soon as the face is recognized. If the face is still not recognized after a certain amount of rotation, for example after the entire movable part 3 has rotated 45 degrees, the controller MC stops the first motor 23 at that point and continues face pattern recognition in that position.
"Effect of '2. Behavior of the robot 1 during face recognition'"
As a result, the user feels watched by the robot 1 throughout the dialogue, which makes the dialogue more engaging.

2) The controller MC continuously performs face pattern recognition with the face recognition camera 10, but the face cannot be recognized when the user is moving or is outside the angle of view of the camera. It is therefore desirable to notify the user of the face recognition state through a change on the screen. In Embodiment 1, the face recognition state is indicated by the color density of the sickle-shaped reflection object 27b, which represents the reflection on the pupil inside the pupil object 27a. The controller MC displays the reflection object 27b in very light blue when no face is recognized, in a darker blue while recognition is in progress, and in dark blue when the face is recognized normally.
"Effect 2 of '2. Behavior of the robot 1 during face recognition'"
Because the user can easily tell whether the robot 1 has recognized his or her face, the user actively cooperates with face recognition, and the dialogue proceeds smoothly.
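The gaze-following and status-display behavior of items 1) and 2) can be sketched as below. All names and numeric values are assumptions made for illustration: `pupil_offset`, `search_angles`, `FaceState`, and `REFLECTION_RGB` are hypothetical, as are the 20-pixel maximum shift, the 5-degree step, and the RGB values. Only the 45-degree limit and the three shades of blue come from the text.

```python
from enum import Enum
from typing import Iterator


def pupil_offset(face_x: float, half_width: float, max_offset_px: int = 20) -> int:
    """Map the horizontal deviation of eye midpoint C to a pupil shift.

    face_x is C's x coordinate with the centre of the view at 0
    (positive = right); the result is clamped to +/- max_offset_px.
    """
    ratio = max(-1.0, min(1.0, face_x / half_width))
    return round(ratio * max_offset_px)


def search_angles(last_face_x: float, step_deg: float = 5.0,
                  limit_deg: float = 45.0) -> Iterator[float]:
    """Yield cumulative rotation angles toward the side where the face left.

    The caller stops iterating as soon as the face is recognized again;
    otherwise the rotation ends at limit_deg.
    """
    direction = 1.0 if last_face_x >= 0 else -1.0
    angle = 0.0
    while abs(angle) < limit_deg:
        angle += direction * step_deg
        yield angle


class FaceState(Enum):
    NOT_RECOGNIZED = "not_recognized"
    RECOGNIZING = "recognizing"
    RECOGNIZED = "recognized"


# Hypothetical colours for reflection object 27b: lighter blue = weaker state.
REFLECTION_RGB = {
    FaceState.NOT_RECOGNIZED: (220, 235, 255),  # very light blue
    FaceState.RECOGNIZING: (120, 170, 240),     # darker blue
    FaceState.RECOGNIZED: (20, 60, 200),        # dark blue
}
```

For example, a face midway between the centre and the right edge of the view (`pupil_offset(160.0, 320.0)`) shifts the pupil by half of the maximum offset.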

C. Operation in the away setting mode
In Embodiment 1, in the away setting mode, that is, while the away setting is enabled, images can be transferred to a registered e-mail address. In Embodiment 1, the away setting mode is enabled and disabled on the setting screen displayed after the user touches the setting button object 35 on the chat screen S2 of the robot 1. The following describes the processing based on the away-setting program of the controller MC while the away setting mode is enabled.
In the away setting mode, during the standby mode of 4) of B. in "2. Starting and stopping the natural dialogue mode" above, when the controller MC determines from the Doppler sensor 22 that an object (person) is moving, it performs the following control.

1) When some moving object is present around the robot 1, the controller MC notifies the user of this by e-mail. Via the Internet, the controller MC sends an e-mail containing the URL of the robot 1's cloud server to the terminal device, for example a smartphone, of a user whose e-mail address is registered and who is not near the robot 1 (hereinafter, the external user). The subject line or body of the e-mail includes wording that makes the purpose of the notification clear, for example a sentence such as "It looks like someone is here" or an icon conveying the same meaning.
"Effect of 'C. Operation in the away setting mode'"
On receiving this e-mail, the external user learns that some object (person) is moving around the robot 1 and is given the opportunity to respond to the situation.

2) At the same time as sending the e-mail to the external user, the controller MC activates the face recognition camera 10 and acquires images.
3) At the same time as sending the e-mail to the external user, the controller MC determines whether a trigger utterance is made within a certain period of time, for example a greeting such as "I'm home, Yupibo." This utterance starts a built-in scenario in the natural dialogue mode, which prompts the user (here, the person near the robot 1 who said "I'm home, Yupibo") to undergo face authentication. If face authentication shows that the person is one of the registered users, the controller MC sends a second e-mail to the external user's terminal device. This e-mail tells the external user that the person near the robot 1 is not a suspicious person; in other words, it reports that the person is someone known, such as a family member. The name of the registered user, taken from the registration information, is included in the subject line or body of the e-mail. The trigger need not be an utterance: for example, touching the touch panel unit 7 may start face recognition to confirm that the person is a registered user.
"Effect 2 of 'C. Operation in the away setting mode'"
As a result, when a family member such as a child comes home while the user is away, the second e-mail reports this, so the user does not need to check on the home via the smartphone.

4) The external user who received the e-mail in 1) can, particularly when no second e-mail was sent in 3), enter the ID and password of the robot 1 to connect to the cloud server and view, in real time in the smartphone's browser, the camera image of the face recognition camera 10 provided by the cloud server. This is also possible when the second e-mail has been sent.
FIG. 12 shows an example of the user's smartphone 41. After connection to the cloud server, a predetermined camera image of the face recognition camera 10 is displayed on the display screen 43, which also serves as a touch panel. Four operation icons 44a to 44d for remotely controlling the direction of the face recognition camera 10 are displayed within the camera image. When the external user operates the operation icons 44a to 44d, a control command is output to the controller MC via the cloud server, and the first motor 23 or the third motor 25 is driven according to the command so that the head 5 and body 4 of the robot 1 rotate and the direction of the face recognition camera 10 changes. Touching the recording button icon 45 starts recording, and touching it again stops recording.
Voice spoken into the microphone (not shown) of the smartphone 41 is output from the speaker device 13 of the robot 1 via the cloud server, while voice picked up by the microphone 12 of the robot 1 is output from the speaker device (not shown) of the smartphone 41 via the cloud server. The external user can therefore talk with a user near the robot 1, using the smartphone 41 and the robot 1, while viewing the image from the face recognition camera 10.
"Effect 3 of 'C. Operation in the away setting mode'"
This allows the external user to change the direction of the face recognition camera 10 by remote control and check the surroundings of the robot 1, for example to check that the home is safe while the user is away. Also, when a family member such as a child comes home while the user is away, actively making contact from outside with the smartphone in this way contributes to good relationships with others, including family members.
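The two-stage e-mail notification of steps 1) to 3) above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: `away_mode_emails` and its parameters are hypothetical, e-mails are modeled as dictionaries rather than actually sent, the subject wording is an example only, and the time limit for the trigger utterance is reduced to a boolean flag.

```python
from typing import Dict, List, Optional, Set


def away_mode_emails(
    motion_detected: bool,
    trigger_heard: bool,
    authenticated_user: Optional[str],
    registered: Set[str],
    server_url: str,
) -> List[Dict[str, str]]:
    """Return the e-mails the controller would send to the external user."""
    emails: List[Dict[str, str]] = []
    if not motion_detected:
        return emails  # Doppler sensor saw nothing: no notification

    # First e-mail: something is moving near the robot; include the server URL.
    emails.append({"subject": "It looks like someone is here",
                   "body": server_url})

    # Second e-mail: only after a trigger utterance and successful face
    # authentication of a registered user.
    if trigger_heard and authenticated_user in registered:
        emails.append({"subject": f"{authenticated_user} is home",
                       "body": f"Registered user recognized: {authenticated_user}"})
    return emails
```

If only the first e-mail arrives, the external user can follow the URL to view the camera image, as described in step 4).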

<Modification 1 of Embodiment 1>
Next, Modification 1 of Embodiment 1 will be described.
In a built-in scenario dialogue within the natural dialogue described above, the voice data acquired from the microphone 12 may fail to match the regular or non-regular expressions of the built-in scenario because the user's articulation is unclear or other sounds are mixed in. In that case, instead of immediately judging the utterance to be normal dialogue, the controller MC may output a voice prompt such as "Please say it again" to ask the user to speak again.
The controller MC outputs this prompt from the speaker device 13, and if the user then makes a correct utterance in line with the built-in scenario within a certain period of time, the controller MC again processes it as built-in scenario dialogue. If the user's utterance still does not match the regular or non-regular expressions of the built-in scenario, the controller MC connects to the external cloud server.
In this way, unnecessary connections to the external cloud server are avoided, and the dialogue can be handled within the robot 1 alone.
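The re-prompt logic of Modification 1 can be sketched as below. The names `route_utterance`, `accepted`, and `ask_again` are hypothetical. As a simplifying assumption, matching against the built-in scenario is reduced to membership in a set of accepted phrases, and a timeout is represented by `ask_again` returning `None`.

```python
from typing import Callable, Optional, Set


def route_utterance(
    first_text: str,
    accepted: Set[str],
    ask_again: Callable[[], Optional[str]],
) -> str:
    """Return "builtin" or "cloud" for one user utterance.

    ask_again() stands in for the robot saying "Please say it again" and
    returns the user's second attempt, or None if nothing was said in time.
    """
    if first_text in accepted:
        return "builtin"
    retry = ask_again()  # one re-prompt before giving up on the local scenario
    if retry is not None and retry in accepted:
        return "builtin"
    return "cloud"       # still no match: hand over to the cloud server
```

Only utterances that fail twice reach the cloud server, which is the saving the modification aims for.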

<Modification 2 of Embodiment 1>
Next, Modification 2 of Embodiment 1 will be described.
If the user misses what the robot 1 said in the natural dialogue described above, the user may, within a certain period of time, ask the robot 1 to repeat its previous utterance, and the robot 1 may then speak (output the voice) again. This processing is possible in both built-in scenario dialogue and natural dialogue.
During the dialogue mode, the controller MC uses voice recognition to determine whether the user has made an utterance indicating that something was missed, for example "Say it again." If the controller MC determines that the user asked for a repeat after the robot 1 spoke, it outputs the content the robot 1 spoke immediately before as voice again. The earlier utterance is then treated as cancelled, and the second utterance is processed as the first.
In this way, even if the user misses something partway through, the dialogue resumes without interruption.
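The repeat-on-request behavior of Modification 2 can be sketched as follows. The class `RepeatableSpeaker`, the trigger phrases, and the 10-second window are hypothetical; the source says only "a certain period of time". Timestamps are passed in explicitly so the example stays deterministic, and speaker output is modeled as a list.

```python
from typing import List, Optional

REPEAT_TRIGGERS = {"say it again", "say again"}  # hypothetical trigger phrases


class RepeatableSpeaker:
    def __init__(self, window_s: float = 10.0) -> None:
        self.window_s = window_s          # assumed length of the time window
        self.last_text: Optional[str] = None
        self.last_time = 0.0
        self.spoken: List[str] = []       # stands in for speaker device output

    def say(self, text: str, now: float) -> None:
        """Output an utterance and remember it as the latest one."""
        self.spoken.append(text)
        self.last_text, self.last_time = text, now

    def on_user_utterance(self, text: str, now: float) -> bool:
        """Replay the last robot utterance if asked within the window."""
        asked = text.lower() in REPEAT_TRIGGERS
        in_time = now - self.last_time <= self.window_s
        if asked and self.last_text is not None and in_time:
            self.say(self.last_text, now)  # the replay becomes the current one
            return True
        return False
```

Because the replay goes through `say`, it replaces the earlier utterance as the "latest" one, in line with treating the second utterance as the first.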

<Modification 3 of Embodiment 1>
Next, Modification 3 of Embodiment 1 will be described.
In the natural dialogue described above, when the user checks his or her own utterance as displayed by the robot 1 on the touch panel unit 7 and finds that it was misrecognized, the user may point this out within a certain period of time so that the dialogue can be corrected. This processing is possible in both built-in scenario dialogue and natural dialogue.
During the dialogue mode, the controller MC uses voice recognition to determine whether the user has made an utterance that serves as a trigger pointing out a recognition error, for example "You're wrong" or "That's wrong. I'll say it again." If the controller MC determines that this utterance was made within a certain period of time after the user's utterance was displayed on the touch panel unit 7, it outputs a voice prompt asking the user to speak again, for example "I'm sorry. Please say it again."
Then:
(1) In built-in scenario dialogue, the user's previous utterance is cancelled, and what the user says next is processed by voice recognition as the correct utterance.
(2) In normal dialogue, utterances such as "You're wrong" and "I'll say it again" are also sent to the external cloud server as utterance data (voice data), and creation of response data is requested based on what the user said again, with those utterances included.
In this way, even if the robot 1 misrecognizes the user's utterance, the dialogue can be corrected.
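The two correction paths (1) and (2) above can be contrasted in a short sketch. The names `is_correction`, `utterances_after_correction`, and the trigger phrases are hypothetical, and the time-window check described in the text is omitted for brevity.

```python
from typing import List

CORRECTION_TRIGGERS = ("you're wrong", "that's wrong")  # hypothetical phrases


def is_correction(text: str) -> bool:
    """True if the utterance starts with one of the trigger phrases."""
    lowered = text.lower()
    return any(lowered.startswith(t) for t in CORRECTION_TRIGGERS)


def utterances_after_correction(
    history: List[str], correction: str, restated: str, builtin: bool
) -> List[str]:
    """Return the user utterances to process once the user has restated.

    Built-in scenario: the misrecognized last utterance is cancelled and
    only the restated one counts.
    Normal dialogue: the correction and the restated utterance are both
    sent on, in addition to what was already sent.
    """
    if builtin:
        return history[:-1] + [restated]
    return history + [correction, restated]
```

The difference reflects where the dialogue state lives: locally for built-in scenarios, so an entry can be dropped, and on the cloud server for normal dialogue, where the correction itself becomes part of the context.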

<Modification 4 of Embodiment 1>
Configurations different from that of Embodiment 1, such as the following, may also be adopted.
(1) In the above, a request is made to the cloud server and the dialogue shifts to normal dialogue only when no built-in scenario is executed. That is, whenever a built-in scenario could be executed, it was always used. However, even when a built-in scenario applies, the request may deliberately be sent to the cloud server under a certain condition instead of being handled locally. The condition may be, for example, once every several times or at random timing.
This leads to unpredictable dialogue with the robot 1 and allows a less scripted, more human-like dialogue.
(2) In Embodiment 1, the controller MC does not include a voice recognition engine, but connects to an external server equipped with a voice recognition engine to convert the user's utterances (voice data) into text. This reduces the load on the robot 1.
However, the controller MC may hold a voice recognition engine in its memory and call it to convert the user's utterance data (voice data) acquired from the microphone 12 into character string data by itself. In other words, the controller MC of the robot 1 may itself be able to convert the user's voice data into text data. This makes it possible to create character string data without relying on the external voice recognition engine, which shortens the processing time, for example.
(3) In Embodiment 1, initial settings are made from the setting screen, but registration may instead be performed from outside via the cloud server using a terminal device such as a smartphone. This makes setup easier and quicker, especially for people who are used to such terminal devices.
(4) The first to third motors 23 to 25 may be drive means other than servo motors, for example other types of motor or hydraulic cylinders.
(5) In Embodiment 1, character string data is synthesized into speech inside the robot 1. Using an internal speech synthesis engine in this way is preferable in that it keeps the amount of data smaller than exchanging voice data with the server. However, the response data (character string data) obtained with the dialogue engine may instead be synthesized into speech on the cloud server side, and the resulting voice data returned to the robot 1.
(6) In Embodiment 1, the setting screen is reached from the setting button object 35 on the chat screen S2, but it may instead be reached by a slide operation on the touch panel unit 7.

（７）「ハ．通常対話における特別な所作」においてはビルトインシナリオにおいても同様にユーザーの発話データ内に特定の言葉が含まれていると判断した場合に特別な所作を実行させるようにしてもよい。例えばコントローラＭＣはユーザーの発話データ内に特定の言葉が含まれていると判断すると上記と同様に特別な所作をさせるように制御してもよい。
（８）「ハ．通常対話における特別な所作」においてロボット１の形状が異なれば更に異なるジェスチャーをさせるように制御してもよい。例えば、コントローラＭＣはロボット１に手や足があればそれらを駆動手段を制御して動かすようにしてもよい。
（９）顔画面Ｓ１の目オブジェクト２７のアニメーションとして、ときどき、瞬きさせるようなアニメーションを入れてもよい。例えば図９（ａ）の目オブジェクト２７の状態から図７の閉じた状態の目オブジェクト２７を挿入するような制御とすることで実行させる。そのようにすれば、ロボット１が実際に本当にこちらを見ているようなリアル感が創出されることとなりロボット１との対話をより楽しむことができる。
（１０）顔認識モードでは、登録されていないユーザーであれば不特定の人物として認識するようにしていたが、その状態から設定画面に移行して新たな別のユーザーとして認証登録するようにすると、便利である。
（１１）「Ｃ．留守設定時の動作について」において、留守設定モードをスマートフォンから設定できるようにすると便利である。
（１２）「Ｃ．留守設定時の動作について」において、コントローラＭＣは、ロボット１の周囲になんらかの動く物体が存在することで、この状態をユーザーにｅメールによって報知をするような処理をするが、逆に一定間隔で動いていることを認識し、一定時間内に動く物体がない場合にこの状態をユーザーにｅメールによって報知をするような処理を設けてもよい。
例えば、病人や介護対象者がある場合にその近くにロボット１を置くことで常に動きがあることを前提とした見守りをすることができる。 (7) Regarding "C. Special gestures in normal dialogue", a special gesture may likewise be executed in the built-in scenario when it is determined that a specific word is included in the user's utterance data. For example, when the controller MC determines that a specific word is included in the user's utterance data, it may control the robot 1 to make a special gesture in the same way as described above.
(8) If the shape of the robot 1 is different in "C. Special gestures in normal dialogue", the robot 1 may be controlled to make further different gestures. For example, if the robot 1 has arms and legs, the controller MC may control a driving means to move them.
(9) As the animation of the eye object 27 on the face screen S1, an animation that causes the eye to blink may be inserted from time to time. For example, the control is performed by inserting the closed eye object 27 in FIG. 7 from the state of the eye object 27 in FIG. 9(a). In this way, a realistic feeling as if the robot 1 is actually looking at you can be created, allowing you to enjoy the interaction with the robot 1 even more.
(10) In face recognition mode, an unregistered user is recognized as an unspecified person. It would be convenient to be able to move from that state to the settings screen and register that person as a new user.
(11) Regarding "C. Operations when set to be away", it would be convenient to be able to set the away setting mode from a smartphone.
(12) In "C. Operations when set to be away", the controller MC notifies the user by e-mail when some moving object is present around the robot 1. Conversely, a process may be provided that expects movement at regular intervals and notifies the user by e-mail when no moving object is detected within a fixed period of time.
For example, by placing the robot 1 near a sick person or a person receiving care, that person can be watched over on the assumption that there is normally always some movement.
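The watch-over variation in item (12) above amounts to a timer check on the most recent motion event. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: the class name `InactivityMonitor`, the `notify` callback standing in for the e-mail notification, and the 30-minute window used in the usage note are all hypothetical.

```python
from typing import Callable


class InactivityMonitor:
    """Notify once when no motion has been seen for `window_s` seconds.

    Hypothetical helper: the source only states that the controller MC
    e-mails the user when no moving object is detected within a fixed time.
    """

    def __init__(self, window_s: float, notify: Callable[[str], None]) -> None:
        self.window_s = window_s
        self.notify = notify          # stand-in for the e-mail notification
        self.last_motion_s = 0.0
        self.alerted = False

    def on_motion(self, now_s: float) -> None:
        """Call whenever the motion sensor reports a moving object."""
        self.last_motion_s = now_s
        self.alerted = False          # re-arm once motion resumes

    def tick(self, now_s: float) -> bool:
        """Call periodically; returns True if a notification was sent."""
        if not self.alerted and now_s - self.last_motion_s >= self.window_s:
            self.notify(f"No motion detected for {self.window_s:.0f} s")
            self.alerted = True
            return True
        return False
```

With `window_s=1800`, a `tick` made 30 minutes after the last `on_motion` call sends one notification, and further ticks stay silent until motion is seen again.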

＜実施の形態２＞
次に、実施の形態２について説明する。
上記実施の形態１のロボット１の高輝度白色ＬＥＤ９に変えて、あるいはこれに併設した赤外線ＬＥＤを備えるようにしてもよい。この際に顔認識用カメラ１０のモジュールに赤外線フィルタが備えられていれば取り外す。赤外線ＬＥＤは人には見えないため、夜間の空き巣等の侵入者があった場合に、高輝度白色ＬＥＤ９が点灯することで驚いて侵入者に逃げられてしまう可能性がある。一方、赤外線ＬＥＤであると撮影されていることがわかりにくいので侵入者は逃げず、そのため侵入の画像を確認したり、保存したりすることが可能となる。 <Embodiment 2>
Next, a second embodiment will be described.
An infrared LED may be provided in place of, or in addition to, the high-intensity white LED 9 of the robot 1 of the first embodiment. In that case, if the module of the face recognition camera 10 is equipped with an infrared filter, the filter is removed. If an intruder such as a burglar enters at night, the high-intensity white LED 9 turning on may startle the intruder into running away. Infrared light, by contrast, is invisible to humans, so it is difficult for the intruder to tell that images are being taken; the intruder does not run away, and images of the intrusion can be checked and saved.

＜実施の形態２の変形例１＞
次に、実施の形態２の変形例１について説明する。
ロボット１が赤外線ＬＥＤを備えた場合に、この赤外線ＬＥＤを利用して赤外線リモコン信号受信部を備えた室内の各種装置の制御をするようにしてもよい。各種装置としては、例えばテレビ、オーディオ装置、エアコン装置等がよい。
ロボット１は赤外線リモコン信号受信部を備えた装置の赤外線リモコン信号受信部を直接見通せる場所に設置することがよい。実施の形態２の変形例１ではコントローラＭＣは顔認識用カメラ１０を使用した形状認識に関する形状認識プログラムを備えており、例えばテレビであればその形状の特徴（四角、大きい、黒い等）に基づいて認識することができる。ロボット１はユーザーの各種装置へ赤外線リモコン信号を出力するためのトリガーとしての例えば「テレビのスイッチ付けて」のような発話があると、その発話に基づいて第１のモータ２３又は第３のモータ２５を制御してロボット１を顔認識用カメラ１０を上
下に顔５を首振りさせながら、３６０度回転させて周囲を撮影させ、形状認識プログラムによってテレビの形状を認識させるように動作させる。
コントローラＭＣがテレビがあると判断すると、その方向を記憶させると同時に赤外線ＬＥＤからその物体方向に赤外線リモコン信号を出力させて、テレビを動作させるＯＮ・０ＦＦ等の制御を実行させる。次のテレビについてのトリガーがあった際にはまず、その方向において形状認識を実行する。赤外線リモコン信号は単にＯＮ・０ＦＦのスイッチング制御だけではなく、例えばテレビであればチャンネルの変更、例えばエアコンであれば温度調整等にも対応するように赤外線周波数を変更して制御することが可能となる。このような細かな制御では複数種類の周波数の異なる赤外線リモコン信号が必要となるが、赤外線の周波数の設定は、例えば図１２はユーザーのスマートフォン４１を使用してサーバー経由で行うようにするとよい。
また、形状認識プログラムによって方向を探さなくとも、スマートフォン４１経由で顔認識用カメラ１０を操作してその向きを変えることで各種装置の方向を取得し、その方向を登録するようにしてもよい。 <Modification 1 of Embodiment 2>
Next, a first modification of the second embodiment will be described.
If the robot 1 is equipped with an infrared LED, the infrared LED may be used to control various indoor devices equipped with an infrared remote control signal receiving section. Examples of various devices include a television, an audio device, an air conditioner, and the like.
It is preferable that the robot 1 be installed in a place with a direct line of sight to the infrared remote control signal receiving section of the device to be controlled. In the first modification of the second embodiment, the controller MC is equipped with a shape recognition program that uses the face recognition camera 10; a television, for example, can be recognized from the features of its shape (rectangular, large, black, and so on). When the user makes an utterance that serves as a trigger for outputting an infrared remote control signal to one of the devices, such as "turn on the TV", the robot 1 controls the first motor 23 or the third motor 25 based on that utterance so that it rotates 360 degrees while swinging the face 5, and with it the face recognition camera 10, up and down to photograph the surroundings, and the shape recognition program recognizes the shape of the television.
When the controller MC determines that a television is present, it stores that direction and at the same time outputs an infrared remote control signal from the infrared LED toward the object to execute control such as turning the television ON or OFF. The next time a trigger concerning the television occurs, shape recognition is first performed in the stored direction. The infrared remote control signal is not limited to simple ON/OFF switching; by changing the infrared frequency, it can also support, for example, changing the channel of a television or adjusting the temperature of an air conditioner. Such fine control requires several kinds of infrared remote control signals with different frequencies, and the infrared frequency is preferably set via the server using, for example, the user's smartphone 41 shown in FIG. 12.
Further, instead of searching for the direction using the shape recognition program, the directions of various devices may be obtained by operating the face recognition camera 10 via the smartphone 41 and changing its direction, and the directions may be registered.

＜実施の形態３＞
次に、実施の形態３について説明する。
実施の形態３では音声認識エンジンを搭載したサーバーを使用する際の例えば対話ＡＰＩの利用等の接続に伴うランニングコストを削減することを主眼とした制御について説明する。
また、実施の形態３の対話プログラムはマイクロフォン１２から取得した音声の無音状態を検知できるサブプログラムを含んでいる。また、対話プログラムはユーザーの発話の音声データをマイクロフォン１２から取得して一旦録音し、サーバーに出力させるための録音・出力サブプログラムを含んでいる。
コントローラＭＣは発話があった場合には直ちにサーバーに接続させず、ユーザーの発話の音声データをまず一旦録音し、録音したユーザーの発話の音声データの無音状態を検出した段階で初めてサーバーに接続してその録音した音声データを出力し、対話エンジンでの返答データを作成させるようにする。このようにすれば常にサーバーに接続されているわけではなく、無音時間を含んだ長時間をサーバーに接続する必要がないため、無音の接続時間をカットすることができる。
音声認識エンジンはユーザーの発話中において１つのプロセスがそのユーザーに専有されることとなる。つまり、1つの処理に「何秒」というコンピュータとしては非常に長い時間が専有されることとなり、結果として音声認識エンジンを使用するユーザーのコストの負担が大きくなってしまうが、実施の形態３のようにすれば単位ユーザー当たりに必要なプロセスを減らすことができ、ユーザーのコスト削減に寄与する。 <Embodiment 3>
Next, Embodiment 3 will be described.
In Embodiment 3, control will be described that focuses on reducing running costs associated with connections such as the use of dialogue API when using a server equipped with a voice recognition engine.
Furthermore, the dialogue program of the third embodiment includes a subprogram that can detect a silent state of the voice obtained from the microphone 12. The dialogue program also includes a recording/output subprogram for acquiring audio data of the user's speech from the microphone 12, recording it once, and outputting it to the server.
The controller MC does not connect to the server immediately when there is an utterance, but first records the audio data of the user's utterance, and connects to the server only after detecting a silence state in the recorded audio data of the user's utterance. The recorded voice data is outputted and the response data is generated by the dialogue engine. In this way, it is not always connected to the server, and there is no need to connect to the server for a long time including silent time, so the silent connection time can be cut.
While a user is speaking, one process of the speech recognition engine is dedicated to that user. In other words, a single process is occupied for several seconds, which is a very long time for a computer, and as a result the cost borne by the user of the speech recognition engine increases. With the third embodiment, the number of processes required per user can be reduced, which contributes to reducing the user's cost.
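The record-first, connect-later control of the third embodiment can be illustrated with a small buffer that releases an utterance only once trailing silence is detected. This is a hedged Python sketch: the class name `UtteranceRecorder`, the frame representation (lists of integer samples), and the silence criteria are assumptions, since the source does not specify how silence is detected.

```python
from typing import List, Optional


class UtteranceRecorder:
    """Buffer audio locally and release it only after trailing silence."""

    def __init__(self, silence_level: int, silence_frames: int) -> None:
        self.silence_level = silence_level    # peak amplitude treated as silence (assumed)
        self.silence_frames = silence_frames  # consecutive quiet frames that end an utterance
        self.buffer: List[List[int]] = []
        self.quiet_run = 0

    def feed(self, frame: List[int]) -> Optional[List[List[int]]]:
        """Add one frame; return the finished utterance once silence is detected."""
        peak = max((abs(s) for s in frame), default=0)
        if peak <= self.silence_level:
            if not self.buffer:
                return None               # nothing recorded yet: stay offline
            self.quiet_run += 1
            if self.quiet_run >= self.silence_frames:
                utterance, self.buffer, self.quiet_run = self.buffer, [], 0
                return utterance          # only now would the server be contacted
            return None
        self.quiet_run = 0                # speech resumed; quiet frames are not stored
        self.buffer.append(frame)
        return None
```

In this sketch the caller would open the server connection only when `feed` returns a non-empty utterance, so no connection is held open during silence.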

＜実施の形態３の変形例１＞
次に、実施の形態３の変形例１について説明する。
実施の形態３の変形例１でも音声認識エンジンを搭載したサーバーを使用する際の接続のランニングコストを削減することを主眼とした制御について説明する。ユーザーの発話が開始されるまでにタイムラグが発生することや、ユーザーの発話待ちの状態で結局ユーザーが発話せずタイムアウトでサーバーとの接続を終了する場合があると、音声認識サーバーはプロセスを消費してしまうのでユーザーのコストがかかってしまう。
実施の形態３の変形例１のロボット１の対話プログラムは発話の音声データをマイクロフォン１２から取得して一旦録音し、サーバーに出力させるための録音・出力サブプログラムを含んでいる。また、対話プログラムには録音された音声データの音圧レベルを検出し、出力するサブプログラムを含んでいる。
コントローラＭＣは自然対話の状態で常時録音されている音声データが一定音圧以上のである場合にサーバーに接続させ、録音中のデータを追っかけ再生するようにする。つまり、無音、あるいは音声認識ができないような小さな発話を無視し、対話可能な発話があ
った場合だけサーバーに接続して音声データを出力し、サーバー側に音声認識エンジンで返答データを作成させるようにする。
これによって、発話待ちの無駄な接続時間をなくすことが可能となる。 <Modification 1 of Embodiment 3>
Next, a first modification of the third embodiment will be described.
Modification 1 of Embodiment 3 also describes control focused on reducing the running cost of connections when using a server equipped with a voice recognition engine. If there is a time lag before the user starts speaking, or if the connection is held open waiting for an utterance that never comes and is eventually closed by a timeout, a process of the speech recognition server is consumed in the meantime, which incurs costs for the user.
The dialogue program for the robot 1 of the first modification of the third embodiment includes a recording/output subprogram for acquiring speech data from the microphone 12, recording it once, and outputting it to the server. The dialogue program also includes a subprogram that detects and outputs the sound pressure level of recorded audio data.
The controller MC connects to the server when the voice data that is constantly being recorded during natural dialogue reaches or exceeds a certain sound pressure, and plays back the data still being recorded with a short delay (chase playback). In other words, silence and utterances too quiet to be recognized are ignored; the controller connects to the server and outputs voice data only when there is an utterance that can be responded to, and has the server create response data using its voice recognition engine.
This makes it possible to eliminate wasted connection time waiting for speech.
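The sound-level gate described above can be sketched as a threshold test on each recorded frame. The Python sketch below is illustrative only: it measures the digital RMS level relative to full scale (dBFS) as a stand-in for sound pressure, and the function names, the 16-bit sample assumption, and the -40 dB threshold are all assumptions not found in the source.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Return the RMS level of 16-bit samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms / full_scale) if rms > 0 else float("-inf")


def should_connect(samples: Sequence[int], threshold_db: float = -40.0) -> bool:
    """True when the frame is loud enough to be worth sending to the server."""
    return rms_level_db(samples) >= threshold_db
```

A frame of silence or very quiet speech keeps the robot offline; only frames at or above the threshold would trigger a connection and the delayed playback of the buffered recording.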

＜実施の形態４＞
次に、実施の形態４について説明する。
音声認識サーバーは高価であるため、あらかじめ十分なリソースを用意することができないことがあり、サーバーリソースに余裕がない場合、端末が音声認識サーバーに接続しようとした場合にサーバーがビジー状態であることがある。
上記通常対話においては、サーバーに発話データ（音声データ）を送信し返答データの作成をリクエストする。とサーバーは発話データに基づいて文字列データ化された返答データを作成してレスポンスする。ところが、ビジー状態であると返答データがされず、エラーになってしまうことがある。サーバーからエラーメッセージが返信されることとなる。
ロボット１のコントローラＭＣはサーバーからエラーメッセージの送信を受けた場合にサーバー接続エラーである旨の発話をユーザーにせずに、ビルトインシナリオから対話を続けられるような返信を音声出力するようにする。例えば「もう一度言って」とか「うんうん」とか「なんだっけ？」というような曖昧な返答したり、適当な相槌を返すなどしてサーバーが空くのを待つ処理をするとよい。 <Embodiment 4>
Next, Embodiment 4 will be described.
Speech recognition servers are expensive, so it may not be possible to prepare sufficient resources in advance. When server resources are scarce, the server may be busy when a terminal tries to connect to it.
In the normal dialogue described above, utterance data (voice data) is sent to the server with a request to create response data, and the server responds with response data in the form of character string data created from the utterance data. If the server is busy, however, no response data may be created, and the server returns an error message instead.
When the controller MC of the robot 1 receives an error message from the server, it does not tell the user that a server connection error has occurred; instead it outputs by voice a reply from the built-in scenario that lets the dialogue continue. For example, it may wait for the server to become available by giving a vague reply such as "Say that again", "Uh-huh", or "What was it again?", or by giving a suitable backchannel response.
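The error handling of the fourth embodiment can be sketched as a wrapper around the server request that falls back to a built-in filler phrase. This Python sketch rests on assumptions: `request_reply` is a hypothetical callable standing in for the server round trip, a busy server is modelled as a raised `ConnectionError` or `TimeoutError`, and the filler list simply reuses the example phrases from the text.

```python
import random
from typing import Callable, Optional, Sequence

# Example fillers taken from the text above; any built-in scenario would do.
FILLERS = ("Say that again", "Uh-huh", "What was it again?")


def reply_or_filler(
    request_reply: Callable[[bytes], str],
    utterance: bytes,
    fillers: Sequence[str] = FILLERS,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the server's reply, or a built-in filler if the request fails."""
    try:
        return request_reply(utterance)
    except (ConnectionError, TimeoutError):
        # Server busy or unreachable: keep the dialogue going instead of
        # announcing the error to the user.
        return (rng or random).choice(fillers)
```

Passing a seeded `random.Random` as `rng` makes the chosen filler reproducible, which is convenient when testing.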

＜実施の形態４の変形例１＞
次に、実施の形態４の変形例１について説明する。
ユーザー側の発話が長すぎると、音声認識エンジンが誤認識をする可能性がある。そのため、認識したユーザーの発話が長すぎると判断した場合に、音声認識エンジンを備えるサーバーに接続することなく記憶手段に記憶された音声データから選択された対話例を音声出力する機能を備えることがよい。
ロボット１のコントローラＭＣは、ユーザーの発話データが一定以上の長さになったと判断した場合には、サーバーに接続させることなくビルトインシナリオから「うん」とか「マジ？」とか「本当ですか？」などという対話においてどのようにも取れる相づちのような発話を音声出力する。発話データは音声データのままでもよく、コントローラＭＣあるいはサーバーで文字列データに変換された後のものでもよい。
これによって的外れな言葉が返ってくることを防止し、対話を仕切り直しして改めてユーザーに対話を促すようにすることができる。 <Modification 1 of Embodiment 4>
Next, a first modification of the fourth embodiment will be described.
If the user's utterance is too long, the speech recognition engine may misrecognize it. It is therefore preferable to provide a function that, when the recognized user utterance is judged to be too long, outputs by voice a dialogue example selected from the voice data stored in the storage means, without connecting to the server equipped with the voice recognition engine.
When the controller MC of the robot 1 determines that the user's utterance data has reached a certain length or more, it does not connect to the server and instead outputs by voice, from the built-in scenario, a backchannel-like utterance that fits any dialogue, such as "Yeah", "Seriously?", or "Really?". The utterance data may be the voice data as it is, or may be data that has already been converted into character string data by the controller MC or the server.
This prevents off-target replies, resets the dialogue, and prompts the user to speak again.
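The length check in this modification can be sketched as a guard placed in front of the server call. In the Python sketch below, the function name, the use of audio duration as the length measure, and the 10-second limit are assumptions; the source only says "a certain length" and allows the check to run on either voice data or character string data.

```python
from typing import Callable


def respond_to_utterance(
    n_samples: int,
    sample_rate_hz: int,
    ask_server: Callable[[], str],
    max_seconds: float = 10.0,
    backchannel: str = "Really?",
) -> str:
    """Skip the server and return a neutral backchannel for overlong speech."""
    duration_s = n_samples / sample_rate_hz
    if duration_s > max_seconds:
        return backchannel      # the server is never contacted on this path
    return ask_server()
```

Because `ask_server` is only invoked on the short-utterance path, an overlong utterance costs no server connection at all.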

＜実施の形態５＞
次に、実施の形態５について説明する。
実施の形態５では複数の音声認識エンジンを組み合わせて利用する場合について説明する。
音声認識エンジンにはローカル（つまり、ネットサーバーに接続せずに装置内で処理する場合）の音声認識エンジンと、ネットサーバーに接続してリクエストによって作成した対話データをレスポンスするクラウドの音声認識エンジンがある。ローカルにもクラウドにもそれぞれ複数種類の音声認識エンジンがあり、無料のものも有料のものもある。そのため、これら異なる音声認識エンジンを備えるサーバーを利用する際に料金が無料のサーバーと有料のサーバーをミックスして利用するようにする。
ロボット１がインターネット回線を利用して接続されるクラウドサーバーでは、対話モードにおいて、ロボット１から発話データがリクエスト発行され音声認識エンジンに返答データを作成させる際に、例えばクラウドサーバーは次のように対応することがよい。
（１）月あたり設定したある時間Ａまでは有料の対話ＡＰＩにアクセスする。
（２）月あたり設定したある時間Ａからある時間Ｂまでは有料のＡＰＩと無料の音声認識エンジンを混ぜて使う。例えば最初の連続数回の認識は有料の音声認識エンジンのサーバ
ーを使いその後の連続した認識には無料の音声認識エンジンのサーバーを使うなどミックスして使う。
（３）月あたり設定したある時間Ｂを超えた場合、無料の音声認識エンジンのみを使う。これは一例であって、例えば月あたり設定したある時間Ａを越えた場合に直ちに無料の音声認識エンジンのみを使うような設定でもよい。
このようにすれば、有料の範囲を大きく越えずに対話をすることができる。ロボット１と接続されているクラウドサーバーがこのような処理を実行するプログラムに基づいて有料と無料とを月あたり設定した時間に基づいて計算してロボット１からのリクエスト発行を処理する。
ロボット１自体がこのような処理を実行して、リクエスト発行の際にクラウドサーバーに対して有料の対話ＡＰＩを使用するか、無料のサーバーの音声認識エンジンを使用するかの命令をするようにしてもよい。
＜実施の形態５－１＞
実施の形態５の複数の音声認識エンジンを組み合わせて利用する場合は複数の対話エンジンを組み合わせる場合についても同様である。 <Embodiment 5>
Next, Embodiment 5 will be described.
In Embodiment 5, a case will be described in which a plurality of speech recognition engines are used in combination.
Speech recognition engines include local engines (processing is performed within the device without connecting to a network server) and cloud engines that connect to a network server and return dialogue data created in response to a request. Multiple kinds of speech recognition engine exist both locally and in the cloud, some free and some paid. Therefore, when using servers equipped with these different speech recognition engines, free servers and paid servers are used in combination.
In the cloud server to which the robot 1 is connected over the Internet, when the robot 1 issues a request with utterance data in dialogue mode and the speech recognition engine is made to create response data, the cloud server preferably responds as follows, for example.
(1) Access the paid dialogue API until a certain time A set per month.
(2) Use a mixture of paid APIs and free speech recognition engines from a certain time A to a certain time B set per month. For example, a paid speech recognition engine server is used for the first few consecutive recognitions, and a free speech recognition engine server is used for subsequent consecutive recognitions.
(3) If the time exceeds a certain time B set per month, only the free speech recognition engine will be used. This is just one example, and for example, a setting may be made in which only the free speech recognition engine is used immediately after a certain time A set per month has been exceeded.
In this way, dialogue is possible without greatly exceeding the paid range. The cloud server connected to the robot 1 runs a program that executes this processing, decides between paid and free based on the times set per month, and processes the requests issued by the robot 1.
The robot 1 itself may execute this processing and, when issuing a request, instruct the cloud server whether to use the paid dialogue API or the free server's voice recognition engine.
<Embodiment 5-1>
The combined use of a plurality of speech recognition engines described in the fifth embodiment applies equally to the combined use of a plurality of dialogue engines.
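The three-tier rule (1) to (3) of the fifth embodiment can be written as a small routing function. This is a sketch under assumptions: the function name, the treatment of the boundaries (usage exactly equal to A counts as the mixed tier, usage exactly equal to B still counts as mixed), and the choice of three paid turns for "the first few consecutive recognitions" are illustrative and not stated in the source.

```python
def choose_engine(
    used_s: float,
    limit_a_s: float,
    limit_b_s: float,
    turn_index: int,
    paid_turns: int = 3,
) -> str:
    """Pick 'paid' or 'free' from this month's usage and the turn position."""
    if used_s < limit_a_s:
        return "paid"                     # rule (1): below A, always paid
    if used_s <= limit_b_s:
        # rule (2): between A and B, paid for the first few consecutive
        # recognitions of a conversation, free afterwards
        return "paid" if turn_index < paid_turns else "free"
    return "free"                         # rule (3): beyond B, free only
```

The same function could run on the cloud server or on the robot 1 itself, matching the two placements the text allows; the stricter variant mentioned in rule (3) corresponds to setting `limit_b_s` equal to `limit_a_s` with `paid_turns=0`.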

＜実施の形態６＞
次に、実施の形態６について説明する。
ある１つの決まった対話エンジンを使うだけでは、返答がきまったパターンになってしまいユーザーがロボット１との会話に飽きてしまう可能性がある。そのため、実施の形態６ではこれを解消するため複数の対話エンジンの出力の結果を用いて会話に飽きないようにその結果をアレンジするための処理を説明する。
対話エンジンはクラウドの対話エンジンだけではなく、ロボット１内のローカルな対話エンジンを使用してもよい。
この処理は複数の返答データを送信されたロボット１側で行ってもよく、対話エンジンを備えたいくつものサーバーからの返答データを取得した際にクラウドサーバー側で行ってもよい。
（１）雑談対話エンジンのうち文字列の文字数の最も長い返答をしてきたエンジンの結果を出力する。
最も長い返答とすると、いかにも対話しているように感じ、対話の単調さがなくなり、ユーザーは対話を楽しむことができる。
Ａ．例えば、「腹減った」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：よく間食をしますか?」「ｂエンジン：ご飯食べてないの？」「ｃエンジン：なんか食え」である場合に、ａエンジンを採用してその返答データを出力する。
Ｂ．例えば、「今日の天気は晴れ」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：今すぐお空に行って確認してきます」「ｂエンジン：快晴っぽい？」「ｃエンジン：晴れか雨かで、その日の気分が決まることがあるよね。」である場合に、ｃエンジンを採用してその返答データを出力する。 <Embodiment 6>
Next, Embodiment 6 will be described.
If only one fixed dialogue engine is used, the responses will become a fixed pattern and the user may get bored with the conversation with the robot 1. Therefore, in Embodiment 6, in order to solve this problem, a process for arranging the output results of a plurality of dialogue engines so that the conversation does not get boring will be explained.
As the dialogue engine, not only a cloud dialogue engine but also a local dialogue engine within the robot 1 may be used.
This process may be performed on the robot 1 side to which a plurality of response data have been sent, or may be performed on the cloud server side when response data from multiple servers equipped with dialogue engines are acquired.
(1) Output the result of the engine that gave the longest response among the chat dialogue engines.
When the longest response is used, it feels like a conversation, which eliminates the monotony of the conversation and allows the user to enjoy the conversation.
A. For example, when the user says "I'm hungry" and the answers from the three engines a to c are "a engine: Do you often snack?", "b engine: Haven't you eaten?", and "c engine: Eat something", the a engine is adopted and its response data is output.
B. For example, when the user says "Today's weather is sunny" and the answers from the three engines a to c are "a engine: I'll go up to the sky right now and check", "b engine: Looks like clear skies?", and "c engine: Whether it's sunny or rainy can decide your mood for the day, can't it?", the c engine is adopted and its response data is output.
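Rule (1) above, choosing the reply with the most characters, can be sketched in a few lines of Python. The function name and the mapping from engine name to reply are assumptions made for illustration; the sample replies are the Japanese strings from example A in the original text.

```python
from typing import Mapping, Tuple


def pick_longest(replies: Mapping[str, str]) -> Tuple[str, str]:
    """Return (engine, reply) for the reply with the most characters.

    Ties go to the engine listed first, because max() keeps the first maximum.
    """
    engine = max(replies, key=lambda name: len(replies[name]))
    return engine, replies[engine]


example_a = {
    "a": "よく間食をしますか?",
    "b": "ご飯食べてないの？",
    "c": "なんか食え",
}
```

Calling `pick_longest(example_a)` selects the a engine, in line with the outcome described in example A.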

（２）雑談対話エンジンのうち、語尾に「？」がついているものを最後に持ってきて出力する。このとき「？」がついている回答が複数あればそれらを連続して出力する。
語尾に疑問符がつくと、その疑問に更に答えるような話の流れになるため、会話が続きやすくなりユーザーは対話を楽しむことができる。
例えば、上記（１）Ａ．の選択肢では「よく間食をしますか?ご飯食べてないの？」と出力する。また、上記（１）Ｂ．の選択肢であれば「快晴っぽい？」と出力する。
（３）肯定文を組み合わせた後、疑問文を組み合わせて出力する。
このようにアレンジすることでいかにも考えて文章を練ったような応答になるため、ユーザーは真剣に自身の発話を聞いてもらっているような感覚となり、続けて会話をしたいと思うようになるため、会話が続きやすくなりユーザーは対話を楽しむことができる。ま
た、出力尺をかせぐことができるとともに人への返答を求めることができる。
例えば、上記（１）Ｂ．のような返答データが取得された場合「今すぐお空に行って確認してきます。晴れか雨かで、その日の気分が決まることがあるよね。快晴っぽい？」出力する。 (2) Among the chat dialogue engines, those with a "?" at the end are brought to the end and output. At this time, if there are multiple answers marked with "?", they are output consecutively.
When a reply ends with a question mark, the conversation naturally moves on to answering that question, so the conversation continues more easily and the user can enjoy the dialogue.
For example, with the replies in (1) A. above, "Do you often snack? Haven't you eaten?" is output. With the replies in (1) B. above, "Looks like clear skies?" is output.
(3) Affirmative sentences are combined first, interrogative sentences are appended after them, and the result is output.
Arranged in this way, the response seems carefully thought out, so the user feels that the robot is listening seriously and wants to keep talking; the conversation continues more easily and the user can enjoy the dialogue. It also lengthens the output and invites a reply from the person.
For example, when response data like that in (1) B. above is obtained, "I'll go up to the sky right now and check. Whether it's sunny or rainy can decide your mood for the day, can't it? Looks like clear skies?" is output.
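Rules (2) and (3) above both depend on telling questions from statements by the final character. The following Python sketch shows one way to do that; the function names are hypothetical, and treating both the ASCII "?" and the full-width "？" as question marks is an assumption based on the mixed usage in the original examples.

```python
from typing import List, Sequence

QUESTION_MARKS = ("?", "？")   # ASCII and full-width question marks


def is_question(reply: str) -> bool:
    """True when the reply ends with a question mark."""
    return reply.rstrip().endswith(QUESTION_MARKS)


def questions_only(replies: Sequence[str]) -> List[str]:
    """Rule (2): keep the replies that end in a question mark, in order."""
    return [r for r in replies if is_question(r)]


def statements_then_questions(replies: Sequence[str]) -> List[str]:
    """Rule (3): stable reorder with statements first and questions last."""
    statements = [r for r in replies if not is_question(r)]
    return statements + questions_only(replies)
```

Applied to the three replies of example (1) B., `statements_then_questions` puts the a and c replies first and the b reply last, which is the order of the combined output given in the text.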

（４）他よりも話題の転換をより頻繁にしてくるエンジンからの結果を、他のエンジンの結果よりも後に持ってきて出力する。
このようにアレンジすることで話題転換したことで次の発話を誘うような対話となり、対話が続きやすくなる。
Ａ．例えば、「どーもどーも」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：だょね～」「ｂエンジン：そうですね」「ｃエンジン：野球は見たりしますか？」である場合に、ｃエンジンのデータを最後にして「だょね～そうですね野球は見たりしますか？」と出力する。
Ｂ．例えば、「なかなか見つからないね」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：その通りですね」「ｂエンジン：あるあるー」「ｃエンジン：ご家族は何人ですか？」である場合に、ｃエンジンのデータを最後にして「その通りですねあるあるーご家族は何人ですか？」と出力する。
（５）他よりもよりフレンドリーな返答をしてくるエンジンをまず真っ先に出力して、その後に他のエンジンからの返答をくっつけて出力する。
このようにアレンジすることでユーザーが対話に引き込まれやすくなり、対話が続きやすくなる。フレンドリーかどうかは言葉（単語）に相対的な序列化をすることでどの位置に配置するかを決定することができる。
Ａ．例えば、上記（４）Ｂ．の場合ではｂエンジンの結果を最初にして「あるあるーその通りですねご家族は何人ですか？」と出力する。
Ｂ．例えば、「なるほどね」とユーザーが発話した場合に、ａ～ｂの２つのエンジンからの回答が「ａエンジン：あら適当な相槌ですね」「ｂエンジン：うむ」である場合に、ｂエンジンのデータを最後にして「ほほほーあら適当な相槌ですねうむ」と出力する。 (4) Results from engines that change topics more frequently than others are brought and output after the results from other engines.
By arranging it in this way, changing the topic will create a conversation that invites the next utterance, making it easier to continue the conversation.
A. For example, when the user says "Domo domo" (a casual greeting) and the answers from the three engines a to c are "a engine: Right?", "b engine: That's right", and "c engine: Do you watch baseball?", the data of the c engine is placed last and "Right? That's right. Do you watch baseball?" is output.
B. For example, when the user says "It's hard to find, isn't it" and the answers from the three engines a to c are "a engine: That's right", "b engine: That happens a lot", and "c engine: How many people are in your family?", the data of the c engine is placed last and "That's right. That happens a lot. How many people are in your family?" is output.
(5) Output the engine that gives a friendlier response than the others first, and then output the responses from the other engines together.
Arranged in this way, the user is more easily drawn into the dialogue, and the dialogue continues more easily. As for friendliness, the position of each reply can be decided by ranking words relative to one another.
A. For example, in the case of (4) B. above, the result of the b engine is placed first and "That happens a lot. That's right. How many people are in your family?" is output.
B. For example, when the user says "I see" and the answers from the two engines a and b are "a engine: Oh, what a half-hearted response" and "b engine: Hmm", the data of the b engine is placed last and "Hohoho, oh, what a half-hearted response. Hmm" is output.

（６）（１）～（５）の処理を任意に組み合わせる
これによって、対話のバリエーションが増えることとなるため、ユーザーが同じ発話をした場合でもまったく同じ応答が帰ってきてしまうことがなくなり、対話に飽きることがなく対話が続きやすくなる。
例えば「老後って何」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：ちょっと待ってくださいね」「ｂエンジン：サポートは嫌いじゃないよ」「ｃエンジン：今健康でいらっしゃいますか？」である場合に、最もフレンドリーなｂエンジンを最初にし、肯定文を組み合わせた後、疑問文を組み合わせ、「サポートは嫌いじゃないよちょっと待ってくださいね今健康でいらっしゃいますか？」と出力する。 (6) The processes in (1) to (5) are combined arbitrarily. This increases the variety of the dialogue, so even if the user makes the same utterance, exactly the same response is not returned; the user does not tire of the dialogue, and it continues more easily.
For example, when the user says "What is old age?" and the answers from the three engines a to c are "a engine: Please wait a moment", "b engine: I don't dislike support", and "c engine: Are you in good health now?", the friendliest b engine is placed first, the affirmative sentences are combined, and then the interrogative sentence is appended, so that "I don't dislike support. Please wait a moment. Are you in good health now?" is output.

（７）テキスト出力用の場合のエンジンでは、カッコや顔文字が帰ってくることがあるため、これらが帰ってきた場合には音声出力を抑制して出力する。そして画面にはそれらは表示させる。
顔文字は音声出力できないが、表示部に敢えて顔文字を表示させることで、音声と併せて対話の一部とすることで通常にはない対話のおもしろさを創出することができる。
Ａ．例えば、「腹減った」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：(わざと無視)」「ｂエンジン：こんにちはお元気ですね」「ｃエンジン：こんにちは」である場合に、ａエンジンだけは音声出力させず、タッチパネル部７（表示画面）に表示させるようにする。
Ｂ．例えば、「元々入ってる」とユーザーが発話した場合に、ａ～ｃの３つのエンジ
ンからの回答が「ａエンジン：あなたはよくするんですか?「ｂエンジン：(´・ω・｀)」「ｃエンジン：夜型さんですか？」である場合に、ｂだけは音声出力させず、タッチパネル部７（表示画面）に表示させるようにする。
（８）同じ文字列が含まれる返答についてはいずれか１つを出力する。
同じ文字列が繰り返されると対話がくどくなってしまうし、聞き手に違和感を覚えさせてしまうためである。
例えば、「中華」とユーザーが発話した場合に、ａ～ｃの３つのエンジンからの回答が「ａエンジン：あらっいいですねぇ」「ｂエンジン：うん、中華です。」「ｃエンジン：中華を食べに行くんでしょうか？」である場合に、ｂエンジンとｃエンジンには「中華」の文字列があるためいずれか一方のみ出力する。例えば「あらっいいですねぇ中華を食べに行くんでしょうか？」のように出力する。 (7) An engine intended for text output may return parentheses or emoticons. When it does, voice output of those parts is suppressed, and they are displayed on the screen instead.
Although emoticons cannot be output as audio, by deliberately displaying emoticons on the display and making them part of the dialogue along with the audio, it is possible to create an unusually interesting dialogue.
A. For example, when the user says "I'm hungry" and the answers from the three engines a to c are "a engine: (deliberately ignoring)", "b engine: Hello, you're looking well", and "c engine: Hello", only the answer of the a engine is not output as voice and is displayed on the touch panel unit 7 (display screen).
B. For example, when the user says "It's in there from the start" and the answers from the three engines a to c are "a engine: Do you do it often?", "b engine: (´・ω・｀)", and "c engine: Are you a night owl?", only the answer of the b engine is not output as voice and is displayed on the touch panel unit 7 (display screen).
(8) Output one of the responses that include the same character string.
This is because if the same string of characters is repeated, the dialogue becomes tedious and the listener feels uncomfortable.
For example, when the user says "Chinese food" and the answers from the three engines a to c are "a engine: Oh, that's nice", "b engine: Yes, Chinese food.", and "c engine: Are you going out for Chinese food?", the replies of the b engine and the c engine both contain the character string "Chinese food", so only one of them is output. For example, "Oh, that's nice. Are you going out for Chinese food?" is output.
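The display-only handling in item (7) above can be sketched as a filter that separates what is spoken from what is shown. The Python sketch below uses an assumed heuristic that is not from the source: a reply wrapped entirely in parentheses (ASCII or full-width) is treated as a stage direction or emoticon that cannot be spoken. The function name is hypothetical, and showing every reply on screen is also an assumption.

```python
import re
from typing import Iterable, List, Tuple

# Assumed heuristic: a reply wrapped entirely in parentheses (ASCII or
# full-width) is a stage direction or emoticon that cannot be spoken.
_DISPLAY_ONLY = re.compile(r"^\s*[(（].*[)）]\s*$")


def split_speech_and_display(replies: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (spoken, shown): every reply is shown, display-only ones are not spoken."""
    shown = list(replies)
    spoken = [r for r in shown if not _DISPLAY_ONLY.match(r)]
    return spoken, shown
```

With the replies of example (7) A., the parenthesized a reply appears only in the `shown` list, while the b and c replies appear in both lists.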

（９）語尾変換手段、例えば語尾変換ＡＰＩを使って統一感を出すようにする。このときすべての返答について語尾変換を行ってもよいが、最後に出力する文か最初に出力する文のいずれか一方にのみ語尾変換を行うようにしてもよい。
普通の対話エンジンの文章に比べて、より親しみやすい表現となるのでよい。
例えば、あるエンジンから「ねむいな」と返答データがあった場合に語尾を語尾変換ＡＰＩによって変換させて「ねむいニャ」というように出力する。
（１０）認識失敗に備えて、複数のエンジンから得た返答のうち一部のみを音声出力に利用し、残りの返答は保持しておき、次の音声認識に失敗したときや対話システムからの返答がなかったときは、その保持しておいた返答を返すようにする。
音声認識が失敗した場合や、外部サーバーからのレスポンスがなかなか来ない場合に使用することで、対話が途切れずにつなげることができ、自然な対話に寄与する (9) Use a word ending conversion means, for example, a word ending conversion API to create a sense of unity. At this time, endings may be converted for all replies, but endings may be converted only for either the last sentence to be output or the first sentence to be output.
This is good because the expressions are more approachable than the sentences of ordinary dialogue engines.
For example, when a certain engine returns the response data "Nemui na" ("I'm sleepy"), the ending is converted by the ending conversion API and "Nemui nya" is output.
(10) In case of recognition failure, only some of the responses obtained from multiple engines are used for voice output, and the remaining responses are retained and used when the next voice recognition fails or from the dialogue system. If there is no response, return the saved response.
By using it when voice recognition fails or when a response from an external server is slow to come, it allows the dialogue to continue without interruption, contributing to natural dialogue.
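Item (10) above, holding back unused replies for a later failure, maps naturally onto a queue. The Python sketch below is a minimal illustration; the class name `ReplyReserve`, its method names, and the default of speaking one reply per turn are assumptions.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class ReplyReserve:
    """Speak some replies now and keep the rest for a later failure."""

    def __init__(self) -> None:
        self._held: Deque[str] = deque()

    def take_turn(self, replies: Sequence[str], speak_count: int = 1) -> List[str]:
        """Return the first `speak_count` replies and hold the remainder."""
        self._held.extend(replies[speak_count:])
        return list(replies[:speak_count])

    def fallback(self) -> Optional[str]:
        """Return a held reply when recognition or the server fails, else None."""
        return self._held.popleft() if self._held else None
```

When the next recognition fails or no response arrives, calling `fallback()` yields the oldest held reply, so the robot still has something to say.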

＜実施の形態７＞
図１４に示すように、実施の形態７はロボット１の近傍にスマートスピーカ５１を配置し、ロボット１とスマートスピーカ５１を組み合わせた装置（システム）である。ロボット１とスマートスピーカ５１の間隔は互いのマイクロフォンで音が拾える程度の距離であって例えば１～２ｍ以内に隣接配置されることがよい。
スマートスピーカ５１は無線ＬＡＮ装置を内蔵し、インターネットを使用した無線通信機能、電話回線接続機能等を有しネットワークモジュールが搭載されたネットワーク端末であり、マイクロフォンとスピーカ装置を備えた一種のコンピュータでもある。スマートスピーカ５１はスマートフォンのような端末装置を利用してサーバーを介して各種初期登録（例えば、使用者の名前、住所、電話番号、メールアドレス登録、複数の音声登録、ブルトゥースによるネットワーク対応のＡＩ機器の設定等）を実行し、音声登録した使用者からの発話（命令）によってインターネットに接続してサーバーの検索エンジンを使用して所定の処理を実行し、その結果をスピーカ装置から音声情報として出力する。 <Embodiment 7>
As shown in FIG. 14, the seventh embodiment is a device (system) in which a smart speaker 51 is placed near the robot 1, and the robot 1 and the smart speaker 51 are combined. The distance between the robot 1 and the smart speaker 51 is such that each other's microphones can pick up sound, and it is preferable that the robot 1 and the smart speaker 51 be placed adjacently within, for example, 1 to 2 meters.
The smart speaker 51 is a network terminal that has a built-in wireless LAN device, provides functions such as wireless communication over the Internet and telephone line connection, and is equipped with a network module; it is also a kind of computer equipped with a microphone and a speaker device. Using a terminal device such as a smartphone, various initial registrations (for example, the user's name, address, telephone number, and e-mail address, multiple voice registrations, and Bluetooth settings for network-compatible AI devices) are made for the smart speaker 51 via a server. In response to an utterance (command) from a voice-registered user, the smart speaker 51 connects to the Internet, executes predetermined processing using the server's search engine, and outputs the result as voice information from the speaker device.

ロボット１はスマートスピーカ５１と連携することで互いの機能を補うことができる。具体的にはロボット１とスマートスピーカ５１とを音声をインターフェースとして次のような機能を奏する。
（１）スマートスピーカ５１へのロボット１からの指示機能
イ．例えば、ユーザーがスマートスピーカのスキルを起動するフレーズを喋ったとき、ロボットはそのフレーズの音声認識結果の文字列を記憶しておき、ロボットは自らその文字列を音声合成で所定のタイミングで喋るようにする。
実施の形態７ではロボット１のコントローラＭＣは、ユーザーの発話を周波数成分を分析して個人の声を識別する声識別プログラム、ユーザーの発話を個人毎に区別して舞う頃フォン１２によって取得し、文字列データとして記憶させ、その文字列データに基づいて音声合成してスピーカ装置１３から再生させるフレーズを録音・再生プログラムを備えている。
ロボット１は、例えば発話を記憶するトリガーとなる発話、例えば「今からしゃべるから、録音して」という発話の後の言葉を記憶する機能を有している。そして、ユーザーはこの機能を利用して、スマートスピーカ５１を、起動させたりなんらかの処理をさせるような言葉を記憶させるようにする。例えば「ＯＫ、×××。照明をつけて。」のような言葉がよい。このとき、ロボット１に登録されるユーザー個人の音声は、スマートスピーカ５１に登録されるユーザー個人の声である。
そして、ロボット１に所定のタイミングで発話させるようにする。所定のタイミングで発話させる設定は、例えばスマートフォンのような端末を操作して設定登録できる。
ロ．ロボット１の「所定のタイミングでの発話」としては、例えばロボット１がなんらかの変化を検知すること、例えばタッチパネル部７へのタッチ動作や、ドップラーセンサ２２による物体（人）の検知等である。
例えばロボット１のコントローラＭＣはドップラーセンサ２２によって人を検知した場合に「ＯＫ、×××。照明をつけて。」というようにスピーカ装置１３から音声出力をさせる。それを受けてスマートスピーカ５１はネットワーク対応しているＡＩ機器である室内の照明を点灯させるように制御する。尚、制御される照明は前もってスマートスピーカ５１によって制御される対象であるように登録されている。照明以外に例えば、エアコン、テレビ、カーテンの開閉装置等をＡＩ機器とすることがよい。 The robot 1 can complement each other's functions by cooperating with the smart speaker 51. Specifically, the robot 1 and the smart speaker 51 perform the following functions using audio as an interface.
(1) Function for the robot 1 to give instructions to the smart speaker 51
a. For example, when the user speaks a phrase that activates a skill of the smart speaker, the robot stores the character string obtained as the voice recognition result of that phrase, and later speaks that character string itself by speech synthesis at a predetermined timing.
In the seventh embodiment, the controller MC of the robot 1 has a voice identification program that identifies an individual's voice by analyzing the frequency components of the user's utterance, and a phrase recording and playback program that acquires the user's utterances through the microphone 12 while distinguishing them by individual, stores them as character string data, and plays them back from the speaker device 13 by speech synthesis based on that character string data.
The robot 1 has a function of storing the words that follow a trigger utterance, for example "I'm going to speak now, so please record it." The user uses this function to have the robot 1 store words that activate the smart speaker 51 or make it perform some processing, for example "OK, ×××. Turn on the lights." Here, the user's individual voice registered in the robot 1 is the same individual voice that is registered in the smart speaker 51.
Then, the robot 1 is made to speak at a predetermined timing. Settings for speaking at a predetermined timing can be registered by operating a terminal such as a smartphone, for example.
b. The "predetermined timing" for the robot 1's utterance is, for example, when the robot 1 detects some kind of change, such as a touch operation on the touch panel section 7 or the detection of an object (person) by the Doppler sensor 22.
For example, when the controller MC of the robot 1 detects a person using the Doppler sensor 22, it causes the speaker device 13 to output a voice such as "OK, XXX. Turn on the lights." In response to this, the smart speaker 51 controls the indoor lighting, which is a network-compatible AI device, to turn on. Note that the lighting to be controlled is registered in advance to be controlled by the smart speaker 51. In addition to lighting, for example, air conditioners, televisions, curtain opening/closing devices, etc. may be used as AI devices.
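The flow in (1) can be sketched as a small lookup from a detected event to a stored phrase. This is only an illustrative sketch: the class name `PhraseRelay`, the trigger names, and the `spoken` list (a stand-in for speech synthesis through the speaker device 13) are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseRelay:
    """Stores recognized phrases and replays them when a trigger fires."""
    phrases: dict = field(default_factory=dict)  # trigger name -> phrase text
    spoken: list = field(default_factory=list)   # stand-in for the speaker device

    def register(self, trigger: str, phrase: str) -> None:
        # Remember the recognized text to be spoken for this trigger.
        self.phrases[trigger] = phrase

    def on_event(self, trigger: str) -> bool:
        # Speak the stored phrase, if any; report whether something was spoken.
        phrase = self.phrases.get(trigger)
        if phrase is None:
            return False
        self.spoken.append(phrase)  # a real robot would synthesize speech here
        return True


relay = PhraseRelay()
relay.register("person_detected", "OK, XXX. Turn on the lights.")
relay.on_event("person_detected")
```

A touch on the touch panel section 7 could be handled the same way by registering a second, equally hypothetical trigger name.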

（２）スマートスピーカ５１からの発話かユーザーの発話かを区別する機能
ユーザーの個人の声を識別してロボット１に設定登録することで、ロボット１がスマートスピーカ５１からの発話か、あるいはユーザーの直の発話かを区別する機能を備えることができる。これによって、ロボット１が登録されていない声であるスマートスピーカ５１からの音に反応しないように制御することができ、逆に（１）のようにスマートスピーカ５１とロボット１にそれぞれユーザーの個人の声を登録することで、スマートスピーカ５１に対してユーザーだけでなくロボット１からも指示をすることができる。このように個人の発話を区別できることでスマートスピーカ５１への音声操作を妨害しないという機能も有する。
（３）スマートスピーカ５１からの発話の表示機能
スマートスピーカ５１にユーザーが指示した発話内容をロボット１が取得して文字テキスト化し、タッチパネル部７に表示させるようにしてもよい。
ロボット１のコントローラＭＣは発話内容を取得して自身であるいはサーバーに接続して文字テキスト化するプログラムを有しているとよい。
例えば、ユーザーが「ＯＫ、×××。今日の天気を教えて。」と発話し、これに対してスマートスピーカ５１が「今日の愛知県岡崎市の天気は晴れ、最高気温１５度、降水確率は２０％です」と回答した場合に、これらのすべての対話を、例えば、ロボット１のタッチパネル部７には例えば、次のように聞き取った両者の対話が表示される。
「ユーザー：ＯＫ、×××。今日の天気を教えて。
スマートスピーカ：今日の愛知県岡崎市の天気は晴れ、最高気温１５度、降水確率は２０％です」
また、加えてロボット１のコントローラＭＣは文字テキスト化した内容を短く翻案したり要約するプログラムを有しているとよい。ロボット１は翻案したり要約した内容を音声出力又はタッチパネル部７への表示させるようにするとよい。
（４）人がいないときにロボット１がスマートスピーカ５１へ色々聞いて学習しておく機能
例えば、ロボット１がビルトインシナリオとして「明日の天気は？」とか「なにか事件はないですか」などという質問ワードを有しており、ユーザーが留守の時にロボット１のコントローラＭＣに所定のタイミングでスマートスピーカ５１を起動するフレーズと一緒に質問ワードを音声出力させるようにする（ロボット１の声はスマートスピーカ５１に登録済みとする）。このとき、コントローラＭＣはスマートスピーカ５１からの発話内容をロボット１はマイクロフォン１２によって取得して記憶しておき、所定のタイミングでそ
の内容を音声出力させる。所定のタイミングとは、例えば所定の時間、ドップラーセンサ２２によって人を検知した際、ユーザーがロボット１に「何かニュースはないの？」というようなビルトインシナリオとしての発話を行った際等である。 (2) Function to distinguish between utterances from the smart speaker 51 and utterances from the user
By identifying the user's individual voice and registering it in the robot 1, the robot 1 can be given a function of distinguishing whether an utterance comes from the smart speaker 51 or directly from the user. This makes it possible to control the robot 1 so that it does not react to sound from the smart speaker 51, whose voice is not registered. Conversely, by registering the user's individual voice in both the smart speaker 51 and the robot 1 as in (1), not only the user but also the robot 1 can give instructions to the smart speaker 51. Being able to distinguish individual utterances in this way also serves to avoid interfering with voice operation of the smart speaker 51.
(3) Function to display utterances from the smart speaker 51
The robot 1 may acquire the content of the utterance with which the user instructs the smart speaker 51, convert it into text, and display it on the touch panel section 7.
The controller MC of the robot 1 preferably has a program that acquires the utterance content and converts it into text, either by itself or by connecting to a server.
For example, suppose the user says, "OK, ×××. Tell me today's weather," and the smart speaker 51 answers, "Today's weather in Okazaki City, Aichi Prefecture is sunny, with a maximum temperature of 15 degrees and a 20% chance of rain." In that case, the whole exchange as heard by the robot 1 is displayed on its touch panel section 7, for example as follows.
"User: OK, ×××. Tell me today's weather.
Smart Speaker: Today's weather in Okazaki City, Aichi Prefecture is sunny, with a maximum temperature of 15 degrees and a 20% chance of rain."
In addition, the controller MC of the robot 1 preferably has a program that condenses or summarizes the text. The robot 1 may output the condensed or summarized content by voice or display it on the touch panel section 7.
(4) Function for the robot 1 to learn by asking the smart speaker 51 various questions when no one is present
For example, the robot 1 holds question phrases such as "What's the weather tomorrow?" or "Has anything happened?" as built-in scenarios. While the user is away, the controller MC of the robot 1 outputs, at a predetermined timing, a question phrase by voice together with the phrase that activates the smart speaker 51 (the robot 1's voice is assumed to be already registered in the smart speaker 51). The controller MC then acquires the reply of the smart speaker 51 through the microphone 12 of the robot 1, stores it, and outputs it by voice at a predetermined timing. The predetermined timing is, for example, a predetermined time, the moment a person is detected by the Doppler sensor 22, or the moment the user addresses the robot 1 with a built-in scenario utterance such as "Is there any news?"

＜実施の形態７の変形例１＞
スマートスピーカ等、他の音声認識機器の音声操作を妨害しない機能を設定するようにしてもよい。例えば、他のスマートスピーカの起動フレーズ（音声認識開始ワード）を、の音声をロボット１が認識した場合、自身の音声出力を停止するようにしてもよい。
例えば、他のスマートスピーカであるＡ社の起動フレーズである「ＯＫ、×××」のような起動用のフレーズの音声を認識した場合に、ロボット１のコントローラＭＣはそれをマイクロフォン１２から取得し、登録済みの起動フレーズであると判断すると、自身の音声出力を一旦停止させる。
これによって音声認識機器の音声操作を妨害せずに、機能を発揮させることができる。 <Modification 1 of Embodiment 7>
A function that avoids interfering with voice operation of other voice recognition devices, such as smart speakers, may also be provided. For example, when the robot 1 recognizes the spoken activation phrase (voice recognition start word) of another smart speaker, it may stop its own voice output.
For example, when an activation phrase such as "OK, ×××", the activation phrase of another smart speaker made by company A, is spoken, the controller MC of the robot 1 acquires it through the microphone 12 and, on determining that it is a registered activation phrase, temporarily stops its own voice output.
This allows functions to be performed without interfering with voice operation of the voice recognition device.
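As a rough illustration of Modification 1, the sketch below pauses the robot's own speech once a registered activation phrase is heard. The class `RobotVoice`, the prefix-matching rule, and the `resume` method are assumptions made for the example; the description does not say how or when output resumes.

```python
class RobotVoice:
    """Suppresses the robot's own speech after another device's wake phrase."""

    def __init__(self, wake_phrases):
        # Registered activation phrases of other voice recognition devices.
        self.wake_phrases = tuple(p.lower() for p in wake_phrases)
        self.paused = False
        self.output = []  # stand-in for the speaker device

    def on_heard(self, text: str) -> None:
        # Pause when the recognized text begins with a registered wake phrase.
        if text.lower().startswith(self.wake_phrases):
            self.paused = True

    def resume(self) -> None:
        # Hypothetical: called once the other device has finished its exchange.
        self.paused = False

    def say(self, text: str) -> bool:
        if self.paused:
            return False
        self.output.append(text)
        return True


voice = RobotVoice(["OK, XXX"])
voice.say("Good morning.")
voice.on_heard("OK, XXX. Turn on the lights.")
```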

＜実施の形態７の変形例２＞
ロボット１にスマートスピーカ５１のような他の音声認識機器の音声認識起動キーワードを認識し、その後のユーザーのスマートスピーカ５１への発音を認識してクラウドサーバ－にリクエストして、検索エンジン等に検索をさせて対応する回答を得ておく。
ロボット１は、スマートスピーカ５１が音声認識に失敗してしまった場合（例えば「エラーです」など）や音声認識結果に対する適切な回答を出力できない旨の音声出力（例えば「すみません」など）を認識した場合、ロボット１は自身が前もって得ておいた回答を出力する。あるいは、音声認識に失敗してしまった場合や音声認識結果に対する適切な回答を出力できない旨の音声出力を受けてから、ロボット１はクラウドサーバ－にリクエストして回答を得るようにしてもよい。 <Modification 2 of Embodiment 7>
The robot 1 recognizes the voice recognition activation keyword of another voice recognition device such as the smart speaker 51, then recognizes the user's subsequent utterance to the smart speaker 51, sends a request to the cloud server, has a search engine or the like perform a search, and obtains the corresponding answer in advance.
When the robot 1 recognizes voice output indicating that the smart speaker 51 has failed in voice recognition (for example, "This is an error") or cannot give an appropriate answer to the recognition result (for example, "Sorry"), the robot 1 outputs the answer it obtained in advance. Alternatively, the robot 1 may send the request to the cloud server and obtain the answer only after it hears such voice output.
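The fallback in Modification 2 can be sketched as follows. The marker words in `FAILURE_MARKERS` and both function names are assumptions for illustration; a real device would need the actual failure messages of the smart speaker in question.

```python
from typing import Optional

# Hypothetical substrings that signal a failed or empty reply from the speaker.
FAILURE_MARKERS = ("error", "sorry")


def speaker_failed(speaker_reply: str) -> bool:
    """Return True when the smart speaker's reply looks like a failure notice."""
    reply = speaker_reply.lower()
    return any(marker in reply for marker in FAILURE_MARKERS)


def robot_response(speaker_reply: str, prefetched_answer: Optional[str]) -> Optional[str]:
    """Return the answer the robot should speak, or None to stay silent."""
    if speaker_failed(speaker_reply) and prefetched_answer:
        return prefetched_answer
    return None
```

In the alternative described above, `prefetched_answer` would be fetched from the cloud server only after `speaker_failed` returns True.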

＜実施の形態８＞
ロボット１は、例えばユーザーの要求によってｗｅｂサイト上のニュース記事を音声で読み上げるようにしてもよい。ロボット１のコントローラＭＣはサーバー上での検索エンジンを利用したニュース記事のリクエストをし、クラウドサーバーはそのリクエストに対して、例えば登録サイトのニュースデータをテキストデータとしてレスポンスする。
ロボット１はニュースデータを音声合成して読み上げる（出力する）と同時に記事の情報源の名称も音声合成して読み上げる（出力する）。また、併せて表示画面としてのタッチパネル部７には記事の情報源のＵＲＬを表示をし、タッチパネル部７上でそのＵＲＬにタッチされたら、そのＵＲＬのページの内容をタッチパネル部７上に表示するようにする。
また、記事や記事の情報源を読み上げる場合には、それらが引用であることがわかるような表現で出力することがよい。
例えば、ＵＲＬ「https://＼＼＼＼.jp/archives/92###」の記事内容を読み上げる場合を説明する。
『「××ニュース」のサイトの記事を読み上げるよ。「・・・・・を本年１月１５日より販売する。」そうだよ。』
というように、例えば「のサイトの記事を読み上げるよ。」や「そうだよ」というような記事や記事の情報源以外を正規表現として引用であるように、聞き手にわかるように発話させ、この場合では画面表示に、例えば『https://＼＼＼＼.jp/archives/92###の情報だよ。』というようにＵＲＬを表示させる。そしてこのＵＲＬ部分にタッチすることでタッチパネル部７に読み上げた記事の内容を改めて表示させる。
「聞き手にわかるように発話」とは記事部分とそうでない部分で、例えば語調や声を変えるようにすることがよい。
このようにすれば、Ｗｅｂ記事を読まなくともロボット１の読み上げた内容を聞き取る
だけでニュース内容を理解でき、場合によっては念のため目視でニュース内容を確認することもできる。 <Embodiment 8>
The robot 1 may, for example, read news articles on a website aloud in response to a user's request. The controller MC of the robot 1 requests a news article using a search engine on the server, and the cloud server responds to the request by using, for example, news data from a registered site as text data.
The robot 1 reads out the news data by speech synthesis and, at the same time, also reads out the name of the article's information source by speech synthesis. In addition, the URL of the article's information source is displayed on the touch panel section 7 serving as the display screen, and when that URL is touched, the content of the page at that URL is displayed on the touch panel section 7.
Furthermore, when an article or its information source is read out, it is preferably output in wording that makes it clear that it is a quotation.
For example, a case will be described in which the content of an article with the URL "https://\\\\\.jp/archives/92###" is read aloud.
"I'll read out an article from the "×× News" site. "... will go on sale from January 15 of this year." That's what it says."
In this way, the parts other than the article and its information source, such as "I'll read out an article from the ... site" and "That's what it says," are uttered as fixed framing expressions so that the listener can tell that the rest is a quotation. In this case, the screen displays the URL, for example as "This is information from https://\\\\\.jp/archives/92###." Touching this URL portion displays the content of the article that was read out on the touch panel section 7 again.
"Uttering so that the listener can tell" means, for example, changing the tone or the voice between the article part and the rest.
In this way, the user can understand the news just by listening to what the robot 1 reads out, without reading the web article, and can also check the news content visually when desired.
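One way to picture the read-out in Embodiment 8 is as a list of segments, each tagged with the voice to use, plus a line of text for the screen. The function name `build_readout`, the role labels, and the English framing sentences are illustrative assumptions.

```python
def build_readout(site_name: str, article_text: str, url: str):
    """Split a news read-out into framing and quoted segments.

    Each segment is (role, text); a speech synthesizer could use a different
    tone or voice for "quote" than for "narrator".
    """
    segments = [
        ("narrator", f"I'll read out an article from the {site_name} site."),
        ("quote", article_text),
        ("narrator", "That's what it says."),
    ]
    # Text shown on the display; touching the URL would open the page.
    display_text = f"This is information from {url}"
    return segments, display_text
```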

＜実施の形態９＞
実施の形態９ではコンピュータの見えない動きをロボットのアクチュエータの動作で見せる場合について説明する。
（１）ロボティクスプロセスオートメーションについて
ロボティクスプロセスオートメーション（以下、ＲＰＡとする）は、単純なパソコン作業を自動化するソフトウェアである。ソフトウェアはサーバーに設定することもでき、ユーザーのコンピュータに設定することもできる。図１５に基づいてソフトウェアをサーバーに設定した場合であって、上記各実施の形態のロボット１をＲＰＡのネットワークに組み込んだ場合の一例について説明する。
図１５に示すように、クラウドサーバー５５とユーザー側コンピュータ５６、５７とがインターネットを使用したネットワークで接続されている。また、クラウドサーバー５５とロボット１もネットワークで接続されている。ユーザー側コンピュータ５６はＲＰＡプログラムによってクラウドサーバー５５によって制御されている。また、ロボット１にはクラウドサーバー５５によって実行されるＲＰＡのためのプログラムにおける所定の処理においてその処理がまもなく実行される、実行されている、あるいは実行された等の処理情報が報知されるようになっている。
クラウドサーバー５５はユーザー側コンピュータ５６に処理１～処理４を順に処理させる。本実施の形態では処理1と処理２はコンピュータ５６、処理３と処理４はコンピュータ５７が実行する。もっと多くの処理を設定してもよく、処理に関わるユーザー側コンピュータ５６も１以上いくつでもよい。
処理１としては、例えばコンピュータ５６へのユーザーのアクセス・ログイン等、処理２としては、例えばコンピュータ５６内のデータに基づくリストの作成・仕分け等、処理３は、例えば処理２に続いて実行する顧客毎の請求内容の修正、処理４は、例えば処理３に続いて実行する請求書の発行である。本実施の形態９では例えば処理３ではユーザに修正のための入力を促し、その入力があって後に、次の処理４に移行するものとする。つまり、処理２の後は処理３での力が完了するまで一旦待ち受けードとなる。
クラウドサーバー５５はこれらの処理を実行する直前、処理中、処理後にそれぞれロボット１に異なる報知情報を出力し、ロボット１はその報知情報に基づいてロボット１の周囲に処理状況を報知するようにするとよい。あるいは各処理毎に一回の報知でもよい。 <Embodiment 9>
In Embodiment 9, a case will be described in which invisible movements of a computer are made visible by movements of actuators of a robot.
(1) About Robotics Process Automation
Robotics Process Automation (hereinafter referred to as RPA) is software that automates simple computer tasks. The software can be installed on a server or on the user's computer. With reference to FIG. 15, an example will be described in which the software is installed on a server and the robot 1 of the above embodiments is incorporated into the RPA network.
As shown in FIG. 15, a cloud server 55 and user-side computers 56 and 57 are connected via a network using the Internet. The cloud server 55 and the robot 1 are also connected via the network. The user-side computer 56 is controlled by the cloud server 55 through the RPA program. In addition, for predetermined processes in the RPA program executed by the cloud server 55, the robot 1 is notified of processing information indicating that the process is about to be executed, is being executed, or has been executed.
The cloud server 55 causes the user-side computers to execute processes 1 to 4 in order. In this embodiment, processes 1 and 2 are executed by the computer 56, and processes 3 and 4 by the computer 57. More processes may be set, and any number of user-side computers, one or more, may be involved.
Process 1 is, for example, user access and login to the computer 56; process 2 is, for example, creation and sorting of a list based on data in the computer 56; process 3 is, for example, correction of the billing details for each customer, executed after process 2; and process 4 is, for example, issuance of invoices, executed after process 3. In the ninth embodiment, process 3 prompts the user for corrective input, and the flow moves on to process 4 only after that input has been received. In other words, after process 2 the system waits in standby mode until the input for process 3 is complete.
The cloud server 55 preferably outputs different notification information to the robot 1 immediately before, during, and after each of these processes, and the robot 1 reports the processing status to its surroundings based on that notification information. Alternatively, a single notification per process may be used.

例えば、
ａ．ロボット１がどの処理がどのような状態かを音声や音の違い、あるいは音楽等で報知する。
ｂ．表示画面上で報知する。ａ．と同時に行ってもよい。
ｃ．処理３ではユーザーの入力が必要であるため、処理３だけを報知するようにしてもよく、処理３だけを他の報知とは異なる（識別できる）報知としてもよい。
ｄ．ロボット１から他の端末装置に処理の状態を転送して報知する。
ｅ．ロボット１が処理状況がわかるような動作をする。例えば、コンピュータ５６の処理が行われていればその方向を向くように制御する。そのため、前もってロボット１に対する各コンピュータ５６、５７の方向は何らかの方向特定手段、例えば上記の形状認識プログラムを使用して認識しておくことがよい。ロボット１に例えば矢印や腕部材のような方向指示部材を設け、その指し示す方向に報知対象としてのコンピュータ５６、５７があるように動作してもよい。 for example,
a. The robot 1 notifies which processing is in which state by voice, different sounds, music, etc.
b. Notify on the display screen. This may be done at the same time as a.
c. Since process 3 requires user input, only process 3 may be notified, or only process 3 may be a notification that is different (identifiable) from other notifications.
d. The processing status is transferred from the robot 1 to other terminal devices and notified.
e. The robot 1 moves in such a way that the processing status can be understood. For example, if the computer 56 is processing, it is controlled to face that direction. Therefore, it is preferable to recognize the direction of each computer 56, 57 with respect to the robot 1 in advance using some direction specifying means, for example, the shape recognition program described above. The robot 1 may be provided with a direction indicating member such as an arrow or an arm member, and the robot 1 may operate so that the computers 56 and 57 as the notification target are located in the direction indicated by the direction indicating member.
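Options a to e can be combined into a single notification plan per process and phase, as in the sketch below. The bearings in `HOST_BEARING_DEG`, the sound names, and the function `plan_notification` are hypothetical; only the split of processes between the computers 56 and 57 and the special handling of process 3 come from the description.

```python
# Which user-side computer runs each process (from the embodiment).
PROCESS_HOST = {1: 56, 2: 56, 3: 57, 4: 57}
# Hypothetical bearings of each computer as seen from the robot, in degrees.
HOST_BEARING_DEG = {56: -45.0, 57: 30.0}
# Process 3 waits for user input, so it gets a distinguishable notification.
INPUT_REQUIRED = {3}
PHASES = ("before", "during", "after")


def plan_notification(process_id: int, phase: str) -> dict:
    """Describe how the robot could announce one phase of one process."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    host = PROCESS_HOST[process_id]
    return {
        "message": f"Process {process_id}: {phase}",
        "sound": "alert" if process_id in INPUT_REQUIRED else "chime",
        "turn_to_deg": HOST_BEARING_DEG[host],
    }
```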

（２）ブロックチェーンについて
ブロックチェーンは多数のコンピュータが分散して記録する仕組みである。特にパブリック型のブロックチェーンでは記録対象のデータや記録されたデータが公開される。
そこでブロックチェーンのネットワーク中にロボット１を配置し、ブロックデータが送信される前にユーザーにロボット１がお知らせするようにする。ロボット１に「待て」という命令を出力させることで（つまりデータ送信させずに待機するリクエストをする）送信を停止させるようにするとよい。 (2) About blockchain Blockchain is a system in which many computers record data in a distributed manner. In particular, with public blockchains, the data to be recorded and the recorded data are made public.
Therefore, Robot 1 is placed in the blockchain network, and Robot 1 notifies the user before block data is sent. It is preferable to stop the transmission by having the robot 1 output a command "wait" (that is, requesting to wait without transmitting data).

＜実施の形態１０＞
各実施の形態では自然対話モードについて説明したが、自然対話モードに代えて、または、自然対話モードとともに、外国語学習モードを設けるとよい。自然対話モードに加えて外国語学習モードを設けるときは、例えば自然対話モードで「外国語学習モードへ切り替え」という音声を認識したときに自然対話モードから外国語学習モードへ切替えるとよい。また「外国語学習モード」で「自然対話モードへ切り替え」という音声を認識したときに外国語学習モードから自然対話モードへ切替えるとよい。
よく外国語を習得するには外国人の友人を作るとよいなどと言われるが、そのような機会に恵まれる人は多くない。そこで外国語学習のパートナーになり得る対話システムである外国語学習機能を備えるとよい。
任意の第一言語と第二言語との連携とすることができる。以下、日本語の対話システムと英語の対話システムを連携させる構成で説明する。
基本的には英語で会話するシステムとし、会話中に日本語で「もう一回言って」などの要求を出力するとよい。さらに、英文解析Webサービスなどと連携して、「説明して」などの要求にたいして会話中の英文を日本語で解説する機能を備える。会話中に言いたいことが英語でどう言えばいいかわからないときには、「翻訳して」と要求をすると英語でどういうのかを出力する。出力された英文を読めばそのまま会話を続けることができる。自分で調べたりする必要がないので、会話が途切れることもなく、円滑な英会話学習が期待できる。
外国語学習モードでの母国語（例えば、日本人なら日本語）の音声認識エンジンと外国語（例えば英語）の音声認識エンジンはどちらもクラウド上で動作している音声認識エンジンを利用するようにしてもよいが、母国語（例えば日本語）については要求内容が定型文であること、要求に対する回答のフォーマットが決まっていることから、ローカル（例えばロボット１内）に音声認識エンジンを設けこれを利用するとよい。対話エンジンも同様である。音声合成エンジンもいずれの場所に設けてもよいが、特にローカルに設けるとよい。
マイクロフォン１２からの信号に基づく音声データを両言語の音声認識エンジンに投げると、どちらの音声認識エンジンからもなんらかの結果が返ってくる。例えば、「もう一回言って」という日本語を両方のエンジンに投げると、日本語のエンジンは「もう一回言って」というテキストデータを返し、英語のエンジンは「もう一回言って」を英語として解釈したデタラメなテキストデータを返してくる。このような場合、日本語の要求は定型文であるため、日本語のエンジンが返してきたテキストデータと要求の定型文を比較して、一致すれば日本語の要求がされたと判断し、一致しなければ英語が話されたと判断することで、英語と日本語を切り分ける処理を行なうと良い。
要求は英語で受け付けるようにしてもよいが、特に、要求をしようにも英語がわからないというケースを想定して、日本語でも要求できるようにすることが望ましいことを発明者は見出した。
「説明して」などの要求に対して会話中の英文を日本語で解説する機能は、単に英文の訳を日本語にして出力するだけでもよいが、特に英文で用いられている語句や文法の解説を出力するとよい。特にその英文で用いられている構文についての解説を出力するとよい。構文についての解説は例えば各句を頂点（ノード）して例えば各句を囲む描画をし、関連する各句の関係を示す線分等の枝（エッジ）を描画するとよい。例えばグラフ構造（特にツリー構造とするとよい）の図でタッチパネル部７に表示するとよい。
また、表示した内容を音声でスピーカ装置１３から出力するとよい。例えば、https://gigazine.net/news/20160602-foxtype-review/で解説されるような構文解析サービスのAP
Iをコールし、その結果を受け取って、解析結果を日本語で出力する構成とするとよい。
例えば以下のような処理と出力を行なう。
『処理英語の対話エンジンからフレーズを取得
ロボット I'm a fantastic robot.
人「もう一回言って」
ロボット I'm a fantastic robot.
人「説明して」
処理（「説明して」を認識）→構文解析APIコール→構文解析結果から日本語解説を生成
ロボット Iが主語で、amが動詞、robotが目的語になるよ。
fantasticは素晴らしいという意味の形容詞でrobotを修飾しているよ。
英文は「私は素晴らしいロボットです」という意味になるよ。
人「I don't think so.」
ロボット Don't say it! 』
話したいことが英語でわからなければ、日本語で英語での言い方を教えてくれるように要求できるので会話が途切れないという優れた効果を発揮する。日本語での要求ができない場合は、英語での言い方がわからないとき、辞書で調べたりネットで翻訳したりする必要があり、勉強しているという感じになってしまいストレスを感じる。日本語で要求できれば、ただバイリンガルと会話しているという感覚でストレスなく学習できる。語学学習は継続することがとても大事であるから、なるべく学習の際にストレスが少ないということは継続する上で極めて重要なことである。本構成によれば、継続して語学学習を行なえるロボット１を実現できる。 <Embodiment 10>
Although the natural dialogue mode has been described in each embodiment, a foreign language learning mode may be provided in place of or in addition to the natural dialogue mode. When providing a foreign language learning mode in addition to the natural dialogue mode, it is preferable to switch from the natural dialogue mode to the foreign language learning mode, for example, when a voice saying "switch to foreign language learning mode" is recognized in the natural dialogue mode. It is also preferable to switch from the foreign language learning mode to the natural dialogue mode when the user recognizes a voice saying "switch to natural dialogue mode" in the "foreign language learning mode".
It is often said that the best way to learn a foreign language is to make foreign friends, but not many people have such an opportunity. Therefore, it is a good idea to have a foreign language learning function, which is a dialogue system that can be your partner in foreign language learning.
It can be a combination of any first language and second language. In the following, a configuration will be explained in which a Japanese dialogue system and an English dialogue system are linked.
Basically, the system converses in English, and during the conversation the user can issue requests in Japanese such as "Say that again." Furthermore, in cooperation with an English sentence analysis web service or the like, it has a function of explaining the English sentence being spoken in Japanese in response to a request such as "Explain." When the user does not know how to say something in English during the conversation, requesting "Translate" makes the system output how to say it in English. The user can continue the conversation simply by reading out the English output. Because there is no need to look things up, the conversation is not interrupted, and smooth English conversation learning can be expected.
In foreign language learning mode, the speech recognition engine for the native language (for example, Japanese for a Japanese speaker) and the one for the foreign language (for example, English) may both be engines running in the cloud. However, for the native language (for example, Japanese), the requests are fixed phrases and the format of the responses is fixed, so it is preferable to provide a local speech recognition engine (for example, inside the robot 1) and use that. The same applies to the dialogue engine. The speech synthesis engine may also be placed anywhere, but is preferably local.
When voice data based on the signal from the microphone 12 is sent to the speech recognition engines for both languages, each engine returns some result. For example, if the Japanese request "Say that again" is sent to both engines, the Japanese engine returns the text of that request, while the English engine returns meaningless text produced by interpreting the Japanese as English. Because Japanese requests are fixed phrases, the text returned by the Japanese engine is compared with the fixed request phrases; if it matches one, a Japanese request is judged to have been made, and if not, English is judged to have been spoken. It is preferable to separate English from Japanese in this way.
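The language separation described above amounts to an exact-match lookup on the Japanese engine's output. The sketch below assumes both engines have already returned text; the function name `classify_utterance`, the command labels, and the trailing-punctuation cleanup are illustrative choices, while the three request phrases are those quoted in this embodiment.

```python
# Fixed Japanese request phrases mapped to hypothetical command labels.
FIXED_REQUESTS = {
    "もう一回言って": "repeat",
    "説明して": "explain",
    "翻訳して": "translate",
}


def classify_utterance(japanese_text: str, english_text: str):
    """Decide whether the user made a Japanese request or spoke English.

    japanese_text / english_text are the results returned by the Japanese and
    English speech recognition engines for the same audio.
    """
    key = japanese_text.strip().rstrip("。")
    if key in FIXED_REQUESTS:
        return ("request", FIXED_REQUESTS[key])
    # No fixed phrase matched, so treat the audio as English conversation.
    return ("english", english_text)
```

Exact matching is workable here only because the request set is small and fixed; free-form Japanese would be misrouted to the English branch.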
Although requests may be accepted in English, the inventor has found that it is desirable to be able to accept requests in Japanese, especially assuming a case where the requester does not understand English.
The function of explaining the English sentence in Japanese in response to a request such as "Explain" may simply output a Japanese translation of the sentence, but it is particularly preferable to output an explanation of the words and grammar used in the sentence, and especially of its syntax. To explain the syntax, for example, each phrase may be drawn as a vertex (node), enclosed in an outline, with edges such as line segments drawn to show the relationships between related phrases. For example, the result may be displayed on the touch panel section 7 as a graph structure (preferably a tree structure).
Further, the displayed content is preferably also output as audio from the speaker device 13. For example, a configuration may call the API of a parsing service such as the one described at https://gigazine.net/news/20160602-foxtype-review/, receive the result, and output the analysis result in Japanese.
For example, perform the following processing and output.
Processing: acquire a phrase from the English dialogue engine
Robot: I'm a fantastic robot.
Person: "Say that again." (in Japanese)
Robot: I'm a fantastic robot.
Person: "Explain." (in Japanese)
Processing: recognize "Explain" → call the parsing API → generate a Japanese explanation from the parsing result
Robot: "I" is the subject, "am" is the verb, and "robot" is the object.
"fantastic" is an adjective meaning wonderful, and it modifies "robot".
The English sentence means "I am a wonderful robot."
Person: I don't think so.
Robot: Don't say it!
If the user does not know how to say something in English, the user can ask in Japanese to be told how to say it, so the conversation is not interrupted, which is an excellent effect. If requests could not be made in Japanese, a user who did not know an English expression would have to look it up in a dictionary or translate it online, which makes the session feel like studying and causes stress. If requests can be made in Japanese, the user can learn without stress, with the feeling of simply talking to a bilingual person. Because continuing is very important in language learning, keeping stress during learning as low as possible is extremely important for continuing. This configuration makes it possible to realize a robot 1 with which language learning can be continued.

＜実施の形態１１＞
各実施の形態で説明した機能に加え、ロボット１の設置された室内へ人が入ってきたことを検知したとき発話する機能を設けるとよい。またロボット１の設置された室内から人が出ていくことを検知したとき発話する機能を設けるとよい。
例えば、その室内とその室内以外の場所の通路にセンサを設けて、ロボット１の設置された室内へ人が入ってきたこと、ロボット１の設置された室内から人が出ていくことを検知するとよい。特にロボット１が設置された室内に出入りするための自動ドアがある場合、センサは特に自動ドアの開閉のために人がその自動ドアに接近していることを検知するセンサを用いると良い。特に自動ドアをはさんで室外にある第一の人検知センサと、自動ドアをはさんで室内にある第二の人検知センサと、自動ドアが開いているときに自動ドアの場所にいる人を検知する第三の人検知センサセンサ（人のドアへの挟み込みを防止するためのセンサ）の少なくともいずれか２つにロボット１のコントローラMCを接続して、室内への出入りを検知するとよい。このようにすれば、ロボット１が設置された室内への出入り等を新たなセンサを設置することなく検出できる。例えば各センサの人を検知した際に立ち上がる信号のエッジを捉えて検出するとよい。
例えば、第一の人検知センサで人が検知された後、第三の人検知センサで人が検知された場合、「いらっしゃいませ」などと入ってきた人を歓迎するフレーズの音声をスピーカ装置１３から出力するとよい。例えば、第二の人検知センサで人が検知された後、第三の人検知センサで人が検知された場合、「ありがとうございました」と出ていく人に感謝するフレーズの音声をスピーカ装置１３から出力するとよい。
これらのときに第１～第３のモータ２３～２５を動かし、ロボット１の設置位置から予め設定した自動ドアの方を向く動作を行なうようにするとよい。なお、第一の人検知センサと第二の人検知センサとが同じ時に人を検知した場合には、第一の人検知センサを優先するとよい。このようにすれば、入ってくる人によりロボット１の存在を気づいてもらいやすくなるとともに、入ってくる人が「ありがとうございます」とロボット１にいきなり言われる違和感を軽減できる。 <Embodiment 11>
In addition to the functions described in each embodiment, it is preferable to provide a function of speaking when it is detected that a person has entered the room where the robot 1 is installed. Further, it is preferable to provide a function that makes a speech when it is detected that a person leaves the room where the robot 1 is installed.
For example, sensors may be installed in the passage between the room and the area outside it to detect that a person has entered the room where the robot 1 is installed or is leaving it. In particular, when there is an automatic door for entering and leaving the room where the robot 1 is installed, it is preferable to use the sensors that detect a person approaching the automatic door in order to open and close it. In particular, the controller MC of the robot 1 is preferably connected to at least two of the following to detect entry into and exit from the room: a first person detection sensor located outside the room on one side of the automatic door, a second person detection sensor located inside the room on the other side, and a third person detection sensor that detects a person standing in the doorway while the door is open (a sensor for preventing people from being caught in the door). In this way, entry into and exit from the room where the robot 1 is installed can be detected without installing new sensors. For example, detection may be based on capturing the rising edge of each sensor's signal when it detects a person.
For example, if a person is detected by the first person detection sensor and then by the third person detection sensor, the speaker device 13 preferably outputs a phrase welcoming the person entering, such as "Welcome." Likewise, if a person is detected by the second person detection sensor and then by the third person detection sensor, the speaker device 13 preferably outputs a phrase thanking the person leaving, such as "Thank you very much."
At these times, the first to third motors 23 to 25 are preferably driven so that the robot 1 turns from its installation position toward the preset direction of the automatic door. When the first and second person detection sensors detect a person at the same time, the first person detection sensor is preferably given priority. This makes it easier for people entering to notice the robot 1, and reduces the awkwardness of a person who is entering being abruptly told "Thank you" by the robot 1.
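The sensor ordering rule can be sketched as a small pass over time-stamped rising edges. The function name `classify_passages` and the `(timestamp, sensor)` tuple format are assumptions; sensors 1, 2, and 3 stand for the first (outside), second (inside), and third (doorway) person detection sensors. The sketch has no timeout, so a stale edge stays pending until the next one replaces it.

```python
def classify_passages(edges):
    """Turn rising edges into greetings.

    edges: iterable of (timestamp, sensor) with sensor in {1, 2, 3}.
    Sensor 1 then 3 means someone is entering; sensor 2 then 3 means leaving.
    """
    greetings = []
    pending = None        # last side sensor seen (1 or 2)
    pending_time = None
    # Sorting tuples puts sensor 1 before sensor 2 when timestamps are equal.
    for timestamp, sensor in sorted(edges):
        if sensor in (1, 2):
            # On a tie, sensor 1 takes priority over sensor 2.
            if sensor == 2 and pending == 1 and pending_time == timestamp:
                continue
            pending, pending_time = sensor, timestamp
        elif sensor == 3 and pending is not None:
            greetings.append("Welcome" if pending == 1 else "Thank you very much")
            pending = None
    return greetings
```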

＜その他の実施の形態＞
（１）各実施形態等においては、無線ＬＡＮ装置２１を備えることとしたが、これに代えてまたはこれとともに有線ＬＡＮ装置を備え、有線ＬＡＮネットワークに接続するようにしてもよい。有線ＬＡＮ装置はロボット１に内蔵しても、外付けとしてもよい。ＵＳＢのＯＴＧ（On-The-Go）用の端子１８に有線ＬＡＮ装置を接続する構成としてもよい。有線ＬＡＮネットワークはルーター等を介してインターネットに接続される構成とするとよい。無線ＬＡＮは環境によっては通信が安定しないまたは接続できないケースも想定されうる。例えばスマホ等、多数の無線ＬＡＮ装置が存在する場所にロボット１を設置する場合には有線ＬＡＮ装置を介してインターネットにアクセスする構成とすると望ましい。
（２）各実施形態等においては、半二重方式での人とロボット１との対話の例を示しているが、全二重方式で人とロボット１との対話を行なうようにしてもよい。例えば表１の対話の中で「・・・を開いて」と人が行った後、「本当にいいですか」の発話を行っている間も人の音声の認識を続け「取消」という音声が「本当にいいですか」の発話中に認識された場合には、発話中であれば発話を中断し、すぐに「中止しました」とロボットから発話するように構成してもよい。
（３）半二重方式での対話を行なう構成は、構成や処理を簡素化でき、コストを低減できるので特によい。しかし、ロボット１がマイクロフォン１２をオンにしたタイミングが分かりづらく、ユーザーが喋っても、ロボットが認識対象とするユーザーの音声の先頭部分、すなわち言葉の先頭部分が欠けてしまうことが多いという課題を発明者らは見出した。この課題を解決するため、コントローラＭＣがマイクをオンしたタイミングで特徴的な画面表示をおこなうとよい。これによりスムーズな会話をサポートする。コントローラＭＣがマイクをオンしたタイミングで特徴的な画面表示をおこなう態様としては、１）画面の四隅を光らせる、２）マイクのアイコンを表示させる、の少なくともいずれか一方を行なうとよく、特に１）、２）の両方とも行うと優れた効果を発揮する。
（４）第１～第３のモータ２３～２５を構成するモータとしては、ＤＣモータなど各種のモータとすることができるが、特にステッピングモータとするとよく、ロボット１はステッピングモータにより姿勢を制御する構成とするとよい。ステッピングモータはモータに流す電流に比例してトルクの大きさが変わる。電流をたくさん流せば大きなトルクを得られるが、発熱や電池寿命などが問題になる。そこでロボット１の静止時はその姿勢を維持するために必要な最小限の電流を流し、ロボット１が姿勢を変えるときのみ大きな電流を流すようにすると特によい。なお、サーボモータ２３～２５のすべてについてその静止時に姿勢を維持するために必要な最小限の電流を通電するようにしてもよいが、ディテントトルク(通電しない状態でのトルク)で支持できる胴体部の第１のモータ２３は通電しないようにする一方、頭部分はディテントトルクでは負けてしまうため第２のモータ２４，第３のモータ２５については静止時もこの通電をするようするとよい。
また、静止時のトルクのまま回転させるとトルク不足で脱調が起こり上手く回転しないため、回転させる時は静止時に比べ、電流をたくさん流すようにしてトルクを上げるとよい。一方、静止時は回転時のような大きなトルクは必要ないので回転時に比べ、電流を下げるとよい。
また、ある方向に向きを変える場合、加速しながら一定速度まで上げて目的の角度が近づいたら減速して止めるという制御を行なうとよい。
また、ロボットをモータによって駆動させると機械的、電気的なノイズを発生する。このノイズが音声認識の認識率を低下させるので音声認識中はモータを停止するように制御するとよい。
本発明の範囲は，明細書に明示的に説明された構成や限定されるものではなく，本明細書に開示される本発明の様々な側面の組み合わせをも，その範囲に含むものである。本発明のうち，特許を受けようとする構成を，添付の特許請求の範囲に特定したが，現在の処は特許請求の範囲に特定されていない構成であっても，本明細書に開示される構成を，将来的に特許請求の範囲とする意思を有する。
本願発明は上述した実施の形態に記載の構成に限定されない。上述した各実施の形態や
変形例の構成要素は任意に選択して組み合わせて構成するとよい。また各実施の形態や変形例の任意の構成要素と，発明を解決するための手段に記載の任意の構成要素または発明を解決するための手段に記載の任意の構成要素を具体化した構成要素とは任意に組み合わせて構成するとよい。これらについても本願の補正または分割出願等において権利取得する意思を有する。また「～の場合」「～のとき」という記載があったとしてもその場合やそのときに限られる構成として記載はしているものではない。これらの場合やときでない構成についても開示しているものであり、権利取得する意思を有する。また順番を伴った記載になっている箇所もこの順番に限らない。一部の箇所を削除したり、順番を入れ替えた構成についても開示しているものであり、権利取得する意思を有する。
また，意匠出願への変更出願により，全体意匠または部分意匠について権利取得する意思を有する。図面は本装置の全体を実線で描画しているが，全体意匠のみならず当該装置の一部の部分に対して請求する部分意匠も包含した図面である。例えば当該装置の一部の部材を部分意匠とすることはもちろんのこと，部材と関係なく当該装置の一部の部分を部分意匠として包含した図面である。当該装置の一部の部分としては，装置の一部の部材としても良いし，その部材の部分としても良い。全体意匠はもちろんのこと，図面の実線部分のうち任意の部分を破線部分とした部分意匠を，権利化する意思を有する。 <Other embodiments>
(1) In each of the embodiments, the wireless LAN device 21 is provided, but instead of or in addition to this, a wired LAN device may be provided and connected to a wired LAN network. The wired LAN device may be built into the robot 1 or may be attached externally. A configuration may also be adopted in which a wired LAN device is connected to the USB OTG (On-The-Go) terminal 18. The wired LAN network is preferably configured to be connected to the Internet via a router or the like. Depending on the environment, wireless LAN communication may be unstable or connection may not be possible. For example, when the robot 1 is installed in a place where a large number of wireless LAN devices such as smartphones are present, it is preferable to configure the robot 1 to access the Internet via a wired LAN device.
(2) In each of the embodiments, the interaction between a person and the robot 1 is shown as half-duplex, but it may instead be conducted in a full-duplex mode. For example, in the dialogue shown in Table 1, after the person says "Open...", the robot 1 may continue recognizing the person's voice even while it is uttering "Are you sure?". If the word "Cancel" is recognized while "Are you sure?" is still being uttered, the robot 1 may interrupt that utterance and immediately utter "Cancelled."
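As a non-limiting illustration of the full-duplex behavior in (2), the following minimal Python sketch models a dialogue in which recognition stays active while the robot is speaking, so that a recognized "Cancel" interrupts the confirmation prompt. The class name `FullDuplexDialogue`, its methods, and the log format are hypothetical and are not taken from the embodiments; actual speech output and recognition are replaced by simple method calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FullDuplexDialogue:
    """Toy model: voice recognition remains active during the robot's own speech."""
    speaking: Optional[str] = None              # utterance in progress, if any
    log: List[str] = field(default_factory=list)

    def start_utterance(self, text: str) -> None:
        # Stand-in for starting speech output (hypothetical).
        self.speaking = text
        self.log.append(f"say:{text}")

    def finish_utterance(self) -> None:
        # Called when speech output completes without interruption.
        self.speaking = None

    def on_recognized(self, word: str) -> None:
        # In full-duplex mode this may be called while self.speaking is set.
        if word == "Cancel" and self.speaking == "Are you sure?":
            self.log.append(f"interrupt:{self.speaking}")
            self.speaking = None
            self.start_utterance("Cancelled.")


dialogue = FullDuplexDialogue()
dialogue.start_utterance("Are you sure?")
dialogue.on_recognized("Cancel")   # arrives before the prompt has finished
```

In a half-duplex configuration, by contrast, `on_recognized` would only ever be called after `finish_utterance`, so the interruption branch would never be taken.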
(3) A configuration in which half-duplex interaction is performed is particularly advantageous because it can simplify configuration and processing and reduce costs. However, the inventors have discovered that it is difficult for the user to tell when the robot 1 turns on the microphone 12, so that even when the user speaks, the beginning of the user's voice as recognized by the robot, that is, the beginning of the utterance, is often missing. In order to solve this problem, it is preferable to display a characteristic screen at the timing when the controller MC turns on the microphone. This supports smooth conversation. As a mode of displaying a characteristic screen at the timing when the controller MC turns on the microphone, it is preferable to perform at least one of the following: 1) lighting up the four corners of the screen, and 2) displaying a microphone icon. Performing both 1) and 2) is particularly effective.
(4) The motors constituting the first to third motors 23 to 25 can be various types of motors such as DC motors, but stepping motors are particularly preferred, and the robot 1 is preferably configured so that its posture is controlled by stepping motors. The torque of a stepping motor changes in proportion to the current flowing through the motor. Passing a large current yields a large torque, but heat generation and battery life then become problems. Therefore, it is particularly preferable to pass only the minimum current required to maintain the posture while the robot 1 is stationary, and to pass a large current only when the robot 1 changes its posture. The minimum current necessary to maintain the posture may be supplied to all of the servo motors 23 to 25 when they are stationary. It is preferable, however, not to energize the first motor 23 of the body, which can be supported by detent torque (the torque in the non-energized state), while keeping the second motor 24 and the third motor 25 energized with this current even when stationary, because detent torque alone cannot hold the head.
Also, if the motor is rotated with the torque used at rest, the lack of torque causes step-out (loss of synchronism) and the motor does not rotate properly. Therefore, when rotating the motor, it is preferable to raise the torque by passing more current than at rest. Conversely, at rest the motor does not require the large torque needed for rotation, so it is preferable to lower the current compared with rotation.
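The holding and rotation currents described above can be sketched, purely by way of example, as a per-motor current policy. The numeric current values, the class name `MotorCurrentPolicy`, and the function `current_for` are hypothetical assumptions introduced for illustration; only the assignment of motor 23 to the body and motors 24 and 25 to the head, and the rule "no current at rest for the body, reduced current at rest for the head, larger current while rotating", follow the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorCurrentPolicy:
    name: str
    run_current_ma: int    # current while rotating (hypothetical value)
    hold_current_ma: int   # current while stationary; 0 means rely on detent torque


def current_for(policy: MotorCurrentPolicy, rotating: bool) -> int:
    """Select the drive current: higher while rotating to avoid step-out,
    lower (or zero) while stationary to limit heat and battery drain."""
    return policy.run_current_ma if rotating else policy.hold_current_ma


# Hypothetical values in milliamperes, chosen only to show the relationship.
POLICIES = {
    23: MotorCurrentPolicy("first motor 23 (body)", run_current_ma=600, hold_current_ma=0),
    24: MotorCurrentPolicy("second motor 24 (head)", run_current_ma=600, hold_current_ma=150),
    25: MotorCurrentPolicy("third motor 25 (head)", run_current_ma=600, hold_current_ma=150),
}

for motor_id, policy in POLICIES.items():
    print(motor_id, "rest:", current_for(policy, False), "rotate:", current_for(policy, True))
```

Keeping the policy as data makes it straightforward to apply the alternative mentioned above, in which all three motors receive a minimum holding current, by changing only the `hold_current_ma` entry for motor 23.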
Furthermore, when turning toward a certain direction, it is preferable to perform control such that the motor accelerates up to a constant speed and then, as the target angle approaches, decelerates and stops.
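The accelerate, cruise, and decelerate control described above corresponds to what is commonly called a trapezoidal speed profile. The following Python sketch shows one way such a profile could be computed per motor step; the function name, the parameters `v_min`, `v_max`, and `accel`, and the numeric values are assumptions for illustration and are not specified in the embodiments.

```python
import math
from typing import List


def trapezoid_speeds(total_steps: int, v_min: float, v_max: float, accel: float) -> List[float]:
    """Return a step rate (steps/s) for every step of a move.

    The rate ramps up from v_min following v^2 = v_min^2 + 2*accel*s,
    is capped at the constant speed v_max, and ramps down symmetrically
    as the remaining distance to the target shrinks.
    """
    speeds = []
    for i in range(total_steps):
        steps_left = total_steps - 1 - i
        ramp_up = math.sqrt(v_min ** 2 + 2.0 * accel * i)             # limit from the start
        ramp_down = math.sqrt(v_min ** 2 + 2.0 * accel * steps_left)  # limit from the target
        speeds.append(min(v_max, ramp_up, ramp_down))
    return speeds


# Hypothetical move of 200 steps: start/stop at 50 steps/s, cruise at 400 steps/s.
profile = trapezoid_speeds(total_steps=200, v_min=50.0, v_max=400.0, accel=1000.0)
print(profile[0], max(profile), profile[-1])
```

For a move too short to reach `v_max`, the same function yields a triangular profile, because the ramp-down limit takes over before the cap is reached.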
Furthermore, driving the robot with a motor generates mechanical and electrical noise. Since this noise lowers the recognition rate of speech recognition, it is preferable to control the motors so that they are stopped during speech recognition.
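Stopping the motors during speech recognition can be sketched, as one possible arrangement, with a gate that holds back motion requests while recognition is active. Deferring the held requests until recognition ends is an assumption of this sketch (the description only states that the motors are stopped), and the class `MotionGate` and its method names are hypothetical.

```python
from typing import List


class MotionGate:
    """Blocks motor motion while speech recognition is in progress."""

    def __init__(self) -> None:
        self.recognizing = False
        self.pending: List[float] = []    # target angles deferred during recognition
        self.executed: List[float] = []   # stand-in for commands sent to the motors

    def start_recognition(self) -> None:
        # A real controller would also halt any rotation already in progress here.
        self.recognizing = True

    def end_recognition(self) -> None:
        self.recognizing = False
        # Assumption: deferred moves are carried out once recognition has ended.
        while self.pending:
            self.executed.append(self.pending.pop(0))

    def request_move(self, angle_deg: float) -> None:
        if self.recognizing:
            self.pending.append(angle_deg)   # keep the motors silent for now
        else:
            self.executed.append(angle_deg)


gate = MotionGate()
gate.start_recognition()
gate.request_move(30.0)    # held back: motor noise would degrade recognition
gate.end_recognition()     # the deferred move is carried out now
```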
The scope of the present invention is not limited to the configurations explicitly described in the specification, but also includes combinations of the various aspects of the invention disclosed herein. Of the present invention, the configuration for which a patent is sought has been specified in the attached claims; however, even for configurations that are not currently specified in the claims, the applicant intends to claim configurations disclosed in this specification in the future.
The present invention is not limited to the configurations described in the embodiments above. The components of each of the embodiments and modifications described above may be arbitrarily selected and combined. In addition, any component of each embodiment or modification may be arbitrarily combined with any component described in the means for solving the invention, or with a component that embodies any component described in the means for solving the invention. The applicant intends to obtain rights to these configurations as well through amendments to this application, divisional applications, or the like. Furthermore, even where the description says "in the case of ..." or "when ...", the configuration is not described as being limited to that case or that time. Configurations that do not involve those cases or times are also disclosed, and the applicant intends to obtain rights to them. Likewise, passages described in a particular order are not limited to that order. Configurations in which some parts are deleted or the order is changed are also disclosed, and the applicant intends to obtain rights to them.
In addition, the applicant intends to acquire rights to the entire design or a partial design by converting this application into a design application. Although the drawings depict the entire device in solid lines, they cover not only the overall design but also partial designs claimed for parts of the device. For example, the drawings naturally cover a member of the device as a partial design, and also cover a portion of the device as a partial design irrespective of its members. The part of the device may be a member of the device or a portion of such a member. The applicant intends to obtain rights not only to the overall design but also to partial designs in which any part of the solid-line portions of the drawings is rendered in broken lines.

１…装置としてのロボット、４１…他の機器としてのスマートフォン、５１…他の機器としてのスマートスピーカ。

1...Robot as a device, 41...Smartphone as another device, 51...Smart speaker as another device.

Claims

A robot equipped with an interaction function that displays an image of the robot's face on a screen and interacts with a user,
Equipped with a screen that displays a clock as a standby screen for dialogue,
A robot having a function of displaying a faint image of the robot's face on the background of a clock displayed as a standby screen for the dialogue.

The robot according to claim 1, further comprising a function of displaying an image of the robot's face without displaying a clock on the screen when interacting with the user.

While displaying an image of the robot's open eyes as an image of the robot's face without displaying a clock on the screen when interacting with the user,
A function is provided to display an image of closed eyes as a face image of the robot, which is displayed faintly in the background of the clock displayed as a standby screen for the dialogue.
The robot according to claim 2, characterized in that:

Provided with a function to switch the screen during dialogue with the user to a screen that displays the content of the dialogue as a character string as a chat screen, without displaying an image of the robot's face.
The robot according to any one of claims 1 to 3, characterized by:

Provided with a function to transition, as the screen when interacting with the user, to a screen that displays the content of the interaction as a chat screen without displaying the image of the robot's face.
The robot according to claim 4, characterized by:

A program for causing a computer to realize the functions of the robot according to any one of claims 1 to 5.