JP2003076398A

JP2003076398A - Robot apparatus, robot control method, recording medium, and program

Info

Publication number: JP2003076398A
Application number: JP2001266703A
Authority: JP
Inventors: Kazuo Ishii; 和夫石井; Wataru Onoki; 渡小野木; Akira Hanya; 亮半谷
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-09-04
Filing date: 2001-09-04
Publication date: 2003-03-14
Anticipated expiration: 2021-09-04
Also published as: JP4178777B2

Abstract

(57) [Abstract] [Problem] To enable a more suitable response to be made to an utterance from a user. [Solution] A voice recognition unit 24 performs voice recognition when it acquires the feature parameters of voice collected by a microphone 11. When the voice recognition unit 24 detects a predetermined keyword, it notifies a melody generation unit 32 and other units of the detection. A voice pitch analysis unit 28 analyzes the voice of the keyword based on an instruction from the melody generation unit 32 and extracts the pitch frequency of that voice. The voice pitch analysis unit 28 selects the scale corresponding to the extracted pitch frequency and notifies the melody generation unit 32 of it. The melody generation unit 32 reads the melody data stored in a melody data storage unit 33 and converts that melody data based on the scale notified by the voice pitch analysis unit 28. The new melody data generated by the melody generation unit 32 is reproduced by a voice synthesis unit 34 and output from a speaker 14.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a robot apparatus, a robot control method, a recording medium, and a program, and more particularly to a robot apparatus, a robot control method, a recording medium, and a program that enable a more suitable response to be made to an utterance from a user.

[0002]

[Description of the Related Art] In recent years, entertainment pet robots have been realized that autonomously take various actions according to, for example, the surrounding environment and their own internal state.

[0003] Such a pet robot is designed, for example, to emit a sound in response when the user speaks to it, and to emit a sound indicating that it is angry when the user hits it on the head. In addition, when the user does not play with it much and its internal state turns to the emotion "lonely", it emits a sound expressing that it is lonely (signalling that it wants to be played with).

[0004]

[Problems to be Solved by the Invention] However, because the sound data output from such a pet robot is prepared in the pet robot in advance, there has been a problem that the variety of sounds is limited by factors such as the built-in storage capacity. In other words, the output sounds fall into fixed patterns.

[0005] Consequently, after observing the robot's behavior for a while, the user can easily predict the sound that the pet robot will emit next, and communication with the pet robot becomes uninteresting.

[0006] The present invention has been made in view of such circumstances and makes it possible to give a more suitable response to a call from a user.

[0007]

[Means for Solving the Problems] A first robot apparatus of the present invention comprises: storage means for storing first melody data composed of a first scale; extraction means for extracting the pitch frequency of input voice; selection means for selecting a second scale based on the pitch frequency extracted by the extraction means; generation means for generating second melody data by converting the first scale constituting the first melody data stored by the storage means into the second scale selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.
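The extraction, selection, and generation means described above can be pictured with a minimal sketch. The source does not say how a scale is encoded or how the conversion is performed, so everything concrete here is an assumption for illustration: notes are held as MIDI note numbers, the tuning reference is A4 = 440 Hz, and "conversion" is taken to mean transposing the stored melody so that its first note matches the note nearest the extracted pitch. The names `nearest_note` and `convert_melody` are hypothetical.

```python
import math

A4_HZ = 440.0   # assumed tuning reference (not stated in the source)
A4_MIDI = 69    # MIDI note number of A4


def nearest_note(pitch_hz: float) -> int:
    """Select the equal-tempered note (MIDI number) closest to a pitch in Hz."""
    if pitch_hz <= 0:
        raise ValueError("pitch frequency must be positive")
    return round(A4_MIDI + 12 * math.log2(pitch_hz / A4_HZ))


def convert_melody(first_melody: list[int], pitch_hz: float) -> list[int]:
    """Transpose the stored melody so its first note matches the voice pitch.

    This is one possible reading of "converting the first scale into the
    second scale"; the source leaves the exact rule open.
    """
    if not first_melody:
        return []
    offset = nearest_note(pitch_hz) - first_melody[0]
    return [note + offset for note in first_melody]


stored = [60, 62, 64, 60]               # hypothetical stored melody: C4 D4 E4 C4
print(convert_melody(stored, 220.0))    # 220 Hz is A3 (MIDI 57): [57, 59, 61, 57]
```

Under these assumptions the same stored melody yields a different output for every speaker or utterance, which is the effect the apparatus is aiming for.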

[0008] The extraction means may extract the pitch frequency of voice representing a predetermined keyword.

[0009] The extraction means may extract the pitch frequency of voice containing a vowel.

[0010] The extraction means may extract, as the pitch frequency, the average value of the pitch frequencies detected during a predetermined period of the voice.
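Averaging the pitch over a period can be sketched as follows. The sketch assumes that a pitch estimate is already available per analysis frame and that frames without a detectable pitch (for example unvoiced consonants) are marked as `None`; the frame representation and the function name `average_pitch` are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def average_pitch(frame_pitches: Sequence[Optional[float]]) -> Optional[float]:
    """Average the per-frame pitch estimates (Hz) over the given period.

    Frames with no detected pitch (None or non-positive) are skipped.
    Returns None when no frame in the period carried a pitch.
    """
    voiced = [p for p in frame_pitches if p is not None and p > 0]
    return fmean(voiced) if voiced else None


# Three voiced frames and one unvoiced frame within the period.
print(average_pitch([200.0, None, 210.0, 220.0]))
```

Using an average rather than a single frame makes the selected scale less sensitive to momentary pitch fluctuations within the keyword.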

[0011] The generation means may generate the second melody data by shifting the second scale by a predetermined number of octaves.
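An octave shift is a small calculation: one octave doubles or halves the frequency, which corresponds to 12 semitones in equal temperament. The sketch below assumes notes are held either as MIDI note numbers or as frequencies in Hz; both representations and both function names are illustrative, not taken from the source.

```python
def shift_notes_by_octaves(notes: list[int], octaves: int) -> list[int]:
    """Shift MIDI note numbers by a whole number of octaves (12 semitones each)."""
    return [note + 12 * octaves for note in notes]


def shift_frequency_by_octaves(freq_hz: float, octaves: int) -> float:
    """Shift a frequency by a whole number of octaves (a factor of 2 each)."""
    return freq_hz * 2.0 ** octaves


print(shift_notes_by_octaves([57, 59, 61], 1))   # [69, 71, 73]
print(shift_frequency_by_octaves(220.0, 1))      # 440.0
```

A shift like this could, for instance, move a melody derived from a low speaking voice into a register that suits the robot's speaker, though the source does not state the purpose.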

[0012] A robot control method for the first robot apparatus of the present invention includes: a storage step of storing first melody data composed of a first scale; an extraction step of extracting the pitch frequency of input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage step into the second scale selected by the processing of the selection step; and a reproduction step of reproducing the second melody data generated by the processing of the generation step.

[0013] The program recorded on a first recording medium of the present invention includes: a storage control step of controlling storage of first melody data composed of a first scale; an extraction step of controlling extraction of the pitch frequency of input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

[0014] A first program of the present invention includes: a storage control step of controlling storage of first melody data composed of a first scale; an extraction step of controlling extraction of the pitch frequency of input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

[0015] A second robot apparatus of the present invention comprises: storage means for storing first melody data composed of a first scale; management means for managing the apparatus's own internal state; selection means for selecting, when the internal state managed by the management means changes, a second scale corresponding to the change in the internal state; generation means for generating second melody data by converting the first scale constituting the first melody data stored by the storage means into the second scale selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.

[0016] The generation means may generate the second melody data by shifting the second scale by a predetermined number of octaves.

[0017] A robot control method for the second robot apparatus of the present invention includes: a storage step of storing first melody data composed of a first scale; a management step of managing the apparatus's own internal state; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage step into the second scale selected by the processing of the selection step; and a reproduction step of reproducing the second melody data generated by the processing of the generation step.

[0018] The program recorded on a second recording medium of the present invention includes: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing the apparatus's own internal state; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

[0019] A second program of the present invention includes: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing the apparatus's own internal state; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

[0020] In the first robot apparatus, robot control method, and program of the present invention, first melody data composed of a first scale is stored, the pitch frequency of input voice is extracted, and a second scale is selected based on the extracted pitch frequency. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the second melody data is reproduced.

[0021] In the second robot apparatus, robot control method, and program of the present invention, first melody data composed of a first scale is stored, the apparatus's own internal state is managed, and when the managed internal state changes, a second scale corresponding to the change in the internal state is selected. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the generated second melody data is reproduced.

[0022]

[Embodiments of the Invention] FIG. 1 is a perspective view showing an example of the external configuration of a pet robot 1 to which the present invention is applied.

[0023] As shown in the figure, the pet robot 1 has, for example, the shape of a four-legged dog. Leg units 3A, 3B, 3C, and 3D are connected to the front and rear, left and right of a body unit 2, and a head unit 4 and a tail unit 5 are connected to the front end and the rear end of the body unit 2, respectively.

[0024] The tail unit 5 extends from a base portion 5B provided on the upper surface of the body unit 2 so that it can bend or swing freely with two degrees of freedom.

[0025] As described in detail later, the pet robot 1 having this external configuration is provided with melody data for responding when, for example, the user calls out "Good morning" to it. The pet robot 1 analyzes the voice from the user, converts the prepared melody data based on the analysis result, and reproduces the melody data obtained by the conversion. In other words, the melody data is converted and output differently for each user, or for each utterance.

[0026] Therefore, even a single kind of melody data is converted each time into melody data that matches the user's utterance, so the pet robot 1 outputs a sound that corresponds to the way the user sings to it. This deepens the mutual understanding with the pet robot 1, helps keep the user from growing tired of communicating with it, and enables more suitable communication.

[0027] FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot 1 of FIG. 1.

[0028] The body unit 2 houses a controller 10 that controls the entire pet robot 1. The controller 10 basically comprises a CPU (Central Processing Unit) 10A and a memory 10B that stores the program with which the CPU 10A controls each unit.

[0029] In addition to the controller 10, the body unit 2 also houses, for example, a battery (not shown) that serves as the power source of the pet robot 1.

[0030] As shown in FIG. 2, the head unit 4 is provided, each at a predetermined position, with sensors that detect external stimuli: a microphone 11 corresponding to an "ear" that senses sound; a camera 12 corresponding to an "eye" that senses light, composed of a CCD (Charge Coupled Device) image sensor, a CMOS (Complementary Metal Oxide Semiconductor) image sensor, or the like; and a touch sensor 13 corresponding to the sense of touch, which senses the pressure and the like produced when the user touches it. A speaker 14 corresponding to the "mouth" of the pet robot 1 is also installed at a predetermined position in the head unit 4.

[0031] Actuators are installed at the joints of the leg units 3A to 3D, at the connections between each of the leg units 3A to 3D and the body unit 2, at the connection between the head unit 4 and the body unit 2, at the connection between the tail unit 5 and the body unit 2, and so on. The actuators operate each part based on instructions from the controller 10.

[0032] In the example of FIG. 2, the leg unit 3A is provided with actuators 3AA_1 to 3AA_k, and the leg unit 3B is provided with actuators 3BA_1 to 3BA_k. The leg unit 3C is provided with actuators 3CA_1 to 3CA_k, and the leg unit 3D is provided with actuators 3DA_1 to 3DA_k. Further, the head unit 4 is provided with actuators 4A_1 to 4A_L, and the tail unit 5 is provided with actuators 5A_1 and 5A_2.

[0033] The microphone 11 installed in the head unit 4 collects the surrounding sound, including utterances from the user, and outputs the resulting audio signal to the controller 10. The camera 12 captures images of the surroundings and outputs the resulting image signal to the controller 10. The touch sensor 13 is provided, for example, on the upper part of the head unit 4; it detects the pressure received from physical actions by the user such as "stroking" or "hitting" and outputs the detection result to the controller 10 as a pressure detection signal.

[0034] Based on the audio signal, image signal, and pressure detection signal supplied from the microphone 11, the camera 12, and the touch sensor 13, the controller 10 judges the surrounding situation and whether there is a command or an action from the user, and based on that judgment decides the action the pet robot 1 will take next. The controller 10 then drives the necessary actuators based on that decision, thereby causing the pet robot 1 to act, for example by swinging the head unit 4 up, down, left, and right, by moving the tail unit 5, or by driving each of the leg units 3A to 3D so that the pet robot 1 walks.

[0035] The controller 10 also performs other processing, such as turning on, turning off, or blinking an LED (Light Emitting Diode) 41 (see FIG. 3) provided at the position of the "eyes" of the pet robot 1.

[0036] FIG. 3 is a block diagram showing an example of the functional configuration of the controller 10 of FIG. 2. Each function shown in FIG. 3 is realized by the CPU 10A executing the control program stored in the memory 10B.

[0037] The audio signal from the microphone 11 is supplied to an AD (Analog Digital) conversion unit 21. The AD conversion unit 21 samples and quantizes the audio signal supplied from the microphone 11 and converts it into audio data. The audio data obtained by the AD conversion unit 21 is supplied to a voice feature analysis unit 22 and a voice pitch analysis unit 28.
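The sampling and quantization performed by the AD conversion unit can be sketched in software. In the robot this is presumably done in hardware, and the source gives no sampling rate or bit depth, so the 16-bit depth, the normalized input range of -1.0 to 1.0, and the function names below are assumptions for illustration only.

```python
import math


def sample_sine(freq_hz: float, rate_hz: float, count: int) -> list[float]:
    """Sample a unit-amplitude sine wave at discrete instants n / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


def quantize(samples: list[float], bits: int = 16) -> list[int]:
    """Map values in [-1.0, 1.0] to signed integer codes of the given bit depth.

    Values outside the range are clipped before rounding.
    """
    max_code = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, x)) * max_code) for x in samples]


# A 1 kHz tone sampled at 8 kHz, then quantized to 16-bit codes.
audio_data = quantize(sample_sine(1000.0, 8000.0, 8))
print(audio_data)
```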

[0038] The voice feature analysis unit 22 performs, for example, MFCC (Mel Frequency Cepstrum Coefficient) analysis on the input audio data for each appropriate frame and outputs the analysis result to a voice section detection unit 23 as feature parameters (a feature vector). The voice feature analysis unit 22 can also extract, as feature parameters, for example linear prediction coefficients, cepstrum coefficients, line spectrum pairs, and the power in each predetermined frequency band (filter bank output).

[0039] The voice section detection unit 23 detects voice sections based on the feature parameters supplied from the voice feature analysis unit 22, for example according to whether power at or above a predetermined threshold has been detected continuously for a predetermined period. The voice section detection unit 23 then outputs the detected voice section information, together with the feature parameters supplied from the voice feature analysis unit 22, to a voice recognition unit 24. The voice section information detected by the voice section detection unit 23 is also supplied to the voice pitch analysis unit 28.
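The threshold-and-duration rule for detecting voice sections can be sketched as below. The sketch assumes one power value per analysis frame; the threshold, the minimum run length, and the function name `detect_voice_sections` are hypothetical, since the source only says "a predetermined threshold" and "a predetermined period".

```python
def detect_voice_sections(frame_power, threshold, min_frames):
    """Return (start, end) frame index pairs, end exclusive, for every run of
    frames whose power stays at or above threshold for at least min_frames."""
    sections = []
    start = None
    for i, power in enumerate(frame_power):
        if power >= threshold:
            if start is None:
                start = i                      # a candidate section begins
        else:
            if start is not None and i - start >= min_frames:
                sections.append((start, i))    # run was long enough
            start = None
    # Close a section that is still open at the end of the input.
    if start is not None and len(frame_power) - start >= min_frames:
        sections.append((start, len(frame_power)))
    return sections


power = [0, 5, 6, 7, 0, 8, 0, 9, 9, 9]
print(detect_voice_sections(power, threshold=5, min_frames=3))  # [(1, 4), (7, 10)]
```

The single loud frame at index 5 is rejected because it does not persist, which is the point of requiring the power to continue for a period.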

[0040] The voice recognition unit 24 stores acoustic models representing acoustic features such as the individual phonemes and syllables of the language of the voice to be recognized, a word dictionary describing pronunciation information (phonological information) for each word to be recognized, and grammar rules describing how the words registered in the word dictionary chain together (connect). As the grammar rules, rules based on, for example, a context-free grammar (CFG) or statistical word-chain probabilities (N-gram) can be used.

[0041] Using the feature parameters from the voice feature analysis unit 22, and referring as necessary to these acoustic models, the word dictionary, and the grammar rules, the voice recognition unit 24 recognizes the voice input to the microphone 11 based on, for example, the continuous-distribution HMM (Hidden Markov Model) method.

[0042] Specifically, the voice recognition unit 24 constructs an acoustic model of each word (a word model) by referring to the word dictionary and connecting acoustic models. The voice recognition unit 24 further connects several word models by referring to the grammar rules and, based on the word models connected in this way and the feature parameters, recognizes the voice input to the microphone 11 by the continuous-distribution HMM method. That is, the voice recognition unit 24 detects the sequence of word models with the highest score (likelihood) of observing the time-series feature parameters output by the voice feature analysis unit 22 (via the voice section detection unit 23), and outputs the word string corresponding to that sequence of word models as the voice recognition result.

[0043] In other words, for the word string corresponding to the connected word models, the voice recognition unit 24 accumulates the appearance probabilities of the feature parameters from the voice feature analysis unit 22, takes the accumulated value as a score, and outputs the word string with the highest score as the voice recognition result.
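The scoring rule described here, accumulate per-frame appearance probabilities and pick the best word string, can be illustrated with a deliberately simplified sketch. It is not an HMM decoder: it assumes the per-frame probabilities for each candidate word string have already been computed, and it accumulates them as a sum of logarithms, which is a common convention but not something the source specifies. The candidate strings and probability values are made up for illustration.

```python
import math


def accumulated_score(frame_probs):
    """Accumulate per-frame appearance probabilities as a sum of logs.

    A zero probability makes the whole candidate impossible (-inf).
    """
    return sum(math.log(p) if p > 0 else -math.inf for p in frame_probs)


def best_word_string(candidates):
    """Return the candidate word string whose accumulated score is highest.

    candidates maps each word string to its list of per-frame probabilities.
    """
    return max(candidates, key=lambda words: accumulated_score(candidates[words]))


# Hypothetical per-frame probabilities for two candidate commands.
candidates = {
    "walk": [0.9, 0.8, 0.7],
    "lie down": [0.5, 0.9, 0.6],
}
print(best_word_string(candidates))  # walk
```

Summing logarithms instead of multiplying raw probabilities avoids numerical underflow over long utterances while preserving the ranking of candidates.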

[0044] For example, the voice recognition unit 24 notifies an instinct/emotion management unit 26 and a behavior management unit 27 of commands such as "walk", "lie down", and "chase the ball", and of other recognized speech, as voice recognition information. In addition, when the voice recognition unit 24 recognizes a predetermined keyword, it notifies the melody generation unit 32 and other units of this.

[0045] The image signal captured by the camera 12 and the pressure detection signal detected by the touch sensor 13 are output to a sensor data processing unit 25. The sensor data processing unit 25 includes an image recognition unit that performs image recognition based on the image signal supplied from the camera 12 and a touch sensor input processing unit that processes the pressure detection signal supplied from the touch sensor 13 (neither is shown).

[0046] The image recognition unit performs image recognition using the image signal supplied from the camera 12. When, as a result of that processing, it detects for example "a red round object" or "a plane perpendicular to the ground and of at least a predetermined height", it outputs an image recognition result such as "there is a ball" or "there is a wall" to the instinct/emotion management unit 26 and the behavior management unit 27 as image recognition information.

[0047] The touch sensor input processing unit processes the pressure detection signal supplied from the touch sensor 13. When, as a result of that processing, it detects pressure that is at or above a predetermined threshold and of short duration, it recognizes that the robot was "hit (scolded)"; when it detects pressure that is below the predetermined threshold and of long duration, it recognizes that the robot was "stroked (praised)". It outputs the recognition result to the instinct/emotion management unit 26 and the behavior management unit 27 as state recognition information.
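The touch classification rule reduces to two conditions on pressure and duration. In the sketch below the numeric thresholds, the normalized pressure scale, and the name `classify_touch` are hypothetical; the source says only "a predetermined threshold", "short", and "long". Combinations the source does not mention (strong and long, weak and short) are returned as unrecognized.

```python
from typing import Optional


def classify_touch(pressure: float, duration_s: float,
                   pressure_threshold: float = 0.5,
                   short_limit_s: float = 0.3) -> Optional[str]:
    """Classify a touch event from its pressure and duration.

    Strong and brief -> "hit" (scolded); weak and sustained -> "stroked"
    (praised); anything else -> None.
    """
    is_strong = pressure >= pressure_threshold
    is_short = duration_s < short_limit_s
    if is_strong and is_short:
        return "hit"
    if not is_strong and not is_short:
        return "stroked"
    return None


print(classify_touch(0.9, 0.1))   # hit
print(classify_touch(0.2, 1.5))   # stroked
```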

[0048] The instinct/emotion management unit 26 manages parameters representing the instincts and emotions of the pet robot 1 and outputs the state of the instincts or the state of the emotions to the behavior management unit 27 at predetermined timing.

[0049] FIG. 4 is a diagram schematically showing an example of the functional configuration of the instinct/emotion management unit 26 of FIG. 3. As shown in FIG. 4, the instinct/emotion management unit 26 stores and manages an emotion model 51 that expresses the emotions of the pet robot 1 and an instinct model 52 that expresses its instincts.

[0050] The emotion model 51 represents the state (degree) of each emotion, for example "joy", "sadness", "anger", "surprise", "fear", and "disgust", by an emotion parameter within a predetermined range (for example, 0 to 100), and changes those values based on the outputs of the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and so on.
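A bounded emotion parameter of this kind can be sketched as a small class. The 0 to 100 range and the six emotion names come from the text; the initial value of 50, the class name `EmotionModel`, and the method name `adjust` are assumptions for illustration.

```python
EMOTIONS = ("joy", "sadness", "anger", "surprise", "fear", "disgust")
PARAM_MIN, PARAM_MAX = 0.0, 100.0   # example range given in the text


class EmotionModel:
    """Holds one bounded parameter per emotion."""

    def __init__(self, initial: float = 50.0):
        # The starting value is an assumption; the source does not give one.
        self.values = {name: initial for name in EMOTIONS}

    def adjust(self, name: str, delta: float) -> float:
        """Change one emotion parameter, keeping it inside the allowed range."""
        new_value = self.values[name] + delta
        self.values[name] = max(PARAM_MIN, min(PARAM_MAX, new_value))
        return self.values[name]


model = EmotionModel()
print(model.adjust("joy", 70.0))     # clipped at the upper bound: 100.0
print(model.adjust("anger", -80.0))  # clipped at the lower bound: 0.0
```

Clamping keeps repeated stimuli, such as many praises in a row, from pushing a parameter outside the range the rest of the system expects.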

[0051] In this example, the emotion model 51 is composed of an emotion unit 51A representing "joy", an emotion unit 51B representing "sadness", an emotion unit 51C representing "anger", an emotion unit 51D representing "surprise", an emotion unit 51E representing "fear", and an emotion unit 51F representing "disgust".

【００５２】本能モデル５２は、例えば、「食欲」、
「睡眠欲」、「運動欲」等の本能による欲求の状態（度
合い）を、所定の範囲の本能パラメータによってそれぞ
れ表し、音声認識部２４、およびセンサデータ処理部２
５からの出力や時間経過等に基づいて、その値を変化さ
せる。また、本能モデル５２は、行動履歴に基づいて
「運動欲」を表わすパラメータを高めたり、或いは、バ
ッテリー残量に基づいて「食欲」を表わすパラメータを
高めたりする。The instinct model 52 expresses the state (degree) of each instinctive desire, for example "appetite", "desire for sleep", and "desire for exercise", as an instinct parameter in a predetermined range, and changes these values based on the outputs of the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and the like. The instinct model 52 also raises the parameter representing "desire for exercise" based on the action history, or raises the parameter representing "appetite" based on the remaining battery level.

【００５３】この例においては、本能モデル５２は、
「運動欲」を表わす本能ユニット５２Ａ、「愛情欲」を
表わす本能ユニット５２Ｂ、「食欲」を表わす本能ユニ
ット５２Ｃ、「好奇心」を表わす本能ユニット５２Ｄ、
および「睡眠欲」を表わす本能ユニット５２Ｅから構成
されている。In this example, the instinct model 52 is composed of an instinct unit 52A representing "desire for exercise", an instinct unit 52B representing "desire for affection", an instinct unit 52C representing "appetite", an instinct unit 52D representing "curiosity", and an instinct unit 52E representing "desire for sleep".

【００５４】本能・感情管理部２６は、このような感情
ユニット５１Ａ乃至５１Ｆと本能ユニット５２Ａ乃至５
２Ｅのパラメータを変化させることにより、ペットロボ
ット１の感情と本能の状態を表現し、その変化をモデル
化している。By changing the parameters of the emotion units 51A to 51F and the instinct units 52A to 52E, the instinct / emotion management unit 26 expresses the emotional and instinctive state of the pet robot 1 and models how that state changes.

【００５５】また、感情ユニット５１Ａ乃至５１Ｆと本
能ユニット５２Ａ乃至５２Ｅのパラメータは、外部から
の入力だけでなく、図の矢印で示すように、それぞれの
ユニット同士が相互に影響しあうことによっても変化さ
れる。Further, the parameters of the emotion units 51A to 51F and the instinct units 52A to 52E are changed not only by external input but also, as indicated by the arrows in the figure, by the units influencing one another.

【００５６】例えば、「うれしさ」を表現する感情ユニ
ット５１Ａと「悲しさ」を表現する感情ユニット５１Ｂ
が相互抑制的に結合することにより、本能・感情管理部
２６は、ユーザにほめてもらったときには「うれしさ」
を表現する感情ユニット５１Ａのパラメータを大きくす
るとともに、「悲しさ」を表現する感情ユニット５１Ｂ
のパラメータを小さくするなどして、表現する感情を変
化させる。For example, because the emotion unit 51A expressing "joy" and the emotion unit 51B expressing "sadness" are coupled so as to inhibit each other, when the user praises the pet robot 1 the instinct / emotion management unit 26 increases the parameter of the emotion unit 51A expressing "joy" and decreases the parameter of the emotion unit 51B expressing "sadness", thereby changing the emotion that is expressed.
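
The mutually inhibiting coupling of the "joy" and "sadness" units can be sketched as follows. This is only an illustrative sketch: the class name `EmotionModel`, the `inhibits` table, the `inhibition` factor of 0.5, and the initial values are assumptions that do not appear in the specification. Only the six emotions and the example range of 0 to 100 come from the text.

```python
from dataclasses import dataclass, field

PARAM_MIN, PARAM_MAX = 0, 100  # example range given in the text


def clamp(value):
    """Keep a parameter inside the allowed range."""
    return max(PARAM_MIN, min(PARAM_MAX, value))


@dataclass
class EmotionModel:
    # One parameter per emotion unit 51A to 51F; initial values are arbitrary.
    units: dict = field(default_factory=lambda: {
        "joy": 50.0, "sadness": 50.0, "anger": 50.0,
        "surprise": 50.0, "fear": 50.0, "disgust": 50.0,
    })
    # Hypothetical inhibitory links: raising the key lowers the listed unit.
    inhibits: dict = field(default_factory=lambda: {
        "joy": "sadness", "sadness": "joy",
    })

    def stimulate(self, unit, amount, inhibition=0.5):
        """Raise one unit and lower its mutually inhibiting partner."""
        self.units[unit] = clamp(self.units[unit] + amount)
        partner = self.inhibits.get(unit)
        if partner is not None:
            self.units[partner] = clamp(self.units[partner] - inhibition * amount)


model = EmotionModel()
model.stimulate("joy", 20)  # e.g. the user praised the robot
print(model.units["joy"], model.units["sadness"])  # 70.0 40.0
```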

【００５７】また、感情モデル５１を構成する各ユニッ
ト同士、および本能モデル５２を構成する各ユニット同
士だけでなく、双方のモデルを超えて、それぞれのユニ
ットのパラメータが変化される。Further, the parameters change not only through interaction among the units of the emotion model 51 or among the units of the instinct model 52, but also through interaction across the two models.

【００５８】例えば、図に示すように、本能モデル５２
の「愛情欲」を表わす本能ユニット５２Ｂや、「食欲」
を表わす本能ユニット５２Ｃのパラメータの変化に応じ
て、感情モデル５１の「悲しさ」を表現する感情ユニッ
ト５１Ｂや「怒り」を表現する感情ユニット５１Ｃのパ
ラメータが変化される。For example, as shown in the figure, the parameters of the emotion unit 51B expressing "sadness" and the emotion unit 51C expressing "anger" in the emotion model 51 change in response to changes in the parameters of the instinct unit 52B representing "desire for affection" and the instinct unit 52C representing "appetite" in the instinct model 52.

【００５９】具体的には、「食欲」を表わす本能ユニッ
ト５２Ｃのパラメータが大きくなったとき、感情モデル
５１の「悲しさ」を表現する感情ユニット５１Ｂや、
「怒り」を表現する感情ユニット５１Ｃのパラメータが
大きくなる。Specifically, when the parameter of the instinct unit 52C representing "appetite" increases, the parameters of the emotion unit 51B expressing "sadness" and the emotion unit 51C expressing "anger" in the emotion model 51 also increase.
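
The influence of an instinct unit on emotion units across the two models might be sketched as below. The coupling weights in `CROSS_LINKS` and the function name `apply_instinct_change` are hypothetical. The specification states only that an increase in "appetite" increases "sadness" and "anger", not by how much.

```python
def clamp(value, low=0.0, high=100.0):
    """Keep a parameter inside the allowed range."""
    return max(low, min(high, value))


# Hypothetical coupling weights from instinct units to emotion units.
CROSS_LINKS = {
    "appetite": {"sadness": 0.3, "anger": 0.2},
    "affection": {"sadness": 0.2},
}


def apply_instinct_change(emotions, instinct_unit, delta):
    """Return new emotion parameters after an instinct parameter changes by delta."""
    updated = dict(emotions)
    for emotion, weight in CROSS_LINKS.get(instinct_unit, {}).items():
        updated[emotion] = clamp(updated[emotion] + weight * delta)
    return updated


emotions = {"joy": 50.0, "sadness": 20.0, "anger": 10.0}
hungrier = apply_instinct_change(emotions, "appetite", 30.0)
print(hungrier)  # sadness and anger rise, joy is unchanged
```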

【００６０】なお、より詳細には、感情モデル５１や本
能モデル５２だけでなく、成長モデルが本能・感情管理
部２６に用意され、その成長段階によって、感情モデル
５１や本能モデル５２の各ユニットのパラメータが変化
される。この成長モデルは、例えば、「幼年期」、「青
年期」、「熟年期」、「老年期」等の成長の状態（度合
い）を、所定の範囲の値によってそれぞれ表し、音声認
識部２４、およびセンサデータ処理部２５からの出力や
時間経過等に基づいて、その値を変化させる。More specifically, the instinct / emotion management unit 26 holds a growth model in addition to the emotion model 51 and the instinct model 52, and the parameters of the units of the emotion model 51 and the instinct model 52 are changed according to the growth stage. The growth model expresses each growth state (degree), for example "childhood", "adolescence", "middle age", and "old age", as a value in a predetermined range, and changes these values based on the outputs of the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and the like.

【００６１】本能・感情管理部２６は、感情モデル、本
能モデル等のパラメータで表される感情、本能等の状態
を、内部情報として行動管理部２７に出力する。The instinct / emotion management unit 26 outputs the states of emotion, instinct, and the like, as represented by the parameters of the emotion model, the instinct model, and so on, to the behavior management unit 27 as internal information.

【００６２】なお、本能・感情管理部２６には、音声認
識部２４、およびセンサデータ処理部２５から認識情報
が供給される他に、行動管理部２７から、ペットロボッ
ト１の現在、または過去の行動、具体的には、例えば、
「長時間歩いた」などの行動の内容を示す行動情報が供
給されるようになされている。そして、本能・感情管理
部２６は、同一の認識情報等が与えられた場合であって
も、行動情報により示されるペットロボット１の行動に
応じて、異なる内部情報を生成する。In addition to the recognition information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the instinct / emotion management unit 26 receives from the behavior management unit 27 behavior information indicating the current or past behavior of the pet robot 1, for example "walked for a long time". Even when given the same recognition information, the instinct / emotion management unit 26 generates different internal information depending on the behavior of the pet robot 1 indicated by the behavior information.

【００６３】例えば、ペットロボット１がユーザに挨拶
をし、ユーザに頭を撫でられた場合には、ユーザに挨拶
をしたという行動情報と、頭を撫でられたという認識情
報が本能・感情管理部２６に供給される。このとき、本
能・感情管理部２６においては、「うれしさ」を表す感
情ユニット５１Ａの値が増加される。For example, when the pet robot 1 greets the user and the user then strokes its head, the behavior information that it greeted the user and the recognition information that its head was stroked are supplied to the instinct / emotion management unit 26. In this case, the instinct / emotion management unit 26 increases the value of the emotion unit 51A representing "joy".

【００６４】図３の説明に戻り、行動管理部２７は、音
声認識部２４、およびセンサデータ処理部２５から供給
されてきた情報と、本能・感情管理部２６から供給され
てきた内部情報、および時間経過等に基づいて次の行動
を決定し、決定した行動を表わす情報をコマンド生成部
２９に通知する。Returning to the description of FIG. 3, the behavior management unit 27 determines the next action based on the information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the internal information supplied from the instinct / emotion management unit 26, the passage of time, and the like, and notifies the command generation unit 29 of information representing the determined action.

【００６５】図５は、図３の行動管理部２７の機能構成
例を示す模式図である。FIG. 5 is a schematic diagram showing a functional configuration example of the behavior management unit 27 of FIG.

【００６６】行動管理部２７は、行動モデルライブラリ
６１と行動選択部６２から構成されており、行動モデル
ライブラリ６１は、予め設定された条件（トリガ）に対
応させて、図に示すように、各種の行動モデルを有して
いる。The behavior management unit 27 is composed of a behavior model library 61 and a behavior selection unit 62. As shown in the figure, the behavior model library 61 holds various behavior models, each associated with a preset condition (trigger).

【００６７】図の例においては、行動モデルライブラリ
６１には、ボールを検出した場合にとる行動を示すボー
ル対応行動モデル６１Ａ、ボールを見失った場合などに
とる行動を示す自律検索行動モデル６１Ｂ、上述した感
情モデル５１の変化を検出した場合にとる行動を示す感
情表現行動モデル６１Ｃが用意されている。また、障害
物を検出した場合にとる行動を示す障害物回避行動モデ
ル６１Ｄ、転倒したことを検出した場合にとる行動を示
す転倒復帰行動モデル６１Ｅ、およびバッテリー残量が
少なくなった場合にとる行動を示すバッテリー管理行動
モデル６１Ｆが用意されている。In the example shown in the figure, the behavior model library 61 provides a ball response behavior model 61A indicating the action to take when a ball is detected, an autonomous search behavior model 61B indicating the action to take when, for example, the ball is lost from view, and an emotion expression behavior model 61C indicating the action to take when a change in the emotion model 51 described above is detected. It also provides an obstacle avoidance behavior model 61D indicating the action to take when an obstacle is detected, a fall recovery behavior model 61E indicating the action to take when a fall is detected, and a battery management behavior model 61F indicating the action to take when the remaining battery level is low.

【００６８】そして、行動選択部６２は、音声認識部２
４、センサデータ処理部２５等から供給される情報と、
本能・感情管理部２６から供給される内部情報、および
時間経過等を参照し、ペットロボット１が次にとる行動
を行動モデルライブラリ６１に用意されている行動モデ
ルから選択する。The behavior selection unit 62 then refers to the information supplied from the voice recognition unit 24, the sensor data processing unit 25, and the like, the internal information supplied from the instinct / emotion management unit 26, the passage of time, and the like, and selects the next action of the pet robot 1 from the behavior models provided in the behavior model library 61.

【００６９】なお、行動選択部６２は、現在のペットロ
ボット１の状態を表わすノードから、どの状態を表わす
ノードに遷移するのかを、それぞれのノード間を結ぶア
ークに設定されている遷移確率に基づいて決定する、図
６に示すような有限確率オートマトンを用いて選択す
る。The behavior selection unit 62 makes this selection using a finite probabilistic automaton such as that shown in FIG. 6, which determines the node to which the pet robot 1 transitions from the node representing its current state based on the transition probabilities set on the arcs connecting the nodes.

【００７０】図６に示す有限確率オートマトンは、例え
ば、現在の状態がノード０（NODE₀）である場合、確率P
₁でノード１（NODE₁）に遷移し、確率P₂でノード２（NO
DE₂）に遷移し、確率P_nでノードｎ（NODE_n）に遷移する
ことを示している。また、この有限確率オートマトン
は、確率P_n-1でノード０（NODE₀）に遷移すること、す
なわち、いずれのノードにも遷移しないことを示してい
る。The finite probabilistic automaton shown in FIG. 6 indicates, for example, that when the current state is node 0 (NODE₀), a transition is made to node 1 (NODE₁) with probability P₁, to node 2 (NODE₂) with probability P₂, and to node n (NODE_n) with probability P_n. It also indicates that a transition is made to node 0 (NODE₀) with probability P_n-1, that is, that no transition to another node takes place.
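
Choosing the next node of such an automaton amounts to a weighted random draw over the arcs leaving the current node. The following is a minimal sketch. The probability values and the function name `next_node` are assumptions for illustration, since FIG. 6 gives the probabilities only symbolically.

```python
import random


def next_node(transitions, rng=random):
    """Pick the next node from {node: probability}; probabilities should sum to 1."""
    nodes = list(transitions)
    weights = [transitions[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Hypothetical arc probabilities out of NODE_0; remaining in NODE_0 is an arc too.
arcs_from_node0 = {"NODE_0": 0.4, "NODE_1": 0.3, "NODE_2": 0.2, "NODE_n": 0.1}

rng = random.Random(0)  # seeded only to make the walk repeatable
walk = [next_node(arcs_from_node0, rng) for _ in range(5)]
print(walk)
```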

【００７１】そして、行動モデルライブラリ６１に規定
されるそれぞれの行動モデルは、複数の状態を表わすノ
ードから構成されており、それぞれのノードには、他の
ノードに遷移する確率が記述された状態遷移表が設定さ
れている。Each behavior model defined in the behavior model library 61 is composed of nodes representing a plurality of states, and each node has a state transition table that describes the probabilities of transition to other nodes.

【００７２】図７は、行動モデルライブラリ６１に規定
される、所定の行動モデルに属するノード１００（NODE
₁₀₀）の状態遷移表の例を示す図である。FIG. 7 is a diagram showing an example of the state transition table of node 100 (NODE₁₀₀), which belongs to a given behavior model defined in the behavior model library 61.

【００７３】図７においては、他のノード（または自ら
のノード）に遷移する条件としての入力イベントが「入
力イベント名」の欄に記載されており、遷移する条件と
して規定される「入力イベント名」についての、さらな
る条件が「データ名」、および「データ範囲」の欄に記
載されている。In FIG. 7, the input events that serve as conditions for a transition to another node (or to the same node) are listed in the "input event name" column, and further conditions on each "input event name" are listed in the "data name" and "data range" columns.

【００７４】図の例において、ID「１」が設定されてい
る情報は、「ボールを検出した」（入力イベント名「BA
LL」）ことに関する情報が行動管理部２７に通知され、
かつ、検出されたボールの大きさが「０乃至１０００」
の範囲であるとき、「３０パーセント」の確率でノード
１００からノード１２０に遷移することを示している。In the example of the figure, the entry with ID "1" indicates that when the behavior management unit 27 is notified that a ball has been detected (input event name "BALL") and the size of the detected ball is in the range "0 to 1000", a transition is made from node 100 to node 120 with a probability of "30 percent".

【００７５】また、ID「２」が設定されている情報は、
「頭を軽く叩かれた」（入力イベント名「PAT」）こと
に関する情報が行動管理部２７に通知されてきたとき、
「４０パーセント」の確率でノード１００からノード１
５０に遷移することを示している。The entry with ID "2" indicates that when the behavior management unit 27 is notified that the head has been tapped lightly (input event name "PAT"), a transition is made from node 100 to node 150 with a probability of "40 percent".

【００７６】さらに、ID「３」が設定されている情報
は、「頭を強く叩かれた」（入力イベント名「HIT」）
ことに関する情報が行動管理部２７に通知されてきたと
き、「２０パーセント」の確率でノード１００からノー
ド１５０に遷移することを示している。Furthermore, the entry with ID "3" indicates that when the behavior management unit 27 is notified that the head has been hit hard (input event name "HIT"), a transition is made from node 100 to node 150 with a probability of "20 percent".

【００７７】ID「４」が設定されている情報は、「障害
物を検出した」（入力イベント名「OBSTACLE」）ことに
関する情報が行動管理部２７に通知され、かつ、その障
害物までの距離が「０乃至１００」の範囲であることが
通知されてきたとき、「１００パーセント」の確率でノ
ード１００からノード１０００に遷移することを示して
いる。The entry with ID "4" indicates that when the behavior management unit 27 is notified that an obstacle has been detected (input event name "OBSTACLE") and that the distance to the obstacle is in the range "0 to 100", a transition is made from node 100 to node 1000 with a probability of "100 percent".

【００７８】また、この例においては、音声認識部２４
やセンサデータ処理部２５等からの入力がない場合であ
っても、感情モデル５１や本能モデル５２のパラメータ
に応じて他のノードに遷移するようになされている。In this example, even when there is no input from the voice recognition unit 24, the sensor data processing unit 25, or the like, a transition is made to another node according to the parameters of the emotion model 51 and the instinct model 52.

【００７９】例えば、ID５が設定されている情報は、感
情モデル５１の「うれしさ（JOY）」を表わす感情ユニ
ット５１Ａのパラメータが「５０乃至１００」の範囲に
あるとき、「５パーセント」の確率でノード１２０に遷
移し、「３０パーセント」の確率でノード１５０に遷移
することを示している。For example, the entry with ID "5" indicates that when the parameter of the emotion unit 51A representing "joy (JOY)" in the emotion model 51 is in the range "50 to 100", a transition is made to node 120 with a probability of "5 percent" and to node 150 with a probability of "30 percent".

【００８０】ID６が設定されている情報は、感情モデル
５１の「驚き（SURPRISE）」を表わす感情ユニット５１
Ｄのパラメータが「５０乃至１００」の範囲にあると
き、「１５パーセント」の確率でノード１０００に遷移
することを示しており、ID７が設定されている情報は、
感情モデル５１の「悲しみ（SUDNESS）」を表わす感情
ユニット５１Ｂのパラメータが「５０乃至１００」の範
囲にあるとき、「５０パーセント」の確率でノード１２
０に遷移することを示している。The entry with ID "6" indicates that when the parameter of the emotion unit 51D representing "surprise (SURPRISE)" in the emotion model 51 is in the range "50 to 100", a transition is made to node 1000 with a probability of "15 percent". The entry with ID "7" indicates that when the parameter of the emotion unit 51B representing "sadness (SUDNESS)" in the emotion model 51 is in the range "50 to 100", a transition is made to node 120 with a probability of "50 percent".

【００８１】また、図７の状態遷移表には、それぞれの
ノードに遷移したとき、そこで出力される行動が「出力
行動」の欄に記述されており、例えば、ノード１２０に
遷移したとき、そこで出力される行動は「ACTION１」で
あり、ノード１５０に遷移したとき、そこで出力される
行動は「ACTION２」であり、ノード１０００に遷移した
とき、そこで出力される行動は「MOVE BACK」（後退）
であるとされている。The state transition table of FIG. 7 also lists, in the "output action" column, the action that is output on transition to each node. For example, the action output on transition to node 120 is "ACTION1", the action output on transition to node 150 is "ACTION2", and the action output on transition to node 1000 is "MOVE BACK" (retreat).
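
The event-driven rows of the node 100 table (IDs 1 to 4) and the output actions could be held in a structure like the one below. This is a sketch under stated assumptions: the data names `SIZE` and `DISTANCE`, the function name `step`, and the rule that the robot stays at node 100 when no transition fires are not given in the text. The event names, ranges, probabilities, destination nodes, and output actions are taken from the description of FIG. 7.

```python
import random

# Rows of the node 100 table as described in the text. The data names
# "SIZE" and "DISTANCE" are assumed labels; the text gives only the ranges.
NODE_100_TABLE = [
    {"id": 1, "event": "BALL", "data": "SIZE", "range": (0, 1000), "to": {120: 0.30}},
    {"id": 2, "event": "PAT", "data": None, "range": None, "to": {150: 0.40}},
    {"id": 3, "event": "HIT", "data": None, "range": None, "to": {150: 0.20}},
    {"id": 4, "event": "OBSTACLE", "data": "DISTANCE", "range": (0, 100), "to": {1000: 1.00}},
]
OUTPUT_ACTION = {120: "ACTION1", 150: "ACTION2", 1000: "MOVE BACK"}


def step(event, value=None, current=100, rng=random):
    """Return (next_node, action) for one input event received at node 100."""
    for row in NODE_100_TABLE:
        if row["event"] != event:
            continue
        if row["range"] is not None:
            low, high = row["range"]
            if value is None or not low <= value <= high:
                continue  # the extra data condition is not satisfied
        draw = rng.random()
        cumulative = 0.0
        for node, probability in row["to"].items():
            cumulative += probability
            if draw < cumulative:
                return node, OUTPUT_ACTION[node]
    # Assumption: with the remaining probability the robot stays where it is.
    return current, None


print(step("OBSTACLE", value=50))  # (1000, 'MOVE BACK')
```

Rows 5 to 7, which fire on emotion parameter ranges rather than on input events, could be added in the same shape by treating the emotion name as the event and the parameter value as the data.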

【００８２】行動選択部６２は、このような状態遷移表
を参照することで状態を遷移させ、そのノードに設定さ
れている行動を指示する情報を、本能・感情管理部２
６、コマンド生成部２９等に出力する。The behavior selection unit 62 changes state by referring to such a state transition table, and outputs information specifying the action set for the resulting node to the instinct / emotion management unit 26, the command generation unit 29, and the like.

【００８３】そして、例えば、行動管理部２７により管
理され、遷移したノードに規定されている行動が「こん
にちはを話す」である場合、コマンド生成部２９は、行
動管理部２７からの指示に基づいて発話コマンドを生成
し、音声合成部３４に出力する。For example, when the action defined for the node to which the behavior management unit 27 has transitioned is "say hello", the command generation unit 29 generates an utterance command based on the instruction from the behavior management unit 27 and outputs it to the voice synthesis unit 34.

【００８４】また、遷移したノードに規定されている行
動が「後退する」である場合、コマンド生成部２９は、
行動管理部２７からの指示に基づいて、それを実行させ
るコマンドを生成し、制御部３１に出力する。また、コ
マンド生成部２９は、ペットロボット１の「目」の位置
に設けられているLED４１を点灯、消灯または点滅させ
るとき、それを指示するコマンドを生成し、LED制御部
３０に出力する。When the action defined for the node after the transition is "move back", the command generation unit 29 generates a command for executing it based on the instruction from the behavior management unit 27 and outputs the command to the control unit 31. When the LED 41 provided at the position of the "eyes" of the pet robot 1 is to be turned on, turned off, or blinked, the command generation unit 29 generates a command to that effect and outputs it to the LED control unit 30.

【００８５】LED制御部３０は、コマンド生成部２９か
ら供給されてきたコマンドに基づいてLED４１を制御す
る。The LED control section 30 controls the LED 41 based on the command supplied from the command generation section 29.

【００８６】制御部３１は、コマンド生成部２９から供
給されるコマンドに基づいて、アクチュエータ３ＡＡ₁
乃至５Ａ₂を制御し、ペットロボット１の姿勢を現在の
姿勢から次の姿勢に遷移させる。The control unit 31 controls the actuators 3AA₁ to 5A₂ based on the command supplied from the command generation unit 29, and moves the pet robot 1 from its current posture to the next posture.

【００８７】ここで、現在の姿勢から次に遷移可能な姿
勢は、例えば、胴体や手や足の形状、重さ、各部の結合
状態のようなペットロボット１の物理的形状と、関節が
曲がる方向や角度のようなアクチュエータ３ＡＡ₁乃至
５Ａ₂の機構とによって決定される。The postures that can be reached next from the current posture are determined by the physical form of the pet robot 1, such as the shape and weight of the torso, arms, and legs and the way the parts are connected, and by the mechanism of the actuators 3AA₁ to 5A₂, such as the directions and angles in which the joints bend.

【００８８】なお、姿勢には、現在の姿勢から直接遷移
可能な姿勢と、直接遷移できない姿勢がある。例えば、
４本足のペットロボット１は、手足を大きく投げ出して
寝転んでいる状態から、伏せた状態へ直接遷移すること
はできるが、立った状態へ直接遷移することはできず、
一旦、手足を胴体近くに引き寄せて伏せた姿勢になり、
それから立ち上がるという２段階の動作が必要である。
また、安全に実行できない姿勢も存在する。例えば、ペ
ットロボット１は、４本足で立っている姿勢から、両前
足を挙げてバンザイをしようとすると、簡単に転倒して
しまう。Some postures can be reached directly from the current posture and others cannot. For example, the four-legged pet robot 1 can move directly from lying sprawled with its limbs stretched out to lying prone, but it cannot move directly to standing. It must first draw its limbs in close to its torso to take the prone posture and then stand up, which is a two-step motion. There are also postures that cannot be executed safely. For example, if the pet robot 1 tries to raise both front legs in a "banzai" pose while standing on four legs, it easily falls over.

【００８９】このため、制御部３１は、直接遷移可能な
姿勢をあらかじめ登録しておき、コマンド生成部２９か
ら供給されるコマンドが、直接遷移可能な姿勢を示す場
合には、それに基づいて対応するアクチュエータを駆動
する。For this reason, the control unit 31 registers in advance the postures that can be reached directly, and when the command supplied from the command generation unit 29 specifies such a posture, it drives the corresponding actuators accordingly.

【００９０】一方、コマンド生成部２９から供給される
コマンドが、直接遷移不可能な姿勢を示す場合には、制
御部３１は、遷移可能な他の姿勢に一旦遷移した後に、
目的の姿勢まで遷移させるように、対応するアクチュエ
ータを駆動する。これによりペットロボット１が、遷移
不可能な姿勢を無理に実行しようとする事態や、転倒す
るような事態を回避することができる。On the other hand, when the command supplied from the command generation unit 29 specifies a posture that cannot be reached directly, the control unit 31 drives the corresponding actuators so that the pet robot 1 first moves to another reachable posture and then to the target posture. This prevents the pet robot 1 from forcing itself into an unreachable posture or falling over.
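
One way to find the intermediate postures is to treat the registered direct transitions as a directed graph and search it. The sketch below uses breadth-first search, which is an assumption, as the specification does not name a search method. The posture names and the contents of `DIRECT` are likewise hypothetical, apart from the sprawled, prone, and standing example taken from the text.

```python
from collections import deque

# Hypothetical registry of postures that can be reached directly.
DIRECT = {
    "lying_sprawled": ["lying_prone"],
    "lying_prone": ["lying_sprawled", "sitting", "standing"],
    "sitting": ["lying_prone", "standing"],
    "standing": ["lying_prone", "sitting", "walking"],
    "walking": ["standing"],
}


def plan_postures(current, target):
    """Breadth-first search for the shortest chain of direct transitions."""
    queue = deque([[current]])
    visited = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in DIRECT.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the target is not reachable through registered transitions


print(plan_postures("lying_sprawled", "standing"))
# ['lying_sprawled', 'lying_prone', 'standing']
```

Because an unsafe posture such as the "banzai" pose is simply absent from the registry, a request for it returns `None` instead of a motion plan.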

【００９１】旋律生成部３２は、音声認識部２４からキ
ーワードが認識されたことが通知されてきたとき、その
キーワードのピッチ周波数を分析することを音声ピッチ
分析部２８に指示する。そして、旋律生成部３２は、キ
ーワードの音声が分析され、ピッチ周波数に基づいて選
択された音階が音声ピッチ分析部２８から通知されてき
たとき、旋律データ記憶部３３から所定の旋律データを
読み出し、通知されてきた音階に基づいて、その旋律デ
ータを変換する。When the melody generating section 32 is notified by the voice recognizing section 24 that the keyword has been recognized, the melody generating section 32 instructs the voice pitch analyzing section 28 to analyze the pitch frequency of the keyword. Then, the melody generating unit 32 reads the predetermined melody data from the melody data storage unit 33 when the voice of the keyword is analyzed and the scale selected based on the pitch frequency is notified from the voice pitch analyzing unit 28. The melody data is converted based on the notified scale.

【００９２】すなわち、旋律データ記憶部３３には、ペ
ットロボット１から出力される音（メロディ）の旋律デ
ータが、例えば、MIDI形式で複数用意されている。That is, in the melody data storage unit 33, a plurality of melody data of sounds (melody) output from the pet robot 1 are prepared, for example, in the MIDI format.

【００９３】旋律生成部３２は、例えば、旋律データの
各音階に、音声ピッチ分析部２８から供給されてきた音
階で置換するなどの処理を施し、新たな旋律データを生
成する。旋律生成部３２により生成された旋律データ
は、音声合成部３４に出力される。The melody generating section 32 performs processing such as replacing each scale of the melody data with the scale supplied from the voice pitch analyzing section 28 to generate new melody data. The melody data generated by the melody generator 32 is output to the voice synthesizer 34.

【００９４】音声ピッチ分析部２８は、旋律生成部３２
から指示されたとき、AD変換部２１から供給されてきた
キーワードの入力音声を分析し、個々の時刻のピッチ周
波数を算出する。例えば、音声ピッチ分析部２８は、ピ
ッチ周波数に対応する時間シフト量を変量とし、自己相
関係数が最大の値をとる時間シフト量に基づいて、入力
音声の個々の時刻におけるピッチ周波数を算出する。例
えば、音声ピッチ分析部２８による入力音声のサンプリ
ング周波数が１６ｋHzであり、自己相関係数が最大の値
をとる時間シフト量が１１８である場合、音声ピッチ分
析部２８は、ピッチ周波数ｆ（Hz）をｆ_PITCH＝１６０
００／１１８＝１３５．５９Hzと算出する。When instructed by the melody generation unit 32, the voice pitch analysis unit 28 analyzes the input voice of the keyword supplied from the AD conversion unit 21 and calculates the pitch frequency at each point in time. For example, the voice pitch analysis unit 28 treats the time shift corresponding to the pitch frequency as a variable, and calculates the pitch frequency of the input voice at each point in time from the time shift at which the autocorrelation coefficient is greatest. For example, when the sampling frequency of the input voice is 16 kHz and the autocorrelation coefficient is greatest at a time shift of 118 samples, the voice pitch analysis unit 28 calculates the pitch frequency f (Hz) as f_PITCH = 16000 / 118 = 135.59 Hz.

【００９５】そして、音声ピッチ分析部２８は、算出し
たピッチ周波数に基づいて、音階を選択するための所定
のピッチ周波数を抽出し、抽出したピッチ周波数に対応
する音階を選択する。音声ピッチ分析部２８は、例え
ば、所定の期間の平均値からなるピッチ周波数を複数抽
出し、それに基づいて複数の音階を選択する。音声ピッ
チ分析部２８により選択された音階は、旋律生成部３２
に通知され、旋律データを変換するために利用される。The voice pitch analysis unit 28 then extracts, from the calculated pitch frequencies, the specific pitch frequencies to be used for selecting scales, and selects the scale corresponding to each extracted pitch frequency. For example, the voice pitch analysis unit 28 extracts a plurality of pitch frequencies, each an average over a predetermined period, and selects a plurality of scales based on them. The scales selected by the voice pitch analysis unit 28 are notified to the melody generation unit 32 and used to convert the melody data.

【００９６】音声合成部３４は、コマンド生成部２９か
ら所定の発話コマンドと、その発話の内容を示すテキス
トデータが供給されてきたとき、内部に有する音声合成
用辞書を参照し、そのテキストデータに対応する、例え
ば、WAVフォーマットなどによる音声データを生成す
る。When a predetermined utterance command and text data indicating the content of the utterance are supplied from the command generation unit 29, the voice synthesis unit 34 refers to its internal voice synthesis dictionary and generates voice data corresponding to the text data, for example in WAV format.

【００９７】より詳細には、この音声合成用辞書には、
各単語の品詞情報や、読み、アクセント等の情報が記述
された単語辞書、その単語辞書に記述された単語につい
て、単語連鎖に関する制約等の生成用文法規則、音声情
報としての音素片データが記憶された音声情報辞書が格
納されている。More specifically, the voice synthesis dictionary stores a word dictionary that describes part-of-speech information, readings, accents, and the like for each word, grammar rules for generation, such as constraints on word chains for the words described in the word dictionary, and a voice information dictionary that stores phoneme piece data as voice information.

【００９８】音声合成部３４は、単語辞書、および生成
用文法規則に基づいて、入力されるテキストの形態素解
析や、構文解析等のテキスト解析（言語解析）を行い、
音声合成に必要な情報を抽出する。音声合成に必要な情
報としては、例えば、ポーズの位置や、アクセント、イ
ントネーション、パワー等を制御するための韻律情報、
各単語の発音を表す音韻情報などがある。Based on the word dictionary and the grammar rules for generation, the voice synthesis unit 34 performs text analysis (language analysis) of the input text, such as morphological analysis and syntactic analysis, and extracts the information needed for speech synthesis. This information includes, for example, prosodic information for controlling pause positions, accent, intonation, power, and the like, and phonological information representing the pronunciation of each word.

【００９９】音声合成部３４は、抽出した各種の情報に
基づいて、音声情報辞書を参照しながら、必要な音素片
データを接続し、さらに、音素片データの波形を加工す
ることによって、ポーズ、アクセント、イントネーショ
ン等を適切に付加し、供給されてきたテキストデータに
対応する音声データを生成する。Based on the extracted information, the voice synthesis unit 34 concatenates the necessary phoneme piece data while referring to the voice information dictionary, and processes the waveform of the phoneme piece data to add pauses, accent, intonation, and the like appropriately, thereby generating voice data corresponding to the supplied text data.

【０１００】音声合成部３４により生成された音声デー
タは、DA変換部３５においてディジタルアナログ変換さ
れ、スピーカ１４から出力される。The voice data generated by the voice synthesizer 34 is digital-analog converted in the DA converter 35 and output from the speaker 14.

【０１０１】また、音声合成部３４は、旋律生成部３２
から旋律データが供給されてきたとき、それを再生し、
スピーカ１４から出力させる。When melody data is supplied from the melody generation unit 32, the voice synthesis unit 34 also plays it back and outputs it from the speaker 14.

【０１０２】次に、以上のような構成を有するペットロ
ボット１の動作について説明する。Next, the operation of the pet robot 1 having the above configuration will be described.

【０１０３】始めに、図８のフローチャートを参照し
て、入力音声に基づいて選択した音階を利用して旋律デ
ータを生成し、それを再生するペットロボット１の処理
について説明する。First, the processing of the pet robot 1 for generating melody data using the scale selected based on the input voice and reproducing it will be described with reference to the flowchart of FIG.

【０１０４】ステップＳ１において、音声認識部２４
は、音声認識の結果、キーワードを検出したか否かを判
定し、キーワードを検出したと判定するまで待機する。
例えば、ユーザにより発話された音声は、上述したよう
にマイクロフォン１１により集音され、各処理が施され
た後、音声認識部２４に供給されている。In step S1, the voice recognition unit 24 determines whether a keyword has been detected as a result of voice recognition, and waits until it determines that one has been detected. As described above, the voice uttered by the user is collected by the microphone 11, subjected to the respective processing, and then supplied to the voice recognition unit 24.

【０１０５】音声認識部２４は、複数のキーワードを記
憶しており、それに基づいて、入力されてきた音声がキ
ーワードであるか否かを判定する。このキーワードは、
例えば、周波数変化が比較的少なく、ピッチ周波数を抽
出しやすい母音（例えば、「あ」の音）からなるものと
してもよい。The voice recognition unit 24 stores a plurality of keywords and, based on them, determines whether the input voice is a keyword. A keyword may, for example, consist of vowels (for example, the sound "a") whose frequency varies relatively little and from which the pitch frequency is easy to extract.

【０１０６】そして、音声認識部２４は、ステップＳ１
において、キーワードを検出したと判定した場合、ステ
ップＳ２に進み、それを行動管理部２７、および旋律生
成部３２に通知する。When the voice recognition unit 24 determines in step S1 that a keyword has been detected, the process proceeds to step S2, where it notifies the behavior management unit 27 and the melody generation unit 32 of the detection.

【０１０７】ステップＳ３において、旋律生成部３２
は、音声ピッチ分析部２８に対して、検出されたキーワ
ードの分析を行うことを指示する。In step S3, the melody generation unit 32 instructs the voice pitch analysis unit 28 to analyze the detected keyword.

【０１０８】音声ピッチ分析部２８は、ステップＳ４に
おいて、旋律生成部３２からの指示に応じて、AD変換部
２１から供給されるキーワードの波形の分析を行い、ピ
ッチ周波数を抽出する。In step S4, the voice pitch analysis section 28 analyzes the waveform of the keyword supplied from the AD conversion section 21 according to the instruction from the melody generation section 32, and extracts the pitch frequency.

【０１０９】図９は、ピッチ周波数を抽出する音声ピッ
チ分析部２８の処理を説明する図である。なお、図９に
おいては、キーワードとして「ららら」が検出された場
合の例とされている。FIG. 9 is a diagram for explaining the processing by which the voice pitch analysis unit 28 extracts pitch frequencies. FIG. 9 shows an example in which "la la la" is detected as the keyword.

【０１１０】そして、図９の上方に示す図は、縦軸を音
声の振幅、横軸を時刻とするキーワード「ららら」の波
形を表わしている。また、図９の下方に示す図の縦軸は
キーワード「ららら」のピッチ周波数を表わしており、
横軸は時刻を表わしている。The upper part of FIG. 9 shows the waveform of the keyword "la la la", with the vertical axis representing the amplitude of the voice and the horizontal axis representing time. In the lower part of FIG. 9, the vertical axis represents the pitch frequency of the keyword "la la la" and the horizontal axis represents time.

【０１１１】音声ピッチ分析部２８は、図９の上方に示
すようなキーワード「ららら」の波形が供給されてきた
とき、それぞれの時刻のピッチ周波数を、上述したよう
に時間シフト量とサンプリング周波数に基づいて算出す
る。When a waveform of the keyword "la la la" such as that shown in the upper part of FIG. 9 is supplied, the voice pitch analysis unit 28 calculates the pitch frequency at each point in time from the time shift and the sampling frequency, as described above.

【０１１２】そして、音声ピッチ分析部２８は、例え
ば、それぞれの「ら」の音の音声区間（ｔ₁，ｔ₂，
ｔ₃）を３等分し、３等分した音声区間のうち、中央の
区間の平均のピッチ周波数をそれぞれ抽出する。この音
声区間に関する情報は、音声区間検出部２３により検出
され、供給されてきたものである。The voice pitch analysis unit 28 then, for example, divides the voice section of each "la" sound (t₁, t₂, t₃) into three equal parts and extracts the average pitch frequency of the central part of each. The information on these voice sections is detected and supplied by the voice section detection unit 23.

【０１１３】図の例においては、音声区間ｔ₁が音声区
間ｔ₁₁，ｔ₁₂，ｔ₁₃とそれぞれ３等分され、同様に、音
声区間ｔ₂が音声区間ｔ₂₁，ｔ₂₂，ｔ₂₃とされ、音声区
間ｔ₃が音声区間ｔ₃₁，ｔ₃₂，ｔ₃₃とされている。In the example of the figure, the voice section t₁ is divided into three equal voice sections t₁₁, t₁₂, and t₁₃. Similarly, the voice section t₂ is divided into voice sections t₂₁, t₂₂, and t₂₃, and the voice section t₃ into voice sections t₃₁, t₃₂, and t₃₃.

【０１１４】そして、音声区間ｔ₁₂の区間におけるピッ
チ周波数の平均値が１３１とされ、音声区間ｔ₂₂の区間
におけるピッチ周波数の平均値が１４８とされ、音声区
間ｔ ₃₂の区間におけるピッチ周波数の平均値が１６３と
されている。The average pitch frequency is 131 Hz in the voice section t₁₂, 148 Hz in the voice section t₂₂, and 163 Hz in the voice section t₃₂.

【０１１５】なお、このように、キーワードのそれぞれ
の音の中央の区間におけるピッチ周波数の平均値を算出
し、それを抽出するだけでなく、様々な方法によりピッ
チ周波数を抽出することもできる。ピッチ周波数の抽出
方法としては、例えば、キーワードのそれぞれの音の平
均のピッチ周波数を抽出するようにしてもよいし、所定
の区間における最大のピッチ周波数を抽出するようにし
てもよい。また、それぞれの音の中央の時刻におけるピ
ッチ周波数を抽出するようにしてもよい。As described above, not only the average value of the pitch frequencies in the central section of each sound of the keyword is calculated and extracted, but also the pitch frequency can be extracted by various methods. As a method of extracting the pitch frequency, for example, the average pitch frequency of each sound of the keyword may be extracted, or the maximum pitch frequency in a predetermined section may be extracted. Alternatively, the pitch frequency at the center time of each sound may be extracted.
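
The extraction strategies described above (average of the central third, average of the whole sound, maximum, and value at the central time) can be compared side by side. In this sketch the function names and the pitch values in `section_t1` are hypothetical. The values are chosen only so that the central third averages 131 Hz, as in the FIG. 9 example.

```python
from statistics import mean


def middle_third_mean(track):
    """Mean pitch over the central third of one voice section."""
    third = len(track) // 3
    return mean(track[third:len(track) - third])


def whole_mean(track):
    """Mean pitch over the whole voice section."""
    return mean(track)


def peak(track):
    """Maximum pitch within the section."""
    return max(track)


def centre_value(track):
    """Pitch at the central time of the section."""
    return track[len(track) // 2]


# Hypothetical pitch values (Hz) for one "la" section, one value per frame.
section_t1 = [120.0, 125.0, 129.0, 130.0, 131.0, 132.0, 128.0, 124.0, 118.0]

for extract in (middle_third_mean, whole_mean, peak, centre_value):
    print(extract.__name__, round(extract(section_t1), 1))
```

Averaging over the central third ignores the onset and release of each sound, where the pitch is typically least stable.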

【０１１６】そして、音声ピッチ分析部２８は、ステッ
プＳ５において、このようにして抽出したピッチ周波数
に基づいて所定の音階を選択する。Then, in step S5, the voice pitch analysis section 28 selects a predetermined scale based on the pitch frequency thus extracted.

【０１１７】図１０は、音声ピッチ分析部２８が管理す
る、ピッチ周波数と音階の対応例を示す図である。図１
０の対応例は、「Ａ４」を「４４０Hz」の音としたもの
であり、「C１」から「B７」までの音階と、それぞれの
周波数が示されている。FIG. 10 is a diagram showing an example of the correspondence between pitch frequencies and scales managed by the voice pitch analysis unit 28. The correspondence in FIG. 10 takes "A4" as the "440 Hz" tone and shows the scales from "C1" to "B7" with their respective frequencies.

【０１１８】従って、音声ピッチ分析部２８は、例え
ば、図９を参照して説明したように、「ららら」のキー
ワードから「１３１Hz，１４８Hz，１６３Hz」のピッチ
周波数を抽出した場合、それぞれのピッチ周波数に対応
する「Ｃ３，Ｄ３，Ｅ３」の音階を選択する。すなわ
ち、音声ピッチ分析部２８は、抽出したピッチ周波数
に、最も近い周波数の音階を選択する。Accordingly, when the voice pitch analysis unit 28 extracts the pitch frequencies "131 Hz, 148 Hz, 163 Hz" from the keyword "la la la" as described with reference to FIG. 9, it selects the corresponding scales "C3, D3, E3". That is, for each extracted pitch frequency, the voice pitch analysis unit 28 selects the scale whose frequency is closest.
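
Selecting the scale with the closest frequency can be sketched with a lookup table from C1 to B7. The table below assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI-style note numbering. The text fixes only A4 = 440 Hz and the C1 to B7 range, so the exact frequencies in FIG. 10 may differ slightly. The names `build_scale_table` and `nearest_note` are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_scale_table(a4_hz=440.0):
    """Map note names C1..B7 to equal-tempered frequencies with A4 = a4_hz."""
    table = {}
    for midi in range(24, 108):  # MIDI 24 = C1, MIDI 107 = B7
        name = f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
        table[name] = a4_hz * 2 ** ((midi - 69) / 12)
    return table


SCALE_TABLE = build_scale_table()


def nearest_note(pitch_hz):
    """Return the note whose frequency is closest to pitch_hz."""
    return min(SCALE_TABLE, key=lambda name: abs(SCALE_TABLE[name] - pitch_hz))


print([nearest_note(f) for f in (131, 148, 163)])  # ['C3', 'D3', 'E3']
```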

【０１１９】なお、ペットロボット１は、図１０に示し
た周波数のうち、認識の容易さから、例えば、８０Hz乃
至４００Hzの範囲のピッチ周波数を抽出するようになさ
れている。Of the frequencies shown in FIG. 10, the pet robot 1 extracts pitch frequencies in the range of, for example, 80 Hz to 400 Hz, because these are easy to recognize.

【０１２０】このようにして音声ピッチ分析部２８によ
り選択された音階は、旋律生成部３２に通知される。The scale selected by the voice pitch analysis section 28 in this manner is notified to the melody generation section 32.

【０１２１】一方、行動管理部２７は、ステップＳ６に
おいて、音声認識部２４からキーワードが検出されたこ
とが通知されてきたとき、旋律生成部３２に対して、旋
律データの生成を指示する。また、旋律生成部３２に対
しては、その指示とともに、変換する旋律データの識別
情報も通知されている。Meanwhile, in step S6, when the behavior management unit 27 is notified by the voice recognition unit 24 that a keyword has been detected, it instructs the melody generation unit 32 to generate melody data. Together with this instruction, the melody generation unit 32 is also notified of identification information of the melody data to be converted.

[0122] In step S7, the melody generation unit 32 reads the melody data corresponding to the identification information notified by the behavior management unit 27 from the melody data storage unit 33, and generates new melody data based on the notes supplied from the voice pitch analysis unit 28.

[0123] FIG. 11 shows an example of melody data stored in the melody data storage unit 33, represented on a musical score. Notes 1 to 9 of this melody data 1-1 are "C3, C3, E3, G3, C3, C3, E3, G3, C3 (+12)", respectively. Note 9 is the note 12 keys above note 1 ("C3") on, for example, a piano keyboard. In this way, when the number of kinds of notes constituting the melody data is equal to or greater than the number of notes extracted from the keyword, certain notes of the melody data are expressed relative to another note.

[0124] To generate new melody data, the melody generation unit 32 first replaces each note with a predetermined character, as shown in FIG. 12. This step is introduced for convenience of explanation; in practice, melody data 1-1 may be converted directly into melody data 1-3 (FIG. 13).

[0125] For example, the melody generation unit 32 replaces "C3" in melody data 1-1 with "X", "E3" with "Y", and "G3" with "Z". As a result, melody data 1-1 becomes "X, X, Y, Z, X, X, Y, Z, X (+12)", as shown in melody data 1-2.

[0126] The melody generation unit 32 then associates the notes selected by the voice pitch analysis unit 28 with the characters of melody data 1-2. FIG. 13 illustrates the process in which the notes selected by the voice pitch analysis unit 28 are associated and new melody data is generated.

[0127] For example, when the notes "C3, D3, E3" have been selected by the voice pitch analysis unit 28 based on the keyword "la la la" as described above, the melody generation unit 32 associates "C3, D3, E3" with "X, Y, Z" of melody data 1-2, respectively. The melody generation unit 32 then generates melody data 1-3 consisting of the note sequence "C3, C3, D3, E3, C3, C3, D3, E3, C3 (+12)", as shown in the figure. That is, at this point, the melody data 1-1 prepared in advance has been converted into melody data 1-3 based on the pitch frequencies extracted from the keyword "la la la" (notes 3 and 4, and notes 7 and 8, have each been converted to other notes).
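The placeholder substitution can be sketched in a few lines. This is only an illustration under stated assumptions: the `(base note, key offset)` tuple representation and the function name `convert_melody` are hypothetical, and the distinct notes of the stored melody are assumed to be matched to the selected notes in order of first appearance.

```python
from typing import List, Tuple

Note = Tuple[str, int]  # (base note, offset in keys); ("C3", 12) stands for "C3 (+12)"

MELODY_1_1: List[Note] = [
    ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
    ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12),
]

def convert_melody(melody: List[Note], selected: List[str]) -> List[Note]:
    """Map each distinct base note of the melody onto a selected note."""
    # Distinct base notes in order of first appearance play the role of X, Y, Z.
    placeholders = list(dict.fromkeys(base for base, _ in melody))
    if len(placeholders) > len(selected):
        raise ValueError("not enough selected notes for this melody")
    mapping = dict(zip(placeholders, selected))
    # Relative offsets such as (+12) are carried over unchanged.
    return [(mapping[base], offset) for base, offset in melody]

melody_1_3 = convert_melody(MELODY_1_1, ["C3", "D3", "E3"])
print(melody_1_3)
```

Keeping the offset separate from the base note mirrors the "(+12)" notation: the ninth note follows whatever note "X" is mapped to.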

[0128] Furthermore, when the melody data generated as described above is relatively low in pitch, the melody generation unit 32 shifts each note of the generated melody data upward by a predetermined number of octaves to generate new melody data. Naturally, when the generated melody data is relatively high in pitch, it may instead be lowered by several octaves.

[0129] FIG. 14 illustrates the processing of the melody generation unit 32 that generates new melody data by shifting each note of the melody data by, for example, three octaves.

[0130] As shown in the figure, each note of melody data 1-3 has been shifted upward by three octaves, and melody data 1-4 consisting of the note sequence "C6, C6, D6, E6, C6, C6, D6, E6, C6 (+12)" has been generated. That is, melody data 1-4 has been newly generated from the melody data 1-1 prepared in advance.
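As a rough sketch of the octave shift, the helper below rewrites the octave digit of a note name. The function name `shift_octaves` and the note-name format (letter, optional "#", octave number) are assumptions for illustration; relative notes such as "C3 (+12)" would keep their offset and are omitted here.

```python
import re

def shift_octaves(note: str, octaves: int) -> str:
    """Shift a note name such as 'C3' or 'D#6' by a whole number of octaves."""
    match = re.fullmatch(r"([A-G]#?)(-?\d+)", note)
    if match is None:
        raise ValueError(f"unrecognized note name: {note!r}")
    letter, octave = match.groups()
    return f"{letter}{int(octave) + octaves}"

melody_1_3 = ["C3", "C3", "D3", "E3", "C3", "C3", "D3", "E3"]
melody_1_4 = [shift_octaves(n, 3) for n in melody_1_3]
print(melody_1_4)  # ['C6', 'C6', 'D6', 'E6', 'C6', 'C6', 'D6', 'E6']
```

A negative argument lowers the melody, which covers the case where the generated melody data is relatively high in pitch.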

[0131] The melody data generated by the melody generation unit 32 is then output to the voice synthesis unit 34.

[0132] In step S8, the voice synthesis unit 34 reproduces the melody data supplied from the melody generation unit 32 and outputs it from the speaker 14. The duration of each output note may follow the melody data stored in advance in the melody data storage unit 33, or may be converted based on the pitch frequency extracted as described above, the time at which the pitch frequency was detected, the extent of the detection interval, or the like.

[0133] FIG. 15 illustrates other melody data stored in the melody data storage unit 33 and its conversion processing.

[0134] In FIG. 15, the note sequence of melody data 2-1 is "C3, E3, G3, E3, G3, C3 (-5), E3, G3, E3, G3". When the notes "C3, D3, E3" are selected from the keyword "la la la" as described above, "C3, E3, G3" of melody data 2-1 are converted to "C3, D3, E3", respectively (the processing indicated by the arrow), yielding melody data 2-2 consisting of the note sequence "C3, D3, E3, D3, E3, C3 (-5), D3, E3, D3, E3". That is, at this point, notes 2 to 5 and notes 7 to 10 of melody data 2-1 have each been converted to other notes.

[0135] Each note of melody data 2-2 is then shifted upward by three octaves by the melody generation unit 32 (the processing indicated by the arrow), and melody data 2-3 consisting of the note sequence "C6, D6, E6, D6, E6, C6 (-5), D6, E6, D6, E6" is generated. This melody data 2-3 is reproduced, and the corresponding sound is output.

[0136] FIG. 16 illustrates still other melody data stored in the melody data storage unit 33 and its conversion processing.

[0137] In FIG. 16, the note sequence of melody data 3-1 is "C3, E3, G3, C3 (+12), C3, E3, G3, C3 (+12), C3 (-5), C3 (-1)". When the notes "C3, D3, E3" are selected from the keyword "la la la" as described above, "C3, E3, G3" of melody data 3-1 are converted to "C3, D3, E3", respectively (the processing indicated by the arrow), yielding melody data 3-2 consisting of the note sequence "C3, D3, E3, C3 (+12), C3, D3, E3, C3 (+12), C3 (-5), C3 (-1)". That is, at this point, notes 2 and 3, and notes 6 and 7, of melody data 3-1 have each been converted to other notes.

[0138] Each note of melody data 3-2 is then shifted upward by three octaves by the melody generation unit 32 (the processing indicated by the arrow), and melody data 3-3 consisting of the note sequence "C6, D6, E6, C6 (+12), C6, D6, E6, C6 (+12), C6 (-5), C6 (-1)" is generated. This melody data 3-3 is reproduced, and the corresponding sound is output.

[0139] As described above, by converting melody data prepared in advance based on the pitch frequencies extracted from the input voice of the keyword, different sounds are output from the same melody data even when, for example, several users utter the same keyword in turn.

[0140] Accordingly, the sound output from the pet robot 1 becomes unpredictable, which makes communication with the pet robot 1 more entertaining.

[0141] In the above, notes corresponding to the extracted pitch frequencies are selected, the melody data is converted based on those notes, and the resulting melody data is then shifted by a predetermined number of octaves. Conversely, of course, the notes corresponding to the extracted pitch frequencies may be selected and shifted by a predetermined number of octaves first, and the shifted notes may then be used to convert the melody data prepared in advance.

[0142] In this case, for example, the notes "C3, D3, E3" selected based on the pitch frequencies of the keyword "la la la" are each shifted upward by three octaves to become "C6, D6, E6". Then, for example, "C3, E3, G3" of melody data 1-1 in FIG. 11 are converted using "C6, D6, E6", respectively, and melody data 1-4 consisting of the note sequence "C6, C6, D6, E6, C6, C6, D6, E6, C6 (+12)" is generated.

[0143] Moreover, instead of simply shifting the notes selected based on the pitch frequencies by a predetermined number of octaves, they may be shifted to notes within the frequency range of the pet robot 1's utterances.

[0144] FIG. 17 illustrates the processing that shifts the notes selected based on the pitch frequencies into a predetermined frequency range.

[0145] FIG. 17A shows an example of notes selected based on pitch frequencies; as described above, the notes "C3, D3, E3" have been selected based on the keyword "la la la". The selected notes are then shifted, as shown in FIG. 17B, so that they always fall within the standard utterance range of the pet robot 1.

[0146] In this example, the middle note "D3" of the selected notes is shifted to the note "C6".

[0147] Accordingly, when the melody data is converted based on the notes "B5, C6, D6" shifted into the standard utterance range of the pet robot 1 in this way, melody data 1-1 is converted into melody data consisting of the note sequence "B5, B5, C6, D6, B5, B5, C6, D6, B5 (+12)". That is, the notes of the converted melody data all fall within the standard utterance range "F5" to "F6" of the pet robot 1 shown in FIG. 17.
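The text does not state how the neighboring notes follow when "D3" is moved to "C6". One reading that reproduces the stated result "B5, C6, D6" is to move every note by the same number of white-key (C major) steps. The sketch below adopts that reading purely as an assumption; the names `to_step`, `from_step`, and `shift_to_anchor` are hypothetical, and only natural notes are handled.

```python
DEGREES = ["C", "D", "E", "F", "G", "A", "B"]

def to_step(note: str) -> int:
    """Position of a natural note counted in white-key steps, e.g. 'C3' -> 21."""
    return int(note[1:]) * 7 + DEGREES.index(note[0])

def from_step(step: int) -> str:
    """Inverse of to_step."""
    octave, degree = divmod(step, 7)
    return f"{DEGREES[degree]}{octave}"

def shift_to_anchor(notes, anchor):
    """Shift all notes by the same step count so the middle note lands on anchor."""
    middle = notes[len(notes) // 2]
    delta = to_step(anchor) - to_step(middle)
    return [from_step(to_step(n) + delta) for n in notes]

shifted = shift_to_anchor(["C3", "D3", "E3"], "C6")
print(shifted)  # ['B5', 'C6', 'D6']
```

A plain semitone shift would instead map "C3" to "A#5", so the choice of step unit matters if the figure is to be matched.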

[0148] In this way, the sound output from the pet robot 1 can be controlled by shifting the notes selected based on the pitch frequencies into various ranges. This prevents the pitch of the input voice from appearing directly in the sound output from the pet robot 1. Without such a shift of the frequency range, a high-pitched input voice would cause a correspondingly high-pitched sound to be output from the pet robot 1.

[0149] In the above, the melody data is converted based on the notes corresponding to the pitch frequencies of the voice; however, the melody data can also be converted based on various other information and then output.

[0150] Next, with reference to the flowchart of FIG. 18, the processing in which the pet robot 1 generates melody data using notes selected based on a change in emotion, and reproduces it, will be described.

[0151] In step S21, the behavior management unit 27 determines, based on the internal information notified by the instinct/emotion management unit 26, whether a change in the emotion of the pet robot 1 has been detected, and waits until it determines that a change has been detected.

[0152] When the behavior management unit 27 determines in step S21 that a change in emotion has been detected, the processing proceeds to step S22, where it notifies the melody generation unit 32 of information indicating that the emotion of the pet robot 1 has changed, together with the identification information of the melody data to be converted.

[0153] In step S23, on being notified by the behavior management unit 27 that the emotion has changed, the melody generation unit 32 selects the notes to be used for converting the melody data based on the emotion after the change.

[0154] FIG. 19 shows an example of the correspondence between the changed emotion of the pet robot 1 and scale notes; such a correspondence is prepared in advance in the melody generation unit 32.

[0155] In this example, the melody generation unit 32 selects the notes "C, E, G" when the emotion changes to "JOY", "C, D#, G" when it changes to "SAD", and "C, C, F" when it changes to "ANGRY". It likewise selects "C, D#, F#" when the emotion changes to "SURPRISE", "C, F#, C" when it changes to "DISGUST", and "C, C#, C" when it changes to "FEAR".

[0156] For example, by associating major-sounding notes with "JOY" and minor-sounding notes with "SAD", a sound that generally feels bright is output when the pet robot 1 is happy, and a sound that generally feels dark and sorrowful is output when it is sad. The emotion of the pet robot 1 can thus be expressed.

[0157] Furthermore, since raising the pitch feels natural when the emotion value is large, the pitch may, for example, be shifted by +10 when the emotion value is 10.

[0158] Then, in step S24, the melody generation unit 32 reads the melody data designated by the behavior management unit 27 from the melody data storage unit 33 and converts it based on the selected notes.

[0159] FIG. 20 illustrates the conversion processing of melody data.

[0160] FIG. 20 shows an example in which melody data 1-1 shown in FIG. 11 is converted, and in which the melody generation unit 32 has been notified that the emotion has changed to "ANGRY".

[0161] Accordingly, the melody generation unit 32 selects the notes "C, C, F" from a correspondence table such as that shown in FIG. 19, and converts melody data 1-1 based on them. For example, so that the output sound falls within the standard utterance range of the pet robot 1, the melody generation unit 32 sets the selected notes "C, C, F" to "C6, C6, F6" and converts "C3, E3, G3" of melody data 1-1 using "C6, C6, F6", respectively.

[0162] This conversion generates melody data 1-5 consisting of the note sequence "C6, C6, C6, F6, C6, C6, C6, F6, C6 (+12)".

[0163] Likewise, when the melody generation unit 32 is notified that the emotion of the pet robot 1 has changed to "SAD", for example, it selects the notes "C, D#, G" and sets them to "C6, D#6, G6" so that the output sound falls within the standard utterance range of the pet robot 1. The melody generation unit 32 then converts melody data 1-1 using "C6, D#6, G6" and generates new melody data consisting of the note sequence "C6, C6, D#6, G6, C6, C6, D#6, G6, C6 (+12)".
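The emotion-driven conversion of steps S23 and S24 can be sketched as a table lookup followed by the same substitution used for keywords. The table contents come from the FIG. 19 example; everything else (the dictionary name `EMOTION_NOTES`, the function `melody_for_emotion`, the tuple representation of notes, and fixing the octave at 6) is an assumption made for illustration.

```python
EMOTION_NOTES = {
    "JOY": ["C", "E", "G"],
    "SAD": ["C", "D#", "G"],
    "ANGRY": ["C", "C", "F"],
    "SURPRISE": ["C", "D#", "F#"],
    "DISGUST": ["C", "F#", "C"],
    "FEAR": ["C", "C#", "C"],
}

# Melody data 1-1 as (base note, offset in keys); ("C3", 12) stands for "C3 (+12)".
MELODY_1_1 = [
    ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
    ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12),
]

def melody_for_emotion(emotion, melody, octave=6):
    """Convert a stored melody using the notes associated with an emotion."""
    # Place the emotion's notes in the assumed standard utterance octave.
    targets = [f"{name}{octave}" for name in EMOTION_NOTES[emotion]]
    # Distinct notes of the stored melody, in order of first appearance.
    sources = list(dict.fromkeys(base for base, _ in melody))
    if len(sources) > len(targets):
        raise ValueError("melody has more distinct notes than the emotion table provides")
    mapping = dict(zip(sources, targets))
    return [(mapping[base], offset) for base, offset in melody]

angry = melody_for_emotion("ANGRY", MELODY_1_1)
sad = melody_for_emotion("SAD", MELODY_1_1)
print([base for base, _ in angry])  # ['C6', 'C6', 'C6', 'F6', 'C6', 'C6', 'C6', 'F6', 'C6']
```

Because "ANGRY" maps two source notes onto the same target, the converted melody has fewer distinct pitches than the original, which matches melody data 1-5.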

[0164] The melody data generated in this way is supplied to the voice synthesis unit 34 and reproduced by it in step S25.

[0165] As described above, melody data prepared in advance can be converted based on various kinds of information detected by the pet robot 1 to generate new melody data. For example, the proportion of a predetermined color (light) contained in an image captured by the camera 12 may be extracted and notes selected based on it, or notes may be selected based on the level of external pressure detected by the touch sensor 13 and used to generate new melody data. New melody data can also be generated based on the volume of voice or other sound collected by the microphone 11.

[0166] This allows the pet robot 1 to output countless patterns of sound (melodies) that the user cannot predict. It also makes it possible to reduce the number of pieces of melody data that must be prepared in advance.

[0167] The various kinds of processing described above may be executed not only by an animal-type robot such as that shown in FIG. 1 but also by, for example, a humanoid robot capable of bipedal walking or a virtual robot active within a computer.

[0168] The series of processes described above can be executed by hardware, but can also be executed by software. In the latter case, the information processing apparatus that executes the software is constituted by, for example, a personal computer such as that shown in FIG. 21.

[0169] In FIG. 21, the CPU 71 executes various kinds of processing in accordance with a program stored in the ROM (Read Only Memory) 72 or a program loaded from the storage unit 78 into the RAM (Random Access Memory) 73. The RAM 73 also stores, as appropriate, data that the CPU 71 needs in order to execute the various kinds of processing.

[0170] The CPU 71, the ROM 72, and the RAM 73 are connected to one another via a bus 74. An input/output interface 75 is also connected to the bus 74.

[0171] Connected to the input/output interface 75 are an input unit 76 comprising a keyboard, a mouse, and the like; an output unit 77 comprising a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), a speaker, and the like; a storage unit 78 comprising a hard disk or the like; and a communication unit 79 comprising a modem, a terminal adapter, or the like. The communication unit 79 performs communication processing via a network.

[0172] A drive 80 is also connected to the input/output interface 75 as necessary, and a magnetic disk 81, an optical disk 82, a magneto-optical disk 83, a semiconductor memory 84, or the like is mounted in it as appropriate.

[0173] When the series of processes is executed by software, the program constituting the software is installed from a network or a recording medium onto, for example, a general-purpose personal computer such as that shown in FIG. 21.

[0174] As shown in FIG. 21, this recording medium is constituted not only by package media that are distributed separately from the apparatus main body in order to provide the program to the user and on which the program is recorded, such as the magnetic disk 81 (including a floppy disk), the optical disk 82 (including a CD-ROM (Compact Disk-Read Only Memory) and a DVD (Digital Versatile Disk)), the magneto-optical disk 83 (including an MD (registered trademark) (Mini-Disk)), or the semiconductor memory 84, but also by the ROM 72 on which the program is recorded, the hard disk included in the storage unit 78, and the like, which are provided to the user already incorporated in the apparatus main body.

[0175] In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order, but also processing executed in parallel or individually without necessarily being processed in time series.

【０１７６】[0176]

[Effects of the Invention] According to the first robot apparatus, robot control method, and program of the present invention, first melody data composed of a first scale is stored, the pitch frequency of input voice is extracted, and a second scale is selected based on the extracted pitch frequency. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the second melody data is reproduced. A more suitable response can therefore be given to an utterance from the user.

[0177] According to the second robot apparatus, robot control method, and program of the present invention, first melody data composed of a first scale is stored, the apparatus's own internal state is managed, and when the managed internal state changes, a second scale corresponding to the change in the internal state is selected. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the generated second melody data is reproduced. Sounds that the user cannot predict can therefore be produced.

[Brief description of drawings]

FIG. 1 is a perspective view showing an example of the external appearance of a pet robot to which the present invention is applied.

FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot of FIG. 1.

FIG. 3 is a block diagram showing an example of the functional configuration of the pet robot of FIG. 1.

FIG. 4 is a diagram schematically showing an example of the functions of the instinct/emotion management unit of FIG. 3.

FIG. 5 is a diagram schematically showing an example of the functions of the behavior management unit of FIG. 3.

FIG. 6 is a diagram showing an example of a finite automaton.

FIG. 7 is a diagram showing an example of state transition probabilities.

FIG. 8 is a flowchart illustrating the processing of the pet robot of FIG. 1.

FIG. 9 is a diagram illustrating the process of extracting pitch frequencies.

FIG. 10 is a diagram showing an example of the correspondence between scale notes and pitch frequencies.

FIG. 11 is a diagram showing an example of melody data.

FIG. 12 is a diagram illustrating a process of converting melody data.

FIG. 13 is a diagram illustrating another process of converting melody data.

FIG. 14 is a diagram illustrating still another process of converting melody data.

FIG. 15 is a diagram illustrating a process of converting other melody data.

FIG. 16 is a diagram illustrating a process of converting still other melody data.

FIG. 17 is a diagram illustrating the shifting of scale notes.

FIG. 18 is a flowchart illustrating other processing of the pet robot of FIG. 1.

FIG. 19 is a diagram showing an example of the correspondence between emotions and scale notes.

FIG. 20 is a diagram illustrating a process of converting melody data.

FIG. 21 is a block diagram showing an example of a personal computer.

[Explanation of symbols]

10 controller, 11 microphone, 14 speaker, 21 AD conversion unit, 22 voice feature analysis unit, 23 voice section detection unit, 24 voice recognition unit, 26 instinct/emotion management unit, 27 behavior management unit, 28 voice pitch analysis unit, 32 melody generation unit, 33 melody data storage unit, 34 voice synthesis unit, 35 DA conversion unit, 81 magnetic disk, 82 optical disk, 83 magneto-optical disk, 84 semiconductor memory

Continuation of front page

(51) Int. Cl.7 / identification symbol / FI / theme code (reference): G10L 13/00 G10L 3/00 531W 13/06 537G 15/00 J 15/08 R 15/10 9/08 B 15/18 9/00 B 5/04 F 3/00 531N

(72) Inventor: Ryo Hanya, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo

F-terms (reference): 2C150 BA11 CA02 DA05 DA24 DA27 DA28 DF03 DF04 DF06 DF08 DF33 ED42 ED47 ED52 EF07 EF16 EF17 EF23 EF28 EF29 EF33 EF36 3C007 AS36 CS08 KS11 KS31 KS39 MT14 WA04 WA14 WB19 WB27 WC07 WC30 5D015 FF00 HH03 KK02 5D045 AA09 AB11

Claims


1. A robot apparatus comprising: storage means for storing first melody data composed of a first scale; extraction means for extracting a pitch frequency of input voice; selection means for selecting a second scale based on the pitch frequency extracted by the extraction means; generation means for generating second melody data by converting the first scale constituting the first melody data stored in the storage means into the second scale selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.

2. The robot apparatus according to claim 1, wherein the extraction means extracts a pitch frequency of the voice that represents a predetermined keyword.

3. The robot apparatus according to claim 1, wherein the extraction means extracts a pitch frequency of the voice including a vowel.

4. The robot apparatus according to claim 1, wherein the extraction means extracts, as the pitch frequency, an average value of pitch frequencies detected in a predetermined period of the voice.

5. The robot apparatus according to claim 1, wherein the generation means further shifts the second scale by a predetermined number of octaves to generate the second melody data.

6. A robot control method comprising: a storage step of storing first melody data composed of a first scale; an extraction step of extracting a pitch frequency of input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage step into the second scale selected by the processing of the selection step; and a reproduction step of reproducing the second melody data generated by the processing of the generation step.

7. A recording medium having recorded thereon a computer-readable program comprising: a storage control step of controlling storage of first melody data composed of a first scale; an extraction step of extracting a pitch frequency of an input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

8. A program that causes a computer to execute: a storage control step of controlling storage of first melody data composed of a first scale; an extraction step of extracting a pitch frequency of an input voice; a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

9. A robot apparatus comprising: storage means for storing first melody data composed of a first scale; management means for managing an internal state of the robot apparatus itself; selection means for selecting, when the internal state managed by the management means changes, a second scale corresponding to the change in the internal state; generation means for generating second melody data by converting the first scale constituting the first melody data stored in the storage means into the second scale selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.
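
The state-driven variant of claim 9 can be sketched in the same spirit. The claim does not specify which internal states exist or which scale corresponds to each change, so the state names and the `STATE_TO_SCALE` table below are purely hypothetical. As a further assumption, a scale is again identified by its root pitch class (0 to 11, C = 0) and melody data is a list of MIDI note numbers.

```python
# Hypothetical mapping from internal state to the root pitch class of the
# second scale. The real states and scales are not given in the claim.
STATE_TO_SCALE = {"joy": 7, "sadness": 9, "anger": 2}


class InternalStateManager:
    """Holds the current internal state and reports whether it changed."""

    def __init__(self, state="neutral"):
        self.state = state

    def update(self, new_state):
        """Store new_state and return True only if it differs from the old one."""
        changed = new_state != self.state
        self.state = new_state
        return changed


def melody_on_state_change(manager, new_state, first_melody, first_scale):
    """Return a transposed melody when the state changes, otherwise None."""
    if not manager.update(new_state):
        return None
    # Unknown states fall back to the first scale (no transposition).
    second_scale = STATE_TO_SCALE.get(new_state, first_scale)
    shift = (second_scale - first_scale) % 12
    return [note + shift for note in first_melody]


manager = InternalStateManager()
print(melody_on_state_change(manager, "joy", [60, 62, 64], 0))
print(melody_on_state_change(manager, "joy", [60, 62, 64], 0))
```

The second call returns `None` because the state did not change, which mirrors the claim's condition that a scale is selected only when the internal state changes.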

10. The robot apparatus according to claim 9, wherein the generation means generates the second melody data by further shifting the second scale by a predetermined number of octaves.

11. A robot control method comprising: a storage step of storing first melody data composed of a first scale; a management step of managing an internal state of the robot itself; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage step into the second scale selected by the processing of the selection step; and a reproduction step of reproducing the second melody data generated by the processing of the generation step.

12. A recording medium having recorded thereon a computer-readable program comprising: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing an internal state of the robot itself; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

13. A program that causes a computer to execute: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing an internal state of the robot itself; a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.