JP2021117580A

JP2021117580A - Information processing device and program

Info

Publication number: JP2021117580A
Application number: JP2020009102A
Authority: JP
Inventors: 輝長岡; Akira Nagaoka; 春満信田; Harumitsu Nobuta; 敏之前澤; Toshiyuki Maezawa
Original assignee: Mixi Inc
Current assignee: Mixi Inc
Priority date: 2020-01-23
Filing date: 2020-01-23
Publication date: 2021-08-10
Anticipated expiration: 2040-01-23
Also published as: JP7436804B2; JP7598076B2; JP2025015772A; JP2024054168A

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device and a program capable of providing conversational sentences suited to the circumstances of each user.
SOLUTION: An information processing device that acquires event identification information, which identifies a user's event and is associated with date and time information, and that generates a related conversational sentence related to the event identification information in a process of speaking at a date and time later than the date and time represented by the date and time information.
[Selected drawing] Fig. 1

Description

The present invention relates to an information processing device and a program.

In recent years, various tools for managing a user's schedule have been developed. Among them, techniques are known that register and manage appointments based on predetermined keywords contained in the user's speech (see, for example, Non-Patent Document 1).

"iPhone User Guide", [online], Apple Computer, [retrieved December 16, 2019], Internet <URL: https://support.apple.com/ja-jp/guide/iphone/iph3d110f84/ios>

However, although such conventional tools interact with the user in conversational sentences, they only manage future appointments and make no further use of them, such as offering conversational sentences related to those appointments. As a result, they do not adequately provide information suited to the circumstances of each user.

The present invention has been made in view of the above circumstances, and one of its objects is to provide an information processing device and a program capable of providing conversational sentences suited to the circumstances of each user.

One aspect of the present invention that solves the problems of the conventional example described above is an information processing device comprising: acquisition means for acquiring event identification information that identifies a user's event and is associated with date and time information; and conversational sentence generation means for generating, in a process of speaking at a date and time later than the date and time represented by the date and time information, a related conversational sentence related to the event identification information.

According to the present invention, conversational sentences suited to the circumstances of each user can be provided.

A block diagram showing a configuration example of an information processing system according to an embodiment of the present invention.
A block diagram showing a configuration example of a terminal device according to the embodiment of the present invention.
A functional block diagram showing an example of a server according to the embodiment of the present invention.
An explanatory diagram showing an example of a conversational sentence queue used in the information processing system according to the embodiment of the present invention.
An explanatory diagram showing an example of the contents of an action database used in the information processing system according to the embodiment of the present invention.
A functional block diagram showing an example of the terminal device according to the embodiment of the present invention.
An explanatory diagram showing an example of setting information used in the information processing system according to the embodiment of the present invention.
A flow diagram showing an operation example of the information processing system according to the embodiment of the present invention.
Another flow diagram showing an operation example of the information processing system according to the embodiment of the present invention.
A flowchart showing an example of the conversational sentence selection process of the information processing system according to the embodiment of the present invention.

Embodiments of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, an information processing system 1 according to an embodiment of the present invention includes a server 10 serving as an information processing device, and a terminal device 20 communicably connected to the server 10 via communication means such as a network.

As shown in FIG. 1, the server 10 includes a control unit 11, a storage unit 12, and a communication unit 13. The terminal device 20 is a robot and, as illustrated in FIG. 2, includes at least a leg portion 21 and a main body portion 22. The main body portion 22 houses a control unit 31, a storage unit 32, a sensor unit 33, a display unit 34, an audio output unit 35, a communication unit 36, and a drive unit 37. The leg portion 21 and the main body portion 22 are connected via an actuator rotatable about at least one axis, so that the orientation of the main body portion 22 can be rotated relative to the leg portion 21.

The control unit 11 of the server 10 is a program-controlled device such as a CPU and operates according to a program stored in the storage unit 12. In the present embodiment, the control unit 11 accepts request information from the terminal device 20 and executes processing based on the accepted request information. For example, as one such process, the control unit 11 transmits action information to the terminal device 20 that sent the request information. The action information includes an action instruction, which specifies an action to be executed by the terminal device 20, and character string information representing the content of the speech to be uttered by the terminal device 20.

In one example of the present embodiment, the control unit 11 also acquires information representing a user's event, such as a "date" (hereinafter referred to as event identification information), that is associated with date and time information representing a date and time. Then, in the course of executing the process of sending to the terminal device 20 the speech to be uttered by the terminal device 20 at a date and time later than the date and time represented by the date and time information, the control unit 11 generates a related conversational sentence related to the acquired event identification information. The details of the processing of the control unit 11 will be described later.

The storage unit 12 is a disk device or a memory device and holds the program executed by the control unit 11. The storage unit 12 also operates as a work memory for the control unit 11. In one example of the present embodiment, the storage unit 12 may store information for generating instructions to the terminal device 20. For example, the storage unit 12 holds a conversational sentence queue that accumulates candidate conversational sentences to be spoken by the terminal device 20. The contents of this conversational sentence queue will be described later.

The communication unit 13 is a network interface or the like and, in accordance with instructions input from the control unit 11, sends various kinds of information to the terminal device 20 via the network. The communication unit 13 also outputs information received via the network to the control unit 11.

The control unit 31 of the terminal device 20 is a program-controlled device such as a CPU and operates according to a program stored in the storage unit 32. In the present embodiment, at predetermined timings, the control unit 31 sends request information to the server 10 together with device identification information unique to the terminal device 20, which is described later.

In one example of the present embodiment, when the sensor unit 33 (described later) accepts voice input from the user, the control unit 31 of the terminal device 20 converts the input voice into character string information. Widely known speech recognition processing can be used for this conversion. For example, the control unit 31 may carry it out by sending the input voice information to a speech recognition server that executes speech recognition processing and receiving the recognized character string information in return.

The control unit 31 also sends request information to the server 10, using the user's voice input as a trigger. This request information includes information identifying the trigger (for example, information indicating that the user has input voice) and information needed for processing on the server 10, which here is the character string information obtained by recognizing the voice input by the user.

In other words, when the control unit 31 determines that a predetermined trigger has occurred, it collects the information needed for processing on the server 10 and sends the server 10 request information containing the collected information together with the information identifying the trigger. Besides voice input by the user, as in the preceding example, the trigger can be defined arbitrarily, for example as the arrival of a predetermined time. The details of the operation of the control unit 31 will also be described later.
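The request assembled by the control unit 31 can be pictured as a small record. The following is a minimal sketch only; the field names (`device_id`, `trigger`, `payload`), the trigger label `"voice_input"`, and the device identifier are hypothetical, since the source does not define a wire format.

```python
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class RequestInfo:
    # All field names are illustrative assumptions, not taken from the source.
    device_id: str   # device identification information unique to the terminal
    trigger: str     # information identifying the trigger
    payload: dict[str, Any] = field(default_factory=dict)  # data the server needs


def build_request(device_id: str, trigger: str, **collected: Any) -> dict[str, Any]:
    """Bundle the trigger and the collected information into one request."""
    return asdict(RequestInfo(device_id, trigger, dict(collected)))


# Voice-input trigger: the payload carries the recognized text.
req = build_request("robot-0001", "voice_input", text="明日、デートがあるんだ")
```

A time-based trigger would simply carry a different trigger label and whatever information the setting information associates with it.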

The storage unit 32 is a memory device or the like and holds the program executed by the control unit 31. The storage unit 32 also operates as a work memory for the control unit 31.

In the present embodiment, the storage unit 32 may store setting information that associates each trigger with, among other things, information specifying what is to be sent to the server 10. This setting information will be described later with a concrete example. The storage unit 32 also stores the device identification information uniquely assigned to the terminal device 20.

Furthermore, in one example of the present embodiment, the storage unit 32 stores image data for animations to be displayed on the display unit 34. Specifically, the storage unit 32 may store multiple pieces of image data for the individual parts that express the character's facial expression: eye animation data such as smiling eyes and eyes with flowing tears, and mouth animation data such as a mouth closed in a smile, a mouth closed while crying, and a mouth animated while speaking.

The sensor unit 33 includes at least a microphone serving as an audio sensor. The sensor unit 33 may also include a touch sensor, an acceleration sensor, and the like. The sensor unit 33 outputs the audio signal, information representing the position touched by the user, acceleration information, and other values detected by its sensors to the control unit 31.

The display unit 34 is a liquid crystal display or the like and displays image data in accordance with instructions input from the control unit 31. In one example of the present embodiment, the display unit 34 displays the character's facial expression using the image data of the eyes and the mouth. The audio output unit 35 is a speaker or the like and outputs sound in accordance with the audio signal input from the control unit 31.

The communication unit 36 includes a network interface and exchanges information with the server 10 via the network, either wirelessly or by wire. Specifically, the communication unit 36 sends request information and the like to the server 10 in accordance with instructions input from the control unit 31, and outputs information received from the server 10 to the control unit 31.

The drive unit 37 drives the actuator so as to rotate the main body portion 22 relative to the leg portion 21 in accordance with instructions input from the control unit 31.

Next, the operation of the control unit 11 of the server 10 of the present embodiment will be described. In the present embodiment, as illustrated in FIG. 3, the control unit 11 of the server 10 functionally includes a reception unit 41, an event management unit 42, an event information acquisition unit 43, a conversational sentence generation unit 44, a conversation history management unit 45, an action information generation unit 46, and an instruction transmission unit 47.

The reception unit 41 receives request information and device identification information from the terminal device 20. The request information includes information identifying the cause (trigger) of the request for an action to be executed by the terminal device 20. The kinds of triggers are described later; voice input by the user is one example. Request information based on a voice-input trigger may include, together with the information identifying the trigger, information representing the content of the voice input by the user. This information may be the character string information obtained by recognizing the voice.

The reception unit 41 outputs the information contained in the accepted request information, such as the information identifying the trigger and the character string information representing the content of the voice input by the user, to the event information acquisition unit 43 and the action information generation unit 46.

The event management unit 42 stores event information for each user in the storage unit 12 and manages it. Event information represents the user's schedule and associates date and time information with event identification information that identifies an event.

The event information acquisition unit 43 accepts, from the reception unit 41, the character string information representing the content of the voice input by the user, and determines whether the accepted character string information relates to an event. Specifically, the event information acquisition unit 43 of the present embodiment compares the character string information against predetermined patterns of event-related information.

Here, a pattern includes a part that matches predetermined words expressing a date and time (date/time-related words), such as 「明日」 (tomorrow), 「明後日」 (the day after tomorrow), and 「来週」 (next week), and a part that matches words related to an event (event-related words), such as 「デート」 (date), 「仕事」 (work), and 「美容院」 (hair salon). Such a pattern can be written as a regular-expression string, for example 「［明日｜明後日｜週末｜来週］［Ｄ＋日］［Ｗ＋曜］［日］［は｜に］Ｗ＋［が］［ある｜なん］＊」.

In the above example, 「＊」 is a regular expression that matches any characters, including whitespace; 「＋」 matches one or more characters of the kind specified by the immediately preceding symbol; and ［Ｘ｜Ｙ｜…］ matches any one of Ｘ, Ｙ, and so on. ［Ｚ］ denotes a pattern that may or may not be present, 「Ｄ」 matches a digit, and 「Ｗ」 matches a character. The above string therefore matches character string information such as
「明日、デートがあるんだ」 ("I have a date tomorrow") and
「来週の月曜日は試験だよ」 ("Next Monday is the exam").
In this pattern, the part
「［明日｜明後日｜週末｜来週］［Ｄ＋日］［Ｗ＋曜］［日］」
corresponds to the date/time-related words, and the leading 「Ｗ＋」 (which matches arbitrary characters) in
「Ｗ＋［が］［ある｜なん］」
corresponds to the event-related word.
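As an illustration of this kind of matching, the sketch below uses Python's `re` module. It is not the pattern from the source: it is a deliberately stricter approximation (it requires a date word and a closing 「が…」 or 「だ」), and the function name `extract_event` and the group names are hypothetical.

```python
import re
from typing import Optional

# Date/time-related words: a fixed set of words or "<digits>日".
DATE_WORD = r"(?P<date>明日|明後日|週末|来週|\d+日)"
# Optional weekday such as "の月曜日"; the single character before 曜 is captured.
WEEKDAY = r"(?:の?(?P<weekday>\w)曜日?)?"
# Event-related word: the shortest run of word characters followed by
# "がある", "がなん", or "だ" (an assumption made to keep the match unambiguous).
EVENT = r"(?P<event>\w+?)(?:が(?:ある|なん)|だ)"

EVENT_PATTERN = re.compile(DATE_WORD + WEEKDAY + r"[、\s]*[はに]?[、\s]*" + EVENT)


def extract_event(text: str) -> Optional[tuple[str, Optional[str], str]]:
    """Return (date word, weekday or None, event word), or None if no match."""
    m = EVENT_PATTERN.match(text)
    if m is None:
        return None
    return m.group("date"), m.group("weekday"), m.group("event")


print(extract_event("明日、デートがあるんだ"))
print(extract_event("来週の月曜日は試験だよ"))
print(extract_event("こんにちは"))
```

The lazy quantifier in the event group makes the event word stop at the first particle that can close the pattern, which is one simple way to separate the event-related word from the rest of the sentence.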

The number of patterns is not limited to one; there may be several. For example, in addition to the above pattern, a pattern such as 「Ｗ＋［が］［ある｜なん］＊［明日｜明後日｜週末｜来週］［Ｄ＋日］［Ｗ＋曜］［日］」 may be defined. This pattern matches input that the first pattern does not, such as 「ライブがあるんだよね、明日」 ("There's a concert, you know, tomorrow"). When the character string information representing the content of the voice input by the user matches a pattern of event-related information such as those above (any one of them, if there are several), the event information acquisition unit 43 determines that the accepted character string information relates to an event.

That is, when the event information acquisition unit 43 determines that the character string information representing the content of the voice input by the user matches a pattern of event-related information and therefore relates to an event, it extracts from the character string information the date and time information and the event identification information that identifies the event. Specifically, when the character string information is 「明日、デートがあるんだ」 ("I have a date tomorrow"), the event information acquisition unit 43 obtains date and time information for the day after the day on which the processing is being executed.

Here, the date and time information represents the date and time at which the event takes place, estimated from the date/time-related words found by the above pattern. Widely known techniques can be used to estimate date and time information from date/time-related words, so a detailed description is omitted here.

For example, when the processing is performed on December 15, the event information acquisition unit 43 obtains the date and time information December 16, which is "tomorrow" as seen from that day. As the event identification information that identifies the event, the event information acquisition unit 43 extracts 「デート」 ("date") from the character string information. Widely known methods can also be used to extract this event identification information.
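The source leaves date estimation to known techniques. As one minimal sketch, a lookup table of day offsets is enough to reproduce the December 15 to December 16 example; the table contents, the function name `resolve_event_date`, and the year 2019 used in the example are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical mapping from relative date words to day offsets.
RELATIVE_DAYS = {
    "明日": 1,    # tomorrow
    "明後日": 2,  # the day after tomorrow
}


def resolve_event_date(date_word: str, today: date) -> date:
    """Estimate the event date from a relative date word."""
    if date_word not in RELATIVE_DAYS:
        raise ValueError(f"unsupported date word: {date_word}")
    return today + timedelta(days=RELATIVE_DAYS[date_word])


# Processing runs on December 15 (year chosen arbitrarily for the example).
event_date = resolve_event_date("明日", date(2019, 12, 15))
print(event_date.isoformat())
```

Words such as 「来週」 or 「週末」 need a policy decision (which day of next week, which day of the weekend), which is why they are left out of this sketch.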

The event information acquisition unit 43 outputs the extracted date and time information and the event identification information to the conversational sentence generation unit 44. When the character string information representing the content of the voice input by the user matches none of the patterns of event-related information, the event information acquisition unit 43 aborts the processing.

The conversational sentence generation unit 44 receives the date and time information and the event identification information output by the event information acquisition unit 43 and generates a conversational sentence. Specifically, the conversational sentence generation unit 44 determines a period (the utterance period) that falls after the date and time output by the event information acquisition unit 43.

The utterance period is determined, for example, as follows. In one example of the present embodiment, the server 10 stores in the storage unit 12, as days candidate information, a days data table that associates with one another: predetermined days candidates such as "1 day", "2 days", and "1 week"; the number of days from the first day to the last day of the utterance period (the utterance period length) defined for each days candidate; and a word that refers to the day on which the event took place (referred to as a date/time reference word).

The conversational sentence generation unit 44 randomly selects one of the days candidates in the days data table and adds it to the date and time output by the event information acquisition unit 43 to determine the first day of the utterance period. It then adds the utterance period length associated with the selected days candidate to this first day to determine the last day of the utterance period.

As an example, when the date and time output by the event information acquisition unit 43 is "December 15" and the days candidate "2 days" (with an associated utterance period length of "0 days") is selected, the conversational sentence generation unit 44 sets the first day of the utterance period to "December 17" and the last day to "December 17".
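The calculation described above can be sketched as follows. Only the "2 days" row (utterance period length 0, referring to the day before yesterday) follows the examples in the text; the other table rows, the class and function names, and the year 2019 are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DaysCandidate:
    offset_days: int     # days candidate added to the event date
    period_length: int   # utterance period length, in days
    reference_word: str  # date/time reference word used in the sentence


# Hypothetical days data table; only the second row mirrors the text's example.
DAYS_TABLE = [
    DaysCandidate(1, 0, "yesterday"),
    DaysCandidate(2, 0, "the day before yesterday"),
    DaysCandidate(7, 2, "last week"),
]


def pick_candidate(rng: random.Random) -> DaysCandidate:
    """Randomly select one days candidate from the table."""
    return rng.choice(DAYS_TABLE)


def utterance_period(event_date: date, cand: DaysCandidate) -> tuple[date, date]:
    """First day = event date + days candidate; last day = first day + length."""
    first = event_date + timedelta(days=cand.offset_days)
    last = first + timedelta(days=cand.period_length)
    return first, last


first_day, last_day = utterance_period(date(2019, 12, 15), DAYS_TABLE[1])
print(first_day, last_day)
```

Passing the random number generator in explicitly keeps the random selection reproducible in tests.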

The conversational sentence generation unit 44 also generates a conversational sentence using the date/time reference word associated with the selected days candidate and the event identification information output by the event information acquisition unit 43. For example, the conversational sentence generation unit 44 randomly selects one of a set of predetermined candidate conversational sentence patterns such as
"By the way, how was the <event identification information> of <date/time reference word>?"
"Oh, <date/time reference word> was <event identification information>, right?"
...
and inserts the date/time reference word and the event identification information into the selected conversational sentence pattern.

As a result, the conversational sentence generation unit 44 generates, for example, a conversational sentence such as
"By the way, how was your date the day before yesterday?"
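Filling a pattern with the two values is plain template substitution. The sketch below uses loose English renderings of the two patterns quoted above; the placeholder names `{when}` and `{event}` and the function names are hypothetical.

```python
import random

# English renderings of the candidate conversational sentence patterns.
TEMPLATES = [
    "By the way, how was your {event} {when}?",
    "Oh, {when} was your {event}, right?",
]


def fill_template(template: str, when: str, event: str) -> str:
    """Insert the date/time reference word and the event into one pattern."""
    return template.format(when=when, event=event)


def generate_sentence(when: str, event: str, rng: random.Random) -> str:
    """Randomly pick a pattern, then fill it."""
    return fill_template(rng.choice(TEMPLATES), when, event)


print(fill_template(TEMPLATES[0], "the day before yesterday", "date"))
```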

The conversational sentence generation unit 44 associates the generated conversational sentence with the utterance period determined above, as its utterance condition, and accumulates it in the conversational sentence queue stored in the storage unit 12. In this example, the conversational sentence queue stored in the storage unit 12 of the server 10 therefore holds utterance conditions (C) and conversational sentences (D) in association with each other, as illustrated in FIG. 4.

In addition to the utterance period described above, an utterance condition may include a condition on the time of day of the utterance, a condition such as the temperature at the time of the utterance, and so on. An utterance condition is also not mandatory and may be left undefined. A conversational sentence with no utterance condition can be spoken at any time.

In the present embodiment, conversational sentences other than those generated by the conversational sentence generation unit 44 may also be registered in the conversational sentence queue in advance. Examples of such pre-registered conversational sentences include:
・"It was hot today, wasn't it?", associated with the utterance condition that the day's maximum temperature was 35 degrees or higher and the time is 18:00 or later
・"Isn't it about time for bed?", associated with the utterance condition that the time is between 2:00 a.m. and 4:00 a.m.
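One way to picture the queue and its utterance conditions is the sketch below. The class names, field names, the "Hello!" entry, and the decision to treat the upper time bound as exclusive are all assumptions; the three conditioned entries mirror the examples in the text (with an arbitrary year).

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional


@dataclass
class UtteranceCondition:
    first_day: Optional[date] = None         # utterance period, inclusive
    last_day: Optional[date] = None
    earliest: Optional[time] = None          # time-of-day window, inclusive start
    latest: Optional[time] = None            # exclusive end (assumption)
    min_max_temp_c: Optional[float] = None   # required daily maximum temperature

    def is_met(self, now: datetime, max_temp_c: Optional[float]) -> bool:
        d, t = now.date(), now.time()
        if self.first_day is not None and d < self.first_day:
            return False
        if self.last_day is not None and d > self.last_day:
            return False
        if self.earliest is not None and t < self.earliest:
            return False
        if self.latest is not None and t >= self.latest:
            return False
        if self.min_max_temp_c is not None:
            if max_temp_c is None or max_temp_c < self.min_max_temp_c:
                return False
        return True


@dataclass
class QueueEntry:
    sentence: str
    condition: Optional[UtteranceCondition] = None  # None: always speakable


def speakable(queue: list[QueueEntry], now: datetime,
              max_temp_c: Optional[float] = None) -> list[str]:
    """Return the sentences whose utterance condition currently holds."""
    return [e.sentence for e in queue
            if e.condition is None or e.condition.is_met(now, max_temp_c)]


QUEUE = [
    QueueEntry("By the way, how was your date the day before yesterday?",
               UtteranceCondition(first_day=date(2019, 12, 17),
                                  last_day=date(2019, 12, 17))),
    QueueEntry("It was hot today, wasn't it?",
               UtteranceCondition(earliest=time(18, 0), min_max_temp_c=35.0)),
    QueueEntry("Isn't it about time for bed?",
               UtteranceCondition(earliest=time(2, 0), latest=time(4, 0))),
    QueueEntry("Hello!"),  # hypothetical entry with no utterance condition
]

print(speakable(QUEUE, datetime(2019, 12, 17, 19, 0), max_temp_c=10.0))
```

Keeping every condition field optional lets an event-derived entry carry only a period while pre-registered entries carry only time or temperature constraints.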

For each user, the conversation history management unit 45 records, in order, the content of the voice input by that user as accepted by the reception unit 41 and the content of the conversational sentences that the action information generation unit 46 (described later) instructed that user's terminal device 20 to speak. In other words, the conversation history management unit 45 records the history of the conversation between the user and the terminal device 20.

Based on the information input from the reception unit 41, the action information generation unit 46 determines the action to be executed by the terminal device 20 that sent the request. It then generates action information containing information that specifies the action (the action instruction) and the information needed to execute the action (hereinafter referred to as parameter information), and outputs the action information to the instruction transmission unit 47.

In one example of the present embodiment, the storage unit 12 of the server 10 stores, as information for generating instructions to the terminal device 20, an action database containing at least one record (R), as illustrated in FIG. 5. Each record associates the following with one another: information (T) specifying a trigger; information to be compared with the information representing the content of the voice input by the user (V, hereinafter referred to as comparison character string information; depending on the type of trigger, this may be absent); and information (S) representing the process that the server 10 executes to generate the action information.

The action information generation unit 46 acquires the comparison character string information (V, if any) and the information representing the process to be executed by the server 10 to generate the action information, both of which are associated with the information (T) specifying the trigger input from the reception unit 41.

If comparison character string information has been acquired (that is, if comparison character string information is associated with the information specifying the trigger), the action information generation unit 46 compares the character string information output by the reception unit 41 with that comparison character string information. When it determines that the character string information output by the reception unit 41 matches the comparison character string information, the action information generation unit 46 executes the process represented by the acquired information and generates the action information.

If no comparison character string information has been acquired, the action information generation unit 46 executes the process represented by the acquired information and generates the action information.
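The lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the record layout, the trigger names `voice_input` and `scheduled_time`, and the helper functions `read_news` and `greet` are all hypothetical, and the comparison character string information is represented here by an ordinary Python regular expression.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionRecord:
    trigger: str                      # T: information specifying the trigger
    comparison: Optional[re.Pattern]  # V: comparison string, may be absent
    process: Callable[[str], dict]    # S: process that builds the action information


def read_news(text: str) -> dict:
    # Hypothetical process: a real one would fetch news from a web server.
    return {"action": "read_aloud", "params": {"text": "(news text)"}}


def greet(text: str) -> dict:
    # Hypothetical process for a trigger that needs no comparison string.
    return {"action": "read_aloud", "params": {"text": "Good morning"}}


# Hypothetical action database with one record of each kind.
ACTION_DB = [
    ActionRecord("voice_input", re.compile(r"news"), read_news),
    ActionRecord("scheduled_time", None, greet),
]


def generate_action_info(trigger: str, text: str) -> Optional[dict]:
    """Find a record for the trigger and run its process if V is absent or matches."""
    for record in ACTION_DB:
        if record.trigger != trigger:
            continue
        if record.comparison is None or record.comparison.search(text):
            return record.process(text)
    return None  # no applicable record (behaviour assumed for this sketch)
```

In this sketch a record without comparison information runs unconditionally for its trigger, mirroring the two branches described in the text.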

As a specific example, assume here that the action database contains a record that associates the following: information specifying the trigger "there was a voice input by the user"; as the comparison character string information to be compared with the information representing the content of the voice input by the user, a character string such as 「＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊」 (ニュース: "news"; ない: "isn't there"; 教えて: "tell me"; 読みあげて: "read aloud"); and the information "acquire news character string information from a predetermined web server on the Internet, and instruct that the character string information be read aloud".

This comparison character string information is also assumed to be expressed as a regular expression. The above character string therefore matches character string information such as
「今日のニュースを教えて」 ("Tell me today's news") and
「何かニュースはない？」 ("Isn't there any news?").
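As a rough sketch of how such a comparison string might be evaluated, the following converts the notation into a Python regular expression. The semantics of the notation are not defined in this passage, so the mapping is an assumption: ＊ is treated as "any run of characters", ［a｜b］ as an optional group of alternatives, and the function name `to_regex` is hypothetical.

```python
import re


def to_regex(pattern: str) -> re.Pattern:
    """Translate the bracket/asterisk notation into a Python regex.

    Assumed mapping (not stated in the source):
      ＊  -> .*      any run of characters
      ［  -> (?:     start of a group of alternatives
      ］  -> )?      end of the group, treated as optional
      ｜  -> |       separator between alternatives
    """
    table = {"＊": ".*", "［": "(?:", "］": ")?", "｜": "|"}
    body = "".join(table.get(ch, re.escape(ch)) for ch in pattern)
    return re.compile(body, re.DOTALL)


NEWS_PATTERN = to_regex("＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊")


def matches(text: str) -> bool:
    # The whole utterance must be covered by the pattern.
    return NEWS_PATTERN.fullmatch(text) is not None
```

Under this optional-group reading, any utterance containing ニュース matches, which covers both example utterances above. A stricter reading that requires one alternative from each group would need `)` instead of `)?` in the mapping.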

For example, when the action information generation unit 46 accepts from the reception unit 41 the information specifying the trigger "there was a voice input by the user", together with character string information such as 「何かニュースはない？」 ("Isn't there any news?") as the information representing the content of the voice input by the user, it searches the action database for a record containing the information specifying that trigger.

Here, the action information generation unit 46 finds the above record in the action database and compares the comparison character string information contained in the record with the accepted character string information. In this example, the accepted character string information 「何かニュースはない？」 is judged to match the comparison character string information 「＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊」. The action information generation unit 46 therefore acquires the information, contained in the record found by the search, that represents the process to be executed by the server 10, for example:
(Step 1) Acquire news character string information from a predetermined web server on the Internet
(Step 2) Generate an instruction to read the character string information aloud
(Step 3) Generate an instruction to display the animation information to be played during the reading
It then executes the process according to this information.

That is, in accordance with the information read out, the action information generation unit 46 acquires the news character string information from the predetermined web server on the Internet. The action information generation unit 46 also generates an animation information display instruction. This instruction contains information (which may be the file name of each piece of image data) specifying, respectively, the animation image data to be displayed when execution of the action processing performed in parallel starts, the animation image data to be displayed during execution, and the animation image data to be displayed when execution ends.

In this example, the action information generation unit 46 then generates action information containing the action instruction and the parameter information, and outputs it to the instruction transmission unit 47. Here, the action instruction includes the instruction to read the character string information aloud and the animation information display instruction. The parameter information includes the acquired character string information and the information specifying the animation image data.

The information representing the process to be executed by the server 10 may also include a "select a conversational sentence" instruction. When such an instruction is included, the action information generation unit 46 follows it and selects a conversational sentence by, for example, the following method.

From the conversational sentences stored in the conversational sentence queue, the action information generation unit 46 extracts those whose associated utterance conditions are satisfied. The various information needed to judge whether an utterance condition is satisfied, such as the current date and time (the date and time at which the process is being executed) and weather information, may be acquired over the network from an NTP (Network Time Protocol) server or a predetermined web server.
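The extraction step can be illustrated with a short sketch. It assumes, purely for illustration, that each utterance condition is stored as a predicate over a small context object. The names `Context`, `QueuedSentence`, and `extract_candidates` are hypothetical, and the context values are passed in directly rather than fetched from an NTP or web server.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class Context:
    now: datetime       # current date and time
    max_temp_c: float   # today's maximum temperature in degrees Celsius


@dataclass
class QueuedSentence:
    text: str
    condition: Callable[[Context], bool]  # utterance condition


# The two pre-registered examples from the description.
QUEUE = [
    QueuedSentence(
        "It was hot today, wasn't it?",
        lambda c: c.max_temp_c >= 35 and c.now.hour >= 18,
    ),
    QueuedSentence(
        "Isn't it about time to go to bed?",
        # Boundary handling (2:00 inclusive, 4:00 exclusive) is an assumption.
        lambda c: 2 <= c.now.hour < 4,
    ),
]


def extract_candidates(queue: List[QueuedSentence], ctx: Context) -> List[str]:
    """Return the sentences whose utterance conditions hold in this context."""
    return [s.text for s in queue if s.condition(ctx)]
```

Keeping the condition as a callable lets date-range conditions for generated sentences share the same queue as the pre-registered ones.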

The action information generation unit 46 selects one of the conversational sentences extracted as satisfying their utterance conditions, for example at random. The action information generation unit 46 also reads out the conversation history, recorded by the conversation history management unit 45, for the user of the terminal device 20 that sent the request. It then judges whether the conversation would be natural if the selected conversational sentence were uttered following the currently recorded conversation history.

This judgment can be made, for example, by having a neural network trained by machine learning on the text of conversations between humans judge whether the sentence is appropriate as one that follows the currently recorded conversation history, that is, whether the conversation has continuity. Such processing is known as next sentence prediction (NSP). As the neural network for next sentence prediction, the model known as BERT (https://arxiv.org/pdf/1706.03762.pdf) can be used, for example. For training, widely known training data and machine learning methods can be adopted, such as a method that uses pairs of conversational sentences (a first conversational sentence and a second conversational sentence), each pair associated with information representing the continuity between the first and second conversational sentences.

When a neural network is used in this way to judge whether the selected conversational sentence is appropriate as a sentence following the currently recorded conversation history, that is, to judge the continuity of the conversation, the output of the neural network expresses that appropriateness as a numerical value. The action information generation unit 46 therefore judges that the conversation would be natural (that there is continuity) when this value exceeds a predetermined threshold.
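The selection and threshold check can be sketched as below. The continuity scorer is passed in as a plain callable standing in for a trained next-sentence-prediction model; it is not BERT itself. The function name `choose_utterance`, the default threshold of 0.5, and the choice to return `None` when the check fails are assumptions made for this sketch.

```python
import random
from typing import Callable, List, Optional

# A scorer takes (conversation history, candidate sentence) and returns a
# numerical continuity value; a real system would wrap an NSP model here.
Scorer = Callable[[List[str], str], float]


def choose_utterance(
    candidates: List[str],
    history: List[str],
    score: Scorer,
    threshold: float = 0.5,   # assumed value; the source only says "predetermined"
    rng=random,
) -> Optional[str]:
    """Pick one candidate at random and keep it only if it continues the history."""
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    # "Exceeds" the threshold: strictly greater than.
    if score(history, chosen) > threshold:
        return chosen
    return None  # behaviour on failure is an assumption of this sketch
```

Passing the scorer as a parameter keeps the threshold logic testable without loading a model.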

When it judges that there is continuity as described above, the action information generation unit 46 generates an action instruction including an instruction to read character string information aloud and an animation information display instruction. The action information generation unit 46 also generates parameter information including the character string information of the selected conversational sentence and information specifying the animation image data. It then instructs the instruction transmission unit 47 to send the generated action instruction and parameter information to the terminal device 20 as action information.

The instruction transmission unit 47 sends the action information generated by the action information generation unit 46 to the terminal device 20 that is the source of the request information received by the reception unit 41.

Next, the operation of the control unit 31 of the terminal device 20 will be described. In the present embodiment, as illustrated in FIG. 6, the control unit 31 functionally includes a request sending unit 51, an action information receiving unit 52, a voice synthesis unit 53, and an action processing execution unit 54.

When the request sending unit 51 determines that a predetermined trigger has occurred, it collects the information necessary for processing on the server 10 and sends request information containing the collected information, together with information specifying the trigger, to the server 10. Specifically, triggers such as "voice was input by the user" or "a predetermined time has been reached" are enumerated in advance, included in the setting information, and stored in the storage unit 32.

As one example, as illustrated in FIG. 7, the setting information lists, for each trigger, its name (trigger name: N) in association with information related to the processing of the trigger, such as an occurrence condition (C), information (P) specifying the information necessary for processing on the server 10 in relation to the trigger, and an interval time (T).

The request sending unit 51 refers to this setting information and, when it determines that an occurrence condition is satisfied, treats the trigger with that occurrence condition as having occurred and refers to the information (P) specifying the information necessary for processing on the server 10 in relation to that trigger.

The request sending unit 51 then collects the information necessary for processing on the server 10, as specified by the referenced information, and sends request information containing the collected information and information specifying the trigger that occurred (which may be the trigger name) to the server 10.

The action information receiving unit 52 receives the action information from the server 10 and outputs the received action information to the action processing execution unit 54.

The voice synthesis unit 53 synthesizes voice data based on character string information input from the action processing execution unit 54 described later, and outputs the synthesized voice data to the action processing execution unit 54.

The action processing execution unit 54 extracts the action instruction and the parameter information from the action information sent by the server 10, and executes processing in accordance with the action instruction. Specifically, the following describes the case where, as in the example above, the action information receiving unit 52 receives action information containing two items: an action instruction that includes the instruction to read the acquired character string information aloud and the animation information display instruction, and parameter information that includes the acquired character string information and the animation image data.

In this example, the action processing execution unit 54 outputs the acquired character string information to the voice synthesis unit 53 and obtains voice data. The action processing execution unit 54 also reads out, from the storage unit 22, the animation image data specified by the information contained in the action information. The action processing execution unit 54 then outputs the voice data output by the voice synthesis unit 53 to the voice output unit 35 to sound the voice, and outputs the read-out animation image data to the display unit 34 to play the animation.

[Operation]
The information processing system 1 of the present embodiment has the above configuration and operates as in the following example. In the following example, it is assumed that the storage unit 12 of the server 10 stores, as an action database, information representing the process that the server 10 executes to generate action information, associated with each trigger that causes an action request.

In the following example, the action database is assumed to contain, among others, the following information:
・Information specifying the trigger (T): the user is having a conversation
・Process to be executed:
(Step 1) Select the character string information of a conversational sentence
(Step 2) Generate an instruction to read the character string information aloud
(Step 3) Generate an instruction to display the animation information to be played during the reading

The storage unit 32 of the terminal device 20 stores, as setting information, the occurrence condition (C), the information (P) specifying the information necessary for processing on the server 10 in relation to the trigger, and so on, in association with each trigger, as illustrated in FIG. 7.

In the following example, this setting information is assumed to include information such as:
・Information specifying the trigger (trigger name N): there was a voice input by the user
・Occurrence condition (C): the user uttered a predetermined wake word
・Information (P) specifying the information necessary for processing on the server 10:
character string information of the content uttered by the user
…
Here, a wake word is a predetermined word or phrase, such as 「ねえ聞いてよ」 ("Hey, listen") or 「起きてよ」 ("Wake up"), that is to be recognized as the start of voice input when the user utters it. The terminal device 20 may remove the portion corresponding to the wake word from the character string information of the user's utterance that is needed for processing on the server 10.

The operation of the server 10 and the terminal device 20 holding such setting information will now be described with reference to FIGS. 8 and 9.

When the user says to the terminal device 20, for example, 「ねえ聞いてよ。明日はデートなんだけど…」 ("Hey, listen. I have a date tomorrow...") (S11 in FIG. 8), the terminal device 20 executes processing to recognize the user's voice (S12) and acquires character string information corresponding to the voice uttered by the user. As already described, the voice recognition processing need not be performed by the terminal device 20 itself and may instead be performed by accessing a voice recognition service over the network.

The terminal device 20 refers to the setting information and checks whether the occurrence condition of any trigger is satisfied (S13). Here, since the user has uttered the wake word 「ねえ聞いてよ」 ("Hey, listen"), the trigger "there was a voice input by the user" is deemed to have occurred (S13: Yes), and the terminal device 20 collects the character string information of the content uttered by the user in accordance with the setting information. If it determines in step S13 that no trigger's occurrence condition is satisfied (S13: No), the terminal device 20 ends the processing.

Here, since the character string information of the content uttered by the user has already been acquired in step S12, the terminal device 20 sends request information containing that character string information and the information specifying the trigger that occurred (the trigger name "there was a voice input by the user") to the server 10 (S14).

The server 10 receives the request information from the terminal device 20. The server 10 then judges whether the character string information contained in the request information is information relating to an event by comparing it with a predetermined pattern of information relating to events (S15).

Suppose here that the pattern is 「＊［明日｜明後日｜週末｜来週］［Ｄ＋日］［Ｗ＋曜］［日］［は｜に］Ｗ＋［が］［ある｜なん］＊」. Since the accepted character string information is 「ねえ聞いてよ。明日はデートなんだけど…」, the server 10 judges that it matches the pattern. That is, this character string information is judged to be information relating to an event (S15: Yes).

If the information is judged not to relate to an event (S15: No), the server 10 executes other processing, that is, the processing obtained by referring to the action database.

The server 10 then extracts date and time information and event specific information from this character string information (S16). Since the utterance contains 「明日」 ("tomorrow"), the server 10 obtains the date and time information "December 16", the day after the date on which this processing is being executed (for example, December 15). The server 10 also extracts 「デート」 ("date") from the character string information as the event specific information.

Next, the server 10 generates a conversational sentence. First, as the utterance period of the conversational sentence, the server 10 randomly acquires one number of days from the candidates predetermined in the number-of-days data table (S17: determination of the utterance period). Here, it is assumed that "7 days" is acquired as the number-of-days candidate. In the processing of step S16, the server 10 also acquires the utterance period length (here, "7 days") and the date and time demonstrative (here, 「この間の」, "the other day's") recorded in the number-of-days data table in association with the acquired candidate. Using the date and time information extracted in step S16 (December 16), the acquired number-of-days candidate, and the utterance period length, it determines the utterance period to be "from December 23 to December 30".
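The date arithmetic in this example can be checked with a short sketch. The function names and the offset table are hypothetical, and the year 2020 is an assumption added only so that the dates are concrete; the description gives only the month and day.

```python
from datetime import date, timedelta
from typing import Tuple

# Hypothetical mapping from relative-day words to day offsets.
RELATIVE_DAY_OFFSETS = {"明日": 1, "明後日": 2}  # tomorrow, the day after tomorrow


def resolve_relative_day(word: str, today: date) -> date:
    """Turn a relative-day word into a calendar date (S16)."""
    return today + timedelta(days=RELATIVE_DAY_OFFSETS[word])


def utterance_period(event_date: date, days_after: int, length: int) -> Tuple[date, date]:
    """Start `days_after` days after the event and last `length` days (S17)."""
    start = event_date + timedelta(days=days_after)
    end = start + timedelta(days=length)
    return start, end


# Worked example from the description (year assumed to be 2020).
event = resolve_relative_day("明日", date(2020, 12, 15))   # December 16
period = utterance_period(event, days_after=7, length=7)   # December 23 to December 30
```

With a number-of-days candidate of 7 and an utterance period length of 7, the period starts 7 days after December 16 and ends 7 days later, which reproduces the range given in the text.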

In the above processing for determining the utterance period, one number of days is randomly chosen from the candidates predetermined in the number-of-days data table. The present embodiment is not limited to this, however, and the server 10 may instead acquire, as the utterance period, a number of days predetermined in association with the event specific information acquired in step S16.

In this example, a data table is stored in the storage unit 12 in advance that specifies, for instance, a relatively short "4 days" when the event specific information is "date" (an event that can occur relatively frequently and whose lifetime as a topic is considered relatively short), and a relatively long "14 days" when it is "travel" (an event whose lifetime as a topic is considered relatively long). The server 10 refers to this data table to set the utterance period.

In yet another example, the utterance period may be determined by referring to a data table that stores a plurality of number-of-days candidates in association with each piece of event specific information acquired in step S16.

In this example, a data table is stored in the storage unit 12 in advance that specifies, when the event specific information is "date" (an event that can occur relatively frequently and whose lifetime as a topic is considered relatively short), the relatively short candidates "1 day, 2 days, 3 days, 4 days", and, when it is "travel" (an event whose lifetime as a topic is considered relatively long), candidates extending to relatively long periods, "1 day, 2 days, ..., 13 days, 14 days".

The server 10 then randomly selects one number of days from the candidates predetermined in the data table in association with the event specific information acquired in step S16, and sets the selected number of days as the utterance period.

In this example, when the event specified by the event specific information acquired in step S16 is "date", an utterance period of 1 to 4 days is set, and when it is "travel", an utterance period of 1 to 14 days is set.
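A minimal sketch of this per-event candidate table follows. The table contents come from the example above, while the dictionary keys, the function name `pick_days`, and the injectable random generator are assumptions made for illustration.

```python
import random
from typing import Dict, List

# Candidate numbers of days per event, as in the example above.
DAY_CANDIDATES: Dict[str, List[int]] = {
    "date": list(range(1, 5)),     # 1, 2, 3, 4
    "travel": list(range(1, 15)),  # 1, 2, ..., 14
}


def pick_days(event: str, rng=random) -> int:
    """Randomly choose one candidate number of days for the given event.

    Raises KeyError for events not in the table; how unknown events are
    handled is not covered by this sketch.
    """
    return rng.choice(DAY_CANDIDATES[event])
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible when testing.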

The server 10 generates a conversational sentence using the date and time demonstrative associated with the selected number-of-days candidate, the event specific information output by the event information acquisition unit 43, and information on predetermined candidate conversational sentence patterns (S18). Here, the server 10 selects one of a plurality of predetermined candidate conversational sentence patterns and inserts the date and time demonstrative (「この間の」) and the event specific information (「デート」) into it, generating a conversational sentence such as
「そういえば、この間のデートは、どうだった？」 ("By the way, how was your date the other day?").
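The slot-filling step can be sketched with ordinary string templates. The first pattern below reproduces the example sentence; the second pattern, the placeholder names `when` and `event`, and the function name `generate_sentence` are hypothetical.

```python
import random
from typing import List

# Candidate conversational sentence patterns with two slots:
# {when} for the date and time demonstrative, {event} for the event.
PATTERNS: List[str] = [
    "そういえば、{when}{event}は、どうだった？",  # pattern behind the example sentence
    "{when}{event}、楽しかった？",                # hypothetical second candidate
]


def generate_sentence(when: str, event: str, patterns: List[str] = PATTERNS, rng=random) -> str:
    """Pick one pattern at random and insert the demonstrative and the event."""
    return rng.choice(patterns).format(when=when, event=event)
```

For instance, `generate_sentence("この間の", "デート", patterns=PATTERNS[:1])` yields the example sentence from the description, since only the first pattern is available to choose from.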

The server 10 associates the utterance period determined in step S17, as an utterance condition, with the conversational sentence generated in step S18, and accumulates the sentence in the conversational sentence queue stored in the storage unit 12 (S19).

As a result of this processing, the conversational sentence queue holds, along with preset conversational sentences such as
(1) Utterance condition: the maximum temperature was 35 degrees or higher, and the time is 18:00 or later
Conversational sentence: "It was hot today, wasn't it?"
(2) Utterance condition: the time is between 2:00 a.m. and 4:00 a.m.
Conversational sentence: "Isn't it about time to go to bed?"
…
the following conversational sentence:
(n) Utterance condition: within the utterance period from December 23 to December 30
Conversational sentence: "By the way, how was your date the other day?"

During this operation as well, the server 10 sequentially records the content of the voice input by the user and the content of the conversational sentences uttered by the user's terminal device 20, generating conversation history information. For example, when the following conversation takes place with the user, the information of that conversation history is retained.

That is, the server 10 records conversation history information such as:
User: "Hey, listen. I have a date tomorrow."
Utterance of terminal device 20: "Oh, is that so?"
User: "I wonder what the weather will be like."
Utterance of terminal device 20: "It looks like it will be sunny tomorrow."

The server 10 may execute other processing, that is, the processing obtained by referring to the action database, in parallel with the processing of steps S16 to S19.

Later, for example on December 24, the user says to the terminal device 20 something like "Get up. Any news?" (S21 in FIG. 9). The terminal device 20 performs speech recognition on the user's voice (S22) and obtains character string information corresponding to what the user said.

The terminal device 20 also refers to the setting information and checks whether the occurrence condition of any trigger has been satisfied (S23). Here, because the user has spoken the wake word, the trigger "there was a voice input by the user" is treated as having occurred (S23: Yes), and, in accordance with the setting information, the terminal device 20 collects the character string information "Get up. Any news?" representing what the user said. The terminal device 20 then sends request information containing this character string information and information identifying the trigger (the trigger name "there was a voice input by the user") to the server 10 (S24).

The server 10 compares the character string information "起きてよ。何かニュースある？" ("Get up. Any news?") sent by the terminal device 20 with the comparison character string information "＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊" (a pattern built around the word ニュース, "news") associated with the trigger "there was a voice input by the user", and determines whether the two match (S25).

Here, the character string information "Get up. Any news?" received from the terminal device 20 matches the comparison character string information. The server 10 therefore determines in step S25 that the two match, and starts the processing identified by the information associated with the trigger "there was a voice input by the user" and with that comparison character string information.

For example, the server 10 acquires news character string information from a predetermined web server on the Internet and sends it to the terminal device 20 as action information, together with an instruction to read that character string information aloud (S26).

In accordance with this instruction, the terminal device 20 reads the news aloud (S27). Here, for example, a news item such as "Warm days are continuing from last week, but a cold wave is expected to arrive at the end of the year" is read out.

The server 10 records the character string information exchanged with the terminal device 20 up to this point as conversation history. In the above example, the following history is recorded:
User: "Get up. Any news?"
Utterance of terminal device 20: "They say, 'Warm days are continuing from last week, but a cold wave is expected to arrive at the end of the year.'"

In the processing up to this point, the server 10 determines whether the character string information contained in the request information relates to an event by comparing it with a predetermined pattern for event-related information. The character string information "Get up. Any news?" does not match the pattern "＊［明日｜明後日｜週末｜来週］［Ｄ＋日］［Ｗ＋曜］［日］［は｜に］Ｗ＋［が］［ある｜なん］＊" (whose leading alternatives are "tomorrow", "the day after tomorrow", "weekend", and "next week"), so it is not treated as event-related information and the corresponding processing is not performed.

When the user then says to the terminal device 20 something like "Ah, it certainly was warm last week" (S28), the terminal device 20 obtains character string information corresponding to what the user said. Treating this, for example, as an occurrence of the trigger "the user is having a conversation", the terminal device 20 sends to the server 10, in accordance with the setting information, request information containing the character string information "Ah, it certainly was warm last week" and information identifying the trigger (S29).

The server 10 again determines whether the character string information contained in this request information relates to an event by comparing it with the predetermined pattern for event-related information. This character string information does not match the pattern either, so it is not treated as event-related information and the corresponding processing is not performed.
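The event check can be illustrated with a simplified English analogue. This is a sketch only and does not reproduce the Japanese pattern quoted in the description: the word lists `DAY_OFFSETS` and `EVENT_WORDS` and the function `extract_event` are hypothetical, and an utterance is treated as event-related only when it contains both a day word and an event word.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

# Hypothetical vocabularies; longer phrases come first so that
# "the day after tomorrow" is not mistaken for "tomorrow".
DAY_OFFSETS = {"the day after tomorrow": 2, "tomorrow": 1}
EVENT_WORDS = ("date", "meeting", "trip")


def extract_event(text: str, today: date) -> Optional[Tuple[date, str]]:
    """Return (event date, event word) if the text mentions both, else None."""
    lowered = text.lower()
    offset = next(
        (days for phrase, days in DAY_OFFSETS.items()
         if re.search(rf"\b{re.escape(phrase)}\b", lowered)),
        None,
    )
    event = next(
        (word for word in EVENT_WORDS if re.search(rf"\b{word}\b", lowered)),
        None,
    )
    if offset is None or event is None:
        return None  # not event-related; no further event processing
    return today + timedelta(days=offset), event
```

With this sketch, "Hey, listen. I have a date tomorrow." yields the next day's date paired with "date", while "Get up. Any news?" and "Ah, it certainly was warm last week" both yield `None`, mirroring the two non-matching requests in the walkthrough.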

Meanwhile, the server 10 executes the processing identified by the information associated with the trigger "the user is having a conversation": it selects the character string information of a conversational sentence and sends the terminal device 20 an instruction to utter it (S30).

That is, as illustrated in FIG. 10, in step S30 the server 10 extracts, from the conversational sentences stored in the conversational sentence queue, those whose associated utterance condition is satisfied (S41). From the extracted sentences, the server 10 then selects one that has not yet been selected, for example at random (S42).

The server 10 then refers to the conversation history of the user of the terminal device 20 that sent the request, and determines whether the conversation would be natural if the conversational sentence selected in step S42 were uttered immediately after the currently recorded history (S43).

Specifically, the server 10 performs next-sentence prediction using a BERT model trained by machine learning on the text of conversations between humans, and obtains a numerical value representing how plausible the selected conversational sentence is as the sentence following the currently recorded conversation history (the more plausible, the larger the value). When the obtained value exceeds a predetermined threshold, the server 10 determines that the conversation will be natural (continuity present); otherwise it determines that the conversation will not be natural (no continuity).

If the server 10 determines in step S43 that the conversation will be natural when the selected conversational sentence is uttered (S43: Yes), it generates action information containing an instruction to read the character string information of the selected sentence aloud, and sends that action information to the terminal device 20 (S44).

If the server 10 determines in step S43 that the conversation will not be natural when the selected conversational sentence is uttered (S43: No), it returns to step S42 and continues processing. If no unselected conversational sentence remains in step S42, the server 10 executes predetermined processing, such as generating action information containing an instruction to read aloud the character string information of a predetermined conversational sentence and sending it to the terminal device 20 (S45: execution of default processing).
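The loop over steps S42 to S45 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `choose_utterance` is a hypothetical name, the `score` callable stands in for the BERT next-sentence prediction (any function returning a larger value for a more plausible continuation will do), and `candidates` is assumed to be the output of step S41.

```python
import random
from typing import Callable, List, Optional, Sequence


def choose_utterance(
    candidates: Sequence[str],                      # S41 result: conditions already satisfied
    history: Sequence[str],                         # recorded conversation history
    score: Callable[[Sequence[str], str], float],   # stand-in for next-sentence prediction
    threshold: float,
    fallback: str,                                  # predetermined sentence used in S45
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random()
    remaining: List[str] = list(candidates)
    while remaining:
        # S42: pick one not-yet-selected sentence at random
        pick = remaining.pop(rng.randrange(len(remaining)))
        # S43: continuity is present only if the score exceeds the threshold
        if score(history, pick) > threshold:
            return pick        # S44: this sentence is sent for utterance
    return fallback            # S45: default processing when nothing remains
```

Because each rejected sentence is removed from `remaining`, the loop terminates after at most one scoring call per candidate.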

Returning to FIG. 9, the terminal device 20 reads the character string information aloud in accordance with the instruction in the action information (S31).

As a specific example, in step S43 the server 10 takes the conversation history
User: "Get up. Any news?"
Utterance of terminal device 20: "They say, 'Warm days are continuing from last week, but a cold wave is expected to arrive at the end of the year.'"
User: "Ah, it certainly was warm last week"
and determines whether the conversational sentence "By the way, how was your date the other day?", which is one of the sentences whose utterance condition is satisfied, follows it naturally.

If the conversation is determined to have continuity, the server 10 instructs the terminal device 20 to utter this conversational sentence, so the conversation as a whole proceeds as follows:
User: "Get up. Any news?"
Utterance of terminal device 20: "They say, 'Warm days are continuing from last week, but a cold wave is expected to arrive at the end of the year.'"
User: "Ah, it certainly was warm last week"
Utterance of terminal device 20: "By the way, how was your date the other day?"

In this way, because conversation about the user's past plans appears, the present embodiment can realize conversation tailored to the user and evoke a sense of familiarity.

[Standalone terminal device]
In the description so far, the terminal device 20 recognizes the voice uttered by the user and sends the resulting character string information to the server 10, and the server 10 generates and provides the character string information (conversational sentence) from which the voice data to be uttered by the terminal device 20 is produced.

In one aspect of the embodiment of the present invention, however, the terminal device 20 itself may function as the information processing device. In that case, the server 10 is not necessarily required.

In this example, the control unit 31 of the terminal device 20 implements the functions of the request sending unit 51, the action information receiving unit 52, the voice synthesis unit 53, and the action processing execution unit 54, and also operates as the receiving unit 41, the event management unit 42, the event information acquisition unit 43, the conversation sentence generation unit 44, the conversation history management unit 45, and the action information generation unit 46.

In this case, all of the data described above as being stored in the storage unit 12, such as the action database and the conversational sentence queue, is stored in the storage unit 32 of the terminal device 20.

In this example, the request sending unit 51 outputs the request information to the receiving unit 41 implemented by the control unit 31 itself, and the receiving unit 41 accepts and processes that request information. Likewise, the action information generation unit 46 outputs the generated action information to the action information receiving unit 52 implemented by the control unit 31 itself.

This information may be exchanged over the network interface (communication unit 36) through local loopback, or, without using the network, by writing it to the storage unit 32 and reading it back.

[Stopping an utterance]
The terminal device 20 may also interrupt playback of voice data (including voice data synthesized from character string information received from the server 10) when the user performs a predetermined operation on the terminal device 20 during playback.

If the sensor unit 33 includes a touch sensor, for example, this predetermined operation may be touching the touch sensor. In this example, when the terminal device 20 detects during playback of voice data that the user has touched the touch sensor, it interrupts playback.

In another example, the predetermined operation may be the user uttering a predetermined word. For example, if the terminal device 20 determines during playback of voice data that the user has uttered a word asking for quiet, such as "Shh", it interrupts playback.

In yet another example, the information processing system 1 may be controlled so that certain conversational sentences are not uttered, even without any operation by the user.

Specifically, the sensor unit 33 of the terminal device 20 may include a human presence sensor capable of detecting the number of people (such devices are widely known, so a detailed description is omitted). The number of people this sensor detects around the terminal device 20 may then be used in selecting a conversational sentence.

In this example, when the server 10 or terminal device 20 performing the function of the conversation sentence generation unit 44 generates a conversational sentence from the information output by the event information acquisition unit 43 and registers it in the conversational sentence queue, it includes in the utterance condition, in addition to the utterance period, a condition on the number of people detected by the terminal device 20. As one example, this condition is that the number of people is "1" (a single person).

When the conversational sentence is selected by the server 10, the terminal device 20 includes headcount information, representing the number of people detected in its surroundings, in the request information it sends to the server 10.

When the server 10 or terminal device 20 performing the function of the action information generation unit 46 extracts, from the conversational sentences stored in the conversational sentence queue, those whose associated utterance condition is satisfied, it selects sentences that also satisfy the condition on this headcount information.

If, as in the above example, the utterance condition of a conversational sentence relating to past event information includes the condition that the number of people is "1", such sentences are not selected while several people are present around the terminal device 20. This reduces the chance of the user's private information being disclosed.
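The combined check of utterance period and headcount might look like the following minimal sketch. The class name `UtteranceCondition` and its fields are hypothetical, and the year is arbitrary; the source specifies only that the headcount condition is evaluated in addition to the utterance period.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UtteranceCondition:
    start: date
    end: date
    required_headcount: Optional[int] = None  # None means no headcount restriction

    def satisfied(self, today: date, headcount: int) -> bool:
        """True only if today is inside the utterance period and the headcount matches."""
        if not (self.start <= today <= self.end):
            return False
        return self.required_headcount is None or headcount == self.required_headcount


# A sentence about a past private event: speak only when exactly one person is present.
private = UtteranceCondition(date(2020, 12, 23), date(2020, 12, 30), required_headcount=1)
```

With this condition, `private.satisfied(date(2020, 12, 24), 1)` holds, whereas the same date with two people detected does not, so the sentence stays out of the candidate set.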

[Expression of emotion]
In one example of the present embodiment, the action information may further contain information relating to the emotion of the character displayed on the display unit 34 of the terminal device 20. This emotion information may be, for example, information identifying joy (Joy) or sadness (Sad). It may also include information representing the absence of emotion (flat).

The terminal device 20 refers to the emotion information contained in the received action information and selects the eye image data to display on the display unit 34. For example, when the emotion information is "Sad", the terminal device 20 selects and displays, from among the eye image data, an animation of eyes shedding tears.

In one example of the present embodiment, when determining the emotion information to include in the action information, the server 10 or terminal device 20 that generates the character string information to be uttered may determine the emotion from the words contained in that character string information. This can be done, for example, as polarity determination of the sentence represented by the character string information, based on the words it contains. Specifically, when news character string information is to be uttered and it contains a word representing a tragic event, the news character string information is included in the action information together with information identifying the emotion of sadness.
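A word-based polarity decision of this kind can be sketched with two small word lists. Everything here is an assumption for illustration: the lexicons `NEGATIVE_WORDS` and `POSITIVE_WORDS`, the function names, and the dictionary layout of the action information are hypothetical; only the labels "Joy", "Sad", and "flat" come from the description.

```python
import re
from typing import Dict

# Hypothetical lexicons; a real system would use a much larger polarity dictionary.
NEGATIVE_WORDS = {"accident", "disaster", "tragedy"}
POSITIVE_WORDS = {"victory", "celebration", "award"}


def pick_emotion(text: str) -> str:
    """Count lexicon hits and return "Sad", "Joy", or "flat"."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    negative = len(words & NEGATIVE_WORDS)
    positive = len(words & POSITIVE_WORDS)
    if negative > positive:
        return "Sad"
    if positive > negative:
        return "Joy"
    return "flat"


def build_action(text: str) -> Dict[str, str]:
    """Bundle the text to be read aloud with the emotion to display."""
    return {"speak": text, "emotion": pick_emotion(text)}
```

The terminal side would then map the `emotion` value to eye image data, for example a tearful animation for "Sad".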

[Deletion of information]
In the present embodiment, the server 10 or terminal device 20 that stores the conversational sentence queue may, at predetermined intervals, delete from the queue any conversational sentence associated with an utterance period that has already ended.

Furthermore, among the entries in the conversational sentence queue, a conversational sentence stored by the conversation sentence generation unit 44 may be deleted from the queue once it has been uttered. This prevents the same conversational sentence from being played repeatedly.
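The two deletion rules can be sketched as pure functions over the queue. The `Entry` type and both function names are hypothetical, and preset sentences are assumed to carry no utterance period (`period_end` of `None`) and to remain in the queue after being spoken.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Entry:
    text: str
    period_end: Optional[date]  # None for preset sentences without an utterance period
    generated: bool             # True if stored by the conversation sentence generation unit


def prune_expired(queue: List[Entry], today: date) -> List[Entry]:
    """Periodic cleanup: drop entries whose utterance period has already ended."""
    return [e for e in queue if e.period_end is None or e.period_end >= today]


def remove_spoken(queue: List[Entry], spoken: Entry) -> List[Entry]:
    """After an utterance: drop the entry only if it was a generated sentence."""
    if not spoken.generated:
        return list(queue)
    return [e for e in queue if e is not spoken]
```

Returning new lists rather than mutating in place keeps the sketch easy to test; an actual queue might instead be updated in its backing store.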

[Acquisition from schedule information]
In the description of the present embodiment so far, the server 10 or terminal device 20 functioning as the event information acquisition unit 43 extracts date and time information and event specific information from the character string information representing the user's voice input when that character string information matches the pattern for event-related information. However, the present embodiment is not limited to this.

In one example of the present embodiment, the event information acquisition unit 43 may, instead of or in addition to the above processing, acquire predetermined schedule information and extract date and time information and event specific information from it. Specifically, the server 10 or terminal device 20 functioning as the event information acquisition unit 43 is granted by the user, in advance, access rights to the web service in which the user registers schedule information. The terminal device 20 then acquires the schedule information from the web service in accordance with that setting.

When the acquired schedule information contains date and time information and event specific information identifying an event involving the user at that date and time (for example, "date" or "work"), the event information acquisition unit 43 extracts the date and time information and the event specific information and outputs them to the conversation sentence generation unit 44.

According to this example, conversation based on the user's past plans can be provided even when those plans never came up in conversation with the user.

[Using a wide utterance period]
In the description so far, the conversation sentence generation unit 44 generates the entire conversational sentence in advance and stores it in the conversational sentence queue. Accordingly, the conversation sentence generation unit 44 includes in the sentence in advance, in accordance with the utterance period, a relative expression ("yesterday", "the day before yesterday", and so on) referring to the day on which the past event that is the subject of the conversation occurred.

However, the present embodiment is not limited to this. The conversation sentence generation unit 44 may instead generate a conversational sentence template that marks only the position of the word representing the day on which the past event occurred, and store the template in the conversational sentence queue. In this case, after a conversational sentence has been selected, the action information generation unit 46 may, when generating the instruction to utter it, generate the word representing the day of the past event as seen from that point in time.

In this example, the event information acquisition unit 43 has the event management unit 42 record the acquired event information and issue an event identifier, unique to the event, that identifies the recorded event.

When generating a conversational sentence relating to the event information (including the date and time information and the event specific information) acquired by the event information acquisition unit 43, the conversation sentence generation unit 44 generates a template that marks only the position of the word representing the day on which the past event occurred, and stores it in the conversational sentence queue in association with the corresponding event identifier. The template contains a placeholder to be replaced with a word specifying the date and time, as in "How was your date <date and time>?".

When the action information generation unit 46 later selects a conversational sentence associated with this event identifier as the sentence to utter, it refers to the event identifier associated with the selected sentence and obtains from the event management unit 42 the event information corresponding to that identifier (including at least its date and time information).

The action information generation unit 46 then generates a word that specifies that date and time as seen from the current date and time, and substitutes it for the placeholder in the selected conversational sentence. If the words are defined in advance by the number of elapsed days, for example "yesterday" for 1 day, "the day before yesterday" for 2 days, "the other day" for 3 to 6 days, "last week" for 7 to 13 days, and so on, the word can be obtained from the difference (interval in days) between the current date and time and the date and time at which the event occurred.

The action information generation unit 46 then generates action information containing an instruction to utter the conversational sentence with the date-specifying word filled in.
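The late substitution of the date word can be sketched as follows. The function names and the literal placeholder string are assumptions; the day ranges are the four given in the description, and intervals outside them return `None` because the source leaves the remaining ranges unspecified.

```python
from datetime import date
from typing import Optional

PLACEHOLDER = "<date and time>"  # assumed literal form of the placeholder


def relative_day_word(event_day: date, today: date) -> Optional[str]:
    """Map the interval in days to a relative expression, per the ranges in the text."""
    days = (today - event_day).days
    if days == 1:
        return "yesterday"
    if days == 2:
        return "the day before yesterday"
    if 3 <= days <= 6:
        return "the other day"
    if 7 <= days <= 13:
        return "last week"
    return None  # further ranges are not specified in the source


def fill_template(template: str, event_day: date, today: date) -> Optional[str]:
    """Replace the placeholder with the word computed at utterance time."""
    word = relative_day_word(event_day, today)
    if word is None:
        return None
    return template.replace(PLACEHOLDER, word)
```

Because the word is computed when the sentence is selected, the same queued template yields "yesterday" one day after the event and "last week" eight days after it, which is what allows a wide utterance period.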

[Features of the embodiment]
The present embodiment is also characterized as follows. One aspect of the present embodiment is an information processing device comprising: acquisition means for acquiring event specific information, such as "date", that identifies a user's event and is associated with date and time information; and conversation sentence generation means for generating, in processing for uttering at a date and time later than the date and time represented by the date and time information, a related conversational sentence, such as "How was your date last week?", that is a conversational sentence related to the event specific information.

This information processing device may be implemented as a terminal device provided to each user, or as a server that is communicably connected to such a terminal device and implements each of the above means.

According to this example, the information processing device provides conversational sentences about past plans, so conversation about past events is realized. This allows conversation with content that is more personal to the user and can evoke a sense of familiarity.

The device here also comprises accumulation means that, when a conversational sentence input by the user contains an event-related word, such as "date", relating to a predetermined event and a date-and-time-related word, such as "next weekend", that specifies a date and time, stores the event specific information related to the event-related word in association with the date and time information represented by the date-and-time-related word.

In this example, information about events can be accumulated simply from what the user says in conversation, without the user having to register it as schedule information or the like.

Furthermore, the conversation sentence generation means may record the related conversational sentence, such as "How was your date last week?", in association with information representing the period during which it is to be uttered, and the device may include execution means that, when the current date and time falls within that period, determines whether a conversational sentence input by the user and the related conversational sentence have continuity and, when they are determined to have continuity, executes utterance processing of the related conversational sentence.

According to this example, conversation about past plans appears at a moment judged to follow naturally, so the conversation feels natural.

Here, the execution means may determine whether the accepted conversational sentence and the related conversational sentence have continuity by using a neural network trained by machine learning on training data in which a first conversational sentence, a second conversational sentence, and information representing the continuity of the first and second conversational sentences are associated with one another.

Further, if the user makes a predetermined input, such as an action that forcibly stops the utterance, while the related conversational sentence is being uttered, the utterance may be stopped. This allows an utterance to be stopped when it is not appropriate.

The device may further include a playback device that utters the generated related conversational sentence and a means for detecting persons located in the vicinity of the playback device, and may cause the playback device to utter the related conversational sentence only when exactly one person is located in its vicinity. This suppresses utterances in inappropriate situations.
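
The two safeguards above (stopping on a predetermined user input, and uttering only when exactly one person is nearby) can be combined in a short sketch. The function `utter` and its parameters are hypothetical: `people_nearby` stands for the output of whatever person detection the device performs, `stop_requested` for a check of the user's stop input, and `speak` for the playback device.

```python
from typing import Callable, Iterable


def utter(
    words: Iterable[str],
    people_nearby: int,
    stop_requested: Callable[[], bool],
    speak: Callable[[str], None],
) -> int:
    """Speak `words` one at a time and return how many were actually spoken.

    Nothing is spoken unless exactly one person is nearby, and playback
    ends as soon as `stop_requested()` reports a stop input.
    """
    if people_nearby != 1:
        return 0
    spoken = 0
    for word in words:
        if stop_requested():
            break
        speak(word)
        spoken += 1
    return spoken
```

Checking the stop input between words is an assumption of this sketch. A real playback device would more likely interrupt the audio stream itself.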

1 information processing system, 10 server, 11 control unit, 12 storage unit, 13 communication unit, 20 terminal device, 21 leg portion, 22 main body, 31 control unit, 32 storage unit, 33 sensor unit, 34 display unit, 35 audio output unit, 36 communication unit, 37 drive unit, 41 receiving unit, 42 event management unit, 43 event information acquisition unit, 44 conversational sentence generation unit, 45 conversation history management unit, 46 action information generation unit, 47 instruction transmission unit, 51 request sending unit, 52 action information receiving unit, 53 speech synthesis unit, 54 action processing execution unit.

Claims

An information processing device comprising:
an acquisition means for acquiring event-specific information that identifies an event of a user and is associated with date and time information; and
a conversational sentence generation means for generating a related conversational sentence related to the event-specific information, in processing for uttering at a date and time later than the date and time represented by the date and time information.

The information processing device according to claim 1,
further comprising an accumulation means that, when a conversational sentence input by the user includes an event-related word related to a predetermined event and a date-time-related word that specifies a date and time, associates the event-specific information related to the event-related word with the date and time information represented by the date-time-related word and accumulates them.

The information processing device according to claim 1 or 2, wherein
the conversational sentence generation means records the related conversational sentence in association with information indicating a period during which the related conversational sentence is to be uttered, and
the information processing device includes an execution means that, when the current date and time falls within the period, executes utterance processing for the related conversational sentence if the conversational sentence input by the user and the related conversational sentence have continuity.

The information processing device according to claim 3, wherein
the execution means determines whether the input conversational sentence and the related conversational sentence have continuity by using a neural network machine-learned on learning data in which a first conversational sentence, a second conversational sentence, and information representing the continuity between the first and second conversational sentences are associated with one another.

The information processing device according to any one of claims 1 to 4, wherein
the utterance is stopped when the user makes a predetermined input while the related conversational sentence is being uttered.

The information processing device according to any one of claims 1 to 5, further comprising:
a playback device that utters the generated related conversational sentence; and
a means for detecting persons located in the vicinity of the playback device,
wherein the playback device is caused to utter the related conversational sentence only when exactly one person is located in the vicinity of the playback device.

A program that causes a computer to function as:
an acquisition means for acquiring event-specific information that identifies an event of a user and is associated with date and time information; and
a conversational sentence generation means for generating a related conversational sentence related to the event-specific information, in processing for uttering at a date and time later than the date and time represented by the date and time information.