JP2025071574A - Vehicle Dialogue System

Info

Publication number: JP2025071574A
Application number: JP2023181854A
Authority: JP
Inventors: 順雄藤田; Yorio Fujita
Original assignee: Yazaki Corp
Current assignee: Yazaki Corp
Priority date: 2023-10-23
Filing date: 2023-10-23
Publication date: 2025-05-08

Abstract

To provide a vehicle dialogue system that can give occupants a more natural dialogue experience. SOLUTION: A vehicle dialogue system (1) using generative AI (10) includes: a voice input unit (2) that inputs the voice spoken by an occupant; a voice recognition unit (41) that converts the voice input by the voice input unit (2) into first text data; an emotion estimation unit (45) that receives at least one of the voice and the first text data, estimates the occupant's emotion, and generates second text data representing the estimation result; a control unit (42) that inputs the first text data and the second text data, as input information (S1), to the generative AI (10); a voice synthesis unit (43) that converts response information (S2) from the generative AI (10) into voice; and a voice output unit (5) that outputs the voice converted by the voice synthesis unit (43). SELECTED DRAWING: Figure 1

Description

The present invention relates to a vehicle dialogue system.

A voice dialogue system for automobiles has been proposed in which, when a user speaks to the system, the system selects a suitable answer from pre-registered dialogue scenarios, converts it into voice, and responds (Patent Document 1).

Such a voice dialogue system selects an appropriate answer from the pre-recorded dialogue scenarios even when the user's utterance is somewhat ambiguous. However, it outputs the selected answer in the same tone regardless of the situation, so the driver may find the answer unnatural.

Patent Document 1: JP 2018-54790 A

The present invention has been made in view of the above circumstances, and its object is to provide a vehicle dialogue system that can give occupants a more natural dialogue experience.

To achieve the above object, the vehicle dialogue system according to the present invention has the following features.
A vehicle dialogue system that is mounted on a vehicle and uses a generative AI that, when input information consisting of text data is input, outputs response information consisting of text data, the vehicle dialogue system comprising:
a voice input unit that inputs a voice uttered by an occupant;
a voice recognition unit that converts the voice input by the voice input unit into first text data;
an emotion estimation unit that receives at least one of the voice and the first text data, estimates an emotion of the occupant, and generates second text data representing the estimation result;
a control unit that inputs the first text data and the second text data to the generative AI as the input information;
a voice synthesis unit that converts the response information from the generative AI into voice; and
a voice output unit that outputs the voice converted by the voice synthesis unit.

According to the present invention, it is possible to provide a vehicle dialogue system that can give the driver or other occupants a more natural dialogue experience.

The present invention has been briefly described above. Its details will become clearer by reading the following description of the mode for carrying out the invention (hereinafter referred to as the "embodiment") with reference to the attached drawings.

FIG. 1 is a block diagram showing a vehicle dialogue system according to a first embodiment.
FIG. 2 is a flowchart showing a processing procedure of the microcomputer shown in FIG. 1.
FIG. 3 is a state transition diagram of an agent.
FIG. 4 is a diagram explaining the states of the agent.
FIG. 5 is a block diagram showing a vehicle dialogue system according to a second embodiment.

Specific embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram showing an embodiment of the vehicle dialogue system of the present invention. The following description takes the case where the driver speaks as an example, but it applies equally when any occupant of the vehicle, such as a passenger, speaks.

The vehicle dialogue system 1 of this embodiment is mounted on a vehicle and uses a generative AI (artificial intelligence) 10 to converse with the driver. The generative AI 10 is, for example, ChatGPT; when input information S1 consisting of text data is input, it outputs response information S2 consisting of text data.

The vehicle dialogue system 1 includes a microphone 2 as a voice input unit, a communication module 3, a microcomputer 4 that functions as an agent, a speaker 5 as a voice output unit, and a display unit 7. The microphone 2 inputs the voice spoken by the driver to the microcomputer 4. The communication module 3 communicates with the generative AI 10 via an Internet communication network (not shown) and consists of circuits, an antenna, and the like for connecting to that network.

The microcomputer 4 has memory such as RAM (Random Access Memory) and ROM (Read Only Memory) and a CPU (Central Processing Unit) that operates according to a program stored in the memory, and it controls the vehicle dialogue system 1 as a whole.

The microcomputer 4 has a voice recognition unit 41; a voice dialogue unit 42, which serves as the control unit and performs input control for the generative AI 10, control of connected devices, and output control for those devices; a voice synthesis unit 43; and an emotion estimation unit 45. The voice recognition unit 41 converts the voice signal input from the microphone 2 into text data (first text data) and inputs it to the voice dialogue unit 42 and the emotion estimation unit 45. To the text data converted by the voice recognition unit 41, the voice dialogue unit 42 appends text data corresponding to the driver's emotion determined by the emotion estimation unit 45 (second text data) and text data indicating the state of the driver or of the vehicle at the time the driver spoke, or an instruction corresponding to that state (third text data), and inputs the result to the generative AI 10 as input information S1.

The voice dialogue unit 42 also receives vehicle information S3 from multiple sensors mounted on the vehicle. Possible sensors include an illuminance sensor that measures the illuminance outside the vehicle, a temperature sensor that measures the outside air temperature, a GPS (Global Positioning System) that detects the vehicle position, a seating sensor that detects whether a person is sitting in a seat, and a speed sensor that detects the vehicle speed. The vehicle information S3 is not limited to raw measured values: it may be a moving average or weighted average of the measured values, or a value calculated from the measured values or their moving or weighted averages.
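As an illustration of how vehicle information S3 could be smoothed before it is used, the following is a minimal Python sketch. The class name `SmoothedSensor`, the window size, and the linear weighting are assumptions made for this example; the specification does not state how the averages are computed.

```python
from collections import deque


class SmoothedSensor:
    """Keeps the most recent readings and exposes smoothed values (hypothetical helper)."""

    def __init__(self, window: int = 5) -> None:
        # Only the latest `window` readings are retained.
        self.readings = deque(maxlen=window)

    def add(self, value: float) -> None:
        self.readings.append(value)

    def moving_average(self) -> float:
        if not self.readings:
            raise ValueError("no readings yet")
        return sum(self.readings) / len(self.readings)

    def weighted_average(self) -> float:
        if not self.readings:
            raise ValueError("no readings yet")
        # Assumed weighting: linear weights 1..n, so newer readings count more.
        weights = range(1, len(self.readings) + 1)
        total = sum(w * v for w, v in zip(weights, self.readings))
        return total / sum(weights)


speed = SmoothedSensor(window=3)
for reading_kmh in (40.0, 50.0, 60.0, 70.0):
    speed.add(reading_kmh)
```

With a window of 3, only the last three readings (50, 60, 70) contribute, so a single noisy sample has limited influence on the value passed on as S3.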

The voice dialogue unit 42 also receives person information S4, which is the detection result from a driver monitor that detects the driver's state (facial expression, and whether the driver is dozing, driving inattentively, or looking away from the road) based on an image of the driver's face. The voice dialogue unit 42 may also receive warning information, which is the detection result from an abnormality detection unit that detects vehicle abnormalities (engine, oil pressure, water temperature, charging, and so on) and lights a warning lamp.

The voice dialogue unit 42 is also connected to vehicle equipment 11 mounted on the vehicle and can control it. Examples of the vehicle equipment 11 include an air conditioner, a display such as a head-up display, and audio equipment.

The voice dialogue unit 42 receives the response information S2 from the generative AI 10 and outputs it to the voice synthesis unit 43. The voice synthesis unit 43 converts the response information S2 into a voice signal and outputs it to the speaker 5. The speaker 5 outputs the voice converted by the voice synthesis unit 43. The display unit 7 is a device with a screen that can display information to the driver, such as a monitor installed near the driver's seat or a head-up display (HUD). When voice is output from the speaker 5, the microcomputer 4 displays a character on the screen in sync with the voice.

The emotion estimation unit 45 estimates the driver's emotion in real time based on the voice signal input from the microphone 2 and the text data input from the voice recognition unit 41, generates text data indicating the estimation result, and inputs it to the voice dialogue unit 42.

The emotion estimation unit 45 extracts features such as the tone, loudness, speech rate, and spectral characteristics of the driver's voice from the waveform of the voice signal input from the microphone 2. It also extracts expressions and phrasing that indicate emotion from the text data input from the voice recognition unit 41 and converts them into features representing emotion. The correspondence between these features and the emotion categories is stored in the emotion estimation unit 45 as a correspondence map. The emotion categories are, for example, "happiness," "anger," "sadness," "surprise," and "neutral."

As an example, the correspondence map associates features with each category in advance using initial values. Therefore, once the emotion estimation unit 45 has extracted the features, it refers to the correspondence map to determine the driver's current emotion and inputs it to the voice dialogue unit 42 as text data.
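One way such a correspondence-map lookup might work is sketched below in Python. The specification does not define the map's structure, so the feature names, the reference values, and the nearest-match rule (Euclidean distance) are all assumptions chosen only to make the idea concrete.

```python
import math

# Hypothetical correspondence map: one reference feature vector (initial values,
# normalized to 0..1) per emotion category. These values are illustrative only.
CORRESPONDENCE_MAP = {
    "happiness": {"pitch": 0.7, "loudness": 0.6, "speech_rate": 0.6},
    "anger": {"pitch": 0.6, "loudness": 0.9, "speech_rate": 0.8},
    "sadness": {"pitch": 0.3, "loudness": 0.3, "speech_rate": 0.3},
    "surprise": {"pitch": 0.9, "loudness": 0.7, "speech_rate": 0.7},
    "neutral": {"pitch": 0.5, "loudness": 0.5, "speech_rate": 0.5},
}


def estimate_emotion(features: dict) -> str:
    """Return the category whose reference vector is closest to the extracted features."""

    def distance(reference: dict) -> float:
        return math.sqrt(sum((features[name] - reference[name]) ** 2 for name in reference))

    return min(CORRESPONDENCE_MAP, key=lambda category: distance(CORRESPONDENCE_MAP[category]))


# The returned category name plays the role of the text data passed to the voice dialogue unit.
emotion_text = estimate_emotion({"pitch": 0.6, "loudness": 0.95, "speech_rate": 0.8})
```

Because the result is a plain category name, it can be handed to the voice dialogue unit 42 as text without further conversion.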

The emotion estimation unit 45 may also compare the features extracted from the voice signal with the text data input from the voice recognition unit 41 and correct the features associated with each emotion category in the correspondence map.

The emotion estimation unit 45 outputs the driver's emotion determined from the correspondence map to the voice dialogue unit 42 as text data. The voice dialogue unit 42 adds text data corresponding to the estimated emotion to the input information S1 that is input to the generative AI 10. For example, if the emotion estimation unit 45 determines that the driver's emotion is "anger," the voice dialogue unit 42 adds the instruction "use words that will calm the anger." This allows the generative AI 10 to generate an answer whose content soothes the anger or whose wording is gentle.
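The composition of input information S1 described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `build_input_s1`, the newline-joined layout, and the "sadness" instruction are hypothetical; only the "anger" instruction comes from the description.

```python
from typing import Optional

# Instruction text per estimated emotion. Only the "anger" entry is taken from the
# description; the "sadness" entry is an assumed example.
EMOTION_INSTRUCTIONS = {
    "anger": "Use words that will calm the anger.",
    "sadness": "Use gentle, encouraging words.",
}


def build_input_s1(first_text: str, emotion: str, third_text: Optional[str] = None) -> str:
    """Append emotion-dependent and state-dependent text to the recognized utterance."""
    parts = [first_text]                              # first text data (recognized speech)
    instruction = EMOTION_INSTRUCTIONS.get(emotion)
    if instruction is not None:
        parts.append(instruction)                     # second text data (emotion)
    if third_text:
        parts.append(third_text)                      # third text data (state or instruction)
    return "\n".join(parts)


s1 = build_input_s1("Why is this road so congested?", "anger", "The vehicle is stopped.")
```

Emotions without an entry (for example "neutral") leave the utterance unchanged, so the generative AI receives no extra instruction in that case.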

The emotion estimation unit 45 may retain the determined emotion of the driver over time. In this case, it is preferable that the retention period can be configured to be either indefinite or limited, for example to a predetermined time. When emotions are retained in this way, it is also preferable that the microcomputer 4 can reset the emotion held in the emotion estimation unit 45, for example in response to input from an occupant or when the vehicle's situation satisfies a predetermined condition.
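The retention behaviour described above (indefinite or time-limited, with an explicit reset) could look like the following Python sketch. The class name `RetainedEmotion`, the use of seconds as the unit, and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class RetainedEmotion:
    """Holds the last determined emotion for a limited or unlimited period (hypothetical)."""

    def __init__(self, ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds      # None means the emotion is kept indefinitely
        self._clock = clock                 # injectable so the behaviour can be tested
        self._emotion: Optional[str] = None
        self._stored_at = 0.0

    def store(self, emotion: str) -> None:
        self._emotion = emotion
        self._stored_at = self._clock()

    def current(self) -> Optional[str]:
        expired = (
            self._emotion is not None
            and self.ttl_seconds is not None
            and self._clock() - self._stored_at > self.ttl_seconds
        )
        if expired:
            self._emotion = None
        return self._emotion

    def reset(self) -> None:
        # Called on occupant input or when a vehicle condition is met.
        self._emotion = None
```

Passing `ttl_seconds=None` models the indefinite setting, while `reset()` covers both the occupant-initiated and the condition-triggered reset.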

Next, the operation of the vehicle dialogue system 1 configured as described above will be explained with reference to the flowchart shown in FIG. 2. When the microcomputer 4 detects ignition on (IG-ON) (Sp1), it transmits text data indicating the state of the driver and the vehicle to the generative AI 10 (Sp2). An example of this text data is: "The person you will be talking to is about to get in a car. Please converse in a way that takes into account that they are in a car." This lets the generative AI 10 understand that its conversation partner (the driver) is in a car, and the subsequent dialogue is based on that assumption.

In Sp2, the microcomputer 4 may also transmit text data describing the vehicle equipment 11 that can be controlled. An example of this text data is: "Depending on the conversation, you can suggest adjusting the air conditioner temperature and airflow, adjusting the head-up display brightness, and controlling the audio." This lets the generative AI 10 understand that, in the vehicle the driver is riding in, the air conditioner temperature and airflow, the head-up display brightness, and the audio can be controlled.

In Sp2, the microcomputer 4 may also determine the number of people in the vehicle from the detection results of the seating sensor and transmit that number to the generative AI 10 as text data. Examples of this text data are "Only the driver is in the vehicle." and "There are two people in the vehicle, including the driver." This lets the generative AI 10 understand whether the driver is alone or whether other people are on board.
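Turning seating-sensor results into the occupancy text sent in Sp2 can be sketched as below. This Python fragment assumes one boolean per seat and assumes the driver's seat is always occupied when the system runs; the function and seat names are hypothetical.

```python
def occupancy_text(seat_occupied: dict) -> str:
    """Build the occupancy sentence from per-seat boolean detection results."""
    count = sum(1 for occupied in seat_occupied.values() if occupied)
    if count <= 1:
        # Assumption: the driver is on board whenever the system is running.
        return "Only the driver is in the vehicle."
    return f"There are {count} people in the vehicle, including the driver."


text = occupancy_text({"driver": True, "front_passenger": True, "rear_left": False})
```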

Next, the microcomputer 4 acquires the vehicle information S3 and person information S4 described above (Sp3). Then, when the driver speaks (Y in Sp4), the microcomputer 4 performs voice recognition processing that converts the driver's speech into text data and estimates the driver's emotion from the voice (Sp5). Next, the microcomputer 4 appends to the text data obtained by voice recognition the driver's emotion estimated in Sp5, the state of the driver and the vehicle at the time of the utterance, or an instruction (text data) corresponding to that state, and transmits the result to the generative AI 10 as input information S1 (Sp6).

In Sp6, the microcomputer 4 generates prompt-adjustment text data, according to the driver's emotion estimated in Sp5, for obtaining an answer that will make the driver's emotion positive. The microcomputer 4 may then add the generated text data to the input information S1 together with text data indicating the vehicle information S3 and person information S4 acquired in Sp3. The microcomputer 4 may also add text data indicating whether the vehicle is being driven, is stopped, or is waiting at a traffic light as text data indicating the vehicle's state.

The microcomputer 4 also determines whether the driver's driving load is high at the time the driver speaks. This can be determined, for example, from the speed reported by the speed sensor (vehicle information S3) or from the person information S4. It may also be determined from pre-registered driver attributes (whether or not the driver is accustomed to driving, and whether or not the driver is familiar with the vehicle's equipment). If the microcomputer 4 determines that the driving load is high when the driver speaks, it may add text data requesting a shorter response information S2 as an instruction corresponding to the state. An example of such text data is "Please summarize your answer."
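The driving-load check can be illustrated with the Python sketch below. The threshold of 80 km/h, the 0.75 factor for inexperienced drivers, and the function names are assumptions; the description only says that speed, person information S4, and driver attributes may be used.

```python
from typing import Optional


def is_high_driving_load(speed_kmh: float, driver_inattentive: bool,
                         experienced_driver: bool,
                         base_threshold_kmh: float = 80.0) -> bool:
    """Judge the driving load from speed (S3), driver state (S4), and driver attributes."""
    # Assumed rule: an inexperienced driver reaches "high load" at a lower speed.
    threshold = base_threshold_kmh if experienced_driver else base_threshold_kmh * 0.75
    return driver_inattentive or speed_kmh >= threshold


def load_instruction(speed_kmh: float, driver_inattentive: bool,
                     experienced_driver: bool) -> Optional[str]:
    """Return the state-dependent instruction to append to S1, or None if not needed."""
    if is_high_driving_load(speed_kmh, driver_inattentive, experienced_driver):
        return "Please summarize your answer."
    return None
```

Returning `None` when the load is low keeps the prompt unchanged, so the generative AI is free to answer at normal length.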

The microcomputer 4 then receives the response information S2 corresponding to the input information S1 transmitted in Sp6 (Sp7), converts the received response information S2 into voice, and outputs it from the speaker 5 (Sp8). Next, if the microcomputer 4 detects ignition off (IG-OFF) (Y in Sp9), it ends the process. If ignition off is not detected (N in Sp9), it returns to Sp3.

If, after executing Sp3, the driver has not spoken (N in Sp4) and a predetermined proposal condition based on the information S3 and S4 acquired in Sp3 is satisfied (Sp10), the microcomputer 4 converts the proposal content into voice data, outputs it from the speaker 5 (Sp11), and then returns to Sp3. As an example of the proposal condition in Sp10, if the speed measured by the speed sensor (vehicle information S3) is high, the condition for proposing a lower speed is regarded as satisfied, and a proposal to slow down is output from the speaker 5.
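One pass through steps Sp3 to Sp11 of the flowchart can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: the emotion estimator is a stub, the generative AI and speaker are passed in as callables, and the 100 km/h proposal threshold and prompt layout are assumptions.

```python
from typing import Callable, Optional


def estimate_emotion_stub(text: str) -> str:
    # Placeholder for Sp5; a real system would also analyse the voice signal.
    return "anger" if "!" in text else "neutral"


def run_cycle(utterance: Optional[str], speed_kmh: float,
              ask_ai: Callable[[str], str], speak: Callable[[str], None],
              proposal_speed_kmh: float = 100.0) -> None:
    """One loop iteration; `speed_kmh` stands in for the information acquired in Sp3."""
    if utterance is not None:                            # Sp4: Y
        emotion = estimate_emotion_stub(utterance)       # Sp5
        s1 = f"{utterance}\n[driver emotion: {emotion}; speed: {speed_kmh:.0f} km/h]"
        s2 = ask_ai(s1)                                  # Sp6 (send S1) and Sp7 (receive S2)
        speak(s2)                                        # Sp8
    elif speed_kmh > proposal_speed_kmh:                 # Sp4: N, proposal condition met (Sp10)
        speak("You may want to slow down.")              # Sp11


spoken = []
fake_ai = lambda s1: "echo: " + s1.splitlines()[0]       # stands in for the generative AI
run_cycle("Where is the nearest cafe?", 40.0, fake_ai, spoken.append)
run_cycle(None, 120.0, fake_ai, spoken.append)
run_cycle(None, 60.0, fake_ai, spoken.append)
```

In a full loop the caller would repeat `run_cycle` until ignition off is detected (Sp9), re-acquiring the sensor data before each pass.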

While the above control is being executed, the microcomputer 4 may display a character on the vehicle's display and animate it in sync with the audio output from the speaker 5 so that the character appears to be speaking to the driver. In this case, the microcomputer 4 may automatically change the character or its appearance, such as its facial expression, according to the driver's emotion, or the driver may change the character by operation.

When the microcomputer 4 changes the character's facial expression automatically and the driver's emotion is negative, for example "anger" or "sadness," the voice dialogue unit 42 adds an instruction to the generative AI 10 to give an answer that will make the driver's emotion positive, as described above. In step with this, the character's expression and movements are also displayed so as to make the driver's emotion positive, for example by showing kindness or moving slowly. Conversely, if the driver's emotion is positive, the character's expression also shows happiness, so that the display empathizes with the driver's positive emotion.

As another example, if the driver's emotion is negative, the microcomputer 4 may switch to the character most likely to lead the driver's emotion in a positive direction. Which character is most effective for this purpose can be determined, for example, by having the voice dialogue unit 42 accumulate data on how the driver's emotion changes after each character change. In other words, as the driver and the agent converse repeatedly, the character and its answers are optimized to the driver's preferences.

FIG. 3 is a state transition diagram of the agent, and FIG. 4 is a diagram explaining the states of the agent.
When a character linked to the agent function is displayed on the screen of the display unit 7, the microcomputer 4 starts in the sleep state. In the sleep state, the character is shown small so that it does not obstruct the basic screen or the operation display.

When a wake-up word spoken by the driver is input to the microphone 2, the microcomputer 4 transitions to the wake-up state and starts functioning as an agent. In this state, the microcomputer 4 waits for the driver to speak. If the driver does not speak within a predetermined time, the microcomputer 4 returns to the sleep state. If the driver speaks within the predetermined time, the microcomputer 4 transitions to the hearing state and performs voice recognition and analysis.

The microcomputer 4 generates the input information S1 from the text data generated from the driver's speech and from the text data generated based on the driver's emotion and the state of the driver and the vehicle, and it remains in the wait state while transmitting S1 to the generative AI 10. When it receives the response information S2 (text data) from the generative AI 10, it transitions to the speech state, outputs voice representing the response information from the speaker 5, and displays the character in sync with the voice.

If the driver starts speaking while the response information S2 is being output by voice in the speech state, the microcomputer 4 executes a barge-in function that transitions to the hearing state. Depending on the situation, output of the response information interrupted by barge-in is either resumed after the driver's utterance or simply ended. For example, if the response information S2 is important for driving, output of the remaining response information may be resumed after the driver's utterance; if its importance is low, output may simply end.

When voice output of the response information S2 finishes in the speech state, the microcomputer 4 transitions to the wake-up state and again waits for the speaker's utterance.
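The agent's state transitions (sleep, wake-up, hearing, wait, speech, including barge-in) can be summarized as a transition table. The Python sketch below is illustrative only: the event names are hypothetical labels, and the `input_sent` event marking the move from hearing to wait is an assumption about where that transition occurs.

```python
from enum import Enum, auto


class State(Enum):
    SLEEP = auto()
    WAKE_UP = auto()
    HEARING = auto()
    WAIT = auto()
    SPEECH = auto()


# (current state, event) -> next state. Event names are assumed for this example.
TRANSITIONS = {
    (State.SLEEP, "wake_word"): State.WAKE_UP,
    (State.WAKE_UP, "timeout"): State.SLEEP,
    (State.WAKE_UP, "utterance"): State.HEARING,
    (State.HEARING, "input_sent"): State.WAIT,      # S1 is being sent to the generative AI
    (State.WAIT, "response"): State.SPEECH,         # S2 received
    (State.SPEECH, "utterance"): State.HEARING,     # barge-in
    (State.SPEECH, "speech_done"): State.WAKE_UP,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events not listed for a state leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.SLEEP) -> State:
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in a table makes it easy to check that every state has a way back to sleep or wake-up and that barge-in is the only way to leave the speech state early.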

When the screen of the display unit 7 also serves as the user's operation screen, as with a touch panel, the driver may be allowed to change the character's display position by dragging or to enlarge or reduce its display size using multiple fingers. While a settings screen for the agent function, including character changes, is displayed, the microcomputer 4 suspends the character display on the screen.

FIG. 5 is a block diagram showing a vehicle dialogue system according to the second embodiment.
In this embodiment, components identical to those of the first embodiment are given the same reference numerals and their description is omitted.

In the second embodiment, the emotion estimation unit 45 receives the vehicle information S3 and the person information S4 and extracts features representing emotion. For example, when information indicating the accelerator opening, the brake pedal force, or the vehicle acceleration is larger than usual and the driver's facial expression corresponds to "anger," the emotion estimation unit 45 extracts this information as features strongly related to the driver's "anger." The emotion estimation unit 45 combines the voice-based features described above, the features obtained from the vehicle information S3, and the facial expression detection result obtained from the person information S4 at a predetermined ratio, and determines the driver's final emotion based on the correspondence map. The emotion estimation unit 45 may also receive only one of the vehicle information S3 and the person information S4.
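The weighted combination in the second embodiment might be sketched as below. This Python example simplifies the description by assuming each source has already been converted into per-category scores between 0 and 1; the ratio values (0.5, 0.2, 0.3) and the function name are likewise assumptions, since the specification only says "a predetermined ratio."

```python
def fuse_emotion_scores(voice: dict, vehicle: dict, face: dict,
                        ratio=(0.5, 0.2, 0.3)):
    """Combine per-category scores from three sources at a fixed ratio and pick the best."""
    categories = sorted(set(voice) | set(vehicle) | set(face))
    combined = {
        category: ratio[0] * voice.get(category, 0.0)
        + ratio[1] * vehicle.get(category, 0.0)
        + ratio[2] * face.get(category, 0.0)
        for category in categories
    }
    best = max(categories, key=lambda category: combined[category])
    return best, combined


emotion, scores = fuse_emotion_scores(
    voice={"anger": 0.4, "neutral": 0.6},      # voice alone slightly favours "neutral"
    vehicle={"anger": 0.9, "neutral": 0.1},    # hard acceleration and braking
    face={"anger": 0.8, "neutral": 0.2},       # driver monitor reports an angry expression
)
```

In this example the voice alone would suggest "neutral," but the vehicle and facial cues shift the combined result to "anger," which is the effect the second embodiment aims for.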

The present invention is not limited to the embodiments described above and may be modified or improved as appropriate. In addition, the material, shape, dimensions, number, location, and so on of each component in the above embodiments are arbitrary and not limited, as long as the present invention can be achieved.

For example, the emotion estimation unit 45 may estimate the driver's degree of fatigue in addition to the emotion category and input text data indicating the degree of fatigue to the voice dialogue unit 42, either in addition to or instead of the determined emotion. As an example, when the driver is fatigued, the voice dialogue unit 42 adds the instruction "use gentle, soothing words" to the input information S1 that is input to the generative AI 10.

The microcomputer 4 may also change the answer or the character according to the mood between the driver and a passenger. For example, when the voices of both the driver and the passenger are input to the microphone 2, text data representing each voice signal is input from the voice recognition unit 41 to the emotion estimation unit 45. If the emotion estimation unit 45 determines that the emotions of both the driver and the passenger are negative, the voice dialogue unit 42 adds an instruction such as "use words that mediate a quarrel" to the input information S1. If the emotions of both are positive and they are talking a lot, the voice dialogue unit 42 adds an instruction such as "keep a cheerful tone and use words as short as possible" to the input information S1 so as not to interrupt their conversation. Conversely, if the emotions of both are positive but they are talking little, the voice dialogue unit 42 adds an instruction such as "keep a cheerful tone and use longer wording" or "add some trivia as well" to the input information S1.
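The mood-dependent instruction selection for driver and passenger can be sketched as a small decision function. In this Python illustration, the grouping of emotions into negative and positive sets, the "utterances per minute" measure, and its threshold are assumptions; the instruction texts follow the examples in the description.

```python
from typing import Optional

NEGATIVE = {"anger", "sadness"}   # assumed grouping of negative categories
POSITIVE = {"happiness"}          # assumed grouping of positive categories


def cabin_instruction(driver_emotion: str, passenger_emotion: str,
                      utterances_per_minute: float,
                      talkative_threshold: float = 6.0) -> Optional[str]:
    """Pick the instruction appended to S1 based on the mood of both occupants."""
    if driver_emotion in NEGATIVE and passenger_emotion in NEGATIVE:
        return "Use words that mediate a quarrel."
    if driver_emotion in POSITIVE and passenger_emotion in POSITIVE:
        if utterances_per_minute >= talkative_threshold:
            # Lively conversation: keep the agent's answers brief.
            return "Keep a cheerful tone and use words as short as possible."
        # Quiet cabin: the agent may speak at more length.
        return "Keep a cheerful tone and use longer wording."
    return None
```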

また、マイコン４は、車両の走行シーンに応じて、キャラクターや口調を変更するようにしてもよい。例えば、マイコン４は、同じキャラクターを表示する場合でも、夏と冬、日中と夜間、晴天と雨天など、走行シーンの違いによって衣装を変更したり、キャラクターそのものを変更してもよい。 The microcomputer 4 may also change the character or tone of voice depending on the driving scene of the vehicle. For example, even when the same character is displayed, the microcomputer 4 may change the costume or the character itself depending on the driving scene, such as summer and winter, daytime and nighttime, sunny and rainy, etc.

また、上記実施形態では、感情推定部４５は、音声認識部４１からのテキストデータ、または当該テキストデータと車両情報Ｓ３に基づいて感情を推定する場合について説明したが、状況に応じては車両情報Ｓ３のみに応じて感情を推定するようにしてもよい。 In addition, in the above embodiment, the emotion estimation unit 45 estimates emotions based on text data from the voice recognition unit 41, or based on the text data and the vehicle information S3. However, depending on the situation, emotions may be estimated based only on the vehicle information S3.

また、感情推定部４５は、人情報Ｓ４を入力し、人情報Ｓ４から運転者の感情を示す特徴量を抽出して、感情を決定する際のパラメータの１つとしてもよい。また、感情推定部４５は、生成ＡＩ１０からの回答に対する運転者の更なる質問などの発話の音声に含まれる特徴量や、発話の分量などから、運転者がどのような感情のときにどのようなキャラクターや回答の口調を用いると運転者がポジティブになるかを情報として蓄積し、状況に応じたキャラクターの設定変更を行うようにする。 The emotion estimation unit 45 may also receive human information S4, extract from it features indicating the driver's emotion, and use those features as one of the parameters when determining the emotion. The emotion estimation unit 45 may also accumulate, as information, which character and which response tone make the driver feel positive in each emotional state, based on features contained in the voice of the driver's follow-up utterances (such as further questions in response to an answer from the generation AI 10) and on the amount of speech, and may change the character settings according to the situation.
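
One way to picture the accumulation described above is a small table keyed by the driver's emotion and the tone that was used, counting how often the driver's next utterance was judged positive. The sketch below is an assumption-laden illustration: the class name, the method names, and the success-rate criterion are all hypothetical, and the text does not say how the accumulated information is stored or evaluated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToneFeedback:
    """Hypothetical store of which tone worked for which driver emotion."""

    def __init__(self) -> None:
        # (emotion_before, tone) -> [times driver became positive, total uses]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, emotion_before: str, tone: str, became_positive: bool) -> None:
        entry = self._stats[(emotion_before, tone)]
        entry[0] += int(became_positive)
        entry[1] += 1

    def best_tone(self, emotion: str, default: str) -> str:
        """Return the tone with the highest positive rate for this emotion."""
        rates = {
            tone: positive / total
            for (emo, tone), (positive, total) in self._stats.items()
            if emo == emotion
        }
        if not rates:
            return default  # nothing accumulated yet for this emotion
        return max(rates, key=lambda tone: rates[tone])
```

The same lookup could select a character instead of a tone; the text treats both as settings that are adjusted from the accumulated information.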

ここで、上述した本発明の実施形態に係る車両用対話システムの特徴をそれぞれ以下［１］～［４］に簡潔に纏めて列記する。 Here, the features of the vehicle dialogue system according to the embodiment of the present invention described above are briefly summarized and listed below in [1] to [4].

［１］車両に搭載され、テキストデータから成る入力情報（Ｓ１）を入力すると、テキストデータから成る応答情報（Ｓ２）を出力する生成ＡＩ（１０）を利用した車両用対話システム（１）であって、
乗員が発話した音声を入力する音声入力部（２）と、
前記音声入力部により入力された前記音声を第１テキストデータに変換する音声認識部（４１）と、
前記音声及び前記第１テキストデータの少なくとも一方を入力して前記乗員の感情を推定し、推定結果を表す第２テキストデータを生成する感情推定部（４５）と、
前記第１テキストデータと、前記第２テキストデータと、を前記入力情報（Ｓ１）として前記生成ＡＩ（１０）に入力する制御部（４２）と、
前記生成ＡＩ（１０）からの前記応答情報（Ｓ２）を音声に変換する音声合成部（４３）と、
前記音声合成部（４３）により変換された前記音声を出力する音声出力部（５）と、を備える、
車両用対話システム。 [1] A vehicle dialogue system (1) that is mounted on a vehicle and uses a generation AI (10) that outputs response information (S2) made up of text data when input information (S1) made up of text data is input, the vehicle dialogue system comprising:
a voice input unit (2) that inputs a voice uttered by an occupant;
a voice recognition unit (41) that converts the voice input by the voice input unit into first text data;
an emotion estimation unit (45) that receives at least one of the voice and the first text data, estimates an emotion of the occupant, and generates second text data representing an estimation result;
a control unit (42) that inputs the first text data and the second text data to the generation AI (10) as the input information (S1);
a voice synthesis unit (43) that converts the response information (S2) from the generation AI (10) into voice; and
a voice output unit (5) that outputs the voice converted by the voice synthesis unit (43).

上記［１］の構成によれば、生成ＡＩから出力を得る際に、乗員の感情を加味した回答を得ることができる。したがって、乗員に対し回答を音声により提示する際に、同じ内容の回答でも、乗員の感情に応じて口調や言葉遣いを変化させることができるので、乗員は対話システムに対して親近感を覚えることができる。また、乗員の感情がネガティブである場合には、それを加味した回答を出力することができるので、乗員の感情をさらに悪化させることを防止できる。 According to the configuration [1] above, when obtaining output from the generation AI, it is possible to obtain an answer that takes into account the emotions of the occupant. Therefore, when presenting an answer to the occupant by voice, even if the answer has the same content, the tone and wording can be changed depending on the emotions of the occupant, so that the occupant can feel a sense of familiarity with the dialogue system. In addition, if the occupant's emotions are negative, an answer that takes these into account can be output, thereby preventing the occupant's emotions from worsening further.
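
The data flow of configuration [1] can be sketched as a chain of pluggable stages. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the class name, the callable signatures, and the way the first and second text data are joined into S1 are hypothetical, and each stage is left as a callable because the text does not name any concrete recognition, generation, or synthesis engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialoguePipeline:
    recognize: Callable[[bytes], str]              # voice recognition unit 41
    estimate_emotion: Callable[[bytes, str], str]  # emotion estimation unit 45
    generate: Callable[[str], str]                 # generation AI 10
    synthesize: Callable[[str], bytes]             # voice synthesis unit 43

    def respond(self, voice: bytes) -> bytes:
        first_text = self.recognize(voice)                       # first text data
        second_text = self.estimate_emotion(voice, first_text)   # second text data
        # Control unit 42: combine both texts into input information S1.
        # The layout of S1 is an assumption of this sketch.
        s1 = f"Occupant said: {first_text}\nEstimated emotion: {second_text}"
        s2 = self.generate(s1)                                   # response information S2
        return self.synthesize(s2)                               # audio for voice output unit 5
```

Because the emotion reaches the generation AI as plain text inside S1, the same question can yield a differently worded answer depending on the estimated emotion, which is the effect described above.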

［２］前記制御部（４２）は、前記第２テキストデータにより示される感情に応じた付加情報を生成し、前記入力情報（Ｓ１）に付加して前記生成ＡＩ（１０）に入力する、
上記［１］に記載の車両用対話システム。 [2] The vehicle dialogue system according to [1] above, wherein the control unit (42) generates additional information corresponding to the emotion indicated by the second text data, adds the additional information to the input information (S1), and inputs the resulting input information to the generation AI (10).

上記［２］の構成によれば、車両用対話システムは生成ＡＩに対し、乗員の感情に応じてより具体的に回答に対する口調や言葉遣いなどを指定できるので、乗員の感情に沿った適切な回答を提供することができる。 According to the configuration of [2] above, the vehicle dialogue system can specify to the generation AI the tone and wording of the response more specifically according to the occupant's emotions, so that an appropriate response can be provided that matches the occupant's emotions.
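
As an illustration of configuration [2], the additional information can be thought of as an instruction looked up from the estimated emotion and appended to S1. In the sketch below, the function name, the emotion labels, the "Occupant emotion:" line, and the "negative" and "positive" instruction strings are assumptions; only the instruction for a fatigued driver is taken from an example given earlier in the description.

```python
from typing import Dict

# Hypothetical mapping from the second text data to additional information.
EMOTION_INSTRUCTIONS: Dict[str, str] = {
    "negative": "Answer in a calm, considerate tone.",
    "positive": "Answer in a cheerful tone.",
    "fatigued": "Speak in gentle, conversational words.",
}


def build_input_s1(first_text: str, second_text: str) -> str:
    """Assemble input information S1 from the first and second text data."""
    parts = [first_text, f"Occupant emotion: {second_text}"]
    instruction = EMOTION_INSTRUCTIONS.get(second_text)
    if instruction is not None:
        parts.append(instruction)  # additional information for the generation AI
    return "\n".join(parts)
```

An emotion with no entry in the table simply passes through without an added instruction, so the generation AI still receives the first and second text data.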

［３］前記感情推定部（４５）は、推定結果としてあらかじめ区分された複数のカテゴリーにより示される感情の１つを選択し、選択された結果を第２テキストデータとして前記制御部（４２）に入力するとともに、選択された前記結果を継続的に保持またはリセットすることができる、
上記［１］又は［２］に記載の車両用対話システム。 [3] The vehicle dialogue system according to [1] or [2] above, wherein the emotion estimation unit (45) selects, as the estimation result, one of the emotions represented by a plurality of predefined categories, inputs the selected result to the control unit (42) as the second text data, and can continuously hold or reset the selected result.

上記［３］の構成によれば、車両用対話システムはあらかじめ乗員の感情をカテゴリー化しているので、乗員の感情を誤って認識する可能性を抑制できる。また、各カテゴリーに対応付けられている乗員の音声の特徴量などを補正できるので、乗員の発話が増えることにより乗員の感情をより正確に判断することができるようになる。 According to the configuration of [3] above, the vehicle dialogue system categorizes the occupant's emotions in advance, which reduces the possibility of erroneously recognizing the occupant's emotions. In addition, the vehicle dialogue system can correct the features of the occupant's voice associated with each category, so that the occupant's emotions can be judged more accurately as the occupant speaks more.
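
The category selection with hold and reset in configuration [3] might look like the following. This is a sketch under assumptions: the class names, the three category labels (the description mentions positive and negative emotions; "neutral" is added here for illustration), and the per-category score input are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Emotion(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"   # assumed category, not named in the text
    NEGATIVE = "negative"


class EmotionState:
    """Holds the most recently selected category until it is reset."""

    def __init__(self) -> None:
        self._held: Optional[Emotion] = None

    def update(self, scores: Dict[Emotion, float]) -> str:
        """Select the highest-scoring category and return it as second text data."""
        self._held = max(scores, key=lambda emotion: scores[emotion])
        return self._held.value

    def current(self) -> Optional[str]:
        """Return the held category as text, or None if nothing is held."""
        return None if self._held is None else self._held.value

    def reset(self) -> None:
        self._held = None
```

Restricting the output to a fixed set of categories keeps the second text data predictable, which matches the stated aim of reducing misrecognition of the occupant's emotion.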

［４］乗員に対し画像情報を提示する画面を有する表示部（７）を備え、
前記制御部（４２）は、前記画面に表示されるキャラクターの見かけ又は種類を変更する、
上記［１］から［３］のいずれかに記載の車両用対話システム。 [4] The vehicle dialogue system according to any one of [1] to [3] above, further comprising a display unit (7) having a screen that presents image information to the occupant,
wherein the control unit (42) changes the appearance or type of a character displayed on the screen.

上記［４］の構成によれば、車両用対話システムは、乗員の感情に応じて、出力される音声に加え、表示されるキャラクターも変化させることにより、乗員は、視覚及び聴覚のいずれにおいても車両用対話システムに親近感を覚えることができる。 According to the configuration of [4] above, the vehicle dialogue system changes not only the voice output but also the character displayed according to the occupant's emotions, allowing the occupant to feel a sense of familiarity with the vehicle dialogue system both visually and aurally.

１車両用対話システム
２マイク
３通信モジュール
４マイクロコンピュータ
５スピーカ
７表示部
１０生成ＡＩ
１１車両機器
４１音声認識部
４２音声対話部（制御部）
４３音声合成部
４５感情推定部
Ｓ１入力情報
Ｓ２応答情報
Ｓ３車両情報
Ｓ４人情報
Reference Signs List
1 Vehicle dialogue system
2 Microphone
3 Communication module
4 Microcomputer
5 Speaker
7 Display unit
10 Generation AI
11 Vehicle equipment
41 Voice recognition unit
42 Voice dialogue unit (control unit)
43 Voice synthesis unit
45 Emotion estimation unit
S1 Input information
S2 Response information
S3 Vehicle information
S4 Person information

Claims

A vehicle dialogue system that is mounted on a vehicle and uses a generation AI that outputs response information made up of text data when input information made up of text data is input, the vehicle dialogue system comprising:
a voice input unit that inputs a voice uttered by an occupant;
a voice recognition unit that converts the voice input by the voice input unit into first text data;
an emotion estimation unit that receives at least one of the voice and the first text data, estimates an emotion of the occupant, and generates second text data representing an estimation result;
a control unit that inputs the first text data and the second text data to the generation AI as the input information;
a voice synthesis unit that converts the response information from the generation AI into voice; and
a voice output unit that outputs the voice converted by the voice synthesis unit.

The vehicle dialogue system according to claim 1, wherein the control unit generates additional information corresponding to the emotion indicated by the second text data, adds the additional information to the input information, and inputs the resulting input information to the generation AI.

The vehicle dialogue system according to claim 1, wherein the emotion estimation unit selects, as the estimation result, one of the emotions represented by a plurality of predefined categories, inputs the selected result to the control unit as the second text data, and can continuously hold or reset the selected result.

The vehicle dialogue system according to claim 1, further comprising a display unit having a screen that presents image information to the occupant,
wherein the control unit changes the appearance or type of a character displayed on the screen.