JP2001117581A

JP2001117581A - Feeling recognition device

Info

Publication number: JP2001117581A
Application number: JP30081099A
Authority: JP
Inventors: Tetsuya Oishi; 哲也大石; Fumio Saito; 文男斉藤
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 1999-10-22
Filing date: 1999-10-22
Publication date: 2001-04-27

Abstract

PROBLEM TO BE SOLVED: To provide a feeling recognition device that can perform feeling recognition taking the degree of feeling into account with a simple configuration. SOLUTION: A voice recognition part 12 identifies a recognized character string by recognizing input speech collected by a microphone 10. A kind judging part 16 roughly judges the kind of feeling (affirmative or negative) from the content of the recognized character string. A degree decision part 18 detects repeated words, specific exclamations, and specific adverbs in the input speech from the content of the recognized character string, and judges the degree of feeling (high or low) from the detection result. A feeling recognition part 26 combines the judged rough kind of feeling and the judged degree of feeling to determine the detailed kind of feeling.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an emotion recognition device that recognizes emotion based on input speech.

[0002]

2. Description of the Related Art
Emotion recognition devices that recognize emotion from input speech are conventionally known. Some perform speech recognition to identify the character string corresponding to the speech and recognize emotion from the content of that string. Others extract feature quantities such as voice intensity, pitch, and utterance intervals from the speech, detect the degree of emotion (intonation) from them, and recognize emotion on that basis. Still others combine both approaches. Based on the content of the character string and on changes in each speech feature, these devices identify one emotion out of seven or eight categories, such as joy, anger, sadness, enjoyment, and surprise, and output it as the emotion recognition result.

[0003]

Problems to Be Solved by the Invention
Among the conventional devices described above, those that recognize emotion from the content of the character string can identify the type of emotion by checking whether the string matches vocabulary prepared in advance as emotion words, but they cannot identify the degree of the emotion. Those that recognize emotion from speech feature quantities can determine the degree of emotion relatively easily, but have difficulty identifying its type. Those that use both the string content and the speech features need one component for recognizing the content of the string and another for extracting the speech features, which makes the configuration of the device complicated.

[0004] The present invention was made in view of these points, and its object is to provide an emotion recognition device that, with a simple configuration, can recognize emotion while taking the degree of emotion into account.

[0005]

Means for Solving the Problems
To solve the problems described above, in the emotion recognition device of the present invention, a speech recognition means performs speech recognition on the input speech to identify a recognized character string; a first emotion determination means determines the rough type of emotion from the content of this string; and a degree determination means determines the degree of emotion, also from the content of the string. A second emotion determination means then determines the detailed type of emotion from the rough type determined by the first emotion determination means and the degree determined by the degree determination means. Because the degree of emotion is determined from the content of the recognized string, the device does not need the complex configuration that conventional devices use to analyze the speech signal and extract feature quantities in order to judge the degree of emotion from loudness, pitch, and so on. Emotion recognition that takes the degree of emotion into account is therefore possible with a simple configuration.

[0006] The degree determination means preferably determines the degree of emotion based on at least one of the following in the input speech: a repeated word, an exclamation, or an adverb. In conversation between people, speech that expresses strong emotion tends to contain repeated words such as "だめだめ" (dame dame, "no, no"), exclamations such as "うわぁ" (uwaa, "wow"), and adverbs such as "すごく" (sugoku, "really"). Detecting these elements (repeated words, exclamations, adverbs) in the input speech therefore makes it easy to determine the degree of emotion.

[0007] The rough emotion types determined by the first emotion determination means preferably comprise at least two types, positive and negative, and the degrees of emotion determined by the degree determination means preferably comprise at least two levels, high and low. The second emotion determination means then preferably determines the detailed emotion by selecting one of at least four combinations of rough type and degree: positive and high, positive and low, negative and high, and negative and low. For example, when a system decides on its actions based on a user's emotion, recognizing these four or so detailed types is often sufficient. Applying the emotion recognition device of the present invention therefore makes it easy to build systems whose actions are based on emotion recognition that takes the degree of the user's emotion into account.

[0008]

Embodiments of the Invention
An emotion recognition device according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of the emotion recognition device of the embodiment. The emotion recognition device 1 shown in FIG. 1 comprises a microphone 10, a voice recognition unit 12, a voice recognition dictionary storage unit 14, a type determination unit 16, a degree determination unit 18, and an emotion recognition unit 26.

[0009] The microphone 10 picks up speech uttered by the user and converts it into a speech signal. The voice recognition unit 12 analyzes the speech signal output from the microphone 10, performs predetermined speech recognition processing, and identifies the recognized character string corresponding to the user's utterance (the input speech). The voice recognition dictionary storage unit 14 stores, as a speech recognition dictionary, signal waveforms corresponding to standard speech.

[0010] The type determination unit 16 determines the rough type of emotion from the recognized character string output by the voice recognition unit 12. In this embodiment, two rough emotion types are considered: positive and negative. Hereinafter, a positive emotion is called "emotion P" and a negative emotion is called "emotion N". The type determination unit 16 determines the rough type of emotion by checking whether the vocabulary corresponding to the recognized string belongs to emotion P or to emotion N.

[0011] FIG. 2 illustrates the relationship between various vocabulary items and the rough emotion types. As shown in FIG. 2, vocabulary belonging to emotion P (positive emotion) includes "いいね" (iine, "nice"), "うまそう" (umasou, "looks tasty"), "かっこいい" (kakkoii, "cool"), "楽しそう" (tanoshisou, "looks fun"), "好き" (suki, "I like it"), and "最高" (saikou, "the best"). Vocabulary belonging to emotion N (negative emotion) includes "だめ" (dame, "no good"), "まずそう" (mazusou, "looks awful"), "かっこ悪い" (kakkowarui, "uncool"), "つまらなそう" (tsumaranasou, "looks boring"), "嫌い" (kirai, "I dislike it"), and "最低" (saitei, "the worst"). The vocabulary belonging to emotion P and emotion N is not limited to these examples; many other words are possible. The type determination unit 16 stores the relationship between vocabulary and rough emotion type shown in FIG. 2 as a data table, and determines the rough type of emotion by consulting this table to check whether the vocabulary corresponding to the recognized string belongs to emotion P or to emotion N.
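The table lookup performed by the type determination unit 16 can be sketched in a few lines. This is a minimal Python sketch, not the patent's implementation; it assumes that the table holds only the FIG. 2 examples, that a word "matches" when it appears as a substring of the recognized string, and that a string containing both P and N words (or neither) yields no decision. The names `EMOTION_TABLE` and `judge_type` are hypothetical.

```python
from typing import Optional

# Hypothetical data table built from the FIG. 2 examples only;
# the patent notes that a real table may hold many more entries.
EMOTION_TABLE = {
    "いいね": "P",        # iine: nice
    "うまそう": "P",      # umasou: looks tasty
    "かっこいい": "P",    # kakkoii: cool
    "楽しそう": "P",      # tanoshisou: looks fun
    "好き": "P",          # suki: I like it
    "最高": "P",          # saikou: the best
    "だめ": "N",          # dame: no good
    "まずそう": "N",      # mazusou: looks awful
    "かっこ悪い": "N",    # kakkowarui: uncool
    "つまらなそう": "N",  # tsumaranasou: looks boring
    "嫌い": "N",          # kirai: I dislike it
    "最低": "N",          # saitei: the worst
}


def judge_type(recognized: str) -> Optional[str]:
    """Return "P" or "N" for the rough emotion type, or None if undecided."""
    # Collect the types of every table word found in the recognized string.
    kinds = {kind for word, kind in EMOTION_TABLE.items() if word in recognized}
    if len(kinds) == 1:
        return kinds.pop()
    # Assumption: no match, or a P/N conflict, produces no decision.
    return None
```

Substring matching is used here because a Japanese recognized string has no spaces between words; a real system would more likely use the recognizer's own word segmentation.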

[0012] The degree determination unit 18 determines the degree of emotion by checking, from the recognized character string output by the voice recognition unit 12, whether the input speech contains at least one of a repeated word, a predetermined exclamation, or a predetermined adverb. Specifically, the degree determination unit 18 judges the degree of emotion to be high (Exciting) when the input speech contains at least one of these elements, and low (Depressing) otherwise. Hereinafter, a high degree of emotion is called "degree E" and a low degree of emotion is called "degree D".

[0013] To detect repeated words, predetermined exclamations, and predetermined adverbs in the input speech, the degree determination unit 18 includes a repetition detection unit 20, an exclamation detection unit 22, and an adverb detection unit 24. The repetition detection unit 20 detects, from the recognized string output by the voice recognition unit 12, any portion of the input speech in which the same word is repeated. For example, when the recognized string corresponds to input speech such as "いいねいいね" (iine iine, "nice, nice") or "だめだめ" (dame dame, "no, no"), the repetition detection unit 20 detects it as a repeated portion of the same word. The exclamation detection unit 22 detects predetermined exclamations in the input speech from the recognized string. FIG. 3 shows examples of exclamations detected by the exclamation detection unit 22. The exclamation detection unit 22 stores a data table of predetermined exclamations such as "うわぁ" (uwaa, "wow"), "おお" (oo, "oh"), and "ひえー" (hiee, "yikes"), as shown in FIG. 3, and detects any exclamation in this table that appears in the input speech. The adverb detection unit 24 detects predetermined adverbs in the input speech from the recognized string. FIG. 4 shows examples of adverbs detected by the adverb detection unit 24. The adverb detection unit 24 stores a data table of predetermined adverbs such as "すごく" (sugoku, "really") and "とても" (totemo, "very"), as shown in FIG. 4, and detects any adverb in this table that appears in the input speech.
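The three detectors of the degree determination unit 18 and the rule "degree E if any of them fires, otherwise degree D" can be sketched as follows. This is a minimal Python sketch under stated assumptions: the exclamation and adverb tables hold only the FIG. 3 and FIG. 4 examples, and a "repeated word" is approximated by a hypothetical regular expression that finds any run of two or more characters immediately followed by itself (the patent does not say how unit 20 performs the match). All function names are hypothetical.

```python
import re

# Hypothetical tables built from the FIG. 3 and FIG. 4 examples only.
EXCLAMATIONS = ("うわぁ", "おお", "ひえー")  # uwaa, oo, hiee
ADVERBS = ("すごく", "とても")              # sugoku, totemo

# Assumption: a repeated word is any run of 2+ characters that is
# immediately repeated, e.g. いいねいいね or だめだめ.
_REPEAT = re.compile(r"(.{2,})\1")


def has_repetition(recognized: str) -> bool:
    """Stand-in for the repetition detection unit 20."""
    return _REPEAT.search(recognized) is not None


def has_exclamation(recognized: str) -> bool:
    """Stand-in for the exclamation detection unit 22."""
    return any(word in recognized for word in EXCLAMATIONS)


def has_adverb(recognized: str) -> bool:
    """Stand-in for the adverb detection unit 24."""
    return any(word in recognized for word in ADVERBS)


def judge_degree(recognized: str) -> str:
    """Return "E" (high) if any cue is present, otherwise "D" (low)."""
    if has_repetition(recognized) or has_exclamation(recognized) or has_adverb(recognized):
        return "E"
    return "D"
```

The minimum run length of two characters keeps a single doubled character, such as the "いい" inside "いいね", from counting as a repeated word.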

[0014] The emotion recognition unit 26 determines the detailed emotion from the rough emotion type determined by the type determination unit 16 and the degree of emotion determined by the degree determination unit 18. Specifically, using "emotion P" or "emotion N" (the result for the rough emotion type) and "degree E" or "degree D" (the result for the degree of emotion), the emotion recognition unit 26 classifies detailed emotions into four types: the combination of emotion P and degree E (hereinafter "emotion P/E"), the combination of emotion P and degree D ("emotion P/D"), the combination of emotion N and degree D ("emotion N/D"), and the combination of emotion N and degree E ("emotion N/E"). It determines the detailed emotion by deciding which of these four applies. The detailed emotion determined by the emotion recognition unit 26 is output as the emotion recognition result of the emotion recognition device 1.
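The combination step of the emotion recognition unit 26 amounts to a four-entry lookup keyed on the two earlier results. The following minimal Python sketch assumes the rough type is given as "P", "N", or None (undecided) and the degree as "E" or "D"; the names `DETAILED` and `recognize_detailed` are hypothetical.

```python
from typing import Optional

# The four detailed emotions of the embodiment, keyed by (type, degree).
DETAILED = {
    ("P", "E"): "P/E",  # positive and high
    ("P", "D"): "P/D",  # positive and low
    ("N", "E"): "N/E",  # negative and high
    ("N", "D"): "N/D",  # negative and low
}


def recognize_detailed(kind: Optional[str], degree: str) -> Optional[str]:
    """Combine the rough type and the degree; None if the type is undecided."""
    if kind is None:
        # Assumption: the patent does not say what is output when no
        # emotion word is recognized, so this sketch outputs nothing.
        return None
    return DETAILED[(kind, degree)]
```

An explicit table rather than string concatenation makes it easy to extend the scheme, as [0007] allows, to more than two types or levels.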

[0015] FIG. 5 shows examples of emotion recognition results determined by the emotion recognition unit 26. The leftmost column of the table in FIG. 5 lists judgment criteria such as "good/bad" and "tasty/awful", and the table shows example relationships between input speech and emotion recognition results for each criterion. For example, when the input speech for the criterion "good/bad" is "いいねいいね" ("nice, nice"), the word "いいね" places the rough emotion type in emotion P, and the repetition of the same word places the degree of emotion in degree E, so "emotion P/E" is obtained as the detailed emotion. Similarly, when the input speech for the criterion "like/dislike" is "嫌い" ("I dislike it"), the word "嫌い" places the rough emotion type in emotion N, and because the input speech contains no repeated word, exclamation, or adverb, the degree of emotion is judged to be degree D, so "emotion N/D" is obtained as the detailed emotion.

[0016] The voice recognition unit 12 and the voice recognition dictionary storage unit 14 described above correspond to the speech recognition means, the type determination unit 16 to the first emotion determination means, the degree determination unit 18 to the degree determination means, and the emotion recognition unit 26 to the second emotion determination means.

[0017] The emotion recognition device 1 of this embodiment is configured as described above; its operation is described next. FIG. 6 is a flowchart showing the operating procedure of the emotion recognition device 1 of this embodiment.

[0018] The voice recognition unit 12 continuously checks whether speech has been input to the microphone 10 (step 100). When speech has been input, it performs speech recognition using the speech recognition dictionary stored in the voice recognition dictionary storage unit 14 and identifies the recognized character string corresponding to the input speech (step 101).

[0019] Next, the type determination unit 16 determines the rough type of emotion from the recognized string output by the voice recognition unit 12, using the data table described above that relates vocabulary to rough emotion types (step 102). For example, if the recognized string indicates a word belonging to emotion P (positive emotion), such as "いいね" ("nice") or "好き" ("I like it"), the type determination unit 16 judges the rough emotion type to be "emotion P"; if it indicates a word belonging to emotion N (negative emotion), such as "だめ" ("no good") or "嫌い" ("I dislike it"), it judges the rough emotion type to be "emotion N".

[0020] The degree determination unit 18 also determines the degree of emotion from the recognized string output by the voice recognition unit 12 (step 103). For example, if the recognized string contains one or more of a repeated word such as "いいねいいね", an exclamation such as "うわぁ" ("wow"), or an adverb such as "すごく" ("really"), the degree determination unit 18 judges the degree of emotion to be "degree E" (high); if it contains none of these, it judges the degree to be "degree D" (low). Steps 102 and 103 may be performed in the reverse order, or simultaneously.

[0021] Once the rough emotion type and the degree of emotion have been determined and output, the emotion recognition unit 26 determines the detailed emotion from these results (step 104). Specifically, as described above, the rough type is output as "emotion P" or "emotion N" and the degree as "degree E" or "degree D", so the emotion recognition unit 26 determines from their combination whether the detailed emotion is "emotion P/E", "emotion P/D", "emotion N/E", or "emotion N/D". When the detailed emotion has been determined, the process returns to step 100, and the operations from the check for speech input onward are repeated.
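The procedure of FIG. 6 (steps 100 to 104) can be sketched as a loop over incoming utterances. This is a minimal Python sketch under several assumptions: speech recognition itself is out of scope, so a caller-supplied `recognize` callable (hypothetical) stands in for the voice recognition unit 12 and its dictionary; the word tables are abbreviated versions of FIGS. 2 to 4; and substring matching plus a regular expression for an immediately repeated run of characters stand in for the patent's unspecified matching method.

```python
import re
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical, abbreviated tables (FIGS. 2 to 4 give the patent's examples).
POSITIVE = ("いいね", "好き", "かっこいい")
NEGATIVE = ("だめ", "嫌い", "かっこ悪い")
INTENSIFIERS = ("うわぁ", "おお", "ひえー", "すごく", "とても")
REPEAT = re.compile(r"(.{2,})\1")  # assumption: 2+ chars immediately repeated


def emotion_loop(
    utterances: Iterable[object],
    recognize: Callable[[object], str],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (recognized string, detailed emotion or None) per utterance."""
    for audio in utterances:                       # step 100: speech was input
        text = recognize(audio)                    # step 101: recognized string
        pos = any(w in text for w in POSITIVE)     # step 102: rough type
        neg = any(w in text for w in NEGATIVE)
        kind = "P" if pos and not neg else "N" if neg and not pos else None
        high = bool(REPEAT.search(text)) or any(w in text for w in INTENSIFIERS)
        degree = "E" if high else "D"              # step 103: degree
        # Step 104: detailed emotion. Steps 102 and 103 are independent,
        # so, as the text notes, their order does not matter.
        yield text, (f"{kind}/{degree}" if kind else None)
```

Feeding already-recognized strings through an identity `recognize` reproduces the FIG. 5 rows discussed in [0015]: `list(emotion_loop(["いいねいいね", "嫌い"], lambda s: s))` gives `[("いいねいいね", "P/E"), ("嫌い", "N/D")]`. Writing the loop as a generator mirrors the return to step 100 after each result.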

[0022] Next, an example in which the emotion recognition device 1 of this embodiment is applied to an in-vehicle system is described. FIG. 7 shows the configuration of an in-vehicle system that includes the emotion recognition device 1 of this embodiment. The in-vehicle system 100 shown in FIG. 7 comprises: the emotion recognition device 1; a voice processing device 2 that performs speech recognition and, in response to the user's speech, interactively determines and outputs various operation instructions; a navigation device 3 that detects the vehicle's position, displays a map of the surrounding area, and searches for and guides the user along a route to a destination the user has selected; an audio device 4 housing a radio tuner, a cassette tape deck, a CD player, and the like; a display device 5 that displays images output from the navigation device 3 and other devices; and a speaker 6 that outputs various sounds, such as guidance speech from the navigation device 3 and audio from the audio device 4.

[0023] Next, an example is described in which, in the in-vehicle system 100 described above, the voice processing device 2 converses with the user according to the emotion recognition result output from the emotion recognition device 1.

[0024] As concrete examples, the voice processing device 2 decides the destination of a drive or the music to play based on the content of its dialogue with the user, and outputs operation instructions to the navigation device 3 or the audio device 4. In the dialogue examples, the user is denoted "U" (User) and the voice processing device 2 is denoted "A" (Agent).

[0025]

(Specific Example 1: Deciding the destination of a drive)
U: "I'd like to go somewhere fun."
A: "Shall we take a trip outside the prefecture?"
U: "Oh, nice, nice!" (A determination result of "emotion P/E" is obtained. From this result, the voice processing device 2 judges that the user strongly wants to take a long trip and decides on a specific proposal.)
A: "How about somewhere around Miyagi?"
U: "No, no, it might snow there." (A determination result of "emotion N/E" is obtained. From this result, the voice processing device 2 judges that the user's emotion is strongly negative, so an apology and another destination candidate are needed, and it learns that a drive toward the Tohoku region in late autumn is unacceptable.)
A: "Sorry, I didn't think of that."
A: "Then shall we head toward Tokyo?"
U: "Yeah, nice." (A determination result of "emotion P/D" is obtained. From this result, the voice processing device 2 judges that the user's emotion is positive and sets the destination of the drive to the Tokyo area.)
A: "Then I'll guide you toward Tokyo." (The voice processing device 2 outputs an operation instruction to the navigation device 3 to provide route guidance toward Tokyo.)

(Specific Example 2: Deciding the music to play)
U: "I'd like to listen to some music."
A: "Shall I put on some jazz?"
U: "Nice." (A determination result of "emotion P/D" is obtained. From this result, the voice processing device 2 judges that the user's emotion is positive and that jazz is an acceptable genre.)
A: "How about ○○○ (singer's name)?"
U: "You've got good taste! ○○○ is really cool!" (A determination result of "emotion P/E" is obtained. From this result, the voice processing device 2 judges that the user's reaction to its proposal is very positive, decides on the music to play, and learns the user's preference: the user loves the singer ○○○.)
A: "Hehe (bashfully)! Then I'll play a song by ○○○." (The voice processing device 2 outputs an operation instruction to the audio device 4 to play a song by the singer ○○○.)

As Specific Examples 1 and 2 show, by using the emotion recognition result output from the emotion recognition device 1 of this embodiment, the voice processing device 2 can conduct a dialogue that reflects the user's emotion and give each device in the in-vehicle system 100 operation instructions that match the user's intention. A more human, user-friendly system can therefore be realized.
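The dialogues above suggest how a dialogue agent might branch on the four detailed emotions. The patent shows this behavior only through the two examples, so the following is a hypothetical Python sketch, not the voice processing device 2's actual logic: the `Agent` class, the action names, and the preference sets are invented for illustration, and the reaction to "N/D" is an assumption because neither example produces that result.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Agent:
    """Hypothetical dialogue policy keyed on the four detailed emotions."""
    liked: Set[str] = field(default_factory=set)     # learned preferences
    disliked: Set[str] = field(default_factory=set)  # learned aversions

    def react(self, proposal: str, emotion: Optional[str]) -> List[str]:
        if emotion == "P/E":
            # Strongly positive (Example 2): act on it and remember the preference.
            self.liked.add(proposal)
            return ["accept", "learn_preference"]
        if emotion == "P/D":
            # Mildly positive (Examples 1 and 2): simply go ahead.
            return ["accept"]
        if emotion == "N/E":
            # Strongly negative (Example 1): apologize, learn, and re-propose.
            self.disliked.add(proposal)
            return ["apologize", "propose_alternative"]
        # "N/D" or undecided (assumption): quietly offer something else.
        return ["propose_alternative"]
```

For instance, `Agent().react("Miyagi", "N/E")` returns `["apologize", "propose_alternative"]` and records "Miyagi" as disliked, mirroring the exchange in Specific Example 1.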

[0026] As described above, in the emotion recognition device 1 of this embodiment, the type determination unit 16 determines the rough emotion type (positive or negative) and the degree determination unit 18 determines the degree of emotion (high or low), both from the content of the recognized character string identified by the voice recognition unit 12, and the emotion recognition unit 26 determines the detailed emotion type from these two results. Compared with conventional devices that judge the degree of emotion from loudness, pitch, and so on, no complex configuration is needed to analyze the speech signal and extract feature quantities, so emotion recognition that takes the degree of emotion into account is possible with a simple configuration. In particular, repeated words, exclamations, and adverbs in the input speech are used to determine the degree of emotion; because these elements can be detected easily with existing speech recognition technology, the degree of emotion can be determined easily.

[0027] The present invention is not limited to the embodiment described above; various modifications are possible within the scope of the invention. For example, in the embodiment above, the degree determination unit 18 determines the degree of emotion by checking whether the input speech contains at least one of a repeated word, a predetermined exclamation, or a predetermined adverb. It is not always necessary to check all three elements; the degree of emotion may instead be determined by checking any one or two of them.
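The modification in [0027], checking only one or two of the three cues, can be sketched by making the set of enabled detectors a parameter. This is a minimal Python sketch under assumptions: the cue names, the `enabled` parameter, and the abbreviated tables are hypothetical, and the repetition check uses the same regular-expression approximation (two or more characters immediately repeated) rather than any method given in the patent.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical cue detectors; tables are abbreviated from FIGS. 3 and 4.
CUES: Dict[str, Callable[[str], bool]] = {
    "repetition": lambda s: re.search(r"(.{2,})\1", s) is not None,
    "exclamation": lambda s: any(w in s for w in ("うわぁ", "おお", "ひえー")),
    "adverb": lambda s: any(w in s for w in ("すごく", "とても")),
}


def judge_degree(recognized: str, enabled: Iterable[str] = tuple(CUES)) -> str:
    """Return "E" if any enabled cue fires, otherwise "D"."""
    return "E" if any(CUES[name](recognized) for name in enabled) else "D"
```

With only repetition enabled, `judge_degree("すごく好き", enabled=["repetition"])` returns "D" even though the default configuration returns "E", which shows the trade-off: fewer checks mean a simpler device but fewer strongly emotional utterances recognized as such.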

[0028] Further, although the above embodiment describes the in-vehicle system 100 to which the emotion recognition device 1 of the present invention is applied, the invention is not limited to this application and can be applied to various other systems. For example, when the emotion recognition device of the present invention is applied to a robot, a robot that behaves in a human-like way, taking into account both the type of emotion and its degree according to the four detailed emotion types obtained, can easily be realized.

【００２９】[0029]

EFFECTS OF THE INVENTION As described above, according to the present invention, the general type of emotion (positive or negative) and the degree of emotion (high or low) are each determined from the content of the recognized character string specified for the input speech, and the detailed type of emotion is determined from these two results. Therefore, compared with the conventional approach of determining the degree of emotion from the loudness or pitch of the voice, no complicated configuration for analyzing the voice signal and extracting feature quantities is required, and emotion recognition that takes the degree of emotion into account can be performed with a simple configuration. In particular, because vocabulary duplication, exclamations, and adverbs contained in the input speech are used to determine the degree of emotion, the degree of emotion can be determined easily.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of an emotion recognition device according to an embodiment.

FIG. 2 is a diagram explaining the relationship between various vocabulary items and general emotion types.

FIG. 3 is a diagram showing examples of exclamations detected by the exclamation detection unit.

FIG. 4 is a diagram showing examples of adverbs detected by the adverb detection unit.

FIG. 5 is a diagram showing examples of emotion recognition results determined by the emotion recognition unit.

FIG. 6 is a flowchart showing the operating procedure of the emotion recognition device.

FIG. 7 is a diagram showing the configuration of an in-vehicle system that includes the emotion recognition device of the present embodiment.

[Explanation of symbols]

Reference Signs List
1 emotion recognition device
10 microphone
12 speech recognition unit
14 speech recognition dictionary storage unit
16 type determination unit
18 degree determination unit
20 duplication detection unit
22 exclamation detection unit
24 adverb detection unit
26 emotion recognition unit

Claims


1. An emotion recognition device comprising: speech recognition means for performing speech recognition processing on input speech to specify a recognized character string corresponding to the input speech; first emotion determination means for determining a general type of emotion based on the content of the recognized character string; degree determination means for determining a degree of emotion based on the content of the recognized character string; and second emotion determination means for determining a detailed type of emotion based on the general type of emotion determined by the first emotion determination means and the degree of emotion determined by the degree determination means.

2. The emotion recognition device according to claim 1, wherein the degree determination means determines the degree of emotion based on at least one of vocabulary duplication, an exclamation, and an adverb contained in the input speech.

3. The emotion recognition device according to claim 1 or 2, wherein the general type of emotion determined by the first emotion determination means includes at least two types, positive and negative; the degree of emotion determined by the degree determination means includes at least two types, high and low; and the detailed type of emotion determined by the second emotion determination means includes at least four types obtained by combining the general type of emotion and the degree of emotion: positive and high, positive and low, negative and high, and negative and low.
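The four detailed types recited in claim 3 are the Cartesian product of the two binary judgments. The sketch below simply enumerates them; the enum names `EmotionType` and `Degree` are illustrative and not taken from the patent, and the emotion labels shown in FIG. 5 are not reproduced.

```python
from enum import Enum
from itertools import product


class EmotionType(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Degree(Enum):
    HIGH = "high"
    LOW = "low"


# Detailed types = every (general type, degree) pair: 2 x 2 = 4 combinations,
# in definition order: positive/high, positive/low, negative/high, negative/low.
DETAILED_TYPES: list[tuple[EmotionType, Degree]] = list(product(EmotionType, Degree))

for kind, degree in DETAILED_TYPES:
    print(f"{kind.value} and {degree.value}")
```

Running it prints the four combinations in the same order as the claim.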