JP2018190170A

JP2018190170A - Speech generation apparatus, speech generation method, and speech generation program

Info

Publication number: JP2018190170A
Application number: JP2017091927A
Authority: JP
Inventors: 光田 航 (Wataru Mitsuta); 東中 竜一郎 (Ryuichiro Higashinaka); 松尾 義博 (Yoshihiro Matsuo)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2017-05-02
Filing date: 2017-05-02
Publication date: 2018-11-29
Anticipated expiration: 2037-05-02
Also published as: JP6674411B2

Abstract

[Problem] To provide an utterance generation device, an utterance generation method, and an utterance generation program that can obtain implicit information (言外の情報, information left unsaid) that is not directly contained in a user utterance in a chat dialogue system.
[Solution] Candidates for implicit information are generated from the implicit information associated with a user utterance entered through an input unit 40, or from the predicate-argument structure corresponding to that user utterance. The source for these candidates is either example data, which is a collection of pairs of user utterances and implicit information, or pairs of predicate-argument structures that co-occur within corpus documents. A score is then calculated for each generated candidate. The score is based on the user utterance entered through the input unit 40 and on a ranking model that computes how plausible a piece of implicit information is for a given user utterance.
[Selected Figure] Figure 3

Description

The present invention relates to an utterance generation device, an utterance generation method, and an utterance generation program.

Dialogue systems fall broadly into two types: task-oriented dialogue systems and non-task-oriented dialogue systems (chat dialogue systems). Chat dialogue systems have been actively researched in recent years. This is partly because of growing interest in their entertainment value and in everyday conversation with robots.

To move a conversation forward, response utterances that convey the system's understanding to the user are important. For example, a task-oriented dialogue system for hotel reservations can report what it understood from the user's utterance with a response such as "So you would like to stay at [place] on [date]." The user can then confirm the system's understanding. A chat dialogue system can show that it understands what the user said by repeating the content of the user's utterance.

In chat dialogue systems, response utterances are often generated by extracting a keyword from the user utterance and filling it into a template. For example, the keyword "Mt. Fuji" can be extracted from the utterance "I went to Mt. Fuji" (富士山に行きました). Inserting it into the pattern "〜ですね" ("[X], huh") produces the response "Mt. Fuji, huh" (富士山ですね).

This method is described in detail in Non-Patent Document 1.

More recently, methods such as the one disclosed in Non-Patent Document 2 have also been studied. These methods extract a semantic structure called a predicate-argument structure (a structure consisting of a predicate and its arguments) from the user utterance and generate the response utterance from that structure. For example, the predicate-argument structure "ユーザガ富士山ニ行く" (user-NOM Mt. Fuji-DAT go) can be extracted from the user utterance "I went to Mt. Fuji" (富士山に行きました). Changing the sentence-final expression then yields the response "So you went to Mt. Fuji" (富士山に行ったんですね).

[Non-Patent Document 1] J. Weizenbaum, "ELIZA - a computer program for the study of natural language communication between man and machine," Communications of the ACM, vol. 9, pp. 36-45, 1966.
[Non-Patent Document 2] Ryuichiro Higashinaka, Kenji Imamura, Toyomi Meguro, Chiaki Miyazaki, Nozomi Kobayashi, Hiroaki Sugiyama, Toru Hirano, Hoshiro Makino, Yoshihiro Matsuo, "Towards an open-domain conversational system fully based on natural language processing," In Proc. COLING, pp. 928-939, 2014.

Existing chat dialogue systems build their response utterances from the words, predicate-argument structures, and similar elements of the user utterance. As a result, they can only express what appears on the surface of that utterance. This gives the user the impression that the system is merely repeating what was said, and the system cannot show that it is listening more deeply.

An object of the present invention is to provide an utterance generation device, an utterance generation method, and an utterance generation program that can obtain implicit information not directly contained in a user utterance in a chat dialogue system.

To achieve the above object, the utterance generation device of the present invention includes the following units:
(a) an input unit that receives a user utterance;
(b) a search unit that generates candidates for implicit information from the implicit information associated with the user utterance received by the input unit, or from the predicate-argument structure corresponding to that user utterance. It does so based either on example data, which is a collection of pairs of user utterances and implicit information, or on pairs of predicate-argument structures that co-occur within corpus documents;
(c) an implicit information ranking unit that calculates a score for each candidate generated by the search unit. The score represents how plausible the implicit information is for the user utterance, and it is calculated from the user utterance received by the input unit and a ranking model for computing that score.

The device may further include a type identification unit and a type filter unit. The type identification unit uses a type identification model to identify the type of each piece of implicit information and assigns that type to it. The type filter unit then outputs only the typed implicit information that satisfies a predetermined condition on the type.

The device may further include a type identification model creation unit. This unit creates the type identification model from data containing a collection of implicit information to which types have already been assigned. The type identification unit then identifies and assigns the type of each piece of implicit information using the model created by this unit.

The device may further include an output unit. Among the scored candidates, the output unit outputs as a response utterance each piece of implicit information whose score satisfies a predetermined condition.

The device may further include an expression conversion unit that converts each piece of implicit information output by the output unit into an utterance sentence.

The device may further include a ranking model creation unit. Its training data consists of pairs of a user utterance with positive-example implicit information and pairs of a user utterance with negative-example implicit information. The unit morphologically analyzes the user utterance and the implicit information in each pair. It then creates the ranking model from the combinations of stems among the resulting morphemes, together with whether each stem combination came from a positive or a negative pair. In this case, the implicit information ranking unit morphologically analyzes the user utterance received by the input unit and each candidate generated by the search unit. It calculates the score of each candidate from the resulting stem combinations and the ranking model created by the ranking model creation unit.

To achieve the above object, the utterance generation method of the present invention is performed in an utterance generation device that includes an input unit, a search unit, and an implicit information ranking unit. The method includes the following steps:
(a) the input unit receives a user utterance;
(b) the search unit generates candidates for implicit information from the implicit information associated with the user utterance received by the input unit, or from the predicate-argument structure corresponding to that user utterance. It does so based either on example data, which is a collection of pairs of user utterances and implicit information, or on pairs of predicate-argument structures that co-occur within corpus documents;
(c) the implicit information ranking unit calculates a score for each candidate generated by the search unit. The score represents how plausible the implicit information is for the user utterance, and it is calculated from the user utterance received by the input unit and a ranking model for computing that score.

To achieve the above object, the utterance generation program of the present invention is a program for causing a computer to function as each unit of the above utterance generation device.

According to the present invention, implicit information not directly contained in a user utterance can be obtained in a chat dialogue system.

FIG. 1 is a block diagram showing the overall configuration of the utterance generation device according to the embodiment.
FIG. 2 is a block diagram showing the functional configuration of the learning unit according to the embodiment.
FIG. 3 is a block diagram showing the functional configuration of the utterance generation unit according to the embodiment.
FIG. 4 is a flowchart showing the flow of the utterance generation process according to the embodiment.

The present embodiment is described below with reference to the drawings.

The utterance generation device according to the present embodiment generates response utterances from information that can be inferred from the user utterance. This allows it to produce utterances that go beyond the content of the user utterance itself. In this embodiment, information that can be inferred from a user utterance is called "implicit information" (言外の情報). It is defined as "information that humans can generally understand even if it is not explicitly expressed in the utterance."

Specifically, the device takes a user utterance as input, infers implicit information, and generates a response utterance using that implicit information. In particular, it generates responses that confirm the implicit information or ask a question about it. Inference of implicit information proceeds in three steps:
1. Candidates are enumerated by Method 1 and Method 2, described below.
2. A score is calculated for each candidate pair (the user utterance and one piece of implicit information). The score represents the plausibility of that pair.
3. The candidates are ranked by score.

(Method 1) Prepare data containing a collection of pairs of user utterances and implicit information. Search the user utterances in that data using the input user utterance, and enumerate the implicit information paired with the retrieved utterances as candidates.
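Method 1 can be sketched as follows. The document does not say how stored utterances are matched against the input. This sketch therefore assumes character-bigram Jaccard similarity with a fixed threshold. The function names and the two example pairs are hypothetical illustrations, not data from the patent.

```python
from __future__ import annotations


def char_bigrams(text: str) -> set[str]:
    """Character bigrams of a string (a simple stand-in for a real search index)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def example_search(user_utterance: str,
                   example_data: list[tuple[str, str]],
                   threshold: float = 0.3) -> list[str]:
    """Method 1 sketch: find stored utterances similar to the input and
    return their paired implicit information, most similar first."""
    query = char_bigrams(user_utterance)
    scored = []
    for stored_utterance, implicit_info in example_data:
        sim = jaccard(query, char_bigrams(stored_utterance))
        if sim >= threshold:
            scored.append((sim, implicit_info))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [info for _, info in scored]


# Hypothetical example data: (user utterance, implicit information) pairs.
EXAMPLES = [
    ("富士山に行きました", "Sは富士山が好きだ"),
    ("雨が降っています", "洗濯物が濡れる"),
]

print(example_search("富士山に行った", EXAMPLES))  # ['Sは富士山が好きだ']
```

A production system would more likely use a full-text or embedding index; the bigram measure only keeps the sketch dependency-free.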

(Method 2) Prepare data consisting of pairs of predicate-argument structures that co-occur within a document (e.g., <it rains, the laundry gets wet>). Search for predicate-argument structures similar to the user utterance. Then enumerate candidates by treating the structures paired with the retrieved ones in the data as implicit information.
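Method 2 can be sketched as follows. This sketch makes several assumptions:
- Each predicate-argument structure is encoded as a tuple of tokens (each argument with its case particle, plus the predicate).
- "Similar" means token-level Jaccard similarity at or above 0.5.
- The user utterance "富士山に行きました" has already been analyzed as ("Iガ", "富士山ニ", "行く").

The three co-occurrence pairs mirror the corpus example given later in this description. The helper names are hypothetical.

```python
from __future__ import annotations

# Assumed encoding: a predicate-argument structure is a tuple of tokens,
# one per argument (noun + case particle) plus the predicate.
DRIVE = ("Iガ", "車ヲ", "運転する")
GO = ("Iガ", "富士山ニ", "行く")
LIKE = ("Iガ", "山ガ", "好き")

# Co-occurrence data: unordered pairs of structures seen in the same document.
PAIRS = [(DRIVE, GO), (DRIVE, LIKE), (GO, LIKE)]


def similarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    """Jaccard similarity over the tokens of two structures."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cooccurrence_search(user_pas, pairs, threshold=0.5):
    """Method 2 sketch: for every pair whose one side resembles the user's
    structure, return the other side as an implicit-information candidate."""
    candidates = []
    for left, right in pairs:
        # Pairs are unordered, so try matching on either side.
        for matched, partner in ((left, right), (right, left)):
            if similarity(user_pas, matched) >= threshold and partner not in candidates:
                candidates.append(partner)
    return candidates


print(cooccurrence_search(GO, PAIRS))
# [('Iガ', '車ヲ', '運転する'), ('Iガ', '山ガ', '好き')]
```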

Candidates for implicit information are enumerated by Method 1 and Method 2, and implicit information is then inferred by ranking them. First, a regression model is trained to estimate the plausibility of implicit information for a user utterance. The training data contains two kinds of pairs:
- a user utterance and correct implicit information (positive examples);
- a user utterance and incorrect implicit information (negative examples).

The trained model then estimates, as a score, the plausibility of each candidate for the utterance, and the candidates are ranked. Candidates whose score is at or above a predetermined threshold are taken as the implicit information for the user utterance.
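The ranking and thresholding step can be sketched as follows. The document does not define a programming interface for the trained regression model. In this sketch, `score_fn` is a hypothetical stand-in for it, and the toy scores below are invented for illustration.

```python
from __future__ import annotations

from typing import Callable


def rank_candidates(utterance: str,
                    candidates: list[str],
                    score_fn: Callable[[str, str], float],
                    threshold: float) -> list[tuple[str, float]]:
    """Score every candidate, sort by descending score, and keep only those
    whose score is at or above the threshold."""
    scored = [(cand, score_fn(utterance, cand)) for cand in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cand, s) for cand, s in scored if s >= threshold]


# Toy scores standing in for a trained regression model (hypothetical values).
toy_scores = {"Sは富士山が好きだ": 0.9, "洗濯物が濡れる": 0.2, "Sは車を運転した": 0.6}
print(rank_candidates("富士山に行きました", list(toy_scores),
                      lambda u, c: toy_scores[c], threshold=0.5))
# [('Sは富士山が好きだ', 0.9), ('Sは車を運転した', 0.6)]
```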

The implicit information inferred for a user utterance can include many kinds of information. Outputting all of it as response utterances may therefore make the user uncomfortable. For example, the implicit information "the user wants to be praised" might be inferred for the user utterance "I came first in the test." Responding "So you want to be praised" could make the user uncomfortable. The device therefore estimates the type of each piece of inferred implicit information from a set of predefined types, and only implicit information of specific types is output as response utterances.
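The type filter can be sketched as follows. This section does not state which types are allowed. The allowed set below is therefore an assumption. "Belief 1" is a type named in this document, while "desire" is a hypothetical label.

```python
from __future__ import annotations


def type_filter(typed_candidates: list[tuple[str, str, float]],
                allowed_types: set[str]) -> list[tuple[str, float]]:
    """Keep (implicit information, score) for candidates whose identified
    type is in allowed_types, preserving the input (ranked) order."""
    return [(info, score)
            for info, info_type, score in typed_candidates
            if info_type in allowed_types]


# "Belief 1" is a type named in this document; "desire" is a hypothetical label.
ranked = [
    ("Sは富士山が好きだ", "Belief 1", 0.9),
    ("Sは褒められたい", "desire", 0.8),
]
print(type_filter(ranked, allowed_types={"Belief 1"}))
# [('Sは富士山が好きだ', 0.9)]
```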

The inferred implicit information is then output as a response utterance. It can be used in its original form (for example, a declarative sentence). Alternatively, its ending can be converted into a confirmation form (e.g., "〜なんですね", "So it's ..., isn't it") or a question form (e.g., "〜なんですか？", "Is it ...?").
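The ending conversion can be sketched with simple string rules. The sketch assumes the implicit information is already a plain-form clause without the speaker subject. It treats a trailing "だ" as the copula, replacing it with "な" before the ending. A real implementation would need morphological analysis: verb past forms that end in だ, such as 読んだ, would be mishandled by this rule. `convert_expression` is a hypothetical name.

```python
def convert_expression(clause: str, form: str = "declarative") -> str:
    """Turn an implicit-information clause into a response utterance.

    form: "declarative" (unchanged), "confirm" (〜なんですね / 〜んですね),
    or "question" (〜なんですか？ / 〜んですか？).
    """
    if form == "declarative":
        return clause
    endings = {"confirm": "んですね", "question": "んですか？"}
    if form not in endings:
        raise ValueError(f"unknown form: {form}")
    if clause.endswith("だ"):
        # Copula だ becomes な before the explanatory ん: 好きだ -> 好きなんですね
        return clause[:-1] + "な" + endings[form]
    # Verbs and i-adjectives attach ん directly: 行った -> 行ったんですね
    return clause + endings[form]


print(convert_expression("山が好きだ", "confirm"))       # 山が好きなんですね
print(convert_expression("富士山に行った", "question"))  # 富士山に行ったんですか？
```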

FIG. 1 is a block diagram showing the overall configuration of the utterance generation device 10 according to the present embodiment. As shown in FIG. 1, the utterance generation device 10 includes a learning unit 12 and an utterance generation unit 14. The learning unit 12 creates the data and models required by the utterance generation unit 14. The utterance generation unit 14 uses these data and models to generate a set of multiple response utterances for the user utterance, each annotated with its plausibility score.

FIG. 2 is a block diagram showing the functional configuration of the learning unit 12 of the utterance generation device 10 according to the present embodiment. The learning unit 12 includes the following units:
- a corpus document co-occurrence data creation unit 20;
- a corpus document co-occurrence data storage unit 22;
- a ranking model creation unit 24;
- a ranking model storage unit 26;
- a type identification model creation unit 28;
- a type identification model storage unit 30.

FIG. 3 is a block diagram showing the functional configuration of the utterance generation unit 14 of the utterance generation device 10 according to the present embodiment. The utterance generation unit 14 includes the following units:
- an input unit 40;
- an example data storage unit 42;
- a search unit 44, which itself includes an example search unit 44a and a co-occurrence search unit 44b;
- an implicit information ranking unit 46;
- a type identification unit 48;
- a type filter unit 50;
- an expression conversion unit 52;
- an output unit 54.

In FIGS. 1 and 2, arrows represent the input/output relationships between units. Broken lines indicate that a unit uses the corresponding model or data.

The corpus document co-occurrence data creation unit 20 works from pairs of predicate-argument structures that co-occur within corpus documents. Using these, it generates candidates for implicit information from the implicit information corresponding to the user utterance received by the input unit 40. It stores the result in the corpus document co-occurrence data storage unit 22.

Specifically, the unit first performs predicate-argument structure analysis on each document in the input document corpus in order to extract events. This analysis should use a tool that can extract predicates and their arguments. One example of such a tool is JDEP, developed by the applicant and disclosed in Non-Patent Document 3 below.

[Non-Patent Document 3] Kenji Imamura, Genichiro Kikui, and Norihito Yasuda, "Japanese dependency parsing using sequential labeling for semi-spoken language," In Proc. ACL, 2007.

For example, suppose a blog document corpus containing the following three documents is received as input:
Document A: "I drove a car and went to Mt. Fuji. I thought that I really do like mountains." (車を運転して、富士山に行った。やはり山が好きだと思った。)
Document B: "I went to Mt. Fuji and climbed the mountain. The scenery was beautiful." (富士山に行って、山に登った。景色が綺麗だった。)
Document C: "I like mountains, so I went to Mt. Fuji. I drove a car to get there." (山が好きなので、富士山に行った。移動するために、車を運転した。)

Predicate-argument structure analysis of these documents gives the results below. Notation used in the results:
- "I" (a symbol for the first person) denotes the author of the blog.
- Predicates are expressed as verbs, and arguments as nouns.
- The particles "ガ" (ga), "ヲ" (wo), and "ニ" (ni) mark the argument type: subject, direct object, and indirect object, respectively.

Document A: "Iガ車ヲ運転する" (I drive a car), "Iガ富士山ニ行く" (I go to Mt. Fuji), "Iガ山ガ好き" (I like mountains), "Iガ思う" (I think)
Document B: "Iガ富士山ニ行く" (I go to Mt. Fuji), "Iガ山ニ登る" (I climb a mountain), "景色ガ綺麗" (the scenery is beautiful)
Document C: "Iガ山ガ好き" (I like mountains), "Iガ富士山ニ行く" (I go to Mt. Fuji), "Iガ移動する" (I travel), "Iガ車ヲ運転する" (I drive a car)

Next, every combination of two predicate-argument structures extracted from the same document is taken as a pair. The number of times each pair co-occurs within a document is counted, giving the results shown in Table 1 below.

Finally, each pair of predicate-argument structures is checked against its co-occurrence count. Pairs whose count is at or above a predetermined threshold are output as co-occurrence data to the corpus document co-occurrence data storage unit 22.

In the above example, with a threshold of 2, the pairs of predicate-argument structures that co-occur two or more times are extracted. The pairs shown in Table 2 below are output as co-occurrence data.
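The counting and thresholding can be sketched in a few lines. This sketch assumes that a structure repeated inside one document is counted once, so each count is the number of documents in which a pair co-occurs. Applied to the three analyzed documents above with a threshold of 2, it keeps exactly three pairs, each with count 2: {drive a car, go to Mt. Fuji}, {drive a car, like mountains}, and {go to Mt. Fuji, like mountains}. These are the pairs shared by Documents A and C. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(documents):
    """Count, for every unordered pair of structures, the number of
    documents in which both appear."""
    counts = Counter()
    for structures in documents:
        # Deduplicate within a document; sorting gives each pair one canonical key.
        for pair in combinations(sorted(set(structures)), 2):
            counts[pair] += 1
    return counts


def cooccurrence_data(documents, threshold=2):
    """Keep only pairs whose co-occurrence count reaches the threshold."""
    counts = count_cooccurrences(documents)
    return {pair: n for pair, n in counts.items() if n >= threshold}


DOCS = [
    ["Iガ車ヲ運転する", "Iガ富士山ニ行く", "Iガ山ガ好き", "Iガ思う"],       # Document A
    ["Iガ富士山ニ行く", "Iガ山ニ登る", "景色ガ綺麗"],                         # Document B
    ["Iガ山ガ好き", "Iガ富士山ニ行く", "Iガ移動する", "Iガ車ヲ運転する"],   # Document C
]

for pair, n in sorted(cooccurrence_data(DOCS).items()):
    print(n, pair)
```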

The corpus document co-occurrence data storage unit 22 stores the corpus document co-occurrence data created by the corpus document co-occurrence data creation unit 20.

The ranking model creation unit 24 creates the ranking model and stores it in the ranking model storage unit 26. Its training data consists of pairs of a user utterance with positive-example implicit information and pairs of a user utterance with negative-example implicit information. The unit morphologically analyzes the user utterance and the implicit information in each pair. It then builds the model from the combinations of stems among the resulting morphemes, together with whether each combination came from a positive or a negative pair. The ranking model computes a score representing how plausible a piece of implicit information is for a user utterance.

The ranking model is created from manually prepared data, as shown in Table 3 below. The data consists of pairs of user utterances with positive-example implicit information and pairs of user utterances with negative-example implicit information. In the flag column of Table 3, 1 denotes a positive example and 0 a negative example. "I" denotes the speaker of the user utterance to which the implicit information is attached.

To train the ranking model on such data, the features are pairs consisting of a morpheme from the user utterance and a morpheme from the implicit information.

As an example, consider the user utterance and implicit information shown in Table 4 below. S is a symbol representing the speaker.

The user utterance and the implicit information are morphologically analyzed with a morphological analyzer such as JTAG. JTAG was developed by the applicant and is disclosed in Non-Patent Document 4 below.

[Non-Patent Document 4] Takeshi Fuchi and Shinichiro Takagi, "Japanese morphological analyzer using word co-occurrence: JTAG," Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, 1998.

Morphological analysis yields results such as those shown in Table 5 below. In Table 5, the left column shows the surface form, the middle column the part of speech, and the right column the stem.

Next, the morphological analysis results are used to extract content words, meaning words that carry lexical meaning rather than a purely grammatical function. The procedure is as follows:
1. From both the user utterance and the implicit information, extract the morphemes whose part of speech is noun, verb stem, or adjective stem.
2. Combine each morpheme from the user utterance with each morpheme from the implicit information. This produces the content-word pairs shown in Table 6 below.

In Table 6, each pair is written as (morpheme from the user utterance, morpheme from the implicit information).
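Building the content-word pair features can be sketched as follows, given morphemes that have already been analyzed. The sketch assumes the analyzer returns (surface, part of speech, stem) triples. The part-of-speech strings and the analyses of the two example sentences are hypothetical illustrations, not JTAG output. The feature itself, every pairing of a user-utterance content word with an implicit-information content word, follows the description above.

```python
from __future__ import annotations

# Parts of speech treated as content words. These tag strings are assumptions;
# a real analyzer such as JTAG has its own tag set.
CONTENT_POS = {"名詞", "動詞語幹", "形容詞語幹"}  # noun, verb stem, adjective stem


def content_words(morphemes: list[tuple[str, str, str]]) -> list[str]:
    """Stems of content-word morphemes; input is (surface, POS, stem) triples."""
    return [stem for _surface, pos, stem in morphemes if pos in CONTENT_POS]


def pair_features(utterance_morphs, implicit_morphs) -> set[tuple[str, str]]:
    """All (utterance content word, implicit-information content word) pairs."""
    return {(u, i)
            for u in content_words(utterance_morphs)
            for i in content_words(implicit_morphs)}


# Hypothetical analyses of "富士山に行きました" and "山が好きだ".
UTTERANCE = [("富士山", "名詞", "富士山"), ("に", "格助詞", "に"),
             ("行", "動詞語幹", "行"), ("き", "動詞活用語尾", "き"),
             ("まし", "助動詞", "ます"), ("た", "助動詞", "た")]
IMPLICIT = [("山", "名詞", "山"), ("が", "格助詞", "が"),
            ("好き", "形容詞語幹", "好き"), ("だ", "判定詞", "だ")]

print(sorted(pair_features(UTTERANCE, IMPLICIT)))
```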

Content-word pairs are created in this way, and each is labeled by whether it came from a positive or a negative example. These labeled pairs are used to train a ranking model that computes a score representing the plausibility of implicit information for a user utterance.

Logistic regression, for example, can be used as the learning algorithm; it is described in detail in Non-Patent Document 5 below. Other algorithms can also be used, for example ranking SVM (Support Vector Machine).

[Non-Patent Document 5] 高村大也 (Hiroya Takamura), 言語処理のための機械学習入門 (Introduction to Machine Learning for Natural Language Processing), コロナ社 (Corona Publishing), 2010.

Training produces a weight for each content-word pair, representing how plausible implicit information is for a user utterance. These weights are stored in the ranking model storage unit 26 as the ranking model.
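The training and scoring steps above can be sketched with a small logistic regression. The sketch uses binary content-word-pair features, a bias term, and plain stochastic gradient ascent on the log-likelihood without regularization. These are simplifications chosen here; Non-Patent Document 5 covers the method properly. The learned per-pair weights play the role of the stored ranking model. `train_logreg` and `score` are hypothetical names, and the two training examples are invented.

```python
import math
from collections import defaultdict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(examples, epochs=100, lr=0.5):
    """examples: list of (feature_set, label), label 1 = positive, 0 = negative.
    Returns (weight per content-word pair, bias)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            p = sigmoid(bias + sum(weights[f] for f in feats))
            grad = label - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * grad
            for f in feats:
                weights[f] += lr * grad
    return dict(weights), bias


def score(model, feats) -> float:
    """Plausibility score in [0, 1]; unseen pairs contribute nothing."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


# Hypothetical training data built from content-word pairs.
POS = {("富士山", "山"), ("富士山", "好き"), ("行", "山"), ("行", "好き")}
NEG = {("富士山", "雨"), ("行", "雨")}
model = train_logreg([(POS, 1), (NEG, 0)])
print(round(score(model, POS), 3), round(score(model, NEG), 3))
```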

The ranking model storage unit 26 stores the ranking model created by the ranking model creation unit 24.

The type identification model creation unit 28 creates a type identification model for identifying the type of implicit information. It builds the model from data containing a collection of implicit information with types already assigned, and stores it in the type identification model storage unit 30.

There are nine types, shown in Table 7 below.

These nine types were defined according to what kind of information the implicit information represents. Non-Patent Document 6 below explains how each type was defined and describes each type in detail.

[Non-Patent Document 6] 光田航, 東中竜一郎, 松尾義博, "複数の作業者グループを用いた対話における言外の情報の類型化" (Typology of implicit information in dialogue using multiple worker groups), IEICE Technical Report, Vol. 116, No. 379, 言語理解とコミュニケーション (Natural Language Understanding and Communication), pp. 13-18, 2016.

In the present embodiment, the type identification model is trained on data in which each piece of implicit information has already been assigned a type.

The feature extraction method is described using the implicit information in Table 8 below as an example. This implicit information belongs to the type "Belief 1" (信念１).

Morphological analysis of this implicit information gives the results in Table 9 below. They are the output of a morphological analyzer such as JTAG, developed by the applicant. The columns of Table 9 are, from left to right:
- the surface form;
- the part of speech;
- the stem;
- the category numbers from Nihongo Goi-Taikei (A Japanese Lexicon).

The Goi-Taikei category numbers [][][] give, from left to right, the general noun semantic attribute, the proper noun semantic attribute, and the predicate (yougen) semantic attribute. These attributes are described in Non-Patent Document 7. S is a symbol representing the speaker.

[Non-Patent Document 7] Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Oyama, Yoshihiko Hayashi, "Nihongo Goi-Taikei (A Japanese Lexicon)," Iwanami Shoten, 1997.

The following features are obtained from this morphological analysis result.
- A unigram is a single element of a sequence.
- A bigram is an ordered pair of two adjacent elements of a sequence.
- "Morpheme" here refers to the stem of each morpheme obtained by the analysis.

Morpheme unigrams: "S", "は" (ha), "富士山" (Mt. Fuji), "が" (ga), "好き" (like), "だ" (da)
Morpheme bigrams: "S-は", "は-富士山", "富士山-が", "が-好き", "好き-だ"
General-noun semantic attribute unigrams: "471", "1300"
General-noun semantic attribute bigrams: "471-1300"
Predicate semantic attribute unigrams: "11"
Predicate semantic attribute bigrams: none

A morpheme may have several category numbers for the general-noun or predicate semantic attribute. In that case only the leftmost one is used, because it is the category number that JTAG judged most appropriate.
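The feature extraction above can be sketched as follows. This is a minimal sketch that assumes each analyzed morpheme is given as a stem plus two lists of category numbers, ordered most-appropriate first. The `Morpheme` layout and the function names are hypothetical and do not reproduce JTAG's actual output format. The sketch builds the unigram and bigram features and keeps only the leftmost category number for each morpheme.

```python
from typing import Dict, List, NamedTuple, Optional


class Morpheme(NamedTuple):
    """One analyzer output row (assumed layout, not JTAG's real format)."""
    stem: str
    noun_cats: List[str]  # general-noun semantic attribute numbers, best first
    pred_cats: List[str]  # predicate semantic attribute numbers, best first


def bigrams(seq: List[str]) -> List[str]:
    """Ordered pairs of adjacent elements, joined with '-'."""
    return [f"{a}-{b}" for a, b in zip(seq, seq[1:])]


def leftmost(cats: List[str]) -> Optional[str]:
    """Keep only the leftmost (judged most appropriate) category number."""
    return cats[0] if cats else None


def extract_features(morphemes: List[Morpheme]) -> Dict[str, List[str]]:
    stems = [m.stem for m in morphemes]
    nouns = [c for c in (leftmost(m.noun_cats) for m in morphemes) if c is not None]
    preds = [c for c in (leftmost(m.pred_cats) for m in morphemes) if c is not None]
    return {
        "morpheme_unigram": stems,
        "morpheme_bigram": bigrams(stems),
        "noun_attr_unigram": nouns,
        "noun_attr_bigram": bigrams(nouns),
        "pred_attr_unigram": preds,
        "pred_attr_bigram": bigrams(preds),
    }


# "Sは富士山が好きだ" (S likes Mt. Fuji). Which morpheme carries which number
# is assumed; the second number "9999" on 富士山 is hypothetical and is
# dropped by leftmost().
example = [
    Morpheme("S", ["471"], []),
    Morpheme("は", [], []),
    Morpheme("富士山", ["1300", "9999"], []),
    Morpheme("が", [], []),
    Morpheme("好き", [], ["11"]),
    Morpheme("だ", [], []),
]
features = extract_features(example)
```

With this input, the sketch reproduces the feature list given above, including the empty predicate-attribute bigram list.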

Using these features, a model is trained that takes a piece of implicit information as input and outputs its type. A multi-class SVM may be used as the learning algorithm. Multi-class classification is described in detail in Non-Patent Document 5.
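As a rough illustration of multi-class SVM training, the sketch below trains one linear SVM per type (one-vs-rest). Each classifier minimizes hinge loss plus L2 regularization with Pegasos-style subgradient steps, and prediction takes the type with the highest score. This pure-Python solver is a stand-in chosen for self-containment. It is not necessarily the method of Non-Patent Document 5. The toy features, the labels, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sparse = Dict[str, float]


def to_vec(features: Iterable[str]) -> Sparse:
    """Binary bag-of-features vector."""
    return {f: 1.0 for f in features}


def dot(w: Sparse, x: Sparse) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(data: List[Tuple[Sparse, int]], lam: float = 0.01,
                     epochs: int = 20) -> Sparse:
    """Linear SVM (hinge loss + L2) via Pegasos-style subgradient steps.

    Examples are visited in a fixed order for reproducibility; labels are +1/-1.
    """
    w: Sparse = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin test uses the current w
            for f in list(w):  # shrink: gradient of the L2 term
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)


def train_ovr(examples: List[Tuple[List[str], str]], **kw) -> Dict[str, Sparse]:
    """One-vs-rest: one binary SVM per type label."""
    vecs = [(to_vec(f), label) for f, label in examples]
    labels = sorted({label for _, label in vecs})
    return {c: train_binary_svm([(x, 1 if lab == c else -1) for x, lab in vecs], **kw)
            for c in labels}


def predict(models: Dict[str, Sparse], features: Iterable[str]) -> str:
    """Return the type whose classifier gives the highest score."""
    x = to_vec(features)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy data: stem features paired with type labels.
train = [
    (["好き", "S"], "Belief 1"),
    (["思う", "S"], "Belief 1"),
    (["行く", "た"], "Fact 1"),
    (["見る", "た"], "Fact 1"),
]
models = train_ovr(train)
```

In practice, the features would be the unigram and bigram strings extracted above, and a library SVM implementation would usually replace this toy solver.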

The type identification model storage unit 30 stores the type identification model created by the type identification model creation unit 28.

The input unit 40 receives a single user utterance and outputs it to three units: the example search unit 44a, the co-occurrence search unit 44b, and the implicit information ranking unit 46. For example, the user utterance shown in Table 10 is input and passed to these three units.

In addition to the user utterance, the input unit 40 may also accept the dialogue context that precedes it, that is, the sequence of earlier utterances.

The example data storage unit 42 stores example data. The example data is a collection of pairs, each consisting of a user utterance and a piece of implicit information.

The example search unit 44a uses the example data stored in the example data storage unit 42 to generate implicit information candidates. The candidates are taken from the implicit information that corresponds to the user utterance input by the input unit 40.

Specifically, the user utterance input by the input unit 40 and each user utterance in the example data are first converted into vectors using word2vec. word2vec is a common technique that, once trained on a text corpus, converts arbitrary text into a fixed-length vector.

word2vec is trained on a corpus such as Wikipedia (registered trademark). The same corpus used by the corpus document co-occurrence data creation unit 20 may be used for this. A specific training method is disclosed in Non-Patent Document 8.

[Non-Patent Document 8] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient estimation of word representations in vector space," CoRR, Vol. abs/1301.3781, 2013.

Cosine similarity may be used to compare two word2vec vectors: that of the user utterance input by the input unit 40 and that of each user utterance in the example data. Cosine similarity is a common measure of similarity between vectors and is given by Equation (1). In Equation (1), $\vec{a}$ and $\vec{b}$ are vectors and $\cos(\vec{a}, \vec{b})$ is their cosine similarity.

$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert} \qquad (1)$$

The similarity between the user utterance input by the input unit 40 and each user utterance in the example data is calculated. Every example user utterance whose cosine similarity is equal to or greater than a threshold is selected. The implicit information paired with each selected utterance is output as an implicit information candidate.

The following example shows how the user utterance "I went to Mt. Fuji to see the autumn leaves" (紅葉を見に富士山に行きました) is processed. Table 11 shows an example of the example data.

Table 12 shows the similarities obtained by vectorizing the user utterance input by the input unit 40 and the user utterances in the example data.

Suppose the similarity threshold for Table 12 is 0.8. The implicit information paired with each user utterance whose similarity is equal to or greater than the threshold is then output to the implicit information ranking unit 46 as a candidate, as shown in Table 13.
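The example search can be sketched in a few lines. Equation (1) is applied between the query utterance vector and each example utterance vector, and the implicit information of every example scoring at or above the threshold (0.8 here) is returned. This is a minimal sketch under two assumptions. First, an utterance vector is the average of its word vectors. Second, the toy table `wv` stands in for trained word2vec vectors. The function names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def utterance_vector(tokens: Sequence[str], wv: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of known tokens (assumed utterance embedding)."""
    known = [wv[t] for t in tokens if t in wv]
    if not known:
        return []
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Equation (1): (a . b) / (||a|| ||b||); 0.0 if either vector is zero or empty."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    return dot / math.sqrt(norm_sq) if norm_sq > 0 else 0.0


def search_examples(query: Sequence[float],
                    examples: List[Tuple[Sequence[float], str]],
                    threshold: float = 0.8) -> List[str]:
    """Return implicit information paired with example utterances similar to the query."""
    return [info for vec, info in examples if cosine(query, vec) >= threshold]
```

For instance, with examples `[([1.0, 0.0], "S likes autumn leaves"), ([0.0, 1.0], "S likes the sea")]` and query `[1.0, 0.1]`, only the first example clears the 0.8 threshold (similarity about 0.995).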

The co-occurrence search unit 44b generates implicit information candidates from the predicate-argument structures that correspond to the user utterance. It relies on pairs of predicate-argument structures that co-occur in corpus documents. These pairs are stored as corpus document co-occurrence data in the corpus document co-occurrence data storage unit 22.

To search for candidates, the unit proceeds in three steps:
1. It reads the corpus document co-occurrence data from the corpus document co-occurrence data storage unit 22.
2. It searches that data for predicate-argument structures similar to the user utterance input by the input unit 40.
3. For each pair containing a retrieved structure, it outputs the other structure in the pair to the implicit information ranking unit 46 as an implicit information candidate.

One way to find predicate-argument structures similar to the user utterance input by the input unit 40 is to use vector similarity, as in the example search unit 44a.
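A minimal sketch of the co-occurrence search follows. Each co-occurring pair is treated as unordered. When either member is similar enough to the query structure, the other member is returned as a candidate. Several details here are assumptions:
- the encoding of a predicate-argument structure as a tuple of tokens (predicate first, then "case:argument" strings);
- the bag-of-tokens cosine used as the default similarity;
- the threshold value;
- all names.

Any vector similarity, such as the word2vec-based one used for example search, could be passed in via `sim`.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

PAS = Tuple[str, ...]  # assumed encoding: (predicate, "case:argument", ...)


def bow_cosine(a: PAS, b: PAS) -> float:
    """Cosine similarity between token-count vectors of two structures."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm_sq = sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())
    return dot / math.sqrt(norm_sq) if norm_sq > 0 else 0.0


def cooccurrence_candidates(query: PAS,
                            pairs: Sequence[Tuple[PAS, PAS]],
                            threshold: float = 0.8,
                            sim: Callable[[PAS, PAS], float] = bow_cosine) -> List[PAS]:
    """Return partners of structures similar to the query, without duplicates."""
    out: List[PAS] = []
    for a, b in pairs:
        for hit, partner in ((a, b), (b, a)):
            if sim(query, hit) >= threshold and partner not in out:
                out.append(partner)
    return out


# Hypothetical co-occurrence data.
pairs = [
    (("行く", "ニ:富士山"), ("見る", "ヲ:紅葉")),
    (("泳ぐ", "デ:海"), ("焼ける", "ガ:肌")),
]
candidates = cooccurrence_candidates(("行く", "ニ:富士山"), pairs)
```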

The implicit information ranking unit 46 scores each implicit information candidate generated by the example search unit 44a or the co-occurrence search unit 44b. The score represents how likely that implicit information is, given the user utterance. It is calculated from the user utterance input by the input unit 40 and the ranking model stored in the ranking model storage unit 26.
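Claim 6 states that the ranking model is built on combinations of stems from the user utterance and the implicit information. Assuming the model is linear over such stem-pair features, scoring a candidate reduces to summing the learned weights of the pairs present. The sketch below shows only that scoring step. The weight values and names are invented for illustration, and how the weights are learned is not covered.

```python
from itertools import product
from typing import Dict, Sequence, Set


def stem_pairs(utterance_stems: Sequence[str], candidate_stems: Sequence[str]) -> Set[str]:
    """All (utterance stem, candidate stem) combinations, encoded as 'u|c'."""
    return {f"{u}|{c}" for u, c in product(utterance_stems, candidate_stems)}


def score(weights: Dict[str, float],
          utterance_stems: Sequence[str],
          candidate_stems: Sequence[str]) -> float:
    """Linear score: sum of weights for the stem-pair features present (0 if unseen)."""
    return sum(weights.get(f, 0.0) for f in stem_pairs(utterance_stems, candidate_stems))


# Hypothetical learned weights.
weights = {"紅葉|秋": 0.5, "富士山|山": 0.3, "行く|好き": -0.1}
utterance = ["紅葉", "見る", "富士山", "行く"]
```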

Table 14 shows a user utterance input by the input unit 40, and Table 15 shows the implicit information candidates.

Table 16 shows the scores calculated in this example. Each score represents how likely the implicit information is, given the user utterance.

As shown in Table 17, implicit information whose score is equal to or greater than a threshold (e.g., 0.2) is output to the type identification unit 48. The output is in ranking format, with the scores arranged in ascending order.
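The thresholding and ranking step can be sketched as follows. Candidates scoring below the threshold are dropped, and the rest are sorted by score. The text specifies ascending order. The sort direction is exposed as a hypothetical `descending` parameter because rankings are often presented best-first. The same helper with `threshold=0.5` would also match the output unit's score filtering.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (implicit information, score)


def rank_candidates(scored: List[Scored], threshold: float = 0.2,
                    descending: bool = False) -> List[Scored]:
    """Keep candidates with score >= threshold, sorted by score (ascending by default)."""
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda cs: cs[1], reverse=descending)
```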

The type identification unit 48 uses the type identification model stored in the type identification model storage unit 30 to identify the type of each piece of implicit information and assign it.

For example, suppose that the implicit information shown in Table 18 is input.

In that case, a type is assigned to each piece of implicit information, as shown in Table 19.

The type filter unit 50 checks each piece of typed implicit information against a predetermined condition on its type. Only the pieces that satisfy the condition are output to the expression conversion unit 52.

One way to apply the predetermined condition is to store a configuration file that lists, in advance, the types of implicit information to be output. Only implicit information whose type appears in that file is then selected.

For example, suppose the configuration file lists only the types "Fact 1" and "Fact 2". When the typed implicit information shown in Table 20 is input, this file selects the implicit information shown in Table 21.

Filtering typed implicit information by type in this way excludes from the output any implicit information that perhaps should not be said to the user, such as "S wants to be praised" (Sは褒められたい). Alternatively, the configuration file may list in advance the types of implicit information that should not be output.
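The type filter described above can be sketched with both variants: an allow-list of types to output and a deny-list of types to suppress. The configuration file format (one type name per line, with `#` starting a comment) is an assumption, since the text does not specify one. The function names and the sample items are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

Typed = Tuple[str, str]  # (implicit information, type)


def load_types(config_text: str) -> Set[str]:
    """Parse an assumed config format: one type name per line, '#' starts a comment."""
    types: Set[str] = set()
    for line in config_text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            types.add(name)
    return types


def filter_by_type(items: Iterable[Typed],
                   allow: Optional[Set[str]] = None,
                   deny: Optional[Set[str]] = None) -> List[Typed]:
    """Keep items whose type is in `allow` (if given) and not in `deny` (if given)."""
    return [
        (info, typ)
        for info, typ in items
        if (allow is None or typ in allow) and (deny is None or typ not in deny)
    ]
```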

The expression conversion unit 52 converts each piece of implicit information output by the type filter unit 50 into an utterance sentence and outputs it to the output unit 54.

To convert the expression, the end of each piece of implicit information is changed to "〜ですね" ("..., isn't it?"), which turns it into a confirmation. Other conversions are possible:
- The end may be changed to "〜なんですか？" ("Is it that ...?"), which turns it into a question.
- The implicit information may be output unchanged as the response utterance.
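The confirmation and question conversions can be sketched with a simple suffix rule: drop a trailing "。" and a sentence-final copula "だ", then append "ですね" or "なんですか？". This single rule is an assumption made for illustration. Sentences that do not end in "だ" simply get the suffix appended, which may be ungrammatical. The function names are hypothetical.

```python
def _strip_copula(info: str) -> str:
    """Drop a final '。' and a sentence-final copula 'だ' (simplified, assumed rule)."""
    info = info.rstrip("。")
    return info[:-1] if info.endswith("だ") else info


def to_confirmation(info: str) -> str:
    """'...だ' -> '...ですね' (confirmation form)."""
    return _strip_copula(info) + "ですね"


def to_question(info: str) -> str:
    """'...だ' -> '...なんですか？' (question form)."""
    return _strip_copula(info) + "なんですか？"
```

For example, "富士山は日本一高い山だ" (Mt. Fuji is the highest mountain in Japan) becomes "富士山は日本一高い山ですね" as a confirmation.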

When the implicit information shown in Table 22 is input, each piece is converted into the expression shown in Table 23. Table 22 shows only the implicit information.

The output unit 54 examines the scored implicit information. Each piece whose score satisfies a predetermined condition is output as a response utterance, with its score attached. Possible output methods include:
- displaying data representing the response utterance on a display means such as a display;
- transmitting that data to an external device;
- outputting speech representing the response utterance through an audio output means.

Suppose scored implicit information such as that in Table 24 is input. Based on the scores, the implicit information whose score is equal to or greater than a predetermined threshold (e.g., 0.5) is output as response utterances with the scores attached, as shown in Table 25.

The utterance generation device 10 according to this embodiment is configured, for example, as a computer device with the following components:
- a CPU (Central Processing Unit);
- a RAM (Random Access Memory);
- a ROM (Read Only Memory) that stores various programs.

The computer may also include a storage unit such as a hard disk drive or nonvolatile memory. In this embodiment, the CPU reads and executes a program stored in a storage unit such as the ROM or a hard disk. The hardware resources and the program thereby cooperate to realize the functions described above.

Next, the flow of the utterance generation process performed by the utterance generation device 10 is described with reference to the flowchart in FIG. 4. In this embodiment, the process starts when predetermined data for starting its execution is input to the utterance generation device 10. The start timing is not limited to this; for example, the process may instead start when data indicating a dialogue rule is input.

In step S101, the input unit 40 receives a user utterance.

In step S103, the example search unit 44a generates implicit information candidates using the example data, a collection of pairs of a user utterance and implicit information. The candidates are taken from the implicit information corresponding to the user utterance input by the input unit 40.

In step S105, the co-occurrence search unit 44b generates implicit information candidates using the corpus document co-occurrence data, that is, pairs of predicate-argument structures that co-occur in corpus documents. The candidates are derived from the predicate-argument structures corresponding to the user utterance input by the input unit 40.

In step S107, the implicit information ranking unit 46 calculates a score for each implicit information candidate generated by the example search unit 44a or the co-occurrence search unit 44b. The calculation uses the user utterance input by the input unit 40 and a ranking model that scores how likely a piece of implicit information is, given the user utterance.

In step S109, the type identification unit 48 uses the type identification model to identify a type for each piece of implicit information and assign it.

In step S111, the type filter unit 50 filters the typed implicit information by type. Only the pieces that satisfy the predetermined condition on type are output.

In step S113, the expression conversion unit 52 converts the expression of each piece of implicit information, turning it into an utterance sentence.

In step S115, the output unit 54 selects, from the scored implicit information candidates, those whose score satisfies a predetermined condition and outputs them as response utterances. Execution of the utterance generation program then ends.
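Steps S101 to S115 can be tied together as a pipeline of pluggable stages. This is a minimal sketch: each stage is an injected callable standing in for the corresponding unit. The `Pipeline` class, its field names, and the de-duplication of candidates found by both searches are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Pipeline:
    example_search: Callable[[str], List[str]]       # S103 (unit 44a)
    cooccurrence_search: Callable[[str], List[str]]  # S105 (unit 44b)
    score: Callable[[str, str], float]               # S107 (unit 46)
    identify_type: Callable[[str], str]              # S109 (unit 48)
    allowed_types: Set[str]                          # S111 (unit 50)
    convert: Callable[[str], str]                    # S113 (unit 52)
    threshold: float = 0.5                           # S115 (unit 54)

    def run(self, utterance: str) -> List[Tuple[str, float]]:
        # S103/S105: gather candidates from both searches, dropping duplicates.
        candidates = self.example_search(utterance) + self.cooccurrence_search(utterance)
        candidates = list(dict.fromkeys(candidates))
        # S107: score each candidate against the utterance.
        scored = [(c, self.score(utterance, c)) for c in candidates]
        # S109/S111: assign types and keep only allowed ones.
        kept = [(c, s) for c, s in scored if self.identify_type(c) in self.allowed_types]
        # S113: convert to utterance sentences; S115: keep scores >= threshold.
        return [(self.convert(c), s) for c, s in kept if s >= self.threshold]
```

Keeping each unit as an injected callable mirrors the separation of units in the device and lets each stage be replaced independently.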

As described above, this embodiment generates implicit information candidates in one of two ways:
- from the implicit information corresponding to the input user utterance, using example data (a collection of pairs of a user utterance and implicit information); or
- from the predicate-argument structures corresponding to the user utterance, using pairs of predicate-argument structures that co-occur in corpus documents.

A score is then calculated for each generated candidate. The calculation uses the input user utterance and a ranking model that scores how likely a piece of implicit information is, given the user utterance. Finally, the candidates whose score satisfies a predetermined condition are output as response utterances.

Generating response utterances from implicit information lets a chat dialogue system respond with a variety of content that is not limited to what the user actually said. The system can also show the user that it has properly understood what was said. The result is a dialogue system that users keep using for longer.

In this embodiment, the operations of the functional components shown in FIGS. 1 to 3 are built as a program. The program is installed and executed on a computer used as the utterance generation device 10. It may also be distributed over a network.

The program may also be stored on a portable storage medium such as a hard disk or CD-ROM, and then installed on a computer or distributed.

Description of Reference Signs
10 Utterance generation device
12 Learning unit
14 Utterance generation unit
20 Corpus document co-occurrence data creation unit
22 Corpus document co-occurrence data storage unit
24 Ranking model creation unit
26 Ranking model storage unit
28 Type identification model creation unit
30 Type identification model storage unit
40 Input unit
42 Example data storage unit
44 Search unit
44a Example search unit
44b Co-occurrence search unit
46 Implicit information ranking unit
48 Type identification unit
50 Type filter unit
52 Expression conversion unit
54 Output unit

Claims

1. An utterance generation device comprising:
an input unit that inputs a user utterance;
a search unit that generates candidates for implicit information either (a) from the implicit information corresponding to the user utterance input by the input unit, based on example data that is a collection of pairs of a user utterance and implicit information, or (b) from a predicate-argument structure corresponding to the user utterance, based on pairs of predicate-argument structures that co-occur in corpus documents; and
an implicit information ranking unit that calculates a score for each of the implicit information candidates generated by the search unit, based on the user utterance input by the input unit and a ranking model for calculating the score, the score representing the likelihood of the implicit information given the user utterance.

2. The utterance generation device according to claim 1, further comprising:
a type identification unit that identifies and assigns a type to each piece of the implicit information, based on a type identification model for identifying the type of the implicit information; and
a type filter unit that outputs, from the implicit information to which the type has been assigned, the implicit information satisfying a predetermined condition relating to the type.

3. The utterance generation device according to claim 2, further comprising a type identification model creation unit that creates the type identification model based on data including a collection of implicit information to which the type has been assigned in advance,
wherein the type identification unit identifies and assigns the type to each piece of the implicit information based on the type identification model created by the type identification model creation unit.

4. The utterance generation device according to any one of the preceding claims, further comprising an output unit that outputs, as a response utterance, the implicit information whose score satisfies a predetermined condition among the implicit information candidates for which the score has been calculated.

5. The utterance generation device according to claim 4, further comprising an expression conversion unit that converts each piece of the implicit information output by the output unit into an utterance sentence.

6. The utterance generation device according to any one of claims 1 to 5, further comprising a ranking model creation unit that uses two collections: pairs of a user utterance and a positive example of implicit information, and pairs of a user utterance and a negative example of implicit information. The ranking model creation unit morphologically analyzes each user utterance and each piece of implicit information, and creates the ranking model based on combinations of stems of the obtained morphemes and on whether each combination of stems was obtained from a positive-example pair or a negative-example pair,
wherein the implicit information ranking unit morphologically analyzes the user utterance input by the input unit and each implicit information candidate generated by the search unit, and calculates the score for each implicit information candidate based on combinations of stems of the obtained morphemes and the ranking model created by the ranking model creation unit.

7. An utterance generation method in an utterance generation device including an input unit, a search unit, and an implicit information ranking unit, the method comprising:
inputting, by the input unit, a user utterance;
generating, by the search unit, candidates for implicit information either (a) from the implicit information corresponding to the user utterance input by the input unit, based on example data that is a collection of pairs of a user utterance and implicit information, or (b) from a predicate-argument structure corresponding to the user utterance, based on pairs of predicate-argument structures that co-occur in corpus documents; and
calculating, by the implicit information ranking unit, a score for each of the generated implicit information candidates, based on the user utterance input by the input unit and a ranking model for calculating the score, the score representing the likelihood of the implicit information given the user utterance.

8. An utterance generation program for causing a computer to function as each unit of the utterance generation device according to any one of claims 1 to 6.