JP2012141910A

JP2012141910A - Information acquisition device

Info

Publication number: JP2012141910A
Application number: JP2011000902A
Authority: JP
Inventors: Konagi Uchibe; こなぎ内部; Yasutsugu Morimoto; 康嗣森本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-01-06
Filing date: 2011-01-06
Publication date: 2012-07-26
Anticipated expiration: 2031-01-06
Also published as: JP5560207B2

Abstract

PROBLEM TO BE SOLVED: To extract useful information from text data composed of pairs of a question and an answer.
SOLUTION: The contents of the question and of the answer are each analyzed, and useful information is extracted by combining the analysis results. Specifically, the information acquisition apparatus comprises input means for inputting text data composed of a pair of a question and an answer, information extraction means for extracting information from the text data, and output means for outputting the result extracted by the information extraction means. The information extraction means includes question text analysis means for analyzing the question part text of the input, answer text analysis means for analyzing the answer part text of the input, and matching text determination means for determining whether the text matches based on the analysis results of the question text analysis means and the answer text analysis means.
SELECTED DRAWING: FIG. 1

Description

The present invention relates to a technique for analyzing text data composed of pairs of a question and an answer.

Many companies have a support service department or call center that accepts questions, opinions, and requests from customers, and they record and accumulate as text data both the content of those inquiries and the content of the company's answers and responses. Many companies also try to capture customer needs and evaluations of and opinions about products from such data and use them as clues for expanding sales and developing new products. To obtain useful clues, a technique for analyzing the text data accumulated by a support service or the like and extracting the necessary information is important.

Patent Document 1 discloses a technique that applies existing text analysis techniques such as morphological analysis to text data collected from customer opinions and extracts the frequency of words contained in the text as customer needs.

Patent Document 2 discloses a technique for classifying customer opinions along emotional axes such as joy and anger according to the expressions used.

Patent Document 1: JP 2007-226568 A
Patent Document 2: JP 2003-281161 A

With the method of Patent Document 1, word-based information can be obtained, such as the words that appear frequently in the text data. However, most of the words contained in inquiries sent to support services and the like are names of existing products, words relating to product functions, or general technical terms of the related field. The problem is therefore that only obvious information is obtained, such as information equivalent to the number of units sold or information already known in the target field.

In addition, the large majority of inquiries sent to support services and the like are negative opinions, so the method of Patent Document 2 can only produce a classification biased toward negative emotions.

In the present invention, for text data composed of pairs of a question and an answer, the contents of the question text and of the answer text are each analyzed, and the two analysis results are combined to extract information that cannot be obtained on a word basis and information from a viewpoint more specific than an emotional axis. Specifically, the information acquisition apparatus comprises input means for inputting text data composed of a pair of a question and an answer, information extraction means for extracting information from the text data, and output means for outputting the result extracted by the information extraction means, and the information extraction means includes question content analysis means for analyzing the question part text of the input, answer content analysis means for analyzing the answer part text of the input, and matching text determination means for determining whether the text matches based on the analysis results of the question content analysis means and the answer content analysis means.

According to the present invention, useful information can be acquired from text data composed of questions and answers.

FIG. 1 is a diagram explaining the configuration of the present invention.
FIG. 2 is a diagram explaining the question-answer table.
FIG. 3 is a diagram explaining the processing scheme of the information extraction program.
FIG. 4 is a diagram explaining the processing scheme of the question content analysis process.
FIG. 5 is a diagram explaining the processing scheme of the answer content analysis process.
FIG. 6 is a diagram explaining the processing scheme of the matching text determination process.
FIG. 7 is a diagram explaining the processing scheme of the necessary partial text extraction process.
FIG. 8 is a diagram explaining the matching text table.
FIG. 9 is a diagram explaining a display example of the processing result.
FIG. 10 is a diagram explaining the time series of the present invention.
FIG. 11 is a diagram explaining a display example of the processing result.
FIG. 12 is a configuration diagram of the processing unit that acquires statistical information from the accumulated results.

An embodiment of the present invention will be described below with reference to FIGS. 1 to 8.

FIG. 1 is a diagram explaining the configuration of this embodiment. An apparatus 101 for realizing the present invention comprises a CPU 102 that executes various processes, an input device 103 through which the user provides input, an output device 104 that presents various information to the user, and a storage device 105. The storage device 105 stores an OS 106, programs such as an information extraction program 107, various data, and a work memory 111 that temporarily holds calculation results during processing.

The input device 103 can be configured with devices such as a keyboard and a mouse.

The output device 104 can be configured with a display device such as a monitor.

The storage device 105 can be configured with a non-volatile memory such as SRAM or flash memory. Alternatively, programs and invariant data may be placed in ROM and variable data in RAM, or a storage medium such as a magnetic disk may be used instead of semiconductor memory.

FIG. 2 is a diagram explaining the structure of the data handled in this embodiment. The data can be organized as a question-answer table 201. The elements of the table are a question text 202, an answer text 203, a question flag 204, and an answer flag 205, and any required number of attributes 206 may be added as needed. The question-answer table 201 is stored in the storage device 105.

The question text 202 is text recording a question, opinion, request, or the like that a customer or other party has sent to a support service, call center, or the like.

The answer text 203 is text recording the answer or the response given to the question text 202.

The question flag 204 has an initial value of 0 and is set to 1 according to the result of the question content analysis described later.

The answer flag 205 has an initial value of 0 and is set to 1 according to the result of the answer content analysis described later.

The attribute 206 area can store attributes relating to the question and answer, such as the product name and the customer type.
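The table layout described above can be sketched as a small record type. This is a minimal illustration, not the patent's implementation: the class name `QARecord`, the field names, and the attribute keys `product` and `customer_type` are assumptions chosen for readability, and the sample texts are example sentences that appear later in the description.

```python
from dataclasses import dataclass, field


@dataclass
class QARecord:
    """One element of the question-answer table 201 (names are illustrative)."""
    question_text: str                               # question text 202
    answer_text: str                                 # answer text 203
    question_flag: int = 0                           # question flag 204, initial value 0
    answer_flag: int = 0                             # answer flag 205, initial value 0
    attributes: dict = field(default_factory=dict)   # attributes 206


# A one-row table; the attribute values are hypothetical.
qa_table = [
    QARecord(
        question_text="変数の下限値を変更したい",
        answer_text="下限値の変更は未サポートです",
        attributes={"product": "Product A", "customer_type": "manufacturing"},
    )
]
```

Both flags start at 0 and are only raised by the analysis steps, so a freshly loaded table never matches by accident.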

FIG. 3 is a schematic PAD showing the processing flow of the information extraction program 107. For each element registered in the question-answer table 201, a question content analysis process 301, an answer content analysis process 302, and a matching text determination process 303 are performed. The processes 301, 302, and 303 correspond to the processing performed by the question analysis module 108, the answer analysis module 109, and the matching text determination module 110, respectively. A necessary partial text extraction process 304 may additionally be performed by adding a module to the information extraction program 107.

FIG. 4 is a schematic PAD of the question content analysis process 301. This process is performed by the question analysis module 108. It is determined whether the question text 202 of the element being processed in the question-answer table 201 satisfies a predetermined expression rule (401), and if it does, the question flag 204 of that element is set to 1 (402). The expression rule is explained using the example of selecting questions in which the questioner expresses some request concerning a product or service. Questions such as "I want to change the lower limit value of a variable" and "I want to execute process A and process B at the same time" indicate that the questioner wants to do, or is trying to do, something. Techniques for extracting request expressions from text using a modality such as "want to ~" (〜したい) are generally known. Using the modality as an extraction rule, questions containing an expression of the questioner's intent to do or request something can be selected. However, the modality alone also picks up questions such as "I want to ask a question" (質問したい) and "I want to consult" (相談したい), which ask for a response from a person rather than from the product, so expressions that indicate a request directed at a person are enumerated as exclusion rules. By defining the expression rule as "contains an expression enumerated in the extraction rules but does not match any expression enumerated in the exclusion rules", question texts that contain what the questioner wants from the product are extracted.
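The expression rule for the question side can be sketched as a substring check against two lists. This is an illustrative simplification: the function names are hypothetical, plain substring matching stands in for whatever modality analysis an actual system would use, and the lists contain only the expressions quoted in the description.

```python
# Expressions taken from the description's example; a real rule set would be larger.
EXTRACTION_RULES = ["したい"]                 # "want to ~"
EXCLUSION_RULES = ["質問したい", "相談したい"]  # requests directed at a person


def satisfies_expression_rule(text, extraction_rules, exclusion_rules):
    """True if text contains an extraction expression and no exclusion expression."""
    has_extraction = any(expr in text for expr in extraction_rules)
    has_exclusion = any(expr in text for expr in exclusion_rules)
    return has_extraction and not has_exclusion


def question_flag(question_text):
    """Determination 401 and flag update 402, returning the flag value (0 or 1)."""
    return int(satisfies_expression_rule(question_text, EXTRACTION_RULES, EXCLUSION_RULES))
```

Note that this sketch rejects a question that contains both an excluded phrase and a genuine request; whether that is acceptable depends on the data.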

FIG. 5 is a schematic PAD of the answer content analysis process 302. This process is performed by the answer analysis module 109. It is determined whether the answer text 203 of the element being processed in the question-answer table 201 satisfies a predetermined expression rule (501), and if it does, the answer flag 205 of that element is set to 1 (502). The expression rule is explained using the example of selecting answer texts stating that the subject of the question is not supported. Expressions indicating that the subject of the question is not supported, such as "not supported" (未サポートです), "we do not support" (サポートしておりません), and "cannot ~" (〜ことはできません), are enumerated as extraction rules. Expressions describing a past state, such as "was not supported" (サポートしておりませんでした), and expressions describing the situation under a limited condition, such as "not supported in ~" (〜では未サポート), can be used as exclusion rules. By defining the expression rule as "contains an expression enumerated in the extraction rules but does not match any expression enumerated in the exclusion rules", answer texts stating that the subject of the question is not supported are extracted.

FIG. 6 is a schematic PAD of the matching text determination process 303. This process is performed by the matching text determination module 110. It is determined whether the question flag and the answer flag of the target element are both 1 (601), and if they are, a question-answer pointer, which is an address indicating where the target element is stored in the storage device 105, is stored in the matching text table (FIG. 8, described later). With the expression rules used as examples in the descriptions of FIGS. 4 and 5, the selected pairs are those in which the questioner asks for something and the answer states that it is not supported. Such a question and answer describe something whose support could improve the service, so they constitute important information.

In this way, information that cannot be obtained by analyzing the question or the answer alone can be obtained from the combination of both analysis results.
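Processes 301 to 303 taken together amount to flagging each element twice and keeping the elements whose flags are both 1. The sketch below shows that flow under stated assumptions: a list index stands in for the question-answer pointer (a storage address in the description), matching is plain substring search, and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    question_text: str
    answer_text: str
    question_flag: int = 0
    answer_flag: int = 0


Q_EXTRACT = ["したい"]
Q_EXCLUDE = ["質問したい", "相談したい"]
A_EXTRACT = ["未サポートです", "サポートしておりません", "ことはできません"]
A_EXCLUDE = ["サポートしておりませんでした", "では未サポート"]


def matches(text, extraction, exclusion):
    """Contains an extraction expression and no exclusion expression."""
    return any(e in text for e in extraction) and not any(e in text for e in exclusion)


def extract_matching(table):
    """Run processes 301-303 and return the indices of matching elements."""
    matching_pointers = []
    for index, record in enumerate(table):
        record.question_flag = int(matches(record.question_text, Q_EXTRACT, Q_EXCLUDE))
        record.answer_flag = int(matches(record.answer_text, A_EXTRACT, A_EXCLUDE))
        if record.question_flag == 1 and record.answer_flag == 1:
            matching_pointers.append(index)  # stand-in for question-answer pointer 804
    return matching_pointers


table = [
    QARecord("変数の下限値を変更したい", "下限値の変更は未サポートです"),
    QARecord("処理Aと処理Bを同時に実行したい", "設定ファイルで指定すれば実行できます"),
    QARecord("料金について相談したい", "サポートしておりません"),
]
pointers = extract_matching(table)
```

Only the first row is kept: the second has an answer that does not deny support, and the third is a request directed at a person.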

FIG. 7 is a schematic PAD of the necessary partial text extraction process 304. This process can be executed by adding a module to the information extraction program 107. The following processing is performed on the elements extracted by the matching text determination process 303.

In process 701, the text immediately preceding the expression that makes up the expression rule used in the question content analysis process 301 is extracted from the question text of the target element. That is, with the expression rule used in the description of FIG. 4, the text corresponding to the "~" in an extraction-rule expression such as "want to ~" (〜したい) can be taken as the text to extract. For example, from question texts such as "I want to change the lower limit value of a variable" and "I want to execute process A and process B at the same time", the texts "change the lower limit value of a variable" and "execute process A and process B at the same time" are extracted, respectively. The necessary partial text can be extracted by taking the position immediately after a period, conjunction, or the like as the extraction start point and the position immediately before the extraction-rule expression as the extraction end point.

In process 702, the text immediately preceding the expression that makes up the expression rule used in the answer content analysis process 302 is extracted from the answer text of the target element. That is, with the expression rule used in the description of FIG. 5, the text immediately preceding an extraction-rule expression such as "not supported" (未サポートです) or "we do not support" (サポートしておりません) can be taken as the text to extract. For example, from answer texts such as "Changing the lower limit value is not supported" and "Simultaneous execution of process A and process B is not supported", the texts "changing the lower limit value" and "simultaneous execution of process A and process B" are extracted, respectively. The extraction method can be the same as in process 701.

Only one of process 701 and process 702 may be performed.

In process 703, one or both of the texts extracted in process 701 and process 702 are selected. For example, the text extracted in process 701 is adopted by default, and the result of process 702 is adopted instead when the result of process 701 is inadequate, such as when it is extremely short at only a few characters. Accuracy can also be improved by letting the question result and the answer result complement each other.
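Processes 701 to 703 can be sketched as "take the text between the last delimiter and the rule expression, then fall back to the answer if the question yields too little". The delimiter set, the trimming of a trailing particle, the length threshold, and every name below are assumptions made for illustration; the description only says that extraction starts after a period or conjunction and ends before the rule expression.

```python
import re

# Assumed start delimiters: a sentence-ending period or a conjunction.
START_DELIMITERS = re.compile(r"。|しかし、?|ただし、?")
# Assumed trailing particles trimmed so the result reads as a phrase.
TRAILING_PARTICLES = "はがを"
# Assumed threshold below which a result counts as "extremely short".
MIN_LENGTH = 4


def extract_before(text, expressions):
    """Return the text between the last delimiter and the first rule expression found."""
    for expression in expressions:
        end = text.find(expression)
        if end == -1:
            continue
        head = text[:end]
        start = 0
        for match in START_DELIMITERS.finditer(head):
            start = match.end()  # keep the position after the last delimiter
        return head[start:].rstrip(TRAILING_PARTICLES)
    return ""


def select_partial_text(question_part, answer_part, min_length=MIN_LENGTH):
    """Process 703: prefer the question-side text unless it is too short."""
    if len(question_part) >= min_length:
        return question_part
    return answer_part or question_part


q_part = extract_before("いつもお世話になっております。変数の下限値を変更したい", ["したい"])
a_part = extract_before("下限値の変更は未サポートです", ["未サポートです", "サポートしておりません"])
chosen = select_partial_text(q_part, a_part)
```

With the description's own examples this yields "変数の下限値を変更" from the question and "下限値の変更" from the answer, and the question-side text is chosen.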

The necessary partial text extraction process 304 makes it possible to see the key points at a glance without reading the full text of the question and answer. For example, when information collected by a support service is passed on as useful information to a responsible department such as a design department, presenting only the necessary text portions as a list improves readability. An output example is described later. In other words, although the necessary partial text extraction process is not an essential component of this information acquisition apparatus, executing it makes confirmation easier for the user.

FIG. 8 is a diagram showing an example of the structure of the data that stores the processing result of FIG. 3. The data can be organized as a matching text table 801. The matching text table 801 is stored in the storage device 105.

A question-answer pointer 804 to each element of the question-answer table 201 that the matching text determination process 303 determined to match is stored. When the necessary partial text extraction process 304 is performed, the extracted necessary partial text is also stored (802). An attribute value 803 may be stored if needed.

Because the question-answer table 201 can be referenced through the question-answer pointer 804, storing the question reception date and time, the product name, the questioner type, and the like in the attributes 206 of the question-answer table 201 allows the processing results stored in the matching text table 801 to be classified.

If time information such as the question reception date and time or the answer date and time is stored, changes in volume over time can be observed. Likewise, the product name gives results per product, and recording the questioner type, for example by industry, gives results per industry.
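Classification through the question-answer pointer can be sketched as a lookup followed by grouping. In this illustration a list index plays the role of the pointer 804, and the attribute keys `product` and `month`, the sample values, and the function name `classify` are all hypothetical.

```python
from collections import defaultdict

# Question-answer table 201 with attributes 206 (values are made up for the example).
qa_table = [
    {"question": "変数の下限値を変更したい", "attributes": {"product": "Product A", "month": "2010-04"}},
    {"question": "処理Aと処理Bを同時に実行したい", "attributes": {"product": "Product B", "month": "2010-04"}},
    {"question": "ログの出力先を変更したい", "attributes": {"product": "Product A", "month": "2010-05"}},
]

# Matching text table 801: necessary partial text 802 plus pointer 804.
matching_text_table = [
    {"partial_text": "変数の下限値を変更", "qa_pointer": 0},
    {"partial_text": "処理Aと処理Bを同時に実行", "qa_pointer": 1},
    {"partial_text": "ログの出力先を変更", "qa_pointer": 2},
]


def classify(matching_rows, table, attribute_name):
    """Group partial texts by an attribute looked up through the pointer."""
    groups = defaultdict(list)
    for row in matching_rows:
        attributes = table[row["qa_pointer"]]["attributes"]
        groups[attributes[attribute_name]].append(row["partial_text"])
    return dict(groups)


by_product = classify(matching_text_table, qa_table, "product")
by_month = classify(matching_text_table, qa_table, "month")
```

Keeping attributes only in the question-answer table and following the pointer avoids duplicating them in the matching text table.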

FIG. 12 is a configuration diagram of the processing unit that acquires statistical information from the accumulated results obtained by the matching text determination process 303. A statistical information acquisition module 1201 consists of a similarity calculation unit 1202 and a totaling unit 1203, and can be implemented as a module within the information extraction program 107 of FIG. 1.

The similarity calculation unit 1202 obtains a similarity from the number of matching words contained in the necessary partial texts 802 and judges two entries to be the same case when the similarity exceeds a fixed value. By holding the addresses or IDs of the same-case elements as a list or array in the attribute value 803 of the matching text table 801, information on which elements each element of the matching text table 801 is the same case as can be saved.

The totaling unit 1203 calculates the number of same cases. A large number of same cases means that there are many requests, so the content can be recognized as more important. Classifying same-case content by time, product, or the like yields more detailed information.
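The same-case judgment and count can be sketched as a pairwise comparison of word sets. The description does not specify the tokenizer or the threshold, so this example assumes the necessary partial texts have already been split into words (for Japanese, by some morphological analyzer) and uses an arbitrary threshold of 1; all names are hypothetical.

```python
def similarity(words_a, words_b):
    """Number of distinct words the two entries have in common."""
    return len(set(words_a) & set(words_b))


def find_same_cases(entries, threshold=1):
    """Map each entry index to the indices judged to be the same case."""
    same = {i: [] for i in range(len(entries))}
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if similarity(entries[i], entries[j]) > threshold:  # "exceeds a fixed value"
                same[i].append(j)
                same[j].append(i)
    return same


def count_same_cases(same):
    """Totaling: each entry counts itself plus its same-case partners."""
    return [len(partners) + 1 for _, partners in sorted(same.items())]


# Pre-tokenized necessary partial texts (tokenization is assumed, not shown).
entries = [
    ["変数", "下限値", "変更"],
    ["下限値", "変更"],
    ["処理A", "処理B", "同時", "実行"],
]
same = find_same_cases(entries)
counts = count_same_cases(same)
```

The `same` mapping corresponds to the list of same-case IDs that the description suggests keeping in the attribute value 803. The pairwise loop is quadratic in the number of entries, which is fine for a sketch but would need indexing for large tables.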

FIG. 9 is an example of a display screen for the processing result of FIG. 3. In this example the results are divided by the month in which the question or the answer was received, and the results for each month are shown by selecting a tab (901, 902, 903). The target product name 904 and the content 905 of the text extracted by the necessary partial text extraction process 304 are displayed. Because the content 905 contains only the necessary text, the display is concise and easy to read. The number of same cases may be obtained as described above and displayed as a count 906. Although FIG. 9 classifies the display by reception period, the results can also be classified using other information stored in the attributes 206 of the question-answer table 201, such as the product name or the questioner's industry, and the number of display items can be increased.

FIG. 10 is a time-series diagram of this embodiment. When data including question and answer texts and other attribute values is input from the input device 103 (1001), the CPU 102 converts it into the format of the question-answer table 201 and stores it in the storage device 105 (1002). On receiving an information extraction request (1003) from the input device 103, the CPU 102 acquires the data from the storage device 105 (1004), performs the information extraction process using the information extraction program 107, and stores the processing result in the storage device 105 (1005). On receiving a result display request (1006) from the input device 103, the CPU 102 acquires the result data from the storage device 105 (1007) and displays the result on the output device 104 (1008). The result display (1008) may instead be performed without a result display request (1006), at the point when the information extraction process triggered by the information extraction request (1003) finishes.

As Example 2, a case is described in which, in the schematic PAD of the question content analysis process 301 of FIG. 4, the expression rule of determination process 401 is a rule that extracts expressions asking whether something is possible. Questions asking whether a certain thing can be done, such as "Can the log output destination of process T be changed?" and "Is it possible to control output messages?", are common. Such questions can be extracted by combining a modality expressing possibility or impossibility with a modality expressing a question. For example, expressions such as "can I ~?" (できますか), "can't I ~?" (できませんか), and "is it possible?" (可能か) are enumerated, and containing one of them is the expression rule.

Paired with this expression rule for determination process 401 of the question content analysis process 301, the expression rule of determination process 501 in the schematic PAD of the answer content analysis process 302 of FIG. 5 is a rule that extracts expressions indicating that something is impossible. This expression rule can likewise be created as an extraction rule using a modality expressing impossibility. That is, expressions indicating impossibility such as "cannot" (できません) and "it is impossible" (不可能です) are enumerated, and containing one of them is the expression rule.

When the matching text determination process 303 is performed on the results of the question content analysis process 301 and the answer content analysis process 302 described above, pairs of a question asking whether something is possible and an answer stating that it is not are obtained.
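The pairing in this example can be sketched with two regular expressions. The patterns below are assumptions built from the expressions quoted in the description; in particular, the optional です in the question pattern is added here so that the description's own sample question "〜は可能ですか" matches, and the function name is hypothetical.

```python
import re

# Possibility modality combined with a question marker.
POSSIBILITY_QUESTION = re.compile(r"でき(ます|ません)か|可能(です)?か")
# Impossibility modality in the answer.
IMPOSSIBILITY_ANSWER = re.compile(r"できません|不可能です")


def is_unmet_capability(question_text, answer_text):
    """True when the question asks whether something is possible and the answer says no."""
    asks_possibility = POSSIBILITY_QUESTION.search(question_text) is not None
    says_impossible = IMPOSSIBILITY_ANSWER.search(answer_text) is not None
    return asks_possibility and says_impossible
```

For instance, `is_unmet_capability("処理Tのログの出力先を変更できますか？", "出力先の変更はできません。")` evaluates to `True`, while an answer that describes how to do the operation does not match.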

Furthermore, as in Example 1, the necessary partial text extraction process 304 can be performed, the results stored in the matching text table 801, and the stored data totaled to obtain the number of same cases. A matter with many same cases is one for which there is strong demand but which has not yet been realized, so making it possible improves user convenience.

As a further example, a case is described in which the expression rule of determination process 401 of the question content analysis process 301 of FIG. 4 is the same as in Example 2 and, in the schematic PAD of the answer content analysis process 302 of FIG. 5, the expression rule of determination process 501 is a rule that extracts expressions presenting an alternative. It often happens that what the questioner asked for cannot be achieved in the way asked, but the goal can be achieved by another method, and the answer explains that method. Examples are "Variable X cannot be changed with command A. However, if you change the value of X in the definition statement of file B, variable X can be changed to another value." and "The operation you asked about cannot be achieved with function F. Please use function G instead." If cases in which such alternatives were given are collected, a question of the same kind can be answered immediately without researching it from scratch, which shortens the response time. In addition, when there are many questions of the same kind, the approach the questioners asked about may be the more common one, so the collection is also useful as a trigger for considering a specification change that makes that approach work.

The expression rule of determination process 501 is defined by enumerating expressions that present an alternative, such as "however, if you ~, you can ..." (ただし、〜れば、…できます。) and "please ... instead" (代わりに、…して下さい。), and requiring that the answer contain one of the enumerated expressions. Answers that present an alternative can thereby be extracted.

In the necessary partial text extraction process 304 of FIG. 7, processing that extracts the text representing the content of the alternative may be performed. This can be done by creating extraction rules such as the text between "however" (ただし) and "if" (れば), or the text between "instead" (代わりに) and "please do" (して下さい). The extraction result can be stored in the attribute field 803 of the matching text table 801.
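Extracting the alternative can be sketched as a capture between two marker expressions. The two patterns follow the marker pairs named in the description; the optional comma handling, the non-greedy capture, and the function name are assumptions made for this illustration.

```python
import re

# Marker pairs from the description: ただし ... れば, and 代わりに ... して下さい.
ALTERNATIVE_PATTERNS = [
    re.compile(r"ただし、?(.+?)れば"),
    re.compile(r"代わりに、?(.+?)して下さい"),
]


def extract_alternative(answer_text):
    """Return the text between the first matching marker pair, or None if absent."""
    for pattern in ALTERNATIVE_PATTERNS:
        match = pattern.search(answer_text)
        if match:
            return match.group(1)
    return None


answer_1 = "コマンドAでは変数Xを変更できません。ただし、ファイルBの定義文でXの値を変更すれば、変数Xを別の値に変更できます。"
answer_2 = "関数Fではご質問の操作は実現できません。代わりに関数Gを使用して下さい。"
alternative_1 = extract_alternative(answer_1)
alternative_2 = extract_alternative(answer_2)
```

Cutting strictly at れば leaves a truncated verb stem ("変更す") in the first result, which shows that a practical rule set would probably need some morphological clean-up beyond this literal reading.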

FIG. 11 is a display example of questions and alternatives. The product name 1101, the question content 1102, and the alternative 1103 are displayed as a list. The number of display items can be increased using the information stored in the attribute field 206 of the question-answer table 201. If the list contains a question similar to a new question, the alternative is found immediately, so the time needed to answer is shortened.

The present invention can be applied to the analysis of text data consisting of questions from customers and others, and the answers to them, that are received by a support service or call center by telephone, e-mail, documents, or the like, or collected on the Web.

DESCRIPTION OF REFERENCE NUMERALS: 101 information extraction device, 102 CPU, 103 input device, 104 output device, 105 storage device, 106 OS, 107 information extraction program, 108 question analysis module, 109 answer analysis module, 110 matching text determination module, 111 work memory, 201 question-answer table, 202 question text, 203 answer text, 204 question flag, 205 answer flag, 206 attribute, 301 question content analysis process, 302 answer content analysis process, 303 matching text determination process, 304 necessary partial text extraction process, 801 matching text table, 802 necessary partial text, 803 attribute, 804 question-answer pointer.

Claims

1. An information acquisition apparatus comprising:
input means for inputting text data composed of a pair of a question text and an answer text;
question content analysis means for analyzing the question text;
answer content analysis means for analyzing the answer text;
matching text determination means for determining whether the text matches based on the analysis results of the question content analysis means and the answer content analysis means; and
output means for outputting a result determined by the matching text determination means.

2. The information acquisition apparatus according to claim 1, further comprising necessary partial text extraction means for extracting a necessary partial text from the text data determined to match by the matching text determination means,
wherein the extracted result is output as the determination result.

3. The information acquisition apparatus according to claim 1, wherein time information is added to the determination result, and the time information is output together with the determination result.

4. The information acquisition apparatus according to claim 1, further comprising storage means for storing a table that stores attributes belonging to the question text.

5. The information acquisition apparatus according to claim 4, wherein the determination result is classified and output based on the attributes.

6. The information acquisition apparatus according to claim 2, wherein the necessary partial text extraction means extracts the necessary partial text based on a predetermined extraction rule for the question text and/or the answer text.

7. The information acquisition apparatus according to claim 2, further comprising:
a similarity calculation unit that obtains, for a plurality of the necessary partial texts, a similarity according to the degree of coincidence of the words contained in the necessary partial texts; and
a totaling unit that totals the entries recognized as the same case as a result of obtaining the similarity.

8. The information acquisition apparatus according to claim 1, wherein the matching text determination means extracts a pair of a question text asking whether something is possible and an answer text indicating that it is impossible.

9. The information acquisition apparatus according to claim 1, wherein the matching text determination means extracts a pair of a question text and an answer text indicating an alternative to what the question text asks.