JP5078960B2

JP5078960B2 - Text processing apparatus and computer program

Info

Publication number: JP5078960B2
Application number: JP2009205145A
Authority: JP
Inventors: 哲石井; 寛浅野
Original assignee: QUALICA Inc
Current assignee: QUALICA Inc
Priority date: 2009-09-04
Filing date: 2009-09-04
Publication date: 2012-11-21
Anticipated expiration: 2029-09-04
Also published as: JP2011054137A

Description

The present invention relates to a technique for processing text information that carries time-series information, typified by opinions received from persons related to a product or service.

For example, a company's help desk receives a variety of opinions, such as complaints and comments, from customers about the company's products and services. Opinions on those products and services also arrive through channels other than the help desk, such as market research questionnaires, reputation on the Web, and daily sales or service reports from sales and service staff.

Conventionally, companies have responded to such opinions individually while also accumulating their contents as text and analyzing that text to detect problems with their products and services.

In doing so, problems with the products or services were detected by setting in advance the keywords used in known problems and then applying keyword-based filtering to extract those preset keywords from the accumulated opinion contents.

Consequently, the conventional technique can automatically detect known problems but cannot automatically detect unknown ones.

However, to prevent serious trouble with their products and services, companies want to discover not only known problems but also unknown problems as early as possible and to respond appropriately. A system that automatically detects early signs that a problem may occur in the future has therefore been strongly desired.

Accordingly, an object of the present invention is to enable automatic detection of early signs of problems with a product or service from text information typified by opinions received from persons related to the product or service.

A text processing apparatus according to one embodiment of the present invention comprises: storage means for storing a plurality of data blocks, each including the reception date of an opinion on a product or service from a person related to that product or service and text indicating the content of the opinion; text analysis means for comparing the texts of the plurality of data blocks with one another by text analysis and calculating their mutual similarity; group generation means for grouping the plurality of data blocks on the basis of the similarity calculated by the text analysis means to generate a new group; and time-series analysis means for performing, on the new group generated by the group generation means, a time-series analysis over a predetermined extraction period based on the reception dates of the data blocks in the new group.

In a preferred embodiment, the apparatus may further comprise group extraction means for extracting a new group for which the time-series change in the number of occurrences, counted by opinion reception date and obtained through analysis by the time-series analysis means, matches a predetermined extraction condition.

In a preferred embodiment, the extraction condition may be a condition defined on the basis of any of the number of data blocks in the new group, the rate of change of the number of occurrences by reception date, or the pattern of change of the number of occurrences by reception date.

In a preferred embodiment, the extraction period may include a most recent first extraction period and a second extraction period that includes the first extraction period and is longer than the first extraction period.

In a preferred embodiment, the apparatus may further comprise: a condition storage unit in which expressions satisfying predetermined conditions are accumulated for each predetermined known group; and a filtering processing unit that sorts the data blocks read from the storage means into first data blocks, which contain an expression in the condition storage unit, and second data blocks, which do not.

In a preferred embodiment, the text analysis means may calculate the similarity for the second data blocks; the group generation means may group the second data blocks; the filtering processing unit may associate each first data block with the identifier of the known group corresponding to the expression that the first data block contains; and the time-series analysis means may perform the time-series analysis on at least one group among the new groups generated by the group generation means and the known groups.

In a preferred embodiment, the apparatus may further comprise: original data storage means for storing original data of the opinions; and data block generation means for, when the original data contains a plurality of sentences, dividing them into individual sentences and generating a plurality of data blocks whose text each contains only one sentence.

In a preferred embodiment, the apparatus may further comprise: a known condition storage unit in which predetermined known expressions are stored for each of a plurality of known groups; an unnecessary condition storage unit in which predetermined unnecessary expressions are stored; and a filtering processing unit that sorts the data blocks read from the storage means into first data blocks containing a known expression, second data blocks containing an unnecessary expression, and third data blocks containing neither, and further sorts the first data blocks by known group. In this case, the text analysis means may calculate the similarity for the third data blocks, the group generation means may group the third data blocks to generate the new group, and the time-series analysis means may perform, on at least one group among the known groups sorted by the filtering processing unit and the new groups generated by the group generation means, a time-series analysis over a predetermined extraction period based on the reception times of the data blocks in that group.

The above processing narrows down the candidate early signs appropriately and can greatly reduce the workload of finding them.

In a preferred embodiment, the apparatus may further comprise registration means for registering an expression extracted from the text in a data block belonging to a new group generated by the group generation means in the known condition storage unit or the unnecessary condition storage unit, as a new known expression of a new known group or as a new unnecessary expression.

The text processing apparatus according to the present invention can automatically detect early signs of problems with a product or service from text information typified by opinions received from persons related to the product or service.

A diagram showing the overall configuration of a text processing apparatus according to one embodiment of the present invention.
A diagram showing an example of the data structure stored in the original data storage unit.
A diagram showing an example of the data structure stored in the data block storage unit.
A conceptual diagram showing an example of the known expression extraction conditions stored in the known condition storage unit.
A conceptual diagram showing an example of the unnecessary expression extraction conditions stored in the unnecessary condition storage unit.
A diagram showing an example of the data structure after processing by the known condition processing unit.
A diagram showing an example of the data structure after processing by the unnecessary condition processing unit.
A diagram for explaining the method of grouping based on similarity.
A diagram showing an example of the text analysis result table.
A diagram showing an example of the data structure after grouping by the group generation means.
A diagram showing an example of a histogram generated by the time-series analysis means.
A diagram showing examples of fluctuation patterns.
A flowchart showing the processing flow of the time-series analysis means and the group extraction means.
A diagram showing an example of the condition setting screen for time-series analysis.
A diagram showing an example of the detailed setting screen for time-series analysis and group extraction.
A graph showing the results of time-series analysis and group extraction.
A diagram conceptually showing the early-sign period and the frequent-occurrence period of a problem on a time-series basis.

Hereinafter, a text processing system that processes information including text entered from a help desk or the like will be described with reference to the drawings, as an example of a system for processing information including text according to one embodiment of the present invention.

FIG. 1 is a diagram showing the overall configuration of the text processing system according to the present embodiment.

The system includes a text processing apparatus 1, an input device 2, and an output device 3.

Information including text is entered from the input device 2. The text processing apparatus 1 analyzes the information including text entered from the input device 2 and outputs the analysis result to the output device 3. Information including text may also be entered directly into the text processing apparatus 1 via the network 2a from another terminal device 2c, a portable terminal device 2b, or the like.

The text processing apparatus 1 is implemented on, for example, a general-purpose computer system, and the individual components and functions of the text processing apparatus 1 described below are realized by, for example, executing a computer program.

As shown in FIG. 1, for example, the text processing apparatus 1 includes an original data storage unit 4, data block generation means 5, a data block storage unit 6, a condition storage unit 7, a filtering processing unit 8, a similarity determination unit 9, a grouping processing unit 10, time-series analysis means 11, and group extraction means 12.

The original data storage unit 4 stores original data 40. The original data 40 is entered from the input device 2 by, for example, the operator in charge and is accumulated sequentially by reception number 42. Original data 40 entered from the portable terminal device 2b or another terminal device 2c of a person related to the product or the like may also be accumulated directly in the original data storage unit 4 via the network 2a.

The original data 40 stores opinions on a product or service (hereinafter, "product or the like") that were handled by an operator and came from persons related to the product or the like. A person related to the product or the like is anyone with some connection to it, such as a customer, a prospective customer, or a maintenance person. An opinion on the product or the like is text information that includes a complaint, opinion, or comment about it. The following description covers the case of processing opinions received from customers.

FIG. 2 is a diagram showing an example of the data structure of the original data 40 stored in the original data storage unit 4.

For example, the original data 40 shown in FIG. 2 stores the date 41 on which the customer's opinion was received, a reception number 42, the person in charge 43 as reception operator information, the customer's name 44, address 45, telephone number 46, and e-mail address 47 as customer information, reception content 48 indicating the content of the customer's opinion on the product or the like, and response content 49 indicating what the operator answered in response to that opinion.

The reception content 48 stores the customer's opinion on the product or the like as text. The response content 49 is also stored as text.

FIG. 3 is a diagram showing an example of the data structure of the data block group 60 stored in the data block storage unit 6.

The data block storage unit 6 stores a plurality of data blocks 65, each including a reception date 63, which is one example of reception time information for a customer's opinion on a product or the like, and text 64 indicating the content of the opinion. The data blocks 65 are generated by the data block generation means 5 as follows.

When the original data 40 contains a plurality of sentences, the data block generation means 5 divides them into individual sentences and generates a plurality of data blocks 65, each containing only one sentence.

The data block generation means 5 reads the original data 40 from the original data storage unit 4 for each reception number 42. If the reception content 48 of the read original data 40 contains a plurality of sentences, the data block generation means 5 detects, for example, the sentence-ending punctuation marks, divides the sentences one by one, and creates a plurality of single-sentence texts 64. For each single-sentence text 64, the data block generation means 5 records a reception date 63 and a reception number 61 corresponding to the date 41 and reception number 42 of the original data 40, assigns a block ID 62, and thereby generates a plurality of data blocks 65.
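The sentence-splitting step described above can be illustrated with a minimal Python sketch. The class name `DataBlock`, the functions `split_sentences` and `make_data_blocks`, the choice of the Japanese full stop "。" as the only delimiter, and the sequential block ID counter are all assumptions made for illustration; the source only states that punctuation is detected and that each sentence receives a reception number, a block ID, and a reception date.

```python
import re
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Iterator, List


@dataclass
class DataBlock:
    reception_number: int  # corresponds to reception number 61
    block_id: int          # corresponds to block ID 62
    reception_date: date   # corresponds to reception date 63
    text: str              # single-sentence text 64


def split_sentences(content: str) -> List[str]:
    """Split on the Japanese full stop (assumed delimiter), keeping it attached."""
    parts = re.findall(r"[^。]+。?", content)
    return [p.strip() for p in parts if p.strip()]


def make_data_blocks(reception_number: int, received_on: date,
                     content: str, ids: Iterator[int]) -> List[DataBlock]:
    """Create one data block per sentence, each with a fresh block ID."""
    return [DataBlock(reception_number, next(ids), received_on, sentence)
            for sentence in split_sentences(content)]


# Hypothetical record: reception number 1001 with a two-sentence opinion.
blocks = make_data_blocks(1001, date(2009, 9, 4),
                          "電源が入らない。交換を希望します。", count(1))
for block in blocks:
    print(block)
```

Every block from the same record shares its reception number and date, so later steps can still trace a sentence back to the original opinion.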

The data blocks 65 generated by the data block generation means 5 are stored in the data block storage unit 6. For example, the data block generation means 5 may generate data blocks 65 as needed from the original data 40 stored in the original data storage unit 4.

The condition storage unit 7 accumulates expressions that satisfy predetermined conditions, organized by predetermined known group. An expression here is, for example, a character string.

As shown in FIG. 1, for example, the condition storage unit 7 includes a known condition storage unit 71, in which known expression extraction conditions 71a are set as the predetermined conditions, and an unnecessary condition storage unit 72, in which unnecessary expression extraction conditions 72a are set as the predetermined conditions.

FIG. 4 shows an example of a plurality of known expression extraction conditions 71a stored in the known condition storage unit 71.

The known expression extraction conditions 71a are grouped, for example, by similarity. Each known expression extraction condition 71a is associated with an identifier 71b that identifies its group. The identifier 71b uniquely identifies each known expression extraction condition 71a and also indicates the group to which the condition belongs and the fact that it is a known expression extraction condition 71a. In the illustrated example, the identifier 71b consists of the letter "e", indicating a known expression extraction condition 71a, a two-digit number indicating the group, and a four-digit number identifying the condition within the group. The known condition storage unit 71 may have a tree structure as shown in FIG. 4.
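As a rough illustration of the identifier layout described above, the following Python sketch parses an identifier made of "e", a two-digit group number, and a four-digit condition number. That the three parts are concatenated without separators, and the names `ConditionId` and `parse_known_condition_id`, are assumptions; the sample value `e010002` is hypothetical.

```python
import re
from typing import NamedTuple


class ConditionId(NamedTuple):
    kind: str       # "e" marks a known expression extraction condition
    group: str      # two-digit group number
    condition: str  # four-digit condition number within the group


# Assumed layout: the three parts are concatenated without separators.
_ID_PATTERN = re.compile(r"e(\d{2})(\d{4})")


def parse_known_condition_id(identifier: str) -> ConditionId:
    """Split an identifier into its kind, group, and condition parts."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a known-condition identifier: {identifier!r}")
    return ConditionId("e", match.group(1), match.group(2))


print(parse_known_condition_id("e010002"))
```

Because the group number is embedded in the identifier, a data block labeled with it can be assigned to its known group without a separate lookup.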

A known expression extraction condition 71a is an expression corresponding to a problem that has already been recognized from customers' opinions on products or the like received in the past. An identifier corresponding to the known expression extraction condition 71a may be entered along with the condition.

The known condition storage unit 71 may allow groups newly generated by the grouping processing unit 10 described later (new groups) to be additionally registered. For example, the text processing apparatus 1 may include means (not shown) for registering an expression extracted from the text in a data block belonging to a new group generated by the grouping processing unit 10 in the known condition storage unit 71 or the unnecessary condition storage unit 72, as a new known expression (known expression extraction condition) of a new known group or as a new unnecessary expression (unnecessary expression extraction condition).

FIG. 5 shows an example of a plurality of unnecessary expression extraction conditions 72a stored in the unnecessary condition storage unit 72.

The unnecessary expression extraction conditions 72a are grouped, for example, by similarity. Each unnecessary expression extraction condition 72a is associated with an identifier 72b that identifies its group. The identifier 72b uniquely identifies each unnecessary expression extraction condition 72a and also indicates the group to which the condition belongs and the fact that it is an unnecessary expression extraction condition 72a. In the illustrated example, the identifier 72b consists of the letter "u", indicating an unnecessary expression extraction condition 72a, a two-digit number indicating the group, and a four-digit number identifying the condition within the group. The unnecessary condition storage unit 72 may have a tree structure like the one shown in FIG. 4.

An unnecessary expression extraction condition 72a is an expression used as a set phrase, such as 「お世話になっております」 ("Thank you for your continued support") or 「対応願います」 ("Please handle this"), that is regarded as not directly related to any problem with the product or the like. An identifier corresponding to the unnecessary expression extraction condition 72a may be entered along with the condition.

The unnecessary expression extraction conditions 72a stored in the unnecessary condition storage unit 72 are written from the input device 2 at any time by an operator or the like. An identifier corresponding to the unnecessary expression extraction condition 72a may be entered at the same time.

The filtering processing unit 8 sorts the data blocks 65 read from the data block storage unit 6 into first data blocks, which contain an expression in the condition storage unit 7, and second data blocks, which do not.

As shown in FIG. 1, for example, the filtering processing unit 8 includes a known condition processing unit 81 and an unnecessary condition processing unit 82.

FIG. 6 is a diagram showing an example of the data block group 60 after filtering by the known condition processing unit 81.

The known condition processing unit 81 reads the data block group 60 for a predetermined extraction period from the data block storage unit 6, refers to the known condition storage unit 71, and determines whether the text 64 in each data block 65 contains a known expression extraction condition 71a. Referring to FIGS. 4 and 6, for example, if the text contains a known expression extraction condition 71a, the identifier 71b beginning with "e" that corresponds to that condition is assigned to the group column 66 of the data block 65 containing the text 64. If the text 64 does not contain any known expression extraction condition 71a, "Other" is set in the group column 66 of the data block 65 containing that text 64.

After the known condition processing unit 81 has performed its filtering, the data block group 60 is handed over to the unnecessary condition processing unit 82. Meanwhile, in FIG. 6 for example, those data blocks 65 processed by the known condition processing unit 81 whose group column 66 has been given an identifier 71b beginning with "e" are stored in the known data block storage unit 13.

FIG. 7 is a diagram showing an example of the data block group 60 after filtering by the unnecessary condition processing unit 82.

For the data block group 60 received from the known condition processing unit 81, the unnecessary condition processing unit 82 refers to the unnecessary condition storage unit 72 and determines whether the text 64 in each data block 65 contains an unnecessary expression extraction condition 72a. This determination is made, for example, for the data blocks 65 whose group column 66 is set to "Other". Referring to FIGS. 5 and 7, for example, if the text 64 contains an unnecessary expression extraction condition 72a, the identifier 72b beginning with "u" that corresponds to that condition is set in the group column 66 of the data block 65, replacing "Other". If the text 64 does not contain any unnecessary expression extraction condition 72a, the group column 66 of the data block 65 containing that text 64 remains "Other".

After the unnecessary condition processing unit 82 has performed its filtering, the data block group 60 is handed over to the grouping processing unit 10 described later. Meanwhile, in FIG. 7 for example, those data blocks 65 processed by the unnecessary condition processing unit 82 whose group column 66 has been given an identifier 72b beginning with "u" are stored in the unnecessary data block storage unit 14.
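The two-stage filtering described above can be sketched in Python as follows. This is only an illustration under stated assumptions: expressions are matched as plain substrings, the first matching condition wins, and the function names, sample identifiers, and sample sentences are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

OTHER = "Other"  # label for blocks matching neither kind of condition

Labeled = Tuple[str, str]  # (group column value, text)


def find_condition(text: str, conditions: Dict[str, str]) -> Optional[str]:
    """Return the identifier of the first condition whose expression occurs in text."""
    for identifier, expression in conditions.items():
        if expression in text:  # assumption: plain substring match
            return identifier
    return None


def filter_blocks(texts: List[str], known: Dict[str, str],
                  unnecessary: Dict[str, str]
                  ) -> Tuple[List[Labeled], List[Labeled], List[Labeled]]:
    """Sort texts into known, unnecessary, and "Other" lists, in that priority."""
    known_blocks: List[Labeled] = []
    unnecessary_blocks: List[Labeled] = []
    other_blocks: List[Labeled] = []
    for text in texts:
        label = find_condition(text, known)
        if label is not None:
            known_blocks.append((label, text))
            continue
        # Only blocks still labeled "Other" are checked for unnecessary expressions.
        label = find_condition(text, unnecessary)
        if label is not None:
            unnecessary_blocks.append((label, text))
        else:
            other_blocks.append((OTHER, text))
    return known_blocks, unnecessary_blocks, other_blocks


known = {"e010001": "does not power on"}              # hypothetical condition
unnecessary = {"u010001": "Thank you for your help"}  # hypothetical condition
texts = ["The unit does not power on.",
         "Thank you for your help.",
         "The lid rattles when closed."]
print(filter_blocks(texts, known, unnecessary))
```

Only the third list, the blocks still labeled "Other", would go on to similarity-based grouping.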

The similarity determination unit 9 determines the similarity between two texts. For example, it decomposes each of the two target texts into morphemes, compares them with each other, and analyzes their dependency relations to determine how similar they are.

For the data block group 60 received from the unnecessary condition processing unit 82, the grouping processing unit 10 extracts, for example, all data blocks 65 whose group column 66 is set to "Other". The grouping processing unit 10 then passes any two of the extracted data blocks 65 to the similarity determination unit 9.

The grouping processing unit 10 performs all-pairs matching among the target data blocks 65. That is, it repeatedly requests a similarity determination from the similarity determination unit 9 for every combination of two data blocks drawn from the plurality of data blocks 65.

Thus, the similarity determination unit 9 and the grouping processing unit 10 together compare the texts 64 of the plurality of data blocks 65 with one another by text analysis and calculate their mutual similarity.

During all-pairs matching, the grouping processing unit 10 saves the two data blocks 65 sent to the similarity determination unit 9, together with the similarity returned by the similarity determination unit 9, in the similarity calculation table 100. The similarity calculation table 100 is stored temporarily in the grouping processing unit 10.

The similarity is a quantitative measure of how similar the text 64 in one selected data block 65 is to the text 64 in one data block 65 being compared, obtained by comparing the entire single-sentence texts 64, including their words and the dependency relations between words.
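The all-pairs matching described above can be sketched in Python as follows. Note the substitution: the source determines similarity through morphological and dependency analysis, whereas this sketch uses a simple Jaccard index over whitespace-separated tokens purely as a stand-in. The function names and sample sentences are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: shared tokens divided by all tokens (Jaccard index)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def all_pairs_similarity(blocks: Dict[int, str]) -> Dict[FrozenSet[int], float]:
    """Score every unordered pair of block IDs exactly once."""
    return {frozenset((i, j)): token_jaccard(blocks[i], blocks[j])
            for i, j in combinations(sorted(blocks), 2)}


blocks = {1: "screen does not turn on",
          2: "screen does not turn off",
          3: "delivery was late"}
for pair, score in all_pairs_similarity(blocks).items():
    print(sorted(pair), round(score, 3))
```

With n blocks this makes n(n-1)/2 comparisons, so the cost grows quadratically with the number of blocks labeled "Other".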

FIG. 8 shows an example of the similarity calculation table 100.

As shown in FIG. 8, for example, the similarity calculation table 100 stores, for each selected data block 65, the similarity 102 between that data block 65 and every data block 65 it is compared with. The entries are stored in descending order of similarity 102, starting from rank 101 “1”.

In FIG. 8, the block ID 103 of the selected data block 65 (hereinafter, the reference ID) is “1114”. Among the similarities between the reference ID “1114” and the block IDs 103 being compared (hereinafter, target IDs), the similarity 102 with itself, that is, with target ID “1114”, is the highest at 100%, and the similarity 102 with the data block 65 whose target ID is “5403” is the next highest at 85.7%.
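The exhaustive matching and the ranked table of FIG. 8 can be sketched as follows. This is only an illustration under stated assumptions: the description does not disclose how the similarity determination unit 9 scores two texts, so `token_overlap` is a hypothetical stand-in (shared-word percentage), and the function names and sample texts are invented for the example.

```python
from itertools import combinations

def token_overlap(text_a, text_b):
    """Hypothetical stand-in for the similarity determination unit:
    percentage of shared whitespace-separated tokens (Jaccard index)."""
    tokens_a, tokens_b = set(text_a.split()), set(text_b.split())
    if not tokens_a and not tokens_b:
        return 100.0
    return 100.0 * len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def build_similarity_table(blocks, similarity=token_overlap):
    """Score every unordered pair of data blocks exactly once and
    record the score under both (reference, target) orderings."""
    scores = {}
    for (id_a, text_a), (id_b, text_b) in combinations(blocks.items(), 2):
        score = similarity(text_a, text_b)
        scores[(id_a, id_b)] = score
        scores[(id_b, id_a)] = score
    return scores

def ranked_targets(reference_id, blocks, scores):
    """Rows of (rank, target ID, similarity) for one reference ID,
    highest similarity first; the block itself is listed at 100%."""
    rows = [
        (target_id, 100.0 if target_id == reference_id
         else scores[(reference_id, target_id)])
        for target_id in blocks
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(rank, target_id, score)
            for rank, (target_id, score) in enumerate(rows, start=1)]

# Invented sample data: block ID -> text
blocks = {
    1: "screen freezes after update",
    2: "screen freezes after the update",
    3: "battery drains quickly",
}
scores = build_similarity_table(blocks)
```

Calling `ranked_targets(1, blocks, scores)` lists block 1 itself first, then block 2, then block 3, mirroring the rank order described for FIG. 8.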

When the exhaustive matching is complete, the grouping processing unit 10 groups the plurality of data blocks 65 based on the similarities 102.

First, the grouping processing unit 10 generates groups by performing an analysis such as that shown in FIG. 9, based on, for example, the similarity calculation table 100.

The grouping processing unit 10 extracts the reference IDs, the target IDs, and their similarities 102 from the similarity calculation table 100 and, as shown in FIG. 9, arranges the pairs of reference ID and target ID for each reference ID in descending order of similarity 102.

Next, the grouping processing unit 10 groups the plurality of data blocks 65 based on the text analysis result table 90. That is, based on the text analysis result table 90, data blocks whose similarity 102 between reference ID and target ID is at or above a predetermined level are placed in the same group.

The similarity 102 between a reference ID and a target ID being at or above the predetermined level is a concept that covers two cases: the case where the similarity 102 between the two data blocks 65 having those block IDs is at or above a threshold, and the case where three or more data blocks 65 end up linked at or above the threshold by way of a data block 65 related to those two block IDs. The threshold is set in advance, for example to 85%.

In the text analysis result table 90, the grouping processing unit 10 distinguishes the pairs of reference ID and target ID whose similarity 102 is at or above the threshold from the pairs below the threshold. In the illustrated example, the pairs below the threshold are hatched so that the pairs at or above the threshold can be identified. For simplicity, the illustrated example shows the text analysis results for block IDs 1 to 7.

In the text analysis result table 90 of FIG. 9, the pairs of reference ID and target ID whose similarity 102 is at or above the threshold are the two pairs “1-7” and “1-2” for reference ID “1”, the single pair “2-5” for reference ID “2”, and the two pairs “3-6” and “3-9” for reference ID “3”. The grouping processing unit 10 identifies these pairs and forms groups from them. First, because the similarities of “1-7” and “1-2” are at or above the threshold, the data blocks 65 with block IDs “1-2-7” are grouped. Next, because the similarity 102 of “2-5” is at or above the threshold, the data blocks 65 with block IDs “1-2-5-7” are grouped. Here it does not matter whether, for example, the similarity of “2-7” or of “5-7” is at or above the threshold. Likewise, because “3-6” and “3-9” are at or above the threshold, the data blocks 65 with block IDs “3-6-9” are grouped, again regardless of whether the similarity of “6-9” is at or above the threshold.
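Because membership is transitive (block 5 joins the group of 1, 2, and 7 through its link to block 2 alone), the grouping behaves like finding connected components over the pairs at or above the threshold. The following is a minimal sketch of that reading using a union-find structure. The function name, the similarity values, and the choice to drop blocks that link to nothing are assumptions made for the example, not details taken from the description.

```python
def group_by_threshold(pairs, threshold=85.0):
    """Group block IDs so that any two IDs connected by a chain of
    pairs scoring at or above the threshold land in the same group."""
    parent = {}

    def find(block_id):
        # Register unseen IDs as their own root, then follow parents
        # to the root, halving the path along the way.
        parent.setdefault(block_id, block_id)
        while parent[block_id] != block_id:
            parent[block_id] = parent[parent[block_id]]
            block_id = parent[block_id]
        return block_id

    for id_a, id_b, score in pairs:
        root_a, root_b = find(id_a), find(id_b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for block_id in parent:
        groups.setdefault(find(block_id), set()).add(block_id)
    # Assumption: a block with no partner above the threshold forms no group.
    return sorted(sorted(members) for members in groups.values()
                  if len(members) > 1)

# Pairs follow the FIG. 9 walk-through; the scores themselves are invented.
pairs = [
    (1, 7, 90.0), (1, 2, 88.0), (2, 5, 86.0),
    (2, 7, 40.0), (5, 7, 30.0),            # below threshold, irrelevant
    (3, 6, 91.0), (3, 9, 87.0), (6, 9, 50.0),
    (1, 3, 10.0),
]
groups = group_by_threshold(pairs)
```

With these inputs the result is two groups, `[1, 2, 5, 7]` and `[3, 6, 9]`, even though “2-7”, “5-7”, and “6-9” are below the threshold.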

FIG. 10 is a diagram showing an example of the data structure after grouping by the grouping processing unit 10.

The grouping processing unit 10 assigns an identifier identifying the group to the data blocks 65 of each group newly generated by the grouping. In FIG. 10, for example, the identifier consists of “n”, indicating that the data block 65 belongs to a new group formed by the grouping processing unit 10, a two-digit number indicating the group, and a four-digit number identifying a condition within the group.
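As a small illustration of the identifier layout just described, the sketch below builds a prefix letter followed by a zero-padded two-digit group number and a zero-padded four-digit condition number. The function name and the zero-padding are assumptions; FIG. 10 is not reproduced here, so the exact formatting may differ.

```python
def make_group_identifier(group_number, condition_number, prefix="n"):
    """Build an identifier such as 'n010001': prefix, two-digit group
    number, four-digit condition number (layout assumed from the text)."""
    if not 0 <= group_number <= 99:
        raise ValueError("group number must fit in two digits")
    if not 0 <= condition_number <= 9999:
        raise ValueError("condition number must fit in four digits")
    return f"{prefix}{group_number:02d}{condition_number:04d}"

identifier = make_group_identifier(1, 1)
```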

The identifier assigned by the grouping processing unit 10 is set in the group column 66 of the data block group 60, replacing “Other”.

Subsequently, the grouping processing unit 10 stores in the new data block storage unit 15 those data blocks 65 of the data block group 60, shown for example in FIG. 10, whose group column 66 has been given an identifier beginning with “n”.

The processing by the filtering processing unit 8, the grouping processing unit 10, and the similarity determination unit 9 described above may be performed as batch processing on the data blocks 65 of a preset period defined with reference to the reception date 63.

The time series analysis unit 11 performs a time series analysis of each group over a predetermined extraction period, using the reception dates 63 of the data blocks 65 in the group as the reference. The extraction period used in the time series analysis may include a most recent first extraction period and a second extraction period that contains the first extraction period and is longer than it.

For one group, for example, the time series analysis unit 11 generates a histogram whose horizontal axis is a predetermined period based on the reception date 63 and whose vertical axis is the number of data blocks in each such period. The time series analysis unit 11 further calculates, for example from the generated histogram, the variation rate of the number of data blocks over the predetermined period based on the reception date 63.
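The histogram step can be sketched as counting the reception dates of one group per calendar month. This is an illustrative sketch only: the monthly bucket is just one of the units the description allows, and the function name and sample dates are invented.

```python
from collections import Counter
from datetime import date

def monthly_histogram(reception_dates, start, end):
    """Count data blocks per (year, month) between start and end
    inclusive, keeping months with zero blocks so gaps stay visible."""
    counts = Counter((d.year, d.month)
                     for d in reception_dates if start <= d <= end)
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(((year, month), counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets

# Invented reception dates for the data blocks of one group
reception_dates = [
    date(2008, 5, 10),
    date(2008, 6, 1), date(2008, 6, 20),
    date(2008, 7, 3), date(2008, 7, 15), date(2008, 7, 28),
]
histogram = monthly_histogram(reception_dates,
                              date(2008, 4, 1), date(2008, 7, 31))
```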

FIG. 11 shows an example of histograms generated by the time series analysis unit 11. In FIG. 11(a), the number of data blocks appearing increases as time passes. In FIG. 11(b), the number of data blocks appearing rises temporarily and then returns to its original level.

The variation rate is the rate of increase from the minimum to the maximum number of data blocks in the predetermined period, or the rate of decrease from the maximum to the minimum. Both the rate of increase and the rate of decrease are the maximum number of data blocks divided by the minimum, expressed as a percentage, and both are positive values. When the minimum is 0, the rate of increase and the rate of decrease are, for convenience, treated as the maximum number of data blocks 65 divided by 1, expressed as a percentage.

Accordingly, in FIG. 11(a), the number of data blocks for “July 2008” is the maximum and the number for “April 2008” is the minimum. Because the minimum is 0 in this case, the rate of increase is the maximum, that is, the number of data blocks for “July 2008”, divided by 1 and expressed as a percentage.

In FIG. 11(b), the number of data blocks for “June 2008” is the maximum, and the numbers for “May 2008” and “July 2008” are equal, so both are the minimum.
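The variation rate defined above reduces to a short calculation. The sketch below follows that definition, including the divide-by-1 convention for a minimum of 0. The function name and the sample counts are assumptions, since the actual bar heights of FIG. 11 are not given in the text.

```python
def variation_rate(counts):
    """Maximum count divided by minimum count, as a percentage.
    A minimum of 0 is replaced by 1, per the stated convention."""
    largest, smallest = max(counts), min(counts)
    divisor = smallest if smallest > 0 else 1
    return 100.0 * largest / divisor

# Invented counts shaped like FIG. 11(a): April is 0, July is the peak.
rate_a = variation_rate([0, 1, 2, 3])
# Invented counts shaped like FIG. 11(b): May and July tie for the minimum.
rate_b = variation_rate([2, 6, 2])
```

Note that the same number serves as either a rate of increase or a rate of decrease; which label applies depends on the variation pattern, not on this calculation.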

The time series analysis unit 11 may perform the time series analysis on at least one group among the new groups generated by the grouping processing unit 10 and the existing groups corresponding to the expressions contained in the first data blocks.

The groups analyzed by the time series analysis unit 11 are passed to the group extraction unit 12, as shown for example in FIG. 1.

The group extraction unit 12 extracts the groups for which the time series change in the number of appearances of the opinions, counted by reception date 63 and obtained from the time series analysis by the time series analysis unit 11, matches a predetermined extraction condition. The extraction condition may be defined based on any of the number of data blocks in the new group, the variation rate of the number of appearances counted by reception date, or the variation pattern of the number of appearances counted by reception date.

The group extraction unit 12 determines the variation pattern based on the analysis result of the time series analysis unit 11. For example, the pattern may be determined from how the number of data blocks varies in a histogram such as that of FIG. 11.

FIG. 12 shows an example of the variation patterns.

Time series pattern 182 No. 1 is “monotonic increase”. In this case, the rate of increase is calculated as the variation rate.

Time series pattern 182 No. 2 is “increase after decrease”. In this case, the rate of increase is calculated as the variation rate.

Time series pattern 182 No. 3 is “irregular variation”. In this case, the variation rate is shown as “?”.

Time series pattern 182 No. 4 is “decrease after increase”. In this case, the rate of decrease is calculated as the variation rate.

Time series pattern 182 No. 5 is “no change”. Cases in which the number of comment data blocks on the vertical axis is 0 are excluded.

Time series pattern 182 No. 6 is “monotonic decrease”. In this case, the rate of decrease is calculated as the variation rate.
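The six patterns can be told apart from the direction changes in a histogram. The description names the patterns but does not state the decision rules, so the sketch below rests on assumptions: flat steps are ignored, one run of rises or falls counts as monotonic, exactly one turning point gives patterns 2 and 4, and anything with more turning points is irregular. The function and label names are hypothetical.

```python
def classify_pattern(counts):
    """Return an assumed label for the six variation patterns, or None
    when the series is all zeros (excluded under pattern 5)."""
    directions = []
    for previous, current in zip(counts, counts[1:]):
        sign = (current > previous) - (current < previous)
        # Skip flat steps and collapse consecutive steps in one direction.
        if sign != 0 and (not directions or directions[-1] != sign):
            directions.append(sign)
    if not directions:
        return "no change" if counts and counts[0] > 0 else None
    if directions == [1]:
        return "monotonic increase"
    if directions == [-1]:
        return "monotonic decrease"
    if directions == [-1, 1]:
        return "increase after decrease"
    if directions == [1, -1]:
        return "decrease after increase"
    return "irregular variation"

# Which rate the description reports for each pattern ("?" for irregular).
RATE_KIND = {
    "monotonic increase": "increase",
    "increase after decrease": "increase",
    "irregular variation": "?",
    "decrease after increase": "decrease",
    "monotonic decrease": "decrease",
}
```

Under these rules a series shaped like FIG. 11(a) is classified as “monotonic increase” and one shaped like FIG. 11(b) as “decrease after increase”.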

FIG. 13 is a flowchart showing the flow of processing by the time series analysis unit 11 and the group extraction unit 12 of FIG. 1. FIG. 14 shows the condition setting screen 16 for time series analysis displayed on the output device 3. FIG. 15 shows the detailed setting screen 17 for time series analysis and group extraction displayed on the output device 3. FIG. 16 shows an example of the result display screen 19 displayed on the output device 3.

The flow of processing in the flowchart of FIG. 13 is described below with reference to FIGS. 14 to 16.

First, the conditions for selecting what to analyze are set (S1). At this point, the condition setting screen 16 of FIG. 14 is displayed on the input device 2. The condition setting screen 16 of FIG. 14 displays a check box 161 for selecting the new data blocks stored in the new data block storage unit 15, a check box 162 for selecting the existing data blocks stored in the existing data block storage unit 13, a check box 163 for selecting the unnecessary data blocks stored in the unnecessary data block storage unit 14, an execution button 164, and a cancel button.

To run a time series analysis, the operator, for example, selects on the condition setting screen 16 of the input device 2 which of the storage units 13 to 15 to analyze and then executes the analysis. In FIG. 14, for example, the check box 161 for selecting the new data block storage unit 15 is checked.

When new data blocks are selected with the check box 161, the time series analysis unit 11 refers to the data blocks 65 in the new data block storage unit 15 and performs the time series analysis on the groups whose identifier in the group column 66 begins with “n”.

When existing data blocks are selected with the check box 162, the time series analysis unit 11 refers to the data blocks 65 in the existing data block storage unit 13 and performs the time series analysis on the groups whose identifier in the group column 66 begins with “e”.

When unnecessary data blocks are selected with the check box 163, the time series analysis unit 11 refers to the data blocks 65 in the unnecessary data block storage unit 14 and performs the time series analysis on the groups whose identifier in the group column 66 begins with “u”.

When the execution button 164 is pressed on the condition setting screen 16, the detailed setting screen 17 of FIG. 15 is displayed on the output device 3. The detailed setting screen 17 is a screen for specifying or selecting the extraction period used in the time series analysis and the extraction conditions used by the group extraction unit.

On the detailed setting screen 17 of FIG. 15, the most recent period and the overall analysis period can each be selected. The detailed setting screen 17 displays an analysis period designation area 171, a category selection area 172, a most-recent designation area 173, a variation pattern selection area 174, a variation rate designation area 175, and a document count designation area 176. The operator specifies or selects the necessary items on the detailed setting screen 17 (S2).

In the analysis period designation area 171, the start and end dates of the period to be extracted can be specified. The analysis period is the longest period extracted and contains the most recent period.

In the category selection area 172, the unit of the period based on the reception date 63, which forms the horizontal axis of the time series analysis, can be selected. Four units are available: “year”, “month”, “week”, and “day”.

In the most-recent designation area 173, the number of units of the kind selected in the category selection area 172 to output, counting back from the most recent, can be specified. For example, as in FIG. 15, when “week” is selected in the category selection area 172 and “3 data” is specified in the most-recent designation area 173, data for the most recent three weeks is output in weekly units.

In the variation pattern selection area 174, the variation pattern to be extracted can be specified from among the variation patterns of the groups found by the time series analysis. A variation pattern can be specified separately for the most recent period and for the overall analysis period. In addition to the “no designation” option shown in FIG. 15, the variation pattern selection area 174 may allow selection from several kinds of variation patterns (time series patterns 182), as illustrated in FIG. 12.

In the variation rate designation area 175, the minimum variation rate, as a percentage, of the groups to be output can be specified.

In the document count designation area 176, as in FIG. 15 for example, the minimum number of data blocks in the overall analysis period that a time-series-analyzed group must have in order to be output can be specified.

The variation rate designation area 175 and the document count designation area 176 may also be configured so that the variation rate and document count can be specified for the most recent data, or for both the most recent data and the overall analysis period.

Returning to FIG. 13, the time series analysis unit 11 extracts, from the storage units 13 to 15 checked in the check boxes 161 to 163 of FIG. 14, the data blocks 65 falling within the analysis period entered in the designation area 171 (S3).

For the data blocks 65 extracted in step S3, the time series analysis unit 11 performs the time series analysis in the manner described above for each of the most recent period and the overall analysis period, according to the category selected in the category selection area 172 (S4).

Based on the result of the time series analysis by the time series analysis unit 11, the group extraction unit 12 calculates the variation rate (S5).

The group extraction unit 12 extracts the groups that fall within the range of the variation rate specified in the designation area 175 and the document count specified in the designation area 176 on the detailed setting screen 17 of FIG. 15 (S6).

From the groups extracted in step S6, the group extraction unit 12 extracts the groups that match the variation pattern selected in the selection area 174 on the detailed setting screen 17 of FIG. 15 (S7). The variation patterns selected in the area 174 are searched for in the overall analysis period and in the most recent period respectively, and groups that match both or either one are extracted. When “no designation” is selected in the selection area 174, no pattern-based extraction is performed for that extraction period.
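Steps S6 and S7 amount to two successive filters over per-group summaries, which can be sketched as follows. The `GroupSummary` record, the function name, and the sample values are hypothetical. The sketch also assumes that when “no designation” is chosen for both periods, every group that passed S6 is kept, a case the description does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GroupSummary:
    """Hypothetical per-group result of the time series analysis."""
    group_id: str
    document_count: int
    variation_rate: float
    overall_pattern: Optional[str]
    recent_pattern: Optional[str]

def extract_groups(summaries, min_rate, min_documents,
                   overall_pattern=None, recent_pattern=None):
    """S6: keep groups meeting the rate and document-count minimums.
    S7: keep groups matching a designated pattern for either period.
    A pattern of None stands for 'no designation' for that period."""
    candidates = [s for s in summaries
                  if s.variation_rate >= min_rate
                  and s.document_count >= min_documents]
    if overall_pattern is None and recent_pattern is None:
        return candidates  # assumption: nothing designated, nothing filtered
    return [
        s for s in candidates
        if (overall_pattern is not None
            and s.overall_pattern == overall_pattern)
        or (recent_pattern is not None
            and s.recent_pattern == recent_pattern)
    ]

# Invented summaries for three new groups
summaries = [
    GroupSummary("n010001", 12, 300.0, "monotonic increase", "monotonic increase"),
    GroupSummary("n020001", 3, 500.0, "monotonic increase", "no change"),
    GroupSummary("n030001", 20, 150.0, "irregular variation", "decrease after increase"),
]
selected = extract_groups(summaries, min_rate=200.0, min_documents=5,
                          overall_pattern="monotonic increase")
```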

The results of the time series analysis processing and the group extraction processing, based on the conditions specified or selected on the condition setting screen 16 of FIG. 14 and the detailed setting screen 17 of FIG. 15, are displayed on the output device 3 (S8).

FIG. 16 is a diagram showing an example of the result display screen 19.

The result display screen 19 is output to the output device 3. As shown in FIG. 16, for example, the result display screen 19 shows an output number 191, an overall period 192, a most recent period 193, a variation rate 194, a document count 195, and a group 196. The overall period 192 shows the variation pattern of the overall analysis period, and the most recent period 193 shows the variation pattern of the most recent period. The variation rate 194 and the document count 195 are those of the overall period 192.

Clicking the overall period 192 of a given output number 191 displays the details of the variation pattern for the overall analysis period. For example, clicking the overall period 192 of output number 191 “1” displays a detailed result such as that of FIG. 11(a). Likewise, clicking the most recent period 193 of output number 191 “4” outputs the three data points specified in the most-recent designation area 173 of FIG. 15, as in FIG. 11(b).

Although not illustrated, clicking the group 196 may display all data blocks 65 in the group, and displaying an individual data block 65 may allow the data block storage unit 6 and the storage units 13 to 15, which hold the data blocks that satisfy each condition, to be consulted as appropriate.

As described above, the text processing apparatus 1 according to this embodiment can detect problems automatically, without the filtering by preset keywords used in the prior art, by grouping the plurality of data blocks 65 based on similarities calculated by text analysis and analyzing the data blocks 65 in each group as a time series.

In particular, when a problem with a product or the like comes to light, there is characteristically, as shown for example in FIG. 17, a precursor period before the period in which the problem occurs frequently, during which only a very small number of customer comments pointing out the problem reach the help desk or a similar channel. Using this characteristic, the text processing apparatus 1 according to this embodiment can automatically detect points likely to become problems even at the precursor stage, before the frequent period.

In this embodiment, the tendency of the time series change in the number of appearances of the opinions to be output can also be set in advance.

The embodiments of the present invention described above are examples for explaining the present invention and are not intended to limit the scope of the present invention to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 Text processing apparatus
2 Input device
2a Network
2b Portable terminal device
2c Terminal device
3 Output device
4 Original data storage unit
5 Data block generation means
6 Data block storage unit
7 Condition storage unit
8 Filtering processing unit
9 Similarity determination unit
10 Grouping processing unit
11 Time series analysis means
12 Group extraction means
13 Existing data block storage unit
14 Unnecessary data block storage unit
15 New data block storage unit
63 Reception date
64 Text
65 Data block

Claims

A text processing apparatus comprising:
storage means for storing a plurality of data blocks, each including a text, typified by an opinion on a product or service from a person related to the product or service, and a reception time of the text;
text analysis means for comparing the texts of the plurality of data blocks with one another by text analysis and calculating their mutual similarities;
group generation means for grouping the plurality of data blocks based on the similarities calculated by the text analysis means to generate a new group;
time series analysis means for performing, on the new group generated by the group generation means, a time series analysis over predetermined extraction periods with reference to the reception times of the data blocks in the new group, the extraction periods including a most recent first extraction period and a second extraction period that contains the first extraction period and is longer than the first extraction period; and
group extraction means for extracting the new group for which the time series change in the number of appearances of the opinions, counted by reception time and obtained from the analysis by the time series analysis means, matches a predetermined extraction condition,
wherein the extraction condition is a condition defined based on one or more parameters including at least one of a variation rate of the number of appearances counted by reception time and a variation pattern of the number of appearances counted by reception time.

The text processing apparatus according to claim 1, wherein the extraction condition is a condition defined based on:
(*) the number of data blocks in the new group in at least one of the most recent first extraction period and the second extraction period,
(*) the variation rate of the number of appearances counted by reception time in at least one of the most recent first extraction period and the second extraction period,
(*) the variation pattern of the number of appearances counted by reception time in each of the most recent first extraction period and the second extraction period.

The text processing apparatus according to claim 1 or 2, further comprising:
an existing condition storage unit that stores predetermined existing expressions classified into a plurality of existing groups;
an unnecessary condition storage unit that stores predetermined unnecessary expressions; and
a filtering processing unit that sorts the data blocks read from the storage means into first data blocks containing the existing expressions, second data blocks containing the unnecessary expressions, and third data blocks containing neither the existing expressions nor the unnecessary expressions, and that further classifies the first data blocks by existing group,
wherein the text analysis means calculates the similarities for the third data blocks,
the group generation means groups the third data blocks to generate the new group, and
the time series analysis means performs the time series analysis over the predetermined extraction periods on at least one group among the existing groups sorted by the filtering processing unit and the new group generated by the group generation means, with reference to the reception times of the data blocks in the group.

The text processing apparatus according to claim 3, further comprising registration means for registering an expression extracted from the texts in the data blocks belonging to the new group generated by the group generation means in the existing condition storage unit or the unnecessary condition storage unit, as a new existing expression of a new existing group or as a new unnecessary expression.

A computer program for text processing, the program causing a computer to realize:
storage means for storing a plurality of data blocks, each including a text, typified by an opinion on a product or service from a person related to the product or service, and a reception time of the text;
text analysis means for comparing the texts of the plurality of data blocks with one another by text analysis and calculating their mutual similarities;
group generation means for grouping the plurality of data blocks based on the similarities calculated by the text analysis means to generate a new group;
time series analysis means for performing, on the new group generated by the group generation means, a time series analysis over predetermined extraction periods with reference to the reception times of the data blocks in the new group, the extraction periods including a most recent first extraction period and a second extraction period that contains the first extraction period and is longer than the first extraction period; and
group extraction means for extracting the new group for which the time series change in the number of appearances of the opinions, counted by reception time and obtained from the analysis by the time series analysis means, matches a predetermined extraction condition,
wherein the extraction condition is a condition defined based on one or more parameters including at least one of a variation rate of the number of appearances counted by reception time and a variation pattern of the number of appearances counted by reception time.