JP2008084151A - Information display device and information display method

Info

Publication number: JP2008084151A
Application number: JP2006265319A
Authority: JP
Inventors: Yumiko Watanabe; 由美子渡邉; Reina Nishido; 礼奈西土; Mayumi Sato; 眞由美佐藤; Yasuhiko Asakawa; 泰彦浅川; Natsue Kaburayama; 奈津恵蕪山; Hiroshi Matsuda; 寛松田; Yasuhiro Hirayama; 靖博平山
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 2006-09-28
Filing date: 2006-09-28
Publication date: 2008-04-10

Abstract

Problem: It is not easy to obtain the necessary information from document data.
Solution: An information display device classifies a document set using first and second classification methods set by the user. A matrix 50 displayed on the device includes a column classification item field 52 and a row classification item field 54, which show the classification items produced by the first and second classification methods respectively, and a graphic display field 56, which represents a numerical value for the intersection of each row document set and column document set by the color of a dot in a two-dimensional matrix. When a selected classification method is clustering, representative phrases, chosen by a predetermined criterion from the phrases extracted from each document during clustering, are displayed as the classification items.
Selected drawing: Figure 2

Description

The present invention relates to information display technology, and in particular to a technique for visualizing and displaying information obtained from accumulated document data and to an information display method applied to it.

The technological environment in information processing fields such as computers and networks has advanced dramatically in recent years. As a result, it has become easy to store large volumes of diverse data in storage devices and to obtain such data over networks or from recording media. The central challenge of the information society is therefore shifting from how to obtain information to how to select the necessary information efficiently from an enormous volume.

To address this challenge, various techniques have been developed for narrowing down the necessary data from databases. For example, for published patent applications, a technique has been proposed that narrows down data by aggregating the publications that match an input search expression on the basis of keywords, patent classifications, and the like, and displaying the result as a matrix map (see, for example, Patent Document 1). For knowledge management systems, a technique has also been proposed that narrows down the posted articles to retrieve by displaying in two dimensions the number of posts or the level of an evaluation value for each community or specialist field (see, for example, Patent Document 2).
JP 2005-165858 A
JP 2005-85017 A

These techniques target data to which classifications and attributes have been assigned in advance with search in mind, so they become feasible only once a system has been built around that data structure. Such a system cannot be applied to data with a different structure or to a different purpose; the systems in the above techniques are limited to their intended uses, namely searching patent publications or searching posted articles.

The present invention has been made in view of these circumstances, and its object is to provide a technique that lets a user obtain the diverse information he or she wants easily and intuitively.

One aspect of the present invention relates to an information display device. This device comprises: a storage unit that stores a plurality of documents; a classification processing unit that forms two series of document set groups by classifying the stored documents using a first classification method and a second classification method; and a matrix display unit that displays the correlation between the results of the first and second classification methods as a two-dimensional matrix in which the two series of document set groups are laid out as rows and columns and numerical information on the intersection of each pair of document sets is represented by a predetermined figure.

Here, the "first classification method" and the "second classification method" may be the same method, so the "two series of document set groups" may be identical. "Represented by a figure" means that a geometric shape such as a circle, polygon, or line is varied in color, pattern, size, or a combination of these. Alternatively, the shape itself may be varied, or shapes may be combined.

Another aspect of the present invention also relates to an information display device. This device comprises: a storage unit that stores a plurality of documents; a classification processing unit that forms a plurality of document sets by classifying the stored documents using a predetermined classification method; and a matrix display unit that calculates numerical information for each phrase based on the number of occurrences of the phrases extracted from each classified document set, and displays the correlation between the classification result and that numerical information as a two-dimensional matrix in which the numerical information is represented by a predetermined figure.

Yet another aspect of the present invention relates to an information display method. This method includes: a step of accepting from the user a selection of a first classification method and a second classification method for classifying a plurality of documents; a step of classifying the documents using the selected first and second classification methods to form two series of document set groups; and a step of displaying the correlation between the results of the first and second classification methods as a two-dimensional matrix in which the two series of document set groups are laid out as rows and columns and numerical information on the intersection of each pair of document sets is represented by a predetermined figure.

Any combination of the above components, and any conversion of the expression of the present invention between a method, a device, a system, and the like, is also effective as an aspect of the present invention.

According to the present invention, a user can obtain desired information from document data easily and intuitively.

FIG. 1 shows the configuration of the information display device according to this embodiment. The information display device 10 includes an input unit 20 through which the user enters display instructions, a storage unit 12 that stores document data and other data, a classification processing unit 14 that classifies the document data using a predetermined classification method, and a matrix display unit 22 that assigns two series of classification items to rows and columns and displays the numerical value of each element as a figure in a two-dimensional matrix. The matrix display unit 22 includes a matrix generation unit 16, which obtains numerical data based on the classified document data and generates the display data for the matrix, and a display unit 18, which displays the matrix. These components are connected via a bus 24 and exchange data with one another.

Each element shown in FIG. 1 as a functional block that performs some processing can be implemented in hardware by a CPU, memory, and other LSIs, and in software by, for example, a program with language processing functions. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms, by hardware alone, by software alone, or by a combination of the two, and they are not limited to any one of these.

The input unit 20 may be any commonly used input device, such as a keyboard, mouse, trackball, or trackpad, or a combination of these. Through the input unit 20 the user specifies the document set to be processed from the document data stored in the storage unit 12, selects, on the matrix display screen shown on the display unit 18, the types of classification items that make up the rows and columns of the matrix and the type of data to display, and selects a desired region of the displayed matrix.

The storage unit 12 may be a hard disk, a memory, a reader for recording media such as a DVD (Digital Versatile Disk) or CD (Compact Disk), or a combination of these; the hardware is chosen to suit the data volume and the form of search processing, and the number of devices is not limited. Part of the storage unit 12 may be connected to the bus 24 via a network (not shown), in which case the other functional blocks may exchange data with it through a server (not shown) connected to the network.

The document data stored in the storage unit 12 may, for example, associate the text of each document with attributes that characterize it, such as creation date, author, and classification code, or it may consist of the text alone. A document here may be long, such as a newspaper article or a published patent application, or may consist of a single sentence or a single word; its length does not matter. Besides document data, the storage unit 12 also stores the data needed for matrix display, such as template data for the images to be displayed, data on the figures shown in the matrix, and parameters for calculating the numerical values the figures represent.

The classification processing unit 14 reads from the storage unit 12 the data of the document set that the user specified through the input unit 20, and classifies it using the classification methods that correspond to the types of classification items the user selected for the rows and columns of the matrix. The classification method may be any of the following: classifying by an attribute stored in advance in association with each document; assigning each document, by a predetermined criterion, to one of a set of phrases prepared in advance; or classifying by the similarity between documents without any specific classification items (hereinafter called clustering). Each classified document set is stored, for example in the memory of the storage unit 12, in association with its classification item on the basis of the identification information assigned to each document.
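The first two classification methods described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of `documents`, and the use of case-insensitive substring matching as the "predetermined criterion" are all assumptions made for the example.

```python
from collections import defaultdict

def classify_by_attribute(documents, attribute):
    """Group document IDs by the value of one stored attribute."""
    groups = defaultdict(set)
    for doc_id, doc in documents.items():
        # Documents that lack the attribute are left unclassified.
        if attribute in doc:
            groups[doc[attribute]].add(doc_id)
    return dict(groups)

def classify_by_phrase_set(documents, phrases):
    """Assign each document to every prepared phrase its text contains.

    Substring matching is an assumed stand-in for the unspecified
    "predetermined criterion" in the description.
    """
    groups = {phrase: set() for phrase in phrases}
    for doc_id, doc in documents.items():
        text = doc["text"].lower()
        for phrase in phrases:
            if phrase in text:
                groups[phrase].add(doc_id)
    return groups

# Hypothetical sample data: document ID -> text plus one attribute.
documents = {
    1: {"text": "The lens is sharp", "year": 2005},
    2: {"text": "Battery life is short", "year": 2006},
    3: {"text": "New lens and battery", "year": 2006},
}
by_year = classify_by_attribute(documents, "year")
by_topic = classify_by_phrase_set(documents, ["lens", "battery"])
```

Both functions return a mapping from classification item to a set of document IDs, which mirrors the description's statement that classified sets are stored per item using each document's identification information.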

In this embodiment the classification results are displayed in matrix form, but the classification processing unit 14 does not necessarily use two classification methods: the result of a single method may be shown in both the rows and the columns. By selecting the types of classification items, the user can efficiently obtain the desired information from the variety of information the document set contains. Specific examples of classification items, the corresponding classification methods, and the information obtained from the matrix display are described later.

The matrix generation unit 16 of the matrix display unit 22 arranges the results produced by the classification processing unit 14 in rows or columns according to the classification items the user selected through the input unit 20, and generates the display data for a two-dimensional matrix. Consider, for example, displaying the number of documents per classification item for the document set read from the storage unit 12. The matrix generation unit 16 first generates a matrix whose elements are the intersections of every combination of the two series of document set groups obtained by classifying into the items selected for the rows and the columns. That is, if the document set group forming the rows is {M1, M2, ..., Mm} (m is the number of items) and the document set group forming the columns is {N1, N2, ..., Nn} (n is the number of items), a matrix like Equation 1 is generated, in which each element is the intersection of one row set and one column set. The identification information of the documents belonging to each intersection is then stored, for example in memory, in association with the corresponding pair of row and column classification items.
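Equation 1 itself is not reproduced in this text. Based only on the surrounding description (rows M1 to Mm, columns N1 to Nn, each element an intersection), it presumably has the following form; the exact notation is an assumption.

```latex
\begin{pmatrix}
M_1 \cap N_1 & M_1 \cap N_2 & \cdots & M_1 \cap N_n \\
M_2 \cap N_1 & M_2 \cap N_2 & \cdots & M_2 \cap N_n \\
\vdots       & \vdots       & \ddots & \vdots       \\
M_m \cap N_1 & M_m \cap N_2 & \cdots & M_m \cap N_n
\end{pmatrix}
```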

Next, the matrix generation unit 16 converts the matrix into numbers by applying to each intersection, that is, to each element of the matrix, an operation that depends on the type of display data the user selected through the input unit 20. In the example above, it counts the documents belonging to each intersection to obtain the final numerical matrix.
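The two steps just described, forming the intersections and then counting their members, can be sketched as below. The function names and the sample sets are hypothetical; document sets are assumed to be Python sets of document IDs keyed by classification item.

```python
def intersection_matrix(row_sets, col_sets):
    """Map each (row item, column item) pair to the doc IDs in both sets."""
    return {
        (r, c): row_docs & col_docs
        for r, row_docs in row_sets.items()
        for c, col_docs in col_sets.items()
    }

def count_matrix(row_sets, col_sets):
    """Turn the intersection matrix into a numerical matrix of doc counts."""
    cells = intersection_matrix(row_sets, col_sets)
    return [[len(cells[(r, c)]) for c in col_sets] for r in row_sets]

# Hypothetical document sets: classification item -> set of document IDs.
rows = {"lens": {1, 3}, "battery": {2, 3}}
cols = {2005: {1}, 2006: {2, 3}}
counts = count_matrix(rows, cols)
print(counts)  # [[1, 1], [0, 2]]
```

Keeping the intersections themselves, rather than only the counts, matches the description's note that the document IDs of each intersection are stored so that the underlying documents can be shown later.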

The matrix generation unit 16 then visualizes each element of the numerical matrix as a figure. This lets the user grasp the distribution and trends of the values more intuitively, even when there are many classification items. As described later, the matrix generation unit 16 also reorders the rows and columns of the matrix and narrows down the classification items to display, either automatically or at the user's instruction. The user can therefore not only grasp the overall trend but also narrow down the data efficiently to obtain local information.

Depending on the type of data to display, the matrix generation unit 16 counts, for each document set, the number of occurrences of phrases extracted from the document sets classified by the classification processing unit 14, such as phrases containing impression-expressing words, noun phrases, adjective phrases, and verb phrases. Where necessary it also calculates a degree of affect based on the occurrence counts and similar values. The phrase extraction itself may be performed by an external processing device, in which case data associating the extracted phrases with each document is stored in the storage unit 12, and the matrix generation unit 16 counts the occurrences by referring to that data. The specific types of extracted phrases and display data are described in detail later.
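Counting phrase occurrences per document set, using phrase data that was extracted beforehand, might look like the sketch below. The mapping `phrases_by_doc` stands in for the stored data that associates extracted phrases with documents; its layout, the function name, and the sample phrases are assumptions. The degree-of-affect calculation is not specified in this passage and is not attempted here.

```python
from collections import Counter

def phrase_counts_per_set(doc_sets, phrases_by_doc):
    """Count occurrences of extracted phrases within each document set."""
    result = {}
    for label, doc_ids in doc_sets.items():
        counter = Counter()
        for doc_id in doc_ids:
            # Documents with no extracted phrases contribute nothing.
            counter.update(phrases_by_doc.get(doc_id, []))
        result[label] = counter
    return result

# Hypothetical pre-extracted phrases: document ID -> list of phrases.
phrases_by_doc = {
    1: ["sharp image", "fast focus"],
    2: ["short life"],
    3: ["sharp image"],
}
doc_sets = {"lens": {1, 3}, "battery": {2, 3}}
phrase_counts = phrase_counts_per_set(doc_sets, phrases_by_doc)
```

The result gives one row of values per classification item and one column per phrase, which is the shape needed for a matrix whose second axis is extracted phrases rather than a second classification.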

Under the control of the matrix generation unit 16, the display unit 18 displays the generated matrix on the screen, together with an input screen on which the user specifies the document set to be processed and selects the types of classification items through the input unit 20. The matrix and the input screen may be shown on the same screen. The display unit 18 also displays two guide lines, one vertical and one horizontal, that move over the displayed matrix. The user selects a region of the matrix by moving the intersection of the guide lines with the input unit 20. When a region is selected, the matrix generation unit 16 causes the display unit 18 to display the numerical information represented by the figure in that region and information on the classification items to which the figure belongs.

The display unit 18 also displays the text of the document set that corresponds to the figure in the region of the matrix the user selected. The text may be shown in a separate window overlaid on the window that displays the matrix, or it may replace the matrix in the matrix display area.

FIG. 2 shows an example of a matrix displayed on the display unit 18. The matrix 50 includes a column classification item field 52, a row classification item field 54, and a graphic display field 56. In the example of FIG. 2, both the column classification item field 52 and the row classification item field 54 show the classification items "camera", "case", "size", "shutter", "lens", "battery", ..., "cell". A display like FIG. 2 is useful, for example, when a set of camera-related posted articles is classified by subject into these preset classification items using two classification methods or two classifiers, and the user wants to understand the correlation between the classification results or the tendencies of the methods or classifiers.

In the graphic display field 56, dots are displayed as the figures that represent the numerical value of each matrix element. The following description assumes that values are represented by dots, but the same applies to figures of other shapes. In FIG. 2 all dots are the same size and their color varies with the value. For convenience of illustration, however, the color change is drawn here as a change of pattern, with a denser pattern indicating a higher value. An element with no dot is one whose value is 0 or for which no value was calculated.

FIG. 2 shows that the values are high on the diagonal of the matrix, that is, for combinations of identical classification items such as ("camera", "camera") and ("case", "case"). In this example the column for the classification item "lens" also contains more dots than the other columns. If, as described above, the matrix shows the number of documents in each item after two classification methods have classified the same documents into the same items, this result indicates that the two methods have broadly the same classification tendency, and that documents classified as "lens" by the column method are spread across several items by the row method.
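One way to put a number on the "high values on the diagonal" observation is the share of documents that both methods place in the same item. The measure below is an illustrative assumption, not something the description defines, and the count matrix is hypothetical.

```python
def diagonal_share(counts):
    """Fraction of all counted documents that lie on the matrix diagonal.

    Assumes rows and columns list the same classification items in the
    same order, as in the FIG. 2 example.
    """
    if not counts:
        return 0.0
    total = sum(sum(row) for row in counts)
    size = min(len(counts), len(counts[0]))
    agree = sum(counts[i][i] for i in range(size))
    return agree / total if total else 0.0

# Hypothetical counts: rows from one method, columns from the other.
agreement = diagonal_share([[8, 1], [2, 9]])
print(agreement)  # 0.85
```

A value near 1 corresponds to the case in the text where the two methods classify in broadly the same way; a column with many off-diagonal entries, like "lens" in FIG. 2, pulls the share down.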

FIG. 3 shows another example of a matrix displayed on the display unit 18. In this matrix 50, the column classification item field 52 shows the items "2004", "2005", and "2006", and the row classification item field 54 shows the items "camera" and "mobile phone". In FIG. 2 the dots in the graphic display field 56 were all the same size, whereas in FIG. 3 their sizes differ. Their colors differ as well, which, as in FIG. 2, is drawn as different patterns for convenience. In the example of FIG. 3, therefore, the size and color of a single dot represent two numerical values.

FIG. 3 is, for example, a matrix in which dot size represents how the number of documents about "camera" or "mobile phone" changes with the year of creation, and dot color represents the rate of change in the number of documents since 2004. It shows that the number of documents about "camera" is smaller than the number about "mobile phone" throughout the period, but that the rate of increase is higher for "camera".
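The two values carried by each dot in FIG. 3 can be sketched as follows: the count drives the dot size and the rate of change from a base year drives the color. The formula for the rate of change (relative difference from the base-year count) and the sample figures are assumptions chosen to reproduce the pattern described, not data from the patent.

```python
def change_from_base(counts_by_year, base_year):
    """Relative change of each year's count against the base year's count."""
    base = counts_by_year[base_year]
    return {
        year: (count - base) / base if base else None
        for year, count in counts_by_year.items()
    }

# Hypothetical document counts per creation year.
camera = {2004: 10, 2005: 15, 2006: 30}
phone = {2004: 40, 2005: 44, 2006: 50}

camera_change = change_from_base(camera, 2004)
phone_change = change_from_base(phone, 2004)
print(camera_change)  # {2004: 0.0, 2005: 0.5, 2006: 2.0}
```

With these sample figures "camera" has fewer documents than "mobile phone" in every year but a higher rate of increase, so its dots would be smaller yet more strongly colored.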

The operation of the information display device 10 configured as above is described next. FIG. 4 is a flowchart of the matrix display procedure performed by the information display device 10. First, the user specifies the document set to be processed through the input unit 20 (S10). For example, the display unit 18 shows an input screen for choosing among the document sets stored in the storage unit 12, and the user makes a selection on it. Alternatively, the user may have the storage unit 12 read a recording medium such as a CD-ROM that holds a document set, or instruct a download from a server to the storage unit 12 over a network.

Next, the user selects the types of classification items that make up the rows and columns of the matrix and the type of data to display in the matrix (S14). In this step too, the display unit 18 may show the classification items available for the document set selected in S10 so that the user can choose among them. When documents are to be classified by attribute, the attribute types, such as "gender" and "creation date", are shown as options; when a set of phrases prepared in advance is to be used as the classification items, the name given to that phrase set is shown as an option. When the result of clustering is to be displayed, the classification items are not yet known, so an option such as "clustering" is shown. Likewise, when values are to be shown per extracted phrase, the phrases are not yet known, so the types of phrase are shown as options. For the type of data, the names of the values that can be calculated, such as "number of documents" or "document ratio", are shown as options. A screen for detailed settings of each option may be displayed as needed.

The classification processing unit 14 reads the document set specified in S10 from the storage unit 12 and classifies it according to the types of classification items selected in S14 (S16). If the document set read is already classified in the storage unit 12 and the user has selected only that classification method, S16 is skipped.

As described above, the matrix generation unit 16 of the matrix display unit 22 forms the intersection of the two series of document sets that make up the rows and columns for every combination of items, and calculates a value for each intersection according to the type of display data selected in S14. Alternatively, for each document set classified by the classification processing unit 14, it extracts phrases of the type selected in S14 and calculates the number of occurrences or a value based on it. It then determines the color and size of the dots to display in the matrix from these values and generates matrix data consisting of the dots and the row and column classification items (S18). The display unit 18 outputs the matrix data to the screen (S20).

In addition to the matrix, the display unit 18 keeps the input screen for selecting the types of classification items and display data visible at all times. If the user selects a new type of classification item while the matrix is displayed (Y in S22), the classification processing unit 14 performs a new classification based on it (S16) and the matrix generation unit 16 regenerates the matrix data (S18). If the user selects a new type of display data (Y in S24), the matrix generation unit 16 calculates new values based on it and regenerates the matrix data (S18). As a result, the display unit 18 displays a new matrix that reflects the type of classification item or display data the user selected (S20).

This processing is repeated until the user enters an instruction to end the matrix display (N in S26); when the end instruction is entered, the matrix display ends (Y in S26).
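The loop from S14 to S26 can be sketched as an event-driven routine. This is a simplified reading of the flowchart, not the actual control flow of the device: the event names, the callback parameters, and the choice to replay a list of events instead of reading real user input are all assumptions, and the document set selection of S10 is left out.

```python
def run_matrix_display(events, classify, build_matrix, show):
    """Replay user events through a simplified S14-S26 loop."""
    classified = None
    data_type = None
    for kind, value in events:
        if kind == "select_items":       # S14 / Y in S22: new item types
            row_type, col_type = value
            classified = classify(row_type, col_type)      # S16
        elif kind == "select_data":      # S14 / Y in S24: new data type
            data_type = value
        elif kind == "quit":             # Y in S26: end instruction
            break
        # Redraw only once both selections have been made.
        if classified is not None and data_type is not None:
            show(build_matrix(classified, data_type))      # S18, S20

# Stub callbacks that only record what would be displayed.
shown = []
run_matrix_display(
    [
        ("select_items", ("topic", "year")),
        ("select_data", "count"),
        ("select_data", "ratio"),
        ("quit", None),
        ("select_data", "count"),
    ],
    classify=lambda row_type, col_type: (row_type, col_type),
    build_matrix=lambda classified, data_type: (classified, data_type),
    show=shown.append,
)
```

Changing only the display data type reuses the existing classification and repeats S18 and S20, whereas changing the item types reruns S16 first, which matches the two branches described for S22 and S24.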

FIG. 5 shows an example of the layout of the screen displayed on the display unit 18. The matrix display screen 60 includes a document set designation area 66, a matrix display area 51, a classification item selection area 62, a legend display area 64, a sort instruction button 67, and a narrowing instruction button 68. In S10 of FIG. 4, the user specifies the document set to be processed by entering its storage location in the storage unit 12 and its name in the document set designation area 66. As with common ways of specifying documents, the tree structure of the storage unit 12 may be made browsable so that the user can select from it.

マトリクス表示領域５１は列の分類項目表示領域５３、行の分類項目表示領域５５、および図形表示領域５７を含む。また分類項目選択領域６２には、マトリクスとして表示できる行の分類項目や列の分類項目の種類、および表示データの種類の候補が表示される。同図では、行または列の分類項目の種類として「クラスタ名」、「地域」、「職業」が、表示するデータの種類として「数量」、「割合」、「本文」が候補として表示されている。ここで「本文」は、本実施の形態の機能として、ドットで表されたある文書集合をユーザが指定した場合に、当該文書集合に属する文書の本体をテキストデータとして表示する場合に選択される。 The matrix display area 51 includes a column classification item display area 53, a row classification item display area 55, and a graphic display area 57. The classification item selection area 62 displays candidates for the types of row and column classification items that can be displayed as a matrix, and for the types of display data. In the figure, “Cluster name”, “Region”, and “Occupation” are displayed as candidate types of row or column classification items, and “Quantity”, “Percentage”, and “Text” are displayed as candidate types of data to be displayed. Here, “Text” is selected when, as a function of the present embodiment, the user designates a document set represented by a dot and the bodies of the documents belonging to that document set are to be displayed as text data.

図４のＳ１４においてユーザは、分類項目選択領域６２からマトリクスの行および列に表示させたい分類項目の種類を選択し、入力部２０であるポインティングデバイスなどによりその項目を列の分類項目表示領域５３や行の分類項目表示領域５５にそれぞれドラッグアンドドロップ操作することにより、分類項目の種類を確定する。同様に、分類項目選択領域６２からデータの種類を選択し、図形表示領域５７にドラッグアンドドロップ操作することにより表示データの種類を確定する。マトリクス生成部１６は、図４のＳ１８においてマトリクスデータを生成する際、算出した各要素の数値の範囲に応じてドットの色や大きさを決定し、凡例表示領域６４に凡例を表示する。 In S14 of FIG. 4, the user selects, from the classification item selection area 62, the types of classification items to be displayed in the rows and columns of the matrix, and confirms them by dragging and dropping each item onto the column classification item display area 53 or the row classification item display area 55 with a pointing device or the like serving as the input unit 20. Similarly, the user selects the type of data from the classification item selection area 62 and confirms the type of display data by dragging and dropping it onto the graphic display area 57. When generating matrix data in S18 of FIG. 4, the matrix generation unit 16 determines the color and size of the dots according to the range of the calculated numerical values of the elements, and displays a legend in the legend display area 64.

ソート指示ボタン６７および絞込み指示ボタン６８はそれぞれ、所定の基準により、マトリクス表示領域５１に表示されたマトリクスの行または列を入れ替えたり表示数を減縮したりする際にユーザによって選択される。これらの機能については後に説明する。 The sort instruction button 67 and the narrow-down instruction button 68 are selected by the user when the matrix rows or columns displayed in the matrix display area 51 are exchanged or the number of displays is reduced according to predetermined criteria. These functions will be described later.

以上のような画面構成とすることにより、ユーザは視覚的、直感的に条件設定を行うことができ、多様な形態の文書や色々な分類手法があっても容易に所望の情報を得ることができる。なお図５に示した画面は例示であり、本実施の形態はこれに限られない。例えば分類項目の種類や表示データの種類によっては、さらに詳細な設定を必要とする場合もある。このときは必要に応じて別のウィンドウを重ねて表示したり、分類項目選択領域６２に選択肢を追加したりすることによって設定を行う。いずれの場合も、ポインティングデバイスを使用して直感的に選択できるような画面構成が望ましい。 With the screen configuration described above, the user can set conditions visually and intuitively, and can easily obtain desired information even when there are documents in various forms and various classification methods. Note that the screen shown in FIG. 5 is an example, and the present embodiment is not limited to this. For example, depending on the type of classification item and the type of display data, more detailed settings may be required. In that case, the settings are made by displaying another window on top as necessary or by adding options to the classification item selection area 62. In either case, a screen configuration that allows intuitive selection with a pointing device is desirable.

ここで分類項目の種類を選択する際の態様のひとつを説明する。図６は行の分類項目が階層構造を有するときにマトリクス表示領域５１に表示されるマトリクスの例を示している。この例はカメラに関する記事を、そこに含まれる「ケース」、「サイズ」、「レンズ」などの被修飾名詞句で分類し、さらにそれらの句を修飾する「しっかりする」、「重い」、「丈夫」などの形容詞句でさらに細分化して分類した場合について示している。すなわちこのときの分類項目は、被修飾名詞句が上位層、形容詞句が下位層の階層構造を有する。一方、列の分類項目は「機種Ａ」、「機種Ｂ」、「機種Ｃ」など、機種ごとに分類する単層構造を有している。 Here, one mode of selecting the type of classification item will be described. FIG. 6 shows an example of a matrix displayed in the matrix display area 51 when the row classification items have a hierarchical structure. In this example, articles about cameras are classified by the modified noun phrases they contain, such as “case”, “size”, and “lens”, and are then further subdivided by the adjective phrases that modify those noun phrases, such as “solid”, “heavy”, and “durable”. That is, the classification items in this case have a hierarchical structure in which the modified noun phrases form the upper layer and the adjective phrases form the lower layer. The column classification items, on the other hand, have a single-layer structure that classifies by model, such as “model A”, “model B”, and “model C”.

このような状況においては、分類項目選択領域６２には例えば「カメラ語句（上位／下位）」、「カメラ語句（上位）」、「カメラ語句（下位）」といった候補を表示する。ここで「カメラ語句」とはあらかじめ用意された分類項目列、この場合は階層構造を有する分類項目の集合につけられた名前である。なお「名詞句／形容詞句」などの表示でもよい。クラスタリングを行うときは「クラスタリング（上位／下位）」などでもよい。 In such a situation, candidates such as “camera phrase (upper / lower)”, “camera phrase (upper)”, and “camera phrase (lower)” are displayed in the classification item selection area 62, for example. Here, the “camera word / phrase” is a name assigned to a group of classification items prepared in advance, in this case, a classification item having a hierarchical structure. A display such as “noun phrase / adjective phrase” may be used. When clustering is performed, “clustering (upper / lower)” or the like may be used.

そして「カメラ語句（上位／下位）」を行の分類項目表示領域５５にドラッグアンドドロップ操作したときは、マトリクス表示領域５１には図６（ａ）に示すように上位層および下位層の分類項目が階層構造のまま表示される。したがってドットが表す数値は、例えば「ケース」に属する文書集合のうち「重い」に属するものと、「機種Ａ」に属する文書集合との積集合に係る数値である。この例では、各機種に対して「何（被修飾名詞句）」が「どう（形容詞句）」であるという内容の記事が多いのか、などを把握することができる。 When “camera phrase (upper / lower)” is dragged and dropped onto the row classification item display area 55, the matrix display area 51 displays the upper-layer and lower-layer classification items with their hierarchical structure preserved, as shown in FIG. 6(a). The numerical value represented by a dot is therefore, for example, a value related to the product set of the documents belonging to “heavy” within the document set belonging to “case” and the document set belonging to “model A”. In this example, it is possible to grasp, for each model, which statements of the form “what (modified noun phrase)” is “how (adjective phrase)” appear in many articles.

一方、「カメラ語句（上位）」を選択した場合、マトリクス表示領域５１には図６（ｂ）に示すように、上位層の分類項目が表示される。このときドットが表す数値は、例えば「ケース」に属する文書集合と「機種Ａ」に属する文書集合との積集合に係る数値である。これにより、各機種の「何（被修飾名詞句）に係る記事が多いのか、などを把握することができる。同様に、「カメラ語句（下位）」を選択した場合、図６（ｃ）に示すように下位層の分類項目が表示される。このときは上位層の分類項目に関わらず「重い」なる下位層の分類項目に属する文書を集計して文書集合とし、それと「機種Ａ」に属する文書集合との積集合に係る数値を表示する。これにより「何（被修飾名詞句）」に関わらずどのような形容をされた記事が多いのかを機種ごとに把握できる。 On the other hand, when “camera phrase (upper)” is selected, the upper-layer classification items are displayed in the matrix display area 51 as shown in FIG. 6(b). The numerical value represented by a dot is then, for example, a value related to the product set of the document set belonging to “case” and the document set belonging to “model A”. This makes it possible to grasp, for example, which “what (modified noun phrase)” has many articles for each model. Similarly, when “camera phrase (lower)” is selected, the lower-layer classification items are displayed as shown in FIG. 6(c). In this case, the documents belonging to the lower-layer classification item “heavy” are aggregated into a document set regardless of the upper-layer classification item, and a value related to the product set of that set and the document set belonging to “model A” is displayed. This makes it possible to grasp, for each model, what kinds of descriptions appear in many articles regardless of “what (modified noun phrase)”.
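The three display modes of FIG. 6 differ only in which layer of the hierarchy is used as the row key when counting documents. The following is a minimal sketch of that idea, assuming for simplicity that each article has been reduced to exactly one noun phrase, one adjective phrase, and one model; the field names and sample data are hypothetical and not taken from the embodiment.

```python
from collections import Counter

# Hypothetical sample: each article reduced to one noun phrase, one adjective, one model.
docs = [
    {"noun": "case", "adjective": "heavy", "model": "A"},
    {"noun": "case", "adjective": "durable", "model": "A"},
    {"noun": "lens", "adjective": "heavy", "model": "A"},
    {"noun": "case", "adjective": "heavy", "model": "B"},
]

def count_cells(docs, row_key):
    """Count documents per (row item, model) cell of the matrix."""
    return Counter((row_key(d), d["model"]) for d in docs)

# Upper and lower layers together, as in FIG. 6(a).
both_layers = count_cells(docs, lambda d: (d["noun"], d["adjective"]))
# Upper layer only, as in FIG. 6(b).
upper_only = count_cells(docs, lambda d: d["noun"])
# Lower layer only, aggregated regardless of the upper layer, as in FIG. 6(c).
lower_only = count_cells(docs, lambda d: d["adjective"])
```

Switching the display then amounts to recounting with a different key function, which is why the same documents can yield a count of 1 for “case / heavy”, 2 for “case”, and 2 for “heavy” in the column for model A.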

分類項目の階層は図６の例では２層であったが、３層以上でも同様に表示の切替えを行う。このようにポインティングデバイスのみによって表示データを切替えることができるため、分類項目が階層構造を有していても、全体的な傾向の把握から詳細な分析までを効率よく行える。また階層の違いによる結果を容易に比較することができる。 Although the classification item hierarchy is two layers in the example of FIG. 6, display switching is performed in the same manner for three or more layers. As described above, since display data can be switched only by a pointing device, even if the classification item has a hierarchical structure, it is possible to efficiently perform the process from grasping the overall trend to detailed analysis. In addition, the results due to the difference in hierarchy can be easily compared.

マトリクス生成部１６は、ユーザがソート指示ボタン６７を押下することにより、マトリクス表示領域５１に表示されるマトリクスの行や列を入れ替え、ソートを行ったマトリクスデータを生成する。分類項目によっては、列のソート、行のソート、列および行のソートを選択するサブメニューをさらに表示させてもよい。また、どのような基準によってソートを行うかをサブメニューによって選択するようにしてもよい。ソートの基準としては分類項目の種類や表示データの種類などによって、（１）分類項目名によるソート、（２）合計値によるソート、（３）割合によるソート、（４）分散度によるソート、（５）対角化ソート、などから選択できるようにする。 When the user presses the sort instruction button 67, the matrix generation unit 16 rearranges the rows and columns of the matrix displayed in the matrix display area 51 and generates sorted matrix data. Depending on the classification items, a submenu for choosing among column sorting, row sorting, and column-and-row sorting may additionally be displayed. The criterion used for sorting may also be selected from a submenu. Depending on the type of classification item and the type of display data, the sorting criterion can be selected from, for example, (1) sorting by classification item name, (2) sorting by total value, (3) sorting by ratio, (4) sorting by degree of dispersion, and (5) diagonalization sort.

分類項目名によるソートは、分類項目の文字列の文字コードに基づき、例えばあいうえお順などでソートを行う。合計値によるソートは、各列や各行を構成する要素の数値の合計値に基づき、例えば降順でソートを行う。割合によるソートは、各列や各行を構成する要素の数値の合計値に対する各要素の数値の割合に基づきソートを行う。分散度によるソートは、より多くの分類項目に値が分散しているか否かに基づきソートを行う。分散度には例えば、「ある行（または列）において値（ドット）が存在する分類項目数／その行（または列）に属する分類項目数」などの定義を用いる。 Sorting by classification item name is performed based on the character codes of the classification item strings, for example in Japanese syllabary (a-i-u-e-o) order. Sorting by total value is performed based on the sum of the numerical values of the elements constituting each column or row, for example in descending order. Sorting by ratio is performed based on the ratio of each element's value to the sum of the values of the elements constituting each column or row. Sorting by degree of dispersion is performed based on whether the values are spread over a larger number of classification items. For the degree of dispersion, a definition such as “(number of classification items in a given row (or column) for which a value (dot) exists) / (number of classification items belonging to that row (or column))” is used.
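As a rough illustration of criteria (2) and (4), the sketch below sorts the rows of a small matrix by total value and by the degree of dispersion defined above. The function names, labels, and values are hypothetical; a row is assumed to be a list of non-negative numbers, with zero meaning that no dot is shown.

```python
def dispersion(values):
    """Fraction of classification items in a row (or column) that hold a dot."""
    return sum(1 for v in values if v > 0) / len(values)

def sort_rows(labels, matrix, key, descending=True):
    """Reorder rows (and their labels) by a per-row score such as sum or dispersion."""
    order = sorted(range(len(matrix)), key=lambda r: key(matrix[r]), reverse=descending)
    return [labels[r] for r in order], [matrix[r] for r in order]

labels = ["case", "size", "lens"]
matrix = [
    [9, 0, 0],  # large total, concentrated in one column
    [1, 2, 4],  # smaller total, spread over all columns
    [0, 5, 1],
]

by_total = sort_rows(labels, matrix, key=sum)
by_dispersion = sort_rows(labels, matrix, key=dispersion)
```

The two criteria can give different orders for the same data: “case” leads when sorting by total, while “size” leads when sorting by dispersion.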

対角化ソートは、マトリクスの対角線にある要素の値に着目し、行または列のどちらか一方のみをソートする場合と、行と列の双方をソートする場合とを用意する。図７は対角化ソートを行う様子を模式的に示している。図７（ａ）は、行の分類項目欄５４に表示された分類項目の順番は固定とし、対角線の領域７０にある要素の値が最も大きくなるように列の分類項目欄５２に表示された分類項目の表示順を入れ替え、ソートを行った例である。対角線の領域７０に着目してソートを行うことにより、行および列の分類項目の並び順を比較するだけで傾向を把握できる場合がある。 The diagonalization sort focuses on the values of the elements on the diagonal of the matrix, and is provided in two variants: one that sorts only the rows or only the columns, and one that sorts both rows and columns. FIG. 7 schematically shows how the diagonalization sort is performed. FIG. 7(a) shows an example in which the order of the classification items displayed in the row classification item column 54 is fixed, and the display order of the classification items in the column classification item column 52 is rearranged so that the values of the elements in the diagonal area 70 become as large as possible. By sorting with attention to the diagonal area 70, a trend can sometimes be grasped simply by comparing the order of the row and column classification items.

対角化ソートにおいて行、または列の一方のみをソートする手法としては、次のような上位優先片側対角化ソートのアルゴリズムが考えられる。なおここでは行を固定し列をソートする場合について述べるが、列を固定し行をソートする場合も「行」と「列」を読み替えることによって同様に実現できる。 As a method for sorting only the rows or only the columns in the diagonalization sort, the following upper-priority one-sided diagonalization sort algorithm can be considered. The case where the rows are fixed and the columns are sorted is described here, but the case where the columns are fixed and the rows are sorted can be realized in the same way by interchanging “row” and “column”.
（１）ｎ行ｎ列の正方行列Ｔについて、ｉ＝１行目から処理を開始
（２）Ｔの行ベクトルｔｉの要素ｔｉ１，・・・，ｔｉｎのうち、ｉ≦ｊ≦ｎかつ最大の値を有するｔｉｊを求める
（３）ｉ≠ｊの場合はＴのｉ列目とｊ列目を入れ替える
（４）ｉ＜ｎの場合はｉ＝ｉ＋１として（２）から処理を繰り返す
(1) For a square matrix T of n rows and n columns, start processing from row i = 1.
(2) Among the elements ti1, ..., tin of the row vector ti of T, find the element tij that has the maximum value subject to i ≦ j ≦ n.
(3) If i ≠ j, swap the i-th and j-th columns of T.
(4) If i < n, set i = i + 1 and repeat from (2).
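The four steps above can be sketched as follows. This is a minimal illustration that uses 0-based indices instead of the 1-based indices in the text; the function name and the sample matrix are assumptions made for the example.

```python
def upper_priority_column_sort(T, col_labels):
    """One-sided diagonalization: rows stay fixed, columns are swapped row by row."""
    n = len(T)
    T = [row[:] for row in T]      # work on a copy
    cols = list(col_labels)
    for i in range(n):             # steps (1) and (4): visit rows from top to bottom
        # Step (2): largest element of row i among columns i..n-1.
        j = max(range(i, n), key=lambda c: T[i][c])
        if j != i:                 # step (3): swap columns i and j
            for row in T:
                row[i], row[j] = row[j], row[i]
            cols[i], cols[j] = cols[j], cols[i]
    return T, cols

T = [
    [1, 5, 2],
    [7, 3, 4],
    [6, 8, 9],
]
sorted_T, sorted_cols = upper_priority_column_sort(T, ["A", "B", "C"])
diagonal = [sorted_T[k][k] for k in range(len(sorted_T))]
```

Because a column fixed for an upper row is never moved again, later rows can only choose among the columns that remain.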

しかし上記アルゴリズムでは、上位の行で決定された列の位置を下位の行で変更できないため、下位の行では最大要素が対角線に位置しない場合もある。そこで以下のような、正方行列Ｔの中で最大の値を有する要素から順に対角化を行う、最大値優先片側対角化ソートのアルゴリズムを採用してもよい。 However, in the above algorithm, the column positions determined for upper rows cannot be changed for lower rows, so in lower rows the maximum element may not lie on the diagonal. Therefore, the following maximum-value-priority one-sided diagonalization sort algorithm, which performs diagonalization in order starting from the element having the largest value in the square matrix T, may be adopted instead.
（１）決定済みの行列番号を格納するリストＬ＝｛｝を用意する
（２）Ｌ中の全ての行列番号ｌｋについてｉ≠ｌｋかつｊ≠ｌｋが成り立つＴの最大要素ｔｉｊを求める
（３）ｉ≠ｊの場合はＴのｉ列目とｊ列目を入れ替える
（４）｜Ｌ｜＜ｎの場合はＬ＝Ｌ∪｛ｉ｝として（２）から処理を繰り返す
(1) Prepare a list L = {} for storing the row/column indices that have already been determined.
(2) Find the maximum element tij of T such that i ≠ lk and j ≠ lk hold for every index lk in L.
(3) If i ≠ j, swap the i-th and j-th columns of T.
(4) If |L| < n, set L = L ∪ {i} and repeat from (2).
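A minimal sketch of the maximum-value-priority variant follows, again with 0-based indices. The set `fixed` plays the role of the list L; the function name and sample data are illustrative assumptions, and ties between equal maxima are resolved by whichever element is encountered first.

```python
def max_priority_column_sort(T, col_labels):
    """One-sided diagonalization that places the largest remaining element first."""
    n = len(T)
    T = [row[:] for row in T]      # work on a copy
    cols = list(col_labels)
    fixed = set()                  # step (1): indices already determined (the list L)
    while len(fixed) < n:
        free = [k for k in range(n) if k not in fixed]
        # Step (2): largest element whose row and column are both still undetermined.
        i, j = max(((r, c) for r in free for c in free),
                   key=lambda rc: T[rc[0]][rc[1]])
        if i != j:                 # step (3): bring that element onto the diagonal
            for row in T:
                row[i], row[j] = row[j], row[i]
            cols[i], cols[j] = cols[j], cols[i]
        fixed.add(i)               # step (4): row i and column i are now determined
    return T, cols

T = [
    [1, 5, 2],
    [7, 3, 4],
    [6, 8, 9],
]
sorted_T, sorted_cols = max_priority_column_sort(T, ["A", "B", "C"])
diagonal = [sorted_T[k][k] for k in range(len(sorted_T))]
```

In this example the element 9 is fixed first, then 7, then 5, so the order in which diagonal positions are decided follows the element values rather than the row order.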

図７（ｂ）は、対角線の領域７０にある要素の値が最も大きくなり、かつ対角線の領域７０の左上から右下に向けて値が降順となるように、行の分類項目欄５４に表示された分類項目および列の分類項目欄５２に表示された分類項目の双方についてソートを行った例である。このようなソートを実現するアルゴリズムとしては以下に示す両側対角化ソートがある。 FIG. 7(b) shows an example in which both the classification items displayed in the row classification item column 54 and those displayed in the column classification item column 52 are sorted so that the values of the elements in the diagonal area 70 become as large as possible and descend from the upper left to the lower right of the diagonal area 70. An algorithm that realizes such a sort is the two-sided diagonalization sort shown below.
（１）ｎ行ｎ列の正方行列Ｔについてｋ＝１から処理を開始
（２）ｋ≦ｉかつｋ≦ｊが成立する全てのＴの要素の中で最大の値を有する要素ｔｉｊを求める
（３）ｋ≠ｉの場合はＴのｋ行目とｉ行目を入れ替える
（４）ｋ≠ｊの場合はＴのｋ列目とｊ列目を入れ替える
（５）ｋ＜ｎの場合はｋ＝ｋ＋１として（２）から処理を繰り返す
(1) For a square matrix T of n rows and n columns, start processing from k = 1.
(2) Among all elements of T for which k ≦ i and k ≦ j hold, find the element tij that has the maximum value.
(3) If k ≠ i, swap the k-th and i-th rows of T.
(4) If k ≠ j, swap the k-th and j-th columns of T.
(5) If k < n, set k = k + 1 and repeat from (2).
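The two-sided procedure can be sketched in the same style. Indices are 0-based, and the function name, labels, and matrix are illustrative assumptions rather than part of the embodiment.

```python
def two_sided_diagonal_sort(T, row_labels, col_labels):
    """Swap rows and columns so the diagonal holds large values in descending order."""
    n = len(T)
    T = [row[:] for row in T]      # work on a copy
    rows, cols = list(row_labels), list(col_labels)
    for k in range(n):             # steps (1) and (5)
        # Step (2): largest element in the lower-right block starting at (k, k).
        i, j = max(((r, c) for r in range(k, n) for c in range(k, n)),
                   key=lambda rc: T[rc[0]][rc[1]])
        if i != k:                 # step (3): swap rows k and i
            T[k], T[i] = T[i], T[k]
            rows[k], rows[i] = rows[i], rows[k]
        if j != k:                 # step (4): swap columns k and j
            for row in T:
                row[k], row[j] = row[j], row[k]
            cols[k], cols[j] = cols[j], cols[k]
    return T, rows, cols

T = [
    [1, 5, 2],
    [7, 3, 4],
    [6, 8, 9],
]
sorted_T, sorted_rows, sorted_cols = two_sided_diagonal_sort(
    T, ["X", "Y", "Z"], ["A", "B", "C"])
diagonal = [sorted_T[k][k] for k in range(len(sorted_T))]
```

Each step picks the maximum of a block that shrinks by one row and one column, which is why the resulting diagonal values never increase from the upper left to the lower right.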

ユーザが絞込み指示ボタン６８を押下した際、マトリクス生成部１６は、マトリクス表示領域５１に表示する分類項目を絞込んだマトリクスデータを生成する。絞り込んだ結果表示される分類項目の数は、固定値としてもよいし、ソートにおいて算出された数値にしきい値を設けて自動的に決定してもよい。また、ユーザがポインティングデバイスで数を設定できるゲージなどを表示することにより、ユーザが指定できるようにしてもよい。 When the user presses the narrowing down instruction button 68, the matrix generation unit 16 generates matrix data in which the classification items to be displayed in the matrix display area 51 are narrowed down. The number of classification items displayed as a result of narrowing down may be a fixed value, or may be automatically determined by providing a threshold value for the numerical value calculated in sorting. Further, the user may be allowed to designate by displaying a gauge or the like that allows the user to set the number with a pointing device.

ソート指示ボタン６７と同様に、サブメニューにてどのような手法で絞込みを行うかをユーザが選択できるようにしてもよい。絞込みには上述したようなソートのアルゴリズムを利用してもよい。例えばサブメニューにて「分散度（昇順）」なる絞込み手法を選択した場合は、上述の分散度を各行（または列）に対して算出し、その値が下位となる所定数の行（または列）のみを表示する。表示すべきドットがない、すなわち文書集合が存在しない行や列を削除するようにしてもよい。 As with the sort instruction button 67, the user may be allowed to select the narrowing method from a submenu. The sorting algorithms described above may be used for narrowing. For example, when the narrowing method “degree of dispersion (ascending)” is selected in the submenu, the degree of dispersion described above is calculated for each row (or column), and only a predetermined number of rows (or columns) with the lowest values are displayed. Rows or columns that have no dots to display, that is, for which no document set exists, may also be deleted.
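As an illustration of the “degree of dispersion (ascending)” narrowing, the sketch below first drops rows without any dot and then keeps a fixed number of rows with the lowest dispersion. Combining the two steps in one function, as well as the names and data, are assumptions made for the example.

```python
def dispersion(values):
    """Fraction of classification items in a row that hold a dot."""
    return sum(1 for v in values if v > 0) / len(values)

def narrow_rows(labels, matrix, keep):
    """Drop empty rows, then keep the `keep` rows with the lowest dispersion."""
    non_empty = [(label, row) for label, row in zip(labels, matrix)
                 if any(v > 0 for v in row)]
    ranked = sorted(non_empty, key=lambda pair: dispersion(pair[1]))
    return ranked[:keep]

labels = ["case", "size", "lens", "strap"]
matrix = [
    [9, 0, 0],
    [1, 2, 4],
    [0, 5, 1],
    [0, 0, 0],  # no document set exists for this row
]
kept = narrow_rows(labels, matrix, keep=2)
kept_labels = [label for label, _ in kept]
```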

マトリクス生成部１６は、図５に示したマトリクス表示画面６０におけるマトリクス表示領域５１に、マトリクス上の各ドットが表す文書集合に係る具体的な情報を表すテキストデータを追加して表示する。図８はドットが表す文書集合に係る情報を表示した際のマトリクス表示画面６０を示している。マトリクス表示領域５１にマトリクスが表示されている状態で、ポインティングデバイスによりあるドット８３の領域を指示すると、そのドットで交差する横方向ガイド線８０および縦方向ガイド線８２が表示されるようにする。これらのガイド線によってユーザは、指示したドットがどの分類項目に属しているのかを把握できる。 The matrix generation unit 16 adds and displays text data representing specific information related to the document set represented by each dot on the matrix in the matrix display area 51 on the matrix display screen 60 shown in FIG. FIG. 8 shows a matrix display screen 60 when information related to a document set represented by dots is displayed. In the state where the matrix is displayed in the matrix display area 51, when the area of the dot 83 is designated by the pointing device, the horizontal guide line 80 and the vertical guide line 82 intersecting with the dot are displayed. With these guide lines, the user can grasp to which classification item the designated dot belongs.

さらに横方向ガイド線８０の近傍に横方向情報表示領域８６を、縦方向ガイド線８２の近傍に縦方向情報表示領域８４をポップアップウィンドウなどで表示する。横方向情報表示領域８６には、指示したドットが属する行の分類項目名、当該ドットが表す具体的な数値、当該分類項目に属する要素の数値の合計値などを表示する。図８の例では、分類項目名が「カメラ」、ドットが表す数値として「文書数」が「６」、合計値として文書数が「２１」と表示されている。縦方向情報表示領域８４にも同様の情報を表示する。「文書数」は表示するデータの種類によって割合や語句の出現数などに置き換えられる。 Further, a horizontal direction information display area 86 is displayed in the vicinity of the horizontal direction guide line 80, and a vertical direction information display area 84 is displayed in the vicinity of the vertical direction guide line 82 in a pop-up window or the like. In the horizontal direction information display area 86, the classification item name of the line to which the designated dot belongs, the specific numerical value represented by the dot, the total value of the numerical values of the elements belonging to the classification item, and the like are displayed. In the example of FIG. 8, the classification item name is “camera”, the number of documents represented by dots is “6”, and the total number of documents is “21”. Similar information is also displayed in the vertical direction information display area 84. The “number of documents” is replaced with a ratio, the number of occurrences of phrases, and the like depending on the type of data to be displayed.

このような画面構成とすることにより、分類項目が多数ありマトリクス５０が煩雑な図となっても、各ドットがどの分類項目を表しているのかを即座に知ることができる。また、ドットで全体的な傾向を把握しながらも、容易な操作で局所的な数値を取得することができる。なおドットの領域を指示した場合と同様に、各分類項目を指示することにより、当該分類項目についての情報、例えば当該分類項目に属する全文書の数や、分類処理において得られた情報などを表示するようにしてもよい。 With such a screen configuration, even if there are many classification items and the matrix 50 becomes a complicated figure, it is possible to immediately know which classification item each dot represents. In addition, it is possible to acquire a local numerical value with an easy operation while grasping the overall tendency with dots. As in the case of designating the dot area, by designating each classification item, information on the classification item, for example, the number of all documents belonging to the classification item, information obtained in the classification process, and the like are displayed. You may make it do.

あるドット８３の領域を選択した状態で、ユーザが分類項目選択領域６２に表示された「本文」６２ａなる候補をさらに選択することにより、当該ドット８３が表す文書集合の本文を表示させる。本文の表示例については後に示す。このときマトリクス生成部１６は、マトリクス生成時にメモリなどに保存した、当該積集合に属する文書の識別情報を分類項目に基づき特定する。そして別に用意した表示用のテンプレートデータに、識別情報を基に記憶部１２から読み出した本文のデータを貼り付けたり、リンクを張ったりすることにより本文表示のためのデータを生成する。該当する文書が多数ある場合などは適宜スクロールやページングのための機能を提供する。ドットを選択する代わりに各分類項目を選択することにより、当該分類項目に属する全ての文書の本文を表示するようにしてもよい。 In a state where a certain dot 83 area is selected, the user further selects a candidate “text” 62 a displayed in the classification item selection area 62, thereby displaying the text of the document set represented by the dot 83. A display example of the text will be shown later. At this time, the matrix generation unit 16 specifies identification information of documents belonging to the product set stored in a memory or the like when the matrix is generated based on the classification item. Then, the data for displaying the text is generated by pasting or linking the text data read from the storage unit 12 based on the identification information to the template data for display prepared separately. When there are a large number of applicable documents, functions for scrolling and paging are provided as appropriate. By selecting each category item instead of selecting a dot, the texts of all documents belonging to the category item may be displayed.

さらに本文を表示した後、そのデータを例えばｃｓｖ形式で保存できるようにする。保存の指示入力および保存の手順についてはデータ保存のための一般的な手法を用いることができる。このように所望の文書集合の本文を表示したり保存したりすることにより、ユーザは分類結果の数値的な側面ばかりでなく、文書の実態を確認することができる。膨大なデータベースに含まれる文書でも、最初に分類してその傾向をドットで確認してから最終的には所望の文書本体を入手する、という段階を踏むことにより、検索クエリによる検索を繰り返す場合に比べ、格段に効率よく所望の文書に行き着くことができる。 Further, after the text is displayed, the data can be saved, for example in csv format. General data storage techniques can be used for the save instruction input and the save procedure. By displaying and saving the text of a desired document set in this way, the user can check not only the numerical aspects of the classification result but also the actual content of the documents. Even for documents contained in a huge database, following the steps of first classifying them, checking the trend with dots, and finally obtaining the desired document bodies lets the user reach the desired documents far more efficiently than by repeating searches with search queries.

次に本実施の形態における情報表示装置１０が提供する分類手法と、それをマトリクスとして表示することによって得られる情報について例示する。分類手法としては上述したように、元々文書に関連づけられた属性が存在する場合にその属性ごとに分類する手法、所定の分類項目に所定の方法によって文書を振り分ける手法、および文書同士の類似性により文書のまとまり（クラスタ）を生成していくクラスタリングが挙げられる。ここでは所定の分類項目への分類手法、およびクラスタリング手法について簡単に説明する。ただし、本実施の形態における分類手法はここで説明するものに限られず、一般的に提案されている手法のいずれを選択してもよい。 Next, a classification method provided by the information display device 10 in the present embodiment and information obtained by displaying it as a matrix will be exemplified. As described above, the classification method includes a method of classifying each attribute when the attribute originally associated with the document exists, a method of assigning the document to a predetermined classification item by a predetermined method, and similarity between documents. One example is clustering that generates a group of documents (cluster). Here, a classification method for a predetermined classification item and a clustering method will be briefly described. However, the classification method in the present embodiment is not limited to that described here, and any of the generally proposed methods may be selected.

（所定の分類項目への分類） (Classification into predetermined classification items)
この分類手法は、あらかじめ分類項目（以後、カテゴリと呼ぶ）とそれに関連する語句群（以後、プロファイルと呼ぶ）を用意し、各文書から抽出した語句群とプロファイルとによって、文書とカテゴリとの類似度を判定し、類似度の高いカテゴリに文書を振り分ける手法である。例えば新聞記事を「政治」、「経済」、「スポーツ」というカテゴリに分類したい場合、「スポーツ」に関連する「野球」、「サッカー」、「試合」といった語句で構成するプロファイルを用意する。プロファイルを構成する各語句はその重要度などによって重み付けされている。 In this classification method, classification items (hereinafter referred to as categories) and the word groups related to them (hereinafter referred to as profiles) are prepared in advance, the similarity between a document and each category is determined from the word group extracted from the document and the profile, and the document is assigned to a category with a high similarity. For example, to classify newspaper articles into the categories “politics”, “economy”, and “sports”, a profile is prepared that consists of words related to “sports” such as “baseball”, “soccer”, and “game”. Each word constituting a profile is weighted according to its importance and the like.

具体的な手法は以下のとおりである。すなわち、まず処理対象文書から語句を形態素解析により抽出する。そして同一内容で表記の異なる語句の表記を統一する。次にそれらの語句の重みベクトルと、その並び順に対応したプロファイルの重みベクトルとに基づき、ベクトル空間法を用いて処理対象文書と各カテゴリとの類似度を計算していく。前者の重みベクトルをＡ＝｛ｗ＿ａ１，ｗ＿ａ２，ｗ＿ａ３，・・・，ｗ＿ａＮ｝、後者の重みベクトルをＢ＝｛ｗ＿ｂ１，ｗ＿ｂ２，ｗ＿ｂ３，・・・，ｗ＿ｂＮ｝とする。ここでＮは処理対象文書から抽出された語句の数、ｗはＴＦ−ＩＤＦ（Term Frequency - Inverse Document Frequency）法などにより導出された語句の重要度である。このとき類似度ｓｉｍ（Ｂ，Ａ）は以下のようになる。 The specific method is as follows. That is, a phrase is first extracted from a processing target document by morphological analysis. And unify the notation of words with the same content but different notations. Next, based on the weight vector of those words and the weight vector of the profile corresponding to the arrangement order, the similarity between the processing target document and each category is calculated using the vector space method. The former weight vector is A = {w_a1, w_a2, w_a3,..., W_aN}, and the latter weight vector is B = {w_b1, w_b2, w_b3,. Here, N is the number of words / phrases extracted from the document to be processed, and w is the importance of words / phrases derived by the TF-IDF (Term Frequency-Inverse Document Frequency) method. At this time, the similarity sim (B, A) is as follows.
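The text later identifies Equation 2 as the cosine similarity of two weight vectors. Under the usual vector space model, and using the weights defined above, it would take the following form; this is a reconstruction from those definitions rather than a copy of the original figure.

```latex
\mathrm{sim}(B, A) =
  \frac{\sum_{i=1}^{N} w_{ai}\, w_{bi}}
       {\sqrt{\sum_{i=1}^{N} w_{ai}^{2}}\;\sqrt{\sum_{i=1}^{N} w_{bi}^{2}}}
```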

類似度ｓｉｍ（Ｂ，Ａ）をカテゴリごとに算出していき、最も類似度の高かったカテゴリに処理対象文書を分類する。 The similarity sim (B, A) is calculated for each category, and the processing target document is classified into the category having the highest similarity.
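The category assignment can be sketched as follows, assuming the similarity is the cosine similarity of the two weight vectors. The weights, category names, and function names are hypothetical, and the morphological analysis and TF-IDF weighting are taken as already done; profile words that do not occur in the document are ignored, since both vectors are built over the N words extracted from the document.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(doc_weights, profiles):
    """Return the category whose profile is most similar to the document."""
    vocab = sorted(doc_weights)                     # the N words extracted from the document
    a = [doc_weights[w] for w in vocab]             # document weight vector A
    scores = {}
    for category, profile in profiles.items():
        b = [profile.get(w, 0.0) for w in vocab]    # profile weight vector B, same word order
        scores[category] = cosine(a, b)
    return max(scores, key=scores.get), scores

# Hypothetical weights standing in for TF-IDF values.
doc = {"baseball": 0.8, "game": 0.5, "election": 0.1}
profiles = {
    "sports": {"baseball": 0.9, "soccer": 0.9, "game": 0.6},
    "politics": {"election": 0.9, "cabinet": 0.7},
}
best, scores = classify(doc, profiles)
```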

（クラスタリングによる分類） (Classification by clustering)
クラスタリングは所定の分類項目を用意せずに文書同士の類似性によって文書のクラスタを形成する手法である。クラスタはトピック、すなわち話題ととらえることもできる。この場合も文書ごとに形態素解析により語句を抽出し、同一内容の語句の表記を統一する。そして全文書について語句の重みベクトルを対応する順序で生成し、文書の組み合わせごとに類似度を計算していく。この類似度も式２で与えられた値を用いてよい。その後、例えば類似度があるしきい値を超えた場合にそれらの文書は類似しているとみなし、同一のクラスタを生成する。 Clustering is a method of forming clusters of documents based on the similarity between documents, without preparing predetermined classification items. A cluster can also be regarded as a topic. In this case as well, words are extracted from each document by morphological analysis, and variant notations of words with the same content are unified. Word weight vectors are then generated for all documents in a corresponding order, and the similarity is calculated for each pair of documents. The value given by Equation 2 may also be used for this similarity. After that, for example, when the similarity exceeds a certain threshold, the documents are regarded as similar and are placed in the same cluster.
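The threshold rule can be sketched as follows. The text does not say how pairwise decisions are merged into clusters, so this example assumes the simplest reading: any two documents whose similarity exceeds the threshold end up in the same cluster, including through chains of similar documents (a union-find structure handles the merging). Function names, vectors, and the threshold are illustrative.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold):
    """Group document indices whose pairwise similarity exceeds the threshold."""
    parent = list(range(len(vectors)))

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            parent[find(i)] = find(j)   # merge the two clusters

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical weight vectors over a shared three-word vocabulary.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
]
clusters = cluster(vectors, threshold=0.8)
```

Other merging strategies, such as comparing each document with a cluster centroid, would fit the description equally well.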

文書から抽出された語句のうち、クラスタを特徴づける語句をクラスタの代表語句として抽出しておく。また代表語句のうち最も特徴的な語句をクラスタ名として決定する。例えば、各語句についてクラスタとの相互情報量を算出し、その値が上位である数個の語句を代表語句、その値が最も高い語句をクラスタ名とする。語句ｔとクラスタＣとの相互情報量ＭＩ（ｔ，Ｃ）は次の式で表される。 Among the phrases extracted from the document, a phrase that characterizes the cluster is extracted as a representative phrase of the cluster. Also, the most characteristic phrase among the representative phrases is determined as the cluster name. For example, the mutual information with the cluster is calculated for each word, and several words / phrases with the highest value are used as representative words / phrases and the word / phrase with the highest value is used as the cluster name. The mutual information MI (t, C) between the word t and the cluster C is expressed by the following equation.
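With the probabilities P(t), P(C), and P(t, C) defined in the following paragraph, a mutual information of this kind is commonly written in the pointwise form below. Both this form and the unspecified base of the logarithm are assumptions, since the original equation is not reproduced in the text.

```latex
\mathrm{MI}(t, C) = \log \frac{P(t, C)}{P(t)\, P(C)}
```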

ここでＰ（ｔ）は語句ｔを含む文書が現れる確率であり、（語句ｔを含む文書数／全文書数）で定義される。Ｐ（Ｃ）はクラスタＣに属する文書が現れる確率であり、（クラスタＣに属する文書数／全文書数）で定義される。またＰ（ｔ，Ｃ）は語句ｔを含むクラスタＣに属する文書が現れる確率であり、（語句ｔを含むクラスタＣの文書数／全文書数）で定義される。図２におけるマトリクス５０の列の分類項目欄５２や行の分類項目欄５４にはこのようにして決定したクラスタ名を表示する。 Here, P (t) is a probability that a document including the word t appears, and is defined by (the number of documents including the word t / the total number of documents). P (C) is the probability that a document belonging to cluster C will appear, and is defined by (number of documents belonging to cluster C / total number of documents). P (t, C) is a probability that a document belonging to the cluster C including the word t appears, and is defined by (the number of documents of the cluster C including the word t / the total number of documents). The cluster names determined in this way are displayed in the column classification item column 52 and the row classification item column 54 of the matrix 50 in FIG.
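Selecting representative words and a cluster name from these probabilities can be sketched as follows. The example assumes the pointwise form log(P(t, C) / (P(t) P(C))) with a base-2 logarithm, represents each document as a set of words, and uses hypothetical data and function names.

```python
import math

def mutual_information(term, cluster_docs, all_docs):
    """Pointwise mutual information between a term and a cluster (base-2 log assumed)."""
    n = len(all_docs)
    p_t = sum(1 for d in all_docs if term in d) / n        # P(t)
    p_c = len(cluster_docs) / n                            # P(C)
    p_tc = sum(1 for d in cluster_docs if term in d) / n   # P(t, C)
    if p_tc == 0:
        return float("-inf")
    return math.log2(p_tc / (p_t * p_c))

def name_cluster(cluster_docs, all_docs, top_k=3):
    """Return (cluster name, representative words) ranked by mutual information."""
    terms = set().union(*cluster_docs)
    ranked = sorted(
        terms,
        key=lambda t: (-mutual_information(t, cluster_docs, all_docs), t))
    return ranked[0], ranked[:top_k]

# Hypothetical documents, each reduced to the set of words it contains.
cluster_docs = [{"lens", "zoom"}, {"lens", "blur"}, {"lens"}]
other_docs = [{"battery", "charge"}, {"battery", "blur"}, {"battery", "zoom"}]
all_docs = cluster_docs + other_docs

cluster_name, representatives = name_cluster(cluster_docs, all_docs)
```

Here “lens” occurs in every document of the cluster and nowhere else, so it receives the highest value and becomes the cluster name.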

次に各分類手法をマトリクス５０の表示形式に適用した具体例と効果について述べる。なお各例の説明においてはマトリクスを表で表し、図示を簡便化するが、表における各数値はマトリクス上のドットで表現されるものとする。また各表においてマトリクスを構成する行および列の数は簡単のために２ないし３とするが、それに限定されるものではない。 Next, specific examples and effects of applying each classification method to the display format of the matrix 50 will be described. In the description of each example, the matrix is represented by a table to simplify the illustration, but each numerical value in the table is represented by a dot on the matrix. The number of rows and columns constituting the matrix in each table is 2 to 3 for simplicity, but is not limited thereto.

（クラスタリングを利用したプロファイル診断） (Profile diagnosis using clustering)
表１は行の分類項目を、用意されたカテゴリである「カテゴリ１」、「カテゴリ２」とし、列の分類項目をクラスタリングの結果得られたクラスタ名である「クラスタＡ」、「クラスタＢ」とした場合のマトリクスである。このマトリクス表示の目的は、適正なプロファイルとカテゴリの関係が設定されているかを確認する点にある。すなわち、プロファイルとカテゴリを用意する際に元となった文書のカテゴリの付与基準を精査したり、類似した文書に異なるカテゴリが付与されていないかをチェックしたりする。 Table 1 is a matrix in which the row classification items are the prepared categories “Category 1” and “Category 2”, and the column classification items are “Cluster A” and “Cluster B”, the cluster names obtained as a result of clustering. The purpose of this matrix display is to confirm whether appropriate relationships between profiles and categories have been set. That is, it is used to scrutinize the criteria by which categories were assigned to the source documents used when preparing the profiles and categories, and to check whether similar documents have been assigned different categories.

このとき行われるクラスタリングは、カテゴリの数と同一に設定する。このようなマトリクスにおいて数値１〜数値４をドットで表す。例えば数値１として、カテゴリを用意する際の全元文書を対象としてクラスタリングを実行した結果、「カテゴリ１」の文書のうち「クラスタＡ」に分類された文書集合の文書数ｎＡ＿１を表示する。数値２〜４も同様に表示する。または、カテゴリ１に属する全文書集合の文書数ｎ１のうち、クラスタＡに属する文書集合の文書数ｎＡ＿１の割合、すなわちｎＡ＿１／ｎ１を数値１としてもよい。 For the clustering performed at this time, the number of clusters is set equal to the number of categories. In such a matrix, numerical values 1 to 4 are represented by dots. For example, as numerical value 1, the number of documents nA_1 in the document set classified into “Cluster A” among the documents of “Category 1” is displayed, obtained by running clustering on all the source documents used when preparing the categories. Numerical values 2 to 4 are displayed in the same manner. Alternatively, numerical value 1 may be the ratio of the number of documents nA_1 belonging to Cluster A to the number of documents n1 in the whole document set belonging to Category 1, that is, nA_1 / n1.
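The counts and ratios described here amount to a cross-tabulation of category labels against cluster labels. A minimal sketch, with hypothetical labels for five source documents:

```python
from collections import Counter

def cross_tab(categories, clusters):
    """Count documents per (category, cluster) cell and the share within each category."""
    counts = Counter(zip(categories, clusters))   # e.g. nA_1 for ("category 1", "A")
    totals = Counter(categories)                  # e.g. n1 for "category 1"
    ratios = {cell: n / totals[cell[0]] for cell, n in counts.items()}
    return counts, ratios

# Hypothetical labels: the i-th entries describe the same source document.
categories = ["category 1", "category 1", "category 1", "category 2", "category 2"]
clusters = ["A", "A", "B", "B", "B"]

counts, ratios = cross_tab(categories, clusters)
```

In this example one of the three “category 1” documents falls into cluster B together with all “category 2” documents, which is the kind of cell that would prompt a review of the category assignment criteria.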

このようなマトリクスを表示することにより、類似した文書にも関わらず別カテゴリに分類されたものを発見でき、カテゴリの付与基準に内在する問題を洗い出すことができる。 By displaying such a matrix, it is possible to discover those classified into different categories in spite of similar documents, and to identify problems inherent in the category assignment criteria.

（構成する語句群を利用したプロファイル診断） (Profile diagnosis using the constituent word groups)
表２は行の分類項目および列の分類項目のいずれも、複数のカテゴリにそれぞれ対応して用意された複数のプロファイルである「プロファイル１」、「プロファイル２」、「プロファイル３」とした場合のマトリクスである。このマトリクス表示の目的は、カテゴリ間の類似性をチェックする点にある。例えば「充電」と「電池」というカテゴリがあり、それらのカテゴリのプロファイルが類似している場合、「充電」カテゴリに分類したい文書が「電池」に分類される可能性がある。このような場合に、類似しているプロファイルを統合するなどのプロファイルチューニングを行うことにより、分類精度を向上させることができる。 Table 2 is a matrix in which both the row and column classification items are “Profile 1”, “Profile 2”, and “Profile 3”, a plurality of profiles prepared for a plurality of categories respectively. The purpose of this matrix display is to check the similarity between categories. For example, when there are categories “charge” and “battery” and their profiles are similar, a document that should be classified into the “charge” category may be classified into “battery”. In such a case, classification accuracy can be improved by profile tuning, such as merging similar profiles.

このようなマトリクスにおいて数値１〜９をドットで表す。例えば数値２として、「プロファイル１」と「プロファイル２」の類似度を表示する。類似度を表す指標として例えば単語共有率を算出する。 In such a matrix, numerical values 1 to 9 are represented by dots. For example, as the numerical value 2, the similarity between “profile 1” and “profile 2” is displayed. For example, a word sharing rate is calculated as an index representing the degree of similarity.

The word sharing rate str is the proportion of positively weighted words that are shared between profiles C and D, and is defined below.

Here, str(C, D) = str(D, C) does not necessarily hold. For a combination of identical profiles, that is, for numerical values 1, 5, and 9, the word sharing rate is 1. Changing how the weight of word i is assigned in Equation 4 changes the viewpoint that the word sharing rate expresses. In general, an element w_i of the weight vector can take a real value from −1 to 1. If instead w_i is given as a binary value, 1 or 0, indicating whether the word appears, str becomes an index of the proportion of shared words. If w_i is given the appearance frequency of the word, str becomes an index of the proportion of the appearance frequency accounted for by shared words.
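The defining equation is not reproduced in this text, so the following is only one possible reading that is consistent with the properties stated above (asymmetry, a value of 1 for identical profiles, and the binary and frequency interpretations). It assumes str(C, D) is the sum of C's positive weights over words that D also weights positively, divided by the sum of all of C's positive weights. The function name and the sample profiles are hypothetical.

```python
def word_sharing_rate(c, d):
    """Assumed form of str(C, D); profiles are dicts mapping word -> weight.

    Numerator: C's positive weights on words that are also positive in D.
    Denominator: all of C's positive weights.
    """
    positive_c = {word: w for word, w in c.items() if w > 0}
    total = sum(positive_c.values())
    if total == 0:
        return 0.0
    shared = sum(w for word, w in positive_c.items() if d.get(word, 0) > 0)
    return shared / total

# Hypothetical profiles (illustrative weights only).
charging = {"battery": 0.5, "charger": 0.25, "cable": 0.25}
battery = {"battery": 0.75, "life": 0.25}

rate_cd = word_sharing_rate(charging, battery)
rate_dc = word_sharing_rate(battery, charging)
```

Under this reading the two directions differ because each is normalized by its own profile's weights, which matches the remark that str(C, D) = str(D, C) need not hold.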

Besides the word sharing rate str, a cosine similarity, a cohesion degree, or the like may be employed as the index of similarity. The cosine similarity is obtained by applying the weight vectors of the two profiles to Equation 2. The cohesion degree rel is a value defined as follows from the numbers of words constituting profile p1 and profile p2.

Here, a is the number of words constituting profile p1, b is the number of words constituting profile p2, and c is the number of words common to profile p1 and profile p2.

Because the cosine similarity and the cohesion degree take the same value for a given pair of profiles regardless of order, the matrix is a triangular matrix as shown in Table 3. No dot is displayed for an element marked “−”. In this case as well, for a combination of identical profiles, that is, for numerical values 3, 5, and 7, the cosine similarity and the word sharing rate are 1.

(Grasping the relationship between correct categories and classified categories)
In Table 4, both the row classification items and the column classification items are the prepared categories “category 1” and “category 2”, but the row items represent the correct classification, made for example by visual inspection, whereas the column items represent the classification made mechanically by a classifier or the like. The former are written as “category 1 (correct)” and the latter as “category 1 (classified)”, and so on. The purpose of this matrix display is to visualize the relationship between the mechanical classification and the correct answers.

In Table 4, the row and column classification items are the same. In such a matrix, numerical values 1 to 4 are represented by dots. For example, numerical value 1 displays the ratio n1(classified)_(correct) / n1(correct), where n1(correct) is the number of documents classified into “category 1 (correct)” and n1(classified)_(correct) is the number of those documents that were classified into “category 1 (classified)”. Numerical values 2 to 4 are displayed in the same way. Numerical values 1 and 4 can then be regarded as the recall of the classifier with respect to the correct answers.

Alternatively, numerical value 1 may display the ratio n1(correct)_(classified) / n1(classified), where n1(classified) is the number of documents classified into “category 1 (classified)” and n1(correct)_(classified) is the number of those documents that belong to “category 1 (correct)”. Numerical values 2 to 4 display similar values. Numerical values 1 and 4 can then be regarded as the precision of the classifier's classification with respect to the correct answers.
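The recall view and the precision view differ only in whether each cell is divided by its row total (correct category) or its column total (classified category). A minimal sketch of both normalizations follows; the function names, the `axis` parameter, and the sample labels are assumptions made for illustration.

```python
def count_pairs(correct, predicted):
    """Count documents per (correct category, classified category) pair."""
    counts = {}
    for c, p in zip(correct, predicted):
        counts[(c, p)] = counts.get((c, p), 0) + 1
    return counts

def normalize(counts, axis):
    """axis=0 divides by row (correct) totals, giving the recall view;
    axis=1 divides by column (classified) totals, giving the precision view."""
    totals = {}
    for key, n in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return {key: n / totals[key[axis]] for key, n in counts.items()}

# Hypothetical labels for six documents (illustrative data only).
correct = ["c1", "c1", "c1", "c1", "c2", "c2"]
predicted = ["c1", "c1", "c1", "c2", "c2", "c2"]

counts = count_pairs(correct, predicted)
recall_view = normalize(counts, axis=0)
precision_view = normalize(counts, axis=1)
```

The diagonal cells of `recall_view` and `precision_view` correspond to numerical values 1 and 4 in the two variants; the off-diagonal cells show where misclassified documents went.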

While a matrix whose dots represent recall is displayed, the display may switch to a matrix representing precision when the user clicks the matrix display area 51 with a pointing device serving as the input unit 20. Switching from precision to recall may be done in the same way. Displaying such matrices makes it possible to see visually, for a category that the classifier often misclassifies, which categories its documents are misclassified into, which yields knowledge useful for deciding how to adjust the profiles.

(Time-series analysis of classification results)
Table 5 is the matrix for the case in which the row classification items are the prepared categories “category 1” and “category 2”, and the column classification items are “time series unit A” and “time series unit B”, time series units based on a creation date, registration date, or the like stored in association with each document. A time series unit is a name designating a certain period, for example 2006, 2007, and so on, or the first half and second half of a year. The purpose of this matrix is to grasp, category by category, how the documents change over time.

In this case, numerical value 1 displays the ratio nA_1 / n1, that is, the proportion of the n1 documents classified into “category 1” that belong to “time series unit A” (nA_1 documents). The same applies to numerical values 2 to 4. Displaying this value shows the periods in which a given category appears at a high rate. Alternatively, numerical value 1 may display the ratio n1_A / nA, that is, the proportion of the nA documents belonging to “time series unit A” that are classified into “category 1” (n1_A documents). The same applies to numerical values 2 to 4. Displaying this value shows the categories that appear at a high rate in a given period. Here too, as described above, the matrix may be switched when the user clicks the matrix display area 51.
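As an illustration of the two ratios above, the sketch below assigns each document to a half-year time series unit derived from its creation date and computes both nA_1 / n1 and n1_A / nA. The half-year naming scheme, the function name `time_unit`, and the sample documents are assumptions chosen for the example.

```python
from collections import Counter
from datetime import date

def time_unit(d):
    """Name the half-year period containing date d, e.g. '2006 H1'."""
    return f"{d.year} H{1 if d.month <= 6 else 2}"

# Hypothetical (category, creation date) pairs (illustrative data only).
docs = [
    ("category 1", date(2006, 2, 10)),
    ("category 1", date(2006, 9, 1)),
    ("category 1", date(2006, 11, 20)),
    ("category 2", date(2006, 10, 5)),
    ("category 2", date(2006, 12, 24)),
]

counts = Counter((cat, time_unit(d)) for cat, d in docs)
per_category = Counter(cat for cat, _ in docs)
per_unit = Counter(time_unit(d) for _, d in docs)

# Share of each category's documents that fall in each period (n_A_1 / n_1).
share_of_category = {k: n / per_category[k[0]] for k, n in counts.items()}
# Share of each period's documents that fall in each category (n_1_A / n_A).
share_of_unit = {k: n / per_unit[k[1]] for k, n in counts.items()}
```

Switching between `share_of_category` and `share_of_unit` corresponds to the matrix switch triggered by clicking the matrix display area.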

Displaying such a matrix makes it easy to spot, for example, a topic that increased sharply at a certain time, and analyzing only that document set by text mining or the like allows an efficient analysis narrowed to the topic attracting attention. Comparison with the time at which an event such as an advertisement took place also reveals the influence of the event on the topics. Furthermore, changes in the number of documents that were not classified into any category can be tracked; depending on the trend, the user can consider when to adjust the profiles again, or examine those documents to find opinions that lead to proposals for new products.

When the matrix of Table 5 is used within a company, each category may be assigned to a department in advance, and when the user clicks the row classification item display area 55, a matrix re-aggregated by the department to which each category belongs may be displayed. This makes it possible to check how the proportion of documents changes over time for each department and, for example, to grasp the trend in the number of complaints per department.
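The re-aggregation by department can be sketched as a simple regrouping of the cell counts. The category names, the department mapping, and the function name are hypothetical; the sketch assumes every category has been assigned to exactly one department beforehand.

```python
def aggregate_by_department(counts, department_of):
    """Sum (category, time unit) counts into (department, time unit) counts."""
    result = {}
    for (category, unit), n in counts.items():
        key = (department_of[category], unit)
        result[key] = result.get(key, 0) + n
    return result

# Hypothetical per-category document counts and category-to-department mapping.
counts = {
    ("battery", "2006"): 3,
    ("charger", "2006"): 2,
    ("manual", "2006"): 4,
    ("battery", "2007"): 1,
}
department_of = {
    "battery": "hardware",
    "charger": "hardware",
    "manual": "documentation",
}

by_department = aggregate_by_department(counts, department_of)
```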

In the same manner as Table 5, results classified by any attribute, such as the gender or occupation of the person who created the document, can be displayed. This reveals tendencies such as which attributes tend to have a high proportion, or which categories tend to have similar distributions of proportions across attributes.

(Grasping clustering results)
Table 6 is the matrix for the case in which the row classification items are the cluster names “cluster 1” and “cluster 2” and the column classification items are the cluster names “cluster A” and “cluster B”. The clustering method that produces the row classification items differs from the one that produces the column classification items. The purpose of this matrix is to compare the clustering results of different clustering methods.

Here, for example, results obtained with a large number of clusters are compared with results obtained with a small number, or the result of clustering document by document is compared with the result of clustering after an external topic divider has split the documents into smaller topic units. In this case, numerical value 1 displays, for example, the ratio nA_1 / n1, that is, the proportion of the n1 documents classified into “cluster 1” by one method that are classified into “cluster A” by the other method (nA_1 documents). The same applies to numerical values 2 to 4.

Displaying such a matrix makes it possible to grasp visually how each cluster changes, for example which clusters scatter over many clusters and which scatter little when the configured number of clusters is increased. Clusters whose document set does not change even when the number of clusters is increased may be made inconspicuous, for example by displaying their dots in gray. This makes it easy to see only the scattering of the clusters whose document sets changed.

In addition, when a single document contains a plurality of topics, the degree to which those topics are scattered can be checked. For example, when the documents being processed are customer inquiries, it is possible to grasp which topics are often asked about together, or in what sequence of topics inquiries are made.

(Grasping the relevance of topics using representative phrases of clusters)
Table 7 is the matrix for the case in which the row classification items are the representative phrases of each cluster, “representative phrases of cluster 1” and “representative phrases of cluster 2”, and the column classification items are the cluster names “cluster 1” and “cluster 2”. The representative phrases of a cluster are limited to, for example, at most five. In view of display space and the like, only the cluster name may be shown for the row classification items as well.

Whereas the clustering results above were grasped by displaying the “documents” belonging to both “cluster 1” and “cluster A”, the aim here is to grasp the relevance of topics through the “representative phrases” of the clusters. For example, the proportion of documents belonging to cluster 2 that contain a representative phrase of cluster 1 is calculated. Such documents mainly discuss the topic of cluster 2 while also touching on the topic of cluster 1. In other words, if there are many such documents, cluster 1 and cluster 2 can be considered strongly related topics. This relies on the property, noted earlier, that the representative phrases of a cluster characterize that cluster and therefore should not ordinarily appear often in documents belonging to other clusters.

In the matrix of Table 7, numerical value 1 displays the ratio n1(cluster)_(phrase) / n1(phrase), where n1(phrase) is the number of documents containing at least one representative phrase of “cluster 1” and n1(cluster)_(phrase) is the number of those documents that are classified into “cluster 1”. The same applies to numerical values 2 to 4.
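A minimal sketch of how the cells of such a matrix could be computed is shown below. It assumes plain substring matching to decide whether a document contains a representative phrase, which is a simplification (Japanese text would normally require morphological analysis); the function name and the sample documents are hypothetical.

```python
def phrase_cluster_matrix(docs, representatives):
    """docs: list of (cluster name, text); representatives: cluster -> phrases.

    Cell (i, j): among documents containing at least one representative
    phrase of cluster i, the fraction that is classified into cluster j.
    """
    matrix = {}
    for row_cluster, phrases in representatives.items():
        hits = [cluster for cluster, text in docs
                if any(phrase in text for phrase in phrases)]
        for col_cluster in representatives:
            ratio = hits.count(col_cluster) / len(hits) if hits else 0.0
            matrix[(row_cluster, col_cluster)] = ratio
    return matrix

# Hypothetical documents and representative phrases (illustrative data only).
docs = [
    ("finder", "the finder is bright"),
    ("finder", "good for motion shots"),
    ("lcd", "the screen is hard to see outdoors"),
    ("lcd", "the monitor is large"),
]
representatives = {
    "finder": ["finder", "motion", "outdoors"],
    "lcd": ["lcd", "screen", "monitor"],
}

matrix = phrase_cluster_matrix(docs, representatives)
```

A non-zero off-diagonal cell such as `matrix[("finder", "lcd")]` indicates documents of one cluster that mention the representative phrases of another, which is the signal of topic relatedness discussed above.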

A practical example of using such a matrix follows. Suppose that cluster 1 has the cluster name “finder” and the representative phrases “finder / motion / outdoors”, and that cluster 2 has the cluster name “LCD” and the representative phrases “LCD / screen / monitor”. Suppose further that, among the elements of the row whose classification item is “representative phrases of cluster 1”, numerical value 2 shows a high value in addition to numerical value 1, which is the value for cluster 1 itself. This means that the representative phrases of cluster 1 “finder” appear relatively often in documents belonging to cluster 2 “LCD”.

If the dot area representing numerical value 2 is then selected and the text of that document set is displayed as described above, sentences such as “The LCD is hard to see ‘outdoors’, so I shoot by looking through it with my eye as with an analog camera.” and “When shooting something with ‘motion’, the finder is better suited than the LCD.” are obtained. That is, on the assumption that “finder / motion / outdoors”, the representative phrases of cluster 1 “finder”, represent the topic of “finder”, examining the distribution of those phrases makes it possible to find documents that mainly discuss “LCD” while also touching on the topic of “finder”.

FIG. 9 shows an example of the screen configuration on the display unit 18 when the text of documents is displayed for the matrix configuration of Table 7. The text display screen 98 includes a legend display area 90, a text display area 92, a highlight display area 94, and a document information display area 96. This display corresponds, for example, to selecting the dot of numerical value 3, which represents the set of documents classified into cluster 1 “mobile” and containing a representative phrase of cluster 2 “large”, and displaying the text. The text display screen 98 highlights in different colors which words in the text are representative phrases of which cluster.

First, the legend display area 90 displays a legend of the frame colors used to highlight each cluster. In the figure, a white frame 90a is used for representative phrases of the cluster “mobile” and a black frame 90b for representative phrases of the cluster “large”. Other colors may of course be used. The text display area 92 displays the text, and the highlight display area 94 displays the sentences in which the representative phrases contained in the text are marked with the white frame 90a or the black frame 90b. In the sentence in the figure, “size” is enclosed in the black frame 90b as a representative phrase of the cluster “large”, and “weight” and “portability” are enclosed in the white frame 90a as representative phrases of the cluster “mobile”. The document information display area 96 displays information stored in association with the displayed document, such as the name of the file containing the document and the author's gender, age group, occupation, and creation date. When there are a plurality of documents to display, they may be shown over a plurality of pages that the user can switch between.

With the above configuration, it is possible to grasp topics that are strongly related to only a specific topic and topics that are broadly related to a plurality of topics. Documents that share the property of being written from the two viewpoints of “topic 1” and “topic 2” can be narrowed down and checked. The dots indicating the numerical values also reveal the tendency for “topic 1” and “topic 2” to be discussed in connection with each other.

(Named entity analysis of clusters)
Table 8 is the matrix for the case in which the row classification items are the cluster names “cluster 1” and “cluster 2” and the column classification items are the named entity category names “named entity category A” and “named entity category B”. Here a named entity is information that identifies a thing or a quantity: a proper noun such as a product name, organization name, place name, or person name, or a date and time, period, amount of money, quantity, URL (Uniform Resource Locator), e-mail address, telephone number, or the like. Accordingly, “named entity category A” and the like are specific proper nouns or the like, or sets of them. For example, besides proper nouns such as “Yamada” and “Tanaka”, a set that encompasses them, such as “person name”, may be used. The purpose of this matrix is to grasp how named entities are distributed across the clusters.

Here, numerical value 1 displays, for example, the ratio nA_1 / n1, that is, the proportion of the n1 documents classified into “cluster 1” from which “named entity category A” was extracted (nA_1 documents). The same applies to numerical values 2 to 4. Displaying such a matrix enables cross analysis of topics and named entities, making it easy to learn which topics are closely tied to regions, persons, organization names, and the like, and which are not.

(Time-series analysis of words)
Table 9 is the matrix for the case in which the row classification items are “word 1” and “word 2”, words extracted from the documents, and the column classification items are “time series unit A” and “time series unit B”, time series units based on a creation date, registration date, or the like stored in association with each document. The processing that extracts words from the documents may be performed by a device external to the information display device 10; in that case, the storage unit 12 of the information display device 10 stores the extracted words in association with each document. The time series units are the same classification items as described under (Time-series analysis of classification results). The purpose of this matrix is to perform time-series analysis of the words appearing in the documents.

The number of extracted words displayed as row classification items of the matrix of Table 9, that is, the number of rows, is set in advance, for example to the 20 most frequent words. Here, numerical value 1 displays, for example, the ratio nA_1 / n1, that is, the proportion of the n1 documents containing “word 1” that belong to “time series unit A” (nA_1 documents). The same applies to numerical values 2 to 4.

When the documents being processed are customer inquiries or complaints about a product and the text of a document is, for example, “An error occurred in printing”, the words “printing”, “error”, and “occurred” are extracted and become row classification items. The matrix of Table 9 displays how the number of documents containing these words changes over time. This makes it easy, for example, to focus the analysis on words that increased sharply at a certain time, so that changes in customer feedback and problems can be grasped quickly.

(Opinion trend analysis by phrases)
Table 10 is the matrix for the case in which the row classification items are “noun phrase 1” and “noun phrase 2”, noun phrases extracted from the documents, and the column classification items are “adjective phrase A” and “adjective phrase B”, adjective phrases extracted from the documents. The column classification items may be verb phrases instead of adjective phrases, or a combination of adjective phrases and verb phrases. As with (Time-series analysis of words), the processing that extracts noun phrases, adjective phrases, and verb phrases from the documents may be performed by a device external to the information display device 10. The purpose of this matrix is to visualize the dependency relations between the extracted noun phrases and the adjective or verb phrases, that is, a list of phrases, together with the number of documents.

Here, numerical value 1 displays, for example, the ratio nA_1 / n1, where n1 is the number of occurrences of “noun phrase 1” in a dependency relation with any adjective phrase and nA_1 is the number of occurrences of the dependency between “noun phrase 1” and “adjective phrase A”. The same applies to numerical values 2 to 4. Similar values are displayed when the column classification items are verb phrases or a combination of adjective phrases and verb phrases. A display switching button is provided so that these matrices can be switched. When the row classification items, that is, the noun phrases in a dependency relation with adjective phrases or the like, are numerous, an upper limit on the number of noun phrases displayed is set in advance so that only the noun phrases with the highest occurrence counts n1 are displayed.
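The ratio nA_1 / n1 and the upper limit on displayed noun phrases can be sketched as follows. The sketch assumes the dependency pairs have already been extracted by an external device, as the description allows; the function name, the `top_n` parameter, and the sample pairs are hypothetical.

```python
from collections import Counter

def dependency_ratios(pairs, top_n=None):
    """pairs: list of (noun phrase, adjective phrase) dependency occurrences.

    Returns, for each pair, its count divided by the total count of the noun
    phrase, keeping only the top_n most frequent noun phrases (all if None).
    """
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    kept = {noun for noun, _ in noun_counts.most_common(top_n)}
    return {(noun, adj): n / noun_counts[noun]
            for (noun, adj), n in pair_counts.items() if noun in kept}

# Hypothetical extracted dependencies (illustrative data only).
pairs = [
    ("LCD", "hard to see"),
    ("LCD", "hard to see"),
    ("LCD", "large"),
    ("LCD", "small"),
    ("battery", "small"),
]

all_ratios = dependency_ratios(pairs)
top_ratios = dependency_ratios(pairs, top_n=1)
```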

Displaying such a matrix lists the dependency relations appearing in the documents and makes it easy to grasp the tendency of the key points of the documents, that is, what is written about what. For example, when the documents being processed are responses to a product questionnaire, dependency relations between the noun phrase “LCD” and adjective phrases such as “hard to see”, “large”, and “small” are extracted and displayed as the respective classification items, and the magnitude of their occurrence frequency is shown visually. Impressions and evaluations of the “LCD” can thus be checked easily.

(Opinion trend analysis by clusters and phrases)
Table 11 is the matrix for the case in which the row classification items are the cluster names “cluster 1” and “cluster 2” and the column classification items are “phrase A” and “phrase B”, phrases extracted from the documents. The purpose of this matrix is to grasp the relationship between clusters and phrases.

Here, numerical value 1 displays, for example, the ratio nA_1 / n1 of the number of occurrences nA_1 of “phrase A” in the document set belonging to “cluster 1” to the number of documents n1 in that set. The same applies to numerical values 2 to 4. The phrases serving as column classification items can have a hierarchical structure. In that case, as described above, the matrix can be displayed with the hierarchy level switched. For example, totals may be displayed per modified noun phrase or per adjective phrase, or the hierarchy may be expanded so that the modified noun phrase and the adjective phrase are both displayed as column classification items. When there are many distinct phrases, an upper limit on the number of phrases displayed is set in advance so that only the phrases with the highest occurrence counts are displayed.

Displaying such a matrix reveals the relationship between clusters and phrases, such as which phrases occur frequently in a given cluster, so the tendency of the key points of the documents, namely what is written about which topic, can be grasped more accurately. For example, when the documents being processed are responses to a questionnaire about a camera, if phrases such as phrase A “dust - sticks easily” and phrase B “dust - gets in easily” occur often in the document set belonging to cluster 1 “body”, it is readily understood that the camera body has a problem with attracting dust.

(Attribute analysis using phrases)
Table 12 is the matrix for the case in which the row classification items are “phrase 1” and “phrase 2”, phrases extracted from the documents, and the column classification items are “attribute A” and “attribute B”, values of an attribute stored in association with each document. The purpose of this matrix is to grasp the relationship between phrases and attributes.

Here, numerical value 1 displays, for example, the ratio nA_1 / n1 of the number of occurrences nA_1 of “phrase 1” in the document set belonging to “attribute A” to the number of occurrences n1 of “phrase 1” in the entire document set being processed. The same applies to numerical values 2 to 4. This value shows the attributes in which a given phrase appears at a high rate. Alternatively, numerical value 1 may display the ratio nA_1 / nA of the number of occurrences nA_1 of “phrase 1” in the document set belonging to “attribute A” to the number of occurrences nA of all phrases in that document set. This value shows the phrases that appear at a high rate for a given attribute value.

When one of the classification items in the column classification item display area 53, that is, an attribute value, is selected and text display is requested, the text of the document set belonging to that attribute is displayed. When one of the classification items in the row classification item display area 55, that is, a phrase, is selected and text display is requested, the text of the document set containing that phrase is displayed. When a dot in the graphic display area 57 is selected and text display is requested, the text of the documents that contain the corresponding phrase and belong to the corresponding attribute is displayed.
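The three selection cases above amount to choosing a column set, a row set, or their intersection. A minimal sketch, assuming documents are identified by integer IDs and that two lookup tables from attribute and phrase to document IDs already exist (the function name and both table names are hypothetical):

```python
from typing import Dict, Optional, Set


def select_documents(
    docs_by_attribute: Dict[str, Set[int]],
    docs_by_phrase: Dict[str, Set[int]],
    attribute: Optional[str] = None,
    phrase: Optional[str] = None,
) -> Set[int]:
    """Return the IDs of the documents whose text should be displayed."""
    if attribute is not None and phrase is not None:
        # A dot was selected: documents containing the phrase AND having the attribute.
        return docs_by_phrase.get(phrase, set()) & docs_by_attribute.get(attribute, set())
    if attribute is not None:
        # A column item was selected.
        return set(docs_by_attribute.get(attribute, set()))
    if phrase is not None:
        # A row item was selected.
        return set(docs_by_phrase.get(phrase, set()))
    return set()


docs_by_attribute = {"Attribute A": {1, 2, 3}, "Attribute B": {4, 5}}
docs_by_phrase = {"Phrase 1": {2, 3, 4}, "Phrase 2": {1, 5}}
print(select_documents(docs_by_attribute, docs_by_phrase, "Attribute A", "Phrase 1"))
```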

The phrases used as row classification items may have a hierarchical structure, as in (opinion trend analysis based on clusters and phrases); in that case the matrix can be displayed with the hierarchy level switched as described above. When there are many distinct phrases, an upper limit on the number of phrases to display is set in advance so that only the phrases with the highest occurrence counts are shown. The row classification items may also be top co-occurrence term pairs instead of phrases. Top co-occurrence term pairs are obtained by ranking the pairs of terms that appear together in a single sentence by frequency and taking the highest-ranked pairs.
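One way to obtain top co-occurrence term pairs is sketched below. It assumes each sentence has already been split into terms by some external step, and it counts a pair at most once per sentence; both choices are assumptions, since the text does not specify them. The function name `top_cooccurring_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def top_cooccurring_pairs(
    sentences: Iterable[List[str]], top_n: int
) -> List[Tuple[Tuple[str, str], int]]:
    """Rank pairs of terms that appear in the same sentence by frequency."""
    pair_counts: Counter = Counter()
    for terms in sentences:
        # Sort so that ("body", "dust") and ("dust", "body") count as one pair.
        unique_terms = sorted(set(terms))
        pair_counts.update(combinations(unique_terms, 2))
    return pair_counts.most_common(top_n)


sentences = [
    ["dust", "body", "lens"],
    ["dust", "body"],
    ["body", "dust"],
    ["lens", "body"],
]
print(top_cooccurring_pairs(sentences, top_n=2))
```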

Displaying such a matrix makes it possible to obtain, in association with the attributes of the documents, how often phrases that capture the key points of a document appear, such as what is in what state or what the writer wants done. For example, when the documents to be processed are customer opinions, it becomes easy to grasp what customers of each gender want, or to follow how the reported state of things changes from one time-series unit to the next.

(Affect degree time series analysis)
Table 13 is a matrix in which the row classification items are “noun phrase 1 of the evaluation phrase” and “noun phrase 2 of the evaluation phrase”, which are noun phrases contained in evaluation phrases, and the column classification items are “time series unit A” and “time series unit B”, which are time-series units based on dates stored in association with the documents, such as the creation date or registration date. Here, an evaluation phrase is a phrase, among those extracted from the documents, that relates to evaluation, for example a dependency relation between a modified noun phrase such as “image quality” and an adjective phrase such as “bad”. The purpose of this matrix is to grasp how the affect degree changes over time.

Here, numerical value 1 is, for example, the affect degree of “noun phrase 1 of the evaluation phrase” in the document set belonging to “time series unit A”. The same applies to numerical values 2 to 4. The affect degree quantifies the degree of evaluation and is defined as follows.
Affect degree of an impression expression = (degree value of the degree adverb) × (affect degree of the impression expression word)
Affect degree of a modified noun phrase = Σ (affect degree of the impression expression × frequency of the impression expression containing the impression expression word) / (frequency of the impression expression words for which an affect degree was obtained)

Here, Σ denotes the sum of the calculated results. Before the affect degree is calculated, impression expression words and the affect degrees they represent are associated with each other and stored in the storage unit 12 in advance. A degree value indicating how strongly each degree adverb emphasizes an impression expression word is likewise stored in the storage unit 12 in association with that adverb. The affect degree of an impression expression is then obtained from the first definition above. As defined, the affect degree of a modified noun phrase is obtained by summing, over the impression expressions that share the same noun phrase, the frequency-weighted affect degree of each impression expression, and dividing the sum by the number of impression expressions for which an affect degree was obtained. This value is the affect degree per impression expression of the noun phrase and represents the emotional response to the object denoted by the modified noun phrase.

For example, consider a document set in which the adjective phrases modifying the noun phrase “body” are “cute” twice, “sturdy” twice, and “unbalanced” once. When the affect degrees of these adjective phrases are “3”, “3”, and “-3”, respectively, the affect degree of “body” is calculated as (3×2 + 3×2 + (-3×1)) / 5 = 1.8.
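The worked example for “body” can be checked with a short sketch. It uses the values that appear in the calculation itself (3, 3, and -3 with frequencies 2, 2, and 1); the function names and the sample degree value 1.5 for an adverb are hypothetical and only illustrate the two definitions.

```python
from typing import Iterable, Tuple


def expression_affect(adverb_degree: float, word_affect: float) -> float:
    """Affect degree of an impression expression: adverb degree value x word affect degree."""
    return adverb_degree * word_affect


def noun_phrase_affect(expressions: Iterable[Tuple[float, int]]) -> float:
    """Frequency-weighted mean affect degree of a modified noun phrase.

    Each item is (affect degree of the impression expression, its frequency).
    """
    items = list(expressions)
    total_frequency = sum(frequency for _, frequency in items)
    if total_frequency == 0:
        raise ValueError("no impression expression with an affect degree")
    weighted_sum = sum(affect * frequency for affect, frequency in items)
    return weighted_sum / total_frequency


# "cute" x2, "sturdy" x2, "unbalanced" x1 modifying "body"
body = [(3, 2), (3, 2), (-3, 1)]
print(noun_phrase_affect(body))      # (6 + 6 - 3) / 5 = 1.8
print(expression_affect(1.5, 3))     # hypothetical adverb with degree value 1.5
```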

Rows and columns representing average affect degrees may additionally be displayed in the matrix of Table 13. For example, a column may be added on the right side of the matrix to show, as dots, values such as the average affect degree of “noun phrase 1 of the evaluation phrase” over the entire period, and a row may be added at the bottom of the matrix to show, as dots, values such as the average affect degree of all noun phrases in “time series unit A”.

Because the affect degree can take positive and negative real values, the dot color may indicate the sign and the dot size may indicate the absolute value. For example, three colors are used, blue for a positive affect degree, red for a negative one, and gray for zero, and the size is determined by the absolute value. With this configuration, the transition of the affect degree can be followed for each noun phrase within a single screen, which makes comparison and analysis easier.
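The color and size rule can be sketched as a small mapping function. The three colors follow the text; the linear scaling and the minimum radius, which keeps a zero-valued gray dot visible, are assumptions introduced here, as are the names `dot_style`, `min_radius`, and `max_radius`.

```python
from typing import Tuple


def dot_style(
    affect: float,
    max_abs: float,
    min_radius: float = 2.0,
    max_radius: float = 10.0,
) -> Tuple[str, float]:
    """Map an affect degree to (color, radius) for one dot of the matrix."""
    if affect > 0:
        color = "blue"
    elif affect < 0:
        color = "red"
    else:
        color = "gray"
    # Scale the absolute value linearly into [min_radius, max_radius].
    scale = abs(affect) / max_abs if max_abs > 0 else 0.0
    radius = min_radius + (max_radius - min_radius) * min(scale, 1.0)
    return color, radius


print(dot_style(2.0, max_abs=4.0))
print(dot_style(-4.0, max_abs=4.0))
print(dot_style(0.0, max_abs=4.0))
```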

When one of the classification items in the column classification item display area 53, that is, a time series unit, is selected and text display is requested, the text of those documents belonging to that time series unit that contain a noun phrase of an evaluation phrase is displayed. When one of the classification items in the row classification item display area 55, that is, a noun phrase of an evaluation phrase, is selected and text display is requested, the text of the document set containing that noun phrase is displayed. When a dot in the graphic display area 57 is selected and text display is requested, the text of those documents containing the corresponding noun phrase that belong to the corresponding time series unit is displayed. In each case, the evaluation phrases contained in the text are extracted and displayed as attributes.

FIG. 10 shows an example of the screen on the display unit 18 when the text of documents is displayed from the matrix configuration of Table 13. The text display screen 108 includes a document data display area 110. The document data display area 110 includes a modified noun phrase display area 112, an adjective phrase display area 114, a text display area 116, and an attribute display area 118. The figure shows, for example, the screen obtained when customer comments about a camera product are processed and the text is displayed after selecting the document set that contains “shooting” as the noun phrase of an evaluation phrase. Accordingly, “shooting” is shown in the modified noun phrase display area 112 for every entry.

The adjective phrase of the evaluation phrase contained in each document is extracted from the text and shown in the adjective phrase display area 114. This makes it possible to see at a glance the point of evaluation of each document shown in the text display area 116. The attribute display area 118 shows attributes such as the gender and age group of the person who wrote the document. Thus, on a single screen, the user can grasp which customer segments give which kinds of evaluation while referring to the text to check the specifics when needed, so tabulation and analysis can be performed efficiently.

FIGS. 11 and 12 show another example of the screen configuration on the display unit 18 when the text of documents is displayed from the matrix configuration of Table 13. In this example the text is displayed in two stages. In the first stage, the numbers of occurrences and the affect degrees of the evaluation phrases containing the selected noun phrase, within the document set containing that noun phrase, are displayed as a list. FIG. 11 shows an example of the screen at that stage. The evaluation phrase count display screen 128 includes a modified noun phrase display area 120 and an adjective phrase display area 122. The modified noun phrase display area 120 displays the noun phrase of the evaluation phrase corresponding to the dot selected in the matrix display of Table 13, together with its number of occurrences. In the figure, the noun phrase “shooting” and the number of occurrences “37” are displayed.

The adjective phrase display area 122 displays the adjective phrases that form a dependency relation with the noun phrase corresponding to the selected dot, together with their numbers of occurrences and affect degrees. In the figure, “unsuitable”, “good”, and “~stable” are displayed as adjective phrases, with occurrence counts of “7”, “3”, and “2” and affect degrees of “unfavorable (medium)”, “favorable (medium)”, and “unfavorable (low)”, respectively. Here the affect degree of each adjective phrase is expressed in words: a positive affect degree is shown as favorable and a negative one as unfavorable, and the magnitude is further expressed in three levels, high, medium, and low. For example, an adjective phrase with an affect degree of -3 is displayed as “unfavorable (medium)”, and one with an affect degree of 4.5 as “favorable (high)”. These correspondences are set in advance and stored in the storage unit 12.
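The verbal labels can be produced by a simple threshold mapping, sketched below. The text only states that the correspondence is set in advance; the thresholds 2.0 and 4.0, the label for exactly zero, and the English label strings (renderings of 好評 and 不評) are assumptions chosen so that -3 maps to the medium level and 4.5 to the high level, as in the two examples given.

```python
def affect_label(affect: float, medium_from: float = 2.0, high_from: float = 4.0) -> str:
    """Turn a numeric affect degree into a label such as 'favorable (high)'."""
    if affect == 0:
        return "neutral"  # assumption: the text does not say how zero is labeled
    polarity = "favorable" if affect > 0 else "unfavorable"
    magnitude = abs(affect)
    if magnitude >= high_from:
        level = "high"
    elif magnitude >= medium_from:
        level = "medium"
    else:
        level = "low"
    return f"{polarity} ({level})"


print(affect_label(-3))   # unfavorable (medium)
print(affect_label(4.5))  # favorable (high)
```

In a real system the thresholds would presumably be read from stored settings rather than hard-coded as default arguments.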

When the adjective phrase selection check box 124 is checked on the evaluation phrase count display screen 128, the second stage of text display shows the text of the document set containing any of the adjective phrases, that is, the document set corresponding to the dot selected in the matrix of Table 13. FIG. 12 shows an example of the screen at that stage. When the check boxes at the head of individual adjective phrases are checked on the evaluation phrase count display screen 128, only the text of the document sets containing the checked adjective phrases is displayed.

The text display screen 130 in FIG. 12 includes a full text display instruction area 132, an adjective phrase display area 136, a text display area 138, a cluster name display area 140, and an attribute display area 142. The adjective phrase display area 136 displays the adjective phrases shown on the evaluation phrase count display screen 128, and the text display area 138 displays, for each document containing an adjective phrase, a passage of predetermined length that includes the relevant portion. In the figure, for example, the text of the seven documents containing the adjective phrase “unsuitable” is displayed in order from the top, followed by the text of the three documents containing the next adjective phrase, “good”. In the text display, the noun phrase and adjective phrase of interest are highlighted, for example by enclosing them in a frame or coloring them.

The cluster name display area 140 displays the name of the cluster to which each document belongs, for example “large”, “indoor”, or “video”. This makes it easy to see the main topic of each displayed document. As in the screen example of FIG. 10, the attribute display area 142 shows the gender, age group, and so on of the person who wrote the document. When the full text display instruction area 132 shown at the head of a row is clicked, the full text of the document in that row is additionally displayed.

Displaying the text in two stages allows the user to check the affect degree, the number of occurrences, and so on before displaying the text, so documents can be narrowed down efficiently even when there are many of them. It also makes it easy to relate and understand multifaceted data such as affect degree, number of occurrences, cluster name, and text.

(Affect degree attribute analysis)
Table 14 is a matrix in which the row classification items are “impression expression phrase 1” and “impression expression phrase 2”, which are impression expression phrases contained in the document set, and the column classification items are “Attribute A” and “Attribute B”, which are attribute names stored in association with the documents. The purpose of this matrix is to use the affect degree to grasp the relationship between the impression expressions used and the attributes.

Here, numerical value 1 is, for example, the frequency-weighted affect degree of “impression expression phrase 1” in the document set belonging to “Attribute A”. The same applies to numerical values 2 to 4. For example, suppose that “impression expression phrase 1” is “cute”, “impression expression phrase 2” is “unbalanced”, “Attribute A” is “male”, and “Attribute B” is “female”. Suppose that the intrinsic affect degree of “cute” is 3 and that “cute” occurs 2 times in documents written by “male” authors and 10 times in documents written by “female” authors. Suppose also that the intrinsic affect degree of “unbalanced” is -3 and that “unbalanced” occurs 12 times in documents written by “male” authors and 4 times in documents written by “female” authors. Because the frequency-weighted affect degree of each phrase is obtained by multiplying its intrinsic affect degree by its frequency, numerical value 1 is 6, numerical value 2 is 30, numerical value 3 is -36, and numerical value 4 is -12.
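The four values of this example can be reproduced with a few lines. The data structures and the function name `weighted_affect_matrix` are hypothetical; only the numbers come from the text.

```python
from typing import Dict

intrinsic_affect = {"cute": 3, "unbalanced": -3}
frequency = {
    "cute": {"male": 2, "female": 10},
    "unbalanced": {"male": 12, "female": 4},
}


def weighted_affect_matrix(
    intrinsic: Dict[str, int], freq: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, int]]:
    """Multiply each phrase's intrinsic affect degree by its frequency per attribute."""
    return {
        phrase: {attribute: intrinsic[phrase] * count for attribute, count in by_attribute.items()}
        for phrase, by_attribute in freq.items()
    }


matrix = weighted_affect_matrix(intrinsic_affect, frequency)
for phrase, row in matrix.items():
    print(phrase, row)
```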

FIG. 13 shows the matrix obtained when this data is represented by dots. In the matrix 50 of the figure, the attribute names “male” and “female” are displayed in the column classification item column 52, and the impression expression phrases “cute” and “unbalanced” are displayed in the row classification item column 54. The graphic display field 56 shows the affect degree of each impression expression phrase for each attribute by the size and color of a dot. Here, for convenience of illustration, white and black represent positive and negative affect degrees, respectively.

Displaying such a matrix makes it possible to grasp tendencies in the relationship between impression expressions and attributes. For example, “female” authors often use the word “cute” when giving an expression with a positive affect degree, that is, a favorable evaluation, whereas the word “unbalanced”, used for unfavorable evaluations, is used more by “male” authors.

(FAQ creation support)
Table 15 is a matrix in which the row classification items are “Cluster 1 (inquiry)” and “Cluster 2 (inquiry)”, which are the names of clusters obtained by clustering inquiries from customers, and the column classification items are “Cluster A (answer)” and “Cluster B (answer)”, which are the names of clusters obtained by clustering the answers to those inquiries. The inquiries and answers are, for example, text written in e-mails or on postcards, or transcriptions of telephone conversations. The numbers of clusters displayed in the rows and columns may be the same. The purpose of this matrix is to grasp the variation in answers to an inquiry and the variation in inquiries for an answer.

Here, numerical value 1 is, for example, the ratio of nA_1, the number of documents that also belong to “Cluster A (answer)”, to n1, the number of documents in the document set belonging to “Cluster 1 (inquiry)”, that is, nA_1/n1. The same applies to numerical values 2 to 4. This value represents the variation in answer content for a given inquiry content. Alternatively, numerical value 1 may be the ratio of nA_1, the number of documents that also belong to “Cluster 1 (inquiry)”, to nA, the number of documents in the document set belonging to “Cluster A (answer)”, that is, nA_1/nA. This value represents the variation in inquiry content for a given answer content. The display may be switched between these values when the user clicks the matrix display area 51.

For example, when the content of the answers to a certain inquiry varies widely, the answering criteria for such inquiries need to be clarified. Displaying the variation in answer content for each inquiry content as a matrix in this way reveals points to improve on the answering side. Displaying the variation in inquiry content for each answer content as a matrix makes it possible to grasp differences in the terminology used by inquirers and respondents, and to grasp the similarity of answers in order to create FAQs from common answers.

(FAQ search sentence time series analysis)
Table 16 is a matrix in which the row classification items are “search sentence 1” and “search sentence 2”, which are search sentences entered by questioners in FAQ searches, and the column classification items are “time series unit A” and “time series unit B”, which are the time-series units in which they were entered. Here, a search sentence is a search string such as “postal code”, or a sentence such as “I want to know about 7-digit postal codes”, entered into an FAQ search system in order to find a target document. The search sentences displayed in Table 16 are those with the highest frequencies, and the number to display is set in advance. The purpose of this matrix is to grasp how the search sentences entered in FAQ searches change over time.

Here, numerical value 1 is, for example, the ratio of nA_1, the number of times “search sentence 1” was entered in “time series unit A”, to n1, the number of times “search sentence 1” was entered over the entire period, that is, nA_1/n1. The same applies to numerical values 2 to 4. This value identifies, for example, the periods in which a given search sentence was entered frequently. Alternatively, numerical value 1 may be the ratio of nA_1 to nA, the number of search sentences entered in “time series unit A”, that is, nA_1/nA. This value identifies, for example, the search sentences entered most often in a given period. The display may be switched between these values when the user clicks the matrix display area 51.

The row classification items may be the search sentences themselves, or may be the terms extracted from the search sentences. Displaying such a matrix makes it possible, for example, to identify search sentences for which inquiries are increasing, and to quickly plan improvement measures such as expanding the content corresponding to those inquiries or identifying product problems.

A similar matrix may be displayed with the row classification items being terms extracted from question sentences. Here, a question sentence is a question included in an FAQ, that is, a “frequently asked question”, for example a sentence such as “An error occurred during printing”. Term extraction is performed on such sentences, and classification is based on the extracted terms. From the above sentence, for example, the terms “printing”, “error”, and “occurred” are extracted. The term extraction may be performed by a device external to the information display device 10, in which case the extracted terms are stored in the storage unit 12 in association with the documents and attributes. Using terms extracted from question sentences as the row classification items provides the same effects as using search sentences.

(Category analysis of FAQ search sentences)
Table 17 is a matrix in which the row classification items are “search sentence 1” and “search sentence 2”, which are search sentences entered by questioners in FAQ searches, and the column classification items are “Category A” and “Category B”, which are the names of the categories corresponding to the search sentences. The purpose of this matrix is to grasp the search sentences entered in FAQ searches as proportions per category. A category is information about the item assigned to a created FAQ; for an FAQ about printers, examples are “paper settings / printing” and “postcards”. By having a category specified when a search sentence is entered, the search sentence and the category can also be stored in association with each other.

Here, numerical value 1 is, for example, the ratio of nA_1, the number of entries of “search sentence 1” belonging to “Category A”, to n1, the total number of entries of “search sentence 1”, that is, nA_1/n1. The same applies to numerical values 2 to 4. The row classification items may be the search sentences themselves, or may be the terms extracted from the search sentences. Displaying such a matrix makes it possible to identify the categories into which a given search sentence tends to be classified, and to identify search sentences with similar distributions over multiple categories. For example, for the search sentence “driver”, which is used in multiple categories in computer-related searches, it is possible to grasp in which of the categories such as “model” and “OS” it is searched most often.

As in (time series analysis of classification results), each category may be assigned in advance to an internal department, and when the user clicks the row classification item display area 55, a matrix re-aggregated by the department to which each category belongs may be displayed.

(Understanding the category distribution of FAQ question sentence terms)
Table 18 is a matrix in which the row classification items are “question sentence term 1” and “question sentence term 2”, which are terms extracted from FAQ question sentences, and the column classification items are “Category A” and “Category B”, which are the names of the categories corresponding to the question sentences. The purpose of this matrix is to grasp how the terms contained in question sentences are distributed over the categories.

Here, numerical value 1 is, for example, the ratio of nA_1, the number of question sentences belonging to “Category A”, to n1, the number of question sentences containing “question sentence term 1”, that is, nA_1/n1. The same applies to numerical values 2 to 4. Displaying such a matrix makes it possible to grasp how the terms contained in question sentences are distributed over the categories. For example, a term that appears only in question sentences of one category is likely to be a term that characterizes that category. It is therefore useful information, when an FAQ is created, for deciding which category to assign to a question sentence containing that term.

When a term is distributed over two or three categories and the proportion belonging to one of them is extremely high, the question sentences belonging to the categories with low proportions may have been assigned the wrong category. They may also be question sentences that use the term even though it is not directly related to the content of the category. From this viewpoint, the matrix display provides material for judging whether the categories assigned in an existing FAQ are accurate.

Furthermore, if a phrase is distributed relatively evenly over a plurality of categories and is also a category name, the category structure of the existing FAQ may be problematic. For example, suppose the question sentence “An error occurs when printing” belongs to the “Print” category, the question sentence “An error occurs when upgrading” belongs to the “Upgrade” category, and the question sentence “I am told that an unexpected error has occurred” belongs to the “Error” category. In this case a category named “Error” exists even though the phrase “error” is used in a plurality of categories. In such a case the category structure is likely to be problematic. Displaying the matrix makes it easy to identify problems such as those described above.
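The "Error" example can be reproduced with a small check for category names that also occur as phrases spread across several categories. This is a sketch under stated assumptions: the function name `overlapping_category_names`, the evenness cutoff `max_share`, and case-insensitive substring matching are not taken from the embodiment.

```python
from collections import Counter

def overlapping_category_names(questions, max_share=0.5):
    """Return category names that occur as phrases across several categories.

    A name is reported when it appears in questions of at least two
    categories and no single category holds more than `max_share` of those
    questions (hypothetical measure of "relatively even" distribution).
    """
    categories = {cat for _, cat in questions}
    flagged = []
    for name in sorted(categories):
        counts = Counter(
            cat for text, cat in questions if name.lower() in text.lower()
        )
        total = sum(counts.values())
        if len(counts) >= 2 and max(counts.values()) / total <= max_share:
            flagged.append(name)
    return flagged

# The three question sentences from the example in the text
faq = [
    ("An error occurs when printing", "Print"),
    ("An error occurs when upgrading", "Upgrade"),
    ("I am told that an unexpected error has occurred", "Error"),
]
flagged = overlapping_category_names(faq)
```

Here only "Error" is reported, because the phrase appears once in each of the three categories, whereas "Print" appears in a single category.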

(Time series analysis of the number of FAQ categories)
Table 19 is a matrix in which the row classification items are “Category 1” and “Category 2”, which are category names corresponding to FAQ question sentences, and the column classification items are “time series unit A”, “time series unit B”, and “time series unit C”, which are the time series units in which the question sentences were created. The purpose of this matrix is to grasp, for each category, how the number of question sentences changes over time.

Here, numerical value 1 is, for example, the number nA_1 of question sentences created in “time series unit A” among the question sentences belonging to “Category 1”. The same applies to numerical values 2 to 6. Alternatively, numerical values 2 and 5, for question sentences created in time series unit B, and numerical values 3 and 6, for question sentences created in time series unit C, may be expressed as the amount or ratio of change from numerical values 1 and 4, the numbers of question sentences created in time series unit A. For example, if nA_1 is the number of question sentences belonging to “Category 1” that were created in “time series unit A” and nB_1 is the number created in “time series unit B”, numerical value 2 is the amount of change nB_1 - nA_1 or the ratio of change nB_1 / nA_1. As in the matrix shown in FIG. 3, the ratio of change and the absolute value may be displayed simultaneously by means of the color and size of the dots.
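The amount of change nB_1 - nA_1 and the ratio of change nB_1 / nA_1 can be computed per category as in the following sketch. The function name `changes_from_baseline`, the sample counts, and the choice to return `None` for the ratio when the baseline is zero are assumptions made for illustration.

```python
def changes_from_baseline(counts):
    """Compute change relative to the first time series unit.

    `counts` maps a category name to its question counts per time series
    unit, e.g. [n_A, n_B, n_C]. For every unit after the first, the result
    holds (difference, ratio) relative to the first unit; the ratio is None
    when the baseline count is zero.
    """
    result = {}
    for category, series in counts.items():
        base = series[0]
        result[category] = [
            (n - base, n / base if base else None) for n in series[1:]
        ]
    return result

# Hypothetical counts for time series units A, B, C
counts = {"Category 1": [10, 15, 30], "Category 2": [8, 8, 4]}
changes = changes_from_baseline(counts)
```

For "Category 1" this yields a difference of 5 and a ratio of 1.5 for unit B, which correspond to the two alternative forms of numerical value 2 described above.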

Displaying such a matrix makes it easy to grasp changes such as a sudden increase in the number of FAQs belonging to a certain category, so that the occurrence of a problem can be recognized immediately, for example by discovering a category in which a problem is occurring or by recognizing the need to adjust the balance between categories. To obtain this effect more reliably, a threshold may be set in advance for the amount or ratio of change, and a function may be provided that notifies the user when the threshold is exceeded. The notification may take the form of a warning displayed on the screen, or an e-mail about the category in question may be sent automatically to an address set by the user.
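The threshold check can be sketched as below. The function name `categories_to_notify`, the default threshold of 2.0, and the sample counts are hypothetical; the notification here is just a printed message standing in for the on-screen warning or e-mail mentioned in the text.

```python
def categories_to_notify(counts, ratio_threshold=2.0):
    """Return (category, unit_index, ratio) for counts exceeding the threshold.

    `counts` maps a category name to its question counts per time series
    unit. The ratio is taken relative to the first unit; categories with a
    zero baseline are skipped in this sketch.
    """
    alerts = []
    for category, series in counts.items():
        base = series[0]
        if base == 0:
            continue
        for index, n in enumerate(series[1:], start=1):
            ratio = n / base
            if ratio > ratio_threshold:
                alerts.append((category, index, ratio))
    return alerts

# Hypothetical counts for three time series units
counts = {"Category 1": [10, 15, 30], "Category 2": [8, 8, 4]}
alerts = categories_to_notify(counts)
for category, index, ratio in alerts:
    # Stand-in for a screen warning or an automatically sent e-mail
    print(f"Warning: {category} grew {ratio:.1f}x by time series unit {index}")
```

A threshold on the amount of change rather than the ratio would follow the same pattern with `n - base` in place of `n / base`.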

(Expertise analysis)
Table 20 is a matrix in which the row classification items are “technical term 1” and “technical term 2”, which are technical terms contained in documents, and the column classification items are “person in charge A” and “person in charge B”, who are the persons who created the documents. Information on technical terms is stored in the storage unit 12 in advance, for example from an external dictionary. The purpose of this matrix is to grasp, for each author, the technical terms used in replies to inquiries, daily sales reports, and the like.

Here, numerical value 1 is, for example, the ratio of the number of times nA_1 that “person in charge A” used “technical term 1” to the number of times n1 that all persons in charge used “technical term 1”, that is, nA_1 / n1. The same applies to numerical values 2 to 4. Displaying such a matrix makes it possible to grasp each person's level of expertise, strong fields, and weak fields, so that training and reassignment of persons in charge can be carried out efficiently.
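Unlike the earlier ratios, which count documents, this value counts occurrences of a term. A minimal sketch follows; the function name `term_usage_shares`, the sample documents, and the use of plain substring counting are assumptions, and a real system would take the term list from the dictionary held in the storage unit 12.

```python
from collections import Counter

def term_usage_shares(documents, terms):
    """Return {term: {author: share of all uses of the term}}.

    `documents` is a list of (author, text) pairs. Uses are counted with
    str.count, i.e. non-overlapping substring occurrences (an assumption).
    Authors who never use a term are omitted from that term's entry.
    """
    shares = {}
    for term in terms:
        counts = Counter()
        for author, text in documents:
            counts[author] += text.count(term)
        total = sum(counts.values())  # n1: uses of the term by all authors
        if total:
            shares[term] = {a: n / total for a, n in counts.items() if n}
        else:
            shares[term] = {}
    return shares

# Hypothetical reply texts written by two persons in charge
docs = [
    ("A", "TCP reset seen; TCP retransmit observed"),
    ("B", "TCP timeout reported"),
    ("B", "DNS lookup failed"),
]
shares = term_usage_shares(docs, ["TCP", "DNS"])
```

In this sample, person in charge A accounts for two of the three uses of "TCP", while "DNS" is used only by person in charge B.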

According to the embodiment described above, a document set is classified by various methods specified by the user, and information on the classification results is visualized by displaying it as dots on a matrix. As a result, even when advanced classification methods are used, the correlation between the classification results can be grasped intuitively on a single screen. In addition, because the classification items forming the rows and columns of the matrix are set by simple operations such as drag and drop from a list display, matrices based on different classification items can be displayed one after another and insight can be gained by comparing their trends. A single matrix is two-dimensional information, but comparing a plurality of matrices realizes multidimensional analysis and greatly increases the amount of information obtained.

Furthermore, even when an enormous database is to be processed, classification items suited to the documents can be selected, so that efficient classification is possible and the narrowing down that leads to the desired document can be performed efficiently. In addition, because any document can be processed regardless of whether it has attributes or what its format is, the device can be used widely for document search, product analysis, and the like, and can also be used to tune the system itself, for example by checking the validity of the classification processing. The various effects obtained in each application are as described above.

The present invention has been described above based on an embodiment. The embodiment is illustrative, and those skilled in the art will understand that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

A diagram showing the overall configuration of the search system in the present embodiment.
A diagram showing an example of the matrix displayed on the display unit in the present embodiment.
A diagram showing another example of the matrix displayed on the display unit in the present embodiment.
A flowchart showing the procedure for matrix display by the information display device in the present embodiment.
A diagram showing a configuration example of the screen displayed on the display unit in the present embodiment.
A diagram showing an example of the matrix displayed in the matrix display area when the row classification items have a hierarchical structure in the present embodiment.
A diagram schematically showing how the diagonalization sort is performed in the present embodiment.
A diagram showing a configuration example of the screen when information on a document set is displayed in the present embodiment.
A diagram showing a configuration example of the screen when the text of a document is displayed in the present embodiment.
A diagram showing a configuration example of the screen when the text of a document is displayed in the affect degree time series analysis of the present embodiment.
A diagram showing a configuration example of the screen when the number of occurrences and the affect degree of evaluation phrases containing the selected noun phrase are listed in the affect degree time series analysis of the present embodiment.
A diagram showing a configuration example of the screen when the text of a document is displayed in the affect degree time series analysis of the present embodiment.
A diagram showing an example of the matrix displayed in the affect degree attribute analysis of the present embodiment.

Explanation of symbols

10 information display device, 12 storage unit, 14 classification processing unit, 16 matrix generation unit, 18 display unit, 20 input unit, 22 matrix display unit, 50 matrix, 51 matrix display area, 52 column classification item field, 54 row classification item field, 56 graphic display field, 60 matrix display screen, 62 classification item selection area, 66 document set designation area, 67 sort instruction button, 68 narrowing instruction button, 80 horizontal guide line, 82 vertical guide line, 84 vertical information display area, 86 horizontal information display area.

Claims

An information display device comprising:
a storage unit that stores a plurality of documents;
a classification processing unit that forms a plurality of document set groups by classifying the plurality of documents stored in the storage unit by a first classification method and a second classification method; and
a matrix display unit that displays the correlation between the classification results obtained by the first classification method and the second classification method performed by the classification processing unit as a two-dimensional matrix in which the two series of document sets are expanded into rows and columns and numerical information relating to the intersections of the document sets is represented by predetermined figures.

The information display device according to claim 1, further comprising a display unit that displays a list of a plurality of classification methods and accepts input from a user who selects the first classification method and the second classification method from the list.

The information display device according to claim 1 or 2, wherein the matrix display unit represents either the number of documents included in the intersection or a ratio of the number of documents by a change in the color of the figure.

The information display device according to claim 1, wherein the matrix display unit represents, by a change in the color of the figure, a numerical value relating to the amount of change in the number of documents included in the intersections belonging to each row, taking as a reference the number of documents included in the intersection that belongs to a predetermined column.

The information display device according to claim 1 or 2, wherein the classification processing unit classifies the plurality of documents stored in the storage unit according to a predetermined clustering method based on the similarity of phrases included in the documents,
at least one of the first classification method and the second classification method is the clustering method performed by the classification processing unit, and
the matrix display unit displays, as classification items in the headings of the two-dimensional matrix, representative phrases selected according to a predetermined criterion from the phrases that the classification processing unit extracted from each document when performing the clustering.

The information display device according to claim 5, wherein both the first classification method and the second classification method are the same clustering method performed by the classification processing unit, and
the matrix display unit represents, by the figure, the proportion of the document set belonging to another classification item that contains at least one representative phrase selected from the phrases extracted from the document set belonging to a given classification item.

The information display device according to claim 1 or 2, wherein the storage unit further stores preset fixed classification items,
the classification processing unit classifies the plurality of documents stored in the storage unit into the fixed classification items,
both the first classification method and the second classification method are the classification into the fixed classification items performed by the classification processing unit, and
the matrix display unit represents, by the figure, the similarity of each combination of the fixed classification items, calculated from the phrases extracted from the document sets belonging to the respective fixed classification items.

The information display device according to claim 1, wherein either of the first classification method and the second classification method is a classification method whose classification items form a hierarchical structure,
the classification processing unit performs classification corresponding to each level of the hierarchy of the classification items, and
the matrix display unit switches, in accordance with a selection instruction from the user, between two-dimensional matrices that represent the classification results corresponding to the levels of the classification items.

The matrix display unit determines the display order of the classification items in the two-dimensional matrix so that the numerical values relating to the intersections are largest on the diagonal of the two-dimensional matrix. The information display device described in 1.

The information display device according to claim 1, wherein the matrix display unit further displays, in text form, at least one of the numerical information represented by the figure displayed in a region of the two-dimensional matrix selected by the user and information on the classification items corresponding to the row and column of that region.

The matrix display unit further displays, in text form, the documents included in the intersection corresponding to the figure displayed in the region of the two-dimensional matrix selected by the user. The information display device described in 1.

An information display device comprising:
a storage unit that stores a plurality of documents;
a classification processing unit that forms a plurality of document sets by classifying the plurality of documents stored in the storage unit by a predetermined classification method; and
a matrix display unit that calculates numerical information relating to phrases based on the number of occurrences of phrases extracted from each document set classified by the classification processing unit or the number of occurrences of combinations of phrases, and displays the correlation between the result of the classification performed by the classification processing unit and the numerical information relating to the phrases as a two-dimensional matrix in which the numerical information relating to the phrases is represented by predetermined figures.

The information display device according to claim 12, wherein the storage unit further stores data associating predetermined impression expression words with numerical values indicating the degree of evaluation represented by each impression expression word, and
the matrix display unit calculates, based on the data, the degree of evaluation of each modified noun phrase included in the phrases that contain an impression expression word and were extracted from each document set, and represents the degree of evaluation by the figure.

An information display method comprising:
a step of accepting, from a user, a selection input of a first classification method and a second classification method for classifying a plurality of documents;
a step of classifying the plurality of documents by the selected first classification method and second classification method to form two series of document set groups; and
a step of displaying the correlation between the classification results obtained by the first classification method and the second classification method as a two-dimensional matrix in which the two series of document set groups are expanded into rows and columns and numerical information relating to the intersections of the document set groups is represented by predetermined figures.

The information display method according to claim 14, further comprising a step of receiving a selection input of the type of numerical information from a user.

The information display method according to claim 15, further comprising a step of switching the display to the two-dimensional matrix corresponding to the selection input when a selection input of any one of the first classification method, the second classification method, and the type of numerical information is newly accepted after the displaying step.

A computer program that causes a computer to realize:
a function of accepting, from a user, a selection input of a first classification method and a second classification method for classifying a plurality of documents;
a function of classifying the plurality of documents by the selected first classification method and second classification method to form two series of document set groups; and
a function of displaying the correlation between the classification results obtained by the first classification method and the second classification method as a two-dimensional matrix in which the two series of document set groups are expanded into rows and columns and numerical information relating to the intersections of the document set groups is represented by predetermined figures.