JP2000010986A

JP2000010986A - Retrieval support method for document data base and storage medium where program thereof is stored

Info

Publication number: JP2000010986A
Application number: JP10171915A
Authority: JP
Inventors: Shuichi Arai; 秀一荒井
Original assignee: TRENDY KK
Current assignee: TRENDY KK
Priority date: 1998-06-18
Filing date: 1998-06-18
Publication date: 2000-01-14
Anticipated expiration: 2018-06-18
Also published as: JP3431836B2

Abstract

PROBLEM TO BE SOLVED: To support searching across multiple documents by displaying peripheral topics extracted under OR conditions together with central topics extracted under AND conditions. SOLUTION: Peripheral topics, generated by ORing the co-occurrence networks of the individual documents in the document group that forms the search space, and central topics, generated by ANDing those co-occurrence networks, are displayed. The elements available for narrowing down are thereby shown accurately to the searcher. The searcher then enters further narrowing conditions based on these elements to reduce the search space, which makes it possible to retrieve document data flexibly and with little noise. The system consists of a co-occurrence database creation process 100 and a subject-suggesting independent word network display process 200, both of which are performed by loading execution programs from a large-capacity external storage device into a CPU.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to the use of a document database in which specific words are extracted from newspaper articles, technical documents, literary works and the like, and the words are related to one another in a network form. In particular, it relates to a technique that is effective when searching across multiple documents.

[0002] The following methods are known for searching electronic data media, such as networks and CD-ROMs, as the target (search space).

[0003] In the first method, the searcher enters a keyword as a search condition for the search space, and documents matching the keyword are extracted from the search space to narrow it down. Many Internet search engines are of this type.

[0004] In the second method, the search space is classified into several fields in advance, the system presents words that characterize the documents belonging to each field, and the searcher narrows down by choosing a field according to the information needed. So-called directory services fall into this category.

[0005] The third method combines the first and second methods: the search field is restricted in advance and a keyword search is then performed. This is effective for removing noise unrelated to the search subject.

[0006] Most of these conventional search methods are top-down: a more abstract target is set first, and the target is gradually narrowed from a wide range toward something less abstract (more concrete). Conversely, there are also methods that generate, as a kind of index, a more abstract data set from a less abstract data set in order to give a rough guide to the search field. In those methods, however, generating the more abstract data set has depended entirely on operations based on human experience.

[0007]

[Problems to be solved by the invention] However, none of these search methods achieved sufficient search efficiency, for the following reasons. (1) If keyword matching is used to narrow down while the number of documents in the search space is still large, the keyword matches words that do not characterize a document's content, and documents the searcher does not want are retrieved.

[0008] (2) Conversely, once the number of documents in the search space has been narrowed down, documents have already been missed because of synonyms, and documents the searcher is looking for may exist outside the search space.

[0009] (3) A document rarely has only one topic; it usually has several. Moreover, how a topic is perceived varies with individual subjectivity, so it is difficult to assign a document to a specific field even with an approach such as a directory service.

[0010] (4) Full text, keywords, titles and fields are often used as the search keys that form the narrowing conditions, but they do not necessarily make the characteristics of a document clear. In particular, when the searcher wants to obtain from a large collection of documents an analysis result that the searcher had not anticipated (exploratory search), it is rare that the searcher can specify suitable search keys in advance.

[0011] (5) During narrowing down, the only information available to the searcher is often the number of documents in the search space, which is not enough to judge whether the search space is what the searcher intended.

[0012] (6) For the documents narrowed down by the search conditions, the searcher must ultimately read each one in full and grasp its content to decide whether it is needed, which takes time and effort.

[0013] Several approaches have been proposed to solve the search problems described above. For example, some search engines present important words and frequently occurring words in the search space to the searcher. Efforts are also being made to present the search space to the searcher as a graph or map in order to increase the information the searcher can obtain.

[0014] In Japanese Unexamined Patent Publication No. H8-314980, the present inventor also proposed a co-occurrence network display method that makes it very easy to grasp the content of a document. In that method, independent words are extracted from a given document, a co-occurrence table recording the co-occurring words of each independent word and their co-occurrence counts is created, co-occurrence probabilities indicating the strength of these co-occurrence relations are calculated, and the links between independent words are displayed differently according to the co-occurrence probability.

[0015] The present invention develops this method further and provides a search technique that uses co-occurrence networks from the viewpoint of topics across a large number of documents.

[0016]

[Means for solving the problems] The first means of the present invention is a search support method for a document database comprising: a step of extracting independent words from a given first group of documents; a step of creating, for each document, a co-occurrence table that records the co-occurring words of each independent word and their co-occurrence counts; a peripheral topic generation step of generating a logical-sum (OR) network from the per-document co-occurrence tables; a central topic generation step of generating a logical-product (AND) network from the per-document co-occurrence tables; a step of displaying the OR network, including the AND network, and prompting for input of a word to narrow down by; and a step of narrowing down to a second group of documents containing the input word and, treating this second group as the first group, repeating the creation of the co-occurrence tables, the peripheral topic generation and the central topic generation.

[0017] From the group of documents forming the search space (the first document group), peripheral topics are generated by taking the logical sum of the co-occurrence networks of the individual documents, and central topics are generated by taking their logical product. Displaying both shows the searcher precisely which elements are available for narrowing down. By entering further narrowing conditions on this basis, the searcher can reduce the search space and retrieve document data flexibly and with very little noise.
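The OR/AND construction and the narrowing loop described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes each document's co-occurrence network is already available as a set of (independent word, co-occurring word) links, and the names `peripheral_and_central` and `narrow` are hypothetical. It also treats a document as containing a word whenever the word appears in one of its links, which is a simplification.

```python
def peripheral_and_central(networks):
    """Return (OR network, AND network) for a list of per-document link sets."""
    if not networks:
        return set(), set()
    peripheral = set().union(*networks)      # links found in at least one document
    central = set.intersection(*networks)    # links found in every document
    return peripheral, central


def narrow(documents, word):
    """Keep only documents whose network mentions `word` (the second document group)."""
    return {
        doc_id: links
        for doc_id, links in documents.items()
        if any(word in link for link in links)
    }


# Hypothetical search space: document id -> set of co-occurrence links.
documents = {
    "d1": {("cooking", "teacher"), ("cooking", "become")},
    "d2": {("cooking", "teacher"), ("cooking", "school")},
    "d3": {("music", "teacher")},
}

peripheral, central = peripheral_and_central(list(documents.values()))

# The searcher picks "cooking" from the displayed network; the narrowed group
# becomes the new search space and both networks are regenerated.
documents2 = narrow(documents, "cooking")
peripheral2, central2 = peripheral_and_central(list(documents2.values()))
```

In this toy search space no link is shared by all three documents, so the first AND network is empty. After narrowing by "cooking", the link between "cooking" and "teacher" emerges as the central topic, while the links to "become" and "school" remain peripheral.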

[0018] The second means is the first means in which the central topic generation step calculates the strength of co-occurrence pairs that exist simultaneously in multiple documents, using the mutual information between those documents.

[0019] That is, mutual information is defined over the co-occurrence probabilities of co-occurrence pairs that exist simultaneously in multiple documents, and the strength of each pair's connection is expressed by this mutual information. This yields an index of how strongly a co-occurrence pair contained simultaneously in the document group represents the central topic of that group.

[0020] The third means is a storage medium storing the first means as a program. Here, the storage medium includes any medium that can be recorded magnetically or optically, and may take any form such as a disk, tape or memory cartridge. Specific examples include optical disks, magneto-optical disks, IC cards and magnetic tapes.

[0021]

[Embodiments of the invention] To aid understanding of the present invention, the extraction of independent words, the structure of a co-occurrence table and a display example of a co-occurrence network are first described using a simplified example.

[0022] In the present invention, an "independent word" is a word that carries meaning on its own. For example, in the sentence "I want to become a cooking teacher" (料理の先生になりたい), the independent words are "cooking" (料理), "teacher" (先生) and "become" (なる).

[0023] "Co-occurrence" is the relation between two independent words that appear together in the same sentence. When several independent words appear in the same sentence, they are said to co-occur with one another, or to be in a co-occurrence relation. When two independent words are in a co-occurrence relation, each is called a co-occurring word of the other.

[0024] FIG. 1 shows a concrete example of a co-occurrence table. It shows a co-occurrence table being created from two sentences: Document 1, "I want to become a cooking teacher", and Document 2, "I am a cooking school teacher". "Cooking", "teacher" and "become" are extracted as the independent words of Document 1, and "cooking", "school" and "teacher" as those of Document 2. The co-occurring words on the right side of the figure, which co-occur with these independent words, are extracted and their co-occurrence counts are calculated. For example, the independent word "cooking" appears in both Document 1 and Document 2, and its co-occurring word "teacher" appears once in Document 1 and once in Document 2, co-occurring twice in total. The co-occurrence count of the co-occurring word "teacher" for the independent word "cooking" is therefore 2.
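The counting in this example can be reproduced with a short sketch. It assumes the independent words have already been extracted for each sentence (the morphological analysis needed for Japanese text is outside its scope), and the function name `cooccurrence_table` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_table(sentences):
    """Map (independent word, co-occurring word) -> number of sentences sharing both."""
    table = Counter()
    for words in sentences:
        # Every ordered pair of distinct independent words in a sentence co-occurs once.
        for word, co_word in permutations(set(words), 2):
            table[(word, co_word)] += 1
    return table


# Independent words of Document 1 and Document 2, one sentence each.
doc1 = [["cooking", "teacher", "become"]]
doc2 = [["cooking", "school", "teacher"]]

table = cooccurrence_table(doc1 + doc2)
print(table[("cooking", "teacher")])  # 2
print(table[("cooking", "school")])   # 1
```

Storing ordered pairs keeps one row per independent word, which matches a table that lists co-occurring words for each independent word in turn.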

[0025] There are two ways to build this table. As shown in the left column of FIG. 1, Document 1 and Document 2 can be concatenated and a co-occurrence table created for the combined text. Alternatively, as shown in the right column, a co-occurrence table can be created for each of Document 1 and Document 2 and the tables merged into a co-occurrence table for the multiple documents. For the logical sum (OR), both give the same result; this point is discussed later. FIG. 2 shows the co-occurrence table obtained in FIG. 1 as a co-occurrence network. Because the text here is very short, co-occurrence probabilities are not taken into account.
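The equivalence of the two approaches for the OR case can be checked with a small sketch. It assumes that concatenation keeps sentence boundaries intact, so that co-occurrence is still counted within each sentence; the helper `cooccurrence_table` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_table(sentences):
    """Map (independent word, co-occurring word) -> co-occurrence count."""
    table = Counter()
    for words in sentences:
        for word, co_word in permutations(set(words), 2):
            table[(word, co_word)] += 1
    return table


doc1 = [["cooking", "teacher", "become"]]
doc2 = [["cooking", "school", "teacher"]]

# Left column of FIG. 1: one table for the concatenated documents.
concatenated = cooccurrence_table(doc1 + doc2)

# Right column of FIG. 1: one table per document, then merged by adding counts.
merged = cooccurrence_table(doc1) + cooccurrence_table(doc2)

same = concatenated == merged
```

Merging per-document tables has the practical advantage that the tables can be computed once per document and reused whenever the search space changes.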

[0026] The "co-occurrence probability" is the probability $P(c_j/w_i)$ that, when an independent word $w_i$ appears, its co-occurring word $c_j$ co-occurs with it. It can be calculated by equation (1):

$$P(c_j/w_i) = \frac{M(c_j/w_i)}{N_{w_i}} \qquad (1)$$

where
$w_i$: independent word ($1 \le i \le N_B$; $N_B$: vocabulary size, that is, the number of distinct words in the document)
$c_j$: co-occurring word of the independent word $w_i$ ($1 \le j \le N_b(w_i)$; $N_b(w_i)$: number of co-occurring words of $w_i$)
$M(c_j/w_i)$: number of co-occurrences of the independent word $w_i$ and its co-occurring word $c_j$
$N_{w_i}$: number of occurrences of the independent word $w_i$

The co-occurrence probability given by equation (1) is a probability conditional on the occurrence of $w_i$ and indicates a semantic connection from $w_i$ to its co-occurring word $c_j$. This information can therefore be expressed by a directed link such as "independent word $w_i$ → co-occurring word $c_j$".
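Equation (1) amounts to dividing each co-occurrence count by the number of occurrences of the independent word. A minimal sketch, with the hypothetical function name `cooccurrence_probabilities` and counts taken from the two-document "cooking teacher" example:

```python
def cooccurrence_probabilities(cooc_counts, word_counts):
    """Apply equation (1): P(c_j/w_i) = M(c_j/w_i) / N_wi for every recorded pair."""
    return {
        (word, co_word): count / word_counts[word]
        for (word, co_word), count in cooc_counts.items()
    }


# N_wi: occurrences of each independent word across both example documents.
word_counts = {"cooking": 2, "teacher": 2, "become": 1, "school": 1}

# M(c_j/w_i): a few rows of the co-occurrence table, keyed as (w_i, c_j).
cooc_counts = {
    ("cooking", "teacher"): 2,
    ("cooking", "school"): 1,
    ("school", "cooking"): 1,
}

probs = cooccurrence_probabilities(cooc_counts, word_counts)
```

Here "cooking" → "school" gets probability 0.5 while "school" → "cooking" gets 1.0, which is why the relation is drawn as a directed link.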

[0027] Next, the present invention executes a co-occurrence count expected value calculation step, which calculates the expected value of the co-occurrence count. The expected value $E(c_j/w_i)$ can be calculated by equation (2):

$$E(c_j/w_i) = N_s \left\{ 1 - (1 - P_{c_j})^m - (1 - P_{w_i})^m + (1 - P_{c_j})^m (1 - P_{w_i})^m \right\} \qquad (2)$$

where
$N_s$: total number of sentences in the document
$N_I$: total number of independent words
$m = N_I / N_s$: average number of independent words per sentence
$P_{c_j} = N_{c_j} / N_I$: occurrence probability of the co-occurring word $c_j$
$P_{w_i} = N_{w_i} / N_I$: occurrence probability of the independent word $w_i$
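Equation (2) can be evaluated directly from four counts. The sketch below assumes the exponent m reading of the formula (the bracketed terms raised to the average number of independent words per sentence); the function name `expected_cooccurrences` and its parameter names are hypothetical.

```python
def expected_cooccurrences(n_word, n_co_word, n_sentences, n_independent):
    """Expected co-occurrence count E(c_j/w_i) following equation (2).

    n_word / n_co_word: occurrences of w_i and c_j (N_wi, N_cj)
    n_sentences: total sentences in the document (N_s)
    n_independent: total independent words in the document (N_I)
    """
    m = n_independent / n_sentences            # average independent words per sentence
    p_word = n_word / n_independent            # P_wi
    p_co_word = n_co_word / n_independent      # P_cj
    miss_word = (1 - p_word) ** m              # chance a sentence lacks w_i
    miss_co_word = (1 - p_co_word) ** m        # chance a sentence lacks c_j
    return n_sentences * (1 - miss_co_word - miss_word + miss_co_word * miss_word)


# Two-sentence example: 6 independent words in total, each word of the pair occurs twice.
expected = expected_cooccurrences(n_word=2, n_co_word=2, n_sentences=2, n_independent=6)
```

The bracketed expression factors as (1 - miss_co_word) * (1 - miss_word), that is, the probability that a sentence contains both words if they occurred independently of each other.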

[0028] Next, a subject-suggesting independent word group selection step is executed. It compares the expected value $E(c_j/w_i)$ with the actual co-occurrence count $M(c_j/w_i)$ and extracts the combinations of an independent word and its co-occurring word that satisfy a given condition. Equation (3) is an example of such a condition:

$$M(c_j/w_i) > E(c_j/w_i) \qquad (3)$$

[0029] If equation (3) is satisfied, the actual co-occurrence count $M(c_j/w_i)$ exceeds the expected value $E(c_j/w_i)$, and the independent word $w_i$ and the co-occurring word $c_j$ can be said to be semantically connected. However, when the vocabulary size $N_B$ is extremely small relative to the total number of independent words $N_I$ in the document, for example in a children's story that repeats a small vocabulary many times, many combinations of independent word and co-occurring word satisfy equation (3) even though their semantic connection is very weak.

[0030] Taking the average number of occurrences per word into account, the co-occurrence relation may therefore be judged strong only when equation (4) is satisfied:

$$M(c_j/w_i) > E(c_j/w_i) + \alpha \cdot N_I / N_B \qquad (4)$$

In equation (4), $\alpha$ can be determined, for example, experimentally.
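The selection rule of equation (4) can be sketched as a simple filter. The function name `select_strong_pairs` and the sample counts are hypothetical; setting `alpha` to 0 reduces the rule to equation (3).

```python
def select_strong_pairs(cooc_counts, expected, n_independent, vocab_size, alpha):
    """Return pairs whose actual count M exceeds E + alpha * N_I / N_B (equation (4))."""
    margin = alpha * n_independent / vocab_size   # alpha times mean occurrences per word
    return {
        pair
        for pair, count in cooc_counts.items()
        if count > expected[pair] + margin
    }


# Hypothetical counts M and expected values E for two pairs in one document.
cooc_counts = {("a", "b"): 5, ("a", "c"): 2}
expected = {("a", "b"): 1.0, ("a", "c"): 1.5}

# With N_I = 20, N_B = 10 and alpha = 1.5 the margin is 3.0.
strict = select_strong_pairs(cooc_counts, expected, 20, 10, alpha=1.5)
loose = select_strong_pairs(cooc_counts, expected, 20, 10, alpha=0.0)
```

Raising `alpha` prunes weakly related pairs in repetitive, small-vocabulary texts, at the cost of showing fewer links.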

[0031] Next, the relationship between the processing of this embodiment and the hardware is described with reference to FIG. 3. The processing of this embodiment is broadly divided into a co-occurrence database creation process (100) and a subject-suggesting independent word network display process (200). To implement these processes, a computer system is used that has a CPU with 16-bit, preferably 32-bit or higher, processing, a main memory, a large-capacity external storage device such as a hard disk drive, and an external display/output device such as a CRT or printer. Processes (100) and (200) are executed by loading the execution programs from the large-capacity external storage device into the CPU. The source text is assumed to have been entered beforehand with application software such as an editor or word processor and stored in the large-capacity external storage device, for example as a text file.

[0032] The co-occurrence database creation process (100) and the subject-suggesting independent word network display process (200) may be implemented on a single computer. Alternatively, they may be split between a co-occurrence database creation device that implements process (100) and a subject-suggesting independent word network display device that implements process (200), with the devices connected by a communication line so that digitized document data, the co-occurrence database and the like can be exchanged. Data transfer between the two devices is of course not limited to communication; for example, data may be recorded on a recording medium such as a floppy disk or CD-ROM and handed over.

[0033] The co-occurrence database creation process (100) and the subject-suggesting independent word network display process (200) are described below.

<Co-occurrence database creation process (100)> The co-occurrence database creation process (100) creates, from digitized document data, a co-occurrence database (KDB) consisting of a co-occurrence table (TBLn), a co-occurrence probability table (TBMn) and an expected value table (TBNn), and records this co-occurrence database (KDB) in the large-capacity external storage device (or main memory).

[0034] In this embodiment, the co-occurrence table (TBL1), co-occurrence probability table (TBM1) and expected value table (TBN1) of Document 1 are merged with the co-occurrence table (TBL2), co-occurrence probability table (TBM2) and expected value table (TBN2) of Document 2 to generate a new co-occurrence table (TBL1 to n), co-occurrence probability table (TBM1 to n) and expected value table (TBN1 to n) for the compound document (1 to n; here n = 2).

[0035] As shown in FIG. 3, the co-occurrence database creation process (100) can be divided into an independent word extraction process (110), a co-occurrence table creation process (120), a co-occurrence probability calculation process (130) and a co-occurrence count expected value calculation process (140).

[0036] The independent word extraction process extracts independent words from the digitized document data and calculates the total number of independent words and the vocabulary size of the document. The co-occurrence table creation process (120) creates a co-occurrence table (TBLn) that registers, for each independent word extracted by the independent word extraction process, its co-occurring words and their co-occurrence counts.

[0037] The co-occurrence probability calculation process (130) calculates, for every independent word recorded in the co-occurrence table (TBLn), the probability that the word co-occurs with each of its co-occurring words. When one independent word has several co-occurring words, a co-occurrence probability is calculated for each of them. The process also creates a co-occurrence probability table (TBMn) that registers, for each independent word, its co-occurring words and the corresponding co-occurrence probabilities.

[0038] The co-occurrence count expected value calculation process (140) calculates, for every independent word recorded in the co-occurrence table (TBLn), the expected number of co-occurrences with each of its co-occurring words. When one independent word has several co-occurring words, an expected value is calculated for each of them. The process then creates an expected value table (TBNn) that registers, for each independent word, its co-occurring words and the corresponding expected co-occurrence counts.

<Subject-suggesting independent word network display process (200)> The subject-suggesting independent word network display process (200) creates a co-occurrence network from the co-occurrence database, outputs the created network to an output device such as a CRT or printer, and retrieves the source text of the co-occurrence network and outputs it to the output device. In this embodiment, a CRT is used as the output device.

[0039] As shown in FIG. 7, the subject-suggesting independent word network display process (200) can be divided into a subject-suggesting independent word group selection process (210), a co-occurrence network generation process (220), a document search process (230) and a source text reference process (240).

[0040] The subject-suggesting independent word group selection process (210) reads, from the large-capacity external storage device, the co-occurrence database (KDB) corresponding to the document data specified by the user, that is, the co-occurrence table, the co-occurrence probability table and the expected value table. For every independent word, it then identifies the combinations of independent word and co-occurring word whose relation (that is, the relation between the expected and the actual co-occurrence count) satisfies equation (4), and creates a table registering these combinations. The constant α in equation (4) is normally set to an initial value of 1.5.

[0041] The co-occurrence network generation process (220) refers to the table created by the subject-suggesting independent word group selection process (210) and creates a co-occurrence network for each document. The created network is displayed on the CRT screen. As shown in FIG. 6, the co-occurrence network consists of character strings representing independent words and co-occurrence lines connecting the strings of independent words that are in a co-occurrence relation. The line type, color, shade, length or thickness of a co-occurrence line differs according to the strength of the co-occurrence relation. These attributes are determined by referring to the co-occurrence probability in the co-occurrence probability table and are set according to its magnitude. Another way to differentiate the strength of co-occurrence relations is to display the two independent words and the co-occurrence line connecting them in a three-dimensional representation and to vary the form of that representation according to the co-occurrence probability. Furthermore, when co-occurrence lines are drawn in shades corresponding to the strength of the relation, the user may choose to make effectively visible only the relations above a certain strength, or to make weaker relations effectively visible as well.
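One way to turn a co-occurrence probability into a link attribute is a linear mapping to line thickness combined with a visibility threshold. The sketch below is only an illustration of that idea: the source does not specify a mapping, and the names `link_width` and `visible_links` as well as the width range are assumptions.

```python
def link_width(probability, min_width=1.0, max_width=5.0):
    """Map a co-occurrence probability in [0, 1] linearly to a line thickness."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return min_width + (max_width - min_width) * probability


def visible_links(probabilities, threshold):
    """Keep only links at or above the user-chosen strength, with their widths."""
    return {
        pair: link_width(p)
        for pair, p in probabilities.items()
        if p >= threshold
    }


# Hypothetical co-occurrence probabilities for two directed links.
probabilities = {("cooking", "teacher"): 1.0, ("cooking", "school"): 0.5}

strong_only = visible_links(probabilities, threshold=0.6)
everything = visible_links(probabilities, threshold=0.0)
```

Lowering the threshold corresponds to the user choosing to see weaker co-occurrence relations as well.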

[0042] When the user specifies multiple documents, the co-occurrence network generation process (220) generates a co-occurrence network for each document. When these networks are displayed on screen, the user can choose between an all-document display, in which they are shown side by side on one screen, and a one-document-at-a-time display, in which they are overlaid on one screen. It is also possible to change the value of the constant α in equation (4), reselect the subject-suggesting independent word group, and thereby change the representation level (co-occurrence level) of the co-occurrence network.

[0043] In the document search process (230), when the user enters a keyword from, for example, a keyboard, the one or more co-occurrence networks generated by the co-occurrence network generation process (220) are searched, and the co-occurrence networks containing the entered keyword are extracted. When a plurality of co-occurrence networks contain the keyword, they are displayed either all at once or one document at a time, according to the user's selection.
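The keyword lookup in the document search process (230) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each co-occurrence network is held as a set of (independent word, co-occurring word) links keyed by a document ID; the function name `networks_containing` and the sample data are hypothetical and do not come from the specification.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: a co-occurrence network is a set of
# (independent word, co-occurring word) links.
Network = Set[Tuple[str, str]]


def networks_containing(keyword: str, networks: Dict[str, Network]) -> List[str]:
    """Return the IDs of documents whose network has the keyword as a node."""
    hits = []
    for doc_id, links in networks.items():
        # A link is a 2-tuple, so `in` checks both endpoints of the link.
        if any(keyword in link for link in links):
            hits.append(doc_id)
    return hits


# Hypothetical sample data.
networks = {
    "doc1": {("Okinawa", "base"), ("base", "problem")},
    "doc2": {("O157", "food poisoning")},
    "doc3": {("Okinawa", "problem")},
}
print(networks_containing("Okinawa", networks))
```

The returned list corresponds to the set of networks that would then be shown either all at once or one document at a time.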

[0044] The original text reference process (240) reads the original text of the document specified by the user from the document data in the mass storage device and displays it on the screen. To specify the original text, the user designates, with a position designation means, the display coordinates of the co-occurrence network of the desired document among the one or more co-occurrence networks displayed on the screen by the document search process (230). The position designation means is, for example, a pointing device such as a mouse. If, before specifying the original text, the user designates with the pointing device the display positions of one or more arbitrary independent words on the co-occurrence network, those independent words are recognized as the second keyword of the present invention. When the original text is then displayed, each sentence containing the second keyword and the sentences immediately before and after it are picked out of the original text and displayed. At this time, the second keyword, or the sentence containing it, is emphasized or distinguished by shading, reverse video, or a similar method. Further, when the original text contains a plurality of sentences that include the keyword, the user can scroll or jump from the currently displayed sentence to the preceding or the following part or sentence that contains the keyword.
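The pick-up of keyword sentences together with their neighbouring sentences can be sketched as follows. This is only an illustration under stated assumptions: sentences are split naively on English end punctuation followed by whitespace, and the function name `pick_sentences` and the returned tuple layout are hypothetical. The boolean flag marks sentences that contain the keyword, which a display layer could use for shading or reverse video.

```python
import re
from typing import List, Tuple


def pick_sentences(text: str, keyword: str) -> List[Tuple[int, str, bool]]:
    """Return (index, sentence, contains_keyword) for each keyword sentence
    and the sentences immediately before and after it."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    hits = [i for i, s in enumerate(sentences) if keyword in s]
    keep = sorted(
        {j for i in hits for j in (i - 1, i, i + 1) if 0 <= j < len(sentences)}
    )
    return [(j, sentences[j], j in hits) for j in keep]


# Hypothetical sample text.
text = (
    "Rain fell. The base issue was debated. Talks continued. "
    "Prices rose. Markets closed. Nothing else happened."
)
for index, sentence, is_hit in pick_sentences(text, "base"):
    print(index, "*" if is_hit else " ", sentence)
```

Jumping to the previous or next keyword sentence then amounts to moving between the entries whose flag is true.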

[0045] Next, for the case where a plurality of documents are targeted, it was examined how the co-occurrence network changes depending on how the logical sum (OR) is taken.
(Experiment 1) Eighteen articles on "Escherichia coli O157" were extracted from the July and August 1996 issues of the Mainichi Shimbun. The co-occurrence network obtained by concatenating all the articles into one document and extracting the subject was compared with the co-occurrence network obtained by OR-merging the per-document co-occurrence networks.
(Experiment 2) Using the "O157" articles of Experiment 1 and articles on the "Garuda Indonesia aircraft crash" extracted from the September and October 1997 issues of the Mainichi Shimbun, it was observed how the co-occurrence network changes with the ratio of the numbers of documents when a plurality of topics are present in a document group.

[0046] The respective results are shown in FIGS. 4 to 10. FIG. 4 shows, in co-occurrence network form, the result of concatenating all the documents of Experiment 1 and creating a single co-occurrence table from them. FIG. 5 shows, in co-occurrence network form, the result of creating a co-occurrence table for each document of Experiment 1 and merging these tables. FIG. 6 tabulates the vocabulary size, the total number of independent words, and the total number of documents.

[0047] As a result, in Experiment 1, the network built from the concatenated documents and the OR-merged network were exactly the same. At the same time, it was confirmed that taking the OR of the co-occurrence networks extracts word groups that characterize the "O157" document group, such as "O", "157", "food poisoning", and "Escherichia coli".

[0048] FIGS. 7 to 9 show, in co-occurrence network form, the article group of Experiment 2, which divides into two topics, with the ratio between the two topics varied. FIG. 10 shows the document ratios used.

[0049] In Experiment 2, it was confirmed that, as the ratio of the numbers of documents belonging to each of the two topics is varied, the word groups representing each topic are extracted almost linearly. This shows that even when a document group contains a plurality of topics, the topics it contains can be read from the co-occurrence network.

[0050] Next, it is considered whether useful information can be obtained by taking the logical product (AND) of co-occurrence networks.

[0051] If, as when taking the OR of co-occurrence networks, the AND is taken from the co-occurrence tables, one simply obtains the co-occurrence pairs contained in all the documents. These co-occurrence pairs are thought to carry the central topic of the document group. However, a co-occurrence pair may represent the subject of one document without representing the subject of another; the mere fact that it is contained in both documents does not mean that it represents the central topic of the two documents.

[0052] It is therefore considered whether an index can be obtained of how strongly a co-occurrence pair obtained by taking the AND represents the central topic of the document group.

[0053] Between two independent words in a co-occurrence relationship, the co-occurrence probability can be defined by equation (1) above as the strength of their association.

[0054] Here, attention is paid to the fact that, by computing the mutual information from the probabilities of two events, the difference between the two events can be expressed as an amount of information.

[0055] Next, for the co-occurrence probabilities of a co-occurrence pair that is present in both of two documents A and B, the mutual information can be defined as in equation (5) shown in FIG. 11. Here, P_A(ci/wi) is the co-occurrence probability of the independent word wi and the co-occurring word cj in document A, P_B(ci/wi) is the co-occurrence probability of the independent word wi and the co-occurring word cj in document B, and P_{AB}(ci/wi) is the co-occurrence probability of the independent word wi and the co-occurring word cj when documents A and B are OR-merged.

[0056] Using the mutual information of equation (5), the strength of association between the two documents can be expressed for each co-occurrence pair that is present in both documents.

[0057] Accordingly, the association of a co-occurrence pair that is present in many documents at once can be regarded as the average of the mutual information over all two-document combinations of the documents in which the pair exists. This value is given by equation (6) shown in FIG. 12.

[0058] The smaller this average mutual information, the more central the topic that the co-occurrence pair indicates.

[0059] By using mutual information, an index is obtained of how strongly the co-occurrence pairs that make up the AND of the co-occurrence networks, that is, the pairs contained in every document of the group, represent the central topic of the document group.

[0060] From the above, the AND and the OR of a plurality of co-occurrence networks can now be obtained. Because the AND and the OR are obtained by bottom-up processing from the documents (a search process that builds up in order from more concrete objects to more abstract ones), they can be computed automatically by a computer. Since the AND represents the central topic of the document group, and the OR must contain the AND, topics are defined as follows (see FIG. 13).
"Central topic": Obtained by taking the AND of a plurality of co-occurrence networks. Its constituent co-occurrence pairs are contained in all the documents, and these pairs are defined as the "central topic". The strength of association of each co-occurrence pair can be obtained as the average mutual information.
"Peripheral topic": Taking the OR of a plurality of co-occurrence networks shows what topics are present in the document group. Since the AND, that is, the "central topic", must be contained in the OR, the OR reveals what topics other than the "central topic" exist in the document group; these topics are defined as "peripheral topics". Furthermore, for each co-occurrence pair making up a "peripheral topic", the number of documents in the group that contain the pair can be presented.
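The split into a "central topic" (AND) and "peripheral topics" (the remainder of the OR) can be sketched as below. This is a minimal illustration, assuming each document is reduced to the set of co-occurrence pairs in its co-occurrence table; the function name `central_and_peripheral` and the sample data are hypothetical. The average mutual information of equation (6) is deliberately left out, because the equation appears only in FIG. 12 and is not reproduced in the text.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (independent word, co-occurring word)


def central_and_peripheral(
    doc_pairs: Dict[str, Set[Pair]],
) -> Tuple[Set[Pair], Dict[Pair, int]]:
    """Return (central pairs, {peripheral pair: number of documents containing it})."""
    # Each document contributes a pair at most once because its pairs form a set.
    doc_count = Counter(p for pairs in doc_pairs.values() for p in pairs)
    n_docs = len(doc_pairs)
    central = {p for p, n in doc_count.items() if n == n_docs}          # AND
    peripheral = {p: n for p, n in doc_count.items() if n < n_docs}     # OR minus AND
    return central, peripheral


# Hypothetical sample data.
docs = {
    "d1": {("Okinawa", "base"), ("base", "problem")},
    "d2": {("Okinawa", "base"), ("Japan", "US")},
    "d3": {("Okinawa", "base"), ("base", "problem")},
}
central, peripheral = central_and_peripheral(docs)
print(central)
print(peripheral)
```

The per-pair document counts in `peripheral` are the figures that the text says can be presented alongside each "peripheral topic".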

[0061] Because the "central topic" and the "peripheral topics" are in a containment relationship, they can be displayed in the same co-occurrence network.

[0062] Turning to the "peripheral topics", their constituent co-occurrence pairs are extracted from documents belonging to the search space, and each pair is contained in fewer documents than the total number of documents in the search space. Using a "peripheral topic" as a search condition therefore allows further narrowing, as shown in FIG. 14.

[0063] When the searcher selects a "peripheral topic" according to the information needed, a new search space can be generated whose new "central topic" consists of the "central topic" of the original search space together with the "peripheral topic" selected by the searcher. This shrinks the search space, which constitutes narrowing. If the searcher selects a different search condition, the narrowing leads to a different search space.

[0064] Because the co-occurrence network of the search space is obtained by bottom-up processing, every co-occurrence pair making up a "peripheral topic" is guaranteed to exist in a document belonging to the search space. Moreover, the number of documents in which a "peripheral topic" exists is always smaller than the number of documents belonging to the search space, so the search space is reliably narrowed.
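The narrowing step described above can be sketched in a few lines. The sketch assumes the same hypothetical representation as before (each document reduced to its set of co-occurrence pairs); the function name `narrow` is illustrative. It shows the two properties the text relies on: the narrowed space is smaller than the original one, and the selected pair joins the pairs common to every remaining document.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
Space = Dict[str, Set[Pair]]


def narrow(space: Space, selected: Pair) -> Space:
    """Keep only the documents whose co-occurrence pairs include the selected pair."""
    return {doc_id: pairs for doc_id, pairs in space.items() if selected in pairs}


# Hypothetical sample data.
space = {
    "d1": {("Okinawa", "base"), ("base", "problem")},
    "d2": {("Okinawa", "base"), ("Japan", "US")},
    "d3": {("Okinawa", "base"), ("base", "problem")},
}
narrowed = narrow(space, ("base", "problem"))
common = set.intersection(*narrowed.values())  # pairs shared by all remaining documents
print(sorted(narrowed), common)
```

Because a peripheral pair is by definition missing from at least one document, selecting it always removes at least one document from the space.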

[0065] Also, even when a plurality of topics exist in the search space, narrowing is not performed by first identifying a topic; rather, the topic becomes clearer as the narrowing proceeds.

[0066] Having the searcher select a "peripheral topic" means that the search system itself presents the important words of the search space, which also makes it possible to reduce search omissions caused by synonyms.

[0067] Furthermore, narrowing is performed by matching on the "peripheral topics", which represent the characteristics of the document group, so matching against words that do not characterize the documents can be suppressed.

[0068] It has been described that narrowing can be performed by using a "peripheral topic" as a search condition. In addition, the following information may be displayed during the narrowing process.
"Total number of documents in the search space": This is the total document count that conventional search also presents, and it serves as an index of how much further the searcher should narrow. It may be displayed directly as a number on the co-occurrence network screen, or in a box window opened on the screen.
"Number of documents for a 'peripheral topic'": As noted above, each co-occurrence pair making up a "peripheral topic" carries information on which documents contain it, so the number of documents can be presented. This number indicates how large the search space will become if that "peripheral topic" is selected for narrowing, and lets the searcher gauge the risk of a search condition. It too may be displayed directly as a number on the co-occurrence network screen, or in a box window opened on the screen.
"Average mutual information of the 'central topic'": By watching the average mutual information of the "central topic" during narrowing, the searcher can see how the "central topic" changes state and judge whether the search space is the intended one. Further, when a "peripheral topic" newly selected as a search condition becomes part of the "central topic", the searcher can also judge whether the selected search condition was effective.

[0069] The above resolves one of the problems in searching, namely that "the searcher has little information available", and allows the searcher to judge whether the intended search space has been obtained. This information too may be displayed directly as numbers on the co-occurrence network screen, or in a box window opened on the screen.

[0070] Next, the search flow of this embodiment is described with reference to FIG. 15.

[0071] First, documents are narrowed down from the entire search space (step 1501).

[0072] Next, a co-occurrence network is generated for the new search space formed by the documents thus retrieved. At the same time, the "central topic" and the "peripheral topics" are computed, and as additional information the average mutual information and the number of documents are presented for each of their co-occurrence pairs (step 1502).

[0073] Next, the searcher selects one co-occurrence pair making up a "peripheral topic" according to the information needed.

[0074] The search space is then narrowed to the documents in which the selected co-occurrence pair exists (step 1503). That is, process 2 is applied to the narrowed documents. Next, the searcher judges whether the narrowed search space is satisfactory (step 1504).

[0075] If the searcher is not satisfied with the search result in step 1504, a backtrack is performed: the process returns to the search space before narrowing and repeats the processing from step 1502 onward.

[0076] Steps 1502 to 1504 are repeated until the searcher signals the end of narrowing or only one document remains. When narrowing ends or only one document remains, a co-occurrence network is presented for each document belonging to the search space (step 1505).
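The loop over steps 1502 to 1504, including the backtrack on an unsatisfactory result, can be sketched non-interactively as follows. This is an assumption-laden illustration rather than the flow of FIG. 15 itself: the searcher's choices are supplied as a list of co-occurrence pairs, and the judgment of step 1504 is modelled by a caller-supplied predicate. The names `run_search`, `choices`, and `satisfied` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]
Space = Dict[str, Set[Pair]]


def run_search(
    space: Space, choices: Iterable[Pair], satisfied: Callable[[Space], bool]
) -> Space:
    """Narrow the space pair by pair, backtracking when a result is rejected."""
    for pair in choices:                 # searcher picks a peripheral pair
        if len(space) <= 1:              # stop once a single document remains
            break
        # Step 1503: keep the documents that contain the selected pair.
        candidate = {d: p for d, p in space.items() if pair in p}
        # Step 1504: accept the narrowed space, or backtrack by keeping the old one.
        if candidate and satisfied(candidate):
            space = candidate
    return space                         # step 1505 would display each document


# Hypothetical sample data.
space = {
    "d1": {("Okinawa", "base"), ("base", "problem")},
    "d2": {("Okinawa", "base"), ("Japan", "US")},
    "d3": {("Okinawa", "base"), ("base", "problem"), ("Japan", "US")},
}
picks = [("Japan", "US"), ("base", "problem")]
print(sorted(run_search(space, picks, lambda s: True)))
```

Passing a stricter predicate, for example one that rejects any space without a particular document, exercises the backtrack branch.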

[0077]

[Experimental Example] A search system was implemented in order to realize the present invention concretely. The functions implemented are as follows.
(1) Back function and forward function: These return the search process to one step earlier and advance it one step, respectively. Concretely, they can be realized by temporarily storing the search log in memory.

[0078] That is, when the searcher is not satisfied with a search space, a backtrack operation is performed, so a back function for returning to the previous search space is needed. Conversely, after backtracking the searcher may decide that the search condition was correct after all, so a forward function is also needed.
(2) Rearrangement function for the independent word nodes of the co-occurrence network: Search results are shown by displaying a visual co-occurrence network on the screen; when this network is hard to read, a function for rearranging the independent word nodes is needed.
(3) Co-occurrence probability threshold change function: Because the co-occurrence pairs making up a co-occurrence network are linked with differing strengths, changing the threshold changes the amount of information in the network as a whole. A function for changing the network threshold is therefore needed.
(4) Others: Functions were provided for presenting the threshold and the total number of documents of the current search space, for color-coding the "central topic" and the "peripheral topics", and for showing the average mutual information or the number of documents at the midpoint of each co-occurrence pair's link.
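The back and forward functions of item (1) can be sketched as a small in-memory history of search spaces. The class name `SearchHistory` and its methods are hypothetical; the text states only that the search log is kept temporarily in memory, so the list-plus-cursor design here is an assumption.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class SearchHistory(Generic[T]):
    """In-memory log of search spaces with a cursor for back/forward."""

    def __init__(self, initial: T) -> None:
        self._states: List[T] = [initial]
        self._pos = 0

    def push(self, state: T) -> None:
        """Record a newly narrowed space, discarding any forward entries."""
        del self._states[self._pos + 1:]
        self._states.append(state)
        self._pos += 1

    def back(self) -> T:
        """Return to the previous search space, if there is one."""
        if self._pos > 0:
            self._pos -= 1
        return self._states[self._pos]

    def forward(self) -> T:
        """Redo a step that was undone with back(), if there is one."""
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self._states[self._pos]


# Hypothetical usage: states are lists of document IDs.
history = SearchHistory(["d1", "d2", "d3"])
history.push(["d1", "d3"])
print(history.back())
print(history.forward())
```

Dropping the forward entries on `push` mirrors the usual browser-style behaviour: once a different condition is chosen after backtracking, the abandoned branch can no longer be reached with the forward function.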

[0079] FIG. 16 shows an example of the display screen as implemented with these functions.

[0080] These programs were written in the C language on a UNIX workstation.

[0081] FIGS. 21 to 25 show display screens for displaying co-occurrence networks. In the screen layout, a window 2112 for displaying the co-occurrence network is open, and various buttons and display fields are arranged to the right of the window 2112. Each button displayed here can be operated with a coordinate indicating means such as a mouse.

[0082] The mode display section 2102 displays the search mode (Search Mode) or the read mode (Read Mode). The back button 2103 (Back) returns to the previous search display, and the forward button 2104 (Forward) advances the display that was returned with the back button so as to restore it.

[0083] The rearrange button 2105 (Replace) rearranges the positions of the independent words of the co-occurrence network displayed in the window 2112. The change button 2106 (Change) is a button for changing the threshold; as shown at the right edge of FIG. 12, it changes so that the numerical value of the threshold can be edited.

[0084] The threshold display section 2107 is a small window that displays the current threshold; in the figure, 3.5 is displayed as the threshold.

[0085] The total document count display section 2108 is a small window that displays the total number of documents in the search space; in the figure, 27 is displayed as the number of documents (number of files).

[0086] The mode switching button 2110 switches between the search mode and the read mode.

[0087] Because the document group of a given period contains topics particular to that period, records of a television editorial program broadcast from October 1995 to December 1997 were prepared, and a search experiment was performed.

[0088] First, the initial search space was generated by two methods, keyword matching and time-series co-occurrence networks, and narrowing was then performed.

[0089] In the drawings that follow, thick line segments of a co-occurrence network represent the "central topic", and the number on such a segment is the average mutual information. Thin line segments represent "peripheral topics", and the number on such a segment is the number of documents to which the search would be narrowed if that "peripheral topic" were selected.

[0090] In the search space of the editorial news manuscripts described above, matching co-occurrence pairs on the keywords "Okinawa" and "base" retrieved 27 documents. The resulting co-occurrence network is shown in FIG. 17(a), the upper part of FIG. 17.

[0091] To evaluate the effect of choosing different narrowing conditions for the resulting search space, FIG. 17(b) shows the co-occurrence network obtained when "Okinawa" and "problem" are given. FIG. 17(c) shows the co-occurrence network obtained when "Japan" (日) and "US" (米) are given to (a).

[0092] FIG. 22 shows the interface for setting and changing the narrowing conditions. In the figure, a small narrowing window 2201 is displayed inside the window 2112 and shows the words used for narrowing.

[0093] FIGS. 18 and 19 show the results of search experiments that start from time-series co-occurrence networks. FIG. 18 is the result of dividing the 1996 editorial news manuscripts into four three-month periods starting in January, selecting the co-occurrence network for April to June, and then narrowing with "democracy" (民主) and "ism" (主義). FIG. 19, which evaluates whether a search can be performed from a broader time series, is the result of dividing the two years 1996 and 1997 into four three-month periods, selecting April to June as before, and narrowing with "democracy" and "ism".

[0094] Narrowing was performed in each experiment, and in each case the search space was narrowed to one whose "central topic" consists of the "central topic" of the search space before narrowing and the selected "peripheral topic". As a result, new co-occurrence links were formed, such as "problem" and "base" or "US military" and "base" in FIG. 17(b), and "peripheral topics" that were not in the search space before narrowing appeared, such as "safety" (安全) and "security" (保障) in FIG. 17(c). Because the co-occurrence network display visualizes the state of the search space in this way, the searcher can easily judge whether the search space is the intended one.

[0095] Also in FIG. 17, the search space generated differs with the "peripheral topic" selected, and it can be seen that narrowing leads to a search space related to the selected "peripheral topic". Meanwhile, the average mutual information of "Okinawa" and "base", the original "central topic", changes as shown in FIG. 20, indicating that the direction of the search is not mistaken. In addition, the number of documents is actually narrowed to the value shown on the line segment of the "peripheral topic", so the searcher can know the risk when choosing a search condition.

[0096] The same can be observed with the method of narrowing from a time series.

[0097] Furthermore, when searching a document group that has a time series, exploiting its characteristic that topics are concentrated in particular periods makes it possible to reduce search omissions caused by synonyms, which is one of the problems in searching.

[0098] From the above, this search method can be said to be efficient and effective for searching from a state in which the search space has already been narrowed to some extent.

[0099] FIG. 24 shows an example of document display in the read mode. FIG. 25 shows the state in which, in the read mode, the co-occurrence networks of the retrieved documents are displayed one after another.

[0100]

[Effects of the Invention] According to the present invention, in a search targeting a plurality of documents, displaying the peripheral topics extracted under the OR condition together with the central topic extracted under the AND condition makes it possible to perform heuristic exploration on a collection of documents, that is, a flexible search that includes obtaining analysis results the searcher did not anticipate in advance.

[Brief description of the drawings]

【図１】本発明の共起テーブルの統合（マージ）につ
いて説明するための図FIG. 1 is a diagram for explaining integration (merging) of co-occurrence tables according to the present invention;

【図２】簡単な共起ネットワークを示す説明図FIG. 2 is an explanatory diagram showing a simple co-occurrence network

【図３】本発明の検索支援システムを構成するブロッ
ク図FIG. 3 is a block diagram of a search support system according to the present invention;

【図４】２文書を結合して共起ネットワークを表示し
た説明図FIG. 4 is an explanatory view showing a co-occurrence network by combining two documents.

【図５】２文書の共起テーブルをマージして共起ネッ
トワークを構成した説明図FIG. 5 is an explanatory diagram in which a co-occurrence network is configured by merging co-occurrence tables of two documents.

【図６】２文書を繋げて主題抽出した場合とマージし
て主題抽出した場合との比較表FIG. 6 is a comparison table between a case where two documents are connected and the subject is extracted and a case where the subject is extracted by merging.

【図７】ニュース論説記事からの共起ネットワークを
表示した図（ｉ）FIG. 7 shows a co-occurrence network from a news editorial article (i).

【図８】ニュース論説記事からの共起ネットワークを
表示した図（ｉｉ）FIG. 8 is a diagram showing a co-occurrence network from a news editorial article (ii).

【図９】ニュース論説記事からの共起ネットワークを
表示した図（ｉｉｉ）FIG. 9 is a diagram showing a co-occurrence network from a news editorial article (iii).

【図１０】共起ネットワークのマージ比率を示した比
較表FIG. 10 is a comparison table showing a merge ratio of a co-occurrence network.

【図１１】２文書で同時に存在する共起対の共起確率
に対して相互情報量を定義した式（５）FIG. 11 is an equation (5) that defines mutual information with respect to the co-occurrence probability of a co-occurrence pair existing simultaneously in two documents.

【図１２】共起対の結び付きを示すための存在する文
書の総２組み合わせの相互情報量の平均を示す式（６）FIG. 12 is an equation (6) showing an average of mutual information of two total combinations of existing documents to show the association of co-occurrence pairs.

【図１３】共起ネットワークからの中心話題（ＡＮ
Ｄ）と周辺話題（ＯＲ）の抽出概念を示す説明図FIG. 13 shows a central topic (AN) from a co-occurrence network.
Explanatory diagram showing the concept of extraction of D) and surrounding topics (OR)

【図１４】周辺話題を用いた絞り込みの概念を示す説
明図FIG. 14 is an explanatory diagram showing a concept of narrowing down using peripheral topics.

【図１５】検索空間から絞り込みを行う過程を示す説
明図FIG. 15 is an explanatory diagram showing a process of narrowing down from a search space.

FIG. 16 is a diagram showing an example of a display screen of a co-occurrence network.

FIG. 17 is a diagram showing a co-occurrence network search starting from keyword matching.

FIG. 18 is a diagram (1) showing a search from a time-series co-occurrence network.

FIG. 19 is a diagram (2) showing a search from a time-series co-occurrence network.

FIG. 20 is a table showing the change in the average mutual information of the extracted words in an experimental example.

FIG. 21 is a diagram (1) showing a display screen of the present embodiment.

FIG. 22 is a diagram (2) showing a display screen of the present embodiment.

FIG. 23 is a diagram (3) showing a display screen of the present embodiment.

FIG. 24 is a diagram (4) showing a display screen of the present embodiment.

FIG. 25 is a diagram (5) showing a display screen of the present embodiment.

[Explanation of symbols]

2101 window
2102 mode display section
2103 back button
2104 forward button
2105 rearrange button
2106 change button
2107 threshold display section
2108 total number of documents display section
2110 mode switching button
2111 end button
2112 network display section

Claims

1. A search support method for a document database, comprising: a step of extracting independent words from a given first plurality of documents and creating, for each document, a co-occurrence table that records, for each independent word, the words co-occurring with it and the number of co-occurrences; a peripheral topic generating step of generating a logical sum (OR) network from the co-occurrence tables generated for each document; a central topic generating step of generating a logical product (AND) network from the co-occurrence tables generated for each document; a step of displaying the logical sum network including the logical product network and prompting input of a word for narrowing down; a step of narrowing down to a second document group that includes the input word; and a step of repeating the creation of the co-occurrence tables and the generation of the peripheral topics and the central topics, with the second document group taken as the first document group.

2. The search support method for a document database according to claim 1, wherein the central topic generating step calculates the strength of a co-occurrence pair that exists simultaneously in a plurality of documents by using the mutual information between the plurality of documents.

3. A storage medium storing a search support program for a document database, the program comprising: a step of extracting independent words from a given first plurality of documents and creating, for each document, a co-occurrence table that records, for each independent word, the words co-occurring with it and the number of co-occurrences; a peripheral topic generating step of generating a logical sum (OR) network from the co-occurrence tables generated for each document; a central topic generating step of generating a logical product (AND) network from the co-occurrence tables generated for each document; a step of displaying the logical sum network including the logical product network and prompting input of a word for narrowing down; a step of narrowing down to a second document group that includes the input word; and a step of repeating the creation of the co-occurrence tables and the generation of the peripheral topics and the central topics, with the second document group taken as the first document group.
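
The steps recited in the claims (per-document co-occurrence tables, the OR and AND networks, and narrowing by an input word) can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: independent words are taken as already extracted, two words are treated as co-occurring when they appear in the same sentence, the weight of an AND-network pair is the sum of its per-document counts (the mutual-information weighting of claim 2 is not modeled), and all function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_table(sentences):
    """Count, for one document, how many sentences each word pair shares.

    `sentences` is a list of lists of already-extracted independent words
    (assumption: the sentence is the co-occurrence window).
    """
    table = Counter()
    for sentence in sentences:
        # Sort so that each unordered pair has a single canonical key.
        for pair in combinations(sorted(set(sentence)), 2):
            table[pair] += 1
    return table


def or_network(tables):
    """Peripheral topics: pairs present in at least one document."""
    merged = Counter()
    for table in tables:
        merged.update(table)  # adds counts pair by pair
    return merged


def and_network(tables):
    """Central topics: pairs present in every document.

    The summed count used as a weight is an illustrative assumption.
    """
    if not tables:
        return Counter()
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return Counter({pair: sum(t[pair] for t in tables) for pair in common})


def narrow(docs, word):
    """Keep only the documents that contain the narrowing word."""
    return {
        name: sentences
        for name, sentences in docs.items()
        if any(word in sentence for sentence in sentences)
    }


# Hypothetical three-document search space.
docs = {
    "d1": [["bank", "rate", "cut"], ["bank", "loan"]],
    "d2": [["bank", "rate", "rise"]],
    "d3": [["rate", "cut", "market"]],
}

# One round of narrowing: the second document group becomes the new
# first document group, and both networks are regenerated from it.
group = narrow(docs, "bank")
tables = [cooccurrence_table(s) for s in group.values()]
print(sorted(or_network(tables)))
print(sorted(and_network(tables)))
```

In this sketch the AND network is always a subset of the OR network, which matches the claim wording that the displayed logical sum network includes the logical product network. Repeating `narrow` with a further word selected from the displayed OR network corresponds to the repeating step.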