JP2012242966A

JP2012242966A - Knowledge acquisition device, knowledge acquisition method, and program

Info

Publication number: JP2012242966A
Application number: JP2011110739A
Authority: JP
Inventors: Tetsuro Takahashi; Nobuyuki Igata
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-05-17
Filing date: 2011-05-17
Publication date: 2012-12-10
Anticipated expiration: 2031-05-17
Also published as: JP5594225B2

Abstract

Problem: To provide a knowledge acquisition device capable of outputting results more quickly.
Solution: A knowledge acquisition device comprising: first storage means storing a word pair table that associates pairs of words having a specific relationship with the morpheme structures of sentences containing those pairs; occurrence count acquisition means that creates a search key by adding a specific event limiting the search target to a word pair and a morpheme structure extracted from the first storage means, searches second storage means storing the group of documents to be searched using the search key to obtain the number of occurrences of the search key, and stores the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and evaluation means that refers to the word pair table and outputs an evaluation value assessing the relevance between each word pair and the specific event, based on the degree of agreement between the occurrence tendency per morpheme structure for each word pair and the occurrence tendency per morpheme structure for all word pairs.
Selected drawing: Figure 3

Description

The present invention relates to a technique for using a computer to acquire knowledge such as word pairs from a group of data, and more particularly to a knowledge acquisition device, a knowledge acquisition method, and a program for acquiring knowledge about the relationship between a specific event and word pairs.

Searching an accessible group of data for data containing a keyword (search key), for example over the Internet, has long been common practice. The data obtained as search results is mainly electronic documents.

Knowledge acquired through such data searches can be applied to the creation of dictionaries and encyclopedias, the provision of electronic information services, and the like. For example, searching a group of data with search keys such as "*は**に効く" ("* works for **") or "*は**に効果がある" ("* is effective for **") can be expected to yield multiple word pairs of the form "an illness" and "a food that is effective against it". Here, "*" and "**" mark the parts of the search key whose content is left unspecified.

Hereinafter, a pair of words having a specific relationship obtained in this way is referred to as a word pair. In the above case, "cold" and "daikon radish", "cold" and "ginger", "headache" and "umeboshi (pickled plum)", and so on may be acquired as word pairs. Beyond "* works for **", many other kinds of word pairs are conceivable, such as "a place" and "a cleaning tool suited to it", "a season" and "a dish", or "an ingredient" and "a seasoning". Collecting such word pairs exhaustively is expected to make useful information services possible.

Meanwhile, the parts other than "*" and "**", namely "は" / "に効く" ("works for") and "は" / "に効果がある" ("is effective for"), can be regarded as extraction rules for acquiring word pairs. Hereinafter, such an extraction rule is referred to as a "context pattern".
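The wildcard search keys described above can be pictured as regular expressions. The following is a minimal sketch under simplifying assumptions: it works on surface strings rather than on morphological analysis, and the names `wildcard_to_regex` and `find_pairs` are hypothetical, not taken from the specification.

```python
import re

def wildcard_to_regex(search_key):
    """Turn a key such as "*は**に効く" into a regex with two named slots."""
    body = re.escape(search_key)
    # Replace the two-asterisk wildcard first so it is not read as two "*".
    body = body.replace(re.escape("**"), r"(?P<second>\w+)")
    body = body.replace(re.escape("*"), r"(?P<first>\w+)")
    return re.compile(body)

def find_pairs(search_key, sentences):
    """Return the (first, second) slot fillers for every matching sentence."""
    regex = wildcard_to_regex(search_key)
    return [(m["first"], m["second"]) for m in map(regex.search, sentences) if m]

# Illustrative sentences (assumed data):
# "Daikon works for colds." / "It is sunny today." / "Umeboshi works for headaches."
sentences = ["大根は風邪に効く。", "今日は晴れだ。", "梅干しは頭痛に効く。"]
print(find_pairs("*は**に効く", sentences))
```

Here the fixed text between the slots plays the role of the context pattern, and the slot fillers are the candidate word pair.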

Word pairs and context patterns can be acquired from large-scale data by a computer using an automatic algorithm (see, for example, Patent Document 1 and Non-Patent Document 1).

Specifically, a known word pair, called a seed, is first given to the computer. The computer searches the large-scale data for context patterns that contain the seed. Once context patterns are obtained, it searches for documents containing those context patterns and acquires unknown word pairs from the documents found. Repeating these steps increases the number of word pairs and context patterns. Ultimately, multiple word pairs and multiple context patterns are acquired as knowledge, and a database usable for applications such as dictionaries and encyclopedias is created.
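The iterative procedure described above (seed, then patterns, then new pairs, repeated) can be sketched as follows. This is an illustration only, assuming a small in-memory list of sentences and plain string matching; the cited prior art is not claimed to work this way, and `to_pattern`, `to_regex`, and `bootstrap` are hypothetical names.

```python
import re

def to_pattern(sentence, pair):
    """Replace the pair's words with slots {A} and {B}; None if a word is missing."""
    a, b = pair
    if a not in sentence or b not in sentence:
        return None
    return sentence.replace(a, "{A}", 1).replace(b, "{B}", 1)

def to_regex(pattern):
    """Compile a slot pattern into a regex with named groups A and B."""
    body = re.escape(pattern)
    body = body.replace(re.escape("{A}"), r"(?P<A>\w+)")
    body = body.replace(re.escape("{B}"), r"(?P<B>\w+)")
    return re.compile(body)

def bootstrap(sentences, seeds, max_rounds=10):
    """Alternate between deriving patterns from pairs and pairs from patterns."""
    pairs, patterns = set(seeds), set()
    for _ in range(max_rounds):
        found_patterns = {to_pattern(s, p) for s in sentences for p in pairs}
        found_patterns.discard(None)
        found_pairs = set()
        for pattern in patterns | found_patterns:
            regex = to_regex(pattern)
            for s in sentences:
                m = regex.fullmatch(s)
                if m:
                    found_pairs.add((m["A"], m["B"]))
        if found_patterns <= patterns and found_pairs <= pairs:
            break  # nothing new was learned in this round
        patterns |= found_patterns
        pairs |= found_pairs
    return pairs, patterns

# Assumed toy corpus; pairs are written as (illness, food).
sentences = [
    "大根は風邪に効く",            # daikon works for colds
    "生姜は風邪に効く",            # ginger works for colds
    "風邪には生姜が効果がある",    # ginger is effective for colds
    "頭痛には梅干しが効果がある",  # umeboshi is effective for headaches
]
pairs, patterns = bootstrap(sentences, {("風邪", "大根")})
print(sorted(pairs))
print(sorted(patterns))
```

Starting from a single seed, the second pattern is only discovered after the pair ("風邪", "生姜") has been learned, which is why the loop has to run more than once.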

Conditions such as "the number of occurrences is at or above a predetermined number" may be imposed to exclude knowledge that appears not to be generally accepted (noise). It is also conceivable to weight the context patterns by interposing user input of various settings in the computer's processing.

Patent Document 1: US Patent No. 7,146,308

Non-Patent Document 1: Stijn De Saeger et al., "Large-scale semantic relation acquisition by pattern learning using semantic word classes" (in Japanese), 16th Annual Meeting of the Association for Natural Language Processing, pp. 932-935, 2010

When knowledge is acquired as described above, there is a need to collect data limited to a specific event relating to the data (electronic documents). Typical examples of a specific event are categories such as region, industry, or field. Hereinafter, for simplicity, a specific event is referred to as a category.

Specifically, when acquiring word pairs such as the aforementioned "an illness" and "a food that is effective against it", conceivable requests include (1) using only word pairs acquired from authoritative medical documents and, conversely, (2) exhaustively collecting grassroots word pairs. Depending on where the user lives, for example, there may also be requests such as (3) collecting word pairs that are accepted wisdom only in western Japan.

However, the process by which a computer acquires word pairs and context patterns from large-scale data with an automatic algorithm involves many iterations, so starting that process only after a category has been specified makes the processing time long. As a result, it becomes likely that the desired results cannot be provided to the user quickly.

Alternatively, word pairs and context patterns could be acquired for each category in advance and stored in a database, but the amount of data would then be enormous and the demand on resources high. Moreover, it is not realistic to predict every category that a user might specify.

The present invention is intended to solve these problems, and its main object is to provide a knowledge acquisition device, a knowledge acquisition method, and a program capable of outputting results more quickly.

One aspect for achieving the above object is a knowledge acquisition device comprising:
first storage means storing a word pair table that associates pairs of words having a specific relationship with the morpheme structures of sentences containing those pairs;
occurrence count acquisition means that creates a search key by adding a specific event limiting the search target to a word pair and a morpheme structure extracted from the first storage means, searches second storage means storing the group of documents to be searched using the search key to obtain the number of occurrences of the search key, and stores the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
evaluation means that refers to the word pair table and outputs an evaluation value assessing the relevance between each word pair and the specific event, based on the degree of agreement between the occurrence tendency per morpheme structure for each word pair and the occurrence tendency per morpheme structure for all word pairs.

According to the present invention, a knowledge acquisition device and the like capable of outputting results more quickly can be provided.

An overall view of an information system including a knowledge acquisition device 1 according to an embodiment of the present invention.
A hardware configuration example of the knowledge acquisition device 1 of this embodiment.
A functional configuration example of the knowledge acquisition device 1 of this embodiment.
A diagram schematically showing the overall processing executed by the knowledge acquisition device 1 of this embodiment.
A diagram schematically showing how an operator 110 inputs a seed to a client computer 100 and the knowledge acquisition device 1 generates a word pair table 40.
An example of the word pair table 40 generated by a word pair acquisition unit 30.
A diagram schematically showing how a user 120 inputs a query to the client computer 100 and the knowledge acquisition device 1 generates an occurrence count table 42.
An example of the occurrence count table 42 generated by a search unit 32.
A diagram schematically showing how the knowledge acquisition device 1 generates scored word pairs 44.
An example of the scored word pairs 44 output by a score calculation unit 34.
A schematic diagram comparing processing that could be executed by a conventional device with processing executed by the knowledge acquisition device 1 of this embodiment.
A flowchart showing the flow of characteristic processing executed by the word pair acquisition unit 30.
A flowchart showing the flow of context pattern extraction processing executed by the word pair acquisition unit 30.
A diagram schematically showing the morpheme structure used in the analysis of S410.
A flowchart showing the flow of characteristic processing executed by the search unit 32.
A flowchart showing the flow of characteristic processing executed by the score calculation unit 34.

Modes for carrying out the present invention are described below by way of embodiments, with reference to the accompanying drawings.

A knowledge acquisition device 1 according to an embodiment of the present invention is described below with reference to the drawings.

[Hardware configuration]
FIG. 1 is an overall view of an information system including the knowledge acquisition device 1 according to an embodiment of the present invention. As illustrated, the knowledge acquisition device 1 is, for example, a server device to which one or more client computers 100 are connected via a network 50. The knowledge acquisition device 1 can also access large-scale documents 200 via the network 50. The network 50 includes the Internet, mobile phone and PHS (Personal Handy-phone System) radio networks, LANs (Local Area Networks), and the like.

FIG. 2 shows a hardware configuration example of the knowledge acquisition device 1 of this embodiment. The knowledge acquisition device 1 is, for example, an information processing device including a CPU (Central Processing Unit) 10, a drive device 12, an auxiliary storage device 16, a memory device 18, an interface device 20, an input device 22, and an output device 24. These components are connected via a bus, a serial line, or the like.

The CPU 10 is a processor having, for example, a program counter, an instruction decoder, various arithmetic units, an LSU (Load Store Unit), and general-purpose registers.

The drive device 12 is a device capable of reading programs and data from a recording medium 14. When the recording medium 14 on which a program is recorded is loaded into the drive device 12, the program is installed from the recording medium 14 into the auxiliary storage device 16 via the drive device 12. The recording medium 14 is a portable recording medium such as a CD-ROM, a DVD, or a USB memory. The auxiliary storage device 16 is, for example, an HDD (Hard Disk Drive) or a flash memory.

Besides using the recording medium 14 as described above, the program can also be installed by having the interface device 20 download it from another computer via the network 50 and install it in the auxiliary storage device 16. The program may also be stored in advance in the auxiliary storage device 16, a ROM (Read Only Memory), or the like when the information processing device is shipped.

When the CPU 10 executes the program installed or stored in advance in this way, the information processing device in the form shown in FIG. 1 can function as the knowledge acquisition device 1 of this embodiment.

The memory device 18 is, for example, a RAM (Random Access Memory) or an EEPROM (Electrically Erasable and Programmable Read Only Memory). The interface device 20 controls, among other things, the connection to the network described above.

The input device 22 is, for example, a keyboard, a mouse, a touch pad, a touch panel, or a microphone. The output device 24 includes, for example, a display device such as an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube), a printer, and a speaker.

The interface device 20 can access the client computers 100 and the large-scale documents 200 via the network 50.

Like the server-side knowledge acquisition device 1, each client computer 100 has a CPU, a drive device, an auxiliary storage device, a memory device, an interface device, an input device, an output device, and so on. A detailed description of these is omitted.

The large-scale documents 200 are electronic documents stored in any storage device accessible via the network 50.

[Functional configuration]
FIG. 3 shows a functional configuration example of the knowledge acquisition device 1 of this embodiment. The knowledge acquisition device 1 includes a word pair acquisition unit 30, a search unit 32, and a score calculation unit 34. These functional blocks operate when the CPU 10 executes programs and software stored in the auxiliary storage device 16 or the like.

The knowledge acquisition device 1 also generates a word pair table 40, an occurrence count table 42, and scored word pairs 44 in predetermined areas of the memory device 18 or the auxiliary storage device 16.

FIG. 4 schematically shows the overall processing executed by the knowledge acquisition device 1 of this embodiment. As illustrated, the knowledge acquisition device 1 generates the word pair table 40 based on the large-scale documents 200 and a seed 250.

The knowledge acquisition device 1 also performs a category search based on the word pair table 40 and a query 260 to generate the occurrence count table 42, then calculates scores based on the occurrence count table 42 and outputs the scored word pairs 44. Here, a "category" is an example of the "specific event limiting the search target" in the claims.

Of these processes, generation of the word pair table 40 is preferably performed as preprocessing for service provision (advance batch processing), while generation of the occurrence count table 42 and score calculation are preferably executed when a user makes a service request (after the application target has been determined). These processes are described below.

{Word pair acquisition}
Based on a given seed, the word pair acquisition unit 30 searches the large-scale documents 200 for context patterns that contain the seed.

Here, a seed is a pair of words having a specific relationship (a word pair) or a context pattern that appears in a sentence containing a word pair. Both may be given as seeds. An example of a word pair is "cold" (風邪) and "ginger" (生姜) in the sentence "生姜は風邪に効く。" ("Ginger works for colds."), in which case the context pattern is "は" / "に効く". A context pattern is a characteristic character string, other than the word pair, contained in a sentence in the large-scale documents 200 that are the target of the data search, and can be understood as an extraction rule for acquiring word pairs from the large-scale documents 200. As described later, the context pattern in this embodiment is an example of the "morpheme structure of a sentence containing a pair of words" in the claims. In this embodiment, a "sentence" delimited by the full stop "。" is the unit of data search, but a "passage" made up of multiple sentences may be used instead.
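As a rough illustration of the relationship between a sentence, a word pair, and a context pattern, the sketch below splits text on the full stop "。" and replaces the two words of a pair with slots. It is an assumption-laden simplification: the embodiment is described as using morpheme structures, whereas this sketch uses plain substring replacement, and `split_sentences` and `context_patterns` are hypothetical names.

```python
def split_sentences(text):
    """Split text into sentences delimited by the full stop "。"."""
    return [s for s in text.split("。") if s]

def context_patterns(text, pair):
    """Return the context pattern of each sentence containing both words.

    The pair is written as (A, B); the pattern is what remains of the
    sentence once the two words are replaced by the slots "A" and "B".
    """
    a, b = pair
    patterns = []
    for sentence in split_sentences(text):
        if a in sentence and b in sentence:
            patterns.append(sentence.replace(a, "A", 1).replace(b, "B", 1))
    return patterns

# Assumed sample text: "Ginger works for colds. It is sunny today.
# Ginger is effective for colds."
text = "生姜は風邪に効く。今日は晴れだ。風邪には生姜が効果がある。"
print(context_patterns(text, ("風邪", "生姜")))
```

The middle sentence contains neither word and therefore contributes no pattern.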

The seed is provided to the CPU 10 via the network 50 when, for example, an operator inputs an arbitrary word pair, a context pattern, or both to the client computer 100. FIG. 5 schematically shows how the operator 110 inputs a seed to the client computer 100 and the knowledge acquisition device 1 generates the word pair table 40.

When the seed is a word pair, the word pair acquisition unit 30 first searches the large-scale documents 200 for context patterns that contain the seed. Once context patterns are obtained, it searches the large-scale documents 200 for documents containing those context patterns and acquires unknown word pairs from the documents found. Repeating these steps increases the number of word pairs and context patterns. The word pair acquisition unit 30 holds the acquired word pairs and context patterns, for example, in the form of an empty data table that uses them as labels.

FIG. 6 shows an example of the word pair table 40 generated by the word pair acquisition unit 30. As illustrated, the word pair table 40 uses the word pairs and the context patterns as its row and column labels, and the data cells corresponding to each combination of a word pair and a context pattern are blank. The occurrence count table 42 is generated by storing in these blank cells the numbers of occurrences (hit counts) output as search results by the search unit 32, described later.
Here, data searches of the large-scale documents 200 are performed with known search techniques when the large-scale documents are held locally, and with the technology offered by existing search service providers, typified by Google (registered trademark), when they are not. Because the specifics of such data searches are well-known and commonly used techniques, a detailed description is omitted; documents containing a word pair or a context pattern can be extracted by appropriately setting search conditions such as AND, OR, and NOT as used in general document search.
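The empty word pair table described above can be pictured as a nested mapping whose cells start out blank. This is only one possible representation, chosen here for illustration; the function name and the `{A}`/`{B}` slot notation are assumptions.

```python
def make_word_pair_table(pairs, patterns):
    """Rows are word pairs, columns are context patterns, every cell starts blank."""
    return {pair: {pattern: None for pattern in patterns} for pair in pairs}

# Assumed labels: pairs are (illness, food); {A} and {B} mark where they appear.
pairs = [("風邪", "大根"), ("風邪", "生姜")]
patterns = ["{B}は{A}に効く", "{A}には{B}が効く"]
table = make_word_pair_table(pairs, patterns)
print(table[("風邪", "大根")])
```

Filling each `None` cell with a hit count later turns the same structure into an occurrence count table.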

{Search}
Based on the word pair table 40 generated by the word pair acquisition unit 30, the search unit 32 executes, on the large-scale documents 200, a data search that reflects a query input by an operator or a user. It then stores the results as the occurrence count table 42.

FIG. 7 schematically shows how the operator 110 or the user 120 inputs a query to the client computer 100 and the knowledge acquisition device 1 generates the occurrence count table 42. The client computer 100 in this figure may be a different machine from the one in FIG. 5.

The operator 110 or the user 120 inputs as a query a category (an example of the "specific event" in the claims), for instance: "national hospital" OR "municipal hospital". The search unit 32 then performs a data search satisfying this query for every combination of word pair and context pattern stored in the word pair table 40. That is, for every word pair and context pattern, it searches the large-scale documents 200 using as the search key one word pair and one context pattern, taken from the multiple word pairs and context patterns stored in the word pair table 40, with the query added.

The search unit 32 then obtains, for each combination of word pair and context pattern, the number of occurrences in the search results that satisfy the query "national hospital" OR "municipal hospital", and stores it in the corresponding blank cell of the word pair table 40 to produce the occurrence count table 42. For example, if a search for ("風邪には大根が効く", that is, "daikon works for a cold") AND ("national hospital" OR "municipal hospital") yields 21 occurrences, that value is stored at the location (address) in the word pair table 40 corresponding to the pair "cold" - "daikon" and the pattern "AにはBが効く" ("B works for A"). This occurrence count table 42 is generated as a table dedicated to the query ("national hospital" OR "municipal hospital"). In other words, the search unit 32 generates a dedicated table for each input query. FIG. 8 shows an example of the occurrence count table 42 generated by the search unit 32.

In this way, the search unit 32 obtains the number of occurrences of documents that satisfy the input query and match each combination of word pair and context pattern held in advance by the word pair acquisition unit 30, and generates the occurrence count table 42.
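The search step can be sketched as follows, assuming a small in-memory document list in place of the large-scale documents and substring matching in place of a real search engine. The function names, the `{A}`/`{B}` slot notation, and the sample documents are all assumptions made for illustration.

```python
def build_search_key(pair, pattern, category_terms):
    """Combine one word pair, one context pattern, and the category query."""
    phrase = pattern.replace("{A}", pair[0]).replace("{B}", pair[1])
    return phrase, tuple(category_terms)

def count_occurrences(documents, search_key):
    """Count documents matching: phrase AND (term1 OR term2 OR ...)."""
    phrase, category_terms = search_key
    return sum(
        1 for doc in documents
        if phrase in doc and any(term in doc for term in category_terms)
    )

def build_occurrence_table(documents, pairs, patterns, category_terms):
    """Fill every (pair, pattern) cell with its category-restricted hit count."""
    return {
        pair: {
            pattern: count_occurrences(
                documents, build_search_key(pair, pattern, category_terms)
            )
            for pattern in patterns
        }
        for pair in pairs
    }

# Assumed toy documents; only the first two mention a hospital category term.
documents = [
    "国立病院の栄養士によると、大根は風邪に効く。",
    "市立病院だより 風邪には大根が効く",
    "個人ブログ 大根は風邪に効く",
]
table = build_occurrence_table(
    documents,
    pairs=[("風邪", "大根"), ("風邪", "生姜")],
    patterns=["{B}は{A}に効く", "{A}には{B}が効く"],
    category_terms=["国立病院", "市立病院"],
)
print(table)
```

The third document contains the phrase but no category term, so it is not counted; this is the effect of adding the query to the search key.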

{Score calculation}
The score calculation unit 34 refers to the occurrence count table 42 generated by the search unit 32, evaluates the relevance between each word pair and the query, and produces output based on the evaluation results.

The output of the score calculation unit 34 takes the form of, for example, scored word pairs 44, in which scores are added to the occurrence count table 42. The score in this embodiment is an evaluation value indicating (1) relevance to the category input as the query and (2) how frequently the word pair itself occurs, that is, whether the word pair is in general circulation.

FIG. 9 schematically shows how the knowledge acquisition device 1 generates the scored word pairs 44. FIG. 10 shows an example of the scored word pairs 44 output by the score calculation unit 34. As illustrated, the score calculation unit 34 adds a score to each word pair stored in the occurrence count table 42 and outputs it.

The score can be calculated by, for example, formula (1). In the formula, Score_i is the score of the i-th word pair, f(w, p) is the number of co-occurrences of word pair w and context pattern p, and N is the total number of occurrences over all combinations of word pair and context pattern. The subscript j indexes context patterns, and the subscript k indexes word pairs.
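Formula (1) itself is not reproduced in this text. Going only by the symbol definitions above, the verbal description of the formula in the next paragraph, and the term structure of formula (2), one reading is the weighted sum below. It is a reconstruction offered as an aid to the reader, not the published formula, and the explicit expression for N is an assumption based on its definition as the total number of occurrences.

```latex
\mathrm{Score}_i \;=\; \sum_{j} f(w_i, p_j)\,\frac{\sum_{k} f(w_k, p_j)}{N},
\qquad
N \;=\; \sum_{k}\sum_{j} f(w_k, p_j)
```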

Formula (1) computes the number of occurrences per context pattern over all word pairs (the values in the "Total" row of FIG. 10), divides each by the total number of occurrences to obtain a weighting coefficient, multiplies each word pair's per-pattern occurrence count by that coefficient, and sums the results. The score in this embodiment is therefore an evaluation value that appropriately assesses the relevance between each word pair and the query, based on the degree of agreement between the occurrence tendency per context pattern for that word pair and the occurrence tendency per context pattern for all word pairs. In terms of FIG. 10, for example, since N is 1047, the score of the word pair ("cold" - "daikon") is given by formula (2):

Score_("cold" - "daikon") = 21 × (201/1047) + 8 × (155/1047) + 36 × (83/1047) + ... = 0.21 ... (2)

As a result, it is not simply word pairs with many occurrences that score well; rather, word pairs whose tendency is close to the occurrence tendency of context patterns across all word pairs satisfying the query obtain high scores. For example, if in the category "hospital-related" context pattern 1 is used often but context pattern 3 is used rarely, high scores are given to word pairs showing a similar tendency. This makes it possible to give high scores to word pairs that fit the category well, and to meet the user's request to "obtain knowledge of word pairs that match the category". The score calculation may include some normalization, for example multiplying by the total number of word pairs.
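The weighted sum described above (each word pair's per-pattern count multiplied by that pattern's share of all occurrences, then summed) can be sketched as follows. The occurrence counts are invented for illustration and are not the values of FIG. 10; no normalization is applied, and `score_word_pairs` is a hypothetical name.

```python
def score_word_pairs(occurrence_table):
    """Score each pair by weighting its counts with the global pattern shares."""
    # Column totals: occurrences of each context pattern over all word pairs.
    pattern_totals = {}
    for row in occurrence_table.values():
        for pattern, count in row.items():
            pattern_totals[pattern] = pattern_totals.get(pattern, 0) + count
    n = sum(pattern_totals.values())  # total occurrences over all cells
    if n == 0:
        return {pair: 0.0 for pair in occurrence_table}
    return {
        pair: sum(count * pattern_totals[pattern] / n for pattern, count in row.items())
        for pair, row in occurrence_table.items()
    }

# Hypothetical counts: pair_a and pair_b both occur 4 times in total,
# but pair_a follows the dominant pattern p1 while pair_b does not.
table = {
    "pair_a": {"p1": 3, "p2": 1},
    "pair_b": {"p1": 1, "p2": 3},
    "pair_c": {"p1": 4, "p2": 0},
}
for pair, score in score_word_pairs(table).items():
    print(pair, round(score, 3))
```

With these counts, pair_a scores higher than pair_b even though their raw totals are equal, which is the behaviour the paragraph above describes.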

Once the scored word pairs 44 have been generated as described above, they may be output to the user as they are, or output after being ranked (sorted) by score. Alternatively, only the top-ranked pairs may be provided to the user. Various forms of output based on the score calculation result are thus conceivable.

As described above, the word pair table 40 is preferably generated as preprocessing before the service is provided, while the occurrence count table 42 is generated and the scores are calculated when the user issues a service request. FIG. 11 is a schematic diagram contrasting processing that a conventional device could execute with processing executed by the knowledge acquisition device 1 of this embodiment.

In the figure, the upper part schematically shows processing executed by a conventional device. As illustrated, acquiring word pairs for each category with a conventional device required processing equivalent to that of the word pair acquisition unit 30 for every category. However, the processing of the word pair acquisition unit 30 repeatedly acquires word pairs and context patterns starting from a seed, and takes a relatively long time. Consequently, the processing time from when a user or operator enters a category until word pairs are obtained becomes long.

In contrast, in the knowledge acquisition device 1 of this embodiment, whose processing is outlined in the lower part of FIG. 11, the word pair table 40 is generated in advance as batch processing, and only the generation of the occurrence count table 42 and the score calculation are performed once the application target has been decided. Generating the occurrence count table 42 consists of searching the large-scale document 200 with search keys that reflect the query, and finishes in a relatively short time. The score calculation is also simple arithmetic, so its processing time is short. As a result, the processing time from when the user enters a query until word pairs are obtained can be shortened; that is, results can be output more quickly.

[Processing flow]
The processing of each functional block of the knowledge acquisition device 1 is described in detail below with reference to flowcharts. For the overall flow, refer to FIG. 4; no separate illustration is given.

FIG. 12 is a flowchart showing the flow of characteristic processing executed by the word pair acquisition unit 30.

First, the word pair acquisition unit 30 adds the input seed 250 to the word pair list 40A (S300). The word pair list 40A and the context pattern list 40B described later serve as the row and column labels of the word pair table 40, and are set in predetermined areas of the memory device 18 or the auxiliary storage device 16.

Next, the word pair acquisition unit 30 searches the large-scale document 200 with the word pairs stored in the word pair list 40A, extracts context patterns, and adds them to the context pattern list 40B (S302; details are shown in FIG. 13).

Next, the word pair acquisition unit 30 determines whether at least one new context pattern was extracted (S304). If no new context pattern was extracted, this flow ends.

If at least one new context pattern was extracted, the word pair acquisition unit 30 searches the large-scale document 200 with the context patterns stored in the context pattern list 40B, extracts word pairs, and adds them to the word pair list 40A (S306).

Next, the word pair acquisition unit 30 determines whether at least one new word pair was extracted (S308). If no new word pair was extracted, this flow ends.

If at least one new word pair was extracted, the process returns to S302. In this way, word pairs and context patterns are extracted repeatedly and added to the word pair list 40A and the context pattern list 40B.

When this flow ends, the word pair acquisition unit 30 generates the word pair table 40 from the contents of the word pair list 40A and the context pattern list 40B, and stores it in a predetermined area of the memory device 18 or the auxiliary storage device 16.
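The alternating loop of S300 to S308 can be sketched as below. This is only an illustration under stated assumptions: the two search callbacks, the `max_rounds` safety cap, and the toy corpus of (word, pattern, word) triples are hypothetical stand-ins for searching the large-scale document 200.

```python
from typing import Callable, Iterable, List, Tuple

WordPair = Tuple[str, str]


def bootstrap(
    seeds: Iterable[WordPair],
    find_patterns: Callable[[List[WordPair]], Iterable[str]],
    find_pairs: Callable[[List[str]], Iterable[WordPair]],
    max_rounds: int = 100,  # assumption: cap added for safety, not in the flowchart
) -> Tuple[List[WordPair], List[str]]:
    """Alternate between pattern extraction and pair extraction until nothing new appears."""
    pair_list: List[WordPair] = list(dict.fromkeys(seeds))  # S300
    pattern_list: List[str] = []
    for _ in range(max_rounds):
        found = dict.fromkeys(find_patterns(pair_list))  # S302
        new_patterns = [p for p in found if p not in pattern_list]
        if not new_patterns:  # S304: no new context pattern
            break
        pattern_list.extend(new_patterns)
        found_pairs = dict.fromkeys(find_pairs(pattern_list))  # S306
        new_pairs = [w for w in found_pairs if w not in pair_list]
        if not new_pairs:  # S308: no new word pair
            break
        pair_list.extend(new_pairs)
    return pair_list, pattern_list


# Hypothetical toy corpus standing in for the large-scale document 200.
CORPUS = [
    ("cold", "A works for B", "radish"),
    ("cough", "A works for B", "honey"),
    ("cough", "B is good for A", "honey"),
    ("fever", "B is good for A", "ginger"),
]


def find_patterns(pairs: List[WordPair]) -> List[str]:
    return [pat for a, pat, b in CORPUS if (a, b) in pairs]


def find_pairs(patterns: List[str]) -> List[WordPair]:
    return [(a, b) for a, pat, b in CORPUS if pat in patterns]


pairs, patterns = bootstrap([("cold", "radish")], find_patterns, find_pairs)
```

Starting from a single seed, the toy run reaches all three word pairs and both patterns, then stops when a round yields no new context pattern.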

FIG. 13 is a flowchart showing the flow of the context pattern extraction processing executed by the word pair acquisition unit 30. This flow corresponds to S302 in FIG. 12.

First, the word pair acquisition unit 30 takes one word pair from the word pair list 40A, in order from the top (S400).

Next, the word pair acquisition unit 30 determines whether the word pair list 40A has been processed to the end, that is, whether a new word pair could be taken in S400 (S402). If the list has been processed to the end, this flow ends.

Otherwise, the word pair acquisition unit 30 searches the large-scale document 200 using the word pair as the search key (S404). The search results are temporarily stored in the memory device 18 or the auxiliary storage device 16.

Next, the word pair acquisition unit 30 takes one of the search results of S404 (S406). It then determines whether all search results have been taken, that is, whether a new search result could be taken in S406 (S408). If no new search result could be taken, the process returns to S400.

If a new search result could be taken, the word pair acquisition unit 30 analyzes it and determines whether the word pair contained in the search result forms a predetermined syntactic pattern (S410).

FIG. 14 is a diagram schematically showing the morpheme structure used in the analysis of S410. A document search using a word pair such as (「風邪」 (cold) − 「大根」 (radish)) as the search key is expected to return sentences that contain words other than the word pair and the context pattern, for example 「風邪にはやっぱり大根が一番効くよ」 ("for a cold, radish really works best").

The word pair acquisition unit 30 first performs morphological analysis, which divides a Japanese sentence into its smallest units, called morphemes. Counted in morphemes, the word 「お茶」 (tea), for example, consists of two units: 「お」 (prefix) and 「茶」 (noun). Morphological analysis of the sentence 「風邪にはやっぱり大根が一番効くよ」 divides it as 「風邪／に／は／やっぱり／大根／が／一番／効く／よ」.

Next, the word pair acquisition unit 30 performs dependency analysis. Dependency analysis first groups the morpheme sequence into units called phrases (bunsetsu). The sentence above is grouped as 「（風邪／に／は），（やっぱり），（大根／が），（一番），（効く／よ）」. The dependency relations between phrases are then determined. Dependencies in Japanese are analyzed on the principles that (1) each phrase depends on exactly one other phrase and (2) dependencies run from earlier to later in the sentence. Existing techniques for morphological analysis and dependency analysis are well known, and they perform these steps automatically with accuracies of about 99% and 91%, respectively.

The dependency relations for the sentence above are as follows.
（風邪／に／は）→（効く／よ）
（やっぱり）→（効く／よ）
（大根／が）→（効く／よ）
（一番）→（効く／よ）
（効く／よ）→＜end of sentence＞

Representing these dependency relations as a tree, and applying the rule of thumb that "each morpheme within a phrase depends on the morpheme immediately following it", yields the structure illustrated in FIG. 14.

Next, a method of extracting the inverted-V portion (shaped like the character 「へ」) from the morpheme tree is described using the example of FIG. 14. It is assumed that 「風邪」 and 「大根」 are given as the word pair. The minimal substructure containing both of these words (morphemes) is then extracted. In the example of FIG. 14, the shaded inverted-V portion corresponds to this minimal substructure.

Once the minimal substructure has been obtained, the word pair acquisition unit 30 determines that the word pair forms a predetermined syntactic pattern if the two words lie within a predetermined distance of each other. If so, the part of the minimal substructure that remains after removing the word pair is recognized as the context pattern.

Here, the "predetermined distance" is measured, for example, as the number of links connecting morphemes in the morpheme tree shown in FIG. 14. In this example it corresponds to the number of arrows below (5).
（風邪）→（に）→（は）→（効く）
（大根）→（が）→（効く）

Limiting the distance in this way suppresses the extraction of meaningless context patterns from word pairs that happen to appear in relatively long sentences. If the distance were not limited, or were allowed to reach about 30, 「風邪」 (cold) and 「納豆」 (natto) could be extracted from a sentence such as 「風邪になった妻に頼まれて買い物に来たがリストに入っていた納豆が無い。」 ("My wife, who has a cold, asked me to go shopping, but the natto on her list is sold out."), and a meaningless context pattern could be extracted from it. There is no particular rule for choosing the distance threshold; it may be set empirically according to how many errors appear in the acquired knowledge.
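The tree construction and distance measurement described above can be sketched as follows. This is an illustrative sketch, not the device's implementation: it assumes the phrases and phrase-level dependencies have already been produced by an external analyzer, and the function names and data layout are hypothetical.

```python
from typing import List, Optional, Tuple


def morpheme_heads(
    phrases: List[List[str]], phrase_heads: List[Optional[int]]
) -> Tuple[List[str], List[Optional[int]]]:
    """Flatten phrases into morphemes and link each morpheme to its head.

    Inside a phrase each morpheme depends on the next one; the last morpheme
    of a phrase depends on the first morpheme of the head phrase.
    """
    starts: List[int] = []
    tokens: List[str] = []
    for phrase in phrases:
        starts.append(len(tokens))
        tokens.extend(phrase)
    heads: List[Optional[int]] = [None] * len(tokens)
    for pi, phrase in enumerate(phrases):
        for mi in range(len(phrase)):
            idx = starts[pi] + mi
            head_phrase = phrase_heads[pi]
            if mi + 1 < len(phrase):
                heads[idx] = idx + 1
            elif head_phrase is not None:
                heads[idx] = starts[head_phrase]
    return tokens, heads


def path_to_root(i: int, heads: List[Optional[int]]) -> List[int]:
    path = [i]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def minimal_substructure(
    a: int, b: int, heads: List[Optional[int]]
) -> Tuple[int, List[int]]:
    """Return (number of links, node indices) of the path joining a and b."""
    pa, pb = path_to_root(a, heads), path_to_root(b, heads)
    on_pb = set(pb)
    common = next(n for n in pa if n in on_pb)  # lowest common ancestor
    up_a = pa[: pa.index(common) + 1]
    up_b = pb[: pb.index(common)]
    return len(up_a) - 1 + len(up_b), up_a + up_b[::-1]


phrases = [["風邪", "に", "は"], ["やっぱり"], ["大根", "が"], ["一番"], ["効く", "よ"]]
phrase_heads = [4, 4, 4, 4, None]  # every phrase depends on 「効く／よ」
tokens, heads = morpheme_heads(phrases, phrase_heads)
a, b = tokens.index("風邪"), tokens.index("大根")
distance, nodes = minimal_substructure(a, b, heads)
# Context pattern: the minimal substructure with the word pair removed.
pattern = [tokens[n] for n in nodes if n not in (a, b)]
```

For the example sentence this gives a distance of 5 links and the remaining morphemes に, は, 効く, が as the context pattern. A caller would compare `distance` with an empirically chosen threshold before accepting the pattern.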

The morpheme tree is not necessarily an inverted V as in FIG. 14 and can take various forms. For example, morphological analysis of the sentence 「風邪に効く大根を買った。」 ("I bought radish, which works for colds.") gives 「風邪／に／効く／大根／を／買っ／た」.

The result of dependency analysis is as follows.
（風邪／に）→（効く）
（効く）→（大根／を）
（大根／を）→（買っ／た）

In this case, the morpheme tree is a straight line rather than an inverted V, as shown below, and the extracted pattern is also a straight line.
（風邪／に）→（効く）→（大根／を）→（買っ／た）
（風邪→に）→（効く）→（大根→を）→（買っ→た）

Returning to the flow: if the word pair forms a predetermined syntactic pattern, the extracted context pattern is added to the context pattern list 40B (S412). If the word pair does not form a predetermined syntactic pattern, the process returns to S406.

FIG. 15 is a flowchart showing the flow of characteristic processing executed by the search unit 32.

First, the search unit 32 takes one word pair from the word pair table 40, in order from the top row (S500).

Next, the search unit 32 determines whether all word pairs in the word pair table 40 have been processed, that is, whether a new word pair could be taken in S500 (S502). If all word pairs have been processed, this flow ends.

Otherwise, the search unit 32 takes one context pattern from the word pair table 40, in order from the leftmost column (S504).

Next, the search unit 32 determines whether all context patterns in the word pair table 40 have been processed, that is, whether a new context pattern could be taken in S504 (S506). If all context patterns have been processed, the process returns to S500.

Otherwise, the search unit 32 creates a search key by combining the word pair and context pattern taken in S500 and S504 with the input query (S508), and searches the large-scale document 200 (S510). The search key is created, for example, by joining with an AND condition the input query and the sentence obtained by embedding the word pair at positions A and B of the context pattern (for example, 「ＡはＢに効く」, "A works for B").

The search unit 32 then obtains the number of occurrences (hit count), stores it in the cell of the occurrence count table 42 that corresponds to the word pair and context pattern taken in S500 and S504 (S512), and returns to S504.
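The double loop of S500 to S512 can be sketched as below. This is a rough sketch under stated assumptions: plain substring matching over a small list of strings stands in for the search of the large-scale document 200, and the `{A}`/`{B}` template syntax, function names, and sample documents are hypothetical.

```python
from typing import Dict, List, Tuple

WordPair = Tuple[str, str]


def count_hits(documents: List[str], sentence: str, query: str) -> int:
    """Count documents matching both the filled-in sentence AND the query."""
    return sum(1 for doc in documents if sentence in doc and query in doc)


def build_occurrence_table(
    pairs: List[WordPair], patterns: List[str], query: str, documents: List[str]
) -> Dict[WordPair, Dict[str, int]]:
    table: Dict[WordPair, Dict[str, int]] = {}
    for pair in pairs:  # S500 / S502: iterate over word pairs (rows)
        row: Dict[str, int] = {}
        for pattern in patterns:  # S504 / S506: iterate over patterns (columns)
            # S508: embed the word pair at positions A and B of the pattern.
            sentence = pattern.format(A=pair[0], B=pair[1])
            # S510 / S512: search and store the hit count in the matching cell.
            row[pattern] = count_hits(documents, sentence, query)
        table[pair] = row
    return table


# Hypothetical documents and inputs.
documents = [
    "hospital: radish works for cold",
    "hospital: radish works for cold, they say",
    "cooking: radish works for cold",
    "hospital: honey is good for cough",
]
pairs = [("radish", "cold"), ("honey", "cough")]
patterns = ["{A} works for {B}", "{A} is good for {B}"]
table = build_occurrence_table(pairs, patterns, "hospital", documents)
```

The third document contains the filled-in sentence but not the query, so it is not counted; this is how the query restricts the counts to one category.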

The flow of FIG. 15 need not proceed fully automatically; the search result may be output to the user after each search so that the user can perform a confirmation operation.

Because the flow of FIG. 15 searches with character strings rather than handling morpheme trees, an occurrence may go uncounted, depending on how the search key is set and on the search specification, when a word such as 「やっぱり」 intervenes, as in 「風邪にはやっぱり大根が効く」. This is not a major problem, however, for the following reasons: (1) the search is run over a large-scale document, so many expressions without words such as 「やっぱり」 can be expected to exist; (2) the search does not need exact occurrence counts, only the tendency within a particular category; and (3) assuming that the proportion of occurrences containing intervening strings such as 「やっぱり」 is constant across patterns, the resulting drop in hit counts can be ignored when determining the tendency. Rather, omitting detailed search simplifies the processing and enables high-speed processing.

FIG. 16 is a flowchart showing the flow of characteristic processing executed by the score calculation unit 34.

First, the score calculation unit 34 refers to the occurrence count table 42 and calculates the number of occurrences of each context pattern and the total number of occurrences N (S600).

Next, the score calculation unit 34 takes one word pair from the word pair table 40, in order from the top row (S602).

Next, the score calculation unit 34 determines whether all word pairs in the word pair table 40 have been processed, that is, whether a new word pair could be taken in S602 (S604). If all word pairs have been processed, this flow ends.

Otherwise, the score calculation unit 34 takes one context pattern from the word pair table 40, in order from the leftmost column (S606).

Next, the score calculation unit 34 determines whether all context patterns in the word pair table 40 have been processed, that is, whether a new context pattern could be taken in S606 (S608). If all context patterns have been processed, the process returns to S602.

Otherwise, the score calculation unit 34 calculates the product of the number of occurrences of the combination of the word pair and context pattern concerned (for example, (2) in the figure) and the value obtained by dividing the total number of occurrences of that context pattern ((3) in the figure) by the total number of occurrences N ((4) in the figure). It then adds this value to the running score of the word pair concerned ((1) in the figure) (S610).

Through this processing, the score expressed by equation (1) above is calculated for each word pair.

[Summary]
According to the knowledge acquisition device 1 of this embodiment described above, only the generation of the occurrence count table 42 and the score calculation need to be performed after the application target has been decided, so the processing time from query input to acquisition of word pairs can be shortened. An evaluation value that appropriately assesses the relevance between each word pair and the query can therefore be output more quickly.

The best mode for carrying out the present invention has been described above using an embodiment, but the present invention is in no way limited to this embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

For example, data corresponding to the word pair table 40 may be input from outside, or stored in advance in the auxiliary storage device 16 or the like. In this case, the word pair acquisition unit 30 can be omitted.

Also, although the score calculation unit 34 has been described as producing output based on the evaluated relevance between word pairs and the query, it may instead produce output based on the evaluated relevance between context patterns and the query. In this case, the score calculation unit 34 evaluates the relevance between a context pattern and the category based on how closely the per-word-pair occurrence ratio for that context pattern matches the per-word-pair occurrence ratio for all context patterns. Such output can be put to good use when the user is someone other than an end user. The score is then calculated by swapping the roles of word pairs and context patterns in equation (1) above. In this way, an evaluation value that appropriately assesses the relevance between a context pattern and the query can be output more quickly.
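One possible reading of "swapping word pairs and context patterns" is sketched below: each context pattern is scored by weighting its per-pair counts with each pair's share of the total occurrences. This interpretation, the function name `score_patterns`, and the toy counts are assumptions for illustration only.

```python
from typing import Dict

# word pair -> context pattern -> occurrence count (hypothetical layout)
Counts = Dict[str, Dict[str, int]]


def score_patterns(counts: Counts) -> Dict[str, float]:
    """Score each pattern as sum over pairs of count * (pair total / N)."""
    pair_totals = {pair: sum(row.values()) for pair, row in counts.items()}
    n = sum(pair_totals.values())  # total number of occurrences N
    scores: Dict[str, float] = {}
    for pair, row in counts.items():
        weight = pair_totals[pair] / n if n else 0.0
        for pattern, c in row.items():
            scores[pattern] = scores.get(pattern, 0.0) + c * weight
    return scores


# Hypothetical toy counts.
toy: Counts = {
    "w1": {"P1": 6, "P2": 0},
    "w2": {"P1": 2, "P2": 2},
}
pattern_scores = score_patterns(toy)
```

Here P1 is favored because it is used mostly by the word pair that dominates the table.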

Although a "category" has been given as a representative example of the content of a query entered by the user, the query need only limit the search target, and queries that do not fall under the general notion of a "category" may also be accepted. One example is a temporal restriction such as "documents dated on or after a given year, month, and day".

With regard to the above description, the following items are further disclosed.
(Appendix 1)
A knowledge acquisition device comprising:
first storage means storing a word pair table that holds pairs of words having a specific relationship in association with morpheme structures relating to sentences containing the word pairs;
occurrence count information acquisition means for creating a search key by adding, to a word pair and a morpheme structure extracted from the first storage means, a specific event that limits the search target, searching second storage means storing a group of documents to be searched with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
evaluation means for referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of match between the tendency of the number of occurrences per morpheme structure for that word pair and the tendency of the number of occurrences per morpheme structure for all word pairs.
(Appendix 2)
The knowledge acquisition device according to Appendix 1, wherein
the evaluation means multiplies the number of occurrences per morpheme structure for each word pair by the corresponding number of occurrences per morpheme structure for all word pairs, sums the products, and outputs the sum divided by the total number of occurrences as the evaluation value.
(Appendix 3)
The knowledge acquisition device according to Appendix 1 or 2, further comprising
data collection means for searching the second storage means using a given word pair or morpheme structure as a search key, repeatedly adding the obtained search results to the search key and performing a further data search, and storing the resulting plurality of word pairs and plurality of morpheme structures in the storage means.
(Appendix 4)
The knowledge acquisition device according to any one of Appendices 1 to 3, wherein
the morpheme structure relating to a sentence containing the word pair is the morpheme structure that remains after removing the morphemes of the word pair from the minimal substructure of morphemes obtained by dependency analysis of a target sentence of the data search.
(Appendix 5)
A knowledge acquisition device comprising:
first storage means storing a word pair table that holds pairs of words having a specific relationship in association with morpheme structures relating to sentences containing the word pairs;
occurrence count information acquisition means for creating a search key by adding, to a word pair and a morpheme structure extracted from the first storage means, a specific event that limits the search target, searching second storage means storing a group of documents to be searched with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
evaluation means for referring to the word pair table and outputting an evaluation value that evaluates the relevance between each morpheme structure and the specific event, based on the degree of match between the tendency of the number of occurrences per word pair for that morpheme structure and the tendency of the number of occurrences per word pair for all morpheme structures.
(Appendix 6)
A knowledge acquisition method in which a computer executes:
a process of creating a search key by adding a specific event that limits the search target to a word pair and a morpheme structure extracted from first storage means, the first storage means storing a word pair table that holds pairs of words having a specific relationship in association with morpheme structures relating to sentences containing the word pairs, searching second storage means storing a group of documents to be searched with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
a process of referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of match between the tendency of the number of occurrences per morpheme structure for that word pair and the tendency of the number of occurrences per morpheme structure for all word pairs.
(Appendix 7)
A program that causes a computer to execute:
a process of creating a search key by adding a specific event that limits the search target to a word pair and a morpheme structure extracted from first storage means, the first storage means storing a word pair table that holds pairs of words having a specific relationship in association with morpheme structures relating to sentences containing the word pairs, searching second storage means storing a group of documents to be searched with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
a process of referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of match between the tendency of the number of occurrences per morpheme structure for that word pair and the tendency of the number of occurrences per morpheme structure for all word pairs.

[Description of symbols]
1 Knowledge acquisition device
10 CPU
12 Drive device
14 Storage medium
16 Auxiliary storage device
18 Memory device
20 Interface device
22 Input device
24 Output device
30 Word pair acquisition unit
32 Search unit
34 Score calculation unit
40 Word pair table
40A Word pair list
40B Context pattern list
42 Occurrence count table
44 Scored word pairs
50 Network
100 Client computer
110 Operator
120 User
200 Large-scale document
250 Seed
260 Query

Claims

A knowledge acquisition device comprising:
first storage means storing a word pair table that associates word pairs having a specific relationship with the morpheme structures of sentences containing those word pairs;
occurrence count information acquisition means for creating a search key by adding a specific event that limits the search target to a word pair and a morpheme structure extracted from the first storage means, searching second storage means, which stores a group of documents to be searched, with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
evaluation means for referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of agreement between the occurrence count trend per morpheme structure for each word pair and the occurrence count trend per morpheme structure for all word pairs.
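A minimal sketch of the search-key and occurrence-count step described in this claim, under loudly stated assumptions: documents are plain strings, a morpheme structure is approximated by a literal text pattern, and a document "matches" a search key when it contains both words, the pattern, and the event term. The function names `build_search_key`, `count_occurrences`, and `fill_occurrence_table` are hypothetical and do not come from the specification.

```python
def build_search_key(word_pair, pattern, event):
    """Combine a word pair, a context pattern, and the limiting event into one key."""
    return (word_pair[0], word_pair[1], pattern, event)


def count_occurrences(documents, key):
    """Count documents containing every term of the key (toy matching rule, an assumption)."""
    return sum(1 for doc in documents if all(term in doc for term in key))


def fill_occurrence_table(word_pairs, patterns, event, documents):
    """Store the occurrence count for every (word pair, pattern) combination."""
    table = {}
    for pair in word_pairs:
        for pattern in patterns:
            key = build_search_key(pair, pattern, event)
            table[(pair, pattern)] = count_occurrences(documents, key)
    return table
```

Here the returned dictionary plays the role of the occurrence counts stored in the word pair table; a real system would query a document index rather than scan strings.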

The knowledge acquisition device according to claim 1,
wherein the evaluation means is means for multiplying the number of occurrences per morpheme structure for each word pair by the number of occurrences per morpheme structure for all word pairs, and outputting the product divided by the total number of occurrences as the evaluation value.
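One possible reading of this calculation can be sketched as follows. Two points are assumptions, not statements of the claim: the per-structure products are summed to give a single value per word pair, and the "total number of occurrences" is taken as the grand total over all word pairs and all morpheme structures. The function name `score_word_pairs` is hypothetical.

```python
from collections import defaultdict


def score_word_pairs(counts):
    """counts maps (word_pair, pattern) -> number of occurrences."""
    # Occurrences of each pattern summed over all word pairs.
    pattern_totals = defaultdict(int)
    for (_, pattern), n in counts.items():
        pattern_totals[pattern] += n

    grand_total = sum(pattern_totals.values())
    if grand_total == 0:
        return {}

    # Assumed aggregation: sum over patterns of (pair count * overall count) / total.
    scores = defaultdict(float)
    for (pair, pattern), n in counts.items():
        scores[pair] += n * pattern_totals[pattern] / grand_total
    return dict(scores)
```

Under this reading, a word pair scores higher when it occurs with the patterns that are frequent across all word pairs, which is one way to express agreement between the two occurrence trends.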

The knowledge acquisition device according to claim 1 or 2,
further comprising data collection means for searching the second storage means using a given word pair or morpheme structure as a search key, repeating the data search with the obtained search results added to the search key so as to acquire a plurality of word pairs and a plurality of morpheme structures, and storing them in the storage means.
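The repeated search described here resembles a bootstrapping loop, which might be sketched as below. This is an illustration under assumptions: documents are plain strings, a morpheme structure is approximated by the literal text between the two words, and words are matched with a simple regular expression rather than morphological analysis. The function name `collect` and the `rounds` parameter are hypothetical.

```python
import re


def collect(documents, seed_pairs, rounds=2):
    """Alternately grow the sets of word pairs and context patterns from seed pairs."""
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        # Pairs -> patterns: take the text found between the two words of a known pair.
        for a, b in list(pairs):
            for doc in documents:
                m = re.search(re.escape(a) + r"\s+(.+?)\s+" + re.escape(b), doc)
                if m:
                    patterns.add(m.group(1))
        # Patterns -> pairs: take the single words on either side of a known pattern.
        for pattern in list(patterns):
            for doc in documents:
                regex = r"(\w+)\s+" + re.escape(pattern) + r"\s+(\w+)"
                for m in re.finditer(regex, doc):
                    pairs.add((m.group(1), m.group(2)))
    return pairs, patterns
```

Each round feeds the previous results back in as search keys, so a single seed pair can surface further pairs and patterns from the document group.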

The knowledge acquisition device according to any one of claims 1 to 3,
wherein the morpheme structure related to the sentence including the word pair is the morpheme structure obtained by removing the morphemes of the word pair from the morpheme structure that is the minimal substructure obtained by performing dependency analysis on the sentence targeted by the data search.

A knowledge acquisition method in which a computer executes:
a process of creating a search key by adding a specific event that limits the search target to a word pair and a morpheme structure extracted from first storage means, which stores a word pair table that associates word pairs having a specific relationship with the morpheme structures of sentences containing those word pairs, searching second storage means, which stores a group of documents to be searched, with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
a process of referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of agreement between the occurrence count trend per morpheme structure for each word pair and the occurrence count trend per morpheme structure for all word pairs.

A program that causes a computer to execute:
a process of creating a search key by adding a specific event that limits the search target to a word pair and a morpheme structure extracted from first storage means, which stores a word pair table that associates word pairs having a specific relationship with the morpheme structures of sentences containing those word pairs, searching second storage means, which stores a group of documents to be searched, with the search key to obtain the number of occurrences of the search key, and storing the obtained number of occurrences in the word pair table in association with the word pair and the morpheme structure; and
a process of referring to the word pair table and outputting an evaluation value that evaluates the relevance between each word pair and the specific event, based on the degree of agreement between the occurrence count trend per morpheme structure for each word pair and the occurrence count trend per morpheme structure for all word pairs.