JP2001117942A

JP2001117942A - INFORMATION SEARCHING DEVICE, INFORMATION SEARCHING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING PROGRAM FOR CAUSING COMPUTER TO EXECUTE THE METHOD

Info

Publication number: JP2001117942A
Application number: JP29890299A
Authority: JP
Inventors: Toshihiro Ajiki; 敏宏安食
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1999-10-20
Filing date: 1999-10-20
Publication date: 2001-04-27

Abstract

(57) [Abstract]

[Problem] In information retrieval across a plurality of information sources, to give the operator clues for selecting an appropriate information source by calculating, in fine gradations, the relevance of each information source to a search request.

[Solution] The apparatus comprises: a search file storage unit 300 that stores search files created for each database; a reception unit 301 that receives search requests from clients; a request processing unit 302 that determines the type of a search request; a document relevance calculation unit 303 that calculates the relevance of each document to the search request based on vector similarity; a document extraction unit 304 that extracts a predetermined number of documents based on the calculated relevance; a database relevance calculation unit 305 that calculates the relevance of each database based on, for example, the per-database count of the extracted documents or the sum of their relevance values; and a transmission unit 306 that transmits the processing results to the client.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information retrieval apparatus and an information retrieval method for selecting, from among a plurality of databases, a database that matches a search purpose, and to a computer-readable recording medium on which a program for causing a computer to execute the method is recorded.

[0002]

[Prior Art] Information retrieval apparatuses have long been known that accumulate digitized documents, data, and the like as a database in a storage device such as a hard disk and, when an operator inputs a search request, retrieve information matching the request and output it to a display screen or the like.

[0003] Such an apparatus can often execute a search on any one of several different databases, designated by the operator, that are provided inside the apparatus itself or inside other information processing apparatuses connected via a network. In that case, the operator must specify which of the available databases is to be searched.

[0004] The more databases are available, the higher the probability of obtaining the desired information. On the other hand, it becomes harder to judge which database should be designated as the search target, in other words, which database holds the desired information.

[0005] This poses no problem when the operator already knows, from experience or a user manual, the outline of the information stored in each database. For some databases, however, such as one in which daily newspaper articles are accumulated mechanically, almost no one knows the full picture of the information stored inside.

[0006] With such a database, one cannot tell whether the desired information is likely to be obtained without first designating it as the search target and running a search. However, as the newspaper article database above typifies, such databases are sometimes huge, and a search may take a long time.

[0007] To prevent this kind of trial and error at the stage of designating the search target, information retrieval apparatuses are known that automatically select appropriate databases from the input search terms and then execute the search.

[0008] For example, in the invention described in Japanese Patent Application No. 4-122988, a list of the keywords contained in the documents of each available database is created in advance for that database. An input search keyword is compared against each list and, if the keyword is present in a list, a search using that keyword is executed on the database from which the list was built.
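The keyword-list selection described above can be sketched as follows. This is only an illustrative sketch of the binary (present or absent) decision, not the cited application's implementation; the function name, the database names, and the rule that any one matching keyword is enough are assumptions made for the example.

```python
def select_databases(keyword_lists, query_keywords):
    """Return the databases whose keyword list contains a query keyword.

    keyword_lists: dict mapping a (hypothetical) database name to the set of
    keywords found in its documents. The decision is binary: one occurrence
    is enough for the database to become a search target.
    """
    return [
        name
        for name, keywords in keyword_lists.items()
        if any(keyword in keywords for keyword in query_keywords)
    ]


# Hypothetical keyword lists for three databases.
keyword_lists = {
    "db_a": {"wine", "cuisine", "recipe"},
    "db_b": {"election", "wine"},
    "db_c": {"baseball"},
}
selected = select_databases(keyword_lists, ["wine"])
```

Both `db_a` and `db_b` are selected here, regardless of how many of their documents actually concern wine.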

[0009]

[Problems to Be Solved by the Invention] In the prior art described above, however, a database becomes a target of a keyword search if the search keyword appears in it even once.

[0010] Consequently, database A, from which an actual search would yield 100 matching documents, and database B, from which it would yield only one, are not distinguished in any way. Suppose the aim is not an exhaustive search but merely to view a few dozen documents as a sample of what literature exists on a topic. Searching database A alone would suffice for that purpose, yet database B is searched as well, resulting in wasted processing.

[0011] As another example, suppose a search is run with the keywords "wine" and "French cuisine" in order to investigate southern-French-style dishes that use wine. Database A, which as a whole stores documents on French cuisine, and database B, which as a whole stores documents on French current affairs, both become search targets as long as they contain those keywords. The search results are therefore contaminated with noise, such as literature on wholesale wine prices or on the labor shortage in the French culinary world, and the desired documents become hard to find.

[0012] Furthermore, although the prior art avoids the time and effort wasted on a so-called empty search (one that returns zero hits), databases whose number of matching documents is not zero but extremely small also become search targets. In some cases, therefore, no improvement in processing efficiency can be expected from selecting among the databases.

[0013] For example, the larger a database is, the more likely it is to contain a given search keyword. A database such as the newspaper article database above is therefore likely to become a search target every time, unless the keyword is a highly specialized technical term.

[0014] Accordingly, when many of the available databases contain a large vocabulary spanning various fields, as newspaper articles do, the prior art ends up searching almost all of them. As a result, processing takes a long time, costs rise (when database searches are billed), or the search results become enormous.

[0015] These problems arise from two points. First, whether a database is to be a target of the search, in other words whether it matches the purpose of the search, is judged as a binary value (match or no match) simply from the presence or absence of the search keywords. Second, the operator is given no opportunity to correct that judgment.

[0016] To solve the problems of the prior art described above, an object of the present invention is to provide an information retrieval apparatus, an information retrieval method, and a computer-readable recording medium storing a program for causing a computer to execute the method, which, in information retrieval across a plurality of information sources, can give the operator clues for selecting an appropriate information source by calculating in fine gradations the relevance of each information source to a search request.

[0017]

[Means for Solving the Problems] To solve the problems described above and achieve the object, the information retrieval apparatus according to the invention of claim 1 is an information retrieval apparatus that selects, from among a plurality of databases, a database matching a search purpose, comprising: first calculation means for calculating, for each document stored in the plurality of databases, its relevance to an input search request; extraction means for extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculation means; and second calculation means for calculating, for each database, the relevance of that database to the search request based on the per-database number of documents extracted by the extraction means.

[0018] According to the invention of claim 1, the relevance of each database to a search request is calculated in fine gradations based on how many of the documents extracted according to their relevance to the search request belong to each database.
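A minimal sketch of this count-based calculation is given below, assuming the per-document relevance values have already been computed. The function name, the tuple layout, and the normalization of each count by the number of extracted documents are assumptions made for illustration; the text above only states that the calculation is based on the per-database count.

```python
from collections import Counter


def database_relevance_by_count(scored_docs, top_n):
    """Score each database by its share of the top_n most relevant documents.

    scored_docs: list of (database_name, doc_id, relevance) tuples.
    Dividing by the number of extracted documents is an assumed normalization.
    """
    top = sorted(scored_docs, key=lambda d: d[2], reverse=True)[:top_n]
    if not top:
        return {}
    counts = Counter(database for database, _doc_id, _relevance in top)
    return {database: count / len(top) for database, count in counts.items()}


# Hypothetical relevance values for documents from three databases.
scored_docs = [
    ("economy", "e1", 0.9),
    ("economy", "e2", 0.8),
    ("politics", "p1", 0.7),
    ("international", "i1", 0.2),
    ("economy", "e3", 0.6),
]
by_count = database_relevance_by_count(scored_docs, top_n=4)
```

With four documents extracted, three come from the economy database and one from the politics database, so the two databases receive graded scores rather than a yes/no decision, and the international database receives none.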

[0019] The information retrieval apparatus according to the invention of claim 2 is an information retrieval apparatus that selects, from among a plurality of databases, a database matching a search purpose, comprising: first calculation means for calculating, for each document stored in the plurality of databases, its relevance to an input search request; extraction means for extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculation means; and second calculation means for calculating, for each database, the relevance of that database to the search request based on the relevance, calculated by the first calculation means, of the documents extracted by the extraction means.

[0020] According to the invention of claim 2, the relevance of each database to a search request is calculated in fine gradations based on, for example, the per-database sum or average of the relevance values of the documents extracted according to their relevance to the search request.
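The score-based variant can be sketched in the same way. This is an illustrative sketch only: the function name and data layout are assumptions, and the sum and the mean are computed simply because they are the two aggregates named in the text.

```python
from collections import defaultdict


def database_relevance_by_score(scored_docs, top_n):
    """Aggregate the relevance of the top_n documents per database.

    scored_docs: list of (database_name, doc_id, relevance) tuples.
    Returns, for each database represented among the extracted documents,
    the sum and the mean of its documents' relevance values.
    """
    top = sorted(scored_docs, key=lambda d: d[2], reverse=True)[:top_n]
    per_database = defaultdict(list)
    for database, _doc_id, relevance in top:
        per_database[database].append(relevance)
    return {
        database: {"sum": sum(values), "mean": sum(values) / len(values)}
        for database, values in per_database.items()
    }


# Hypothetical relevance values; only the three best documents are extracted.
scored_docs = [
    ("economy", "e1", 0.75),
    ("economy", "e2", 0.5),
    ("politics", "p1", 0.625),
    ("international", "i1", 0.125),
]
by_score = database_relevance_by_score(scored_docs, top_n=3)
```

In this example the two databases have the same mean but different sums, which shows why the choice of aggregate matters when ranking databases.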

[0021] The information retrieval method according to the invention of claim 3 is an information retrieval method for selecting, from among a plurality of databases, a database matching a search purpose, comprising: a first calculation step of calculating, for each document stored in the plurality of databases, its relevance to an input search request; an extraction step of extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated in the first calculation step; and a second calculation step of calculating, for each database, the relevance of that database to the search request based on the per-database number of documents extracted in the extraction step.

[0022] According to the invention of claim 3, the relevance of each database to a search request is calculated in fine gradations based on how many of the documents extracted according to their relevance to the search request belong to each database.

[0023] The information retrieval method according to the invention of claim 4 is an information retrieval method for selecting, from among a plurality of databases, a database matching a search purpose, comprising: a first calculation step of calculating, for each document stored in the plurality of databases, its relevance to an input search request; an extraction step of extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated in the first calculation step; and a second calculation step of calculating, for each database, the relevance of that database to the search request based on the relevance, calculated in the first calculation step, of the documents extracted in the extraction step.

[0024] According to the invention of claim 4, the relevance of each database to a search request is calculated in fine gradations based on, for example, the per-database sum or average of the relevance values of the documents extracted according to their relevance to the search request.

[0025] The recording medium according to the invention of claim 5 records a program for causing a computer to execute the method according to claim 3 or 4. The program thereby becomes machine-readable, so that the operation of claim 3 or 4 can be realized by a computer.

[0026]

[Embodiments of the Invention] Preferred embodiments of the information retrieval apparatus, the information retrieval method, and the computer-readable recording medium storing a program for causing a computer to execute the method according to the present invention are described in detail below with reference to the accompanying drawings.

[0027] (Embodiment) First, the system configuration of an information retrieval system that includes the information retrieval apparatus according to the embodiment of the present invention is described. FIG. 1 is a block diagram showing the system configuration of that information retrieval system.

[0028] In FIG. 1, reference numeral 101 denotes the information retrieval apparatus according to the embodiment of the present invention. This apparatus also serves as an ordinary information retrieval apparatus in its own right. That is, as described later, it not only presents the databases that match the operator's search purpose, but also performs ordinary information retrieval on those databases and presents the documents that match the search request.

[0029] Reference numerals 102, 103, and 104 denote file servers that hold document databases and, on request, transmit the designated documents to the requester. They respectively hold an "economic article database" storing documents on the economy, a "political article database" storing documents on politics, and an "international article database" storing documents on current events abroad.

[0030] The documents stored in these databases are unified in advance into a predetermined file format, specifically the S-JIS format. The S-JIS format resembles structured-document formats such as SGML. For example, the body of the document follows a <body> tag, the title follows a <title> tag, the author's name follows an <author> tag, the creation date and time follow a <createddate> tag, and the publication date and time (for example, the date and time of appearance in a publication) follow a <pubdate> tag.
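To make the tagged layout concrete, the following sketch reads such a document into named fields. The exact syntax of the S-JIS format is not given in the text, so this assumes that each field runs from its tag to the next recognized tag and that there are no closing tags; the function name and the sample document are hypothetical.

```python
import re

# The five tags named in the description.
FIELD_TAGS = ("body", "title", "author", "createddate", "pubdate")
_TAG_RE = re.compile(r"<(%s)>" % "|".join(FIELD_TAGS))


def parse_tagged_document(text):
    """Split a tagged document into a dict of field name -> field text.

    Assumption: a field extends from its tag up to the next known tag
    (or to the end of the text).
    """
    fields = {}
    matches = list(_TAG_RE.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


# Hypothetical sample document.
sample = (
    "<title>Bank merger announced\n"
    "<author>A. Writer\n"
    "<createddate>1999-10-01\n"
    "<pubdate>1999-10-02\n"
    "<body>Two banks agreed to merge."
)
doc = parse_tagged_document(sample)
```

A uniform layout of this kind is what allows the same indexing and retrieval code to run over documents that originated in different formats.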

[0031] When a document in a different format, for example a proprietary format produced by commercial word-processing software, is to be added to these databases, its file format is converted beforehand. The information retrieval apparatus 101 described above has a function (filter) for converting files of various formats into the S-JIS format, but since this is not central to the present invention, a detailed description is omitted.

[0032] In addition to the document files, the file servers 102, 103, and 104 hold a plurality of search files that the information retrieval apparatus 101 uses when searching those documents. For every word or phrase contained in the documents of a database, these search files describe information such as its frequency of occurrence and the documents in which it appears.
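A search file of this kind is essentially an inverted index. The sketch below builds one in memory under simplifying assumptions: the function name and data layout are hypothetical, and terms are split on whitespace, whereas real Japanese text would need a morphological analyzer to find word boundaries.

```python
from collections import Counter, defaultdict


def build_search_index(documents):
    """Build a term -> {doc_id: occurrence count} mapping.

    documents: dict mapping a document ID to its text.
    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return dict(index)


# Hypothetical two-document database.
documents = {"d1": "bank merger bank", "d2": "election bank"}
index = build_search_index(documents)

# Total frequency of a term and the documents in which it appears.
total_bank = sum(index["bank"].values())
docs_with_bank = sorted(index["bank"])
```

Both pieces of information named in the text, the frequency of a term and the documents containing it, can be read directly from such a structure without opening the documents themselves.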

[0033] Reference numerals 105 and 106 denote clients that accept various requests from the operator through a keyboard or the like and output the processing results received from the information retrieval apparatus 101 to a display or the like. The information retrieval apparatus 101, the file servers 102, 103, and 104, and the clients 105 and 106 are connected to one another by a network 100.

[0034] In the embodiment, the information retrieval apparatus 101 and the file servers 102, 103, and 104 are provided separately, but the information retrieval apparatus 101 itself may manage the databases and function as a file server. The information retrieval apparatus 101 may also be given the functions of the clients 105 and 106, so that requests can be input and results output at the information retrieval apparatus 101.

[0035] The information retrieval apparatus 101 may also serve as both the file servers 102, 103, and 104 and the clients 105 and 106. In this case, search requests are input and processing results are output at the information retrieval apparatus 101, and processing such as the relevance calculation described later is performed on the databases and documents held by the information retrieval apparatus 101 itself.

[0036] Specifically, the information retrieval apparatus 101, the file servers 102, 103, and 104, and the clients 105 and 106 are each implemented by a workstation, a personal computer, or the like.

[0037] Next, the hardware configuration of the information retrieval apparatus 101 according to the embodiment of the present invention is described. FIG. 2 is a block diagram showing the hardware configuration of the information retrieval apparatus according to the embodiment.

[0038] In FIG. 2, reference numeral 201 denotes a CPU that controls the entire system; 202, a ROM that stores a boot program and the like; 203, a RAM used as the work area of the CPU 201; 204, an HDD (hard disk drive) that controls reading and writing of data on an HD (hard disk) 205 under the control of the CPU 201; and 205, the HD that stores data written under the control of the HDD 204.

[0039] Reference numeral 206 denotes an FDD (floppy disk drive) that controls reading and writing of data on an FD (floppy disk) 207 under the control of the CPU 201; and 207, the removable FD that stores data written under the control of the FDD 206.

[0040] Reference numeral 208 denotes a display that shows a cursor, icons, and toolboxes as well as windows for data such as documents, images, and function information; and 209, an interface (I/F) that is connected to the network NET via a communication line 210 and handles the interface between the network NET and the inside of the apparatus.

[0041] Reference numeral 211 denotes a keyboard with a plurality of keys for entering characters, numerical values, various instructions, and the like; 212, a mouse for moving the cursor, selecting ranges, moving and resizing windows, and selecting and moving icons; 213, a scanner that optically reads images; 214, a printer that prints the contents displayed in a window and the like; 215, a CD-ROM, which is a removable recording medium; and 216, a CD-ROM drive that controls reading of data from the CD-ROM 215. Reference numeral 200 denotes a bus that connects these components.

[0042] Next, the functional configuration of the information retrieval apparatus according to the embodiment of the present invention is described. FIG. 3 is a block diagram showing the functional configuration of the information retrieval apparatus according to the embodiment.

[0043] In FIG. 3, the information retrieval apparatus according to the embodiment comprises a search file storage unit 300, a reception unit 301, a request processing unit 302, a document relevance calculation unit 303, a document extraction unit 304, a database relevance calculation unit 305, and a transmission unit 306. The document relevance calculation unit 303 in turn comprises a query vector creation unit 303a, a document vector creation unit 303b, and a vector similarity calculation unit 303c.

[0044] The search file storage unit 300 acquires the search files of a database to be searched from the file server that manages that database and stores them. Once stored, a search file is not deleted unless the search file storage unit 300 runs short of storage capacity. When the same database is searched again, therefore, a search file that still remains here need not be fetched again from the file server, which improves processing efficiency.
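The caching behavior described above might be sketched as follows. The class and function names are hypothetical, capacity is counted in files rather than bytes for simplicity, and the choice to evict the least recently used file when capacity runs out is an assumption; the text says only that files are kept unless storage capacity is insufficient.

```python
from collections import OrderedDict


class SearchFileCache:
    """Keep fetched search files until capacity forces one out."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callable: database name -> search file
        self._capacity = capacity    # maximum number of cached files (assumed unit)
        self._files = OrderedDict()

    def get(self, database_name):
        if database_name in self._files:
            # Cache hit: no round trip to the file server is needed.
            self._files.move_to_end(database_name)
            return self._files[database_name]
        search_file = self._fetch(database_name)
        self._files[database_name] = search_file
        if len(self._files) > self._capacity:
            # Assumed policy: drop the least recently used search file.
            self._files.popitem(last=False)
        return search_file


fetch_log = []


def fetch_from_server(name):
    """Stand-in for a request to the file server; records each fetch."""
    fetch_log.append(name)
    return {"database": name}


cache = SearchFileCache(fetch_from_server, capacity=2)
cache.get("economy")
cache.get("economy")   # served from the cache
cache.get("politics")
```

After these three calls the file server has been contacted only twice, once per database.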

[0045] For convenience of explanation, it is assumed here that the search file storage unit 300 already stores all the search files of the "economic article database", "political article database", and "international article database" residing on the file servers 102, 103, and 104.

[0046] The reception unit 301 receives various requests from the client 105 or 106 via the network 100 in accordance with a predetermined protocol, and outputs each received request to the request processing unit 302 described later.

[0047] The request processing unit 302 determines the type of the request input from the reception unit 301. As noted above, the information retrieval apparatus according to the embodiment also serves as an ordinary information retrieval apparatus, so the requests it receives fall broadly into the following two types.

[0048] (1) Presentation of databases matching the search purpose
This request contains at least a search sentence such as "on the consolidation and merger of banks" (a search term or search expression may also be used, but in the embodiment queries are made in ordinary natural-language sentences). Processing of this request is the central subject of the present invention and is described in detail later.

[0049] (2) Presentation of documents matching the search purpose
This request contains at least a search sentence and the name of the database to be searched, as well as additional information such as the period to be searched. This is an ordinary information retrieval request, which instructs the apparatus to output the body text of the matching documents in the designated database, or their titles, abstracts, and the like.

[0050] Other conceivable requests concern maintenance of the search environment (including changing user password settings, adding or removing available databases, and adding or removing documents in a database), but since they are not central to the present invention, a detailed description is omitted.

[0051] The type of a request input from the reception unit 301 is determined by, for example, referring to the value of a specific bit in the request that indicates its type. Alternatively, at least the distinction between (1) and (2) may be made from the presence or absence of a database name: if a database is designated together with the search sentence, the request is type (2); if there is a search sentence but no database is designated, it is type (1).
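The second determination rule, based on whether a database name is present, can be sketched as follows. The `SearchRequest` type, its field names, and the constant names are hypothetical; the actual request format and protocol are not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Request types (1) and (2) from the description.
PRESENT_DATABASES = 1
PRESENT_DOCUMENTS = 2


@dataclass
class SearchRequest:
    """Hypothetical request shape: a search sentence and an optional database."""
    search_sentence: str
    database_name: Optional[str] = None


def classify_request(request):
    """Return type (2) if a database is designated, otherwise type (1)."""
    if not request.search_sentence:
        raise ValueError("a search sentence is required")
    if request.database_name:
        return PRESENT_DOCUMENTS
    return PRESENT_DATABASES


kind_without_db = classify_request(SearchRequest("bank consolidation and mergers"))
kind_with_db = classify_request(
    SearchRequest("bank consolidation and mergers", database_name="economy")
)
```

This rule needs no dedicated type field in the request, at the cost of being unable to distinguish request types other than (1) and (2).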

[0052] Based on the request type determined in this way, the necessary information is output to the appropriate functional units. Specifically, when the input request is of type (1) or (2), the search sentence and other items contained in the request are output to the document relevance calculation unit 303 described later, and the request type is output to the document extraction unit 304 described later.

[0053] The document relevance calculation unit 303 corresponds to the "first calculation means" in the claims and comprises the query vector creation unit 303a, the document vector creation unit 303b, and the vector similarity calculation unit 303c.

[0054] The query vector creation unit 303a creates, from the search sentence input from the request processing unit 302, a query vector that expresses its semantic content numerically. The structure of the vector is described later.

[0055] The document vector creation unit 303b creates a document vector for each individual document stored in a database, based on the search files of each database stored in the search file storage unit 300.

[0056] Here, a document vector is a vector with as many element values as there are words and phrases contained in all the documents in the database. The characteristics of the element values corresponding to the individual words make it possible to grasp the semantic content of the document numerically.

[0057] In the simplest case, each element value of a document vector is determined by the frequency with which the corresponding word appears in the document. For example, if a word appears once in the document, the element value for that word in the document vector is 1; if it appears ten times, the value is 10; and if it does not appear at all, the value is 0.
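The simple frequency-based vector described above can be illustrated with a short sketch. The function names and sample documents are hypothetical, and the documents are assumed to be already split into words (for Japanese text this would require a separate word segmentation step that is not shown).

```python
from collections import Counter


def build_vocabulary(documents):
    """Return every distinct term in the database, in a fixed (sorted) order."""
    return sorted({term for doc in documents for term in doc})


def frequency_vector(doc, vocabulary):
    """One element per vocabulary term: how often that term occurs in doc."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


# Hypothetical, already-tokenized documents.
documents = [
    ["bank", "merger", "bank"],
    ["government", "bank"],
]
vocabulary = build_vocabulary(documents)
vectors = [frequency_vector(doc, vocabulary) for doc in documents]
```

A term that does not occur in a document simply contributes 0 to that document's vector, as in the description.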

[0058] However, even if a document contains the word "bank" many times, if other documents in the same database also contain that word many times, the word is considered to be only weakly characteristic of that document. Conversely, even if a document contains the word "bank" only a few times, if the word does not appear at all in the other documents, it is considered to express succinctly what distinguishes that document from the others.

[0059] In view of this, in the embodiment of the present invention each element value of a vector is calculated not from the simple frequency of occurrence of a word, but by taking into account the statistical characteristics of where the word occurs, that is, its distribution across documents or within a single document. As described above, this information is recorded in the search file created for each database, so the document vector creation unit 303b creates the document vector of each document by referring to these search files cached in the search file storage unit 300.
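The specification does not give the weighting formula. As one possible illustration of weighting by distribution across documents, the sketch below uses the well-known TF-IDF scheme as a stand-in; it is an assumption, not the method of the embodiment, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def weighted_vectors(documents):
    """TF-IDF stand-in: term count scaled by log(N / document frequency)."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical documents: "bank" occurs everywhere, "merger" in one document.
documents = [
    ["bank", "bank", "bank", "merger"],
    ["bank", "policy"],
    ["bank", "sports"],
]
vocabulary, vectors = weighted_vectors(documents)
bank = vocabulary.index("bank")
merger = vocabulary.index("merger")
```

With this stand-in, "bank" receives weight 0 in the first document even though it occurs three times, because it appears in every document, while the single occurrence of "merger" receives a positive weight.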

[0060] The vector similarity calculation unit 303c sequentially calculates the similarity between the query vector created by the query vector creation unit 303a and each document vector created by the document vector creation unit 303b. Specifically, the similarity of two vectors is calculated by a predetermined formula based on their inner product.

[0061] Since each vector is a numerical representation of the semantic content of a search sentence or a document, the similarity between the vectors is taken as the similarity in semantic content between the search sentence from which the query vector was derived and the document from which the document vector was derived; in other words, as the relevance of that document to that search sentence.
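The "predetermined formula based on the inner product" is not spelled out in the specification. The following sketch assumes cosine similarity, a common inner-product-based measure, purely for illustration; the function names are hypothetical.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product normalized by vector lengths; 0.0 if either is all zeros."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0
```

Under this assumption, the value returned for a query vector and a document vector would serve directly as the relevance of that document to the search sentence.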

[0062] When the relevance of each document to the search sentence has been calculated as described above, the document relevance calculation unit 303 associates each document with the database to which it belongs and with its relevance to the search sentence, and outputs a list sorted in descending order of relevance (or the location of that list, for example an address in memory) to the document extraction unit 304 described later. FIG. 4 is an explanatory diagram schematically showing an example of the list created by the document relevance calculation unit 303.

[0063] Based on the type of request input from the request processing unit 302 and the list of document relevance values input from the document relevance calculation unit 303, the document extraction unit 304 extracts a predetermined number of documents from those input. The extraction criterion and the output destination of the extracted documents differ depending on the type of request.

[0064] That is, when the type of the request input from the request processing unit 302 is (1) above, the document extraction unit 304 refers to the list input from the document relevance calculation unit 303 and extracts a number of documents in descending order of relevance, for example the top five. In the example of FIG. 4 above, the five documents from "Alliance of Bank A, Bank B, and Bank C" through "Rumors of an alliance between U.S. Bank D and Bank G" are extracted. The records of the extracted documents are then taken from the list and output to the database relevance calculation unit 305 described later.

[0065] When the type of the request is (2) above, for example ten documents are extracted in descending order of relevance; in the example of FIG. 4, these are the documents from "Alliance of Bank A, Bank B, and Bank C" through "Bargaining between U.S. Bank D and Bank E". The records of the extracted documents are then taken from the list and output to the transmission unit 306 described later.
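The sorting and extraction steps can be sketched as below. The record layout and function names are hypothetical, the sample records are invented for illustration and are not the contents of FIG. 4, and the counts 5 and 10 are the example values given in the description.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical list entry: document, owning database, relevance."""
    title: str
    database: str
    relevance: float


# Example extraction counts from the description: 5 for type (1), 10 for type (2).
EXTRACT_COUNT = {1: 5, 2: 10}


def sort_by_relevance(records: List[Record]) -> List[Record]:
    """Order records from most to least relevant."""
    return sorted(records, key=lambda r: r.relevance, reverse=True)


def extract(ranked: List[Record], request_type: int) -> List[Record]:
    """Take the leading records; how many depends on the request type."""
    return ranked[:EXTRACT_COUNT[request_type]]


# Invented records for illustration only.
records = [Record(f"doc{i}", f"db{i % 3}", float(i)) for i in range(12)]
ranked = sort_by_relevance(records)
```

In the embodiment the result for type (1) goes on to the database relevance calculation, while the result for type (2) goes straight to transmission.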

[0066] The database relevance calculation unit 305 corresponds to the "second calculating means" in the claims, and calculates, from the documents extracted by the document extraction unit 304, the relevance of each database to the input search sentence. Specifically, this calculation focuses on numerical values such as the following.

[0067] (a) Number of documents per database
For example, in the example of FIG. 4, the top five documents extracted by the document extraction unit 304 include two documents from the "Economic Article Database" ("Alliance of Bank A, Bank B, and Bank C" and "Repercussions of the alliance of Bank A, Bank B, and Bank C"), one document from the "Political Article Database" ("Government promotes consolidation of banks"), and two documents from the "International Article Database" ("The merger issue of U.S. Bank D, Bank E, and Bank F" and "Rumors of an alliance between U.S. Bank D and Bank G").

[0068] If this count is taken as the relevance of each database, the databases in descending order of relevance are the "Economic Article Database" and the "International Article Database" (tied), followed by the "Political Article Database". With this calculation method, the more highly relevant documents a database contains, the higher it ranks.
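Method (a) amounts to counting the extracted documents per database. A minimal sketch using the FIG. 4 example follows; the variable names are hypothetical, and the rank positions in the comments follow the worked example for method (c) in the description.

```python
from collections import Counter

# Database of each of the top five extracted documents, in rank order.
top_five = [
    "Economic Article Database",       # rank 1
    "Economic Article Database",       # rank 2
    "International Article Database",  # rank 3
    "Political Article Database",      # rank 4
    "International Article Database",  # rank 5
]

# Relevance of each database under method (a): its number of extracted documents.
counts = Counter(top_five)

# Databases from most to fewest documents; equal counts keep first-seen order.
ranking = counts.most_common()
```

The two databases with two documents each cannot be separated by this measure alone, which is the tie noted in the description.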

[0069] (b) Sum or average of document relevance per database
For example, in the example of FIG. 4, the relevance values of the "Economic Article Database" documents in the top five ("Alliance of Bank A, Bank B, and Bank C" and "Repercussions of the alliance of Bank A, Bank B, and Bank C") sum to 243.579 and average 121.789 (fraction truncated); the sum and the average for the single "Political Article Database" document ("Government promotes consolidation of banks") are both 107.678; and the relevance values of the "International Article Database" documents ("The merger issue of U.S. Bank D, Bank E, and Bank F" and "Rumors of an alliance between U.S. Bank D and Bank G") sum to 208.134 and average 104.067.

[0070] If the sum of document relevance is taken as the relevance of the database, the databases rank, in descending order, "Economic Article Database", "International Article Database", "Political Article Database"; if the average is used, the order is "Economic Article Database", "Political Article Database", "International Article Database". The calculation using the sum raises the rank of a database that has many documents even if the relevance of each is relatively low, whereas the calculation using the average raises the rank of a database whose documents each have relatively high relevance even if they are few.
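The averages and the two rankings of method (b) can be checked from the sums and document counts stated in the description. The sketch below uses only those figures; the per-document relevance values of FIG. 4 are not reproduced here, and the variable and function names are hypothetical. Decimal arithmetic is used so that the truncation to three decimal places is exact.

```python
from decimal import Decimal, ROUND_DOWN

# Per-database sums of relevance and document counts from the worked example.
totals = {
    "Economic Article Database": Decimal("243.579"),
    "Political Article Database": Decimal("107.678"),
    "International Article Database": Decimal("208.134"),
}
counts = {
    "Economic Article Database": 2,
    "Political Article Database": 1,
    "International Article Database": 2,
}


def average(total, count):
    """Mean relevance, truncated (not rounded) to three decimal places."""
    return (total / count).quantize(Decimal("0.001"), rounding=ROUND_DOWN)


averages = {db: average(totals[db], counts[db]) for db in totals}

# Database names from highest to lowest relevance under each measure.
by_total = sorted(totals, key=totals.get, reverse=True)
by_average = sorted(averages, key=averages.get, reverse=True)
```

The two orderings differ only in the positions of the "Political Article Database" and the "International Article Database", matching the description.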

[0071] (c) Rank of documents per database
For example, the sum of the values obtained by dividing a constant, 1000, by the rank of each document is taken as the relevance of each database. In the example of FIG. 4, the "Economic Article Database" documents "Alliance of Bank A, Bank B, and Bank C" and "Repercussions of the alliance of Bank A, Bank B, and Bank C" rank first and second respectively, so 1000/1 + 1000/2 = 1500 is the relevance of that database.

[0072] Similarly, the relevance of the "Political Article Database" is 1000/4 = 250, and that of the "International Article Database" is 1000/3 + 1000/5 = 533.3 (fraction truncated), so the databases rank, in descending order, "Economic Article Database", "International Article Database", "Political Article Database". With this calculation method, a database that has highly ranked documents, even if few in number, itself receives a higher relevance.
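The rank-based calculation of method (c) can be reproduced with a short sketch. The constant 1000 and the rank positions are those of the worked example; the function and variable names are hypothetical.

```python
from collections import defaultdict

RANK_CONSTANT = 1000  # example constant from the description

# Database of each extracted document, in rank order (rank 1 first).
ranked_databases = [
    "Economic Article Database",       # rank 1
    "Economic Article Database",       # rank 2
    "International Article Database",  # rank 3
    "Political Article Database",      # rank 4
    "International Article Database",  # rank 5
]


def rank_based_relevance(databases_in_rank_order):
    """Sum RANK_CONSTANT / rank over the documents of each database."""
    scores = defaultdict(float)
    for rank, database in enumerate(databases_in_rank_order, start=1):
        scores[database] += RANK_CONSTANT / rank
    return dict(scores)


scores = rank_based_relevance(ranked_databases)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the contribution of a document falls off with its rank, a first-place document counts five times as much as a fifth-place one under this measure.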

[0073] The relevance of a database may be calculated from any one of the above values, but because a single calculation method sometimes fails to separate the databases (see the example in (a) above), several may be combined. Here, for convenience of explanation, it is assumed that the sum of document relevance in (b) above has given database relevance values of 243.579 for the "Economic Article Database", 107.678 for the "Political Article Database", and 208.134 for the "International Article Database".
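The specification says only that several measures may be combined and does not say how. One conceivable combination, shown here purely as an assumption, is to rank by document count first and use the sum of relevance to break ties; the function name is hypothetical and the figures are those of the worked example.

```python
# Figures from the worked example: document counts (a) and relevance sums (b).
counts = {
    "Economic Article Database": 2,
    "Political Article Database": 1,
    "International Article Database": 2,
}
totals = {
    "Economic Article Database": 243.579,
    "Political Article Database": 107.678,
    "International Article Database": 208.134,
}


def combined_ranking(counts, totals):
    """Order by count, then by relevance sum; an assumed tie-breaking scheme."""
    return sorted(counts, key=lambda db: (counts[db], totals[db]), reverse=True)


ranking = combined_ranking(counts, totals)
```

With these figures the tie between the two databases holding two documents each is resolved by their sums of relevance.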

[0074] Having calculated the relevance of each database as described above, the database relevance calculation unit 305 outputs the names of the databases, in descending order of rank, to the transmission unit 306 described later. The relevance value may be output together with each database name.

[0075] The transmission unit 306 transmits the processing result input from the document extraction unit 304 or the database relevance calculation unit 305 to the client 105 or 106 via the network 100 in accordance with a predetermined protocol. The transmitted processing result is output on the display screen of the client 105 or 106.

[0076] FIG. 5 is an explanatory diagram showing an example of the processing result finally displayed on the display screen of the client 105 or 106 when the request input from the client 105 or 106 is a request of type (1) above containing the search sentence "About the consolidation and merger of banks".

[0077] The relevance may be displayed together with the name of each database. The screen of FIG. 5 may also display a search sentence input area and the like, so that when the operator enters a new search sentence in that area, selects one of the databases with a mouse or the like, and clicks an execute button or the like, a search of the specified database with the new search sentence can be executed.

[0078] When the execute button or the like is clicked, the client 105 or 106 creates a request of type (2) above based on the search sentence and the database name taken from this screen, and transmits it to the information search device 101. In this way, a document search of a retrieved database can be executed directly from the database search screen, so operations such as switching screens become unnecessary and operability improves.

[0079] In the embodiment of the present invention, when a request of type (1) above is input, the names of the databases are merely displayed in descending order of relevance, so to go on to search for documents the operator must specify one of the databases and input a request of type (2) anew. However, the top several databases may instead be specified automatically and a document search with the same search sentence performed in succession.

[0080] The search file storage unit 300, the receiving unit 301, the request processing unit 302, the document relevance calculation unit 303, the document extraction unit 304, the database relevance calculation unit 305, and the transmission unit 306 each realize their functions by the CPU 201 or the like executing instruction processing in accordance with instructions written in a program recorded on a recording medium such as the ROM 202, the RAM 203, the hard disk 205, or the floppy disk 207.

[0081] Next, a series of processes of the information search device (doubling as an information search device) according to the embodiment of the present invention will be described. FIG. 6 is a flowchart showing the processing procedure of the information search device according to the embodiment of the present invention.

[0082] In the flowchart of FIG. 6, first, in step S601, the receiving unit 301 receives a request transmitted from the client 105 or 106. In the following step S602, the request processing unit 302 determines whether the request input from the receiving unit 301 is a request to present databases suited to the search purpose.

[0083] If the input request is a database presentation request (step S602: Yes), in step S603 the query vector creation unit 303a creates a query vector from the search sentence in the input request. In the following step S604, the document vector creation unit 303b creates a document vector for each document in the databases.

[0084] Then, in step S605, the vector similarity calculation unit 303c calculates the similarity between the query vector and each document vector, and in step S606 a list of the results is output to the document extraction unit 304.

[0085] In step S607, the document extraction unit 304 extracts a predetermined number of documents (this number is determined in advance according to the type of request) from the input list in order of relevance. Then, in step S608, the database relevance calculation unit 305 calculates the relevance of each database based on the documents extracted in step S607.

[0086] Further, in step S609, the transmission unit 306 transmits the names of the databases to the client 105 or 106 in descending order of the relevance calculated in step S608, and the processing of this flowchart ends.

[0087] If, in step S602, the input request is not a database presentation request (step S602: No), it is determined in step S610 whether the request input from the receiving unit 301 is a request to present documents suited to the search purpose.

[0088] If the input request is a document presentation request (step S610: Yes), the process proceeds to step S611; if it is not (step S610: No), the processing of this flowchart ends and other processing appropriate to the type of request is performed.

[0089] Steps S611 to S615 are the same as steps S603 to S607, so their description is omitted. In step S616, the transmission unit 306 transmits the titles of the documents extracted in step S615, their relevance, and so on to the client 105 or 106, and the processing of this flowchart ends.
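The branching of the FIG. 6 flowchart can be summarized in a short sketch. Everything here is a hypothetical simplification: the request is a plain dictionary, `score_documents` and `database_relevance` are caller-supplied stand-ins for steps S603 to S606 and S608, how the specified database is used in the document branch is left to `score_documents`, and the counts 5 and 10 are the example values from the description.

```python
def handle_request(request, score_documents, database_relevance):
    """Sketch of the FIG. 6 flow; the two callables are hypothetical stand-ins.

    score_documents(sentence, database) -> records sorted by descending relevance
    database_relevance(records)         -> {database name: relevance}
    """
    sentence = request.get("search_sentence")
    database = request.get("database")

    if sentence and not database:
        # S602 Yes: database presentation request.
        ranked = score_documents(sentence, None)        # S603 to S606
        top = ranked[:5]                                # S607
        relevance = database_relevance(top)             # S608
        # S609: database names in descending order of relevance.
        return sorted(relevance, key=relevance.get, reverse=True)

    if sentence and database:
        # S610 Yes: document presentation request.
        ranked = score_documents(sentence, database)    # S611 to S614
        return ranked[:10]                              # S615, sent in S616

    # S610 No: some other request type, handled outside this flow.
    return None
```

Passing the scoring and relevance steps in as callables keeps the sketch independent of any particular vector weighting or database relevance measure.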

[0090] As described above, according to the embodiment of the present invention, the relevance of a given database to a given search request is calculated finely as a numerical value that takes continuous values rather than the binary values of relevant or not relevant, and the operator can refer to the result and select or discard databases as needed. Consequently, unpromising databases are not searched and an unnecessarily large number of databases is not searched, which improves the processing efficiency of the search.

[0091] Also, by first narrowing down the databases with a simple search sentence and then searching with a more complex one, the operator can efficiently retrieve the desired documents from among a large volume of documents stored in many databases while keeping search noise and the processing load on the system low.

[0092] The information search device according to the present invention can also be used when the main purpose is not to retrieve documents but rather to get a rough idea of the characteristics and tendencies of the information accumulated in the available information sources. Examples are finding out which newspaper is actively covering a particular incident, or which of the various electronic bulletin boards connected by a network has the richest information on a particular matter. According to the present invention, the characteristics of each information source can be grasped easily just by entering a query in natural language.

[0093] The information search method described in the embodiment of the present invention is realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, floppy disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program can also be distributed via such a recording medium or, as a transmission medium, via a network such as the Internet.

【００９４】[0094]

[Effects of the Invention] As described above, according to the invention of claim 1, the first calculating means calculates, for each document, the relevance of the documents stored in the plurality of databases to the input search request; the extracting means extracts a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculating means; and the second calculating means calculates, for each database, the relevance of the plurality of databases to the search request based on the number, per database, of the documents extracted by the extracting means. The relevance of each database to the search request is therefore calculated finely from the number of documents that each database contributes to the set of documents extracted on the basis of relevance to the search request. This has the effect of providing an information search device that can give the operator clues for selecting an appropriate information source.

[0095] According to the invention of claim 2, the first calculating means calculates, for each document, the relevance of the documents stored in the plurality of databases to the input search request; the extracting means extracts a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculating means; and the second calculating means calculates, for each database, the relevance of the plurality of databases to the search request based on the relevance, calculated by the first calculating means, of the documents extracted by the extracting means. The relevance of each database to the search request is therefore calculated finely from, for example, the per-database sum or average of the relevance of the documents extracted on the basis of relevance to the search request. This has the effect of providing an information search device that can give the operator clues for selecting an appropriate information source.

[0096] According to the invention of claim 3, the first calculating step calculates, for each document, the relevance of the documents stored in the plurality of databases to the input search request; the extracting step extracts a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated in the first calculating step; and the second calculating step calculates, for each database, the relevance of the plurality of databases to the search request based on the number, per database, of the documents extracted in the extracting step. The relevance of each database to the search request is therefore calculated finely from the number of documents that each database contributes to the set of documents extracted on the basis of relevance to the search request. This has the effect of providing an information search method that can give the operator clues for selecting an appropriate information source.

[0097] According to the invention of claim 4, the first calculating step calculates, for each document, the relevance of the documents stored in the plurality of databases to the input search request; the extracting step extracts a predetermined number of documents from the documents stored in the databases based on the relevance calculated in the first calculating step; and the second calculating step calculates, for each database, the relevance of the plurality of databases to the search request based on the relevance, calculated in the first calculating step, of the documents extracted in the extracting step. The relevance of each database to the search request is therefore calculated finely from, for example, the per-database sum or average of the relevance of the documents extracted on the basis of relevance to the search request. This has the effect of providing an information search method that can give the operator clues for selecting an appropriate information source.

[0098] According to the invention of claim 5, a program that causes a computer to execute the method of claim 3 or claim 4 is recorded, so that the program becomes machine-readable. This has the effect of providing a recording medium with which the operations of claim 3 or claim 4 can be realized by a computer.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the system configuration of an information search system that includes the information search device according to the embodiment of the present invention.

FIG. 2 is a block diagram showing the hardware configuration of the information search device according to the embodiment of the present invention.

FIG. 3 is a block diagram functionally showing the configuration of the information search device according to the embodiment of the present invention.

FIG. 4 is an explanatory diagram schematically showing an example of the list created by the document relevance calculation unit 303 according to the embodiment of the present invention.

FIG. 5 is an explanatory diagram showing an example of how a processing result of the information search device according to the embodiment of the present invention is displayed on the client 105 or 106.

FIG. 6 is a flowchart showing the processing procedure of the information search device according to the embodiment of the present invention.

[Explanation of symbols]

100 Network
101 Information search device
102, 103, 104 File server
105, 106 Client
200 Bus
201 CPU
202 ROM
203 RAM
204 HDD
205 HD
206 FDD
207 FD
208 Display
209 I/F
210 Communication line
211 Keyboard
212 Mouse
213 Scanner
214 Printer
215 CD-ROM
216 CD-ROM drive
300 Search file storage unit
301 Receiving unit
302 Request processing unit
303 Document relevance calculation unit
303a Query vector creation unit
303b Document vector creation unit
303c Vector similarity calculation unit
304 Document extraction unit
305 Database relevance calculation unit
306 Transmission unit

Claims

[Claims]

1. An information search device that selects, from among a plurality of databases, a database matching a search purpose, the device comprising: first calculating means for calculating, for each document, the relevance of documents stored in the plurality of databases to an input search request; extracting means for extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculating means; and second calculating means for calculating, for each database, the relevance of the plurality of databases to the search request based on the number, per database, of the documents extracted by the extracting means.

2. An information search device that selects, from among a plurality of databases, a database matching a search purpose, the device comprising: first calculating means for calculating, for each document, the relevance of documents stored in the plurality of databases to an input search request; extracting means for extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated by the first calculating means; and second calculating means for calculating, for each database, the relevance of the plurality of databases to the search request based on the relevance, calculated by the first calculating means, of the documents extracted by the extracting means.

3. An information search method for selecting, from among a plurality of databases, a database matching a search purpose, the method comprising: a first calculating step of calculating, for each document, the relevance of documents stored in the plurality of databases to an input search request; an extracting step of extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated in the first calculating step; and a second calculating step of calculating, for each database, the relevance of the plurality of databases to the search request based on the number, per database, of the documents extracted in the extracting step.

4. An information search method for selecting, from among a plurality of databases, a database matching a search purpose, the method comprising: a first calculating step of calculating, for each document, the relevance of documents stored in the plurality of databases to an input search request; an extracting step of extracting a predetermined number of documents from the documents stored in the plurality of databases based on the relevance calculated in the first calculating step; and a second calculating step of calculating, for each database, the relevance of the plurality of databases to the search request based on the relevance, calculated in the first calculating step, of the documents extracted in the extracting step.

5. A computer-readable recording medium on which a program for causing a computer to execute the method according to claim 3 or 4 is recorded.