JP2002014985A - Document search system and search document registration control method

Info

Publication number: JP2002014985A
Application number: JP2000198474A
Authority: JP
Inventors: Naoaki Kondo; 修明近藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2000-06-30
Filing date: 2000-06-30
Publication date: 2002-01-18

Abstract

(57) [Abstract]

[Problem] To allow updates to document files to be reflected in a database automatically and efficiently, and thereby to improve search efficiency for frequently updated document files.

[Solution] When a document file to be registered is read (step S31), it is determined whether that document file is already registered in the database 116 (step S32). If the document file to be registered is already registered, an identifier indicating the registration position in the database of the corresponding old index data for that document file is obtained (step S34). New index data is then generated from the document file to be registered and is registered in the database together with link information to the document file (step S35). After this, the old index data and its associated link information are deleted from the database 116 (step S36).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document search system, such as a natural language search system, and to a search document registration control method used in such a search system.

[0002]

[Prior Art] Keyword search systems have long been well known as a method of document search. A keyword search system retrieves documents that contain keywords entered by the user. To find a target document with such a system, the user must specify appropriate keywords that the target document contains. In particular, when the database holds a large number of documents, specifying a single keyword returns many candidate documents, so the user must perform a narrowing search, entering additional keywords to reduce the candidates.

[0003] Full-text search systems, which locate the document files a user needs from among a vast number of document files, have been developed using this kind of search method. In such a document search system, to narrow down the search candidates the user had to join several words with "or", "and", or "not" to create a search expression and then search with that expression.

[0004] Accordingly, natural language search systems that can efficiently search registered document data using everyday language (natural language) have recently been developed. In a natural language search system, the user simply enters a query in everyday language. The system then automatically performs processing such as content analysis, syntactic analysis, and morphological analysis of the query, keyword weighting, and similarity calculation, so that appropriate documents can be easily retrieved from the database.

[0005] Because such a natural language search system requires sophisticated document analysis, the document files to be searched must be analyzed in advance to generate corresponding search data, and that search data must be registered in a database.

[0006]

[Problems to Be Solved by the Invention] Conventionally, however, this document registration work has been performed manually by the user, who had to specify the document files to be newly included as search targets and then start the registration process. Systems with an automatic registration function, which automatically registers the document files in a predetermined directory, have also been considered. Most of them, however, take only the additional registration of document files into account and usually give insufficient consideration to what happens when a document file already registered in the database is updated. This is because conventional natural language search systems were developed on the premise that only completed document files would be searched.

[0007] Therefore, although search data for a new document file in a given directory could be added to the database automatically, reflecting the updated contents of an already registered document file in the database had to be done by hand. Moreover, even where automatic re-registration was performed, search data for the same document file was registered in the database a second time. This increased the size of the search data in the database and the redundancy among similar search data, and search efficiency fell as a result.

[0008] In the future, as typified by groupware such as e-mail, electronic document circulation, workflow, and schedule management, it is expected that a mechanism will be needed for making effective use of document files that are updated frequently, such as documents produced in the course of each individual's work in an office.

[0009] The present invention has been made in view of the above circumstances. Its object is to provide a document search system and a search document registration control method that allow updates to document files to be reflected in a database automatically and efficiently, so that even frequently updated document files can be searched efficiently.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention provides a document search system that searches document files registered in a database, the system comprising: automatic registration means for periodically registering document files in the database according to a user-specified schedule; means for detecting, when the automatic registration means registers a document file, whether the document file to be newly registered is already registered in the database; and overwrite registration means for, when the document file to be newly registered is detected to be an already registered document file, generating search data corresponding to the document file to be newly registered and registering the generated search data over the search data of the corresponding document file in the database.

[0011] According to this document search system, document files are registered in the database automatically and periodically according to a user-specified schedule. During this automatic registration, the system checks whether the document file to be newly registered is already registered in the database. If it is, search data corresponding to the document file to be newly registered is generated, and the generated search data is registered over the search data of the corresponding document file in the database. With these automatic registration and overwrite registration functions, updates to document files can be reflected in the database automatically without increasing the size of the search data, which improves search efficiency.

[0012] Preferably, means is further provided for determining, when the document file to be newly registered is detected to be an already registered document file, whether it has been updated relative to the corresponding registered document file, based on the file update date and time. If the document file to be newly registered has not been updated, the overwrite registration means is prohibited from executing overwrite registration for that document file. Overwrite registration is thus performed only for document files that have actually been updated, which makes registration more efficient.

[0013] Preferably, means is further provided for determining, when the document file to be newly registered is detected to be an unregistered document file, whether a document file with at least a predetermined similarity is already registered, by using the search data in the database to search for document files similar to the document file to be newly registered. If such a document file is already registered, only link information for the document file to be newly registered is registered in the database, in association with the search data corresponding to that similar document file. Similar document files can thus share search data, which eliminates duplicated search data and improves search efficiency.

[0014]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the configuration of a client-server computer system to which a document search system according to one embodiment of the present invention is applied. As illustrated, this computer system consists of a server computer 11 and a plurality of client computers 12. The server computer 11 and the client computers 12 are connected via a computer network such as a LAN or the Internet.

[0015] As illustrated, the server computer 11 is provided with a natural language search system 111, a management function unit 112, a registration function unit 113, a file storage device 114, an HTTP server 115, and so on. The natural language search system 111 is a search system that can efficiently search registered document data using everyday language (natural language). It analyzes a query written in everyday language from a client computer 12 and performs syntactic analysis, morphological analysis, keyword weighting, similarity calculation, and other processing to retrieve appropriate documents from among the registered documents. The natural language search system 111 consists of a database 116, a natural language search engine 117, and a natural language search server 118.

[0016] The database 116 stores search data (index data) corresponding to the document files registered as search targets. The natural language search engine 117 is a software module that performs the computation for the natural language search described above; it uses the search data in the database 116 to find registered documents that match a query. The natural language search server 118 is a software module that provides the search service to the client computers 12 through the HTTP server 115.

[0017] The management function unit 112 is a software module for operating and managing the natural language search system 111. It handles schedule management for automatic document registration, setting of the automatic registration mode, and similar tasks. The registration function unit 113 is a software module that periodically registers document files in the database 116 according to a user-specified schedule. Under the control of the management function unit 112, it executes automatic registration for the document files stored in the file storage device 114. A user-specified schedule is a date-and-time specification for registration, such as "register automatically every morning at 9:00" or "register automatically every Monday", and is set by a system administrator or the like.
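A user-specified schedule of this kind can be pictured as a rule that yields the next registration time. The following is a minimal sketch under stated assumptions, not the implementation described in this publication: the `Schedule` class and its field names are hypothetical, and the "every Monday" rule is given an assumed start time of 9:00 because the text does not specify one.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule rule: run at `at` every day, or only on `weekday`."""
    at: time
    weekday: Optional[int] = None  # 0 = Monday ... 6 = Sunday; None = every day

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled time strictly after `now`."""
        candidate = datetime.combine(now.date(), self.at)
        # Step forward one day at a time until the candidate is in the
        # future and, if a weekday is set, falls on that weekday.
        while candidate <= now or (
            self.weekday is not None and candidate.weekday() != self.weekday
        ):
            candidate += timedelta(days=1)
        return candidate


every_morning = Schedule(at=time(9, 0))
every_monday = Schedule(at=time(9, 0), weekday=0)  # 9:00 is an assumed time
```

With a current time of Friday 30 June 2000 at 10:00, for instance, `every_morning.next_run` gives 1 July at 9:00 and `every_monday.next_run` gives 3 July at 9:00.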

[0018] The file storage device 114 stores the various document files to be searched. By accessing the server computer 11 through a browser 121, the user of each client computer 12 can freely write document files such as mail, schedules, and circulars to the file storage device 114 and update them.

[0019] That is, as shown in FIG. 2, the file storage device 114 is allocated as a shared folder accessible from each client computer 12, and the user of each client computer 12 can freely perform file input/output operations on the file storage device 114 through the browser 121. Consequently, not only are new document files added to the file storage device 114, but existing document files in it are also updated frequently. Because the document files in the file storage device 114 are automatically and periodically registered in the database 116 of the natural language search system 111, each user of a client computer 12 can easily find a target document among the document files in the file storage device 114 by accessing the natural language search server 118 through the browser 121.

[0020] The overall flow of processing in this system is now described with reference to the flowchart of FIG. 3.

[0021] First, the management function unit 112 issues a request to the natural language search server 118 for the database name of the database 116, and the name of the database to be used for registration is specified based on the obtained database name (step S11). Next, the management function unit 112 selects the location of the document files to be registered in the natural language search system 111 (step S12). In this embodiment, the directory (path) designating the file storage device 114 is selected as the location of the document files. Then, based on settings made by the system administrator, the management function unit 112 sets the automatic registration date and time for the registration function unit 113, together with the registration mode to be used (step S13). Various options are provided for the registration mode, including "overwrite registration of all document files", "overwrite registration of only specific document files, based on a comparison of file update dates and times", and "additional registration of link information only, based on similarity". The details of these registration modes are described later with reference to FIG. 4 onward.

[0022] When the registration start date and time specified by the management function unit 112 arrives, the registration function unit 113 connects to the database 116 specified by the management function unit 112. It then registers the document files in the file storage device 114 in the database 116, according to the registration mode specified by the management function unit 112 (step S14).

[0023] The user of each client computer 12 accesses the natural language search server 118 through the browser 121 and displays the home page of the natural language search server 118 on the display monitor of the client computer 12. When the user enters a query in natural language in the input form provided on the home page, the natural language search system 111 executes a search based on that query, and the browser 121 displays the search results on screen (step S15). The search results include several lines of text from each matching document file and link information to that document file. Clicking the link information reads the corresponding document file from the file storage device 114 and displays it.

[0024] It is also possible to enter the entire text of a document file as a kind of query and search for document files similar to that text.

[0025] Next, an example of the registration mode selection screen provided by the management function unit 112 is described with reference to FIG. 4. This screen is used to select what the automatic registration process does. As illustrated, it displays a "Do not check (overwrite registration of all files)" check box, a "Check date: yes/no" check box, a "Check similarity: yes/no" check box, a "Similarity" input field, and so on.

[0026] Checking the "Do not check (overwrite registration of all files)" check box selects the "overwrite registration of all document files" mode. In this case, neither the date check nor the similarity check is performed. In this mode, search data is generated for every document file in the file storage device 114 and registered in the database 116. For document files already registered in the database 116, the existing search data is overwritten by the newly created search data. FIG. 5 illustrates this.

[0027] FIG. 5 assumes the following case. At the first registration (registration 1), six document files named A, B, C, D, E, and F were stored in the file storage device 114, and search data corresponding to them (INDEX DATA1) was registered in the database 116. At the current registration (registration 2), a file G has been newly added to the file storage device 114. Some of document files A, B, C, D, E, and F have been updated since the previous registration, and some have not.

[0028] The search data is replaced, changing from INDEX DATA1 to INDEX DATA2. INDEX DATA2 consists of search data newly generated from each of document files A, B, C, D, E, F, and G; the existing search data corresponding to document files A, B, C, D, E, and F is deleted. Thus, in the "overwrite registration of all document files" mode, the entire index of the database 116 is replaced to match the latest contents of the file storage device 114. Because the existing search data is deleted, search data is never duplicated no matter how many times automatic registration is executed.
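The wholesale replacement of INDEX DATA1 by INDEX DATA2 can be illustrated with a small sketch. This is only an illustration under assumptions: `build_index_data` is a hypothetical stand-in for the text extraction and analysis step (here it just collects lowercased words), and the file contents are invented.

```python
def build_index_data(text: str) -> frozenset:
    """Stand-in for text extraction and analysis: the set of lowercased words."""
    return frozenset(text.lower().split())


def rebuild_all(storage: dict) -> dict:
    """Build a fresh index from every file currently in storage.

    The returned mapping (file name -> index data) replaces the previous
    one wholesale, so no stale or duplicate records survive.
    """
    return {name: build_index_data(text) for name, text in storage.items()}


# Registration 1: files A to F exist.
storage = {name: f"contents of {name}" for name in "ABCDEF"}
index_data_1 = rebuild_all(storage)

# Before registration 2, file G is added and file A is edited.
storage["G"] = "contents of G"
storage["A"] = "revised contents of A"
index_data_2 = rebuild_all(storage)
```

Running `rebuild_all` again on unchanged storage yields the same seven entries, which mirrors the point that repeated automatic registration does not duplicate search data.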

[0029] When the "Check date" check box is checked, the file update date and time of the already registered document file is compared with that of the current document file, and the overwrite registration described above is performed only if the current document file is newer than the registered data. The specific procedure is described later with reference to FIG. 9.
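The date check amounts to a timestamp comparison per registered file. The sketch below assumes, purely for illustration, that the update time recorded at registration is available as a mapping from file name to `datetime`; the function name `files_to_overwrite` is hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def files_to_overwrite(
    current: Dict[str, datetime], registered: Dict[str, datetime]
) -> List[str]:
    """Return the already registered files whose current update time is
    newer than the update time recorded at registration."""
    return sorted(
        name
        for name, mtime in current.items()
        if name in registered and mtime > registered[name]
    )
```

Files that are not yet registered are deliberately left out of the result; they follow the ordinary new-registration path rather than the overwrite path.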

[0030] When the "Check similarity" check box is checked, already registered document files similar to a new, not yet registered document file (file G in FIG. 5) are looked up. If a similar document file whose similarity is at least the value specified in the similarity input field is already registered, the search data of that similar document file is reused as the search data for the new document file. The specific procedure is described with reference to FIG. 11 onward.
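The following sketch shows one way link-only registration could work. It rests on several assumptions that are not in the text: the similarity measure (Jaccard similarity over word sets is used here only as a stand-in, since no measure is named), the `Record` structure, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    """One set of index data plus the links of every file that shares it."""
    index_data: Set[str]
    links: List[str] = field(default_factory=list)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure; the source does not name one."""
    return len(a & b) / len(a | b) if a | b else 1.0


def register_new_file(db: List[Record], link: str, text: str, threshold: float) -> bool:
    """Register an unregistered file. Returns True if only a link was added."""
    words = set(text.lower().split())
    for record in db:
        if jaccard(words, record.index_data) >= threshold:
            record.links.append(link)  # reuse the similar file's index data
            return True
    # No sufficiently similar file: register new index data with its link.
    db.append(Record(index_data=words, links=[link]))
    return False
```

The design point is that a sufficiently similar file adds one link and no index data, so the amount of search data grows only when genuinely different content arrives.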

[0031] In this way, the settings on the registration mode selection screen of FIG. 4 allow the following four registration modes to be used selectively.

[0032]
Registration mode #1: Overwrite registration of all document files (no check)
Registration mode #2: Date check
Registration mode #3: Similarity check
Registration mode #4: Date and similarity check

Next, the automatic registration procedure executed by the registration function unit 113 is described with reference to the flowchart of FIG. 6. When the registration date and time specified by the management function unit 112 arrives, the registration function unit 113 first connects to the destination database 116 through the natural language search server 118 (step S21). Next, the registration function unit 113 reads the first document file from the file storage unit 114 (step S22), determines the current registration mode (step S23), and registers the document file in the database 116 according to the result (steps S24 to S27). The registration function unit 113 then determines whether registration has been completed for all document files in the file storage unit 114 (step S28) and, if any unprocessed document file remains, repeats the process from step S22. Because the registration mode is the same for all document files, the registration mode check may be omitted when registering the second and subsequent document files.
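The four modes and the loop of FIG. 6 can be sketched as follows. This is a minimal illustration, not the described implementation: the enum, `select_mode`, and `run_auto_registration` are hypothetical names, the per-mode handlers are supplied by the caller, and the mode is looked up once before the loop, which corresponds to the remark that the mode check may be skipped for the second and subsequent files.

```python
from enum import Enum


class RegistrationMode(Enum):
    OVERWRITE_ALL = 1         # mode #1: no checks
    DATE_CHECK = 2            # mode #2
    SIMILARITY_CHECK = 3      # mode #3
    DATE_AND_SIMILARITY = 4   # mode #4


def select_mode(date_check: bool, similarity_check: bool) -> RegistrationMode:
    """Map the two check options to one of the four registration modes."""
    if date_check and similarity_check:
        return RegistrationMode.DATE_AND_SIMILARITY
    if date_check:
        return RegistrationMode.DATE_CHECK
    if similarity_check:
        return RegistrationMode.SIMILARITY_CHECK
    return RegistrationMode.OVERWRITE_ALL


def run_auto_registration(file_names, mode, handlers):
    """Determine the handler for the mode once (S23), then register each
    file with it (one of S24 to S27) until no file remains (S28)."""
    handler = handlers[mode]
    return [handler(name) for name in file_names]
```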

[0033] (Registration mode #1: overwrite registration of all document files) Next, the specific registration procedure in registration mode #1 is described with reference to the flowchart of FIG. 7.

[0034] When the registration function unit 113 reads a document file to be registered from the file storage unit 114 (step S31), it first determines whether that document file is already registered in the database 116 (step S32). It does so by checking whether search data corresponding to a document file with the same file name as the document file to be registered is already registered in the database 116. If the document file to be registered is unregistered, the registration function unit 113 extracts text from the document file, analyzes the text to generate search data (index data), and registers that search data in the database 116 together with link information to the document file (step S33). In practice, the search data contains not only the index but also part or all of the extracted text data. What information is registered as search data depends on the design of the search system, however, so it is referred to below simply as search data or index data.

[0035] If the document file to be registered is already registered, the registration function unit 113 executes overwrite processing. Specifically, the registration function unit 113 first obtains an identifier (such as record pointer information) indicating the registration position in the database 116 of the old index data for the already registered document file (step S34). It then extracts text from the document file to be registered, analyzes the text to generate new index data, and registers that index data in the database 116 together with link information to the document file to be registered (step S35). After that, the registration function unit 113 uses the obtained identifier to delete the old index data and its associated link information from the database 116 (step S36). FIG. 8 illustrates this for the case where file A, the file to be registered, is already registered in the database 116. The index data for the file A to be registered and the link information to file A are newly registered in the database 116, and then the previously registered index data for file A and its link information to file A are deleted from the database 116.
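The order of operations in steps S31 to S36 (look up the old record, insert the new one, then delete the old one by its identifier) can be sketched as below. The in-memory dictionary standing in for the database 116, the integer record ids, and the word-set index data are all assumptions made for illustration.

```python
import itertools
from typing import Dict, Optional, Tuple

# Hypothetical database: record id -> (file name, index data, link)
Database = Dict[int, Tuple[str, frozenset, str]]
_next_id = itertools.count(1)


def find_record_id(db: Database, file_name: str) -> Optional[int]:
    """S32/S34: return the id of the record registered under this file name."""
    for record_id, (name, _, _) in db.items():
        if name == file_name:
            return record_id
    return None


def register(db: Database, file_name: str, text: str, link: str) -> int:
    """Register one file, overwriting any record with the same file name."""
    old_id = find_record_id(db, file_name)           # S32, S34
    index_data = frozenset(text.lower().split())     # stand-in for analysis
    new_id = next(_next_id)
    db[new_id] = (file_name, index_data, link)       # S33 or S35
    if old_id is not None:
        del db[old_id]                               # S36
    return new_id
```

Registering the same file name twice leaves exactly one record, the newer one, which is the behaviour shown for file A in FIG. 8.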

[0036] The processing of FIG. 7 described above is performed individually for each document file to be registered. As a result, the contents of the database 116 can be brought up to date without needlessly increasing the amount of index data.

[0037] (Registration mode #2: date check) Next, the specific registration procedure in registration mode #2 is described with reference to the flowchart of FIG. 9.

[0038] When the registration function unit 113 reads a document file to be registered from the file storage unit 114 (step S41), it first determines whether that document file is already registered in the database 116 (step S42). If the document file is not yet registered, the registration function unit 113 generates search data (index data) by extracting text from the document file and analyzing that text, and registers the search data in the database 116 together with link information to the document file (step S43).

[0039] On the other hand, if the document file to be registered has already been registered, the registration function unit 113 checks the file update date and time to determine whether an overwrite is necessary (step S44). Specifically, the file update date held in the database 116 for the registered document file is compared with the file update date of the document file to be registered, to determine whether the document file to be registered is newer than the one in the database 116. The file update date corresponding to each registered document file may of course be managed per document file in the database 116. Alternatively, the date and time at which the automatic registration process was last executed may be used as the file update date for the registered document files and compared with the file update date of the document file to be registered.

[0040] If the file update date of the document file to be registered is newer than the file update date corresponding to the registered document file in the database 116, the search data and link information corresponding to the contents of the document file are overwritten by the same procedure as steps S34 to S36 of FIG. 7 (step S45). If the file update date of the document file to be registered is the same as or older than the file update date corresponding to the registered document file in the database 116, the overwrite registration of step S45 is skipped. FIG. 10 illustrates this.
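The branch structure of registration mode #2 (steps S42 to S45) can be sketched as a small decision function. This is an illustrative sketch only: the function names, the returned action strings, and the `registered` dictionary that maps a link to its recorded update date are assumptions made for the example, not details taken from the patent.

```python
from datetime import datetime


def needs_overwrite(file_mtime, registered_mtime):
    """Step S44: overwrite only if the file is strictly newer than the registered copy."""
    return file_mtime > registered_mtime


def registration_action(link, file_mtime, registered):
    """Return the mode #2 action for one file.

    `registered` maps link -> update date recorded in the database (hypothetical layout).
    """
    if link not in registered:
        return "register"   # S43: unregistered file, generate and register search data
    if needs_overwrite(file_mtime, registered[link]):
        return "overwrite"  # S45: same procedure as S34 to S36
    return "skip"           # same or older date: overwrite registration is skipped


registered = {"file://A": datetime(2000, 6, 1)}
print(registration_action("file://A", datetime(2000, 6, 30), registered))  # newer file
print(registration_action("file://A", datetime(2000, 6, 1), registered))   # same date
print(registration_action("file://B", datetime(2000, 6, 30), registered))  # unregistered
```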

[0041] FIG. 10 assumes the following case. At the first registration (registration 1), four document files named A, B, C, and D are stored in the file storage device 114, and the search data (INDEX DATA1) corresponding to document files A, B, C, and D is registered in the database 116. At the current registration (registration 2), file E has been newly added to the file storage device 114, and files B and C have been updated since the first registration (registration 1). The search data is replaced, going from INDEX DATA1 to INDEX DATA2. INDEX DATA2 consists of the search data and link information corresponding to each of document files A, B, C, D, and E, but only the search data and link information for files B, C, and E are newly generated this time. For files A and D, the existing search data and link information are reused as they are. For files B and C, the old search data and link information are deleted.
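The FIG. 10 scenario can be worked through in a few lines. The sketch below uses the variant in which the time of the previous automatic registration serves as the reference date; all concrete dates are hypothetical values chosen only to reproduce the A to E example.

```python
from datetime import datetime

last_run = datetime(2000, 6, 1)  # registration 1 (hypothetical date)

# Files in storage at registration 2, with hypothetical update dates.
storage = {
    "A": datetime(2000, 5, 20),
    "B": datetime(2000, 6, 10),  # updated after registration 1
    "C": datetime(2000, 6, 12),  # updated after registration 1
    "D": datetime(2000, 5, 25),
    "E": datetime(2000, 6, 15),  # newly added
}
indexed = {"A", "B", "C", "D"}   # files covered by INDEX DATA1

new_files = sorted(f for f in storage if f not in indexed)
updated = sorted(f for f in indexed if storage[f] > last_run)
reused = sorted(f for f in indexed if storage[f] <= last_run)

regenerated = sorted(new_files + updated)
print(regenerated)  # files whose search data is generated at registration 2
print(reused)       # files whose existing search data is kept as is
```

Only the files in `updated` have old search data to delete; the new file has none.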

[0042] (Registration mode #3: similarity check) Next, the specific registration procedure in registration mode #3 is described with reference to the flowchart of FIG. 11.

[0043] When the registration function unit 113 reads a document file to be registered from the file storage unit 114 (step S51), it first determines whether that document file is already registered in the database 116 (step S52). If the document file is not yet registered, the registration function unit 113 acquires the user-specified similarity entered in the similarity input field (step S53). Next, the registration function unit 113 extracts the full text from the document file to be registered and inputs it to the natural language search system 111 as a query, thereby searching for documents similar to the document to be registered (step S54).

[0044] If the search returns no results, the registration function unit 113 generates search data (index data) by extracting text from the document file to be registered and analyzing that text, and registers the search data in the database 116 together with link information to the document file (step S55).

[0045] If the search returns results, the registration function unit 113 checks the similarity value that the natural language search system 111 outputs together with the search results and compares it with the user-specified similarity (step S56). If the similarity of the retrieved document is lower than the user-specified similarity, the registration function unit 113 performs the processing of step S55. If the similarity of the retrieved document is equal to or higher than the user-specified similarity, only the link information to the document file to be registered is registered in the database 116, and no search data is generated or registered for that document file (step S57). FIG. 12 illustrates this.

[0046] FIG. 12 shows an example in which the document file to be newly registered is document file G, and the document file whose similarity to document file G is equal to or higher than the user-specified similarity is document file A. In this case, no search data is generated or registered for document file G. Instead, link information to document file G is additionally registered in association with the search data (index data) of document file A. As a result, the search result for a given query is displayed, for example as shown in FIG. 13, with a portion of the similar document matching the query together with link information indicating both document files A and G.
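The link-only registration of steps S54 to S57 can be sketched as follows. The patent does not specify how similarity is computed, so the Jaccard word-overlap measure here is a stand-in, and the record layout (a dictionary holding an index and a list of links) is a hypothetical one chosen for illustration. The sketch also folds the "no search result" case and the "similarity below threshold" case into one branch, since both lead to step S55.

```python
def jaccard(a, b):
    """Stand-in similarity in [0, 1]; the patent does not specify the measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def register_with_similarity(records, link, text, threshold):
    """records: list of {'index': set of words, 'links': [links]} (hypothetical layout)."""
    words = set(text.lower().split())
    # S54: search for the most similar registered document.
    best = max(records, key=lambda r: jaccard(words, r["index"]), default=None)
    # S56: compare its similarity with the user-specified threshold.
    if best is not None and jaccard(words, best["index"]) >= threshold:
        best["links"].append(link)                         # S57: link only
    else:
        records.append({"index": words, "links": [link]})  # S55: full registration
    return records


records = []
register_with_similarity(records, "file://A", "quarterly sales report for tokyo", 0.8)
register_with_similarity(records, "file://G", "quarterly sales report for tokyo", 0.8)
register_with_similarity(records, "file://H", "meeting minutes", 0.8)
print([r["links"] for r in records])
```

As in FIG. 12, file G ends up as a second link on the record created for file A, while the dissimilar file H gets its own record.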

[0047] If it is determined in step S52 of FIG. 11 that the document file to be registered has already been registered, the registration function unit 113 overwrites the search data and link information corresponding to the contents of the document file by the same procedure as steps S34 to S36 of FIG. 7 (step S58).

[0048] (Registration mode #4: date and similarity check) Next, the specific registration procedure in registration mode #4 is described with reference to the flowchart of FIG. 14. This mode is an example in which the similarity check processing described with FIG. 11 and the date check processing described with FIG. 9 are used in combination.

[0049] That is, when the registration function unit 113 reads a document file to be registered from the file storage unit 114 (step S61), it first determines whether that document file is already registered in the database 116 (step S62). If the document file is not yet registered, the registration function unit 113 acquires the user-specified similarity entered in the similarity input field (step S63). Next, the registration function unit 113 extracts the full text from the document file to be registered and inputs it to the natural language search system 111 as a query, thereby searching for documents similar to the document to be registered (step S64).

[0050] If the search returns no results, the registration function unit 113 generates search data (index data) by extracting text from the document file to be registered and analyzing that text, and registers the search data in the database 116 together with link information to the document file (step S65).

[0051] If the search returns results, the registration function unit 113 checks the similarity value that the natural language search system 111 outputs together with the search results and compares it with the user-specified similarity (step S66). If the similarity of the retrieved document is lower than the user-specified similarity, the registration function unit 113 performs the processing of step S65. If the similarity of the retrieved document is equal to or higher than the user-specified similarity, only the link information to the document file to be registered is registered in the database 116, and no search data is generated or registered for that document file (step S67).

[0052] If it is determined in step S62 that the document file to be registered has already been registered, the registration function unit 113 checks the file update date and time to determine whether an overwrite is necessary (step S68). Specifically, the file update date held in the database 116 for the registered document file is compared with the file update date of the document file to be registered, to determine whether the document file to be registered is newer than the one in the database 116. As described above, the date and time at which the automatic registration process was last executed may of course be used as the file update date for the registered document files and compared with the file update date of the document file to be registered.

[0053] If the file update date of the document file to be registered is newer than the file update date corresponding to the registered document file in the database 116, the search data and link information corresponding to the contents of the document file are overwritten by the same procedure as steps S34 to S36 of FIG. 7 (step S69). If the file update date of the document file to be registered is the same as or older than the file update date corresponding to the registered document file in the database 116, the overwrite registration of step S69 is skipped.
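The combined flow of registration mode #4 (steps S62 to S69) reduces to one decision per file. The function below is a hedged sketch of that decision only: the function name, its parameters, and the returned action strings are assumptions for illustration, and the actual indexing and database operations are left out.

```python
from datetime import date


def mode4_action(is_registered, file_date, registered_date, best_similarity, threshold):
    """Return the action registration mode #4 takes for one file.

    best_similarity is None when the similarity search returned no results.
    registered_date is ignored for unregistered files.
    """
    if is_registered:
        # S68: date check for files that are already registered.
        if file_date > registered_date:
            return "overwrite"  # S69: same procedure as S34 to S36
        return "skip"           # same or older date
    # S63 to S66: similarity check for unregistered files.
    if best_similarity is not None and best_similarity >= threshold:
        return "link_only"      # S67: register link information only
    return "register"           # S65: generate and register search data


print(mode4_action(True, date(2000, 6, 30), date(2000, 6, 1), None, 0.8))  # registered, newer
print(mode4_action(True, date(2000, 6, 1), date(2000, 6, 1), None, 0.8))   # registered, same date
print(mode4_action(False, date(2000, 6, 30), None, 0.9, 0.8))              # similar file exists
print(mode4_action(False, date(2000, 6, 30), None, None, 0.8))             # no search result
```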

[0054] As described above, in this embodiment an automatic document registration function is applied to the natural language search system 111, and the following registration modes are provided for automatic document registration: overwrite registration of all document files (registration mode #1); overwrite registration depending on whether a file has been updated, as determined by a date check (registration mode #2); registration of link information only depending on whether a similar document exists, as determined by a similarity check (registration mode #3); and registration combining the date check and the similarity check (registration mode #4). Consequently, even when automatic registration is executed repeatedly, updates to document files can be reflected in the database 116 automatically and efficiently without the same search data being registered two or three times over. In particular, because registration is automated for the document files held in the file storage device 114, which is shared among the plurality of client computers 12, even document files that are updated frequently, such as documents produced in the course of each individual's work in an office, can be put to effective use without needlessly increasing the data size of the database 116.
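The four registration modes differ only in which checks they apply before registering a file. A compact way to express that, assuming hypothetical names for the enum members and the helper function, is:

```python
from enum import Enum


class RegistrationMode(Enum):
    OVERWRITE_ALL = 1        # mode #1: overwrite every registered file
    DATE_CHECK = 2           # mode #2: overwrite only updated files
    SIMILARITY_CHECK = 3     # mode #3: link-only registration for similar files
    DATE_AND_SIMILARITY = 4  # mode #4: both checks combined


def checks_for(mode):
    """Return (date_check, similarity_check) flags for a registration mode."""
    date_check = mode in (
        RegistrationMode.DATE_CHECK,
        RegistrationMode.DATE_AND_SIMILARITY,
    )
    similarity_check = mode in (
        RegistrationMode.SIMILARITY_CHECK,
        RegistrationMode.DATE_AND_SIMILARITY,
    )
    return date_check, similarity_check


for mode in RegistrationMode:
    print(mode.name, checks_for(mode))
```

The date check applies to files that are already registered, and the similarity check applies to files that are not yet registered, so the two flags are independent of each other.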

[0055] The above description takes as an example the case where the document search system of this embodiment is applied to a client-server computer system, but it can be applied in the same way to a computer used stand-alone.

[0056] Because the natural language search system 111 of this embodiment and the functions for controlling its automatic registration (the management function unit 112 and the registration function unit 113) are implemented as computer programs, the same effects as in this embodiment can be obtained simply by recording those computer programs on a computer-readable storage medium and installing them on an ordinary computer through that medium. The computer programs can also be distributed through a communication medium, not only through a storage medium.

[0057] The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. Furthermore, the above embodiment includes inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent features. For example, even if some constituent features are removed from all of the constituent features shown in the embodiment, the configuration with those features removed can be extracted as an invention, provided that (at least one of) the problems described in the section on problems to be solved by the invention can be solved and (at least one of) the effects described in the section on effects of the invention is obtained.

[0058]

[Effects of the Invention] As described above, according to the present invention, updates to document files can be reflected in the database automatically and efficiently, so that even frequently updated document files can be searched efficiently.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a client-server computer system to which a document search system according to an embodiment of the present invention is applied.

FIG. 2 is a diagram for explaining an example of how the document search system of the embodiment is operated.

FIG. 3 is a flowchart explaining the flow of a series of processes in the document search system of the embodiment.

FIG. 4 is a diagram showing an example of the registration mode selection screen provided by the management function unit of the document search system of the embodiment.

FIG. 5 is a diagram for explaining the principle of the overwrite registration processing applied to the document search system of the embodiment.

FIG. 6 is a flowchart showing the procedure of the automatic registration processing executed by the document search system of the embodiment.

FIG. 7 is a flowchart showing the specific registration procedure in registration mode #1 executed by the document search system of the embodiment.

FIG. 8 is a diagram for explaining the principle of the overwrite processing in FIG. 7.

FIG. 9 is a flowchart showing the specific registration procedure in registration mode #2 executed by the document search system of the embodiment.

FIG. 10 is a diagram for explaining the principle of the overwrite registration processing in FIG. 9, which depends on whether a file has been updated.

FIG. 11 is a flowchart showing the specific registration procedure in registration mode #3 executed by the document search system of the embodiment.

FIG. 12 is a diagram for explaining the principle of the link information registration processing in FIG. 11.

FIG. 13 is a diagram showing an example of search results when the link information registration processing of FIG. 11 is used.

FIG. 14 is a flowchart showing the specific registration procedure in registration mode #4 executed by the document search system of the embodiment.

[Explanation of symbols]

11 ... Server computer
12 ... Client computer
111 ... Natural language search system
112 ... Management function unit
113 ... Registration function unit
114 ... File storage device
115 ... HTTP server
116 ... Database
117 ... Natural language search engine
118 ... Natural language search server

Claims

[Claims]

1. A document search system for searching document files registered in a database, comprising: automatic registration means for periodically executing registration of document files in the database according to a schedule specified by a user; means for detecting whether a document file to be newly registered is already registered in the database; and overwrite registration means for, when it is detected that the document file to be newly registered is an already registered document file, generating search data corresponding to the document file to be newly registered and overwriting the search data of the corresponding document file in the database with the generated search data.

2. The document search system according to claim 1, further comprising means for determining, when it is detected that the document file to be newly registered is an already registered document file, whether the document file to be newly registered has been updated relative to the registered document file, wherein, if the document file to be newly registered has not been updated, execution of the overwrite registration processing by the overwrite registration means for that document file is prohibited.

3. The document search system according to claim 1, further comprising means for determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database, wherein, if a document file having at least the predetermined similarity is already registered, only link information to the document file to be newly registered is registered in the database in association with the search data corresponding to that registered document file.

4. A document search system for searching document files registered in a database, comprising: automatic registration means for periodically executing registration of document files in the database according to a schedule specified by a user; means for detecting whether a document file to be newly registered is already registered in the database; means for determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database; and means for, if no document file having at least the predetermined similarity is registered, generating search data corresponding to the document file to be newly registered and registering the generated search data and link information to the document file to be newly registered in the database, and, if a document file having at least the predetermined similarity is already registered, registering in the database only link information to the document file to be newly registered, in association with the search data corresponding to that registered document file.

5. A search document registration control method applied to a document search system for searching document files registered in a database, comprising: an automatic registration step of periodically executing registration of document files in the database according to a schedule specified by a user; a step of detecting, when a document file is registered in the automatic registration step, whether the document file to be newly registered is already registered in the database; and an overwrite registration step of, when it is detected that the document file to be newly registered is an already registered document file, generating search data corresponding to the document file to be newly registered and overwriting the search data of the corresponding document file in the database with the generated search data.

6. The search document registration control method according to claim 5, further comprising a step of determining, when it is detected that the document file to be newly registered is an already registered document file, whether the document file to be newly registered has been updated relative to the registered document file, wherein, if the document file to be newly registered has not been updated, execution of the overwrite registration processing by the overwrite registration step for that document file is prohibited.

7. The search document registration control method according to claim 5, further comprising a step of determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database, wherein, if a document file having at least the predetermined similarity is already registered, only link information to the document file to be newly registered is registered in the database in association with the search data corresponding to that registered document file.

8. A search document registration control method applied to a document search system for searching document files registered in a database, comprising: an automatic registration step of periodically executing registration of document files in the database according to a schedule specified by a user; a step of detecting, when a document file is registered in the automatic registration step, whether the document file to be newly registered is already registered in the database; a step of determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database; and a step of, if no document file having at least the predetermined similarity is registered, generating search data corresponding to the document file to be newly registered and registering the generated search data and link information to the document file to be newly registered in the database, and, if a document file having at least the predetermined similarity is already registered, registering in the database only link information to the document file to be newly registered, in association with the search data corresponding to that registered document file.

9. A computer-readable storage medium storing a computer program for searching document files registered in a database, wherein the computer program comprises: an automatic registration step of periodically executing registration of document files in the database according to a schedule specified by a user; a step of detecting, when a document file is registered in the automatic registration step, whether the document file to be newly registered is already registered in the database; and an overwrite registration step of, when it is detected that the document file to be newly registered is an already registered document file, generating search data corresponding to the document file to be newly registered and overwriting the search data of the corresponding document file in the database with the generated search data.

10. The storage medium according to claim 9, wherein the computer program further comprises a step of determining, when it is detected that the document file to be newly registered is an already registered document file, whether the document file to be newly registered has been updated relative to the corresponding registered document file, and wherein, if the document file to be newly registered has not been updated, execution of the overwrite registration processing for that document file is prohibited.

11. The storage medium according to claim 9, wherein the computer program further comprises a step of determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database, and wherein, if a document file having at least the predetermined similarity is already registered, only link information to the document file to be newly registered is registered in the database in association with the search data corresponding to that registered document file.

12. A computer-readable storage medium storing a computer program for searching document files registered in a database, wherein the computer program comprises: an automatic registration step of periodically executing registration of document files in the database according to a schedule specified by a user; a step of detecting, when a document file is registered in the automatic registration step, whether the document file to be newly registered is already registered in the database; a step of determining, when it is detected that the document file to be newly registered is an unregistered document file, whether a document file having at least a predetermined similarity is already registered, by searching for document files similar to the document file to be newly registered using the search data in the database; and a step of, if no document file having at least the predetermined similarity is registered, generating search data corresponding to the document file to be newly registered and registering the generated search data and link information to the document file to be newly registered in the database, and, if a document file having at least the predetermined similarity is already registered, registering in the database only link information to the document file to be newly registered, in association with the search data corresponding to that registered document file.