JP2000276480A

JP2000276480A - Document processing method and apparatus, and recording medium

Info

Publication number: JP2000276480A
Application number: JP11080390A
Authority: JP
Inventors: Katashi Nagao (長尾 確)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-03-24
Filing date: 1999-03-24
Publication date: 2000-10-06

Abstract

(57) [Summary] [Problem] To locate past classification items and documents by going back in date and time. [Solution] The apparatus uses a classification model that classifies electronic documents, each provided with information on an internal structure composed of a plurality of elements, into a plurality of classification items, and it processes the electronic documents by referring to the update date and time of the classification model recorded in that model. It has an input unit 20 for inputting a date and time, and a control unit 11 that searches for electronic documents based on the classification model corresponding to the date and time input through the input unit 20.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a document processing method and apparatus for processing electronic documents whose elements have been given an internal structure, and to a recording medium on which a program for processing such electronic documents is recorded.

[0002]

[Description of the Related Art] On the Internet, the WWW (World Wide Web) has conventionally been provided as an application service that presents hypertext information in a window format.

[0003] The WWW is a system that performs document processing such as creating, publishing, and sharing documents, and it has shown what a new style of document can be. From the standpoint of practical use, however, there is demand for more advanced document processing that goes beyond the WWW, such as classifying and summarizing documents based on their content. Such advanced processing requires that the content of documents be processed by machine.

[0004] Machine processing of document content nevertheless remains difficult, for the following reasons. First, HTML (Hyper Text Markup Language), the language used to describe hypertext, specifies how a document is presented but says almost nothing about its content. Second, the hypertext network formed between documents is not necessarily easy for a reader to use in understanding the content of those documents. Third, authors generally write without the reader's convenience in mind, and nothing reconciles the reader's convenience with the author's.

[0005] Thus, although the WWW is a system that has shown what a new kind of document can be, it does not process documents by machine and therefore could not perform advanced document processing. In other words, advanced document processing requires that documents be processed by machine.

[0006] Accordingly, systems that support the machine processing of documents have been developed on the basis of natural language research. As one form of document processing arising from that research, machine processing that uses tags attached to a document has been proposed, on the premise that the author or another party attaches attribute information about the document's internal structure, that is, tags.

[0007]

[Problems to Be Solved by the Invention] With the recent spread of computers and the progress of networking, more sophisticated document processing is in demand. For example, as documents received over a network are stored on a computer, the volume of stored documents gradually becomes enormous.

[0008] The present invention has been made in view of the above circumstances. Its object is to provide a document processing method and apparatus that make it easy to retrieve a desired document from the stored documents by going back in date and time, and a recording medium on which a document processing program with the same capability is recorded.

[0009]

[Means for Solving the Problems] To solve the above problem, the document processing method according to the present invention processes electronic documents by referring to the update date and time of a classification model recorded in that classification model. The method comprises an input step of inputting a date and time, and a search step of searching for electronic documents based on the classification model corresponding to the date and time input in the input step.

[0010] The document processing apparatus according to the present invention processes electronic documents by referring to the update date and time of a classification model recorded in that classification model. The apparatus comprises input means for inputting a date and time, and search means for searching for electronic documents based on the classification model corresponding to the date and time input through the input means.

[0011] The recording medium according to the present invention has recorded on it a document processing program that processes electronic documents by referring to the update date and time of a classification model recorded in that classification model. The program comprises input processing for inputting a date and time, and search processing for searching for electronic documents based on the classification model corresponding to the date and time input in the input processing.

[0012]

[Embodiments of the Invention] Embodiments of the document processing method and apparatus and of the recording medium according to the present invention are described below with reference to the drawings.

[0013] As shown in FIG. 1, the document processing apparatus according to an embodiment of the present invention has a main unit 10 that includes a control unit 11 and an interface 12, an input unit 20 that receives input from the user and sends it to the main unit 10, a receiving unit 21 that receives signals from outside and sends them to the main unit 10, a display unit 30 that displays output from the main unit 10, and a recording/reproducing unit 31 that records information on and reproduces information from a recording medium 32.

[0014] The main unit 10 has the control unit 11 and the interface 12 and forms the main part of the document processing apparatus. The control unit 11 has a CPU 13 that executes the processing in the apparatus, a RAM 14 that is volatile memory, and a ROM 15 that is non-volatile memory. The CPU 13 controls program execution in accordance with, for example, a procedure recorded in the ROM 15, temporarily storing data in the RAM 14 when necessary. The input unit 20, the receiving unit 21, the display unit 30, and the recording/reproducing unit 31 are connected to the interface 12. Under the control of the control unit 11, the interface 12 adjusts the timing of data transfer and converts data formats for data input from the input unit 20 and the receiving unit 21, data sent to the display unit 30, and data exchanged with the recording/reproducing unit 31.

[0015] The input unit 20 receives the user's input to the document processing apparatus and consists of, for example, a keyboard and a mouse. Using the input unit 20, the user can enter keywords with the keyboard or select elements of a document displayed on the display unit 30 with the mouse. Here, an element is a constituent of a document; documents, sentences, and words are examples.

[0016] The receiving unit 21 receives signals sent to the document processing apparatus from outside, for example over a communication line. The receiving unit 21 receives a plurality of documents, which are electronic documents, and sends the received data to the main unit 10.

[0017] The display unit 30 displays the output of the document processing apparatus. It consists of, for example, a cathode ray tube (CRT) or a liquid crystal display (LCD), and it displays one or more windows in which characters, graphics, and the like are shown.

[0018] The recording/reproducing unit 31 records information on and reproduces information from a recording medium 32 such as a so-called floppy disk. A document processing program for processing documents is recorded on the recording medium 32. The recording medium 32 is described further below.

[0019] Next, the documents used in the present embodiment are described. In the present embodiment, the internal structure of a document is described by attribute information attached as tags, and the document processing apparatus processes a document by referring to those tags. In addition to syntactic tags that indicate the structure of the document, the present embodiment attaches semantic and pragmatic tags that allow the content of documents to be understood by machine across multiple languages.

[0020] In the present embodiment, syntactic tagging describes the internal structure of a document. As shown in FIG. 2, the internal structure produced by tagging consists of elements such as the document, sentences, and vocabulary elements, connected by normal links and by reference links. In the figure, each white circle "○" represents an element, and the white circles at the lowest level are vocabulary elements, which correspond to the words at the finest level of the document. A solid line is a normal link, which indicates a connection between elements such as the document, sentences, and vocabulary elements. A broken line is a reference link, which indicates a dependency relation between a referring element and a referenced element. From highest to lowest, the internal structure of a document consists of the document, subdivisions, paragraphs, sentences, subsentential segments, and so on down to vocabulary elements. Of these, subdivisions and paragraphs are optional.
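The hierarchy and the two link types described above can be sketched as a small tree structure. This is only an illustrative sketch: the class name `Element`, the field names, and the level identifiers are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List

# Levels from highest to lowest, following the order given in the text.
# The identifier spellings are assumptions for this sketch.
LEVELS = ["document", "subdivision", "paragraph", "sentence",
          "subsentential_segment", "vocabulary_element"]


@dataclass
class Element:
    kind: str                                                   # one of LEVELS
    text: str = ""                                              # surface text of a vocabulary element
    children: List["Element"] = field(default_factory=list)    # normal links (solid lines)
    references: List["Element"] = field(default_factory=list)  # reference links (broken lines)

    def add(self, child: "Element") -> "Element":
        """Attach a child through a normal link and return the child."""
        self.children.append(child)
        return child


def vocabulary_elements(root: Element) -> List[Element]:
    """Collect the lowest-level elements in document order."""
    if root.kind == "vocabulary_element":
        return [root]
    found: List[Element] = []
    for child in root.children:
        found.extend(vocabulary_elements(child))
    return found


# Subdivisions and paragraphs are optional, so a sentence may hang
# directly under the document.
doc = Element("document")
sent = doc.add(Element("sentence"))
w1 = sent.add(Element("vocabulary_element", "time"))
w2 = sent.add(Element("vocabulary_element", "flies"))
w2.references.append(w1)  # a reference link from "flies" to "time"
```

Keeping normal links and reference links in separate lists mirrors the solid and broken lines of FIG. 2, so a traversal of the tree never follows a reference link by accident.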

[0021] In the present embodiment, semantic and pragmatic tagging describes information such as meaning, for example which sense of a polysemous word is intended. The tagging in the present embodiment uses the XML (Extensible Markup Language) format, which resembles HTML (Hyper Text Markup Language).

[0022] One example of tagging is shown below, but documents may be tagged by other methods. The examples below use English and Japanese documents, but the description of internal structure by tagging applies equally to other languages.

[0023] For example, the sentence "Time flies like an arrow." can be tagged as follows.

[0024]

<sentence><noun-phrase sense="time0">time</noun-phrase><verb-phrase><verb sense="fly1">flies</verb><adjective-verb-phrase><adjective-verb sense="like0">like</adjective-verb><noun-phrase>an <noun sense="arrow0">arrow</noun></noun-phrase></adjective-verb-phrase></verb-phrase>.</sentence>

Here, <sentence>, <noun>, <noun-phrase>, <verb>, <verb-phrase>, <adjective-verb>, and <adjective-verb-phrase> denote, respectively, a sentence, a noun, a noun phrase, a verb, a verb phrase, an adjective or adverb (including prepositional and postpositional phrases), and an adjective or adverb phrase; together they represent the syntactic structure of the sentence. Tags are placed immediately before the start of an element and immediately after its end. The tag placed immediately after the end of an element carries the symbol "/" to mark the end of the element. An element represents a syntactic constituent, that is, a phrase, a clause, or a sentence. The attribute sense="time0" refers to the 0th of the several senses of the word "time". Specifically, "time" can be a noun or a verb, and here the attribute indicates that it is a noun. Similarly, the word "orange" can mean a color or a fruit, and these meanings too can be distinguished by word sense.
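Because the tagging follows an XML-like format, the word-sense attributes can be read back with an ordinary XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The English tag names (`sentence`, `noun-phrase`, `adjective-verb`, and so on) and the attribute name `sense` are renderings assumed for this example; the specification itself writes the tags in Japanese.

```python
import xml.etree.ElementTree as ET

# The tagged sentence, with tag and attribute names assumed for this sketch.
TAGGED = (
    '<sentence>'
    '<noun-phrase sense="time0">time</noun-phrase>'
    '<verb-phrase><verb sense="fly1">flies</verb>'
    '<adjective-verb-phrase>'
    '<adjective-verb sense="like0">like</adjective-verb>'
    '<noun-phrase>an <noun sense="arrow0">arrow</noun></noun-phrase>'
    '</adjective-verb-phrase></verb-phrase>.</sentence>'
)


def word_senses(xml_text):
    """Return (word, sense) pairs for every element carrying a sense attribute."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order, so pairs follow the sentence order.
    return [(el.text, el.get("sense")) for el in root.iter() if el.get("sense")]


for word, sense in word_senses(TAGGED):
    print(word, sense)
```

Working on sense identifiers such as `time0` rather than on surface words is what lets later processing tell apart the different meanings of one word.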

[0025] As shown in FIG. 3, the syntactic structure of a document in the present embodiment can be displayed in a window 101 of the display unit 30. In this window 101, the vocabulary elements are displayed in the right half 103 and the structure of the sentence in the left half 102.

[0026] This window 101 displays part of the following document, whose internal structure is described by tagging: "Ａ氏のＢ会が終わったＣ市で、一部の大衆紙と一般紙がその写真報道を自主規制する方針を紙面で明らかにした。" ("In City C, where Mr. A's Meeting B had ended, some popular newspapers and general newspapers announced in their pages a policy of voluntarily restricting their photo coverage of it.") An example of tagging for this document is shown below.

[0027]

<document><sentence>
<adjective-verb-phrase relation="position"><noun-phrase><adjective-verb-phrase place="Ｃ市"><adjective-verb-phrase relation="subject"><noun-phrase id="Ｂ会"><adjective-verb-phrase relation="affiliation"><person-name id="Ａ氏">Ａ氏</person-name>の</adjective-verb-phrase><organization-name id="Ｂ会">Ｂ会</organization-name></noun-phrase>が</adjective-verb-phrase>終わった</adjective-verb-phrase><place-name id="Ｃ市">Ｃ市</place-name></noun-phrase>で、</adjective-verb-phrase>
<adjective-verb-phrase relation="subject"><noun-phrase id="press" syntax="parallel"><noun-phrase><adjective-verb-phrase>一部の</adjective-verb-phrase>大衆紙</noun-phrase>と<noun>一般紙</noun></noun-phrase>が</adjective-verb-phrase>
<adjective-verb-phrase relation="object"><adjective-verb-phrase relation="content" subject="press"><adjective-verb-phrase relation="object"><noun-phrase><adjective-verb-phrase><noun coreference="Ｂ会">そ</noun>の</adjective-verb-phrase>写真報道</noun-phrase>を</adjective-verb-phrase>自主規制する</adjective-verb-phrase>方針を</adjective-verb-phrase>
<adjective-verb-phrase relation="position">紙面で</adjective-verb-phrase>明らかにした。
</sentence></document>

In this document, "一部の大衆紙と一般紙" ("some popular newspapers and general newspapers") is marked as a parallel construction by the attribute syntax="parallel". Elements are defined as parallel when they share the same dependency relation. Unless otherwise specified, <noun-phrase relation=x><noun>A</noun><noun>B</noun></noun-phrase>, for example, indicates that A depends on B. Here relation=x denotes a relation attribute.

[0028] The relation attribute describes interrelationships in terms of syntax, semantics, and rhetoric. Grammatical functions such as subject, object, and indirect object; thematic roles such as agent, patient, and beneficiary; and rhetorical relations such as reason and result are all described by this relation attribute. In the present embodiment, relation attributes are described for comparatively simple grammatical functions such as subject, object, and indirect object.

[0029] In this document, proper nouns such as "Ａ氏" (Mr. A), "Ｂ会" (Meeting B), and "Ｃ市" (City C) have their attributes described by tags such as place name, person name, and organization name. Words given these place-name, person-name, or organization-name tags are always proper nouns.

[0030] The receiving unit 21 of the document processing apparatus receives documents whose internal structure is given by tagging and sends them to the main unit 10, where they are stored in, for example, the RAM 14. In this way, the one or more received documents accumulate in the document processing apparatus. By entering input through the input unit 20, the user can have a desired document displayed on the display unit 30.

[0031] On the display unit 30, a document is displayed in, for example, a resizable window. Multiple documents can be displayed side by side in multiple windows, or the windows can be overlapped. A summary of the document can also be displayed in place of the document itself.

[0032] The control unit 11 of the document processing apparatus performs various kinds of processing on the document displayed on the display unit 30 in response to the user's input to the input unit 20. The user provides input by clicking the mouse of the input unit 20 to designate a region of the display unit 30, or by typing a keyword on the keyboard of the input unit 20. A cursor that follows the mouse of the input unit 20 is shown on the display unit 30.

[0033] From the plurality of received documents, the document processing apparatus creates feature information that represents the features of each document, that is, an index. The creation of the index is described with reference to FIG. 4.

[0034] In step S11, the receiving unit 21 of the document processing apparatus receives a plurality of documents sent from outside. Under the control of the control unit 11, the document processing apparatus stores the documents received by the receiving unit 21 in, for example, the RAM 14.

[0035] In step S12, the control unit 11 of the document processing apparatus creates an index that captures the features extracted from each document. Specifically, the control unit 11 reads the documents that were received in step S11 and stored in, for example, the RAM 14, creates an index for each document, and stores the created indexes in, for example, the RAM 14.

[0036] An index represents the features of a document. Based on a classification model for document classification, the document processing apparatus automatically classifies documents by referring to the index of each one. The classification model consists of a plurality of classification items, or categories, into which documents are classified. Each category has a category index that represents its features.

[0037] In step S13, the user browses documents displayed on the display unit 30 of the document processing apparatus. That is, the control unit 11 causes the display unit 30 to display whichever of the stored documents the user requests. When the window's display area is too small to show the whole document, the window shows part of it. The user selects the desired document through the input unit 20. Step S13 is performed only when the user needs it. In the figure, step S13 is drawn as a parallelogram to indicate that it is a user operation; the same convention applies to the steps that follow.

[0038] A specific example of the display on the display unit 30 is now described. This example is a graphical user interface (GUI) in which the user can freely set and change the categories into which documents are classified. In this GUI, documents are classified automatically according to the categories the user has set.

[0039] As shown in FIG. 5, the GUI window 301 displays operation buttons 302 and a classification display section for each category: a first classification display section 303 showing "Sports News", a second classification display section 304 showing "Business News", a third classification display section 305 showing "Political News", and so on. The classification display section for each category shows the titles or opening portions of the documents.

[0040] In this window 301, the operation buttons 302 include a position reset button that returns the windows on the screen to their initial positions, a browser button that calls up a browser for reading the content of a document, and an exit button for leaving this window. The size of each classification display section is not fixed and can be changed. The title or label of each classification display section can also be changed.

[0041] Automatic document classification determines the categories according to each user's individual needs, and thereby aims to serve the user's interests and to make searching for documents more efficient.

[0042] In this window 301, in addition to the classification of documents as of the present, the user can go back to a past date and time and access the categories of a classification made in the past and the documents belonging to those categories. Here, "date and time" refers to the date and the time of day together; the same applies below.

[0043] In response to a past date and time entered through the input unit 20, the control unit 11 of the document processing apparatus refers to the update date and time recorded in each classification model and searches for the classification model that matches. The control unit 11 then reproduces the classification state of the retrieved classification model in the GUI window 301 of the display unit 30.
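The lookup by update date and time can be sketched as a search over a history of saved classification models. The text says only that a "matching" model is retrieved, so this sketch assumes that the match is the most recent model updated at or before the requested date and time. The names `ClassificationModel` and `model_at`, and the layout of the model, are also assumptions.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ClassificationModel:
    updated_at: datetime              # update date and time recorded in the model
    categories: Dict[str, List[str]]  # category name -> document addresses


def model_at(history: List[ClassificationModel],
             when: datetime) -> Optional[ClassificationModel]:
    """Return the latest model whose update time is not after `when`
    (assumed matching rule), or None if no model existed yet."""
    ordered = sorted(history, key=lambda m: m.updated_at)
    stamps = [m.updated_at for m in ordered]
    i = bisect_right(stamps, when)    # count of models updated at or before `when`
    return ordered[i - 1] if i else None


history = [
    ClassificationModel(datetime(1999, 1, 1), {"Sports News": ["doc1"]}),
    ClassificationModel(datetime(1999, 3, 1), {"Sports News": ["doc1", "doc2"]}),
]
past = model_at(history, datetime(1999, 2, 15))
print(past.categories if past else "no model for that date")
```

The categories of the returned model are what the GUI would then redraw in its classification display sections.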

[0044] For example, using the classification model for the past date and time the user wants, the GUI window 301 displays documents under each category, such as "Sports News" in the first classification display section 303, "Business News" in the second classification display section 304, and "Political News" in the third classification display section 305, in accordance with that past classification model.

[0045] The user can also search for documents by going back in time within a single category, such as "Sports News" in the first classification display section 303. This is done by referring to the update date and time of each document recorded in the category index. By going back in time in this way, the document processing apparatus makes it convenient to access the categories of a past classification and the documents belonging to them.
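A per-category search back in time might look like the following sketch. It assumes that a category index can be represented as a list of entries, each pairing a document address with that document's update date and time; the names `IndexEntry` and `documents_as_of` are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    address: str          # where the document is stored
    updated_at: datetime  # update date and time recorded for the document


def documents_as_of(category_index: List[IndexEntry],
                    when: datetime) -> List[str]:
    """Return addresses of documents updated at or before `when`, newest first."""
    hits = [entry for entry in category_index if entry.updated_at <= when]
    hits.sort(key=lambda entry: entry.updated_at, reverse=True)
    return [entry.address for entry in hits]


sports_index = [
    IndexEntry("doc1", datetime(1999, 1, 10)),
    IndexEntry("doc2", datetime(1999, 2, 20)),
    IndexEntry("doc3", datetime(1999, 3, 5)),
]
print(documents_as_of(sports_index, datetime(1999, 2, 28)))
```

Listing the newest documents first is a presentation choice made for this sketch, not something the text prescribes.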

[0046] In step S14, the user creates categories for the documents browsed on the display unit 30 in step S13 and classifies those documents. In the document processing apparatus, the categories are set by adding, changing, or deleting categories in, for example, a window divided into as many areas as there are categories. Documents are classified into categories by, for example, dragging the icon of a document shown in the window into the desired area. Newly created categories and the results of these operations are stored in, for example, the RAM 14. The operations for creating categories and classifying documents are described in more detail later.

【００４７】ステップＳ１５においては、文書処理装置
の制御部１１は、ステップＳ１４においておこなわれた
カテゴリの作成と、このカテゴリに応じた分類操作に基
づいて、分類モデルの作成を実行する。文書処理装置
は、たとえばＲＡＭ１４に記憶されたステップＳ１４に
おけるカテゴリおよび分類操作の結果を読み出す。そし
て、文書処理装置の制御部１１は、この結果に基づい
て、各カテゴリに分類された上記複数の文書について、
各カテゴリに特徴的な固有名詞、固有名詞以外の語義、
分類された文書の文書アドレスを集めて、分類モデルを
生成する。In step S15, the control unit 11 of the document processing apparatus creates a classification model based on the category creation performed in step S14 and the classification operations made according to those categories. The document processing apparatus reads, for example, the results of the category and classification operations of step S14 stored in the RAM 14. Then, based on these results, the control unit 11 of the document processing apparatus generates the classification model by collecting, for the plurality of documents classified into each category, the proper nouns characteristic of each category, the meanings of words other than proper nouns, and the document addresses of the classified documents.
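The model generation of step S15 can be pictured with a minimal Python sketch. The names `Document` and `build_classification_model` and the dictionary layout are assumptions made for illustration; the text does not specify a data format, and pooling every feature of a category stands in here for whatever rule selects the "characteristic" ones.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Document:
    address: str                                       # document address, e.g. "SP1"
    proper_nouns: set = field(default_factory=set)
    senses: set = field(default_factory=set)           # meaning IDs of non-proper-noun words


def build_classification_model(assignments, updated_at):
    """Collect proper nouns, meanings and document addresses per category.

    `assignments` maps a category name to the documents the user placed in it
    (hypothetical input shape, not taken from the source).
    """
    categories = {}
    for category, docs in assignments.items():
        entry = {"proper_nouns": set(), "senses": set(), "addresses": []}
        for doc in docs:
            entry["proper_nouns"] |= doc.proper_nouns
            entry["senses"] |= doc.senses
            entry["addresses"].append(doc.address)
        categories[category] = entry
    # The update time is kept with the model so that it can be looked up later.
    return {"updated_at": updated_at, "categories": categories}
```

Keeping the update time inside the returned model mirrors the later requirement that each registered model carries its update date and time.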

【００４８】ここで、固有名詞以外の場合に語そのもの
ではなく語義を用いるのは、同じ語でも複数の意味を有
することがあるからである。そして、文書処理装置の制
御部１１は、このように作成した分類モデルをたとえば
ＲＡＭ１４に記憶させる。なお、分類モデルの作成の詳
細については、さらに後述する。The reason why the meaning is used instead of the word itself in cases other than proper nouns is that the same word may have a plurality of meanings. Then, the control unit 11 of the document processing apparatus stores the classification model thus created in, for example, the RAM 14. The details of the generation of the classification model will be described later.

【００４９】そして、ステップＳ１６では、文書処理装
置の制御部１１は、ステップＳ１５で作成した分類モデ
ルを登録する。そして、この一連の工程を終了する。In step S16, the control unit 11 of the document processing device registers the classification model created in step S15. Then, this series of steps ends.

【００５０】たとえば図４のステップＳ１４で作成した
分類項目に対しては、図６に示すように、ユーザの手動
による分類モデルの変更の処理がおこなわれる。文書の
手動分類にでは、最初のステップＳ１７でたとえばステ
ップＳ１１で受信した文書を分類するカテゴリの分類
先、またカテゴリ自体をユーザが手動により更新する。
この手動による更新は、図５に示したＧＵＩのウィンド
ウ３０１にてユーザが操作することによりおこなわれ
る。For example, the classification items created in step S14 of FIG. 4 are subject to a process in which the user manually changes the classification model, as shown in FIG. 6. In the manual classification of documents, in the first step S17 the user manually updates, for example, the category into which a document received in step S11 is classified, or the categories themselves. This manual update is performed by the user operating the GUI window 301 shown in FIG. 5.

【００５１】ステップＳ１７では、ユーザは、ウィンド
ウ３０１において、複数の文書を分類するカテゴリの変
更をおこなう。たとえば、ユーザは、ウィンドウ３０１
に表示される分類表示部を新設したり、削除したりする
ことによりカテゴリ自体を変更することができる。ま
た、ユーザは、ウィンドウ３０１の各分類表示部に表示
された文書を他の分類表示部に移動することにより、文
書の分類先のカテゴリを変更することができる。In step S17, the user changes the categories for classifying a plurality of documents in the window 301. For example, the user can change the categories themselves by adding or deleting classification display sections shown in the window 301. The user can also change the category into which a document is classified by moving the document displayed in one classification display section of the window 301 to another classification display section.

【００５２】ステップＳ１８では、制御部１１は、ステ
ップＳ１７においてユーザにより入力された、カテゴリ
への文書の分類先や、カテゴリ自体の変更に基づいて、
分類モデルを変更する処理をおこなう。そして、ステッ
プＳ１９では、制御部１１は、変更した分類モデルを登
録する。たとえば、制御部１１は、変更した分類モデル
をＲＡＭ１４に記憶させる。なお、分類モデルの更新日
時も同時に登録しておくものとする。また、更新前の分
類モデルも保存しておく。In step S18, the control unit 11 changes the classification model based on the changes, input by the user in step S17, to the categories into which documents are classified and to the categories themselves. Then, in step S19, the control unit 11 registers the changed classification model. For example, the control unit 11 causes the RAM 14 to store the changed classification model. The update date and time of the classification model is registered at the same time. The classification model before the update is also retained.

【００５３】図４および図６により、文書を分類する基
準となる分類モデルの作成および更新がおこなわれる。
この分類モデルを基準として、文書の自動的な分類をお
こなうことができる。Through the processes of FIGS. 4 and 6, the classification model that serves as the reference for classifying documents is created and updated. Documents can then be classified automatically on the basis of this classification model.

【００５４】図７を参照して、文書処理装置がおこなう
文書の自動分類の動作について説明する。ステップＳ２
１では、文書処理装置の受信部２１は、外部からたとえ
ば通信回線を介して送信された新たな文書を受信する。
文書処理装置における文書の受信の動作については、ス
テップＳ１１で詳しく述べたので、ここでは説明を省略
する。受信した文書は、たとえばＲＡＭ１４に記憶され
る。Referring to FIG. 7, the operation of automatic document classification performed by the document processing apparatus will be described. In step S21, the receiving unit 21 of the document processing apparatus receives a new document transmitted from outside via, for example, a communication line. Since the operation of receiving a document in the document processing apparatus has been described in detail for step S11, the description is omitted here. The received document is stored in the RAM 14, for example.

【００５５】ステップＳ２２においては、文書処理装置
の制御部１１は、たとえばＲＡＭ１４に記憶されたステ
ップＳ２１で受信した文書を読み出す。制御部１１は、
この新たな文書から各文書の特徴を表す語を抽出するこ
とによりインデックスを作成する。また、インデックス
には、文書の受信時刻が記録される。そして、制御部１
１は、インデックスをたとえばＲＡＭ１４に記憶させ
る。In step S22, the control unit 11 of the document processing apparatus reads the document received in step S21 and stored in, for example, the RAM 14. The control unit 11 creates an index by extracting, from this new document, words that represent the characteristics of the document. The reception time of the document is also recorded in the index. The control unit 11 then stores the index in the RAM 14, for example.

【００５６】ステップＳ２３においては、文書処理装置
の制御部１１は、分類モデルに基づいて、インデックス
を附された各文書を複数のカテゴリの一つに分類する。
そして、制御部１１は、分類の結果をたとえばＲＡＭ１
４に記憶させる。なお、このような文書の自動分類の詳
細については、さらに後述する。In step S23, the control unit 11 of the document processing apparatus classifies each indexed document into one of a plurality of categories based on the classification model. The control unit 11 then stores the classification result in, for example, the RAM 14. The details of this automatic classification of documents will be described later.

【００５７】ステップＳ２４においては、文書処理装置
の制御部１１は、ステップＳ２３における分類の結果を
たとえばＲＡＭ１４から読み出し、自動分類の結果に基
づいて分類モデルを更新する。In step S24, the control section 11 of the document processing apparatus reads the result of the classification in step S23 from, for example, the RAM 14, and updates the classification model based on the result of the automatic classification.

【００５８】ステップＳ２５においては、文書処理装置
の制御部１１は、ステップＳ２４における分類モデル更
新の結果を登録する。この際に、カテゴリの更新日時を
記憶させるとともに、更新する前の分類モデルも消去せ
ずに記憶しておく。In step S25, the control section 11 of the document processing apparatus registers the result of the classification model update in step S24. At this time, the update date and time of the category are stored, and the classification model before the update is stored without being deleted.

【００５９】文書モデルは文書が新たに配信されたり、
カテゴリが変更されたりすると、新たなカテゴリを作成
し、更新日時を記録する。新しく作成されたカテゴリ以
外は、以前の特徴をそのままコピーする。これによっ
て、ユーザは過去の文書を容易に検索して内容を確認す
ることができる。When a document is newly delivered or a category is changed, the document model creates a new category and records the update date and time. For everything other than the newly created category, the previous features are copied as they are. This allows the user to easily search for past documents and check their contents.

【００６０】次に、Ｓ２２における文書の特徴を発見し
てインデックスを作る手順を説明する。インデックスの
作成は、図８に示すように、語義間関連度に基づいてお
こなわれる。Next, the procedure for finding the characteristics of the document and creating an index in S22 will be described. The creation of the index is performed based on the degree of association between meanings, as shown in FIG.

【００６１】ステップＳ３１においては、制御部１１
は、図４のステップＳ１１および図７のステップＳ２１
において受信した文書内で活性拡散を実行する。すなわ
ち、文書内の各エレメントの中心活性値を拡散する。中
心活性値の拡散処理については、さらに後述する。制御
部１１は、拡散処理により得られた中心活性値をたとえ
ばＲＡＭ１４に記憶させる。In step S31, the control unit 11 executes active diffusion within the document received in step S11 of FIG. 4 and step S21 of FIG. 7. That is, the central activity value of each element in the document is diffused. The diffusion of central activity values will be described later. The control unit 11 stores the central activity values obtained by the diffusion processing in, for example, the RAM 14.

【００６２】ステップＳ３２においては、制御部１１
は、ステップＳ３１で得られた各エレメントの中心活性
値に基づいて、中心活性値があらかじめ設定された閾値
を超えるエレメントを抽出する。制御部１１は、このよ
うに抽出したエレメントをたとえばＲＡＭ１４に記憶さ
せる。In step S32, based on the central activity value of each element obtained in step S31, the control unit 11 extracts the elements whose central activity value exceeds a preset threshold. The control unit 11 stores the elements extracted in this way in, for example, the RAM 14.

【００６３】ステップＳ３３においては、制御部１１
は、ステップＳ３２にて抽出したエレメントをたとえば
ＲＡＭ１４から読み出す。制御部１１は、これらのエレ
メントからタグに基づいてすべての固有名詞を取り出し
てインデックスに加え、その結果をたとえばＲＡＭ１４
に記憶させる。In step S33, the control unit 11 reads the elements extracted in step S32 from, for example, the RAM 14. Based on the tags, the control unit 11 extracts all proper nouns from these elements, adds them to the index, and stores the result in, for example, the RAM 14.

【００６４】固有名詞は複数の語義を持たず、辞書に載
っていないなどの特殊の性質を有するので、固有名詞以
外の語とは別に扱うものである。固有名詞は、上述した
ように地名、人名、組織名等のタグにより識別される。
たとえば、図３に示した文では、“Ａ氏”、“Ｂ会”お
よび“Ｃ市”は、それぞれ人名、組織名、地名であるこ
とがタグにより記述されているので、固有名詞であるこ
とがわかる。Proper nouns have special properties, such as not having multiple meanings and not being listed in dictionaries, so they are handled separately from other words. As described above, proper nouns are identified by tags for place names, personal names, organization names, and the like. For example, in the sentence shown in FIG. 3, the tags describe "Mr. A", "B Association" and "C City" as a personal name, an organization name, and a place name, respectively, so these can be recognized as proper nouns.

【００６５】ステップＳ３４においては、制御部１１
は、たとえばＲＡＭ１４からステップＳ３２にて抽出し
たエレメントから固有名詞以外の語義を取り出してイン
デックスに加え、その結果をＲＡＭ１４に記憶させる。In step S34, the control unit 11 extracts the meanings of words other than proper nouns from the elements extracted in step S32 and held in, for example, the RAM 14, adds them to the index, and stores the result in the RAM 14.

【００６６】このように、インデックスは、文書の特徴
を発見して、その特徴を配列することによりおこなう。
すなわち、タグ付けによって内部構造が記述された上記
文書について、各語彙エレメントの中心活性値を拡散
し、拡散後の中心活性値が閾値より大きい語彙エレメン
トを抽出する。抽出された語彙エレメントについて、固
有名詞または語義をインデックスに追加する。As described above, the index is created by finding the features of the document and arranging those features. That is, for the document whose internal structure is described by tagging, the central activity value of each vocabulary element is diffused, and the vocabulary elements whose central activity value after diffusion is larger than the threshold are extracted. For each extracted vocabulary element, its proper noun or meaning is added to the index.
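Steps S32 to S34 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `LexicalElement`, the function `build_index`, the tag strings, and the dictionary layout of the index are all hypothetical, and the central activity values are taken as already computed by the diffusion of step S31.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for the personal-name, organization-name and place-name tags.
PROPER_NOUN_TAGS = {"person", "organization", "place"}


@dataclass
class LexicalElement:
    text: str
    activity: float               # central activity value after diffusion
    sense: Optional[str] = None   # meaning ID, for words other than proper nouns
    tag: Optional[str] = None     # set only for proper nouns


def build_index(elements, threshold, address, received_at):
    """Build an index from the elements whose activity exceeds `threshold`."""
    index = {"address": address, "received_at": received_at,
             "proper_nouns": [], "senses": []}
    for el in elements:
        if el.activity <= threshold:        # step S32: drop weakly activated elements
            continue
        if el.tag in PROPER_NOUN_TAGS:      # step S33: proper nouns, found by tag
            index["proper_nouns"].append((el.text, el.activity))
        elif el.sense is not None:          # step S34: meanings of all other words
            index["senses"].append((el.sense, el.activity))
    return index
```

Proper nouns are stored as text while other words are stored by meaning ID, which reflects the reason given earlier for treating the two separately.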

【００６７】なお、インデックスには、文書の特徴とと
もに、その文書のＲＡＭ１４での記憶された位置を示す
文書アドレスを含めておく。インデックスはその文書の
特徴を表す語を含むので、所望の文書を検索する際に利
用することができる。インデックスの自動分類への適用
については、さらに後述する。It should be noted that the index includes a document address indicating the position of the document stored in the RAM 14 together with the characteristics of the document. Since the index includes words representing the characteristics of the document, it can be used when searching for a desired document. The application of the index to the automatic classification will be further described later.

【００６８】ここで、インデックスの具体例を示す。Here, a specific example of the index will be described.

【００６９】＜インデックス日付＝“AAAA/BB/CC”
時間＝“DD:EE:FF” 文書アドレス＝“1234”＞＜要約＞減税規模、触れず−Ｘ首相の会見＜／要約＞＜語語義＝“0003” 中心活性値＝“140.6”＞触れ
ず＜／語＞＜語語義＝“0105” 識別子＝“Ｘ” 中心活性値＝
“67.2”＞首相＜／語＞＜名前識別子＝“Ｘ” 語語義＝“6103” 中心活
性値＝“150.2”＞Ｘ首相＜／語＞＜語語義＝“5301” 中心活性値＝“120.6”＞求め
た＜／語＞＜語語義＝“2350” 識別子＝“Ｘ” 中心活性値＝
“31.4”＞首相＜／語＞＜語語義＝“9582” 中心活性値＝“182.3”＞強調
した＜／語＞＜語語義＝“2595” 中心活性値＝“93.6”＞触れる
＜／語＞＜語語義＝“9472” 中心活性値＝“12.0”＞予告し
た＜／語＞＜語語義＝“4934” 中心活性値＝“46.7”＞触れな
かった＜／語＞＜語語義＝“0178” 中心活性値＝“175.7”＞釈明
した＜／語＞＜語語義＝“7248” 識別子＝“Ｘ” 中心活性値＝
“130.6”＞私＜／語＞＜語語義＝“3684” 識別子＝“Ｘ” 中心活性値＝
“121.9”＞首相＜／語＞＜語語義＝“1824” 中心活性値＝“144.4.”＞訴え
た＜／語＞＜語語義＝“7289” 中心活性値＝“176.8”＞見せ
た＜／語＞＜／インデックス＞
<index date="AAAA/BB/CC" time="DD:EE:FF" document address="1234">
<summary>Tax reduction scale, untouched - Prime Minister X's interview</summary>
<word meaning="0003" central activity value="140.6">untouched</word>
<word meaning="0105" identifier="X" central activity value="67.2">Prime Minister</word>
<name identifier="X" word meaning="6103" central activity value="150.2">Prime Minister X</word>
<word meaning="5301" central activity value="120.6">sought</word>
<word meaning="2350" identifier="X" central activity value="31.4">Prime Minister</word>
<word meaning="9582" central activity value="182.3">emphasized</word>
<word meaning="2595" central activity value="93.6">touch</word>
<word meaning="9472" central activity value="12.0">forecasted</word>
<word meaning="4934" central activity value="46.7">not touched</word>
<word meaning="0178" central activity value="175.7">explained</word>
<word meaning="7248" identifier="X" central activity value="130.6">I</word>
<word meaning="3684" identifier="X" central activity value="121.9">Prime Minister</word>
<word meaning="1824" central activity value="144.4">appealed</word>
<word meaning="7289" central activity value="176.8">showed</word>
</index>

【００７０】このインデックスにおいては、＜インデッ
クス＞および＜／インデックス＞は、インデックスの始
端および終端を、＜日付＞および＜時間＞はこのインデ
ックスが作成された日付および時間を、＜要約＞および
＜／要約＞はこのインデックスの内容の要約の始端およ
び終端を、＜語＞および＜／語＞は語の始端および終端
を示している。語義＝“0003”は、その語義が、複数の
語義のうちの第３番目であることを示している。他につ
いても同様である。また、各語に対して中心活性値が付
与されている。In this index, <index> and </index> indicate the start and end of the index, <date> and <time> indicate the date and time at which the index was created, <summary> and </summary> indicate the start and end of the summary of the contents of the index, and <word> and </word> indicate the start and end of a word. The meaning = "0003" indicates that the meaning is the third of a plurality of meanings. The same applies to the others. In addition, a central activity value is assigned to each word.

【００７１】続いて、タグ付けによる内部構造に基づい
て、ステップＳ３１でおこなう活性拡散によりエレメン
トの中心活性値を拡散する方法について説明する。な
お、この活性拡散は、後述する図１６におけるステップ
Ｓ６２においても実行される。Next, a method of diffusing the central activation value of the element by the activation diffusion performed in step S31 based on the internal structure by tagging will be described. This active diffusion is also performed in step S62 in FIG. 16 described later.

【００７２】タグ付けによる内部構造を与えられた文書
においては、活性拡散と呼ばれる処理をおこなうことに
より、各エレメントにタグ付けによる内部構造に応じた
中心活性値を付与することができる。活性拡散は、中心
活性値の高いエレメントと関わりのあるエレメントにも
高い中心活性値を与えるような処理である。この中心活
性値は、タグ付けによる内部構造に応じて決定されるの
で、タグ付けによる内部構造を考慮した文書の分析に利
用することができる。In a document given an internal structure by tagging, a central activity value corresponding to the internal structure by tagging can be given to each element by performing a process called activity diffusion. Active diffusion is a process in which an element associated with an element having a high central activity value is also given a high central activity value. Since the central activity value is determined according to the internal structure by tagging, it can be used for analyzing a document in consideration of the internal structure by tagging.

【００７３】活性拡散は、図９のフローチャートに示す
一連の行程にしたがって、文書処理装置の制御部１１の
制御の下に実行される。The active diffusion is executed under the control of the control unit 11 of the document processing apparatus according to a series of steps shown in the flowchart of FIG.

【００７４】ステップＳ４１では、参照・被参照リンク
と通常リンクに関しては、エレメントを連結するリンク
の端点の端点活性値を０に設定する。制御部１１は、こ
のように付与した端点活性値の初期値をたとえばＲＡＭ
１４に記憶させる。In step S41, for both reference/referenced links and normal links, the end point activation value of each end point of the links connecting the elements is set to 0. The control unit 11 stores the initial end point activation values assigned in this way in, for example, the RAM 14.

【００７５】エレメントとエレメントの連結は、たとえ
ば図１０に示すようになる。この図においては、文書を
構成するエレメントとリンクの構造の一部として、エレ
メントＥ_iおよびエレメントＥ_jが示されている。エレメ
ントＥ_iとエレメントＥ_jとは、中心活性値ｅ_iおよびｅ_j
をそれぞれ有し、リンクＬ_ijにて接続されている。リン
クＬ_ijのエレメントＥ_iに接続する端点はＴ_ij、エレメ
ントＥ_jに接続する端点はＴ_jiである。エレメントＥ
_iは、リンクＬ_ijにより接続されるエレメントＥ_jの他
に、リンクＬ_ik、Ｌ_ilおよびＬ_imによって図示しないエ
レメントＥ_k、Ｅ_lおよびＥ_mにそれぞれ接続している。
エレメントＥ_jは、エレメントＥ_jからみたリンクＬ_ijで
あるＬ_jiにより接続されるエレメントＥ_iの他に、リン
クＬ_jp、Ｌ_jqおよびＬ_jrによって図示しないエレメント
Ｅ_p、Ｅ_qおよびＥ_rにそれぞれ接続している。Elements are connected to one another as shown, for example, in FIG. 10. This figure shows elements E_i and E_j as part of the structure of elements and links that make up the document. The elements E_i and E_j have central activation values e_i and e_j, respectively, and are connected by a link L_ij. The end point of the link L_ij that connects to the element E_i is T_ij, and the end point that connects to the element E_j is T_ji. In addition to the element E_j, to which it is connected by the link L_ij, the element E_i is connected to elements E_k, E_l and E_m (not shown) by links L_ik, L_il and L_im, respectively. In addition to the element E_i, to which it is connected by L_ji (the link L_ij as seen from the element E_j), the element E_j is connected to elements E_p, E_q and E_r (not shown) by links L_jp, L_jq and L_jr, respectively.
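One way to hold this structure in memory is sketched below in Python. The class name `ElementGraph` and its three dictionaries are assumptions chosen for illustration; the source describes the elements, links and end points only in terms of FIG. 10 and does not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class ElementGraph:
    center: dict = field(default_factory=dict)     # i -> central activation value e_i
    endpoint: dict = field(default_factory=dict)   # (i, j) -> t_ij, value at end point T_ij
    kind: dict = field(default_factory=dict)       # frozenset({i, j}) -> "normal" or "reference"

    def add_element(self, i, e_i):
        self.center[i] = e_i

    def add_link(self, i, j, kind="normal"):
        """Add link L_ij; its end points T_ij and T_ji both start at 0 (step S41)."""
        self.endpoint[(i, j)] = 0.0
        self.endpoint[(j, i)] = 0.0
        self.kind[frozenset((i, j))] = kind

    def neighbours(self, i):
        """Elements joined to E_i by a link."""
        return [b for (a, b) in self.endpoint if a == i]
```

Keying the end point values by the ordered pair `(i, j)` keeps T_ij and T_ji distinct, while the unordered `frozenset` key records that L_ij and L_ji are the same link.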

【００７６】ステップＳ４２においては、文書処理装置
の制御部１１は、文書を構成するエレメントＥ_iを計数
するカウンタの初期化をおこなう。すなわち、エレメン
トを計数するカウンタのカウント値ｉを１に設定する。
このカウンタは、第１番目のエレメントＥ₁を参照する
ことになる。In step S42, the control unit 11 of the document processing apparatus initializes the counter that counts the elements E_i constituting the document. That is, the count value i of the element counter is set to 1. The counter thus refers to the first element E_1.

【００７７】ステップＳ４３においては、文書処理装置
の制御部１１は、カウンタが参照するエレメントについ
て、新たな中心活性値を計算するリンク処理を実行す
る。このリンク処理については、さらに後述する。In step S43, the control unit 11 of the document processing apparatus executes link processing for calculating a new central activation value for the element referred to by the counter. This link processing will be further described later.

【００７８】ステップＳ４４においては、文書処理装置
の制御部１１は、文書中のすべてのエレメントについて
新たな中心活性値の計算が完了したか否かを判断する。
そして、制御部１１は、文書中のすべてのエレメントに
ついて中心活性値の計算が完了したときには“ＹＥＳ”
としてステップＳ４５に処理を進め、文書中のすべての
エレメントについて新たな中心活性値の計算が完了して
いないときには“ＮＯ”としてステップＳ４７に処理を
進める。In step S44, the control unit 11 of the document processing apparatus determines whether the calculation of new central activation values has been completed for all the elements in the document. When the calculation has been completed for all the elements in the document, the control unit 11 answers "YES" and proceeds to step S45; when it has not been completed for all the elements, the control unit 11 answers "NO" and proceeds to step S47.

【００７９】具体的には、制御部１１は、カウンタのカ
ウント値ｉが、文書の含むエレメントの総数に達したか
否かを判断する。そして、制御部１１は、カウンタのカ
ウント値ｉが文書に含まれるエレメントの総数に達した
ときには、すべてのエレメントが計算済みとしてステッ
プＳ４５に処理を進める。制御部１１は、カウンタのカ
ウント値ｉが文書に含まれるエレメントの総数に達して
いないときにはすべてのエレメントについて計算が終了
していないとしてステップＳ４７に処理を進める。Specifically, the control section 11 determines whether or not the count value i of the counter has reached the total number of elements included in the document. Then, when the count value i of the counter reaches the total number of elements included in the document, the control unit 11 determines that all elements have been calculated and proceeds to step S45. When the count value i of the counter has not reached the total number of elements included in the document, the control unit 11 determines that the calculation has not been completed for all elements, and proceeds to step S47.

【００８０】ステップＳ４７においては、文書処理装置
の制御部１１は、カウンタのカウント値ｉを１増加させ
て、カウンタのカウント値をｉ＋１とする。このことに
より、カウンタはｉ＋１番目のエレメント、すなわち次
のエレメントを参照する。そして、処理はステップＳ４
３にもどり、端点活性値の計算およびこれに続く一連の
行程が、次のｉ＋１番目のエレメントについて実行され
る。In step S47, the control unit 11 of the document processing apparatus increases the count value i of the counter by 1, so that the count value becomes i + 1. The counter thus refers to the (i + 1)th element, that is, the next element. The process then returns to step S43, and the calculation of end point activation values and the series of steps that follow are executed for the (i + 1)th element.

【００８１】具体的には、制御部１１は、エレメントを
計数するカウンタのカウント値ｉを１増加する。このこ
とにより、カウンタはステップＳ４３で中心活性値が計
算されたエレメントの次のエレメントを参照することに
なる。More specifically, the control section 11 increases the count value i of the counter for counting elements by one. As a result, the counter refers to the element next to the element for which the central activity value was calculated in step S43.

【００８２】ステップＳ４５においては、文書処理装置
の制御部１１は、文書に含まれるすべてのエレメントの
中心活性値の変化分、すなわち新たに計算された中心活
性値の元の中心活性値に対する変化分について平均値を
計算する。In step S45, the control unit 11 of the document processing apparatus calculates the average, over all the elements included in the document, of the change in central activity value, that is, the change of each newly calculated central activity value from its original central activity value.

【００８３】文書処理装置の制御部１１は、たとえばＲ
ＡＭ１４に記憶された元の中心活性値と新たに計算した
中心活性値を、文書に含まれるすべてのエレメントにつ
いて読み出す。制御部１１は、新たに計算した中心活性
値の元の中心活性値に対するそれぞれの変化分の総和を
文書に含まれるエレメントの総数で除することにより、
すべてのエレメントの中心活性値の変化分の平均値を計
算する。制御部１１は、このように計算したすべてのエ
レメントの中心活性値の変化分の平均値を、たとえばＲ
ＡＭ１４に記憶させる。The control unit 11 of the document processing apparatus reads, for all the elements included in the document, the original central activity values and the newly calculated central activity values stored in, for example, the RAM 14. The control unit 11 calculates the average change in the central activity values of all the elements by dividing the sum of the changes of the newly calculated central activity values from the original central activity values by the total number of elements included in the document. The control unit 11 stores the average change calculated in this way in, for example, the RAM 14.

【００８４】ステップＳ４６においては、制御部１１
は、ステップＳ４５で計算したすべてのエレメントの中
心活性値の変化分の平均値が、あらかじめ設定された閾
値以内であるか否かを判断する。そして、制御部１１
は、上記変化分が閾値以内であると“ＹＥＳ”としてこ
の一連の行程を終了する。上記制御部１１は、上記変化
分が閾値以内でないときには“ＮＯ”として、ステップ
Ｓ４２にてカウンタのカウント値ｉを１に設定して文書
のエレメントの中心活性値を計算する一連の行程を再び
実行する。この一連の行程にて構成されるステップＳ４
２からステップＳ４４に至るループが繰り返されるごと
に上記変化分は徐々に減少する。In step S46, the control unit 11 determines whether the average change in the central activity values of all the elements, calculated in step S45, is within a preset threshold. If the change is within the threshold, the control unit 11 answers "YES" and ends this series of steps. If the change is not within the threshold, the control unit 11 answers "NO", returns to step S42 to set the count value i of the counter to 1, and executes again the series of steps for calculating the central activity values of the elements of the document. Each time the loop from step S42 to step S44 formed by this series of steps is repeated, the change gradually decreases.
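The control flow of FIG. 9 (steps S42 to S47) can be sketched in Python as below. The function name `diffuse`, the callback `update_element` standing in for the link processing of step S43, the use of the absolute value as the "change", and the `max_rounds` safety limit are all assumptions added for this illustration; none of them is specified in the source.

```python
def diffuse(centers, update_element, threshold, max_rounds=100):
    """Repeat the per-element update until the mean change is within `threshold`.

    centers: list of central activity values, updated in place.
    update_element(i, centers): returns the new central value of element i
        (placeholder for the link processing of step S43).
    """
    n = len(centers)
    for _ in range(max_rounds):              # safety limit, an assumption of this sketch
        total_change = 0.0
        for i in range(n):                   # S42, S44, S47: step the counter over all elements
            new_value = update_element(i, centers)       # S43
            total_change += abs(new_value - centers[i])
            centers[i] = new_value
        if total_change / n <= threshold:    # S45: mean change, S46: compare with threshold
            break
    return centers
```

For example, with a toy update rule that moves each value halfway towards 10, `diffuse([0.0, 4.0], lambda i, c: (c[i] + 10.0) / 2.0, 0.01)` stops once the mean change per round has fallen to 0.01 or less.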

【００８５】続いて、ステップＳ４３にて実行されるリ
ンク処理について、図１１に示すフローチャートを参照
して説明する。なお、このフローチャートは、一つのエ
レメントＥ_iに対する処理であるが、中心活性値の拡散
処理の際には、リンク処理はすべてのエレメントに対し
ておこなわれる。Next, the link processing executed in step S43 will be described with reference to the flowchart shown in FIG. 11. This flowchart shows the processing for a single element E_i, but during the diffusion of central activation values the link processing is performed for all the elements.

【００８６】ステップＳ５１においては、文書処理装置
の制御部１１は、文書を構成するエレメントＥ_iと一端
が接続されたリンクを計数するカウンタの初期化をおこ
なう。すなわち、リンクを計数するカウンタのカウント
値ｊを１に設定する。すなわち、このカウンタは、エレ
メントＥ_iと接続された第１番目のリンクＬ_i1を参
照している。In step S51, the control unit 11 of the document processing apparatus initializes the counter that counts the links connected at one end to the element E_i constituting the document. That is, the count value j of the link counter is set to 1. The counter thus refers to the first link L_i1 connected to the element E_i.

【００８７】ステップＳ５２においては、エレメントＥ
_iとＥ_jを接続するリンクＬ_ijにおいては、制御部１１
は、タグを参照することにより、そのリンクＬ_ijが通常
リンクであるか否かを判断する。制御部１１は、リンク
Ｌ_ijが通常リンクと参照リンクのいずれであるかを判断
する。これは、関係属性のタグを参照することで判断さ
れる。制御部１１は、そのリンクが通常リンクのときに
は“ＹＥＳ”としてステップＳ５３に処理を進め、その
リンクが参照リンクのときには“ＮＯ”としてステップ
Ｓ５４に処理を進める。In step S52, for the link L_ij connecting the elements E_i and E_j, the control unit 11 refers to the tags to determine whether the link L_ij is a normal link. That is, the control unit 11 determines whether the link L_ij is a normal link or a reference link, which it does by referring to the tag of the relation attribute. When the link is a normal link, the control unit 11 answers "YES" and proceeds to step S53; when the link is a reference link, it answers "NO" and proceeds to step S54.

【００８８】ステップＳ５３においては、エレメントＥ
_iの通常リンクＬ_ijに接続された端点Ｔ_ijの新たな端点
活性値を計算する処理がおこなわれる。In step S53, a new end point activation value is calculated for the end point T_ij of the element E_i that is connected to the normal link L_ij.

【００８９】ここでは、ステップＳ５２における判別に
より、リンクＬ_ijは通常リンクであることが明らかにな
っている。エレメントＥ_iの通常リンクＬ_ijに接続され
る端点Ｔ_ijの端点活性値ｔ_ijは、エレメントＥ_jの端点
活性値のうち、リンクＬ_ij以外のリンクに接続するすべ
ての端点Ｔ_jp、Ｔ_jq、Ｔ_jrの端点活性値ｔ_jp、ｔ_jq、ｔ
_jrと、エレメントＥ_iがリンクＬ_ijにより接続されるエ
レメントＥ_jの中心活性値ｅ_jを加算し、この加算で得た
値を文書に含まれるエレメントの総数で除することによ
り求められる。Here, the determination in step S52 has established that the link L_ij is a normal link. The end point activation value t_ij of the end point T_ij of the element E_i that connects to the normal link L_ij is obtained by adding together the end point activation values t_jp, t_jq and t_jr of all the end points T_jp, T_jq and T_jr of the element E_j that connect to links other than L_ij, and the central activation value e_j of the element E_j to which the element E_i is connected by the link L_ij, and then dividing this sum by the total number of elements included in the document.

【００９０】文書処理装置の制御部１１は、たとえばＲ
ＡＭ１４に記憶されたデータから、必要な端点活性値お
よび中心活性値を読み出す。制御部１１は、読み出され
た端点活性値および中心活性値について、上述のように
その通常リンクと接続された端点の新たな端点活性値を
計算する。そして制御部１１は、このように計算した端
点活性値を、たとえばＲＡＭ１４に記憶させる。The control unit 11 of the document processing apparatus reads the necessary end point activation values and central activation value from the data stored in, for example, the RAM 14. Using the values read, the control unit 11 calculates the new end point activation value of the end point connected to the normal link as described above. The control unit 11 then stores the end point activation value calculated in this way in, for example, the RAM 14.

【００９１】ステップＳ５４においては、エレメントＥ
_iの参照リンクに接続された端点Ｔ_ijの端点活性値を計
算する処理がおこなわれる。In step S54, the end point activation value is calculated for the end point T_ij of the element E_i that is connected to a reference link.

【００９２】ステップＳ５２における判別により、リン
クＬ_ijは参照リンクであることが明らかになっている。
エレメントＥ_iの参照リンクＬ_ijに接続する端点Ｔ_ijの
新たな端点活性値ｔ_ijは、エレメントＥ_jの端点活性値
のうち、このリンクＬ_ijを除いたリンクに接続するすべ
ての端点Ｔ_jp、Ｔ_jq、Ｔ_jrの端点活性値ｔ_jp、ｔ_jq、ｔ
_jrと、エレメントＥ_iがリンクＬ_ijにより接続されるエ
レメントＥ_jの中心活性値ｅ_jを加算することにより求め
られる。The determination in step S52 has established that the link L_ij is a reference link. The new end point activation value t_ij of the end point T_ij of the element E_i that connects to the reference link L_ij is obtained by adding together the end point activation values t_jp, t_jq and t_jr of all the end points T_jp, T_jq and T_jr of the element E_j that connect to links other than L_ij, and the central activation value e_j of the element E_j to which the element E_i is connected by the link L_ij.

【００９３】文書処理装置の制御部１１は、たとえばＲ
ＡＭ１４に記憶されたデータから、必要な端点活性値お
よび中心活性値を読み出す。制御部１１、読み出された
端点活性値および中心活性値を用いて、上述のように参
照リンクと接続された新たな端点活性値を計算する。そ
して制御部１１は、このように計算した端点活性値を、
たとえばＲＡＭ１４に記憶させる。The control unit 11 of the document processing apparatus reads the necessary end point activation values and central activation value from the data stored in, for example, the RAM 14. Using the values read, the control unit 11 calculates the new end point activation value of the end point connected to the reference link as described above. The control unit 11 then stores the end point activation value calculated in this way in, for example, the RAM 14.
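The two end point rules of steps S53 and S54 differ only in the final division, which the following Python sketch makes explicit. The function name `new_endpoint_value`, its parameter names and the string labels for the link type are hypothetical and introduced here for illustration.

```python
def new_endpoint_value(e_j, other_endpoints_of_j, link_kind, n_elements):
    """New end point activation value t_ij for the end point T_ij of element E_i.

    e_j: central activation value of the element E_j at the other end of L_ij.
    other_endpoints_of_j: values t_jp, t_jq, ... of E_j's end points on links other than L_ij.
    link_kind: "normal" or "reference" (labels assumed for this sketch).
    n_elements: total number of elements in the document.
    """
    total = e_j + sum(other_endpoints_of_j)
    if link_kind == "normal":        # step S53: the sum is divided by the number of elements
        return total / n_elements
    if link_kind == "reference":     # step S54: the sum is used as it is
        return total
    raise ValueError(f"unknown link kind: {link_kind!r}")
```

Because a reference link skips the division, activation passed along it is not attenuated, so elements tied together by reference links influence each other more strongly than elements joined by normal links.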

【００９４】ステップＳ５３における通常リンクの処
理、およびステップＳ５４における参照リンクの処理
は、ステップＳ５２からステップＳ５５に至るループに
あるように、カウント値ｉにより参照されているエレメ
ントＥ_iに接続するすべてのリンクＬ_ijに対して実行さ
れる。The normal link processing of step S53 and the reference link processing of step S54 are executed for every link L_ij connected to the element E_i referred to by the count value i, as indicated by the loop from step S52 to step S55.

【００９５】ステップＳ５５においては、エレメントＥ
_iに接続するすべてのリンクについて端点活性値が計算
されたか否かが判別される。そして、すべてのリンクに
ついて端点活性値が計算されているときには“ＹＥＳ”
としてステップＳ５６に進み、すべてのリンクについて
端点活性値が計算されていないときには“ＮＯ”として
ステップＳ５７に進む。In step S55, it is determined whether end point activation values have been calculated for all the links connected to the element E_i. If they have been calculated for all the links, the answer is "YES" and the process proceeds to step S56; if they have not been calculated for all the links, the answer is "NO" and the process proceeds to step S57.

【００９６】ステップＳ５６においては、ステップＳ５
５にてエレメントＥ_iのすべてのリンクＬ_ijについて端
点活性値ｔ_ijが求められたことが判別されたので、エレ
メントＥ_iの中心活性値ｅ_iの更新を実行する。In step S56, since it has been determined in step S55 that the end point activation values t_ij have been obtained for all the links L_ij of the element E_i, the central activation value e_i of the element E_i is updated.

【００９７】エレメントＥ_iの中心活性値ｅ_iの新たな値
すなわち更新値は、エレメントＥ_iの現在の中心活性値
ｅ_iとエレメントＥ_iのすべての端点の新たな端点活性値
の和ｅ_i’＝ｅ_i＋Σｔ_j’をとることにより求められ
る。ここで、プライム“’”は、新たな値という意味で
ある。The new value, that is, the updated value, of the central activation value e_i of the element E_i is obtained as the sum of the current central activation value e_i of the element E_i and the new end point activation values of all the end points of the element E_i: e_i' = e_i + Σt_j'. Here, the prime "'" denotes a new value.
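The rules of steps S53, S54 and S56 can be collected in one place. In the block below, N (the total number of elements in the document) and P_j(i) (the set of elements other than E_i that are linked to E_j) are symbols introduced here for compactness, and the sum in the last line is written with the double index t_ij' to match the end point notation T_ij, whereas the text writes it as Σt_j'.

```latex
\begin{align*}
t_{ij}' &= \frac{1}{N}\Bigl(e_j + \sum_{p \in P_j(i)} t_{jp}\Bigr) && \text{(normal link, step S53)}\\
t_{ij}' &= e_j + \sum_{p \in P_j(i)} t_{jp} && \text{(reference link, step S54)}\\
e_i' &= e_i + \sum_{j} t_{ij}' && \text{(central activation value, step S56)}
\end{align*}
```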

【００９８】文書処理装置の制御部１１は、たとえばＲ
ＡＭ１４に記憶されたデータから必要な端点活性値を読
み出す。制御部１１は、上述したような計算を実行し、
そのエレメントＥ_iの中心活性値ｅ_iを算出する。そし
て、制御部１１は、計算した新たな中心活性値ｅ_iをた
とえばＲＡＭ１４に記憶させる。The control unit 11 of the document processing apparatus reads the necessary end point activation values from the data stored in, for example, the RAM 14. The control unit 11 performs the calculation described above to obtain the central activation value e_i of the element E_i. The control unit 11 then stores the newly calculated central activation value e_i in, for example, the RAM 14.

【００９９】次に、文書処理装置の動作について説明す
る。文書処理装置は、現在の分類モデルに対応する文書
のみならず、ユーザの所望の日付と時間を遡って過去の
カテゴリと文書を見つけ出すことができる。すなわち、
文書の分類はユーザの興味に依存するので、時間の経過
とともに変化していく可能性がある。文書処理装置は、
時間の経過とともに変化した分類についても、過去の分
類を再現する。Next, the operation of the document processing apparatus will be described. The document processing apparatus can find not only the documents corresponding to the current classification model but also past categories and documents, by going back to a date and time chosen by the user. That is, because the classification of documents depends on the user's interests, it may change over time. The document processing apparatus reproduces past classifications even when the classification has changed over time.

【０１００】文書処理装置は、図１２に示すように、Ｓ
１０１においては、文書処理装置の入力部２０は、ユー
ザからの日時の入力を受ける。ユーザによる日時の入力
は、たとえば表示部３０に表示されたカレンダーにおい
て入力部２０のマウスに連動して表示部３０に表示され
るカーソルで選択することによりおこなわれる。また、
ユーザによる日時の入力は、たとえば入力部２０のキー
ボードにより打ち込まれてもよい。制御部１１は、入力
部２０に入力された日時を、たとえばＲＡＭ１４に記憶
させる。As shown in FIG. 12, in step S101 the input unit 20 of the document processing apparatus receives the input of a date and time from the user. The user enters the date and time by, for example, selecting it in a calendar displayed on the display unit 30, using a cursor that is displayed on the display unit 30 and moves with the mouse of the input unit 20. The user may also type in the date and time using, for example, the keyboard of the input unit 20. The control unit 11 stores the date and time entered at the input unit 20 in, for example, the RAM 14.

【０１０１】ステップＳ１０２においては、文書処理装
置の制御部１１は、たとえばＲＡＭ１４に記憶された文
書をカテゴリに分類するモデルである分類モデルを読み
出す。分類モデルは、更新時の日時を記録することによ
りその履歴を有しているので、この日時と入力された日
時を参照することにより、所望の日時の分類モデルを読
み出すことができる。In step S102, the control unit 11 of the document processing apparatus reads a classification model, which is a model for classifying documents stored in the RAM 14 into categories, for example. Since the classification model has a history by recording the date and time at the time of update, the classification model at a desired date and time can be read by referring to this date and time and the input date and time.
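A minimal Python sketch of such a history is shown below. The class `ModelHistory` and its methods are hypothetical, and the lookup rule (return the most recent model whose update time is not later than the requested time) is an assumption about what "the classification model of the desired date and time" means; the source does not spell the rule out.

```python
import bisect
from datetime import datetime


class ModelHistory:
    """Keeps every registered classification model together with its update time."""

    def __init__(self):
        self._times = []    # update times, kept sorted
        self._models = []   # models, parallel to _times

    def register(self, model, updated_at):
        """Store a new model version without discarding earlier ones."""
        pos = bisect.bisect_right(self._times, updated_at)
        self._times.insert(pos, updated_at)
        self._models.insert(pos, model)

    def model_at(self, when):
        """Return the model in force at `when`, or None if none existed yet."""
        pos = bisect.bisect_right(self._times, when)
        return self._models[pos - 1] if pos else None
```

With models registered at 19:56:10 on December 10, 1998 and 10:33:27 on January 20, 1999, a query for January 1, 1999 returns the earlier model, and a query for any time after the second update returns the later one.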

【０１０２】ここで、分類モデルの日時の更新について
説明する。図１３中のＡに示す分類モデルは、カテゴリ
“スポーツ”、“社会”、“コンピュータ”、“植
物”、“美術”および“イベント”に対して、固有名詞
“Ａ氏、・・・”、“Ｂ氏、・・・”、“Ｃ社、Ｇ社、
・・・”、“Ｄ種、・・・”、“Ｅ氏、・・・”および
“Ｆ氏、Ｈ氏”を、語義“野球（４５４６）、グランド
（２３４３）、・・・”、“労働（３１１２）、固有
（９８２１）、・・・”、“モバイル（２１０２）、・
・・”、“桜１(１１１１１)、オレンジ１（９９１
１）”、“桜２(１１１１２)、オレンジ２（９９１
２）”および“桜３(１１１１３)”を、この分類モデル
に対応する文書のアドレスを示す文書アドレス“ＳＰ
１、ＳＰ２、ＳＰ３、・・・”、“ＳＯ１、ＳＯ２、Ｓ
Ｏ３、・・・”、“ＣＯ１、ＣＯ２、ＣＯ３、・・
・”、“ＰＬ１、ＰＬ２、ＰＬ３、・・・”、“ＡＲ
１、ＡＲ２、ＡＲ３、・・・”および“ＥＶ１、ＥＶ
２、ＥＶ３、・・・”をそれぞれ有している。そして、
この分類モデルには、更新日時“１９９９年１月２０日
１０時３３分２７秒”が記録されている。ここで、たと
えば“オレンジ１”は植物のオレンジを、“オレンジ
２”は色のオレンジを示すものとする。なお、括弧内の
数字は語義に対応する文書アドレスを示している。Here, the updating of the date and time of the classification model will be described. The classification model shown in A of FIG. 13 holds, for each of the categories "sports", "society", "computer", "plant", "art" and "event", proper nouns, meanings, and document addresses indicating the addresses of the documents corresponding to this classification model, as follows.
"sports": proper nouns "Mr. A, ..."; meanings "baseball (4546), ground (2343), ..."; document addresses "SP1, SP2, SP3, ...".
"society": proper nouns "Mr. B, ..."; meanings "labor (3112), unique (9821), ..."; document addresses "SO1, SO2, SO3, ...".
"computer": proper nouns "Company C, Company G, ..."; meanings "mobile (2102), ..."; document addresses "CO1, CO2, CO3, ...".
"plant": proper nouns "D type, ..."; meanings "Sakura 1 (11111), orange 1 (9911)"; document addresses "PL1, PL2, PL3, ...".
"art": proper nouns "Mr. E, ..."; meanings "Sakura 2 (11112), orange 2 (9912)"; document addresses "AR1, AR2, AR3, ...".
"event": proper nouns "Mr. F, Mr. H"; meanings "Sakura 3 (11113)"; document addresses "EV1, EV2, EV3, ...".
The update date and time "10:33:27 on January 20, 1999" is recorded in this classification model. Here, for example, "orange 1" denotes the orange as a plant and "orange 2" denotes the color orange. The numbers in parentheses indicate the document addresses corresponding to the meanings.

【０１０３】この受信日時は、図１３中のＢに示すよう
に文書処理装置が前回の更新日時“１９９８年１２月１
０日１９時５６分１０秒”の分類モデルが、文書処理装
置が新たに受信日時“１９９９年１月２０日１０時３３
分２７秒”に文書を受信したことにより、更新されたも
のである。すなわち、図１３中のＢに示す分類モデル
は、カテゴリ“スポーツ”、“社会”、“コンピュー
タ”、“植物”、“美術”および“イベント”に対し
て、固有名詞“Ａ氏、・・・”、“Ｂ氏、・・・”、
“Ｃ社、Ｇ社、・・・”、“Ｄ種、・・・”、“Ｅ氏、
・・・”および“Ｆ氏”を、語義“野球（４５４６）、
グランド（２３４３）、・・・”、“労働（３１１
２）、固有（９８２１）、・・・”、“モバイル（２１
０２）、・・・”、“桜１(１１１１１)、オレンジ１
（９９１１）”、“桜２(１１１１２)、オレンジ２（９
９１２）”および“桜３(１１１１３)”を、この分類モ
デルに対応する文書アドレス“ＳＰ１、ＳＰ２、ＳＰ
３、・・・”、“ＳＯ１、ＳＯ２、ＳＯ３、・・・”、
“ＣＯ１、ＣＯ２、ＣＯ３、・・・”、“ＰＬ１、ＰＬ
２、ＰＬ３、・・・”、“ＡＲ１、ＡＲ２、ＡＲ３、・
・・”および“ＥＶ１、ＥＶ２、ＥＶ３、・・・”をそ
れぞれ有している。そして、この図１３中のＢに示す分
類モデルは、新たにカテゴリ“イベント”に対応する固
有名詞“Ｈ氏”が加えられて図１３中のＡに示す分類モ
デルに更新された。This date and time reflects an update: the classification model shown in B in FIG. 13, whose previous update date and time was "December 10, 1998, 19:56:10", was updated when the document processing apparatus newly received a document at the reception date and time "January 20, 1999, 10:33:27". That is, for the categories "sports", "society", "computer", "plant", "art", and "event", the classification model shown in B in FIG. 13 holds, respectively, the proper nouns "Mr. A, ...", "Mr. B, ...", "Company C, Company G, ...", "Species D, ...", "Mr. E, ...", and "Mr. F"; the meanings "baseball (4546), ground (2343), ...", "labor (3112), unique (9821), ...", "mobile (2102), ...", "sakura 1 (11111), orange 1 (9911)", "sakura 2 (11112), orange 2 (9912)", and "sakura 3 (11113)"; and the document addresses corresponding to this classification model, "SP1, SP2, SP3, ...", "SO1, SO2, SO3, ...", "CO1, CO2, CO3, ...", "PL1, PL2, PL3, ...", "AR1, AR2, AR3, ...", and "EV1, EV2, EV3, ...". The classification model shown in B in FIG. 13 was then updated to the classification model shown in A in FIG. 13 by newly adding the proper noun "Mr. H" to the category "event".

【０１０４】なお、分類モデルの更新の形態としては、
カテゴリの増減やカテゴリの組み替えのみならず、別の
新たなカテゴリへの変更も有り得る。Note that a classification model may be updated not only by adding, removing, or rearranging categories but also by changing to different, new categories.

【０１０５】図１２のステップＳ１０３においては、文
書処理装置の制御部１１は、ステップＳ１０２で読み出
した分類モデルにより、その分類モデルに対応する文書
を検索する。ユーザが入力した日時のカテゴリや検索し
た文書などの分類状況を再現して表示部３０に表示す
る。そして、この一連の工程を終了する。In step S103 of FIG. 12, the control unit 11 of the document processing apparatus uses the classification model read in step S102 to search for the documents corresponding to that classification model. It reproduces the classification state as of the date and time entered by the user, such as the categories and the retrieved documents, and displays it on the display unit 30. This series of steps then ends.

【０１０６】ステップＳ１０３における文書の検索は、
ステップＳ１０２において読み出した分類モデルに基づ
いて行われる。そこで、分類モデルの読み出しと、読み
出し他分類モデルに基づく文書の検索について、図１４
を参照して説明する。The document search in step S103 is performed based on the classification model read in step S102. The reading of the classification model and the document search based on the read classification model are therefore described with reference to FIG. 14.

【０１０７】ステップＳ１１１では、ユーザが入力部２
０に文書を検索する基準となる日時を入力する。ステッ
プＳ１１２では、制御部１１は、ステップＳ１１１にお
いて入力された日時に基づいて文書モデルを抽出する。
この文書モデルの抽出について、図１５を参照して説明
する。In step S111, the user enters into the input unit 20 the date and time that serves as the reference for the document search. In step S112, the control unit 11 extracts a document model based on the date and time entered in step S111. The extraction of this document model is described with reference to FIG. 15.

【０１０８】図１５中のＡにおいては、日時ｔ₁から日
時ｔ₂までは分類モデルＰが、日時ｔ₂から日時ｔ₃まで
は分類モデルＱが、それぞれ登録されている。すなわ
ち、日時ｔ₂において、分類モデルＰは、分類モデルＱ
に更新された。ステップＳ１１１でたとえばＸ月Ｙ日と
いう日時を入力すると、ステップＳ１１２では、ステッ
プＳ１１１で入力した正確な日時ｔ₀に分類モデルを更
新する。この場合、日時ｔ₀に登録されている分類モデ
ルＱが抽出される。In A of FIG. 15, classification model P is registered from date and time t₁ to t₂, and classification model Q from t₂ to t₃. That is, classification model P was updated to classification model Q at t₂. When a date and time such as "month X, day Y" is entered in step S111, step S112 updates the classification model to the exact date and time t₀ entered in step S111. In this case, classification model Q, which is registered at t₀, is extracted.

【０１０９】図１５中のＢにおいては、図中のＡと同様
に、日時ｔ₁から日時ｔ₂までは分類モデルＰが、日時ｔ
₂から日時ｔ₃までは分類モデルＱが、それぞれ登録され
ている。ステップＳ１１１においては、たとえば１週間
前という日時が入力されたものとする。このような場合
には、正確に１週間前の日時ｔ₀₁に所定の幅を取ってモ
デルを検索する。図では、一週間前の日時ｔ₀₁と１日経
過した６日前の日時ｔ₀₁の間に、分類モデルＰから分類
モデルＱへの更新日時ｔ₂が存在している。したがっ
て、上記所定の幅として１日以上を指定すると、更新時
刻ｔ₂に更新された分類モデルＱが選択される。In B of FIG. 15, as in A, classification model P is registered from date and time t₁ to t₂, and classification model Q from t₂ to t₃. Suppose that a date and time such as "one week ago" is entered in step S111. In such a case, the model is searched for by allowing a predetermined width at the date and time t₀₁ exactly one week ago. In the figure, the date and time t₂ at which classification model P was updated to classification model Q lies between t₀₁, one week ago, and the date and time six days ago, one day after t₀₁. Therefore, if one day or more is specified as the predetermined width, classification model Q, which was updated at t₂, is selected.
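One way to read the selection with a predetermined width is sketched below. The sketch assumes that the width extends forward from the entered date and time and that the newest model updated within that span is chosen; the patent does not spell out either detail. The function name `select_model` and the concrete dates are illustrative only.

```python
from datetime import datetime, timedelta


def select_model(history, when, width):
    """Pick the newest model whose update time is no later than `when + width`.

    `history` is a list of (updated_at, name) pairs in ascending time order.
    Returns None when no model had been registered by then.
    """
    chosen = None
    for updated_at, name in history:
        if updated_at <= when + width:
            chosen = name
    return chosen


t1 = datetime(1999, 1, 1)
t2 = datetime(1999, 1, 13, 12, 0)           # P was updated to Q at this time
history = [(t1, "P"), (t2, "Q")]
one_week_ago = datetime(1999, 1, 13, 0, 0)  # t2 falls within the following day

print(select_model(history, one_week_ago, timedelta(0)))       # P
print(select_model(history, one_week_ago, timedelta(days=1)))  # Q
```

With a width of zero the lookup behaves like the exact case in A of FIG. 15; widening it to one day picks up the update at t₂.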

【０１１０】ステップＳ１１２における、過去の文書の
分類の参照、そしてその分類に基づいた過去の文書の抽
出は、過去の分類のほうがよいと思ったときに実益があ
る。Referring to a past classification of documents in step S112, and extracting past documents based on that classification, is of practical benefit when the user considers the past classification to be better.

【０１１１】ステップＳ１１３では、制御部１１は、ス
テップＳ１１２において抽出されたモデルが更新された
時点で未読の文書を検索して抽出する。未読の文書の抽
出は、ステップＳ１１２で抽出した分類モデルの更新日
時に基づいて、分類モデルを参照することにより、分類
モデルに含まれる文書を認識され、各文書のインデック
スからさらに各文書の更新時間を参照することによりお
こなわれる。In step S113, the control unit 11 searches for and extracts the documents that were unread at the time the model extracted in step S112 was updated. Unread documents are extracted by referring to the classification model, based on the update date and time of the classification model extracted in step S112, to identify the documents included in that model, and then referring to the update time of each document through its index.

【０１１２】ステップＳ１１４では、制御部１１は、ス
テップＳ１１３において抽出した未読の文書を表示部３
０に表示するように制御する。表示部３０には、たとえ
ば複数のウィンドウに、それぞれ所定の期間内の分類モ
デルを表示することができる。In step S114, the control unit 11 controls the display unit 30 to display the unread documents extracted in step S113. The display unit 30 can display, for example, the classification models for respective predetermined periods in a plurality of windows.

【０１１３】なお、上述のように再現された過去の分類
モデルのカテゴリにしたがって、その過去の時点から現
在まで受信した文書を分類操作することもできる。過去
の分類モデルによる現在の文書の分類は、過去の分類モ
デルが現在の分類モデルよりよいとユーザが判断した場
合に実益がある。また、これから入力される文書を過去
の分類モデルによって分類したいときにも、過去の分類
モデルを利用することができる。Note that, according to the category of the past classification model reproduced as described above, the document received from the past time to the present can be classified. Classification of a current document by a past classification model is useful if the user determines that the past classification model is better than the current classification model. Also, when it is desired to classify a document to be input by a past classification model, the past classification model can be used.

【０１１４】上述の一連の工程では、制御部１１は、入
力部２０への日時の入力に基づき、この日時に対応する
分類モデルを読み出し、この分類モデルによって文書を
検索することにより日時を遡って過去のカテゴリと文書
を見つけ出している。したがって、ユーザは過去の文書
を容易に検索してその内容を確認することができる。In the series of steps described above, the control unit 11 reads the classification model corresponding to the date and time entered at the input unit 20 and searches for documents with this classification model, thereby going back in time to find past categories and documents. The user can therefore easily search for past documents and check their contents.

【０１１５】次に、上述した中心活性値に基づいておこ
なう語義間関連度の計算について、図１６に示すフロー
チャートを参照して説明する。Next, a description will be given, with reference to the flowchart shown in FIG. 16, of the calculation of the degree of association between meanings based on the above-mentioned central activity value.

【０１１６】最初のステップＳ６１において、制御部１
１は、電子辞書内の語の語義の説明を用い、電子辞書を
使って語義のネットワークをタグ付けにより作成する。
すなわち、電子辞書における各語義の説明と、この説明
中に現れる語義との参照関係から、上述したような語義
のタグ付けによる構造のネットワークを作成する。これ
は、最上位のエレメントを電子辞書として、図２に示し
たようなタグ付けによる内部構造を構成することに相当
する。制御部１１は、たとえばＲＡＭ１４に記憶された
語義とその説明を順に読み出して、ネットワークを作成
する。制御部１４は、このようにして作成した語義のネ
ットワークをたとえばＲＡＭ１４に記憶させる。In the first step S61, the control unit 11 uses the explanations of word meanings in an electronic dictionary to create a network of meanings by tagging. That is, from the explanation of each meaning in the electronic dictionary and the reference relations to the meanings that appear in that explanation, it creates a network with the tagged structure of meanings described above. This corresponds to constructing an internal structure by tagging, like that shown in FIG. 2, with the electronic dictionary as the top-level element. The control unit 11 sequentially reads the meanings and their explanations stored in, for example, the RAM 14 and creates the network. The control unit 11 then stores the network of meanings created in this way in, for example, the RAM 14.

【０１１７】なお、この電子辞書は、たとえば通信回線
から文書処理装置の受信部２１にて受信することができ
る。また、電子辞書は、たとえばＣＤ−ＲＯＭなどの記
録媒体３２によって提供される。記録媒体３２により提
供された電子辞書は、記録／再生部３１により再生され
る。The electronic dictionary can be received by the receiving unit 21 of the document processing device from, for example, a communication line. The electronic dictionary is provided by a recording medium 32 such as a CD-ROM. The electronic dictionary provided by the recording medium 32 is reproduced by the recording / reproducing unit 31.

【０１１８】ステップＳ６２において、ステップＳ６１
で作成された語義のネットワーク上で、上述した中心活
性値の拡散処理をおこなう。この活性拡散により、各語
義の中心活性値は、電子辞書により与えられたタグ付け
による内部構造に応じて更新される。In step S62, the diffusion processing of central activity values described above is performed on the network of meanings created in step S61. Through this active diffusion, the central activity value of each meaning is updated according to the tagged internal structure given by the electronic dictionary.

【０１１９】ステップＳ６３においては、ステップＳ６
１で作成された語義のネットワークを構成する一つの語
義ｓ_iを選択し、ステップＳ６４においては、この語義
ｓ_iに対応する語彙エレメントＥ_iの中心活性値ｅ_iの初
期値を適当に変化させ、このときの中心活性値の差分Δ
ｅ_iを計算する。In step S63, one meaning s_i is selected from the network of meanings created in step S61. In step S64, the initial value of the central activity value e_i of the vocabulary element E_i corresponding to this meaning s_i is changed by a suitable amount, and the resulting difference Δe_i in the central activity value is calculated.

【０１２０】ステップＳ６５においては、ステップＳ６
４におけるエレメントＥ_iの中心活性値ｅ_iの初期値の変
化に対応する、語義ｓ_jに対応するエレメントＥ_jの中心
活性値ｅ_jの差分Δｅ_jを求める。ステップＳ６６におい
ては、ステップＳ６５で求めた差分Δｅ_jをステップＳ
６４で求めたΔｅ_iで除した商Δｅ_j／Δｅ_iを、語義ｓ_i
の語義ｓ_jに対する語義間関連度とする。ある語義の中
心活性値をステップＳ６４で変えたのに応じて、関連す
る語の中心活性値の変わることとなる。In step S65, the difference Δe_j in the central activity value e_j of the element E_j corresponding to a meaning s_j is obtained in response to the change made in step S64 to the initial value of the central activity value e_i of the element E_i. In step S66, the quotient Δe_j/Δe_i, obtained by dividing the difference Δe_j found in step S65 by the difference Δe_i found in step S64, is taken as the degree of association between meanings of s_i with respect to s_j. In other words, changing the central activity value of one meaning in step S64 changes the central activity values of related words accordingly.
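The quotient Δe_j/Δe_i can be illustrated with a toy computation. The `spread` function below is a deliberately simple stand-in (a single pass of weighted propagation) and is not the active diffusion procedure of this specification; the link weight, the rate, and all names are assumptions made for the example.

```python
def spread(initial, links, rate=0.5):
    """Toy one-pass diffusion: each link (src, dst) adds
    rate * weight * initial[src] to the value of dst."""
    result = dict(initial)
    for (src, dst), weight in links.items():
        result[dst] += rate * weight * initial[src]
    return result


def relatedness(initial, links, i, j, delta=1.0):
    """Degree of association of meaning i with respect to meaning j:
    (change in e_j) / (change in e_i) after perturbing the initial e_i."""
    before = spread(initial, links)
    bumped = dict(initial)
    bumped[i] += delta          # step S64: change the initial value of e_i
    after = spread(bumped, links)
    delta_i = after[i] - before[i]
    delta_j = after[j] - before[j]  # step S65: resulting change in e_j
    return delta_j / delta_i        # step S66: the quotient


initial = {"s1": 1.0, "s2": 1.0}
links = {("s1", "s2"): 0.5}
print(relatedness(initial, links, "s1", "s2"))  # 0.25
```

Repeating this for every pair of meanings, as in the loop of steps S63 to S67, fills a table of pairwise degrees.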

【０１２１】ステップＳ６７においては、語義ｓ_iと語
義ｓ_jとのすべての対について語義間関連度の演算が終
了したか否かについて判断する。そして、すべての語義
の対について語義間関連度の演算が終了したときには
“ＹＥＳ”として、この一連の処理を終了する。すべて
の語義の対について語義間関連度の演算が終了していな
いときには、“ＮＯ”として、ステップＳ６３にもど
り、語義間関連度の演算が終了していない対について語
義間関連度の演算を継続する。In step S67, it is determined whether the degree of association between meanings has been calculated for all pairs of meanings s_i and s_j. If it has been calculated for all pairs, the result is "YES" and this series of processing ends. If it has not, the result is "NO", the process returns to step S63, and the calculation continues for the pairs that have not yet been processed.

【０１２２】このように計算された語義間関連度は、図
１７に示すように、それぞれの語義と語義の間に定義さ
れる。この語義の表においては、語義間関連度は正規化
され、０から１までの値をとる。すなわち、この語義の
表においては“コンピュータ”、“テレビ”、“ＶＴ
Ｒ”の間の相互の語義間関連度が示されている。“コン
ピュータ”と“テレビ”の語義間関連度は０．５５、
“コンピュータ”と“ＶＴＲ”の語義間関連度は０．２
５、“テレビ”と“ＶＴＲ”の語義間関連度は０．６０
である。制御部１１は、このように作成した語義間関連
度をたとえばＲＡＭ１４に記憶させる。The degrees of association between meanings calculated in this way are defined between pairs of meanings, as shown in FIG. 17. In this table of meanings, the degree of association is normalized and takes a value from 0 to 1. The table shows the mutual degrees of association among "computer", "television", and "VTR": 0.55 between "computer" and "television", 0.25 between "computer" and "VTR", and 0.60 between "television" and "VTR". The control unit 11 stores the degrees of association created in this way in, for example, the RAM 14.
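A table like the one in FIG. 17 might be held as a mapping keyed by unordered pairs. The sketch below assumes the table is symmetric, as the word "mutual" suggests, and returns 0.0 for pairs that are not listed; both choices, and the name `related`, are assumptions.

```python
# Degrees of association quoted in the description of FIG. 17.
RELATEDNESS = {
    frozenset({"computer", "television"}): 0.55,
    frozenset({"computer", "VTR"}): 0.25,
    frozenset({"television", "VTR"}): 0.60,
}


def related(a, b, table=RELATEDNESS, default=0.0):
    """Look up the degree of association regardless of argument order."""
    return table.get(frozenset({a, b}), default)


print(related("VTR", "television"))  # 0.6
print(related("computer", "radio"))  # 0.0
```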

【０１２３】ステップＳ６３からステップＳ６７のルー
プにおいては、制御部は、必要な値をたとえばＲＡＭ１
４から順に読み出して、上述したように語義間関連度を
計算する。制御部１１は、計算した語義間関連度をたと
えばＲＡＭ１４に記憶させる。In the loop from step S63 to step S67, the control unit reads the necessary values in order from, for example, the RAM 14 and calculates the degrees of association between meanings as described above. The control unit 11 stores the calculated degrees of association in, for example, the RAM 14.

【０１２４】次に、上述したように算出された語義間関
連度を用いた文書分類について説明する。この語義間関
連度を利用した文書分類は、先に説明した図５のＧＵＩ
のウィンドウ３０１における文書分類に用いられる。Next, document classification using the degrees of association between meanings calculated as described above is explained. This classification is used for the document classification in the window 301 of the GUI of FIG. 5 described earlier.

【０１２５】各カテゴリの分類モデルは、タグ付けによ
る内部構造による中心活性値に基づいて抽出される。上
述したように、文書処理装置の制御部１１は、ステップ
Ｓ３２において中心活性値が所定の閾値を超えるエレメ
ントを抽出し、ステップＳ３３においてこのエレメント
からすべての固有名詞を取り出してインデックスに加
え、ステップＳ３４においては固有名詞以外の語義を取
り出してインデックスに加える。分類モデルは、たとえ
ば上述の手順により生成されたインデックスを含むカテ
ゴリインデックスから構成される。The classification model of each category is extracted based on the central activity values derived from the tagged internal structure. As described above, the control unit 11 of the document processing apparatus extracts the elements whose central activity value exceeds a predetermined threshold in step S32, takes all proper nouns out of these elements and adds them to the index in step S33, and takes out the meanings other than proper nouns and adds them to the index in step S34. The classification model consists, for example, of category indexes that contain the indexes generated by this procedure.

【０１２６】図７におけるステップＳ２３でおこなわれ
る文書の自動分類は、このような分類モデルを参照し
て、図１８のフローチャートに示す一連の手順にしたが
って、語義間関連度に基づいておこなわれる。The automatic classification of documents performed in step S23 in FIG. 7 is performed based on the degree of association between meanings according to a series of procedures shown in the flowchart of FIG. 18 with reference to such a classification model.

【０１２７】ステップＳ７１においては、制御部１１
は、分類モデルの各カテゴリＣ_i に含まれる固有名詞の
集合と、ステップＳ６２において文書から抽出されイン
デックスに入れられた語のうちの固有名詞の集合とにつ
いて、これらの共通集合の数をＰ（Ｃ_i ）とする。そし
て、制御部１１は、このようにして算出した数Ｐ
（Ｃ_i）をたとえばＲＡＭ１４に記憶させる。In step S71, the control unit 11 takes the set of proper nouns included in each category C_i of the classification model and the set of proper nouns among the words extracted from the document and placed in the index in step S62, and defines P(C_i) as the number of elements common to both sets. The control unit 11 then stores the number P(C_i) calculated in this way in, for example, the RAM 14.
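Counting the shared proper nouns is a set intersection. A minimal sketch, with a hypothetical function name and made-up document nouns:

```python
def common_proper_nouns(category_nouns, document_nouns):
    """P(C_i): number of proper nouns shared by the category and the document."""
    return len(set(category_nouns) & set(document_nouns))


event_nouns = ["Mr. F", "Mr. H"]          # category "event" in A of FIG. 13
document_nouns = ["Mr. H", "Company C"]   # illustrative document index
print(common_proper_nouns(event_nouns, document_nouns))  # 1
```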

【０１２８】ステップＳ７２においては、制御部１１
は、その文書のインデックス中の語義と各カテゴリＣｉ
に含まれる全文書の語義との語義間関連度を図１７の語
義の表を参照し、語義間関連度の総和Ｒ（Ｃ_i ）を演算
する。すなわち、制御部１１は、分類モデルにおける固
有名詞以外の語について、ステップＳ６１で算出した語
義間関連度の総和Ｒ（Ｃ_i ）をとる。そして、制御部１
１は、算出した語義間関連度の総和Ｒ（Ｃ_i ）をたとえ
ばＲＡＭ１４に記憶させる。In step S72, the control unit 11 looks up, in the table of meanings of FIG. 17, the degrees of association between the meanings in the index of the document and the meanings of all documents included in each category C_i, and computes their sum R(C_i). That is, for the words in the classification model other than proper nouns, the control unit 11 takes the sum R(C_i) of the degrees of association between meanings calculated in step S61. The control unit 11 then stores the calculated sum R(C_i) in, for example, the RAM 14.
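The sum R(C_i) can be sketched as a double loop over the document's meanings and the category's meanings. The table is assumed to be symmetric and keyed by unordered pairs, unlisted pairs are assumed to contribute 0, and the example inputs are illustrative.

```python
# Degrees of association quoted in the description of FIG. 17.
TABLE = {
    frozenset({"computer", "television"}): 0.55,
    frozenset({"computer", "VTR"}): 0.25,
    frozenset({"television", "VTR"}): 0.60,
}


def relatedness_sum(document_meanings, category_meanings, table=TABLE):
    """R(C_i): total degree of association between the document's meanings
    and the meanings held by the category."""
    total = 0.0
    for doc_meaning in document_meanings:
        for cat_meaning in category_meanings:
            total += table.get(frozenset({doc_meaning, cat_meaning}), 0.0)
    return total


r = relatedness_sum(["television"], ["computer", "VTR"])
print(round(r, 2))  # 1.15
```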

【０１２９】ステップＳ７３においては、文書の項目Ｃ
_i に対する関連度である文書分類間関連度をＲｅｌ（Ｃ_i ）＝ｍＰ（Ｃ_i ）＋ｎＲ（Ｃ_i ）と定義する。ここで、係数ｍ、ｎは定数で、それぞれの
値の文書分類間関連度への貢献の度合いを表すパラメー
タである。制御部１１は、ステップＳ３３で算出した共
通集合の数Ｐ（Ｃ_i ）およびステップＳ６４で算出した
語義間関連度の総和Ｒ（Ｃ_i ）をたとえばＲＡＭ１４か
ら読み出し、上述の式に当てはめて文書分類間関連度Ｒ
ｅｌ（Ｃ_i ）を算出する。なお、これらの係数ｍ、ｎの
値としては、たとえばｍ＝１０、ｎ＝１とすることがで
きる。そして、制御部１１は、このように算出した文書
分類間関連度Ｒｅｌ（Ｃ_i ）をたとえばＲＡＭ１４に記
憶させる。In step S73, the degree of association between document classifications, which is the degree of association of the document with the item C_i, is defined as Rel(C_i) = mP(C_i) + nR(C_i). Here, the coefficients m and n are constants; they are parameters that express how much each value contributes to the degree of association between document classifications. The control unit 11 reads the number of common elements P(C_i) calculated in step S33 and the sum R(C_i) of the degrees of association between meanings calculated in step S64 from, for example, the RAM 14, and applies them to the above formula to calculate Rel(C_i). The coefficients can be set to, for example, m = 10 and n = 1. The control unit 11 then stores the calculated Rel(C_i) in, for example, the RAM 14.
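The formula Rel(C_i) = mP(C_i) + nR(C_i) translates directly into code. The defaults below use the example values m = 10 and n = 1 from the text; the inputs are illustrative.

```python
def rel(p, r, m=10.0, n=1.0):
    """Rel(C_i) = m * P(C_i) + n * R(C_i)."""
    return m * p + n * r


score = rel(p=1, r=1.15)
print(round(score, 2))  # 11.15
```

With m = 10 and n = 1, one shared proper noun outweighs a sum of degrees of association below 10.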

【０１３０】係数ｍおよびｎの値は、統計的手法を使っ
て推定することもできる。すなわち、制御部１１は、複
数の係数ｍおよびｎの対について文書分類間関連度Ｒｅ
ｌ（Ｃ_i ）が与えられると、上記係数を最適化により求
めることができる。The values of the coefficients m and n can also be estimated with a statistical method. That is, when the degree of association between document classifications Rel(C_i) is given for a plurality of pairs of the coefficients m and n, the control unit 11 can obtain the coefficients by optimization.
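The text does not say which statistical method is used. As one possibility, the sketch below fits m and n by ordinary least squares to samples of (P, R, desired Rel), solving the 2x2 normal equations directly. The choice of least squares, the sample data, and the name `fit_coefficients` are all assumptions.

```python
def fit_coefficients(samples):
    """Return (m, n) minimising the sum of (m*p + n*r - rel)**2.

    `samples` is a list of (p, r, rel) triples.
    """
    spp = sum(p * p for p, r, y in samples)
    srr = sum(r * r for p, r, y in samples)
    spr = sum(p * r for p, r, y in samples)
    spy = sum(p * y for p, r, y in samples)
    sry = sum(r * y for p, r, y in samples)
    det = spp * srr - spr * spr
    if det == 0:
        raise ValueError("samples do not determine m and n uniquely")
    m = (spy * srr - sry * spr) / det
    n = (sry * spp - spy * spr) / det
    return m, n


# Samples generated with m = 10 and n = 1, so the fit should recover them.
samples = [(1, 0, 10), (0, 1, 1), (2, 3, 23)]
print(fit_coefficients(samples))  # (10.0, 1.0)
```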

【０１３１】ステップＳ７４においては、制御部１１
は、項目Ｃ_i に対する文書分類間関連度Ｒｅｌ（Ｃ_i ）
が全項目中最大で、その文書分類間関連度の値がある閾
値を超えているとき、カテゴリＣｉに文書を分類する。
すなわち、制御部１１は、複数のカテゴリについてそれ
ぞれ文書の文書分類間関連度Ｒｅｌ（Ｃ_i ）を作成し、
最大の文書分類間関連度Ｒｅｌ（Ｃ_i ）が閾値を超えて
いるときには、文書をそのカテゴリＣ_i に分類する。最
大の文書分類間関連度Ｒｅｌ（Ｃ_i ）が閾値を超えてい
ないときには、文書の分類はおこなわない。In step S74, the control unit 11 classifies the document into the category C_i when the degree of association between document classifications Rel(C_i) for the item C_i is the largest among all items and its value exceeds a certain threshold. That is, the control unit 11 computes Rel(C_i) of the document for each of a plurality of categories, and when the largest Rel(C_i) exceeds the threshold, it classifies the document into that category C_i. When the largest Rel(C_i) does not exceed the threshold, the document is not classified.
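The decision in step S74 amounts to taking the category with the largest score and accepting it only above a threshold. A minimal sketch, with a hypothetical function name and made-up scores:

```python
def classify(scores, threshold):
    """Return the category with the largest Rel(C_i) if that value exceeds
    `threshold`; return None to leave the document unclassified."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None


print(classify({"sports": 3.2, "event": 11.15}, threshold=5.0))  # event
print(classify({"sports": 3.2}, threshold=5.0))                  # None
```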

【０１３２】このように、文書中に含まれる語義間の語
義間関連度の計算とそれに基づく文書分類間関連度によ
る分類の手順は、複数のエレメントから構成される内部
構造を有する文書を処理し、この文書を複数のカテゴリ
に分類する。すなわち、この手順は、文書と各カテゴリ
との文書分類間関連度を算出し、算出された文書分類間
関連度に基づいて上記文書を分類するカテゴリを決定す
る。As described above, the procedure of calculating the degrees of association between the meanings contained in a document, and classifying the document by the degree of association between document classifications based on them, processes a document having an internal structure composed of a plurality of elements and classifies it into one of a plurality of categories. That is, this procedure calculates the degree of association between document classifications for the document and each category, and determines the category into which the document is classified based on the calculated values.

【０１３３】ここで、文書を分類するカテゴリは、文書
から抽出された固有名詞および／または語義を含むイン
デックスによって特徴づけられる。そして、このような
分類モデルを用いて、各カテゴリの分類モデルに含まれ
る固有名詞と文書から抽出された固有名詞についての共
通である固有名詞の数を算出し、各カテゴリの上記分類
モデルに含まれる語義に対する上記文書の語義間関連度
の総和を算出する。さらに、文書に含まれる固有名詞と
語義間関連度に基づいて抽出された重複する固有名詞の
数と、固有名詞以外の語義間関連度の総和とに基づいた
文書分類間関連度により上記文書を分類するカテゴリを
決定する。この語義間関連度は、上述したようなタグ付
けによる辞書の内部構造に基づいて決定される。Here, a category into which documents are classified is characterized by an index containing proper nouns and/or meanings extracted from documents. Using such a classification model, the number of proper nouns common to the proper nouns included in the classification model of each category and the proper nouns extracted from the document is calculated, and the sum of the degrees of association between the meanings of the document and the meanings included in the classification model of each category is calculated. The category into which the document is classified is then determined by the degree of association between document classifications, which is based on the number of shared proper nouns and on the sum of the degrees of association between meanings other than proper nouns. The degree of association between meanings is determined based on the tagged internal structure of the dictionary, as described above.

【０１３４】文書の分類は、たとえば共通する固有名詞
の数および語義間関連度の線形結合が最大となって、か
つ、文書文書分類間関連度が所定の閾値を越えるような
項目に対しておこなわれる。このような共通する固有名
詞の数および語義間関連度の線形結合の係数は、文書と
カテゴリの関連の大きさから、上述したように統計的に
決定することができる。A document is classified, for example, into the item for which the linear combination of the number of common proper nouns and the degree of association between meanings is largest and for which the degree of association between document classifications exceeds a predetermined threshold. The coefficients of this linear combination can be determined statistically, as described above, from the strength of the association between documents and categories.

【０１３５】次に、文書処理装置の記録／再生部３１に
おいて記録／再生される記録媒体３２について説明す
る。記録媒体３２には、複数のエレメントからタグ付け
による内部構造を有する文書を処理する文書処理プログ
ラムが記録されている。この記録媒体３２としては、情
報の記録／再生が可能なたとえばフロッピーディスクが
利用される。Next, the recording medium 32 that is recorded and reproduced in the recording/reproducing unit 31 of the document processing apparatus will be described. The recording medium 32 stores a document processing program for processing documents that have a tagged internal structure composed of a plurality of elements. A medium on which information can be recorded and reproduced, for example a floppy disk, is used as the recording medium 32.

【０１３６】この記録媒体３２は、複数のエレメントか
ら構成された内部構造に関する情報が付与された電子文
書を複数のカテゴリに分類する分類モデルを用い、上記
分類モデルに記録された更新日時を参照して上記電子文
書を処理する文書処理プログラムが記録された記録媒体
３２であって、上記文書処理プログラムは、日時を入力
する入力処理と、上記入力処理で入力した日時に対応す
る分類モデルを読み出す読み出し処理、上記読み出し処
理で読み出した分類モデルに基づいて文書を検索する検
索処理とを有する。The recording medium 32 is a recording medium on which is recorded a document processing program that uses a classification model for classifying electronic documents, to which information on an internal structure composed of a plurality of elements has been added, into a plurality of categories, and that processes the electronic documents by referring to the update date and time recorded in the classification model. The document processing program comprises an input process of inputting a date and time, a read process of reading the classification model corresponding to the date and time input in the input process, and a search process of searching for documents based on the classification model read in the read process.

【０１３７】なお、本実施の形態においては、文書への
タグ付けの方法の一例を示したが、本発明がこのタグ付
けの方法に限定されないことはもちろんである。また、
本実施の形態においては、文書処理装置の受信部２１に
外部から文書が送信されるとしたが、本発明はこれに限
定されない。たとえば、上記文書は、文書処理装置のＲ
ＯＭ１３に書き込まれていたり、記録／再生部３１にお
いて記録媒体３２から読み出されてもよい。In the present embodiment, one example of a method of tagging documents has been shown, but the present invention is of course not limited to this tagging method. Also, in the present embodiment, documents are transmitted from outside to the receiving unit 21 of the document processing apparatus, but the present invention is not limited to this. For example, the documents may be written in the ROM 13 of the document processing apparatus or read from the recording medium 32 by the recording/reproducing unit 31.

【０１３８】また、上述の実施の形態においては、文書
処理装置の表示部３０に表示された文書から所望のエレ
メントを選択するデバイスとしてマウスを例示したが、
本発明がこれに限定されないことはいうまでもない。文
書処理装置におけるエレメントの入力には、タブレッ
ト、ライトペン等の他のデバイスを利用することができ
る。Further, in the above-described embodiment, a mouse is exemplified as a device for selecting a desired element from a document displayed on the display unit 30 of the document processing apparatus.
It goes without saying that the present invention is not limited to this. Other devices such as a tablet and a light pen can be used for inputting elements in the document processing apparatus.

【０１３９】[0139]

【発明の効果】上述のように、本発明は、分類モデルに
記録された上記分類モデルの更新日時を参照して電子文
書を処理するものであって、日時を入力し、入力した日
時に対応する分類モデルに基づいて文書を検索するもの
である。したがって、本発明によると、過去の文書のみ
ならず、過去の分類モデルに基づいた文書の分類項目へ
の分類をも知ることができる。すなわち、本発明は、過
去の分類をも提供することにより、ユーザの過去の関心
の方向を知ることができるので、使用の際の利便性を拡
大する。As described above, the present invention processes electronic documents by referring to the update date and time of a classification model recorded in that classification model: a date and time is input, and documents are searched for based on the classification model corresponding to the input date and time. Therefore, according to the present invention, it is possible to know not only past documents but also how documents were classified into classification items under a past classification model. That is, by also providing past classifications, the present invention makes it possible to know the direction of the user's past interests, which increases convenience in use.

【０１４０】また、本発明は、電子文書の特徴を表す特
徴情報を抽出し、分類モデルを構成する複数の分類項目
について、電子文書の特徴情報との関連度に応じて、各
文書を上記分類項目に分類する。したがって、本発明に
よると、過去の分類モデルに基づいて、電子文書を特徴
情報によって分類することができる。Further, according to the present invention, feature information representing features of an electronic document is extracted, and for each of a plurality of classification items constituting a classification model, each document is classified according to the degree of association with the feature information of the electronic document. Classify into items. Therefore, according to the present invention, an electronic document can be classified by feature information based on a past classification model.

[Brief description of the drawings]

【図１】本実施の形態を適用した文書処理装置の構成を
示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a document processing apparatus to which the present embodiment has been applied.

【図２】文書のタグ付けによる内部構造を示す図であ
る。FIG. 2 is a diagram showing an internal structure by tagging a document.

【図３】文書のタグ付けによる内部構造を表示したウィ
ンドウを示す図である。FIG. 3 is a diagram showing a window displaying an internal structure by tagging a document.

【図４】本実施の形態を適用した文書処理装置の動作を
示すフローチャートである。FIG. 4 is a flowchart illustrating an operation of the document processing apparatus to which the embodiment is applied.

【図５】文書の分類をおこなうＧＵＩを示す図である。FIG. 5 is a diagram showing a GUI for classifying documents.

【図６】文書を手動分類するフローチャートである。FIG. 6 is a flowchart for manually classifying documents.

【図７】文書を自動分類するフローチャートである。FIG. 7 is a flowchart for automatically classifying documents.

【図８】文書の特徴を発見してインデックスを作成する
フローチャートである。FIG. 8 is a flowchart for discovering the features of a document and creating an index.

【図９】活性拡散を示すフローチャートである。FIG. 9 is a flowchart showing active diffusion.

【図１０】活性拡散の処理を説明する図である。FIG. 10 is a diagram illustrating a process of active diffusion.

【図１１】活性拡散のリンク処理のフローチャートであ
る。FIG. 11 is a flowchart of link processing of active spread.

【図１２】日時を遡って過去の分類項目と文書を再現す
る一連の工程を示すフローチャートである。FIG. 12 is a flowchart showing a series of steps for reproducing past classification items and documents by going back to the date and time.

【図１３】分類モデルの日時の更新を説明する図であ
る。FIG. 13 is a diagram illustrating updating of a date and time of a classification model.

【図１４】日時の幅の範囲内で未読の文書を抽出するフ
ローチャートである。FIG. 14 is a flowchart for extracting an unread document within the range of date and time.

【図１５】入力日時に近い分類モデル抽出を説明する図
である。FIG. 15 is a diagram for explaining classification model extraction close to the input date and time.

【図１６】語義間関連度の計算のフローチャートであ
る。FIG. 16 is a flowchart of calculation of the degree of association between meanings.

【図１７】語義間関連度の表を示す図である。FIG. 17 is a diagram showing a table of the degree of association between meanings.

【図１８】文書分類間関連度による文書分類のフローチ
ャートである。FIG. 18 is a flowchart of document classification based on the degree of association between document classifications.

[Explanation of symbols]

１０本体、１１制御部、１２インターフェース、
１３ＣＰＵ、２０入力部、２１受信部、３０表示
部、３１記録／再生部10 body, 11 control unit, 12 interface,
13 CPU, 20 input unit, 21 receiving unit, 30 display unit, 31 recording / reproducing unit

Claims

[Claims]

1. A document processing method for processing an electronic document by referring to an update date and time of a classification model recorded in the classification model, the method comprising: an input step of inputting a date and time; and a search step of searching for an electronic document based on the classification model corresponding to the date and time input in the input step.

2. The document processing method according to claim 1, wherein when the classification model is updated, the classification model is updated by recording the date and time.

3. The document processing method according to claim 1, further comprising: a feature information extracting step of extracting, from the electronic document, feature information representing features of the electronic document; and a classification step of classifying each electronic document into the plurality of classification items constituting the classification model according to the degree of association between those classification items and the feature information of the electronic document extracted in the feature information extracting step.

4. The document processing method according to claim 3, further comprising a receiving step of receiving a plurality of electronic documents, wherein the feature information extracting step extracts, for each electronic document received in the receiving step, feature information representing features of that electronic document.

5. The document processing method according to claim 4, wherein the classification model is updated by recording date and time when a new electronic document received in the receiving step is classified.

6. The document processing method according to claim 1, wherein the electronic document is provided with information on an internal structure composed of a plurality of elements.

7. A document processing apparatus for processing an electronic document by referring to an update date and time of a classification model recorded in the classification model, the apparatus comprising: input means for inputting a date and time; and search means for searching for an electronic document based on the classification model corresponding to the date and time input by the input means.

8. The document processing apparatus according to claim 7, wherein the electronic document is provided with information on an internal structure composed of a plurality of elements.

9. A recording medium on which a document processing program for processing an electronic document with reference to an update date and time of the classification model recorded in the classification model is recorded, wherein the document processing program includes an input for inputting a date and time. A recording medium comprising: a process; and a search process for searching for an electronic document based on a classification model corresponding to the date and time input in the input process.

10. The recording medium according to claim 9, wherein the electronic document is provided with information on an internal structure composed of a plurality of elements.