JP6084311B1

JP6084311B1 - Document information providing device

Info

Publication number: JP6084311B1
Application number: JP2016003355A
Authority: JP
Inventors: 大樹清水; 倫一宮川
Original assignee: Toyota Technical Development Corp
Current assignee: Toyota Technical Development Corp
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2017-02-22
Anticipated expiration: 2036-01-12
Also published as: JP2017126109A

Abstract

[Problem] When a new technical document of another company (an other-company document) is published, automatically extracting the in-house technical documents (in-house documents) whose technical field is related to that document, and providing information on the other-company document to the persons in charge of those in-house documents, requires appropriate extraction conditions to be set in advance, and the operating man-hours needed to set those conditions appropriately may increase. [Solution] In-house documents composed of a vocabulary similar to the set of words (that is, the vocabulary) contained in the other-company document are extracted as documents whose technical field is related to the other-company document. Information on the other-company document is thereby delivered to the persons in charge of those in-house documents without extraction conditions being set in advance. [Selected drawing] FIG. 3

Description

The present invention relates to a document information providing apparatus that, when a new technical document is acquired, provides information on the new technical document to the person in charge of an existing technical document whose technical field is related to that of the new technical document.

Patent Document 1 describes an information distribution promotion system (hereinafter also referred to as the “conventional system”) that delivers newly published intellectual property information (for example, published patent gazettes and international publication gazettes) to predetermined users when that information matches preset extraction conditions. The extraction conditions are set by registering, in the conventional system, words contained in intellectual property information, classification information (for example, IPC and FI), and the like. With the conventional system, developers and researchers in various fields can automatically receive technical information related to the technology they are developing and the fields they are researching.

Patent Document 1: Japanese Patent Laying-Open No. 2015-99583

However, even if new intellectual property information matches the extraction conditions (that is, even if it contains a preset word or its classification information matches), that information may not belong to a field relevant to the user of the conventional system.

If the user of the conventional system sets the extraction conditions in finer detail (for example, by adding more words to the extraction conditions), the conventional system may be able to avoid extracting intellectual property information that does not belong to the relevant field. On the other hand, if the extraction conditions are set too finely, intellectual property information in the relevant field may no longer match them. In other words, the man-hours required to tune the extraction conditions appropriately may become excessive.

Accordingly, one object of the present invention is to provide a document information providing apparatus that, when a new technical document (for example, intellectual property information such as published patent gazettes, international publication gazettes, and academic papers) is acquired, can provide information on that technical document to the persons in charge of the related technical field through simple operation.

To achieve the above object, a document information providing apparatus according to the present invention (hereinafter also referred to as the “present apparatus”) includes a document acquisition unit, a document storage unit, a related document extraction unit, and an information providing unit.
The document acquisition unit acquires a to-be-classified technical document.
The document storage unit stores information on a plurality of reference technical documents, each of which has a person in charge assigned to it.

When the to-be-classified technical document is acquired by the document acquisition unit, the related document extraction unit executes a related document extraction process that extracts, as related documents, one or more of the reference technical documents composed of a vocabulary similar to the vocabulary composing the to-be-classified technical document.
The information providing unit provides information on the to-be-classified technical document to the person in charge of each of the related documents extracted by the related document extraction unit.

For example, a corporation using the present apparatus can register the specifications and other documents of its own patent applications in the present apparatus as reference technical documents, and can register the inventors of each patent application as the persons in charge of the corresponding reference technical document. In addition, the present apparatus can be configured so that the document acquisition unit acquires, as to-be-classified technical documents, the published patent gazettes of patent applications filed by other companies (parties other than the corporation itself).

In this case, when a patent gazette of another company is newly published, the present apparatus can provide information on that gazette to the persons in charge of the corporation's own patent applications that are composed of a vocabulary similar to the vocabulary composing the gazette.

A technical document, including a patent gazette, is likely to contain technical terms specific to the technical field to which it belongs. The sets of words (specifically, nouns) contained in two technical documents whose technical fields are related, that is, their vocabularies, are therefore often similar to each other. Consequently, a patent gazette of another company that the present apparatus provides to a person in charge is likely to belong to the technical field that person handles.

In other words, once reference technical documents have been registered in the present apparatus together with their persons in charge, the present apparatus can, when a to-be-classified technical document is acquired, provide information on it to the persons in charge of the reference technical documents whose technical field is related to it. No extraction conditions need to be set in the present apparatus beforehand. The present apparatus therefore makes it possible, through simple operation, to provide information on a to-be-classified technical document to the persons in charge of the reference technical documents whose technical field is related to it.
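The overall flow described above (store reference documents with their persons in charge, compare vocabularies, collect the persons to notify) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `ReferenceDocument` class, the `recipients_for` function, the Jaccard index as a stand-in for "vocabulary similarity", and the threshold of 0.3 do not come from the specification, and noun extraction is assumed to have been done elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class ReferenceDocument:
    doc_id: str
    vocabulary: Set[str]   # nouns of the document (extraction not shown)
    assignees: List[str]   # e-mail addresses of the persons in charge


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (an assumption): |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recipients_for(
    target_vocabulary: Set[str],
    references: Iterable[ReferenceDocument],
    similarity: Callable[[Set[str], Set[str]], float] = jaccard,
    threshold: float = 0.3,  # hypothetical cut-off
) -> List[str]:
    """Collect the persons in charge of every reference document whose
    vocabulary is similar enough to that of the acquired document."""
    recipients: Set[str] = set()
    for ref in references:
        if similarity(target_vocabulary, ref.vocabulary) >= threshold:
            recipients.update(ref.assignees)
    return sorted(recipients)
```

In this sketch the only things an operator registers are the documents and their persons in charge; no keyword or classification condition is entered by hand, which is the point the paragraph above makes.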

In one aspect of the present apparatus,
the related document extraction unit is preferably configured to execute the related document extraction process by:
(1) calculating, for each of the to-be-classified technical document and the plurality of reference technical documents, a “first vocabulary distribution” determined from the frequency with which each word contained in a technical document appears in that document;
(2) calculating, for every pair of technical documents among the to-be-classified technical document and the plurality of reference technical documents, a “first vocabulary distribution distance” that takes a smaller value the more similar the first vocabulary distributions of the two documents are;
(3) generating a predetermined number of “neighboring document groups” by aggregating the technical documents contained in the pairs whose first vocabulary distribution distance is smaller than a predetermined value; and
(4) extracting, as the related documents, the reference technical documents contained in the neighboring document group that contains the to-be-classified technical document.

Technical terms specific to the technical field of a technical document are likely to be used repeatedly in that document. The technical documents contained in the same neighboring document group are therefore likely to have related technical fields. According to this aspect, the reference technical documents whose technical field is related to the to-be-classified technical document can thus be extracted accurately.
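Steps (1) to (4) of this aspect can be sketched as follows. This is an illustration under stated assumptions, not the method fixed by the specification: the vocabulary distribution is taken to be relative word frequency, the distance is taken to be Euclidean, and the groups are formed by merging every pair closer than the threshold (connected components), which does not by itself enforce a predetermined number of groups. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def vocabulary_distribution(words):
    """Step (1): relative frequency of each word in one document."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def distribution_distance(p, q):
    """Step (2): Euclidean distance (an assumed metric); smaller = more similar."""
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in set(p) | set(q)))


def neighbor_groups(distributions, threshold):
    """Step (3): merge every pair of documents closer than `threshold`."""
    parent = {doc_id: doc_id for doc_id in distributions}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(distributions, 2):
        if distribution_distance(distributions[a], distributions[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for doc_id in distributions:
        groups.setdefault(find(doc_id), set()).add(doc_id)
    return list(groups.values())


def related_documents(target_id, distributions, threshold):
    """Step (4): the other members of the group that contains the target."""
    for group in neighbor_groups(distributions, threshold):
        if target_id in group:
            return sorted(group - {target_id})
    return []
```

A document that repeats field-specific terms ends up close to other documents that repeat the same terms, so the group containing the acquired document tends to hold reference documents from the same field.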

In another aspect of the present apparatus,
the present apparatus includes a “reference technical document classification unit” that classifies each of the plurality of reference technical documents into one of a plurality of “reference technical document groups”, each being a set of technical documents whose technical fields are related to one another, and then stores them in the document storage unit.
When a reference technical document is added to the document storage unit, the reference technical document classification unit executes a “reference technical document addition process” that determines that the added reference technical document belongs to the reference technical document group whose reference technical documents are most closely related in technical field to the added reference technical document.
The related document extraction unit is preferably configured to execute the related document extraction process by extracting, as the related documents, the reference technical documents contained in the reference technical document group whose reference technical documents are most closely related in technical field to the to-be-classified technical document.

The composition of the reference technical document groups is fixed when the reference technical document addition process is executed. Therefore, even when, for example, a plurality of to-be-classified technical documents are acquired (that is, when the related document extraction process is executed a plurality of times), the combination of reference technical documents contained in each reference technical document group (that is, the composition of the groups) does not change. According to this aspect, information on a plurality of to-be-classified technical documents can thus be provided to the persons in charge without changing the composition of the reference technical document groups.

For example, if the operator of the present apparatus sets up the reference technical document groups when registering reference technical documents in the document storage unit at the start of use, every reference technical document added afterwards will belong to one of the existing reference technical document groups. According to this aspect, reference technical documents can therefore be added while the technical field of each reference technical document group stays fixed.

In the aspect of the present apparatus described above,
the reference technical document classification unit may be configured to execute the reference technical document addition process by:
(1) calculating, for each of the added reference technical document and the other reference technical documents, a “second vocabulary distribution” determined from the frequency with which each word contained in a technical document appears in that document;
(2) calculating, for every pair of technical documents among the added reference technical document and the other reference technical documents, a “second vocabulary distribution distance” that takes a smaller value the more similar the second vocabulary distributions of the two documents are;
(3) generating a predetermined number of the reference technical document groups by aggregating the technical documents contained in the pairs whose second vocabulary distribution distance is smaller than a predetermined value; and
(4) determining that the added reference technical document belongs to the generated reference technical document group that contains it.

In this aspect, the added reference technical document comes to belong to a reference technical document group whose vocabulary distribution is similar to its own. The set consisting of the added reference technical document and the other reference technical documents can therefore be divided accurately into a plurality of reference technical document groups, each composed of reference technical documents whose technical fields are related to one another.

In addition, the reference technical document groups are generated anew each time a reference technical document is added. According to this aspect, each reference technical document group therefore remains composed of reference technical documents whose technical fields are related to one another, even when many reference technical documents are added.

Alternatively, in the aspect of the present apparatus described above,
the related document extraction unit may be configured to execute the related document extraction process by:
(1) calculating, for each reference technical document group and for each word, a “first word content rate”, which is the ratio of the number of reference technical documents in that group that contain the word to the number of reference technical documents contained in that group;
(2) calculating, for each of the reference technical document groups, a “first document relevance” that takes a larger value the more words with a higher first word content rate the to-be-classified technical document contains, and a larger value the fewer words with a lower first word content rate it contains; and
(3) extracting, as the related documents, the reference technical documents contained in the reference technical document group for which the first document relevance is largest.

The first document relevance takes a large value if the to-be-classified technical document contains “words that many of the reference technical documents in a given reference technical document group contain”. Conversely, it takes a small value if the to-be-classified technical document contains “words that many of the reference technical documents in that group do not contain”. In addition, it takes a large value if the to-be-classified technical document does not contain “words that many of the reference technical documents in that group do not contain”.

In other words, the first document relevance is determined not only from the words that the to-be-classified technical document and the reference technical documents contain but also from the words they do not contain. According to this aspect, a reference technical document group containing reference technical documents whose technical field is related to the to-be-classified technical document can therefore be extracted accurately.
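The word content rate and a document relevance with the stated behavior can be sketched as below. The word content rate follows the definition given above. The scoring formula is an assumption chosen only to satisfy the stated properties, since this passage gives no formula: each word contributes its rate `r` when the document contains it and `1 - r` when it does not, and the contributions are averaged. The function names are hypothetical.

```python
def word_content_rates(group_docs):
    """group_docs: list of word sets, one per reference document in a group.
    Returns {word: fraction of the group's documents that contain the word}."""
    n = len(group_docs)
    vocabulary = set().union(*group_docs)
    return {w: sum(w in doc for doc in group_docs) / n for w in vocabulary}


def document_relevance(target_words, rates):
    """Assumed score in [0, 1]: containing a high-rate word raises it,
    containing a low-rate (or unseen) word lowers it, and omitting a
    low-rate word raises it."""
    target_words = set(target_words)
    words = set(rates) | target_words
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        r = rates.get(w, 0.0)  # words unseen in the group have rate 0
        total += r if w in target_words else 1.0 - r
    return total / len(words)


def most_related_group(target_words, groups):
    """groups: {group name: list of word sets}. Returns (best name, all scores)."""
    scores = {
        name: document_relevance(target_words, word_content_rates(docs))
        for name, docs in groups.items()
    }
    return max(scores, key=scores.get), scores
```

Because absent words are scored as well as present ones, a document that merely shares one common term with a group does not outrank a group whose typical vocabulary it matches overall.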

In addition, in the aspect of the present apparatus described above,
the reference technical document classification unit may be configured to execute the reference technical document addition process by:
(1) calculating, for each reference technical document group and for each word, a “second word content rate”, which is the ratio of the number of reference technical documents in that group that contain the word to the number of reference technical documents contained in that group;
(2) calculating, for each of the reference technical document groups, a “second document relevance” that takes a larger value the more words with a higher second word content rate the added reference technical document contains, and a larger value the fewer words with a lower second word content rate it contains; and
(3) determining that the added reference technical document belongs to the reference technical document group for which the second document relevance is largest.

Like the first document relevance, the second document relevance is determined not only from the words that the added reference technical document and the other reference technical documents contain but also from the words they do not contain. In this aspect, a reference technical document group composed of other reference technical documents whose technical field is related to the added reference technical document can therefore be extracted accurately. As a result, according to this aspect, each reference technical document group remains composed of reference technical documents whose technical fields are related to one another, even when many reference technical documents are added.

Furthermore, in the aspect of the present apparatus described above,
the reference technical document classification unit may be configured to execute the reference technical document addition process by:
(1) calculating, for each of the reference technical document groups, a “person-in-charge relevance” that takes a larger value the more persons in charge are contained in both the set of persons in charge of the reference technical documents in that group and the set of persons in charge of the added reference technical document; and
(2) determining that the added reference technical document belongs to the reference technical document group for which the person-in-charge relevance is largest.

In general, each person in charge carries out his or her duties with expertise in a particular technical field, so a plurality of reference technical documents that involved the same person in charge are likely to have related technical fields. If a plurality of persons in charge were involved in a given reference technical document, another reference technical document that involved a set of persons in charge with similar membership is likely to be related to it in technical field.

Accordingly, the larger the person-in-charge relevance between two reference technical documents, the more closely their technical fields are considered to be related. In this aspect, a reference technical document group composed of other reference technical documents whose technical field is related to the added reference technical document can therefore be extracted accurately. As a result, according to this aspect, each reference technical document group remains composed of reference technical documents whose technical fields are related to one another, even when many reference technical documents are added.
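The person-in-charge relevance can be illustrated with plain set operations. In this sketch the relevance is simply the number of persons who appear both among a group's persons in charge and among those of the added document, which is one measure that grows with the overlap; the specification may normalize or weight it differently. The function names, the sample person names, and the tie-breaking rule (first group wins) are assumptions.

```python
def assignee_relevance(group_assignees, new_assignees):
    """Number of persons in charge common to the group and the added document."""
    return len(set(group_assignees) & set(new_assignees))


def assign_group(new_assignees, groups):
    """groups: {group name: list of assignee lists, one per reference document}.
    Returns (name of the group with the largest relevance, all scores).
    On a tie, the group listed first is chosen (an arbitrary choice)."""
    scores = {}
    for name, docs in groups.items():
        members = set().union(*docs)  # everyone in charge of any document in the group
        scores[name] = assignee_relevance(members, new_assignees)
    return max(scores, key=scores.get), scores
```

For example, with groups `{"engine": [["sato", "suzuki"], ["suzuki", "tanaka"]], "battery": [["ito"], ["ito", "kato"]]}`, a new document handled by `["tanaka", "sato", "kato"]` shares two persons with the engine group and one with the battery group, so it is placed in the engine group.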

A schematic diagram showing the configuration of the document information providing apparatus according to the first embodiment of the present invention (first apparatus).
A table showing the vocabulary distributions of the reference technical documents and the to-be-classified technical document calculated by the first apparatus.
A graph plotting the vocabulary distributions of the reference technical documents and the to-be-classified technical document in a two-dimensional orthogonal coordinate system.
A flowchart showing the reference technical document addition process executed by the first apparatus.
A flowchart showing the related document extraction process executed by the first apparatus.
A flowchart showing the to-be-classified technical document notification process executed by the first apparatus.
A table showing the word content rate calculated for each reference technical document group and each word.
An example of word content rates and vocabulary distributions used to explain how the document relevance is calculated.
A table showing the document relevance calculated for each reference technical document group.
A flowchart showing the reference technical document addition process executed by the document information providing apparatus according to the second embodiment of the present invention (second apparatus).
A flowchart showing the related document extraction process executed by the second apparatus.
A flowchart showing the reference technical document registration process executed by the document information providing apparatus according to the third embodiment of the present invention (third apparatus).
A flowchart showing the reference technical document addition process executed by the third apparatus.
A table showing the vocabulary set and the person-in-charge relevance calculated for each reference technical document.
A table showing the reference technical documents sorted in descending order of person-in-charge relevance.
A flowchart showing the reference technical document registration process executed by the document information providing apparatus according to the fourth embodiment of the present invention (fourth apparatus).
A flowchart showing the reference technical document addition process executed by the fourth apparatus.
A table comparing, for the first through fourth apparatuses, the method of classifying reference technical documents and the method of classifying to-be-classified technical documents.

The document information providing apparatus according to each embodiment of the present invention is described below with reference to the drawings.
<First Embodiment>
FIG. 1 shows the schematic configuration of a document information providing apparatus 11 according to the first embodiment of the present invention (hereinafter also referred to as the “first apparatus”). The document information providing apparatus 11 is a general-purpose computer and includes a CPU 21, a RAM 22, a hard disk drive (HDD) 23, a network interface 24, and an operation interface 25.

The CPU 21 reads data, performs numerical operations, outputs the results of those operations, and so on by sequentially executing predetermined programs. The RAM 22 stores data temporarily. The HDD 23 stores the programs executed by the CPU 21, databases (DBs), and the like.

The network interface 24 can communicate with an external document publication server 42 and an e-mail server 43 via a well-known network 41 (for example, the Internet). The document publication server 42 is a Web server (HTTP server) that publishes new published patent gazettes as they are issued and keeps them available for retrieval via the network 41. Examples of the document publication server 42 include the “gazette publication site” operated by the Patent Office and the Web servers of patent information services provided by private companies.

The e-mail server 43 can send e-mail to a desired destination (e-mail address) in response to a request (for example, an SMTP request) from the document information providing apparatus 11 (specifically, from the network interface 24).
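How the apparatus might hand a notification to an SMTP server can be sketched with Python's standard `smtplib` and `email.message` modules. The function names, the message wording, the sender address, and the host and port are all hypothetical; the specification states only that a request such as an SMTP request is issued to the e-mail server.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipients, doc_id, title):
    """Compose a plain-text notice about a newly acquired document."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"New related document: {doc_id}"
    msg.set_content(
        "A newly acquired document may be related to your technical field.\n\n"
        f"ID: {doc_id}\n"
        f"Title: {title}\n"
    )
    return msg


def send_notification(msg, host, port=25):
    """Submit the message to the SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from transmission lets the message content be checked without a reachable mail server.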

The operation interface 25 can communicate with an input device 26 and an output device 27 connected to the document information providing apparatus 11. The input device 26 includes a keyboard and a mouse, and the output device 27 includes a display device. In addition, the operation interface 25 has a USB port (not shown) and can read data from, and write data to, a USB memory connected to the USB port. The operator of the document information providing apparatus 11 operates the apparatus using the input device 26 and obtains the results of the operation via the output device 27.

(Database)
The DBs stored in the HDD 23 include a reference technical document DB 31 and a classified technical document DB 32. The reference technical document DB 31 holds information on patent applications filed by a specific corporation (hereinafter also referred to as the “own company”). This information includes the specification and other documents attached to the request, and is hereinafter also referred to as a “reference technical document”. In the reference technical document DB 31, the application number is used as the document ID (identifier) of each reference technical document.

The reference technical document DB 31 also holds information on the one or more persons in charge of each reference technical document. In principle, the persons in charge are the inventors of the invention described in the reference technical document (that is, in the specification and other documents of the patent application) and any assistants who were involved in making the invention. If a person in charge has been transferred, the successor is registered in the reference technical document DB 31 as the person in charge of that reference technical document. The information on the persons in charge registered in the reference technical document DB 31 includes the e-mail address of each person in charge.

The classified technical document DB 32 holds information on published patent gazettes (hereinafter also referred to as “classified technical documents”) of patent applications filed by parties other than the own company (hereinafter also referred to as “other companies”). In the classified technical document DB 32, the application number is used as the document ID (identifier) of each classified technical document.
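As an illustration only, the two databases could be laid out as in the following sketch. The use of SQLite, the table names, the column names, and the helper `emails_for` are all hypothetical; the text specifies only that the application number serves as the document ID and that each reference technical document has one or more persons in charge with e-mail addresses.

```python
import sqlite3

def create_schema(conn):
    """Create hypothetical tables mirroring DB 31 and DB 32."""
    conn.executescript(
        """
        -- Reference technical document DB 31 (own-company applications)
        CREATE TABLE reference_document (
            application_number TEXT PRIMARY KEY,  -- document ID
            content            TEXT NOT NULL
        );
        -- One or more persons in charge per reference technical document
        CREATE TABLE person_in_charge (
            application_number TEXT NOT NULL
                REFERENCES reference_document(application_number),
            email              TEXT NOT NULL,
            PRIMARY KEY (application_number, email)
        );
        -- Classified technical document DB 32 (other-company publications)
        CREATE TABLE classified_document (
            application_number TEXT PRIMARY KEY,  -- document ID
            content            TEXT NOT NULL
        );
        """
    )

def emails_for(conn, application_number):
    """Return the e-mail addresses registered for one reference document."""
    rows = conn.execute(
        "SELECT email FROM person_in_charge "
        "WHERE application_number = ? ORDER BY email",
        (application_number,),
    )
    return [row[0] for row in rows]
```

Keeping the persons in charge in a separate table makes it straightforward to replace a person with a successor without touching the document record.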

(Outline of processing)
The processing executed by the CPU 21 (hereinafter also simply referred to as the “CPU”) includes a reference technical document addition process, a related document extraction process, and a classified technical document notification process. When the operator operates the input device 26 to register a new reference technical document (hereinafter also referred to as a “new reference technical document”) in the document information providing apparatus 11, the CPU executes the reference technical document addition process and adds the content and document ID of the new reference technical document to the reference technical document DB 31.

When a new classified technical document (hereinafter also referred to as a “new classified technical document”) is published on the document publication server 42, the CPU executes the related document extraction process and acquires (downloads) the new classified technical document by a known method (for example, an HTTP request). During the related document extraction process, the CPU also extracts the set of reference technical documents whose technical field is related to that of the new classified technical document. The CPU then registers, in the classified technical document DB 32, the content and document ID of the new classified technical document together with the document IDs of the reference technical documents whose technical field is related to it.

Once the information on the new classified technical document has been registered in the classified technical document DB 32, the CPU executes the classified technical document notification process. In this process, the CPU sends an e-mail to each person in charge of the reference technical documents whose technical field is related to that of the new classified technical document, notifying them that the new classified technical document has been published and informing them of its content.

(Reference technical document addition process)
When executing the reference technical document addition process, the CPU performs morphological analysis (so-called text mining) on the new reference technical document by a known method, thereby extracting the words (specifically, the nouns) contained in the new reference technical document together with the number of occurrences of each word. The set of pairs of a word contained in a document and the number of occurrences of that word is also referred to as a “vocabulary distribution”.
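A vocabulary distribution in this sense is simply a word-to-count mapping. The sketch below illustrates the idea under simplifying assumptions: a lowercase regular-expression tokenizer stands in for real morphological analysis, no part-of-speech filtering is performed (so non-nouns are removed only through the hypothetical `stop_words` argument), and the function name `vocabulary_distribution` is chosen for illustration.

```python
import re
from collections import Counter

def vocabulary_distribution(text, stop_words=frozenset()):
    """Count how often each word occurs in the text.

    Assumption: splitting on runs of ASCII letters replaces the
    morphological analysis that a Japanese document would need.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)
```

For Japanese text, the tokenizer line would be replaced by a morphological analyzer that returns only nouns; the counting step stays the same.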

An example of the vocabulary distribution of each reference technical document registered in the reference technical document DB 31 is shown in the table of FIG. 2. In the table of FIG. 2, documents 1 through (N-1) are reference technical documents that were already registered in the reference technical document DB 31 at the time document N is registered, and document N is the new reference technical document that is newly added to the reference technical document DB 31. Each of the reference technical documents from document 1 to document N contains some (in some cases, all) of the M kinds of words from word 1 to word M.

(Related document extraction process - explanation using a simplified example)
When executing the related document extraction process, the CPU extracts the vocabulary distribution of the new classified technical document. The resulting vocabulary distribution of the new classified technical document is shown in FIG. 2. The CPU then classifies the set of documents consisting of the new classified technical document and the N reference technical documents registered in the reference technical document DB 31 into a predetermined number K of groups by gathering together documents whose technical fields are related. To do so, for every combination of two documents in this set of (N+1) documents, the CPU calculates a “vocabulary distribution distance D representing the degree to which the technical fields of the two documents are related” on the basis of the vocabulary distributions of the two documents. The more closely the technical fields of the two documents are related, the smaller the value of the vocabulary distribution distance D.

The method of calculating the vocabulary distribution distance D is described using a simplified example. FIG. 3 is a distribution diagram showing an example of the vocabulary distribution of each document under the assumption that the vocabulary distribution consists of only two words, word 1 (for example, “catalyst”) and word 2 (for example, “temperature”).

In FIG. 3, the number of occurrences of word 1 is represented by the x_1 axis, and the number of occurrences of word 2 is represented by the x_2 axis. The x_1 axis and the x_2 axis are orthogonal to each other. In the distribution diagram of FIG. 3, each reference technical document is represented by a white circle (“○”), and the single new classified technical document is represented by a black square (“■”). The black square representing the new classified technical document is also referred to as point P0.

The vocabulary distribution distance D for a combination of two documents is calculated as the distance between the two points (white circles or the black square) that correspond to those documents on the distribution diagram. The shorter the distance between the two points, the smaller the vocabulary distribution distance D. Specifically, the vocabulary distribution distance D is small when the difference between the number of occurrences of word 1 in one document and in the other is small and the difference between the number of occurrences of word 2 in one document and in the other is also small. In other words, if the vocabulary distributions of the two documents are similar, the vocabulary distribution distance D is small. If the vocabulary distribution distance D is small, the two corresponding documents are considered to be related to each other in technical field.

For example, the reference technical document represented by point P1 in FIG. 3 contains x_11 occurrences of word 1 (that is, x_1 = x_11) and x_21 occurrences of word 2 (that is, x_2 = x_21). In this case, the coordinates of point P1 can be written as P1(x_11, x_21). Similarly, the reference technical document represented by point P2 in FIG. 3 is written as P2(x_12, x_22), and the reference technical document represented by point P3 in FIG. 3 is written as P3(x_13, x_23).

If the distance between point P1 and point P2 in the distribution diagram of FIG. 3 (that is, the vocabulary distribution distance D) is written as D(1,2), the vocabulary distribution distance D(1,2) is calculated by equation (1) below. Similarly, the vocabulary distribution distance D(2,3) between point P2 and point P3 is calculated by equation (2) below.
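Assuming that D is the ordinary Euclidean distance between the plotted points (an assumption; the surrounding text states only that D is the distance between two points in an orthogonal coordinate system), equations (1) and (2) would take the following form for the coordinates P1(x_11, x_21), P2(x_12, x_22), and P3(x_13, x_23):

```latex
\begin{align}
D(1,2) &= \sqrt{(x_{11} - x_{12})^{2} + (x_{21} - x_{22})^{2}} \tag{1} \\
D(2,3) &= \sqrt{(x_{12} - x_{13})^{2} + (x_{22} - x_{23})^{2}} \tag{2}
\end{align}
```

Under this reading, D is zero when the two documents contain each word the same number of times, and it grows as the word counts diverge.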

Once the vocabulary distribution distance D has been calculated for every combination of two documents, groups are formed from documents (reference technical documents and the new classified technical document) whose vocabulary distribution distance D is equal to or less than a first threshold Dth1. For example, the vocabulary distribution distance D(1,2) and the vocabulary distribution distance D(2,3) are both equal to or less than the first threshold Dth1. Accordingly, the documents represented by points P1, P2, and P3 belong to the same group (referred to as “group 1” for convenience).

On the other hand, the vocabulary distribution distance D(3,4) for the combination of the two documents represented by points P3 and P4 is greater than the first threshold Dth1. Accordingly, the document represented by point P4 does not belong to group 1. The group containing the document represented by point P4 is referred to as “group 2” for convenience.

In this example, the documents are divided into five groups (that is, group 1 to group 5). In FIG. 3, the points (white circles or the black square) representing the documents in each of the five groups are enclosed by closed curves C1 to C5, respectively. Point P0, which represents the new classified technical document, belongs to group 1. It follows that each reference technical document in group 1 is related in technical field to the new classified technical document.

In this example, the first threshold Dth1 was set so that the number of generated groups is “5”. If, for example, the documents are grouped on the basis of a second threshold Dth2 that is smaller than the first threshold Dth1 (that is, Dth1 > Dth2), group 4 (the group consisting of the documents enclosed by the closed curve C4) is split into two groups enclosed by the broken line C4a and the broken line C4b, respectively.

In other words, the smaller the threshold on the vocabulary distribution distance D, the larger the number of document groups generated. Put differently, the smaller the threshold on the vocabulary distribution distance D, the more closely related in technical field are the documents that make up each document group.
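The grouping in the simplified example can be sketched as follows. Two assumptions are made here: the distance is taken to be Euclidean, and documents are chained into one group whenever a sequence of pairwise distances at or below the threshold connects them (this matches P1, P2, and P3 sharing a group because D(1,2) and D(2,3) are both within Dth1). The function names and the sample coordinates are hypothetical.

```python
import math
from itertools import combinations

def distance(p, q):
    """Euclidean distance between two word-count vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def group_documents(points, threshold):
    """Group document IDs whose points are linked by distances <= threshold.

    points: dict mapping a document ID to a tuple of word counts.
    Returns a list of sets of document IDs.
    """
    parent = {doc: doc for doc in points}

    def find(doc):
        # Follow parent links to the group representative, compressing the path.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for a, b in combinations(points, 2):
        if distance(points[a], points[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for doc in points:
        groups.setdefault(find(doc), set()).add(doc)
    return list(groups.values())
```

With sample points such as `{"P0": (5, 5), "P1": (6, 5), "P2": (7, 6), "P3": (8, 7), "P4": (20, 20)}`, a threshold of 2.0 yields two groups, while a threshold of 1.2 yields four, illustrating that a smaller threshold produces more groups.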

(Related document extraction process - actual processing)
In the example shown in FIG. 3, the distribution diagram was drawn in a two-dimensional orthogonal coordinate system (that is, the reference technical documents and the new classified technical document were assumed to contain no words other than word 1 and word 2), but actual documents contain many words. The actual vocabulary distribution distance D is therefore obtained as follows: the point representing each document is plotted according to its vocabulary distribution in a multidimensional orthogonal coordinate system having many mutually orthogonal axes, and the distance D is calculated on the basis of the distance between any two of those points.

If the reference technical documents and the new classified technical document are made up of some (in some cases, all) of M kinds of words, the vocabulary distribution distance D(Pa, Pb) between point Pa(x_1a, x_2a, ..., x_Ma) and point Pb(x_1b, x_2b, ..., x_Mb) in the M-dimensional orthogonal coordinate system is calculated by equation (3) below.
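Assuming again that D is the Euclidean distance (an assumption consistent with the two-dimensional example, not a form confirmed by the surrounding text), equation (3) would generalize the two-word case to M words as follows, where x_ia and x_ib denote the number of occurrences of word i in the documents represented by Pa and Pb:

```latex
\begin{equation}
D(P_a, P_b) = \sqrt{\sum_{i=1}^{M} \left( x_{ia} - x_{ib} \right)^{2}} \tag{3}
\end{equation}
```

Setting M = 2 with a = 1 and b = 2 reduces this expression to the two-word distance D(1,2) between points P1 and P2.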

Once the vocabulary distribution distance D has been calculated for every combination of two documents, the documents are grouped by gathering together the reference technical documents and the new classified technical document that lie within the range in which the vocabulary distribution distance D is equal to or less than a distance threshold Dth. In addition, the distance threshold Dth is adjusted so that the number of generated groups becomes the predetermined number K. Of the K sets into which the documents are divided, each reference technical document contained in the set to which the new classified technical document belongs (referred to as a “related document” for convenience) is found to be related in technical field to the new classified technical document.
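One way to adjust the threshold toward K groups is sketched below. It assumes a Euclidean distance, starts from a hypothetical initial threshold, and shrinks it by a fixed hypothetical step until at least K groups exist, relying on the earlier observation that a smaller threshold yields more groups. The parameter names `start_threshold` and `step`, the function names, and the stopping rule are illustrative choices, and the loop may overshoot K when one step splits several groups at once.

```python
import math
from itertools import combinations

def group_documents(points, threshold):
    """Merge documents linked by a pairwise distance <= threshold."""
    group_of = {doc: {doc} for doc in points}
    for a, b in combinations(points, 2):
        close = math.dist(points[a], points[b]) <= threshold
        if close and group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for doc in merged:
                group_of[doc] = merged
    unique = []
    for group in group_of.values():
        if not any(group is seen for seen in unique):
            unique.append(group)
    return unique

def related_documents(points, new_id, k, start_threshold, step):
    """Shrink the threshold until at least k groups form, then return the
    documents sharing a group with new_id and the threshold that was used."""
    threshold = start_threshold
    groups = group_documents(points, threshold)
    while len(groups) < k and threshold - step > 0:
        threshold -= step
        groups = group_documents(points, threshold)
    own_group = next(group for group in groups if new_id in group)
    return own_group - {new_id}, threshold
```

An empty result means that the new document ended up alone in its group, that is, no reference technical document was close enough to count as related.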

(Classified technical document notification process)
When the related document extraction process is complete, the CPU executes the classified technical document notification process. That is, the CPU sends an e-mail to each person in charge of the related documents extracted by the related document extraction process.

(Specific operation - reference technical document addition process)
Next, the specific operation of the CPU is described with reference to the flowcharts of FIGS. 4 to 6. As described above, when the operator of the document information providing apparatus 11 operates the operation interface 25 to register a new reference technical document, the CPU executes the reference technical document addition process. Specifically, the CPU starts processing at step 400 of FIG. 4 and proceeds to step 405.

In step 405, the CPU reads the new reference technical document (specifically, HTML files of the request of the patent application and of the specification and other documents attached to the request) stored in the USB memory connected to the USB port of the operation interface 25, and stores its content in the RAM 22.

Next, the CPU proceeds to step 410 and extracts the vocabulary distribution from the new reference technical document. The CPU then proceeds to step 415 and registers the content of the new reference technical document, its vocabulary distribution, and the like in the reference technical document DB 31. After that, the CPU proceeds to step 495 and ends this routine.

(Specific operation - related document extraction process)
Meanwhile, the CPU executes the related document extraction process shown in the flowchart of FIG. 5 each time a predetermined period elapses. Thus, at the appropriate timing, the CPU starts processing at step 500 of FIG. 5 and proceeds to step 505.

In step 505, the CPU determines whether a new classified technical document has been published on the document publication server 42. Specifically, the CPU checks whether the document publication server 42 holds any classified technical document published since this routine was last executed. If a new classified technical document has been published, the CPU determines “Yes” in step 505, proceeds to step 510, and acquires (downloads) the new classified technical document from the document publication server 42.

The CPU then proceeds to step 515 and extracts the vocabulary distribution from the acquired new classified technical document. After that, the CPU proceeds to step 520 and calculates the vocabulary distribution distance D on the basis of equation (3) above for every combination of two documents among the new classified technical document and the reference technical documents.

Next, the CPU proceeds to step 525 and determines whether any reference technical document exists within the range in which the vocabulary distribution distance D from the new classified technical document is shorter than a predetermined reference distance Dr. The reference distance Dr is set to a value such that, when the vocabulary distribution distance D is greater than the reference distance Dr, the two technical documents associated with that vocabulary distribution distance D are judged not to be related to each other in technical content.

If there is a reference technical document whose vocabulary distribution distance D to the new classified technical document is shorter than the reference distance Dr, the CPU determines “Yes” in step 525, proceeds to step 530, and sets the distance threshold Dth to the reference distance Dr. The CPU then proceeds to step 535 and groups the documents (the reference technical documents and the new classified technical document) by gathering together documents whose vocabulary distribution distance D is equal to or less than the distance threshold Dth.

Next, the CPU proceeds to step 540 and determines whether the number of generated document groups is equal to or less than the predetermined number K. If the number of document groups is greater than the predetermined number K, the CPU determines “No” in step 540, proceeds to step 545, and reduces the distance threshold Dth by a predetermined amount Dc. The CPU then proceeds to step 535.

The CPU repeats the processing of steps 535 to 545 until the number of generated document groups becomes equal to or less than the predetermined number K. When the number of generated document groups is equal to or less than the predetermined number K, the CPU determines “Yes” in step 540, proceeds to step 550, and determines that the reference technical documents contained in the same document group as the new classified technical document are documents whose technical field is related to that of the new classified technical document (that is, related documents). The CPU then proceeds to step 555 and registers the content of the new classified technical document, the document IDs of the related documents, and the like in the classified technical document DB 32.

Next, the CPU proceeds to step 560 and determines whether all new classified technical documents published on the document publication server 42 have been processed. If any unprocessed new classified technical document remains, the CPU determines “No” in step 560 and proceeds to step 510.

On the other hand, if the processing described above has been completed for all new classified technical documents, the CPU determines “Yes” in step 560, proceeds to step 595, and ends this routine.

In addition, if there is no reference technical document whose vocabulary distribution distance D to the new classified technical document is shorter than the reference distance Dr, the CPU determines “No” in step 525, proceeds to step 565, and determines that no related document exists. The CPU then proceeds to step 555 and registers the content of the new classified technical document, the fact that no related document exists, and the like in the classified technical document DB 32.

If no new classified technical document has been published on the document publication server 42, the CPU determines “No” in step 505 and proceeds directly to step 595.

(Specific operation - classified technical document notification process)
When the related document extraction process described above ends (that is, after the CPU has proceeded to step 595 of FIG. 5), the CPU executes the classified technical document notification process. Specifically, the CPU starts processing at step 600 of FIG. 6 and proceeds to step 605. In step 605, the CPU determines whether there is any classified technical document that has been registered in the classified technical document DB 32 but has not yet been processed by this routine (an unprocessed document).

If an unprocessed document exists, the CPU determines “Yes” in step 605, proceeds to step 610, and determines whether any related document is registered for that unprocessed document. If a related document is registered, the CPU determines “Yes” in step 610, proceeds to step 615, and retrieves the persons in charge of each related document from the reference technical document DB 31.

Next, the CPU proceeds to step 620 and requests the e-mail server 43 to send an e-mail to the persons in charge. The CPU then proceeds to step 625 and determines whether processing has been completed for all unprocessed documents. If processing has been completed for all unprocessed documents, the CPU determines “Yes” in step 625, proceeds to step 695, and ends this routine.

On the other hand, if any unprocessed document remains, the CPU determines “No” in step 625 and proceeds to step 610. If no related document is registered for the unprocessed document, the CPU determines “No” in step 610 and proceeds directly to step 625. In addition, if no unprocessed document exists when execution of this routine starts, the CPU determines “No” in step 605 and proceeds directly to step 695.
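The e-mail request of step 620 could look like the following sketch, which uses Python's standard `email.message` and `smtplib` modules. The subject wording, the function names, the sender address, and the SMTP host and port are hypothetical; the text specifies only that an SMTP request asks the e-mail server 43 to deliver a message to each person in charge.

```python
import smtplib
from email.message import EmailMessage

def build_notification(sender, recipients, doc_id, summary):
    """Compose a notice that a related publication has appeared."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"New related publication: {doc_id}"  # hypothetical wording
    message.set_content(summary)
    return message

def send_notification(message, host, port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)
```

Separating message construction from delivery lets the composed notice be inspected or logged before any SMTP connection is opened.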

As described above, the first apparatus (the document information providing apparatus 11) includes:
a document acquisition unit (the network interface 24 and the like) that acquires a classified technical document;
a document storage unit (the HDD 23) that stores information on a plurality of reference technical documents, each of which has a person in charge assigned to it;
a related document extraction unit (the CPU 21 and the like) that, when the classified technical document is acquired by the document acquisition unit, executes a related document extraction process of extracting, as related documents, one or more of the reference technical documents composed of a vocabulary similar to the vocabulary composing the classified technical document; and
an information providing unit (the network interface 24 and the like) that provides information on the classified technical document to the person in charge of each of the related documents extracted by the related document extraction unit.

In addition, the related document extraction unit is configured to execute the related document extraction process by:
calculating, for each of the classified technical document and the plurality of reference technical documents, a first vocabulary distribution determined based on the frequency with which each word in a technical document appears in that document (FIG. 2, step 410 in FIG. 4, and step 515 in FIG. 5);
calculating, for every pair of technical documents among the classified technical document and the plurality of reference technical documents, a first vocabulary distribution distance (vocabulary distribution distance D) that becomes smaller the more similar the first vocabulary distributions of the two documents are (step 520 in FIG. 5);
generating a predetermined number (K) of neighboring document groups by aggregating the technical documents included in pairs whose first vocabulary distribution distance is smaller than a predetermined value (distance threshold Dth) (steps 535 to 545 in FIG. 5); and
extracting, as the related documents, the reference technical documents included in the neighboring document group that contains the classified technical document (step 550 in FIG. 5).
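The extraction steps listed above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent: the distance function is a stand-in (total variation distance) because equation (3) is not reproduced in this section, the grouping uses a simple union-find merge, and the adjustment that yields exactly K groups is omitted. All function and document names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def vocabulary_distribution(words):
    """Relative frequency of each word within one document."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def distribution_distance(p, q):
    """Stand-in for distance D (assumption): total variation distance.

    0.0 means identical distributions, 1.0 means no shared words.
    """
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)


def neighbor_groups(docs, d_th):
    """Merge documents linked by a pairwise distance below d_th."""
    dists = {name: vocabulary_distribution(ws) for name, ws in docs.items()}
    parent = {name: name for name in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if distribution_distance(dists[a], dists[b]) < d_th:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def related_documents(docs, classified, d_th):
    """Reference documents that share a group with the classified document."""
    for group in neighbor_groups(docs, d_th):
        if classified in group:
            return group - {classified}
    return set()


docs = {
    "ref_engine": "engine piston engine valve".split(),
    "ref_battery": "battery cell battery anode".split(),
    "new": "engine valve piston piston".split(),
}
related = related_documents(docs, "new", d_th=0.5)
```

With these toy documents, only `ref_engine` falls within the threshold of `new`, so it is the only related document returned.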

According to the first device, when a new classified technical document is published on the document publication server 42, the person in charge of each reference technical document whose technical field is related to that of the classified technical document automatically receives information on the classified technical document. In this case, neither the persons in charge nor the operator need to register extraction conditions (for example, search words used for word searches, or classification information of published patent publications) in the first device in advance. The first device can therefore deliver, with simple operation, information on new classified technical documents related to the technical field that each person in charge is responsible for.

In addition, because the first device accurately extracts related documents based on the vocabulary distributions of the reference technical documents and the classified technical document, it avoids distributing information on classified technical documents unrelated to the technical field that a person in charge is responsible for.

<Second Embodiment>
Next, a document information distribution apparatus 12 according to a second embodiment of the present invention (hereinafter also referred to as the "second device") will be described. When executing the related document extraction process, the first device divides the set consisting of the reference technical documents and the new classified technical document into a predetermined number K of groups, and determines that each reference technical document belonging to the same group as the new classified technical document is a related document. In contrast, the second device divides the reference technical documents into K groups beforehand, when the reference technical document addition process is executed, and extracts related documents during the related document extraction process by determining which group of reference technical documents the new classified technical document belongs to. The following description focuses on this difference.

(Reference technical document addition process)
When executing the reference technical document addition process, the CPU 21 of the document information distribution apparatus 12 (hereinafter also simply referred to as the "CPU") divides the set of reference technical documents, including the new reference technical document, into K "sets of reference technical documents whose technical fields are related to each other" (hereinafter also referred to as "reference technical document groups") by grouping the documents based on the vocabulary distribution distance D described above.

Furthermore, for each reference technical document group, the CPU calculates, for each word, a word content rate θ, which is the ratio of "the number of reference technical documents in the group that contain the word" to "the number of reference technical documents included in the group". That is, the word content rate θ is a value determined for each reference technical document group and for each word. The word content rate θ is referred to in the related document extraction process described later.

The CPU does not calculate the word content rate θ for words that appear in none of the reference technical documents in a reference technical document group. Accordingly, the word content rate θ is greater than 0 and less than or equal to 1 (that is, 0 < θ ≤ 1).

For example, if a reference technical document group consists of 10 reference technical documents and 4 of them contain word 1, the word content rate θ of word 1 in that group is 4/10 = 0.4.
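As a rough illustration of the word content rate, the following Python sketch computes θ for every word in one group. The function name and the representation of each document as a set of words are assumptions made for this example, not part of the patent.

```python
def word_content_rates(group_vocab_sets):
    """Return {word: theta} for one reference technical document group.

    group_vocab_sets holds one vocabulary set per reference document.
    Words absent from every document get no entry, so 0 < theta <= 1.
    """
    n_docs = len(group_vocab_sets)
    counts = {}
    for vocab in group_vocab_sets:
        for word in vocab:
            counts[word] = counts.get(word, 0) + 1
    return {word: c / n_docs for word, c in counts.items()}


# Ten documents, four of which contain "word1", as in the example above.
group = [{"word1", "common"}] * 4 + [{"common"}] * 6
theta = word_content_rates(group)
```

Here `theta["word1"]` evaluates to 4/10, and a word present in every document, such as `"common"`, gets the maximum rate of 1.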

FIG. 7 shows an example of reference technical document groups and word content rates θ, using a reference technical document group a and a reference technical document group b. Group a contains Na reference technical documents, from document a1 to document aNa, and group b contains Nb reference technical documents, from document b1 to document bNb.

Each reference technical document in group a contains some (in some cases, all) of the Ma words from word a1 to word aMa. Each reference technical document in group b contains some (in some cases, all) of the Mb words from word b1 to word bMb.

In FIG. 7, whether each word is contained in each reference technical document is indicated by "1" or "0". For example, document a1 in group a contains word a1, so "1" is entered in the corresponding cell of the table in FIG. 7. Document a1 does not contain word a4, so "0" is entered in the corresponding cell. The set of words contained in a document, represented by the "1" and "0" entries in the table of FIG. 7, is also referred to as a "vocabulary set".

In addition, the table in FIG. 7 shows, for each word, the word content rate θa of group a and the word content rate θb of group b. For example, the ratio of the number of reference technical documents containing word a1 to the number of reference technical documents in group a (Na), that is, the word content rate θa of word a1, is "0.15".

(Related document extraction process - explanation using a simplified example)
When executing the related document extraction process, the CPU calculates a document relevance P for each reference technical document group based on "the word content rates θ of the reference technical document group" and "the vocabulary set of the new classified technical document". The document relevance P represents the degree to which the technical field of the new classified technical document is related to the technical field of the reference technical documents in a given reference technical document group, and it increases as the two are more closely related. The CPU therefore determines that the new classified technical document belongs to the reference technical document group with the largest document relevance P.

The method of calculating the document relevance P is described with reference to a simplified example. Assuming that the reference technical documents constituting a certain reference technical document group contain four words in total, FIGS. 8(A) and 8(B) each show an example of the word content rates θ of that group and the vocabulary set of a new classified technical document.

In FIGS. 8(A) and 8(B), whether the new classified technical document contains each of the four words (that is, its vocabulary set) is indicated by "1" or "0". Broadly, in the example of FIG. 8(A), the new classified technical document contains the words with high word content rates θ and does not contain the words with low word content rates θ. In the example of FIG. 8(B), by contrast, the new classified technical document contains the words with low word content rates θ and does not contain the words with high word content rates θ.

The document relevance P is calculated by obtaining, for each of the four words, a coefficient R determined by the word content rate θ of the word and by whether the word is contained in the new classified technical document, and then multiplying the four coefficients R together. If a word is contained in the new classified technical document, its coefficient R equals the word content rate θ of the word; if not, R equals 1 minus the word content rate θ (that is, (1 − θ)).

For example, in FIG. 8(A), the word content rate θ of word 1 is "0.7" and the new classified technical document contains word 1, so the coefficient R(1) of word 1 is "0.7". On the other hand, the word content rate θ of word 3 is "0.3" and the new classified technical document does not contain word 3, so the coefficient R(3) of word 3 is 1 − 0.3 = 0.7. The document relevance P is calculated by multiplying the four coefficients together (that is, P = R(1) × R(2) × R(3) × R(4)).

In the example of FIG. 8(A), the document relevance P is "0.3136". In the example of FIG. 8(B), the document relevance P is "0.0036". As these examples show, the document relevance P is higher the more words with high word content rates θ the new classified technical document contains, and the fewer words with low word content rates θ it contains.
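The four-word example can be checked with a short Python sketch. The text gives θ only for word 1 (0.7) and word 3 (0.3); the values 0.8 for word 2 and 0.2 for word 4 are assumptions chosen here because they reproduce both stated results. The function name is hypothetical.

```python
import math


def document_relevance(theta, included):
    """Multiply R(i) = theta(i) if word i is included, else 1 - theta(i)."""
    return math.prod(t if e else 1.0 - t for t, e in zip(theta, included))


# theta for words 1..4; the entries for words 2 and 4 are assumed values.
theta = [0.7, 0.8, 0.3, 0.2]

p_a = document_relevance(theta, [1, 1, 0, 0])  # FIG. 8(A): high-theta words present
p_b = document_relevance(theta, [0, 0, 1, 1])  # FIG. 8(B): low-theta words present
```

Under these assumed rates, `p_a` is approximately 0.3136 and `p_b` approximately 0.0036, matching the two values quoted above.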

(Related document extraction process - actual processing)
In the examples of FIGS. 8(A) and 8(B), the reference technical document group contained only four words, but an actual reference technical document group contains a large number of words. If the reference technical document group contains M words, the document relevance P is calculated by equation (4) below.

$$P = \prod_{i=1}^{M} \theta(i)^{E(i)} \, \bigl(1 - \theta(i)\bigr)^{1 - E(i)} \qquad (4)$$

Here, θ(i) is the word content rate θ of the i-th word contained in the reference technical document group, and E(i) indicates whether the i-th word is contained in the new classified technical document: E(i) is "1" if the i-th word is contained in the new classified technical document and "0" if it is not.

The CPU calculates the document relevance P for each reference technical document group. The table in FIG. 9 shows an example of the document relevance P for each group. As FIG. 9 shows, among the K reference technical document groups, group b has the largest document relevance P. The new classified technical document is therefore more closely related in technical field to the reference technical documents in group b than to those in the other reference technical document groups (that is, the reference technical documents in group b are the related documents).
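Selecting the group with the largest P could look like the following Python sketch. Working with log P instead of P is a design choice of this example, not something the patent states: with many words, a product of values below 1 can underflow to zero in floating point, while the ranking of groups is unchanged under the logarithm. Group names, words, and rates are hypothetical.

```python
import math


def log_relevance(theta_by_word, doc_vocab):
    """log P for one group; theta_by_word maps word -> theta (0 < theta <= 1)."""
    total = 0.0
    for word, theta in theta_by_word.items():
        r = theta if word in doc_vocab else 1.0 - theta
        if r == 0.0:
            return float("-inf")  # theta == 1 but the word is missing: P is 0
        total += math.log(r)
    return total


def best_group(groups, doc_vocab):
    """Name of the group whose relevance to doc_vocab is largest."""
    return max(groups, key=lambda g: log_relevance(groups[g], doc_vocab))


groups = {
    "a": {"engine": 0.9, "valve": 0.8, "battery": 0.1},
    "b": {"battery": 0.9, "anode": 0.7, "engine": 0.2},
}
chosen = best_group(groups, {"battery", "anode"})
```

For this toy input, group `a` scores 0.1 × 0.2 × 0.1 and group `b` scores 0.9 × 0.7 × 0.8, so `b` is chosen.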

(Specific operation - reference technical document addition process)
Next, the specific operation of the CPU when executing the reference technical document addition process and the related document extraction process will be described with reference to the flowcharts of FIGS. 10 and 11. Steps in the flowchart of FIG. 10 that perform the same processing as steps in the flowchart of FIG. 4 carry the same step numbers as in FIG. 4. Likewise, steps in the flowcharts of FIGS. 10 and 11 that perform the same processing as steps in the flowchart of FIG. 5 carry the same step numbers as in FIG. 5. The classified technical document notification process of the second device is the same as that of the first device, so its description is omitted.

When executing the reference technical document addition process, the CPU starts processing at step 1000 in FIG. 10 and proceeds to step 1015 via steps 405 and 410. In step 1015, the CPU calculates the vocabulary distribution distance D, based on equation (3) above, for every pair of documents among the reference technical documents including the new reference technical document.

Next, the CPU proceeds to step 1035 via step 530, groups the reference technical documents, including the new reference technical document, by aggregating documents whose vocabulary distribution distance D is less than or equal to the distance threshold Dth, and then proceeds to step 540.

When the CPU determines "Yes" in step 540 (that is, when the reference technical document group to which the new reference technical document belongs has been determined), it proceeds to step 1045 and extracts the vocabulary set of the new reference technical document. The CPU then proceeds to step 1050 and calculates the word content rate θ for each word contained in each reference technical document group.

The CPU then proceeds to step 1055 and registers the content of the new reference technical document, the reference technical document group to which it belongs, the word content rates θ, and the like in the reference technical document DB 31. After that, the CPU proceeds to step 1095 and ends this routine. When the CPU determines "No" in step 540, it returns to step 1035 via step 545.

(Specific operation - related document extraction process)
When executing the related document extraction process, the CPU starts processing at step 1100 in FIG. 11 and proceeds to step 505. When the CPU determines "Yes" in step 505, it proceeds to step 1115 via step 510 and extracts the vocabulary set of the new classified technical document. The CPU then proceeds to step 1120 and calculates, based on equation (4) above, the document relevance P of the new classified technical document with respect to each reference technical document group.

The CPU then proceeds to step 1125 and determines whether the maximum of the calculated document relevances P is greater than or equal to a predetermined relevance threshold Pth. The relevance threshold Pth is set so that, when the document relevance P for a reference technical document group is smaller than Pth, the new classified technical document can be judged to be unrelated in technical field to the reference technical documents in that group.

If the maximum of the calculated document relevances P is greater than or equal to the relevance threshold Pth, the CPU determines "Yes" in step 1125, proceeds to step 1130, and extracts the reference technical document group with the largest document relevance P.

The CPU then proceeds to step 560 via steps 550 and 555. When the CPU determines "Yes" in step 560, it proceeds to step 1195 and ends this routine.

On the other hand, if the maximum of the calculated document relevances P is smaller than the relevance threshold Pth, the CPU determines "No" in step 1125 and proceeds to step 560 via steps 565 and 555. When the CPU determines "No" in step 505, it proceeds directly to step 1195. When the CPU determines "No" in step 560, it proceeds to step 510.
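The threshold decision of step 1125 can be sketched in Python as follows. Returning `None` for the "No" branch is a placeholder assumption, since the content of step 565 is not described in this section; the function name and the sample values are likewise hypothetical.

```python
def extract_related_group(relevance_by_group, p_th):
    """Return the group with the largest P, or None if even that P is below p_th."""
    if not relevance_by_group:
        return None
    best = max(relevance_by_group, key=relevance_by_group.get)
    if relevance_by_group[best] >= p_th:
        return best  # "Yes" in step 1125: this group's documents are related
    return None      # "No" in step 1125: placeholder for the step 565 branch


relevance = {"a": 0.002, "b": 0.504}
hit = extract_related_group(relevance, p_th=0.01)
miss = extract_related_group(relevance, p_th=0.9)
```

With the sample values, `hit` is group `b`, while `miss` is `None` because no group reaches the stricter threshold.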

As described above, the second device (document information distribution apparatus 12) includes
a reference technical document classification unit (CPU 21 or the like) that classifies each of the plurality of reference technical documents into one of a plurality of reference technical document groups, each being a set of technical documents whose technical fields are related to each other, and stores the documents in the document storage unit, wherein
the reference technical document classification unit is configured to execute, when a reference technical document is added to the document storage unit, a reference technical document addition process (FIG. 10) of determining that the added reference technical document belongs to the reference technical document group, among the reference technical document groups, whose reference technical documents are most closely related in technical field to the added reference technical document, and
the related document extraction unit is configured to execute the related document extraction process (FIG. 11) by extracting, as the related documents, the reference technical documents included in the reference technical document group, among the reference technical document groups, whose reference technical documents are most closely related in technical field to the classified technical document.

In addition, in the second device, the reference technical document classification unit is configured to execute the reference technical document addition process by:
calculating, for each of the added reference technical document and the other reference technical documents, a second vocabulary distribution determined based on the frequency with which each word in a technical document appears in that document;
calculating, for every pair of technical documents among the added reference technical document and the other reference technical documents, a second vocabulary distribution distance (vocabulary distribution distance D) that becomes smaller the more similar the second vocabulary distributions of the two documents are (step 1015 in FIG. 10);
generating a predetermined number of the reference technical document groups by aggregating the technical documents included in pairs whose second vocabulary distribution distance is smaller than a predetermined value (steps 1035, 540, 545, and 1045 in FIG. 10); and
determining that the added reference technical document belongs to the generated reference technical document group that contains it (step 1055 in FIG. 10).

Furthermore, in the second device, the related document extraction unit is configured to execute the related document extraction process by:
calculating, for each reference technical document group and for each word, a first word content rate (word content rate θ), which is the ratio of the number of reference technical documents in the group that contain the word to the number of reference technical documents included in the group (step 1055 in FIG. 10);
calculating, for each of the reference technical document groups, a first document relevance (document relevance P) that becomes larger the more words with a higher first word content rate the classified technical document contains and the fewer words with a lower first word content rate it contains (step 1120 in FIG. 11); and
extracting, as the related documents, the reference technical documents included in the reference technical document group with the largest first document relevance (step 1130 in FIG. 11).

According to the second device, the combination of reference technical documents in each reference technical document group (that is, the composition of the reference technical document groups) does not change even when the related document extraction process is executed multiple times. This prevents the composition of the groups from changing with the content of the classified technical document, and hence prevents the combination of persons in charge who receive information on classified technical documents from changing. Specifically, it avoids a situation in which information on one classified technical document is distributed to person A and person B while information on another classified technical document is distributed to person A and person C.

In addition, because the composition of the reference technical document groups is determined based on the vocabulary distribution distance D, the second device can generate reference technical document groups that accurately aggregate reference technical documents whose technical fields are related to each other. Furthermore, because the reference technical document groups are regenerated each time the reference technical document addition process is executed, each group continues to consist of reference technical documents with mutually related technical fields even after many reference technical documents have been added.

In addition, when executing the related document extraction process, the second device extracts related documents based on the document relevance P. In other words, the second device determines the related documents based not only on the words that the classified technical document and the reference technical documents contain but also on the words they do not contain. The second device can therefore accurately extract the reference technical document group containing the reference technical documents whose technical field is related to that of the classified technical document.

<Third Embodiment>
Next, a document information distribution apparatus 13 according to a third embodiment of the present invention (hereinafter also referred to as the "third device") will be described. When executing the reference technical document addition process, the second device divides the reference technical documents into a predetermined number of reference technical document groups based on the vocabulary distribution distance D. That is, the second device regroups the reference technical documents every time a new reference technical document is registered. In contrast, when a new reference technical document is added, the third device determines, based on the document relevance P, which of the previously classified reference technical document groups the new reference technical document belongs to (that is, which group it is related to in technical field). The following description focuses on this difference.

(Reference technical document registration process)
The operator of the document information distribution apparatus 13 classifies the reference technical documents in advance into a predetermined number K of groups (reference technical document groups) according to technical field, and then registers them in the document information distribution apparatus 13. At this time, the CPU 21 of the document information distribution apparatus 13 (hereinafter also simply referred to as the "CPU") executes the reference technical document registration process and registers each reference technical document in the reference technical document DB 31 by reference technical document group.

(Reference technical document addition process)
Furthermore, when the operator adds a reference technical document, the CPU executes the reference technical document addition process. In this process, the CPU calculates, for each reference technical document group, the document relevance P representing the degree of relevance to the new reference technical document, and determines that the new reference technical document belongs to the reference technical document group with the largest document relevance P.

(Specific operation - reference technical document registration process)
Next, the specific operation of the CPU when executing the reference technical document registration process and the reference technical document addition process will be described with reference to the flowcharts of FIGS. 12 and 13. Steps in the flowchart of FIG. 13 that perform the same processing as steps in the flowcharts of FIGS. 4, 10, and 11 carry the same step numbers as in those figures.

For the related document extraction process, the operation of the third device is the same as that of the second device, and for the classified technical document notification process, the operation of the third device is the same as that of the first device. The description of these processes is therefore omitted.

ＣＰＵは、参照技術文書登録処理の実行時、図１２のステップ１２００から処理を開始し、ステップ１２０５に進み、ＵＳＢメモリに保存された予めグルーピングされた（即ち、参照技術文書グループに分類された）複数の参照技術文書を読み込み、その内容をＲＡＭ２２に保存する。次いで、ＣＰＵは、ステップ１２１０に進み、読み込んだ参照技術文書のそれぞれの語彙集合を抽出する。 When executing the reference technical document registration process, the CPU starts the process from step 1200 in FIG. 12, proceeds to step 1205, and is pre-grouped (that is, classified into the reference technical document group) stored in the USB memory. A plurality of reference technical documents are read and the contents are stored in the RAM 22. Next, the CPU proceeds to step 1210 and extracts each vocabulary set of the read reference technical document.

更に、ＣＰＵは、ステップ１２１５に進み、参照技術文書グループのそれぞれに含まれる単語のそれぞれについて単語含有率θを算出する。その後、ＣＰＵは、ステップ１２２０に進み、参照技術文書のそれぞれの内容及び参照技術文書グループのそれぞれの単語含有率θ等を参照技術文書ＤＢ３１に登録する。この際、ＣＰＵは、参照技術文書ＤＢ３１に既に登録されていた参照技術文書に関する情報を参照技術文書ＤＢ３１から削除する。次いで、ＣＰＵは、ステップ１２９５に進んで本ルーチンを終了する。 Further, the CPU proceeds to step 1215 to calculate the word content rate θ for each word included in each reference technical document group. Thereafter, the CPU proceeds to step 1220 to register each content of the reference technical document and each word content rate θ of the reference technical document group in the reference technical document DB 31. At this time, the CPU deletes information related to the reference technical document already registered in the reference technical document DB 31 from the reference technical document DB 31. Next, the CPU proceeds to step 1295 to end the present routine.

(Specific operation - reference technical document addition process)
When executing the reference technical document addition process, the CPU starts the process from step 1300 in FIG. 13 and proceeds to step 1350 through the processes of step 405, step 1120, step 1130, and step 1045.

In step 1350, the CPU calculates the word content rate θ of each word included in the set of reference technical documents obtained by adding the new reference technical document to the reference technical document group extracted in step 1130 (that is, the group to which the new reference technical document belongs); this set is the updated reference technical document group. The CPU then proceeds to step 1355 and registers information on the new reference technical document, information on the updated reference technical document group, and so on in the reference technical document DB 31. The CPU then proceeds to step 1395 and ends this routine.

As described above, in the third device (document information distribution apparatus 13),
the reference technical document classification unit is configured to execute the reference technical document addition process by:
calculating, for each reference technical document group and for each word, a second word content rate (word content rate θ), which is the ratio of the number of reference technical documents in that group that contain the word to the number of reference technical documents included in that group (step 1215 in FIG. 12);
calculating, for each of the reference technical document groups, a second document relevance level (document relevance level P) that takes a larger value as the added reference technical document contains more words with a higher second word content rate and contains fewer words with a lower second word content rate (step 1120 in FIG. 13); and
determining that the added reference technical document belongs to the reference technical document group for which the second document relevance level is largest among the reference technical document groups (step 1130 in FIG. 13).
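The calculation of the word content rate θ and the choice of the group with the largest document relevance level P can be sketched as follows. This is a minimal illustration only: this passage does not give the exact formula for P, so the sketch assumes a Bernoulli-style log score (sum of log θ over words the document contains and log(1 - θ) over words it does not contain), which merely satisfies the stated property that P grows with more high-θ words and fewer low-θ words. The function names, the clipping constant `eps`, and the sample vocabularies are all hypothetical.

```python
import math

def word_content_rates(group_docs):
    """theta[w] = (documents in the group containing w) / (documents in the group).

    Each document is represented by its vocabulary set (a set of words).
    """
    n = len(group_docs)
    counts = {}
    for vocab in group_docs:
        for w in vocab:
            counts[w] = counts.get(w, 0) + 1
    return {w: c / n for w, c in counts.items()}

def relevance(doc_vocab, theta, all_words, eps=0.01):
    """Assumed stand-in for the document relevance level P (not the patent's formula).

    Contained words contribute log(theta); absent words contribute log(1 - theta).
    theta is clipped to [eps, 1 - eps] so the logarithm is always defined.
    """
    score = 0.0
    for w in all_words:
        t = min(max(theta.get(w, 0.0), eps), 1.0 - eps)
        score += math.log(t) if w in doc_vocab else math.log(1.0 - t)
    return score

def assign_group(new_vocab, groups):
    """Return the group with the largest score, plus all scores."""
    all_words = set().union(*(v for docs in groups.values() for v in docs))
    thetas = {g: word_content_rates(docs) for g, docs in groups.items()}
    scores = {g: relevance(new_vocab, thetas[g], all_words) for g in groups}
    return max(scores, key=scores.get), scores

# Hypothetical reference technical document groups (vocabulary sets only).
groups = {
    "a": [{"motor", "rotor", "stator"}, {"motor", "coil", "stator"}],
    "b": [{"battery", "cell", "anode"}, {"battery", "cathode", "cell"}],
}
best, scores = assign_group({"motor", "stator", "coil"}, groups)
print(best, scores)
```

Because absent words also contribute to the score, a document that lacks the words typical of a group is penalized for that group, which mirrors the point that grouping depends on words not contained as well as words contained.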

According to the third device, the combination of reference technical documents included in each reference technical document group (that is, the composition of the reference technical document group) is determined based on the document relevance level P. In other words, the composition of a reference technical document group is determined based not only on the words each reference technical document contains but also on the words it does not contain. Therefore, according to the third device, even when many reference technical documents are added, each reference technical document group continues to consist of reference technical documents whose technical fields are related to one another.

<Modification of Third Embodiment>
Next, a document information distribution apparatus according to a modification of the third embodiment of the present invention (hereinafter also referred to as the "modified apparatus") will be described. The third device registers a plurality of pre-classified reference technical documents in the reference technical document DB 31 by the reference technical document registration process and, when adding a new reference technical document to the reference technical document DB 31 by the reference technical document addition process, determines which reference technical document group the new reference technical document belongs to. In contrast, the document information distribution apparatus according to this modification does not execute the reference technical document addition process.

For this reason, when registering a new reference technical document (that is, when adding a reference technical document to the reference technical document DB 31), the operator of the modified apparatus executes the reference technical document registration process again. In other words, when registering a new reference technical document in the modified apparatus, the operator needs to decide beforehand which reference technical document group the new reference technical document belongs to.

<Fourth embodiment>
Next, a document information distribution apparatus 14 according to a fourth embodiment of the present invention (hereinafter also referred to as the "fourth device") will be described. When adding a new reference technical document to the set of pre-classified reference technical document groups, the third device determines which reference technical document group the new reference technical document belongs to based on the document relevance level P. In contrast, when adding a new reference technical document to the set of pre-classified reference technical document groups, the fourth device determines which reference technical document group the new reference technical document belongs to based on a person-in-charge relevance level Q, which depends on the persons in charge of each reference technical document. The following description therefore focuses on this difference.

(Reference technical document addition process)
When executing the reference technical document addition process, the CPU 21 of the document information distribution apparatus 14 (hereinafter also simply referred to as "CPU") calculates the person-in-charge relevance level Q for each of the reference technical documents. The person-in-charge relevance level Q is a value representing the degree of similarity between the set of persons in charge of the new reference technical document and the set of persons in charge of each reference technical document already registered in the reference technical document DB 31 (each "other reference technical document").

Specifically, the value of the person-in-charge relevance level Q is equal to the number of persons in charge included in the intersection (that is, the logical AND) of the set of persons in charge of the new reference technical document and the set of persons in charge of the other reference technical document. Accordingly, the value of Q becomes larger as these two sets are more similar to each other and as the number of persons in charge included in them is larger. The larger the person-in-charge relevance level Q between two reference technical documents, the more closely the technical fields of those two documents are considered to be related.
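Since Q is defined as the size of a set intersection, it can be illustrated in a few lines. The sketch below is only an illustration; the function name and the person identifiers are hypothetical.

```python
def person_in_charge_relevance(new_staff, other_staff):
    """Q = number of persons in charge common to both documents (set intersection size)."""
    return len(set(new_staff) & set(other_staff))

# Hypothetical person-in-charge sets for the new document and one registered document.
q = person_in_charge_relevance(
    {"person1", "person2", "person3"},
    {"person2", "person3", "person4"},
)
print(q)  # 2: person2 and person3 are in charge of both documents
```

Note that Q is a raw count rather than a normalized similarity, so two documents that share many persons in charge score higher than two documents with identical but small person-in-charge sets.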

When executing the reference technical document addition process, the CPU calculates the person-in-charge relevance level Q for each combination of the new reference technical document and one of the other reference technical documents. FIG. 14 shows an example of the calculated person-in-charge relevance levels Q. In FIG. 14, the persons in charge of each of the N reference technical documents already registered in the reference technical document DB 31 (reference technical document 1 through reference technical document N) and of the new reference technical document are indicated by the characters "1" and "0".

For example, because the persons in charge of reference technical document 1 include person in charge 1, "1" is entered in the corresponding cell of FIG. 14. On the other hand, because the persons in charge of reference technical document 1 do not include person in charge 4, "0" is entered in the corresponding cell of FIG. 14. The set of individual persons in charge of a given reference technical document is also referred to as its "person-in-charge set".

In addition, the table in FIG. 14 shows the reference technical document group to which each reference technical document belongs (one of the K groups: group a, group b, ..., group K) and the value of the person-in-charge relevance level Q corresponding to each reference technical document.

FIG. 15 shows an example of the table obtained by sorting the reference technical documents in descending order of the person-in-charge relevance level Q. The CPU extracts a predetermined number L (six in this example) of reference technical documents in descending order of the person-in-charge relevance level Q (the "person-in-charge similar documents"), and determines that the reference technical document group containing the largest number of person-in-charge similar documents is the group to which the new reference technical document belongs.

In this example, of the six person-in-charge similar documents, three belong to group a, two belong to group b, and one belongs to group c. Because group a contains more of these documents than any other group, the CPU determines that the new reference technical document belongs to group a.
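The procedure described above (score every registered document by Q, keep the top L, and pick the group that appears most often among them) can be sketched as follows. This is a hedged illustration, not the apparatus's actual implementation: the function name, the tuple layout, and the sample data are hypothetical, and the handling of ties (earlier registered documents and groups win) is an assumption because the text does not specify it.

```python
from collections import Counter

def classify_by_staff(new_staff, registered, top_l=6):
    """Pick a group for a new document by majority vote among the top-L documents by Q.

    registered: list of (doc_id, group, staff_set) tuples for documents already in the DB.
    """
    # Q for each registered document: size of the intersection of person-in-charge sets.
    scored = [(len(new_staff & staff), doc_id, group) for doc_id, group, staff in registered]
    # Stable sort in descending order of Q; ties keep registration order (assumption).
    scored.sort(key=lambda item: item[0], reverse=True)
    similar = scored[:top_l]  # the "person-in-charge similar documents"
    votes = Counter(group for _, _, group in similar)
    return votes.most_common(1)[0][0], votes

# Hypothetical data arranged so that the top six documents split 3 / 2 / 1 across groups.
new_staff = {1, 2, 3}
registered = [
    ("doc1", "a", {1, 2, 3}),
    ("doc2", "a", {1, 2}),
    ("doc3", "b", {2, 3}),
    ("doc4", "a", {1, 4}),
    ("doc5", "b", {3, 5}),
    ("doc6", "c", {2, 6}),
    ("doc7", "c", {7}),
    ("doc8", "b", {8, 9}),
]
group, votes = classify_by_staff(new_staff, registered)
print(group, dict(votes))
```

With this data the six highest-scoring documents are doc1 through doc6, so group a receives three votes, group b two, and group c one, matching the 3 / 2 / 1 split in the example above.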

(Specific operation - reference technical document registration process)
Next, the specific operation of the CPU when executing the reference technical document registration process and the reference technical document addition process will be described with reference to the flowcharts of FIGS. 16 and 17. Steps in the flowcharts of FIGS. 16 and 17 that perform the same processing as steps shown in the flowcharts of FIGS. 4, 10, 12, and 13 are given the same step numbers as in FIGS. 4, 10, 12, and 13, respectively.

Note that the fourth device operates identically to the second device in the related document extraction process and identically to the first device in the classified technical document notification process, so descriptions of these processes are omitted.

When executing the reference technical document registration process, the CPU starts the process from step 1600 in FIG. 16 and proceeds to step 1620 through the processes of steps 1205 to 1215. In step 1620, the CPU extracts the person-in-charge set of each reference technical document.

The CPU then proceeds to step 1625 and registers the contents of each reference technical document, the word content rates θ of each reference technical document group, the person-in-charge set of each reference technical document, and so on in the reference technical document DB 31. At this time, the information on reference technical documents that had already been registered in the reference technical document DB 31 is deleted from the reference technical document DB 31. The CPU then proceeds to step 1695 and ends this routine.

(Specific operation - reference technical document addition process)
When executing the reference technical document addition process, the CPU starts the process from step 1700 in FIG. 17, proceeds to step 1710 through the process of step 405, and extracts the person-in-charge set from the new reference technical document it has read.

The CPU then proceeds to step 1715 and calculates the person-in-charge relevance level Q between the new reference technical document and each of the reference technical documents already registered in the reference technical document DB 31. Further, the CPU proceeds to step 1720 and extracts the L reference technical documents with the largest person-in-charge relevance levels Q (that is, the person-in-charge similar documents).

The CPU then proceeds to step 1725, extracts, from among the reference technical document groups to which the L person-in-charge similar documents belong, the reference technical document group that contains the largest number of those documents, and determines that the new reference technical document belongs to that reference technical document group. The CPU then proceeds to step 1795 through the processes of step 1045, step 1350, and step 1355, and ends this routine.

As described above, in the fourth device (document information providing apparatus 14),
the reference technical document classification unit is configured to execute the reference technical document addition process by:
calculating, for each of the reference technical document groups, a person-in-charge relevance level (Q) that takes a larger value as more persons in charge are included in both the set of persons in charge of each reference technical document included in that group and the set of persons in charge of the added reference technical document (step 1715 in FIG. 17); and
determining that the added reference technical document belongs to the reference technical document group for which the person-in-charge relevance level is largest among the reference technical document groups (step 1720 in FIG. 17).

According to the fourth device, the combination of reference technical documents included in each reference technical document group (that is, the composition of the reference technical document group) is determined based on the person-in-charge relevance level Q. As a result, reference technical documents that were handled by the same persons in charge and whose technical fields are related to one another are likely to be included in the same reference technical document group. Therefore, according to the fourth device, even when many reference technical documents are added, each reference technical document group continues to consist of reference technical documents whose technical fields are related to one another.

FIG. 18 shows a table comparing the methods used by the first through fourth devices described above to classify reference technical documents and to classify classified technical documents.

Although embodiments of the document information providing apparatus according to the present invention have been described above, the present invention is not limited to these embodiments, and various modifications are possible without departing from the object of the present invention. For example, in each embodiment the reference technical documents are the company's own patent applications (the specification and other documents attached to the request), and the classified technical documents are other companies' patent applications (published patent publications). However, the reference technical documents and the classified technical documents may be technical documents other than patent applications.

For example, the reference technical documents and the classified technical documents may be technical papers. Alternatively, the reference technical documents and the classified technical documents may include utility model registration applications. Further, the reference technical documents and the classified technical documents may be news articles published on a Web server accessible via the network 41. In addition, the reference technical documents and the classified technical documents may be a combination of these kinds of documents (that is, patent applications, utility model registration applications, academic papers, and news articles).

In addition, in the first and second embodiments, the vocabulary distribution is calculated based on the number of times a given word appears in a given technical document. However, the vocabulary distribution may instead be calculated based on the ratio of the number of appearances of the word to the length of the technical document (text length).
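The difference between the count-based and the length-normalized vocabulary distribution can be illustrated as below. This is a sketch under assumptions: the document is taken to be already split into word tokens, and "text length" is taken to mean the number of tokens (the text does not say whether length is measured in words or characters). The function name and sample words are hypothetical.

```python
from collections import Counter

def vocabulary_distribution(words, normalize=False):
    """Return word -> occurrence count, or word -> count / text length if normalize=True.

    Text length is assumed here to be the number of word tokens in the document.
    """
    counts = Counter(words)
    if not normalize:
        return dict(counts)
    total = len(words)
    return {w: c / total for w, c in counts.items()}

words = ["motor", "coil", "motor", "stator"]
print(vocabulary_distribution(words))                  # raw occurrence counts
print(vocabulary_distribution(words, normalize=True))  # counts divided by text length
```

Normalizing by length keeps a long document from dominating the distribution simply because every word occurs more often in it.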

In addition, in each embodiment, the vocabulary distribution and the vocabulary set are calculated for each individual word appearing in a given technical document. However, the vocabulary distribution and/or the vocabulary set may be calculated after the words appearing in the technical document have been consolidated based on a predetermined synonym dictionary. For example, "電動機" (electric motor), "モータ", and "モーター" (two spellings of "motor") may be treated as synonyms, so that when a technical document contains "モータ" or "モーター", these words are treated as "電動機" in the calculation of the vocabulary distribution and/or the vocabulary set.
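The synonym consolidation described above amounts to mapping each word to a canonical form before counting. The following sketch uses the three words from the example; the dictionary layout and the function names are hypothetical, and a real synonym dictionary would be supplied separately.

```python
# Hypothetical synonym dictionary: variant spelling -> canonical word.
SYNONYMS = {"モータ": "電動機", "モーター": "電動機"}

def normalize_words(words, synonyms=SYNONYMS):
    """Replace each word by its canonical form; words not in the dictionary are kept."""
    return [synonyms.get(w, w) for w in words]

def vocabulary_set(words, synonyms=SYNONYMS):
    """Vocabulary set computed after synonym consolidation."""
    return set(normalize_words(words, synonyms))

doc_words = ["モータ", "モーター", "電動機", "コイル"]
print(normalize_words(doc_words))
print(vocabulary_set(doc_words))
```

After consolidation the three spellings count as a single word, so documents that use different spellings for the same component still share vocabulary.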

In addition, in the first embodiment, if no reference technical document exists within the range where the vocabulary distribution distance D is shorter than the reference distance Dr, the first device determines that no related document exists. However, this processing may be omitted. Similarly, in the second embodiment, if the document relevance level P does not reach the relevance threshold Pth, the second device determines that no related document exists. However, this processing may also be omitted.

In addition, in each embodiment, a person in charge is assigned to each reference technical document. However, instead of a person in charge, a responsible department within the company may be assigned to each reference technical document. For example, the department to which the inventor of the invention described in a reference technical document (that is, the specification and other documents of a patent application) belongs may be assigned to that reference technical document. In this case, the fourth device may calculate the person-in-charge relevance level Q based on the responsible department of each reference technical document.

DESCRIPTION OF SYMBOLS
11 ... Document information distribution apparatus, 21 ... CPU, 22 ... RAM, 23 ... HDD, 24 ... Network interface, 25 ... Operation interface, 26 ... Input device, 27 ... Output device, 41 ... Network, 42 ... Document publication server, 43 ... E-mail server.

Claims

A document information providing apparatus comprising:
a document acquisition unit that acquires a classified technical document;
a document storage unit that stores information on a plurality of reference technical documents, each of which has a person in charge assigned to it and is classified into one of a plurality of reference technical document groups, each group being a set of documents whose technical fields are related to one another; and
a related document extraction unit that, when the classified technical document is acquired by the document acquisition unit, executes a related document extraction process of extracting, as related documents, each of the reference technical documents included in the reference technical document group that, among the plurality of reference technical document groups, is composed of a vocabulary similar to the vocabulary constituting the classified technical document,
wherein the related document extraction unit is configured to execute the related document extraction process by:
calculating, for each reference technical document group and for each word, a first word content rate, which is the ratio of the number of reference technical documents in that group that contain the word to the number of reference technical documents included in that group;
calculating, for each of the reference technical document groups, a first document relevance level that takes a larger value as the classified technical document contains more words with a higher first word content rate and contains fewer words with a lower first word content rate; and
extracting, as the related documents, the reference technical documents included in the reference technical document group for which the first document relevance level is largest among the reference technical document groups.

A document information providing apparatus comprising:
a document acquisition unit that acquires a classified technical document;
a document storage unit that stores information on a plurality of reference technical documents, each of which has a person in charge assigned to it and is classified into one of a plurality of reference technical document groups, each group being a set of documents whose technical fields are related to one another; and
a related document extraction unit that, when the classified technical document is acquired by the document acquisition unit, executes a related document extraction process of extracting, as related documents, each of the reference technical documents included in the reference technical document group that, among the plurality of reference technical document groups, is composed of a vocabulary similar to the vocabulary constituting the classified technical document,
wherein the document storage unit executes, when a reference technical document is added, a reference technical document addition process of classifying the added reference technical document into the reference technical document group that is the set of reference technical documents most closely related in technical field to the added reference technical document, and
wherein the document storage unit is configured to execute the reference technical document addition process by:
calculating, for each reference technical document group and for each word, a second word content rate, which is the ratio of the number of reference technical documents in that group that contain the word to the number of reference technical documents included in that group;
calculating, for each of the reference technical document groups, a second document relevance level that takes a larger value as the added reference technical document contains more words with a higher second word content rate and contains fewer words with a lower second word content rate; and
classifying the added reference technical document into the reference technical document group for which the second document relevance level is largest among the reference technical document groups.

A document information providing apparatus comprising:
a document acquisition unit that acquires a classified technical document;
a document storage unit that stores information on a plurality of reference technical documents, each of which has a person in charge assigned to it and is classified into one of a plurality of reference technical document groups, each group being a set of documents whose technical fields are related to one another; and
a related document extraction unit that, when the classified technical document is acquired by the document acquisition unit, executes a related document extraction process of extracting, as related documents, each of the reference technical documents included in the reference technical document group that, among the plurality of reference technical document groups, is composed of a vocabulary similar to the vocabulary constituting the classified technical document,
wherein the document storage unit executes, when a reference technical document is added, a reference technical document addition process of classifying the added reference technical document into the reference technical document group that is the set of reference technical documents most closely related in technical field to the added reference technical document, and
wherein the document storage unit is configured to execute the reference technical document addition process by:
calculating, for each of the reference technical document groups, a person-in-charge relevance level that takes a larger value as more persons in charge are included in both the set of persons in charge of each reference technical document included in that group and the set of persons in charge of the added reference technical document; and
classifying the added reference technical document into the reference technical document group for which the person-in-charge relevance level is largest among the reference technical document groups.

The document information providing apparatus according to any one of claims 1 to 3,
further comprising an information providing unit that provides information on the classified technical document to the person in charge of each of the related documents extracted by the related document extraction unit.