JPH11203286A

JPH11203286A - Natural language processing apparatus and method

Info

Publication number: JPH11203286A
Application number: JP10006859A
Authority: JP
Inventors: Makoto Hirota; 誠廣田; Michio Aizawa; 道雄相澤; Kazue Kaneko; 和恵金子; Minoru Fujita; 稔藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1998-01-16
Filing date: 1998-01-16
Publication date: 1999-07-30

Abstract

(57) [Abstract]

[Problem] To make it possible to load a portion of arbitrary size of a word dictionary into internal memory, so that processing can be sped up or memory usage held down according to the circumstances of the natural language processing.

[Solution] A word dictionary 105a in which words are registered in tree format is provided. The word dictionary 105a holds usage frequency information for notations in document processing. When natural language sentences are processed, a dictionary load unit 102 loads a part of the word dictionary 105a into an internal memory 107 so that it can be used there. At load time, a notation appearance frequency calculation unit 101 calculates the notation appearance frequencies, and a load condition setting unit 103 determines the load condition according to the calculation result and the contents of a predetermined load condition file 103a.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a natural language processing apparatus and method for analyzing natural language sentences using a word dictionary in tree (TREE) format.

[0002]

[Description of the Related Art] Tree-format dictionaries such as the one shown in FIG. 4 (leaving aside the notation appearance frequencies) have long been widely used as word dictionaries for analyzing natural language. Because of its large overall size, such a tree-format word dictionary was not held in an internal storage device such as an IC memory; it was kept as a file on a disk device, which is an external storage device.

[0003]

[Problems to be Solved by the Invention] However, natural language processing accesses the word dictionary frequently, so the access speed of the disk device strongly affects the processing speed of the natural language processing. Although advances in hardware have improved disk access speeds, disk access has remained a factor that hinders faster analysis.

[0004] On the other hand, access to internal memory is generally much faster than access to a disk device, so loading the dictionary contents into internal memory is also conceivable. However, loading an entire dictionary holding tens of thousands of words into internal memory requires a considerable amount of memory for that purpose alone, and forcing data as large as a dictionary into a memory of insufficient capacity actually slows processing down.

[0005] An object of the present invention is to solve the above problems by making it possible to load a portion of arbitrary size of the word dictionary into internal memory, so that processing can be sped up or memory usage held down according to the circumstances of the natural language processing.

[0006]

[Means for Solving the Problems] As one means of achieving the above object, the invention has, for example, the following configuration.

[0007] Namely, a natural language processing apparatus that analyzes natural language sentences using a word dictionary in which words are registered in tree format comprises: internal storage means capable of storing internal processing data; external storage means that is separate from the internal storage means and has a large storage capacity; a word dictionary, stored in the external storage means, in which words are registered in TREE format; and word loading means for loading at least a part of the word dictionary into the internal storage means.

[0008] For example, the apparatus further comprises analysis means for analyzing natural language using the word dictionary.

[0009] Also, for example, the word loading means includes load condition setting means that obtains, from the size of the word dictionary to be loaded into the internal storage means, the probability that natural language can be analyzed using only the word dictionary loaded into the internal storage means, and that determines from the obtained probability the portion of the word dictionary to be loaded into the internal storage means.

[0010] Further, for example, the apparatus comprises document information holding means for holding document information, and notation appearance frequency calculation means for calculating the appearance frequency, in the document information held by the document information holding means, of each notation of the words registered in the word dictionary. Alternatively, the load condition setting means determines the portion of the word dictionary to be loaded on the basis of the probability obtained by the notation appearance frequency calculation means.

[0011] Alternatively, a natural language processing apparatus that analyzes natural language sentences using a word dictionary in which words are registered in tree format holds, in the word dictionary, the usage frequencies of notations in document processing, and comprises means for loading a part of the contents of the word dictionary, with reference to the notation appearance frequencies, into an internal memory capable of storing internal processing data.

[0012] For example, when loading from the word dictionary into the internal storage means, the probability that natural language can be analyzed using only the word dictionary loaded into the internal storage means is obtained from the size of the word dictionary to be loaded, and the portion of the word dictionary to be loaded into the internal storage means is determined from the obtained probability.

[0013]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a natural language processing apparatus according to an embodiment of the present invention.

[0014] In FIG. 1, 100 is a control unit that controls the entire apparatus, 101 is a notation appearance frequency calculation unit, 102 is a dictionary load unit, 103 is a load condition setting unit, 103a is a load condition setting file, 104 is a document database, and 105 is a disk device. A word dictionary 105a is registered on the disk device 105.

[0015] Further, 106 is a ROM that stores the control procedure of the control unit 100, or the control procedures of the individual parts of the natural language processing apparatus, and the like; it functions as a control memory. 107 is a fast-access internal memory composed of ICs or the like, and an internal word dictionary load area 107a of a predetermined size is allocated in the internal memory 107. 110 is a natural language analysis unit capable of analyzing the natural language stored in the document database 104.

[0016] In the above configuration, for example, the notation appearance frequency calculation unit 101 follows a control procedure stored in the ROM 106, such as the one shown in the flowchart of FIG. 3, and calculates the appearance frequency, in the document information held in the document database 104 described later, of each notation of the words registered in the word dictionary 105a. The ROM 106 also stores, among other things, the control procedure of the load condition setting unit 103 and the dictionary load unit 102 shown in FIG. 6.

[0017] The above description shows an example in which natural language processing is performed by individual components, but the present invention is not limited to this example; the functions may all be realized together by a central processing unit (computer). FIG. 2 shows an example in which the functions are realized together by a central processing unit (computer) in this way.

[0018] FIG. 2 shows an example in which the functions are realized together by a central processing unit (computer). In FIG. 2, 21 is a control memory, which corresponds to the ROM 106 of FIG. 1. It suffices to store in it control programs and the like that follow the control procedures shown in the flowcharts of FIGS. 3 and 6, and to have the central processing unit 22 execute them.

[0019] 22 is a central processing unit that performs decisions, computations, and so on according to the control procedures held in the control memory 21. 23 is an internal memory, and 24 is a disk device that holds the document database, the word dictionary, the load condition setting file, and the like. 25 is a bus.

[0020] Next, the operation of the notation appearance frequency calculation unit 101 in the present embodiment described above is explained with reference to the flowchart shown in FIG. 3.

[0021] The notation appearance frequency calculation unit 101 first retrieves a document Dn from the document database 104 in step S301. Next, in step S302, it sets "1" in a built-in character position register i (not shown), which indicates the character position to be analyzed. Then, in step S303, it follows the tree (TREE) structure of the word dictionary, starting from the character position in document Dn indicated by the character position register i.

[0022] For example, with the tree (TREE) structured word dictionary shown in FIG. 4, if the character position i in document Dn is the position marked by the arrow in FIG. 5, following the tree structure leads from the leaf 「あら」 (ara) on to 「あらかた」 (arakata). In other words, the notations 「あら」 and 「あらかた」 are found to appear in the character string that begins at position i.

[0023] Accordingly, in the next step S304, the notation appearance frequencies corresponding to 「あら」 and 「あらかた」 are each incremented by 1. Next, in step S305, the character position register i is incremented by one (the character position is shifted one place to the right). In step S306, it is checked whether that position is the end of document Dn. If it is not the end of document Dn, the process returns to step S303 and continues as above.

[0024] If, on the other hand, step S306 finds the end of document Dn, the process proceeds to step S307 and checks whether all documents in the document database 104 have been processed. If all documents have been processed, the process ends. If any document remains unprocessed, the process returns to step S301, retrieves the next document Dn+1 from the document database 104, and performs the same processing.

[0025] By performing the processing shown in FIG. 3 in this way, the notation appearance frequencies for the word dictionary are obtained over the natural language documents to be processed that are stored in the document database 104, and they can be stored in the notation appearance frequency storage fields of the word dictionary as shown in FIG. 4.
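The counting procedure of FIG. 3 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `TrieNode`, `build_trie`, `count_frequencies`, and `frequencies` are hypothetical, the dictionary is modeled as an in-memory character trie, and positions are 0-based here whereas the description sets register i to 1.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of a tree-format word dictionary (hypothetical layout)."""
    children: dict = field(default_factory=dict)
    is_word: bool = False  # True if the path from the root spells a registered notation
    frequency: int = 0     # notation appearance frequency kept in the dictionary


def build_trie(words):
    """Register each word as a path of characters from the root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def count_frequencies(root, documents):
    """Scan every character position of every document (steps S301 to S307)."""
    for doc in documents:                  # S301 / S307: take each document in turn
        for i in range(len(doc)):          # S302, S305, S306: move position i to the end
            node = root
            for ch in doc[i:]:             # S303: follow the tree from position i
                node = node.children.get(ch)
                if node is None:
                    break
                if node.is_word:           # S304: a registered notation starts at i
                    node.frequency += 1


def frequencies(node, prefix=""):
    """Collect {notation: frequency} for every registered notation."""
    result = {}
    if node.is_word:
        result[prefix] = node.frequency
    for ch, child in node.children.items():
        result.update(frequencies(child, prefix + ch))
    return result


trie = build_trie(["あら", "あらかた", "いえ"])
count_frequencies(trie, ["あらかたいえ", "あらいえ"])
print(frequencies(trie))  # {'あら': 2, 'あらかた': 1, 'いえ': 2}
```

Note that a single start position can increment several notations at once (「あら」 and 「あらかた」 here), which matches the behavior described for step S304.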

[0026] These notation appearance frequencies may be reset once all document processing has finished, or they may be accumulated at a predetermined ratio.

[0027] Next, the operation of the load condition setting unit 103 and the dictionary load unit 102 in the present embodiment is described with reference to the flowchart shown in FIG. 6. The following description presents a method that uses a load condition setting file as one example of how the load condition setting unit 103 can be embodied.

[0028] FIG. 7 shows examples of the load condition setting file 103a in the present embodiment. The load condition setting file 103a is a file that sets an upper limit on the size of the word dictionary loaded into the internal memory 107 and a lower limit on the non-disk-access probability.

[0029] FIG. 7 shows example settings of the load condition setting file 103a. In (a), the upper limit on the size of the word dictionary loaded into the internal memory 107 is 2 Mbytes, meaning that up to 2 Mbytes of the word dictionary can be loaded. In (b), rather than specifying a size in internal memory, the size is determined by the proportion of word dictionary accesses that are served from the internal memory 107 when documents are processed; in the example of FIG. 7 the lower limit of the non-disk-access probability is set to 60%.

[0030] FIG. 7(c) is an example in which the load size upper limit is "0". In this case no dictionary is loaded at all, and every use of the word dictionary accesses the word dictionary stored on the disk device 105. (d) is an example in which the entire word dictionary is loaded into the internal memory 107. (e) is a setting under which documents are processed entirely by accessing only the word dictionary in the internal memory 107.
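The settings (a) to (e) could be represented in a program along the following lines. The actual layout of the file in FIG. 7 is not reproduced in the text, so the `key = value` syntax, the key names `load_size_limit` and `non_disk_access_min`, and the reading of "2M" as 2 × 1024 × 1024 bytes are all assumptions made only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadCondition:
    size_limit: Optional[int] = None      # upper limit in bytes; None if not set
    load_all: bool = False                # True when the upper limit is "ALL"
    min_hit_rate: Optional[float] = None  # non-disk-access lower limit, in percent


def parse_load_condition(text):
    """Parse a hypothetical 'key = value' load condition setting file."""
    cond = LoadCondition()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().upper()
        if key == "load_size_limit":
            if value == "ALL":            # case (d): load the whole dictionary
                cond.load_all = True
            elif value.endswith("M"):     # case (a): e.g. "2M"
                cond.size_limit = int(value[:-1]) * 1024 * 1024
            else:                         # case (c): "0", or a plain byte count
                cond.size_limit = int(value)
        elif key == "non_disk_access_min":  # cases (b) and (e)
            cond.min_hit_rate = float(value)
    return cond


print(parse_load_condition("load_size_limit = 2M\nnon_disk_access_min = 60"))
```

Keeping "ALL" as a separate flag rather than a very large number makes the later branch on it explicit.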

[0031] In the present embodiment, the proportion of the word dictionary loaded into the internal memory 107, among other things, is controlled in this way according to the settings in the load condition setting file 103a.

[0032] With the above in mind, the control shown in FIG. 6 is now described. The load condition setting unit 103 first reads the current load condition setting file 103a in step S601. For example, (a) of FIG. 7 means that no more than 2 Mbytes of the word dictionary are loaded, and (b) means that just enough of the dictionary is loaded into the internal memory 107 for there to be a 60% probability that accessing the dictionary portion loaded in the internal memory 107 suffices, without accessing the word dictionary on the disk device. The load size upper limit is stored in a register l (not shown), and the non-disk-access probability lower limit is stored in a register p (not shown).

[0033] In the following step S602, it is checked whether the load size upper limit in the load condition setting file 103a read in step S601 is "ALL". If the load size upper limit is "ALL", as shown in (d) of FIG. 7, the process proceeds to step S606, loads the entire word dictionary 105a into the internal memory 107, and ends.

[0034] For example, when the natural language processing apparatus has a sufficiently large internal memory, as in a server to which many client machines are connected, and each client performs natural language processing using the server's word dictionary, accessing the disk device for every lookup would lower processing efficiency, so it is desirable to set the load size upper limit to "ALL".

[0035] If, on the other hand, step S602 finds that the load size upper limit in the load condition setting file 103a read in step S601 is not "ALL", the process proceeds to step S603 and checks whether the load size upper limit is "0". If the load size upper limit is "0", the process ends without loading the word dictionary into the internal memory 107.

[0036] If step S603 finds that the load size upper limit in the load condition setting file 103a read in step S601 is not "0", the process proceeds to step S604 and checks whether the non-disk-access probability lower limit is 100. A lower limit of 100 is a setting under which every word dictionary access goes to the word dictionary loaded in internal memory, so the process proceeds to step S606.

[0037] However, the present invention is not limited to the above example. For example, the notation appearance frequencies updated by the processing of FIG. 3 may be examined, and only those entries of the word dictionary whose notation appearance frequency is 1 or more may be loaded into the internal memory 107. Controlling the load in this way keeps the capacity occupied in the internal memory 107 to the necessary minimum.

[0038] If, on the other hand, step S604 finds that the non-disk-access probability lower limit is not 100, the process proceeds to step S605. There the load condition setting unit 103 extracts the notation appearance frequencies shown in FIG. 4 from the word dictionary updated by the processing of FIG. 3, and determines the portion of the word dictionary that can be loaded into the internal memory 107 from the extracted notation appearance frequencies, the load size upper limit in register l, and the non-disk-access probability lower limit in register p. The portions to load are chosen one after another, starting with the notations in the word dictionary 105a that have the highest appearance frequency.

[0039] The dictionary load unit 102 then loads the portion of the word dictionary 105a that the load condition setting unit 103 has designated for loading into the internal word dictionary area 107a of the internal memory 107.
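The branching of FIG. 6 (steps S602 to S606) can be summarized in a few lines. This is a minimal sketch: the function name `decide_load` and the string return values are hypothetical, and the load size upper limit is assumed to arrive either as the string "ALL" or as an integer number of bytes.

```python
def decide_load(size_limit, min_hit_rate):
    """Return which branch of FIG. 6 applies.

    size_limit:   "ALL", an int number of bytes, or None if not set (register l)
    min_hit_rate: non-disk-access lower limit in percent, or None (register p)
    """
    if size_limit == "ALL":     # S602 -> S606: load the entire dictionary
        return "load_all"
    if size_limit == 0:         # S603: load nothing, always use the disk
        return "load_nothing"
    if min_hit_rate == 100:     # S604 -> S606: every access must hit memory
        return "load_all"
    return "load_partial"       # S605: choose a portion by appearance frequency


print(decide_load("ALL", None))           # load_all
print(decide_load(0, 60))                 # load_nothing
print(decide_load(None, 100))             # load_all
print(decide_load(2 * 1024 * 1024, 60))   # load_partial
```

The order of the checks mirrors the flowchart: the size limit is examined before the probability limit, so a limit of "0" wins even if a probability is also set.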

[0040] The load condition setting performed by the load condition setting unit 103 is described in detail below. When a load size upper limit "l" is set, the load condition setting unit 103 examines the notation appearance frequencies of the word dictionary 105a calculated by the notation appearance frequency calculation unit 101 and goes through the word dictionary in descending order of notation appearance frequency.

[0041] For example, in the example of FIG. 8, the notations with the highest appearance frequencies are, in order, 「ありがためいわく」 (arigatameiwaku), 「いえ」 (ie), 「いえじ」 (ieji), and so on. If the load size when loading up to 「ありがためいわく」 and 「いえ」 is smaller than "l", and the load size when loading up to 「ありがためいわく」, 「いえ」, and 「いえじ」 exceeds "l", the setting is made so that the part of the word dictionary's tree structure related to 「ありがためいわく」 and 「いえ」, that is, the part in the thick frame in FIG. 8, is loaded.

[0042] In accordance with this setting, the dictionary load unit 102 loads, for example, the part related to 「ありがためいわく」 and 「いえ」 into the internal word dictionary area 107a of the internal memory 107.
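Selection under a load size upper limit can be sketched as a greedy pass in descending order of frequency. The per-entry byte sizes below are invented for illustration, and charging each notation a fixed size is a simplification: in a real tree, entries such as 「いえ」 and 「いえじ」 share nodes, and the text does not say how the load size is measured. The function name `select_by_size` is hypothetical.

```python
def select_by_size(entries, size_limit):
    """Pick notations in descending frequency until the next one would exceed the limit.

    entries: list of (notation, frequency, size_in_bytes) tuples
    Returns (selected notations, total size in bytes).
    """
    selected, total = [], 0
    for notation, _freq, size in sorted(entries, key=lambda e: e[1], reverse=True):
        if total + size > size_limit:
            break  # loading this entry too would exceed the upper limit "l"
        selected.append(notation)
        total += size
    return selected, total


entries = [
    ("いえじ", 103, 30),            # sizes are made-up example values
    ("ありがためいわく", 208, 40),
    ("いえ", 134, 20),
]
print(select_by_size(entries, 70))  # (['ありがためいわく', 'いえ'], 60)
```

The loop stops at the first entry that does not fit rather than skipping ahead to smaller entries, which follows the wording of the example: the portion up to 「いえ」 fits, the portion up to 「いえじ」 does not.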

[0043] When a non-disk-access probability lower limit p is set, the word dictionary is likewise examined in descending order of the notation appearance frequencies of the word dictionary 105a, and the load portion extends up to the point at which the non-disk-access probability exceeds the lower limit p. For example, with p = 65 for the word dictionary of FIG. 8, the non-disk-access probability when loading up to 「ありがためいわく」 and 「いえ」 is (208+134)/(208+134+103+56+21+3) = 342/525 ≈ 65.1%, which exceeds p, so the part in the thick frame in FIG. 8 is loaded into memory.
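The probability-based selection can be checked with a short calculation. In this sketch the non-disk-access probability is taken to be the summed frequency of the loaded notations divided by the summed frequency of all notations, as in the quotient above; the function name `select_by_hit_rate` and the placeholder labels `w4` to `w6` (the text gives only the frequencies of those entries, not their notations) are assumptions.

```python
def select_by_hit_rate(freqs, min_percent):
    """Add notations in descending frequency until coverage exceeds min_percent.

    freqs: {notation: appearance frequency}
    Returns (selected notations, achieved non-disk-access probability in percent).
    """
    total = sum(freqs.values())
    selected, covered = [], 0
    if total == 0:
        return selected, 0.0  # no frequency data, nothing to rank
    for notation, freq in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        if 100.0 * covered / total > min_percent:
            break  # the lower limit p is already exceeded
        selected.append(notation)
        covered += freq
    return selected, 100.0 * covered / total


freqs = {"ありがためいわく": 208, "いえ": 134, "いえじ": 103,
         "w4": 56, "w5": 21, "w6": 3}  # w4..w6 are placeholder labels
selected, rate = select_by_hit_rate(freqs, 65)
print(selected, round(rate, 1))  # ['ありがためいわく', 'いえ'] 65.1
```

With these frequencies the first notation alone covers 208/525, about 39.6%, so a second one is needed; the two together cover 342/525, about 65.1%, which is the first point above p = 65.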

[0044] As described above, according to the present embodiment, the load size upper limit and the non-disk-access probability lower limit, which determine the capacity allocated to the internal word dictionary loadable into the internal memory 107, can be set freely; a suitable portion of the word dictionary 105a to load into the internal memory 107 is determined according to those settings; and the determined portion is loaded into the internal memory 107. As a result, processing can be sped up or memory usage held down according to the circumstances of the natural language processing.

[0045] That is, by loading into memory the frequently appearing subset of the words registered in the dictionary, the frequency of access to the disk device 105 can be reduced while the memory required for the dictionary is held down.
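The effect described here, fewer disk accesses at a bounded memory cost, can be illustrated with a two-tier lookup that tries the loaded portion first and falls back to the disk dictionary. The class `TwoTierDictionary` is hypothetical, and a plain `dict` stands in for the disk-resident dictionary in the usage lines; the text does not specify how the analysis unit performs lookups.

```python
class TwoTierDictionary:
    """Look up notations in the loaded portion first, then on disk (illustrative only)."""

    def __init__(self, memory_part, disk_lookup):
        self.memory_part = memory_part  # {notation: entry} loaded into internal memory
        self.disk_lookup = disk_lookup  # callable that reads an entry from the disk dictionary
        self.memory_hits = 0
        self.disk_accesses = 0

    def lookup(self, notation):
        if notation in self.memory_part:
            self.memory_hits += 1
            return self.memory_part[notation]
        self.disk_accesses += 1         # slow path: the full dictionary on disk
        return self.disk_lookup(notation)


# A dict stands in for the disk dictionary in this example.
disk = {"ありがためいわく": "entry-1", "いえ": "entry-2", "いえじ": "entry-3"}
loaded = {k: disk[k] for k in ("ありがためいわく", "いえ")}
d = TwoTierDictionary(loaded, disk.get)
for word in ["いえ", "いえじ", "ありがためいわく", "いえ"]:
    d.lookup(word)
print(d.memory_hits, d.disk_accesses)  # 3 1
```

The ratio of `memory_hits` to all lookups is the quantity that the non-disk-access probability lower limit is meant to bound from below.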

[0046] Even so, how much of the registered vocabulary is appropriate to load into memory depends on the execution environment of the natural language processing, its purpose, and so on. With the present embodiment, processing can be sped up or memory usage held down according to the circumstances, and the word dictionary allocation in the internal memory 107 can be made optimal for the situation.

[0047]

[Other Embodiments] The embodiment described above uses a load condition setting file to specify the conditions for loading the word dictionary into internal memory, but various other methods may be used, such as providing a setting function in the program or, when a central processing unit controls everything as in FIG. 2, setting the conditions in the form of environment variables of its OS.
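As one way to picture the environment variable alternative, the sketch below reads the two load conditions from the process environment. The variable names `NLP_DICT_LOAD_SIZE_LIMIT` and `NLP_DICT_NON_DISK_ACCESS_MIN` are invented for this example, and treating a missing size limit as "0" (load nothing) is likewise an assumption rather than something the text prescribes.

```python
import os


def load_conditions_from_env(env=None):
    """Read (size_limit, min_hit_rate) from environment variables.

    env defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    size = env.get("NLP_DICT_LOAD_SIZE_LIMIT", "0")   # hypothetical variable name
    rate = env.get("NLP_DICT_NON_DISK_ACCESS_MIN")    # hypothetical variable name
    size_limit = "ALL" if size.upper() == "ALL" else int(size)
    min_hit_rate = None if rate is None else float(rate)
    return size_limit, min_hit_rate


print(load_conditions_from_env({"NLP_DICT_LOAD_SIZE_LIMIT": "2097152",
                                "NLP_DICT_NON_DISK_ACCESS_MIN": "60"}))
# (2097152, 60.0)
```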

[0048] Also, in the embodiment described above, the load conditions, namely the load size upper limit and the non-disk-access probability lower limit, are set directly, but the load size upper limit may instead be set automatically from the installed memory capacity of the computer on which the natural language processing software runs.
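A very small sketch of deriving the limit from installed memory follows. The text does not say how the limit is computed from the memory capacity, so the rule used here (a fixed fraction of installed memory) and the function name `auto_load_size_limit` are purely illustrative assumptions; how the installed capacity is obtained is platform specific and is left to the caller.

```python
def auto_load_size_limit(installed_memory_bytes, fraction=0.05):
    """Derive a load size upper limit as a fixed fraction of installed memory.

    Both the fraction-based rule and the default of 5% are assumptions for illustration.
    """
    if installed_memory_bytes < 0:
        raise ValueError("installed_memory_bytes must be non-negative")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(installed_memory_bytes * fraction)


# Example: a machine with 64 Mbytes of memory, a quarter of it given to the dictionary.
print(auto_load_size_limit(64 * 1024 * 1024, fraction=0.25))  # 16777216
```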

[0049] The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It goes without saying that the invention is also achieved by supplying a system or apparatus with a recording medium on which is recorded the program code of software that realizes the functions of the embodiment described above, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium.

[0050] In this case, the program code read from the recording medium itself realizes the functions of the embodiment described above, and the recording medium on which the program code is recorded constitutes the present invention. Recording media that can be used to supply the program code include, for example, floppy disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tape, nonvolatile memory cards, and ROMs.

[0051] It also goes without saying that the invention covers not only the case in which the functions of the embodiment described above are realized by the computer executing the program code it has read, but also the case in which an OS or the like running on the computer performs part or all of the actual processing on the basis of instructions in the program code, and the functions of the embodiment are realized by that processing.

[0052] It further goes without saying that the invention covers the case in which the program code read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing on the basis of instructions in the program code, and the functions of the embodiment described above are realized by that processing.

[Effects of the Invention] As described above, according to the present invention, a natural language processing apparatus that analyzes natural language sentences using a word dictionary in which words are registered in tree format can load at least a part of the word dictionary stored in an external storage device into internal storage means and perform natural language processing with it, so that processing efficiency can be optimized.

[0053] Moreover, in this case as well, the load size upper limit and the non-disk-access probability lower limit can be set freely according to the circumstances of the natural language processing, an optimal load amount can be selected, and processing can be sped up or memory usage held down.

[0054]

[Brief description of the drawings]

【図１】本発明に係る一発明の実施の形態例に係る自然
言語処理装置の構成を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a natural language processing device according to an embodiment of the present invention.

【図２】図１に示す各機能を中央処理装置（コンピュー
タ装置）で一括して実現する場合の例を示すブロック図
である。FIG. 2 is a block diagram illustrating an example in which the functions illustrated in FIG. 1 are collectively realized by a central processing unit (computer device).

【図３】本実施の形態例の表記出現頻度計算部の処理手
順を示す動作フローチャートである。FIG. 3 is an operation flowchart illustrating a processing procedure of a notation appearance frequency calculation unit according to the embodiment;

【図４】本実施の形態例における単語辞書のツリー構造
の例を示す図である。FIG. 4 is a diagram showing an example of a tree structure of a word dictionary in the embodiment.

FIG. 5 is a diagram showing an example of a document in the document database in the embodiment.

FIG. 6 is an operation flowchart showing the processing procedure of the load condition setting unit and the dictionary load unit of the embodiment.

FIG. 7 is a diagram showing an example of the load condition setting file in the embodiment.

FIG. 8 is a diagram showing an example of the loaded portion of the word dictionary in the embodiment.

[Explanation of symbols]

21 control memory
22 central processing unit
23 memory
24 disk device
25 bus
100 control unit
101 notation appearance frequency calculation unit
102 dictionary load unit
103 load condition setting unit
104 document database
105 disk device
105a word dictionary
106 ROM
107 internal memory
107a internal word dictionary storage area
110 natural language analysis unit

Continuation of the front page: (72) Inventor Minoru Fujita, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo

Claims

[Claims]

1. A natural language processing apparatus for analyzing a natural language sentence using a word dictionary in which words are registered in a tree format, comprising: internal storage means capable of storing internal processing data; external storage means having a large storage capacity; a word dictionary, stored in the external storage means, in which words are registered in a tree (TREE) format; and word loading means for loading at least a part of the word dictionary into the internal storage means.

2. The natural language processing apparatus according to claim 1, further comprising an analysis unit that analyzes a natural language using a word dictionary.

3. The natural language processing apparatus according to claim 1 or 2, wherein the word loading means comprises load condition setting means for obtaining, from the size of the word dictionary loaded into the internal storage means, a probability that a natural language can be analyzed using only the word dictionary loaded in the internal storage means, and for determining, from the obtained probability, the portion of the word dictionary to be loaded into the internal storage means.

4. The natural language processing apparatus according to claim 1, further comprising: document information holding means for holding document information; and notation appearance frequency calculation means for calculating the appearance frequency, in the document information held in the document information holding means, of each notation of the words registered in the word dictionary.

5. The natural language processing apparatus according to claim 4, wherein said load condition setting means determines a load part of said word dictionary based on a probability obtained by said notation appearance frequency calculation means.

6. A natural language processing method in a natural language processing apparatus for analyzing a natural language sentence using a word dictionary in which words are registered in a tree format, the method comprising: loading, with reference to a notation frequency, a part of the contents of the word dictionary into an internal memory capable of storing processing data.

7. The natural language processing method according to claim 6, wherein, in loading from the word dictionary into the internal storage means, a probability that a natural language can be analyzed using only the word dictionary loaded in the internal storage means is obtained, and the portion of the word dictionary to be loaded into the internal storage means is determined from the obtained probability.

8. A computer storage medium storing a control procedure for realizing the function according to claim 1.

9. A computer program sequence for realizing the function according to any one of claims 1 to 7.
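
As an informal illustration of the notation appearance frequency referred to in claims 4 to 6, the following Python sketch registers words in a tree (trie) and counts how often each registered notation occurs in a text. It is a sketch under stated assumptions, not the claimed procedure: the names `TrieNode`, `build_trie`, `count_notations` and `frequencies` are hypothetical, and the choice to count every occurrence at every character position, including overlapping ones, is an assumption made for this example.

```python
class TrieNode:
    """One node of a tree-format word dictionary (hypothetical layout)."""

    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a registered notation ends here
        self.count = 0        # appearance frequency of that notation


def build_trie(words):
    """Register each word character by character under a shared root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def count_notations(root, text):
    """From every start position, walk the tree and count each notation reached."""
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.get(ch)
            if node is None:
                break  # no registered notation continues with this character
            if node.is_word:
                node.count += 1


def frequencies(node, prefix=""):
    """Collect {notation: count} for every registered notation in the tree."""
    result = {}
    if node.is_word:
        result[prefix] = node.count
    for ch, child in node.children.items():
        result.update(frequencies(child, prefix + ch))
    return result


if __name__ == "__main__":
    root = build_trie(["to", "tokyo", "kyo"])
    count_notations(root, "tokyo to kyoto")
    print(frequencies(root))
```

Counts gathered this way could then serve as the per-notation frequencies that a load condition refers to when deciding which part of the tree to keep in internal memory.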