JPH0981557A

JPH0981557A - Natural language processing device and natural language processing method

Info

Publication number: JPH0981557A
Application number: JP7234077A
Authority: JP
Inventors: Yoshimi Saito (齋藤佳美); Hiroyasu Nogami (野上宏康); Tatsuya Uehara (上原龍也); Tatsuya Dewa (出羽達也); Yumi Mizutani (水谷由美)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-09-12
Filing date: 1995-09-12
Publication date: 1997-03-28

Abstract

(57) [Abstract] [Problem] To make it easy to know whether an output natural language expression is a known co-occurrence. [Means for Solving the Problem] In kana-kanji conversion processing, for example, the edit control unit 102 checks whether the word group held in the word string candidate holding unit 108 matches the co-occurrence information. When the word group in the language information does not match the co-occurrence information (or, alternatively, when it does match), the edit control unit 102 attaches co-occurrence presence/absence information indicating that fact (for example, an underline) to the candidate character string and outputs it to the display unit 103. From the co-occurrence presence/absence information attached to the character string output to the display unit 103, the user can easily tell whether the expression was judged to be a known co-occurrence or has no such stored co-occurrence.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a natural language processing device for creating, editing, or displaying documents written in natural language. More particularly, it relates to a natural language processing device and a natural language processing method that perform natural language processing, such as kana-kanji conversion, using co-occurrence information between words.

[0002]

[Description of the Related Art] In natural language processing, co-occurrence information is one of the measures used to evaluate the plausibility of a natural language expression. Co-occurrence information links words that are semantically related to each other.

[0003] For example, a kana-kanji conversion device used in a Japanese word processor or a similar system generates many word string candidates from an input kana character string as conversion results. The device first narrows these candidates down using grammatical information such as parts of speech. It then determines their priority order by referring to co-occurrence information.
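The prior-art flow in [0003] can be sketched as follows: candidates are first filtered on grammar and then ordered using co-occurrence information. This is a minimal illustration under stated assumptions, not the method of any cited device. The candidate fields (`surface`, `pair`, `grammatical`, `freq`), the contents of `KNOWN_PAIRS`, and the use of frequency as a tie-breaker are all hypothetical.

```python
# Hypothetical co-occurrence data: (word, word) pairs treated as known.
# Illustrative names throughout; none of them come from the specification.
KNOWN_PAIRS = {("芝居", "公演")}

def rank_candidates(candidates):
    """Drop grammatically invalid candidates, then order the rest so that
    candidates with a known co-occurrence come first; frequency breaks ties.

    Each candidate is a dict with hypothetical keys:
      surface (str), pair (tuple of two words), grammatical (bool), freq (int)
    """
    kept = [c for c in candidates if c["grammatical"]]
    return sorted(
        kept,
        key=lambda c: (c["pair"] in KNOWN_PAIRS, c["freq"]),
        reverse=True,
    )

candidates = [
    {"surface": "芝居を後援する", "pair": ("芝居", "後援"), "grammatical": True, "freq": 10},
    {"surface": "芝居を公演する", "pair": ("芝居", "公演"), "grammatical": True, "freq": 3},
    {"surface": "(ungrammatical)", "pair": ("芝居", "公演"), "grammatical": False, "freq": 99},
]
print([c["surface"] for c in rank_candidates(candidates)])
```

In this sketch, a known co-occurrence outranks a higher frequency. The specification says only that co-occurrence information is consulted to decide priority, so other weightings are equally possible.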

[0004] In such natural language processing, word pairs derived from the input natural language information are compared with co-occurrence information stored in advance. Subsequent processing is then switched depending on whether a matching co-occurrence exists.

[0005] However, the result of comparing a word pair with the co-occurrence information, that is, whether co-occurrence information exists for the word pair being processed, is directly available only to the system. The user has had no direct, convenient way to refer to it.

[0006]

[Problems to be Solved by the Invention] As described above, natural language processing that uses co-occurrence information has not given users any direct, convenient means of telling whether a given expression in the output was judged to be a known co-occurrence or has no such stored co-occurrence. As a result, users could not make full use of co-occurrence information when evaluating the output natural language expression.

[0007] The present invention has been made in view of the above. Its object is to provide a natural language processing device and a natural language processing method with which the user can easily tell whether an output natural language expression is a known co-occurrence.

[0008]

[Means for Solving the Problems] The present invention comprises the following:

- holding means for holding the language information that constitutes a natural language expression obtained by converting an input;
- co-occurrence determination means for determining whether a word group in the language information held by the holding means matches co-occurrence information; and
- output means that attaches co-occurrence presence/absence information to the natural language expression and outputs it. The output means does this when the co-occurrence determination means finds that the word group does not match the co-occurrence information (or, alternatively, when it does match), and the attached information indicates that fact.

[0009] With this configuration, the language information that constitutes the natural language expression obtained by converting the input is examined to determine whether the word group in it matches the co-occurrence information. When the word group does not match, or when it does match, co-occurrence presence/absence information indicating that fact is attached to the natural language expression and output.
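The three means recited in [0008] and [0009] can be sketched in a few lines. This is a hedged illustration, not the claimed implementation. The class and function names, the `KNOWN_PAIRS` data, and the string `"cooccurrence-marker"` are hypothetical. The `mark_matches` switch models the two alternatives the text allows: marking a non-match, or marking a match.

```python
from dataclasses import dataclass, field

# Hypothetical co-occurrence data standing in for the stored co-occurrence information.
# Illustrative names throughout; none of them come from the specification.
KNOWN_PAIRS = {("芝居", "公演")}

@dataclass
class HeldExpression:
    """Holding means: language information for one converted expression."""
    surface: str
    word_pairs: list = field(default_factory=list)  # candidate (word, word) pairs

def matches_cooccurrence(held):
    """Co-occurrence determination means: does any word pair match known data?"""
    return any(pair in KNOWN_PAIRS for pair in held.word_pairs)

def output_with_marker(held, mark_matches=False):
    """Output means: attach a marker when the word group does NOT match
    (default) or when it DOES match (mark_matches=True)."""
    matched = matches_cooccurrence(held)
    flagged = matched if mark_matches else not matched
    return held.surface, ("cooccurrence-marker" if flagged else None)

print(output_with_marker(HeldExpression("資料を公演する", [("資料", "公演")])))
```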

[0010] The user can therefore easily tell, from the co-occurrence presence/absence information attached to the output natural language expression, whether the expression was judged to be a known co-occurrence or has no such stored co-occurrence. This helps the user understand the output natural language expression. It also improves document quality during editing and the efficiency of document creation.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. A kana-kanji conversion device is used here as the example of a natural language processing device that performs natural language processing using co-occurrence information.

[0012] FIG. 1 is a block diagram showing the schematic configuration of a kana-kanji conversion device according to an embodiment of the present invention. The device runs on an information processing apparatus such as a computer. It consists of several hardware parts, such as an input unit, a display unit, and a storage unit, and several software parts built as a program.

[0013] Broadly, the kana-kanji conversion device consists of an input unit 101, an edit control unit 102, a display unit 103, a word search unit 104, a phrase candidate generation unit 105, a phrase candidate selection unit 106, a connection table 107, a word string candidate holding unit 108, a word dictionary 109, and a co-occurrence dictionary 110.

[0014] Kana-kanji conversion in a device configured this way consists broadly of two processes. The first process determines the phrase (bunsetsu) boundaries in the input kana character string. It does so using information on how independent words connect to adjunct words, how adjunct words connect to each other, and similar information.

[0015] This process corresponds to the processing in the input unit 101, the word search unit 104, and the phrase candidate generation unit 105 of FIG. 1. It is described in detail in Japanese Patent Laid-Open No. 3-265061, so it is only summarized here.

[0016] First, the kana information to be converted is input from the input unit 101 and passed sequentially through the edit control unit 102 to the word search unit 104. The word search unit 104 extracts word candidates by referring to the word dictionary 109 and sends them to the phrase candidate generation unit 105. The phrase candidate generation unit 105 generates phrase candidates from the word candidates by referring to the connection table 107. The connection table 107 stores connection information between independent words and adjunct words, and between pairs of adjunct words. The phrase candidate generation unit 105 sends its result to the phrase candidate selection unit 106, and the result is also stored in the word string candidate holding unit 108.
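A minimal sketch of the first process follows. It assumes a toy dictionary, and it expresses the connection table as allowed pairs of adjacent parts of speech. The real device distinguishes independent words from adjunct words and builds phrase candidates as described in JP 3-265061. The entries, the POS labels, and the exhaustive recursive search below are illustrative assumptions only.

```python
# Hypothetical toy word dictionary: reading -> list of (surface, part of speech).
WORD_DICT = {
    "しばい": [("芝居", "noun")],
    "を": [("を", "particle")],
    "こうえん": [("公演", "sahen-noun"), ("後援", "sahen-noun")],
    "する": [("する", "verb")],
}

# Hypothetical connection table: (left POS, right POS) pairs allowed to be adjacent.
CONNECTION_TABLE = {
    ("noun", "particle"),
    ("particle", "sahen-noun"),
    ("sahen-noun", "verb"),
}

def word_sequences(kana, prev_pos=None):
    """Yield every sequence of (surface, pos) that spells `kana` exactly and
    whose adjacent parts of speech are permitted by CONNECTION_TABLE."""
    if not kana:
        yield []
        return
    for end in range(1, len(kana) + 1):
        for surface, pos in WORD_DICT.get(kana[:end], []):
            if prev_pos is not None and (prev_pos, pos) not in CONNECTION_TABLE:
                continue  # connection not allowed: prune this branch
            for rest in word_sequences(kana[end:], pos):
                yield [(surface, pos)] + rest

for seq in word_sequences("しばいをこうえんする"):
    print("".join(surface for surface, _ in seq))
```

Both homophones of こうえん survive the connection check. Choosing between them is left to the second process.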

[0017] The second process determines the first candidate from among the one or more word string candidates generated in the first process. The phrase candidate selection unit 106 selects the first candidate from the word string candidates and sends it to the edit control unit 102. The edit control unit 102 receives the kana-kanji conversion result, the kana string before conversion, lists of alternative candidates for homonyms, and similar data. It stores these temporarily and sends the character strings to be displayed to the display unit 103. The edit control unit 102 also receives inputs for ordinary editing operations from the input unit 101, such as moving the cursor, deleting character strings, and selecting homonyms. It then performs the corresponding predetermined editing processing.

[0018] As shown in FIG. 2, the word dictionary 109 holds the following for each word: a word number, reading information, heading (headword) information, part-of-speech information, and feature information. For example, the word "異議" (igi, "objection") has the word number "001", the reading "いぎ", the heading "異議", the part of speech "noun", and the feature information "抑制" (suppression). The other words are registered in the same way.
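One way to model a word dictionary record from FIG. 2 is shown below, together with a reading-keyed index of the kind the word search unit 104 might consult. This is a sketch under assumptions. The field names are illustrative. Entry 001 follows [0018]. Entry 005 (唱える) uses its standard reading and part of speech, but the text gives no feature value for it, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class WordEntry:
    number: str              # word number, e.g. "001"
    reading: str             # kana reading
    heading: str             # headword (surface form)
    pos: str                 # part of speech
    feature: Optional[str]   # feature information (None where not given)

# Entry 001 follows [0018]; entry 005 is illustrative (its feature value is
# not given in the text, so it is None here).
ENTRIES = [
    WordEntry("001", "いぎ", "異議", "noun", "抑制"),
    WordEntry("005", "となえる", "唱える", "verb", None),
]

def build_reading_index(entries):
    """Map each reading to its entries (hypothetical lookup structure)."""
    index = {}
    for entry in entries:
        index.setdefault(entry.reading, []).append(entry)
    return index

index = build_reading_index(ENTRIES)
print([e.heading for e in index["いぎ"]])
```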

[0019] As shown in FIG. 3, each entry of the co-occurrence dictionary 110 stores three items: the word number of a co-occurring word, the word number of the word it co-occurs with, and relation information describing the co-occurrence relation between the two words. The word numbers are those assigned in the word dictionary 109 shown in FIG. 2. For example, one registered entry records that "異議" (word number 001) and "唱える" (tonaeru, "to voice"; word number 005) are used together in the form "異議を唱える" ("to raise an objection"), with the particle "を" (o) between them. The other co-occurrence entries are registered in the same way.
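The entry described in [0019] can be modeled as a lookup keyed by ordered word-number pairs. This is a sketch under assumptions: it treats the relation information as a single joining particle and treats the order of the pair as significant. The names `COOCCURRENCE_DICT`, `HEADWORDS`, `relation`, and `surface_form` are hypothetical.

```python
# Hypothetical in-memory form of the co-occurrence dictionary of FIG. 3:
# (co-occurring word number, co-occurred word number) -> relation (joining particle).
COOCCURRENCE_DICT = {("001", "005"): "を"}

# Word numbers -> headwords, as in the word dictionary (FIG. 2).
HEADWORDS = {"001": "異議", "005": "唱える"}

def relation(first, second):
    """Return the relation particle if (first, second) is registered, else None."""
    return COOCCURRENCE_DICT.get((first, second))

def surface_form(first, second):
    """Build the co-occurring expression, e.g. "異議" + "を" + "唱える"."""
    particle = relation(first, second)
    if particle is None:
        return None
    return HEADWORDS[first] + particle + HEADWORDS[second]

print(surface_form("001", "005"))
```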

[0020] In this embodiment, two operations are performed within the phrase candidate selection unit 106: looking up the co-occurrence dictionary 110, and comparing the co-occurrence information with the phrase sequence candidates. Both can be implemented by methods such as those described in Japanese Patent Laid-Open No. 3-265061 and Japanese Patent Application No. 5-52245.

[0021] In this embodiment, the language information that constitutes the natural language expression is the phrase sequence candidate held in the word string candidate holding unit 108. The processing of the edit control unit 102 in the first embodiment is described next.

[0022] When the phrase candidate selection unit 106 finishes determining the first candidate, the edit control unit 102 sends the result to the display unit 103. The phrase candidate selection unit 106 has already compared each word of the word string candidate in the word string candidate holding unit 108 with the co-occurrence information. Based on that comparison, the edit control unit 102 attaches highlighting instruction information that displays the parts not matching the co-occurrence information with a red underline. It then outputs the result to the display unit 103.

[0023] As a concrete example, assume that the word dictionary 109 contains the dictionary information shown in FIG. 4, and that the co-occurrence dictionary 110 contains the co-occurrence information shown in FIG. 5(a).

[0024] For example, the input kana character string "しりょうをこうえんするとき" (shiryō o kōen suru toki) yields the candidate character strings "資料を公演する時" ("when performing the materials") and "資料を後援する時" ("when sponsoring the materials"). Of these, "資料を公演する時" is displayed on the display unit 103 as the first candidate, based on the frequency information of each word and similar data.

[0025] The candidate character string "資料を公演する" contains no word pair that matches the co-occurrence information. As shown in FIG. 6(a), the portion "資料を公演する" is therefore displayed with a red underline. Here the underline serves as co-occurrence presence/absence information and indicates that no co-occurrence was found.

[0026] On the other hand, the input kana character string "しばいをこうえんするとき" (shibai o kōen suru toki) yields the candidate character strings "芝居を公演する時" ("when staging a play") and "芝居を後援する時" ("when sponsoring a play"). Of these, "芝居を公演する時" is displayed on the display unit 103 as the first candidate. In this case, "芝居" (play) and "公演" (performance) in "芝居を公演する時" match the co-occurrence information, so no red underline is displayed, as shown in FIG. 6(b).

[0027] FIG. 7 is a flowchart showing the processing in the first embodiment. When a kana character string to be converted is input from the input unit 101, the word search unit 104 extracts the word candidates that correspond to it. The phrase candidate generation unit 105 then generates phrase candidates from those word candidates. The result obtained by the phrase candidate generation unit 105 is stored in the word string candidate holding unit 108.

[0028] At this time, the phrase candidate selection unit 106 uses the co-occurrence dictionary 110 to look up the co-occurrence information for each word of the phrase candidates. If a word matches the co-occurrence information, the unit attaches flag information indicating the match.
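The flagging step in [0028] is sketched below. A phrase is modeled as a list of words. A phrase is flagged if one of its words forms a known pair with a word in another phrase of the same candidate. This is an assumption-laden illustration. The matching rule, the check of both pair orders, and the `KNOWN_PAIRS` contents (a stand-in for FIG. 5(a)) are hypothetical and not taken from the specification.

```python
from itertools import combinations

# Hypothetical stand-in for the co-occurrence data of FIG. 5(a).
KNOWN_PAIRS = {("芝居", "公演")}

def flag_phrases(phrases):
    """Return one flag per phrase: True if a word in the phrase forms a known
    co-occurrence with a word in some other phrase of the same candidate.
    `phrases` is a list of word lists, e.g. [["芝居", "を"], ["公演", "する"]]."""
    flags = [False] * len(phrases)
    for i, j in combinations(range(len(phrases)), 2):
        for a in phrases[i]:
            for b in phrases[j]:
                if (a, b) in KNOWN_PAIRS or (b, a) in KNOWN_PAIRS:
                    flags[i] = flags[j] = True
    return flags

print(flag_phrases([["芝居", "を"], ["公演", "する"]]))  # both phrases flagged
print(flag_phrases([["資料", "を"], ["公演", "する"]]))  # neither phrase flagged
```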

[0029] The edit control unit 102 then obtains, through the phrase candidate selection unit 106, the first-ranked candidate character string to be output as the conversion result (step A11). It refers to the first phrase in that candidate character string and checks whether the co-occurrence match flag is set for that phrase (step A12).

[0030] If the co-occurrence match flag is not set for the phrase (No in step A13), the edit control unit 102 attaches highlighting instruction information to the surface character string information of that phrase (step A14). If the flag is set (Yes in step A13), no highlighting instruction information is attached.

[0031] In this way, the edit control unit 102 checks each phrase of the first-ranked candidate character string for co-occurrence information (steps A15 and A16). When it finds a phrase without co-occurrence information, it attaches highlighting instruction information to that phrase's character string. It then outputs the result to the display unit 103 (step A17). As a result, when a candidate character string is output, the user can easily see from the presence or absence of a red underline whether the expression is a known co-occurrence.
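The loop of steps A11 to A17 can be sketched as a single pass over the phrases of the first-ranked candidate, taking the match flags as given. This is a minimal sketch under assumptions. The `"red-underline"` value stands in for the highlighting instruction information. The `_..._` rendering is a hypothetical text notation for the red underline. Merging adjacent underlined phrases into one run is a presentation choice that mirrors FIG. 6(a).

```python
from itertools import groupby

def annotate(phrases, flags):
    """Steps A12-A16: visit every phrase of the first-ranked candidate and attach
    a red-underline instruction to each phrase whose match flag is NOT set."""
    return [(text, None if matched else "red-underline")
            for text, matched in zip(phrases, flags)]

def render(annotated):
    """Step A17 stand-in: merge adjacent phrases with the same instruction and
    mark underlined runs with a hypothetical _..._ notation."""
    out = []
    for attr, group in groupby(annotated, key=lambda item: item[1]):
        text = "".join(t for t, _ in group)
        out.append(f"_{text}_" if attr == "red-underline" else text)
    return "".join(out)

print(render(annotate(["資料を", "公演する"], [False, False])))  # _資料を公演する_
print(render(annotate(["芝居を", "公演する"], [True, True])))    # 芝居を公演する
```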

[0032] Next, the processing of the edit control unit 102 in the second embodiment is described. When the phrase candidate selection unit 106 finishes determining the first candidate, the edit control unit 102 sends the result to the display unit 103. The phrase candidate selection unit 106 has already compared each word of the word string candidate in the word string candidate holding unit 108 with the co-occurrence information. Based on that comparison, the edit control unit 102 attaches display instruction information that shows the parts matching the co-occurrence information in light gray characters (characters with thinned-out dots). It then outputs the result to the display unit 103.

[0033] As a concrete example, assume that the word dictionary 109 contains the dictionary information shown in FIG. 4, and that the co-occurrence dictionary 110 contains the co-occurrence information shown in FIG. 5(a).

[0034] For example, the input kana character string "しりょうをこうえんするとき" yields the candidate character strings "資料を公演する時" and "資料を後援する時". Of these, "資料を公演する時" is displayed on the display unit 103 as the first candidate, based on the frequency information of each word and similar data.

[0035] The candidate character string "資料を公演する" contains no word pair that matches the co-occurrence information, so it is displayed in normal (default) characters. On the other hand, the input kana character string "しばいをこうえんするとき" yields the candidate character strings "芝居を公演する時" and "芝居を後援する時". Of these, "芝居を公演する時" is displayed on the display unit 103 as the first candidate. In this case, "芝居" and "公演" in "芝居を公演する" match the co-occurrence information, so this portion is displayed in light gray characters.

[0036] FIG. 8 is a flowchart showing the processing in the second embodiment. When a kana character string to be converted is input from the input unit 101, the word search unit 104 extracts the word candidates that correspond to it. The phrase candidate generation unit 105 then generates phrase candidates from those word candidates. The result obtained by the phrase candidate generation unit 105 is stored in the word string candidate holding unit 108.

[0037] At this time, the phrase candidate selection unit 106 uses the co-occurrence dictionary 110 to look up the co-occurrence information for each word of the phrase candidates. If a word matches the co-occurrence information, the unit attaches flag information indicating the match.

[0038] The edit control unit 102 then obtains, through the phrase candidate selection unit 106, the first-ranked candidate character string to be output as the conversion result (step B11). It refers to the first phrase in that candidate character string and checks whether the co-occurrence match flag is set for that phrase (step B12).

[0039] If the co-occurrence match flag is set for the phrase (Yes in step B13), the edit control unit 102 attaches suppressed-display instruction information to the surface character string information of that phrase (step B14). If the flag is not set (No in step B13), no suppressed-display instruction information is attached.

[0040] In this way, the edit control unit 102 checks each phrase of the first-ranked candidate character string for co-occurrence information (steps B15 and B16). When it finds a phrase with co-occurrence information, it attaches suppressed-display instruction information to that phrase's character string. It then outputs the result to the display unit 103 (step B17). As a result, when a candidate character string is output, the user can easily see from the display density of the characters whether the expression is a known co-occurrence.
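The second embodiment inverts the test: matched phrases are de-emphasized rather than unmatched phrases being emphasized. The sketch below approximates the light gray, dot-thinned characters with an ANSI terminal color. The ANSI codes and the `"suppressed"` value are assumptions standing in for the suppressed-display instruction information. SGR 90 (bright black) is usually rendered as gray.

```python
from itertools import groupby

GRAY, RESET = "\x1b[90m", "\x1b[0m"  # ANSI SGR codes: bright-black (gray) text, reset

def annotate(phrases, flags):
    """Steps B12-B16: attach a suppressed-display instruction to each phrase
    whose co-occurrence match flag IS set."""
    return [(text, "suppressed" if matched else None)
            for text, matched in zip(phrases, flags)]

def to_terminal(annotated):
    """Step B17 stand-in: render suppressed runs in gray on an ANSI terminal."""
    out = []
    for attr, group in groupby(annotated, key=lambda item: item[1]):
        text = "".join(t for t, _ in group)
        out.append(f"{GRAY}{text}{RESET}" if attr == "suppressed" else text)
    return "".join(out)

print(to_terminal(annotate(["芝居を", "公演する"], [True, True])))
print(to_terminal(annotate(["資料を", "公演する"], [False, False])))
```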

[0041] Next, the operation of the edit control unit 102 in the third embodiment is described. Here the edit control unit 102 consists of a co-occurrence presence/absence information display processing section and an edit processing section. If a character code sent to the edit processing section is not the code of the display key for showing co-occurrence presence/absence information, the section performs the processing that corresponds to that code.

[0042] If the code is that of the co-occurrence presence/absence information display key, the edit control unit 102 refers to the word string candidate holding unit 108 and reads out the co-occurrence presence/absence information. It attaches a code to each word pair that does not match the co-occurrence information so that the pair is shown in reverse video (inverted black and white) on the display unit 103. It then outputs the result to the display unit 103.

[0043] As a concrete example, assume that the word dictionary 109 contains the dictionary information shown in FIG. 4, and that the co-occurrence dictionary 110 contains the co-occurrence information shown in FIG. 5(a).

[0044] For example, the input kana character string "しりょうをこうえんするとき" yields the candidate character strings "資料を公演する時" and "資料を後援する時". Of these, "資料を公演する時" is displayed on the display unit 103 as the first candidate, based on the frequency information of each word and similar data.

[0045] If the user now presses the co-occurrence presence/absence information display key on the input unit 101, the portion "資料を公演する" is shown in reverse video, because it has no co-occurrence information.

[0046] On the other hand, the input kana character string "しばいをこうえんするとき" yields the candidate character strings "芝居を公演する時" and "芝居を後援する時". Of these, "芝居を公演する時" is displayed on the display unit 103 as the first candidate. In this case, the displayed candidate does not change even when the co-occurrence presence/absence information display key is pressed.

[0047] FIG. 9 is a flowchart showing the processing in the third embodiment. When a kana character string to be converted is input from the input unit 101, the word search unit 104 extracts the word candidates that correspond to it. The phrase candidate generation unit 105 then generates phrase candidates from those word candidates. The result obtained by the phrase candidate generation unit 105 is stored in the word string candidate holding unit 108.

[0048] At this time, the phrase candidate selection unit 106 uses the co-occurrence dictionary 110 to look up the co-occurrence information for each word of the phrase candidates. If a word matches the co-occurrence information, the unit attaches flag information indicating the match.

[0049] The edit control unit 102 then obtains, through the phrase candidate selection unit 106, the first-ranked candidate character string to be output as the conversion result (step C11). It refers to the first phrase in that candidate character string and checks whether the co-occurrence match flag is set for that phrase (step C12).

[0050] If the co-occurrence match flag is not set for the phrase (No in step C13), the edit control unit 102 attaches highlighting instruction information to the surface character string information of that phrase (step C14). If the flag is set (Yes in step C13), no highlighting instruction information is attached.

[0051] In this way, the edit control unit 102 checks each phrase of the first-ranked candidate character string for co-occurrence information (steps C15 and C16). When it finds a phrase without co-occurrence information, it attaches highlighting instruction information to that phrase's character string. It then outputs the result to the display unit 103 (step C17).

[0052] The display unit 103 then displays the first-ranked candidate character string (step C18). If the display code for co-occurrence presence/absence information is input from the input unit 101 at that point (Yes in step C19), the edit control unit 102 shows the portions carrying highlighting instruction information in reverse video (step C20). As a result, when a candidate character string is output, pressing the co-occurrence presence/absence information display key reveals reverse-video portions. From these, the user can easily check whether the expression is a known co-occurrence.
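In the third embodiment, the highlighting instructions are computed in advance (steps C12 to C17) but only become visible after the display key is pressed (steps C19 and C20). A small view object can model this. This is a hedged sketch, not the device's code. `SHOW_KEY`, `CandidateView`, and the ANSI reverse-video rendering are hypothetical. The assumption that the key reveals the highlighting and does not toggle it back is also ours.

```python
from itertools import groupby

REVERSE, RESET = "\x1b[7m", "\x1b[0m"  # ANSI SGR codes: reverse video, reset
SHOW_KEY = "COOC_DISPLAY"              # hypothetical code of the display key

class CandidateView:
    def __init__(self, phrases, flags):
        # Steps C12-C17: precompute which phrases carry a highlight instruction
        # (those whose co-occurrence match flag is NOT set).
        self.items = [(text, not matched) for text, matched in zip(phrases, flags)]
        self.reveal = False

    def on_key(self, code):
        """Step C19: only the display key turns on reverse video; any other
        code would go to ordinary editing, which is not modeled here."""
        if code == SHOW_KEY:
            self.reveal = True

    def render(self):
        """Steps C18/C20: plain text until the key is pressed, after which the
        highlighted runs are shown in reverse video."""
        out = []
        for highlighted, group in groupby(self.items, key=lambda item: item[1]):
            text = "".join(t for t, _ in group)
            out.append(f"{REVERSE}{text}{RESET}" if highlighted and self.reveal else text)
        return "".join(out)

view = CandidateView(["資料を", "公演する"], [False, False])
print(view.render())   # plain text
view.on_key(SHOW_KEY)
print(view.render())   # reverse video
```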

[0053] The present invention is not limited to the above embodiment and can be applied in various ways without departing from its spirit. For example, in a kana-kanji conversion system, the conversion result carrying co-occurrence presence/absence information may be output and then interpreted and displayed by another system.

[0054] Furthermore, during kana-kanji conversion, if the words of a second- or lower-ranked candidate covering the same span of the reading string as the first candidate match co-occurrence information, information indicating this can also be attached to the candidate character string output as the first candidate, and that string can be highlighted.

[0055] For example, suppose the reading string "しばいをこうえんする" (shibai o kōen suru) is input and the co-occurrence information shown in FIG. 5(b) is registered in the co-occurrence dictionary 110. Then, besides the first-ranked candidate "芝居を公演する" ("perform a play"), the second-ranked candidate "芝居を後援する" ("sponsor a play") also matches the co-occurrence information.

[0056] In terms of the example of the first embodiment above, the portion "芝居を公演する" is therefore displayed with a light red underline. This lets the user know in advance that the next-candidate operation will output another candidate whose co-occurrence information matches.
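The check in [0054] to [0056] can be sketched as follows. It assumes that each candidate for the same reading span is reduced to a (noun, verb) pair and that the co-occurrence dictionary is a set of such pairs. The dictionary contents and the third candidate "講演する" below are hypothetical, chosen only to reproduce the "芝居を公演する" / "芝居を後援する" example; they are not the actual contents of FIG. 5(b).

```python
from typing import Optional

Pair = tuple[str, str]  # (noun, verb); a simplified, hypothetical representation


def first_candidate_style(candidates: list[Pair], cooccurrence: set[Pair]) -> Optional[str]:
    """Return a display style for the first-ranked candidate, or None.

    `candidates` is ranked best-first and covers the same reading span.
    A light red underline signals that a lower-ranked candidate also matches
    co-occurrence information, i.e. the next-candidate operation will offer
    another known expression.
    """
    _first, *lower_ranked = candidates
    if any(pair in cooccurrence for pair in lower_ranked):
        return "light-red-underline"
    return None


# Hypothetical dictionary contents for the reading "しばいをこうえんする".
cooc = {("芝居", "公演する"), ("芝居", "後援する")}
ranked = [("芝居", "公演する"), ("芝居", "後援する"), ("芝居", "講演する")]
print(first_candidate_style(ranked, cooc))  # light-red-underline
```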

[0057] Besides kana-kanji conversion, the invention can be applied to machine translation, that is, natural language processing in which the input is in a first language (for example, English) and the output is in a second language (for example, Japanese). By providing means for referring to co-occurrence information for the language information making up the Japanese output and comparing against it, the portions of the translation that do not match, or that do match, the co-occurrence information can be highlighted.

[0058] The co-occurrence information referred to may come from any source: a co-occurrence dictionary held inside the system, part of a word dictionary held inside the system, a dictionary learned during earlier processing, a dictionary outside the system, or a search of document data inside or outside the system.

[0059] The co-occurrence information is not limited to word-to-word pairs; it may also relate a code representing a word group (a so-called semantic feature) to a word, or one word-group code to another. Words that match neither word-to-word co-occurrence information nor code-containing co-occurrence information may be marked with a red underline, and words that match only code-containing co-occurrence information with a light red underline.
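The two-tier underline rule in [0059] can be sketched as a lookup that tries word-to-word co-occurrence first and falls back to co-occurrence keyed on a semantic code. The code name `PERFORMANCE` and all dictionary contents are hypothetical, and only code-to-word entries are handled; code-to-code entries would need one more lookup on the verb's code.

```python
from typing import Optional


def underline_for(
    noun: str,
    verb: str,
    word_pairs: set[tuple[str, str]],
    code_pairs: set[tuple[str, str]],
    semantic_code: dict[str, str],
) -> Optional[str]:
    """Classify a (noun, verb) pair for display.

    None        -> matches word-to-word co-occurrence information (no underline)
    "light-red" -> matches only code-containing co-occurrence information
    "red"       -> matches neither kind
    """
    if (noun, verb) in word_pairs:
        return None
    code = semantic_code.get(noun)
    if code is not None and (code, verb) in code_pairs:
        return "light-red"
    return "red"


codes = {"芝居": "PERFORMANCE", "演劇": "PERFORMANCE"}  # hypothetical semantic codes
word_pairs = {("芝居", "公演する")}
code_pairs = {("PERFORMANCE", "上演する")}
print(underline_for("演劇", "上演する", word_pairs, code_pairs, codes))  # light-red
```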

[0060] Co-occurrence information covering three or more words may also be used. In addition, a word pair that matches co-occurrence information may be given, as its co-occurrence presence/absence information, information indicating where the matching co-occurrence information is located, for example information for referring to that location.

[0061] Such reference information may be, for example, address information in the dictionary when the co-occurrence information is a dictionary inside the system, or information indicating the location of an expression in document data on a CD-ROM outside the system.
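Sources of co-occurrence information ([0058]), entries of three or more words ([0060]) and reference information ([0061]) fit into one lookup that walks an ordered list of sources and reports where a match was found. The source names and locator strings below are hypothetical placeholders, not formats defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional

Entries = dict[tuple[str, ...], str]  # word tuple (two or more words) -> locator


@dataclass(frozen=True)
class Match:
    source: str   # which co-occurrence source matched
    locator: str  # information for referring to the entry, e.g. an address


def find_cooccurrence(words: tuple[str, ...],
                      sources: list[tuple[str, Entries]]) -> Optional[Match]:
    """Search the sources in order; return the first match with its location."""
    for name, entries in sources:
        locator = entries.get(words)
        if locator is not None:
            return Match(name, locator)
    return None


sources = [
    ("internal-dictionary", {("芝居", "公演する"): "addr:0x0100"}),
    ("external-cdrom", {("劇団", "芝居", "公演する"): "doc12:pos345"}),
]
print(find_cooccurrence(("劇団", "芝居", "公演する"), sources))
```

Returning the `Match` object rather than a bare boolean lets the display layer attach the locator as co-occurrence presence/absence information.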

[0062] The usefulness of the co-occurrence presence/absence information can also be increased by classifying it according to the type of co-occurrence information referred to, for example the syntactic relation between the words (a compound word, a subject and predicate, and so on), the frequency of the co-occurrence, its field, its period, the data it belongs to, or its likelihood.

[0063] The amount of co-occurrence presence/absence information presented for reference can also be kept appropriate by combining the check with a process that tests whether word pairs in the language information satisfy a preset part-of-speech sequence condition, for example part of speech + particle + verb, and then detecting, for example, only word pairs that satisfy the condition but do not match the co-occurrence information.

[0064] Likewise, by combining the check with a process that tests whether word pairs in the language information satisfy a preset part-of-speech sequence condition such as adverb + verb, and detecting, for example, word pairs that satisfy the condition but do not match the co-occurrence information, the amount of co-occurrence presence/absence information for reference can be kept appropriate.

[0065] Likewise, by combining the check with a process that tests whether word pairs in the language information satisfy a preset part-of-speech sequence condition such as noun + verb, and detecting, for example, word pairs that satisfy the condition but do not match the co-occurrence information, the amount of co-occurrence presence/absence information for reference can be kept appropriate.

[0066] Likewise, by combining the check with a process that tests whether word pairs in the language information satisfy a preset part-of-speech sequence condition such as adverb + verb, and detecting, for example, word pairs that satisfy the condition but do not match the co-occurrence information, the amount of co-occurrence presence/absence information for reference can be kept appropriate.
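The part-of-speech filters in [0063] to [0066] can be sketched as a scan over a tagged token sequence that keeps only word pairs whose span fits one of the preset patterns and that are absent from the co-occurrence information. The tag names are hypothetical, and the first pattern reads the [0063] condition as noun + particle + verb, which is an assumption.

```python
Token = tuple[str, str]  # (surface, part-of-speech tag); tags are hypothetical

# Preset part-of-speech sequence conditions; the word pair is the first and
# last token of each matching span. Noun + particle + verb is an assumed
# reading of [0063]; adverb + verb and noun + verb follow [0064]-[0065].
PATTERNS: list[tuple[str, ...]] = [
    ("noun", "particle", "verb"),
    ("adverb", "verb"),
    ("noun", "verb"),
]


def flag_pairs(tokens: list[Token], cooccurrence: set[tuple[str, str]],
               patterns: list[tuple[str, ...]] = PATTERNS) -> list[tuple[str, str]]:
    """Return word pairs that satisfy a pattern but match no co-occurrence entry."""
    flagged = []
    for i in range(len(tokens)):
        for pattern in patterns:
            span = tokens[i:i + len(pattern)]
            if len(span) != len(pattern):
                continue  # pattern runs past the end of the sentence
            if all(pos == tag for (_, pos), tag in zip(span, pattern)):
                pair = (span[0][0], span[-1][0])
                if pair not in cooccurrence:
                    flagged.append(pair)
    return flagged


tokens = [("芝居", "noun"), ("を", "particle"), ("公演する", "verb"),
          ("ゆっくり", "adverb"), ("歩く", "verb")]
print(flag_pairs(tokens, {("芝居", "公演する")}))  # [('ゆっくり', '歩く')]
```

Restricting attention to pattern-matching pairs is what keeps the number of flagged items small: unmatched pairs outside these patterns are never reported.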

[0067] Furthermore, a feature assigned to particular words (binary or multivalued, and also called characteristic information or a property) can be used, for example a feature assigned to "nouns that readily take adnominal modification by a predicate." By combining the check with a process that tests whether words in the language information have this feature, and detecting, for example, word pairs in which a noun lacking the feature is adnominally modified by a predicate and which do not match the co-occurrence information, the amount of co-occurrence presence/absence information for reference can be kept appropriate.
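The feature-based filter in [0067] can be sketched in the same style. As a simplification, this assumes that a verb immediately followed by a noun is an adnominal modifier of that noun, and it uses a hypothetical binary feature name `takes_adnominal_modifier`; real adnominal-clause detection would need a parser.

```python
FEATURE = "takes_adnominal_modifier"  # hypothetical binary feature name


def flag_adnominal(tokens: list[tuple[str, str]], features: dict[str, set[str]],
                   cooccurrence: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (verb, noun) pairs where the verb adnominally modifies the noun,
    the noun lacks FEATURE, and the pair matches no co-occurrence entry."""
    flagged = []
    for (w1, pos1), (w2, pos2) in zip(tokens, tokens[1:]):
        if pos1 == "verb" and pos2 == "noun":  # simplified adnominal check
            if FEATURE not in features.get(w2, set()) and (w1, w2) not in cooccurrence:
                flagged.append((w1, w2))
    return flagged


tokens = [("公演する", "verb"), ("劇場", "noun"), ("読む", "verb"), ("こと", "noun")]
features = {"こと": {FEATURE}}  # "こと" readily takes adnominal modification
print(flag_adnominal(tokens, features, set()))  # [('公演する', '劇場')]
```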

[0068]

[Effects of the Invention] As described above, according to the present invention, the language information making up a natural language expression obtained by converting an input is checked to determine whether word groups in it match co-occurrence information, and when a word group does not match, or does match, the co-occurrence information, co-occurrence presence/absence information indicating this is attached to the natural language expression and output. When a natural language expression is output, the user can therefore easily confirm whether it is known as a co-occurrence. This helps the user understand the output expression and, further, improves document quality during editing and the efficiency of document creation.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a kana-kanji conversion device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the word dictionary of the device.

FIG. 3 is a diagram showing the configuration of the co-occurrence dictionary of the device.

FIG. 4 is a diagram showing a specific example of the word dictionary.

FIG. 5 is a diagram showing a specific example of the co-occurrence dictionary.

FIG. 6 is a diagram showing a display example according to the first embodiment.

FIG. 7 is a flowchart showing the processing operation according to the first embodiment.

FIG. 8 is a flowchart showing the processing operation according to the second embodiment.

FIG. 9 is a flowchart showing the processing operation according to the third embodiment.

[Explanation of Reference Numerals]

101: input unit, 102: edit control unit, 103: display unit, 104: word search unit, 105: phrase candidate generation unit, 106: phrase candidate selection unit, 107: connection table, 108: word string candidate holding unit, 109: word dictionary, 110: co-occurrence dictionary.

Continuation of the front page: (72) Inventor: Tatsuya Dewa, c/o Toshiba Corporation Research and Development Center, 1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa. (72) Inventor: Yumi Mizutani, c/o Toshiba Corporation Research and Development Center, 1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa.

Claims

[Claims]

1. A natural language processing apparatus comprising: holding means for holding language information constituting a natural language expression obtained by converting an input; co-occurrence determining means for determining whether a word group in the language information held by the holding means matches co-occurrence information; and output means for outputting the natural language expression with co-occurrence presence/absence information attached, the co-occurrence presence/absence information indicating that the co-occurrence determining means has determined that the word group in the language information does not match, or matches, the co-occurrence information.

2. A natural language processing method comprising: checking, for language information constituting a natural language expression obtained by converting an input, whether a word group in the language information matches co-occurrence information; and, when the word group does not match, or matches, the co-occurrence information, attaching co-occurrence presence/absence information indicating this to the natural language expression and outputting it.