JPH07110844A

JPH07110844A - Japanese document processor

Info

Publication number: JPH07110844A
Application number: JP5255549A
Authority: JP
Inventors: Minako Kuwata; みな子桑田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1993-10-13
Filing date: 1993-10-13
Publication date: 1995-04-25

Abstract

(57) [Abstract] [Purpose] The present invention is a method for Japanese recognition input devices (for characters, speech, and the like) and Japanese keyboard input devices. It points out possible input errors by using a "three-character connection table for target characters" to check the connection frequency of every run of three consecutive target characters in text entered from a recognition device or a keyboard. Furthermore, recognition devices in particular already produce recognition candidate characters in the prior art. [Structure] A candidate character for a target character ranked first in the recognition result is substituted for that character, and the check is repeated. If the sequence containing the candidate has a higher connection frequency than the sequence containing the first-ranked character, and a certain condition is satisfied, the original first-ranked character is judged to be wrong and is automatically corrected to the higher-frequency candidate. Because covering too many characters causes adverse effects, the invention sets hiragana, katakana, and symbols frequently used in Japanese as the target characters.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a processing device used during input in Japanese speech recognition devices, character recognition devices, and word processors.

[0002]

[Prior Art] Conventionally, post-processing for Japanese input devices, and recognition devices in particular, has relied on collation against a word dictionary and on grammatical connection checks between words (Umeda, "Character determination capability in character recognition using a word dictionary," IEICE Transactions, Vol. J72-D-II, No. 1, pp. 22-31, January 1989; Ikehara et al., "Automatic detection of misprints in Japanese text by a word analysis program and extraction of correction candidates by a second-order Markov model," IPSJ Journal, Vol. 25, No. 2, pp. 298-305, March 1984).

[0003] There is also a method that uses two-character concatenation probabilities (Sugimura et al., "Determination of unreadable characters using character concatenation information," IECE Transactions, Vol. J68-D, No. 1, pp. 64-71, January 1985).

[0004] These methods can handle Japanese text containing every character type, including kanji, hiragana, and katakana, and their post-processing corrections are highly effective.

[0005]

[Problems to Be Solved by the Invention] The methods above perform post-processing by morphological analysis and have several problems. They require a huge word dictionary for the analysis. The processing is complex and time-consuming. Much of it is also wasted, because a great deal of time goes into processing kanji, which are rarely misrecognized.

[0006]

[Means for Solving the Problems] The present invention was made to solve the problems described above. It provides a Japanese document processing apparatus that points out and corrects input errors in Japanese documents. The apparatus comprises:
- a three-character connection frequency table of target characters, created by statistical processing of general text;
- means for extracting sequences of target characters from input Japanese text that includes kanji;
- means for checking, using the three-character connection frequency table, whether an extracted character sequence may be an input error;
- means for pointing out to the user, as a possibly erroneous spelling, any character sequence whose frequency in the check result is extremely low; and
- means for correcting such an extremely low-frequency sequence into a high-frequency sequence.

[0007]

[Operation] Input errors are unavoidable when text is entered. The invention points out and corrects them based on four observations, which makes it possible to reduce the overall number of input errors:
- errors in kanji are often obvious at a glance;
- kana input errors are hard to find at the proofreading stage;
- in character recognition and speech recognition devices, kana sequences carry fewer distinguishing features than kanji sequences, so they are misrecognized more often;
- the probability of three-character sequences is more reliable than the probability of two-character sequences.

[0008]

[Embodiments] An embodiment of the present invention implemented in an OCR system is described below with reference to the tables. The invention is not limited to this embodiment.

[0009] (Creation of the target-character three-character connection frequency table) First, the target-character three-character connection frequency table is created. Here the target characters are hiragana (83 characters), katakana (86 characters), and symbols. Japanese text data is input, strings made of these characters are extracted, and the number of times each three-character string occurs is obtained. The characters whose frequencies are counted (the hiragana, katakana, and symbols) are called target characters. A three-character string is a string of three consecutive target characters. Each character is ordered according to the target character code table (Table 1).

[0010]

[Table 1]
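Table 1 itself is not reproduced here, so the following is a hedged reconstruction. The counts of 83 hiragana and 86 katakana match the hiragana and katakana rows of JIS X 0208, and Unicode preserves that order. The position numbers quoted in the text ("ぁ"=0, "あ"=1, "ぃ"=2, "い"=3, "じ"=23, "も"=65, "の"=45) are consistent with numbering the hiragana from U+3041. The patent does not list its symbols, so SYMBOLS is a hypothetical placeholder. The names POSITION and is_target are also illustrative.

```python
# Hypothetical reconstruction of the target character code table (Table 1).
HIRAGANA = [chr(cp) for cp in range(0x3041, 0x3094)]  # ぁ (U+3041) .. ん (U+3093): 83 characters
KATAKANA = [chr(cp) for cp in range(0x30A1, 0x30F7)]  # ァ (U+30A1) .. ヶ (U+30F6): 86 characters
SYMBOLS = list("、。・ー「」")  # assumption: the patent does not say which symbols it uses

TARGET_CHARS = HIRAGANA + KATAKANA + SYMBOLS
# The position number of each target character is its index in the table.
POSITION = {ch: i for i, ch in enumerate(TARGET_CHARS)}


def is_target(ch: str) -> bool:
    """True if ch appears in the (reconstructed) target character code table."""
    return ch in POSITION
```

With this ordering, every three-character string maps to a triple of position numbers, and those triples are what the frequency table is indexed by.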

[0011] For example, if the target character code table is as shown in Table 1, position numbers follow the order of arrangement: "ぁ" is 0, "あ" is 1, "ぃ" is 2, and "い" is 3. Once the target code table is fixed, a large amount of Japanese text data is input and processed. FIG. 1 shows a flowchart of an example of the process that creates the target-character three-character connection frequency table. The process reads the Japanese text data one character at a time. For example, suppose the data shown in Table 2 below is input.

[0012]

[Table 2]

[0013] The process runs as follows.
(Step 1) Check whether a three-character string has been formed.
(Step 2) If it has, add 1 to the table entry at the position recorded in the three-character buffer. If it has not, skip the addition.
(Step 3) Read one character. If the end of the file has been reached, end processing.
(Step 4) Check whether the character read is a target character.
(Step 5) If the check in Step 4 finds a target character, shift the three-character buffer one position to the left and write the character's position number at the end of the buffer.
(Step 6) If the check in Step 4 finds a non-target character, clear the three-character buffer and return to Step 1.
For example, from the first data item in Table 2, "同じものとみなし、", this process extracts the six character strings shown in Table 3.
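The following is a minimal sketch of the Fig. 1 counting loop. It makes three assumptions. A JIS-ordered code table is rebuilt inline, with a hypothetical symbol set. A Python Counter keyed by position triples stands in for the dense target-characters-cubed array. The name count_trigrams is illustrative.

```python
from collections import Counter

# Hypothetical code table: hiragana, katakana, then an assumed symbol set.
TARGET_CHARS = ([chr(cp) for cp in range(0x3041, 0x3094)]
                + [chr(cp) for cp in range(0x30A1, 0x30F7)]
                + list("、。・ー「」"))
POSITION = {ch: i for i, ch in enumerate(TARGET_CHARS)}


def count_trigrams(text: str) -> Counter:
    """Count runs of three consecutive target characters, keyed by position triples."""
    counts = Counter()
    buf = []  # three-character buffer of position numbers
    for ch in text:  # Step 3: read one character; the loop ends at end of text
        pos = POSITION.get(ch)  # Step 4: is it a target character?
        if pos is None:  # Step 6: a non-target character clears the buffer
            buf.clear()
            continue
        buf.append(pos)  # Step 5: shift left, write the new position at the end
        if len(buf) > 3:
            buf.pop(0)
        if len(buf) == 3:  # Steps 1-2: a three-character string is complete
            counts[tuple(buf)] += 1
    return counts
```

On "同じものとみなし、" this yields the six strings じもの, ものと, のとみ, とみな, みなし, and なし、. The last one is counted only because "、" is in the assumed symbol set.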

[0014]

[Table 3]

[0015] From Table 1, the position of "じ" is 23, that of "も" is 65, and that of "の" is 45. Each time the string "じもの" appears, 1 is therefore added to table entry (23, 65, 45).

[0016] Once all the Japanese text data has been processed, the table holds the number of occurrences of each target-character string in that data. Dividing each count by the total count yields the "target-character three-character connection frequency table" (FIG. 2).
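Normalizing the counts as described here takes a single division. This sketch uses the hypothetical name to_frequency_table. It takes a sparse Counter of position triples and returns a dict of relative frequencies.

```python
from collections import Counter


def to_frequency_table(counts: Counter) -> dict:
    """Divide each occurrence count by the total number of counted strings."""
    total = sum(counts.values())
    if total == 0:
        return {}
    # Only strings that actually occurred get an entry; a missing key means frequency 0.
    return {key: n / total for key, n in counts.items()}
```

Keeping only the strings that occur matters for storage. A dense table indexed by the 169 kana alone would already need 169 × 169 × 169 = 4,826,809 cells.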

[0017] However, this table has (number of target characters) × (number of target characters) × (number of target characters) entries. That is too large to be practical as is. The table reduction program described below therefore stores only the target-character strings that actually occur, which shrinks the table to just under 3 KB.

[0018] (Method for reducing the target-character three-character connection table)
Essentially, this method shrinks the created "target-character three-character connection frequency table" by removing the entries whose frequency is 0 (flowcharts in FIGS. 3 and 4).

[0019] The reduction runs as follows.
(Step 1) Prepare a table whose size equals the number of target characters (the search table tabcount[MOJISUU]).
(Step 2) Use the first character of the three-character connection frequency table as the grouping key and count the entries for each key. With the first character fixed, add 1 to the search table whenever the frequency in the three-character connection frequency table is not 0.
(Step 3) Write the search table to the compressed table file.
(Step 4) Scan every element of the three-character connection frequency table. Whenever an entry is not 0, write the position of the second character, the position of the third character, and the frequency value at that position to the compressed table.
To read the compressed table file, perform these operations in reverse.
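Here is a sketch of the reduction and its inverse (FIGS. 3 and 4). Three parts follow the description: grouping by first character, the per-group count array, and the name tabcount. The binary layout is an assumption: little-endian 16-bit counts, one-byte positions, and 32-bit float frequencies. One-byte positions only work while there are fewer than 256 target characters.

```python
import struct

RECORD = struct.Struct("<BBf")  # (second position, third position, float32 frequency)


def compress(freq: dict, num_chars: int) -> bytes:
    """Keep only nonzero entries, grouped by first character, behind a count table."""
    nonzero = sorted((key, f) for key, f in freq.items() if f != 0)
    tabcount = [0] * num_chars  # search table: stored entries per first character
    for (a, _, _), _ in nonzero:
        tabcount[a] += 1
    out = bytearray(struct.pack(f"<{num_chars}H", *tabcount))  # write the search table
    for (_, b, c), f in nonzero:  # then one record per nonzero entry, in first-char order
        out += RECORD.pack(b, c, f)
    return bytes(out)


def decompress(data: bytes, num_chars: int) -> dict:
    """Reverse of compress: rebuild {(a, b, c): frequency}."""
    header = struct.Struct(f"<{num_chars}H")
    tabcount = header.unpack_from(data, 0)
    offset, freq = header.size, {}
    for a, count in enumerate(tabcount):  # the first character is implied by the group
        for _ in range(count):
            b, c, f = RECORD.unpack_from(data, offset)
            offset += RECORD.size
            freq[(a, b, c)] = f
    return freq
```

The first character is never stored per record, because the search table says how many consecutive records belong to each one. The decompressed frequencies are 32-bit approximations of the originals.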

[0020] (Input error indication and correction processing) The flow of input error indication and correction is described below (flowcharts in FIGS. 5, 6, and 7; block diagram in FIG. 8). It uses the compressed target-character three-character connection frequency table created by the above method.

[0021] An input device such as an OCR system produces its input as text data. This processing comprises five steps:
- extracting three-character strings of target characters, such as kana, from the entire input Japanese character string (target-character three-character string extraction step);
- looking up the frequency of each three-character string in the frequency table (table reference step);
- checking that frequency (check step);
- registering a three-character string as a possible input error when its frequency falls below a certain threshold (input error candidate extraction step); and
- notifying the user of the positions of the input error candidates (input error correction result output step).

[0022] The target-character three-character string extraction step reads the Japanese character string output by the OCR recognition unit one character at a time. For example, suppose the extraction step receives the following character string:

[0023]

[Table 4]

[0024] When this string is output from the OCR recognition unit, "同" is read into the buffer first. It is checked against the target character code table to see whether it is a target character. It is not, so nothing is done and the next character is read.

Next, "じ" is read and checked in the same way. It is a target character, so it is copied into the three-character buffer and the next character is read. "も" is also a target character, so it is copied into the buffer after "じ", and "の" is likewise copied into the buffer. At this point a three-character string of target characters is complete, and the extraction step ends.
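The extraction step can be sketched as splitting the recognized text into runs of consecutive target characters and then sliding a three-character window over each run. This sketch works on strings directly rather than on code positions. The target set uses the Unicode hiragana and katakana ranges plus a hypothetical symbol list. All names are illustrative.

```python
# Assumed target set: hiragana, katakana, and a hypothetical symbol list.
SYMBOLS = "、。・ー「」"


def is_target(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3093" or "\u30a1" <= ch <= "\u30f6" or ch in SYMBOLS


def target_runs(text: str, min_len: int = 3) -> list:
    """Split recognized text into runs of consecutive target characters."""
    runs, current = [], []
    for ch in text + "\0":  # a sentinel non-target character flushes the last run
        if is_target(ch):
            current.append(ch)
            continue
        if len(current) >= min_len:  # shorter runs contain no three-character string
            runs.append("".join(current))
        current = []
    return runs


def three_char_strings(run: str) -> list:
    """All overlapping three-character strings in a run, in reading order."""
    return [run[i:i + 3] for i in range(len(run) - 2)]
```

For "同じものとみなし、", target_runs skips "同" and returns the single run "じものとみなし、". Its first window is "じもの", matching the walkthrough.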

[0025] The table reference step looks up each three-character string received from the extraction step in the compressed target-character three-character connection frequency table and obtains its frequency. In the check step, a flag indicating a possible error is sent as the check information for a three-character string whenever its frequency is low, that is, below a certain threshold. In this example, the string flagged would be the first one, "じもの".
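The check step reduces to a threshold test over each three-character string of a run. Several details here are assumptions. The frequency table is keyed by strings. The threshold is a free parameter, since the text gives no value. The toy frequencies are invented for illustration. flag_low_frequency is a hypothetical name.

```python
def flag_low_frequency(run: str, freq: dict, threshold: float) -> list:
    """Start indices of three-character strings whose frequency is below threshold."""
    # Strings missing from the sparse table have frequency 0 and are always flagged.
    return [i for i in range(len(run) - 2)
            if freq.get(run[i:i + 3], 0.0) < threshold]


# Invented toy frequencies, for illustration only (not data from the patent).
TOY_FREQ = {"ばかり": 0.01, "かりで": 0.01, "りでは": 0.005,
            "ではな": 0.01, "はない": 0.01, "ない。": 0.02}

print(flag_low_frequency("はかりではない。", TOY_FREQ, 1e-4))  # [0]: "はかり" is not in the table
```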

[0026] Control then returns to the extraction step, which reads "と" and forms the three-character string "ものと". This string is sent to the table reference step. The same processing continues until a character that is not a target character is read.

If error flags have been raised, the input error candidate extraction step checks their positions. When flagged strings continue consecutively beyond a threshold, the characters in the overlapping part of those three-character strings are reported as possible errors. At the same time, the possible error is pointed out to the user, for example by displaying the output in reverse video.

Next, the recognition candidates for that character are consulted, and the possibly erroneous character is replaced with a candidate character. The frequency of the three-character string containing the candidate is then looked up in the compressed frequency table and compared with the original frequency. If the candidate's frequency exceeds the original by more than a threshold, processing proceeds to the input error correction result output step.
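The candidate substitution described here can be sketched as follows. Several details are assumptions, not the patent's:
- a character's score is the lowest frequency among the three-character strings that contain it;
- the suspects are the characters shared by every string in a run of consecutive flags;
- a candidate is accepted only if its score clears the threshold and is more than ratio times the original score.

All function names, the candidate format, and the toy frequencies are hypothetical.

```python
def suspect_positions(flags: list, min_run: int = 1) -> list:
    """Characters shared by every string in a run of consecutively flagged strings."""
    suspects, i = [], 0
    while i < len(flags):
        j = i
        while j + 1 < len(flags) and flags[j + 1] == flags[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            # Strings start at flags[i]..flags[j]; all of them cover flags[j]..flags[i]+2.
            suspects.extend(range(flags[j], flags[i] + 3))  # empty if the run exceeds 3
        i = j + 1
    return suspects


def covering_score(text: str, i: int, freq: dict) -> float:
    """Lowest frequency among the three-character strings that contain position i."""
    starts = range(max(0, i - 2), min(i, len(text) - 3) + 1)
    return min((freq.get(text[k:k + 3], 0.0) for k in starts), default=0.0)


def correct_run(run, candidates, freq, threshold=1e-4, ratio=10.0):
    """Swap a suspect character for an OCR candidate if that raises its score enough."""
    flags = [k for k in range(len(run) - 2) if freq.get(run[k:k + 3], 0.0) < threshold]
    chars = list(run)
    for i in suspect_positions(flags):
        original = covering_score("".join(chars), i, freq)
        best, best_score = None, original
        for alt in candidates.get(i, []):
            trial = "".join(chars[:i] + [alt] + chars[i + 1:])
            score = covering_score(trial, i, freq)
            if score > best_score:
                best, best_score = alt, score
        # Hypothetical acceptance rule standing in for the unspecified threshold test.
        if best is not None and best_score >= threshold and best_score > ratio * original:
            chars[i] = best
    return "".join(chars)


# Invented toy frequencies, for illustration only (not data from the patent).
TOY_FREQ = {"ばかり": 0.01, "かりで": 0.01, "りでは": 0.005,
            "ではな": 0.01, "はない": 0.01, "ない。": 0.02}
# Hypothetical OCR candidates: position 0 was read as "は", with alternatives "ば" and "ほ".
print(correct_run("はかりではない。", {0: ["ば", "ほ"]}, TOY_FREQ))  # prints ばかりではない。
```

On the toy table this reproduces the kind of correction shown in the output example, where "は" becomes "ば".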

[0027] The input error correction result output step notifies the user of the positions of input error candidates. It outputs as the correction result the high-frequency three-character strings obtained in the input error candidate extraction step.

[0028] An example of the input and output of this processing is shown below.

[0029] <Example of input to the input error indication and correction processing>

[0030]

[Table 5]

[0031] Some strings are sent to the frequency table reference section of the input error indication and correction processing. A string qualifies when every first-ranked recognition result in it is a table target character and it is at least three characters long. In the upper row, the only such string is "はかりではない。"

[0032] <Example of output from the input error indication and correction processing>

[0033]

[Table 6]

[0034] "は" (ha) has been corrected to "ば" (ba).

[0035] The above embodiment was described using Japanese OCR, but the invention is not limited to it. It may also be used for error checking in the post-processing of other input devices, such as speech recognition devices and word processors.

[0036] The invention may also serve as a processing device in the input section of a word processor or similar device. There it checks kana-only character strings for errors before kana-kanji conversion turns them into kanji. Such a device comprises:
- a "target three-syllable connection frequency table", created by statistical processing of general text whose kanji portions have been rewritten in kana so that it consists entirely of hiragana;
- means for checking, using this table, whether an input kana sequence may be an input error; and
- means for pointing out to the user, as a possibly erroneous spelling, any character sequence with an extremely low frequency.

[0037]

[Effects of the Invention] The present invention solves the problems of conventional post-processing based on morphological analysis. That approach needs a huge word dictionary for the analysis. Its processing is complex and time-consuming. It also wastes effort, spending much of its time on kanji, which are rarely misrecognized.

[Brief description of drawings]

FIG. 1 is a flowchart showing the process of creating the target-character three-character connection frequency table according to an embodiment of the present invention.

FIG. 2 is a diagram showing the structure of the target-character three-character connection frequency table according to an embodiment of the present invention.

FIG. 3 is a flowchart showing the method of reducing the target-character three-character connection frequency table according to an embodiment of the present invention.

FIG. 4 is a flowchart showing the method of reducing the target-character three-character connection frequency table according to an embodiment of the present invention.

FIG. 5 is a flowchart showing the input error indication and correction processing according to an embodiment of the present invention.

FIG. 6 is a flowchart showing the input error indication and correction processing according to an embodiment of the present invention.

FIG. 7 is a flowchart showing the input error indication and correction processing according to an embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration according to an embodiment of the present invention.

Claims

[Claims]

1. A Japanese document processing apparatus for pointing out and correcting input errors in a Japanese document, comprising: a three-character connection frequency table of target characters created by statistical processing of general text; means for extracting sequences of target characters from input Japanese text that includes kanji; means for checking, using the three-character connection frequency table, whether an extracted character sequence may be an input error; means for pointing out to the user, as a possibly erroneous spelling, any character sequence whose frequency in the check result is extremely low; and means for correcting such an extremely low-frequency character sequence into a high-frequency character sequence.