JPH07296102A

JPH07296102A - Data input method

Info

Publication number: JPH07296102A
Application number: JP6089403A
Authority: JP
Inventors: Takuya Okamoto; 卓哉岡本; Masatoshi Hino; 匡利樋野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-04-27
Filing date: 1994-04-27
Publication date: 1995-11-10

Abstract

(57) [Abstract] [Purpose] To improve the efficiency of correction work when a large volume of data is input using OCR. [Constitution] Image data is stored in memory (201); the characters to be recognized are extracted, and character recognition and correction by word matching are performed (202); the recognition result is stored on a recognition result storage disk (207). For portions that were not corrected by word matching, the image data, its position of occurrence, and either the recognition candidate characters or a reject character are output to a check sheet (208) (203). Next, where a candidate character does not match the image data, correction data (209) is entered manually (204), and the recognition result (207) is corrected according to the correction data (205).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a system that recognizes and inputs large volumes of documents, and in particular to a highly accurate and efficient data input method.

[0002]

[Prior Art] Common methods of document input include methods using OCR and methods using key punching.

[0003] Many methods have been proposed for character recognition for character input (e.g., Japanese Patent Application No. 4-51305: Character Recognition Method). As means of improving the accuracy of recognition results, there are a method of word checking using a word dictionary (Japanese Patent Laid-Open No. 4-115384: Japanese OCR with Word Check Function) and a correction method based on the connection relationship with the preceding and following characters (Japanese Patent Laid-Open No. 63-193287: Character Reading Method).

[0004] The higher the final accuracy required of the input data, the greater the amount of manual visual checking needed to find misrecognitions and input errors. Reducing this workload requires efficient means of detecting and correcting misrecognitions.

[0005] Conventional means of detecting and correcting OCR misrecognitions use a personal computer (PC), workstation (WS), or the like to display the character image data and the recognition result on a display. They include a method in which characters that cannot be determined are shown in a different color or marked (Japanese Patent Laid-Open No. 53-24733: Character Reading Device) and a method in which candidate characters are displayed and one is selected for input (Japanese Patent Laid-Open No. 57-20873: Optical Character Reading Device).

[0006] One method of entering correction data to correct input code information is called "sandwiching". In this method, the position of the character to be corrected is identified by entering several characters before and after it, and the character is then corrected. Although many characters must be entered to correct a single character, the method has the advantage that no position information for the corrected character needs to be entered; the correction can be made using only the code data being corrected.
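As a rough illustration of the prior-art idea described above, the following Python sketch locates the character to be corrected from the context typed around it. The function name `sandwich_correct`, its parameters, and the rule that the context must match exactly once are assumptions made for this example; the source does not specify how the matching is performed.

```python
import re


def sandwich_correct(text: str, before: str, replacement: str, after: str,
                     span: int = 1) -> str:
    """Replace the `span` characters that sit between `before` and `after`.

    Hypothetical helper: the operator types the context on both sides of the
    wrong character instead of typing its position.
    """
    pattern = re.compile(
        re.escape(before) + "." * span + re.escape(after), re.DOTALL
    )
    matches = list(pattern.finditer(text))
    if len(matches) != 1:
        # Assumption: ambiguous or missing context is reported, not guessed.
        raise ValueError(f"context matched {len(matches)} times, expected 1")
    m = matches[0]
    return text[:m.start()] + before + replacement + after + text[m.end():]


# Seven keystrokes ("ogn" + "i" + "tio") are needed to fix a single character.
print(sandwich_correct("recogn1tion result", "ogn", "i", "tio"))
```

The example shows the cost the text points out: the context characters dominate the keystroke count even though only one character changes.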

[0007]

[Problems to Be Solved by the Invention] Of the conventional techniques described above, the method of displaying undeterminable characters on a display works well when correcting a small number of documents, but is inefficient when correcting a large number of documents. Performing checking and correction at the same time lowers correction efficiency, and the limited area that can be shown on a display reduces the amount that can be checked at one time. Output on paper allows the work to be divided among several people, whereas output on a display is limited by the number of terminals. In addition, operator fatigue increases with continued viewing of a display, so checking errors may also increase.

[0008] When checking a large volume of input documents for errors, working from paper output is easier. However, simply outputting the recognition result and visually comparing it with the original not only yields a low error detection rate but also requires an enormous amount of work. A method of outputting recognition results in a form that is easy to check is therefore needed.

[0009] Moreover, correction by the conventional "sandwiching" method described above requires several characters before and after to be entered in order to correct a single character. This is not a problem when there are few corrections, but when there are many, the growth in the number of characters entered becomes a problem. Therefore, when more than a certain number of corrections exist, a method is needed that outputs the recognition result with added information that facilitates the entry of correction data, and uses that information to enter the correction data efficiently.

[0010]

[Means for Solving the Problems] The OCR recognition result is corrected by a logical check such as word matching, and no check data is output for portions that pass the logical check. Furthermore, for portions that could not be logically checked, characters whose recognition results have high confidence are output so as to be distinguished from those with low confidence. The check sheet used for checking and correcting the recognition result carries the image data, the recognized characters, and position information identifying the position of each character within the recognition result.

[0011] In addition, means is used that, based on correction data entered from a keyboard or the like (such as the position information of the characters to be corrected and the corrected characters), identifies the range to be corrected within the recognition result, replaces the characters in that range with the character codes of the correction data, and thereby corrects the recognition result.

[0012]

[Operation] Documents whose content is restricted, or whose wording varies little, contain many portions that can be logically checked, and the logically checked portions achieve sufficiently high accuracy. Therefore, by not outputting the logically checked portions to the check sheet, the scope of manual checking is limited and the checking workload is reduced.

[0013] The check sheet carries the image data and the recognized characters. For characters with low recognition confidence, the way the recognition result is displayed is changed, for example by using emphasized characters or by outputting a reject character. Compared with simply outputting the character codes and comparing them with the original, this improves checking efficiency and raises the detection rate of misrecognitions.

[0014] When entering correction data after checking, the operator enters the position information of the character to be corrected, as printed on the check sheet, and the corrected character code. This reduces the number of characters entered to identify the character position and enables efficient entry of correction data.

[0015] By using the present invention, data input using OCR can achieve more efficient correction work and higher accuracy of the result after checking and correction.

[0016]

[Embodiment] FIG. 1 is a configuration diagram of a document data input system according to an embodiment of the present invention. Image data input from a scanner 101 is read out via a scanner controller 102 and stored on a disk 104 via a disk controller 103. A CPU 105 reads a program stored in a memory 106 and performs recognition processing. The image data is read from the disk 104 into the memory 106, and the recognition result is stored on the disk 104. In addition, a check sheet for correction is output from a printer 108 via a printer controller 107.

[0017] The recognition result is checked and corrected by manually checking the check sheet and entering correction data from a keyboard 109. The correction data is stored in the memory 106 via a keyboard controller 110. The recognition result stored on the disk 104 is then read into the memory 106 and corrected according to the entered correction data. The corrected result is stored on the disk again.

[0018] FIG. 2 is a schematic processing flow of an embodiment of the present invention.

[0019] Process 201: The system first reads the image data from the disk.

[0020] Process 202: Recognition target fields are extracted from the image data, and the character strings and paragraphs to be recognized are extracted from each field. Character recognition is performed on each character pattern, and candidate characters are output. A logical check such as word matching is then applied to the recognition candidates of each character to improve recognition accuracy. The recognition result is stored on the recognition result storage disk (207).

[0021] Process 203: For portions that did not pass the logical check, the image data, the recognition candidates, and their position of occurrence are output to the check sheet (208), and their correctness is checked manually. Depending on the recognition evaluation value, a character may be treated as unrecognizable, in which case a character identifying the reject (such as "?") is output.

[0022] Portions that pass the logical check are not output to the check sheet, which reduces the range of manual checking.

[0023] Process 204: The check sheet output in process 203 is checked manually, and correction data (209) is entered from a keyboard or the like for characters that were misrecognized.

[0024] Process 205: The recognition result (207) is corrected based on the entered correction data (209).

[0025] Process 206: The corrected result is output to the correction result storage disk (210).

[0026] The image data is recognized and corrected through the above processing.

[0027] FIG. 3 shows an example of the check sheet. 301 is the data name or file name of the recognition result. 302 is the number of pages of check sheets having the same data name. 303 is a paragraph number for identifying the paragraph position of the portion to be corrected. 304 is the correction character position, which gives the position within the paragraph of the character to be corrected. Together, these data indicate the position of the corrected character in the data file. 305 is the image data corresponding to characters that did not pass the logical check; the image data is output including the area before and after the undeterminable character for use during checking. 306 is the recognition result for the image data.

[0028] Portions whose recognition result was judged correct by the logical check are distinguished by a method such as shading. For characters with low reliability, such as low-quality characters (faint or crushed) and characters that cannot be determined because similar characters exist, the recognition result is not output; instead a reject character such as "?" is output, which prevents checking mistakes. Reliability is judged by a method such as comparing the evaluation values of the first- and second-ranked recognition candidates and judging the reliability to be low when the difference is at or below a threshold.
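The reliability test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: candidates are given as (character, evaluation value) pairs, a larger evaluation value is taken to mean a better match, and the names `is_low_reliability`, `check_sheet_char`, and the threshold value are hypothetical, since the source gives neither a scoring scale nor a concrete threshold.

```python
from typing import List, Tuple

REJECT_CHAR = "?"  # reject character mentioned in the text

Candidate = Tuple[str, float]  # (candidate character, evaluation value)


def is_low_reliability(candidates: List[Candidate], threshold: float) -> bool:
    """True when the top two evaluation values differ by `threshold` or less."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(ranked) < 2:
        return False  # assumption: a single candidate is not rejected here
    return (ranked[0][1] - ranked[1][1]) <= threshold


def check_sheet_char(candidates: List[Candidate], threshold: float) -> str:
    """Character to print on the check sheet for one character pattern."""
    if not candidates or is_low_reliability(candidates, threshold):
        return REJECT_CHAR
    return max(candidates, key=lambda c: c[1])[0]


# "0" and "O" score almost the same, so the character is rejected.
print(check_sheet_char([("0", 0.91), ("O", 0.90)], threshold=0.05))
# "A" clearly beats "H", so the first candidate is printed.
print(check_sheet_char([("A", 0.95), ("H", 0.40)], threshold=0.05))
```

Printing a reject character rather than a weak first candidate keeps the checker from accepting a plausible-looking but wrong character.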

[0029] FIG. 4 shows an example of the correction data that is entered. The correction data consists of the correction paragraph number, the correction character start position, the correction character end position, and the corrected character code. If only one character is corrected, the end position can be omitted. If the correction paragraph number is the same as in the preceding correction data, it can also be omitted. Each item of correction data ends when a line feed code is entered.

[0030] 401 is the case where one character is wrong: the correction paragraph number, the character position, and the corrected character code are entered. The character at the target position is thereby corrected.

[0031] 402 is the case where noise was recognized as a character: entering only the correction paragraph number and the character position deletes it.

[0032] 403 is the case where what was originally one character was recognized as two characters: entering the correction paragraph number, the start position, the end position, and the corrected character code corrects the two characters into one.

[0033] 404 is the case where two characters were recognized as one: entering the correction paragraph number, the correction character position, and the corrected character codes for two characters corrects the one character into two.

[0034] 405 is the case of adding a character that was not extracted; it is distinguished by entering "a" in place of the correction end character position.
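The five correction cases (401 to 405) can be illustrated with a short Python sketch. It assumes the keyed-in records have already been parsed into objects, because the source does not show the field separators used in FIG. 4. The `Correction` class, the 1-based inclusive positions, the choice to insert added text in front of the given position, and the right-to-left application order are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Correction:
    paragraph: int                          # correction paragraph number
    start: int                              # 1-based start position (assumed)
    end: Optional[Union[int, str]] = None   # None: one character; "a": addition
    text: str = ""                          # corrected characters; "" deletes


def apply_correction(paragraphs: Dict[int, str], c: Correction) -> None:
    s = paragraphs[c.paragraph]
    i = c.start - 1
    if c.end == "a":
        # Case 405. Assumption: new text goes in front of position `start`.
        paragraphs[c.paragraph] = s[:i] + c.text + s[i:]
        return
    end = c.start if c.end is None else int(c.end)
    # Cases 401-404: replace positions start..end (inclusive) with `text`.
    paragraphs[c.paragraph] = s[:i] + c.text + s[end:]


def apply_all(paragraphs: Dict[int, str], corrections: List[Correction]) -> None:
    # Apply right-to-left so earlier positions stay valid after length changes.
    ordered = sorted(corrections, key=lambda c: (c.paragraph, c.start),
                     reverse=True)
    for c in ordered:
        apply_correction(paragraphs, c)


doc = {1: "c0de nurnber", 2: "dta"}
apply_all(doc, [
    Correction(1, 2, text="o"),      # 401: one wrong character
    Correction(1, 8, 9, "m"),        # 403: two characters become one
    Correction(2, 2, "a", "a"),      # 405: add a missing character
])
print(doc)
```

Cases 402 (deletion) and 404 (one character becomes two) fall out of the same replacement rule: an empty `text` deletes the range, and a two-character `text` with a single position widens it.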

[0035] FIG. 5 is a reject number correspondence table that associates reject numbers with fields and paragraphs. This table is created in process 203 described above, together with the output of the reject list.

[0036] 501 is the reject number, 502 is the image number, 503 is the field number within the image, and 504 is the paragraph number within the field.

[0037] During correction processing, this table is used to associate a reject number with the paragraph to be corrected. Because the correction position can then be identified from the reject number alone, the number of characters entered for correction is reduced.
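A reject number correspondence table of this kind can be sketched as a simple lookup. The sketch below is an assumption-laden illustration: it assigns one reject number per paragraph in order of first appearance, starting at 1, and the names `ParagraphRef` and `build_reject_table` are hypothetical. The source specifies only the four columns (501 to 504), not how the numbers are assigned.

```python
from typing import Dict, List, NamedTuple


class ParagraphRef(NamedTuple):
    image_no: int      # 502: image number
    field_no: int      # 503: field number within the image
    paragraph_no: int  # 504: paragraph number within the field


def build_reject_table(failed: List[ParagraphRef]) -> Dict[int, ParagraphRef]:
    """Assign reject numbers (501) to paragraphs in order of first appearance."""
    table: Dict[int, ParagraphRef] = {}
    seen = set()
    for ref in failed:
        if ref not in seen:
            seen.add(ref)
            table[len(table) + 1] = ref  # assumption: numbering starts at 1
    return table


# Two failed characters in the same paragraph share one reject number.
failed = [ParagraphRef(1, 2, 1), ParagraphRef(1, 2, 1), ParagraphRef(1, 3, 4)]
table = build_reject_table(failed)
print(table[2])
```

With such a table, the operator keys in a single short number and the system recovers the image, field, and paragraph that the correction applies to.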

[0038] The content of the recognition processing of process 202 is described below.

[0039] First, character recognition processing is performed according to the flowchart of FIG. 6.

[0040] Process 601: Reference ruled lines, marks, and the like are extracted from the image, and recognition fields corresponding to a predetermined format are set based on their position information.

[0041] Process 602: Character components are extracted for each field that has been set. Character components are extracted based on the connected components of black pixels within the field; where characters touch one another, they are separated according to the standard character size.
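The extraction step in process 602 can be sketched in Python as a connected-component search followed by a width-based split. This is an illustration only, not the method of the paper cited later in the description: the field is assumed to be a binary grid (1 = black pixel), 8-connectivity is assumed, and touching characters are assumed to be cut into equal-width pieces based on a hypothetical `std_width` parameter.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def connected_components(img: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected black-pixel components, sorted by left edge."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, top = min(left, cx), min(top, cy)
                right, bottom = max(right, cx), max(bottom, cy)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


def split_touching(box: Box, std_width: int) -> List[Box]:
    """Cut a component that is too wide into pieces of about `std_width`."""
    left, top, right, bottom = box
    width = right - left + 1
    n = max(1, round(width / std_width))
    return [
        (left + i * width // n, top, left + (i + 1) * width // n - 1, bottom)
        for i in range(n)
    ]


img = [
    [1, 1, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 0],
]
boxes = connected_components(img)
print([piece for b in boxes for piece in split_touching(b, std_width=2)])
```

In this toy field the second component is twice the standard width, so it is treated as two touching characters and cut in half.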

[0042] Process 603: Character strings are generated by merging the character components of each field along the direction in which the characters run.

[0043] Process 604: Paragraphs are grouped and extracted from information such as the indentation of each line.

[0044] Process 605: The character components of each paragraph are extracted, character recognition is performed, and the recognition result (606) is output.

[0045] As the format analysis and character extraction methods, the method of "Extraction of Individual Characters in Horizontally Written Japanese Documents," Transactions of the Institute of Electronics and Communication Engineers of Japan, November 1985, Vol. J68-D, No. 11, pp. 1899-1909, among others, can be applied. For character recognition, the method described in the aforementioned Japanese Patent Application No. 4-51305, among others, can be applied.

[0046] As shown in FIG. 7, the character recognition candidates are output as multiple candidate characters together with their evaluation values.

[0047] 701 is the number of characters, 702 is the number of candidate characters output, 703 is the character number, 704 is the candidate character, and 705 is the field in which the evaluation value is stored.

[0048] As shown in FIG. 8, multiple recognition candidates are obtained for each character pattern, and the recognition result is corrected from these candidate characters by a logical check such as word matching. The method of the aforementioned Japanese Patent Laid-Open No. 4-115384, among others, can be used to correct the recognition result by word matching.
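One simple way to picture word matching over candidate characters is the Python sketch below. It is not the method of Japanese Patent Laid-Open No. 4-115384, whose details are not given here; it is a hypothetical stand-in that accepts a dictionary word when every one of its characters appears among the candidates at the same position, and prefers the word whose characters rank highest. The function name `match_word` and the rank-sum cost are assumptions.

```python
from typing import List, Optional, Set


def match_word(candidates: List[List[str]], dictionary: Set[str]) -> Optional[str]:
    """Return the dictionary word best supported by the ranked candidates.

    candidates[i] lists the candidate characters for position i, best first.
    Returns None when no dictionary word fits, i.e. the logical check fails.
    """
    best_word: Optional[str] = None
    best_cost: Optional[int] = None
    for word in sorted(dictionary):  # sorted for a deterministic tie-break
        if len(word) != len(candidates):
            continue
        cost = 0
        for ch, ranked in zip(word, candidates):
            if ch not in ranked:
                break  # this word is not supported at this position
            cost += ranked.index(ch)  # 0 for a first-ranked candidate
        else:
            if best_cost is None or cost < best_cost:
                best_word, best_cost = word, cost
    return best_word


# First-ranked characters spell "c0de"; the dictionary corrects it to "code".
candidates = [["c", "e"], ["0", "o"], ["d", "a"], ["e", "c"]]
print(match_word(candidates, {"code", "data", "cede"}))
```

A `None` result corresponds to a portion that did not pass the logical check and would therefore be sent to the check sheet for manual review.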

[0049] FIG. 9 shows the storage format of the recognition result 207 (the corrected result 210 uses the same format). The recognition result corrected by word matching is stored as a character string for each paragraph.

[0050] The image number 901 stores a uniquely determined image number. The field number 902 stores the number of the recognition field. The paragraph number 903 stores the paragraph number within the field. The recognition result 904 stores the character code data of the paragraph identified by 901 through 903.

[0051] The above processing achieves highly accurate character recognition and efficient correction of the recognition result.

[0052]

[Effects of the Invention] Applying a logical check to the character recognition result yields high recognition accuracy. The check sheet for checking and correcting the recognition result is therefore not output for portions that pass the logical check. Furthermore, for portions that do not pass the logical check, the output format on the check sheet is changed according to the character recognition evaluation value, which limits the scope of manual checking and improves checking efficiency.

[0053] In addition, the correction character position information output on the check sheet is used to reduce the number of characters entered to specify the position of the corrected character, which makes correction data entry efficient.

[0054] According to the present invention, data input using OCR can achieve highly accurate recognition, more efficient checking and correction work, and higher accuracy of the result after checking and correction.

[Brief description of drawings]

FIG. 1 is a block diagram of the system configuration of an embodiment of the present invention.

FIG. 2 is a schematic flowchart of the processing of an embodiment of the present invention.

FIG. 3 is a diagram showing an output example of a check sheet.

FIG. 4 is a diagram showing an input example of correction data for correcting a recognition result.

FIG. 5 is a diagram showing the correspondence table between reject numbers and images, fields, and paragraphs.

FIG. 6 is a flowchart of character recognition processing.

FIG. 7 is a diagram showing the output of character recognition candidates.

FIG. 8 is a diagram showing an example of the result of word matching processing.

FIG. 9 is a diagram showing the output content of a character recognition result.

[Explanation of Reference Numerals] 101: scanner; 102: scanner controller; 103: disk controller; 104: disk; 105: CPU; 106: memory; 107: printer controller; 108: printer; 109: keyboard; 110: keyboard controller; 207: recognition result storage disk; 208: check sheet; 209: correction data; 210: correction result storage disk.

Claims

[Claims]

1. A data input method for inputting characters in a document into a computer system, comprising: character extracting means for extracting each character pattern from a document image; character recognizing means for recognizing the extracted character patterns; character recognition candidate output means for outputting candidate characters for recognition; recognition candidate correction means for correcting the output recognition candidates by a logical check; recognition result discriminating means for discriminating the reliability of the results obtained by the recognition candidate correction means and the character recognizing means; check/correction information output means for outputting, according to the discrimination result obtained by the recognition result discriminating means, information for checking and correcting the recognition result; means for inputting correction data based on the check/correction information; and means for correcting, according to the input correction data, the recognition result obtained by the character recognizing means.