JPS63143685A

JPS63143685A - Recognition result display method in character recognition device

Info

Publication number: JPS63143685A
Application number: JP61291301A
Authority: JP
Inventors: Toshiaki Morita (森田 敏昭); Minehiro Konya (紺矢 峰弘); Hideaki Tanaka (田中 秀明)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1986-12-05
Filing date: 1986-12-05
Publication date: 1988-06-15

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a method of displaying recognition results in a character recognition device.

(Technical Background) One known character recognition device that recognizes the character information in a document by computer processing is the optical character reader (OCR). An OCR photoelectrically converts the character information to be recognized, for example alphanumeric characters, segments the resulting electrical signal into individual characters, and recognizes the characters one at a time in a recognition unit according to predetermined recognition logic.

In this type of character recognition device, when a recognized character has a low correct-reading rate and is judged doubtful, that is, when it is rejected, conventionally only the rejected character is shown blinking or in reverse video on a display unit such as a cathode-ray tube (CRT). While watching the display, the operator checks each rejected character against the original document and corrects it through correction means such as a keyboard.

However, because the conventional method in principle displays the results of recognition performed one character at a time, the recognition rate is generally low; in other words, characters are rejected frequently. Correction therefore takes a great deal of effort, and the efficiency of recognition work has remained unsatisfactory.

(Problem to Be Solved) The present invention was made in view of the above problems. Its object is to provide a recognition result display method for a character recognition device that performs spell-check and spell-correction processing on recognized character strings word by word. This substantially reduces the work of correcting recognition results, makes that correction work more efficient, and thereby improves the efficiency of recognition work as a whole.

(Structure) To achieve the above object, the recognition result display method of the present invention operates as follows when the results recognized character by character in a recognition unit are displayed on a display unit. The recognized character string from the recognition unit is segmented into words, and spell-check and spell-correction processing are performed on each segmented word. When the spell check finds that no matching word exists, all characters of that word are displayed in reverse video, and all characters of a word corrected by the spell-correction processing are displayed underlined.

The present invention is described below with reference to the accompanying drawings.

(Embodiment) FIG. 1 is a block diagram of an optical character reader (OCR) to which the recognition result display method of the present invention can be applied.

In FIG. 1, reference numeral 1 denotes an input section, which has a photoelectric-conversion image scanner 2 using, for example, a CCD (charge-coupled device), and an image memory 3. The image scanner 2 reads a character image, for example alphanumeric, from a document placed on a document table (not shown), photoelectrically converts it, and stores an electrical signal representing the character image in the image memory 3.

Reference numeral 5 denotes a character recognition section using, for example, a microcomputer. It recognizes the character information from the input section 1 according to recognition logic stored in a ROM (read-only memory), not shown.

In the line segmentation section 6 of the character recognition section 5, the image signal in the image memory 3 is segmented into individual lines by a known method (so-called segmentation), and the segmented character information is stored in the line memory 7.
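
The patent leaves line segmentation to "a known method". One common approach, shown here only as a hedged sketch, is a horizontal projection profile: count the ink pixels in each row of a binarized page and treat each run of non-empty rows as one text line. The binary-image representation and the function name segment_lines are assumptions for illustration, not part of the disclosure.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed binary image: 1 = ink pixel, 0 = background


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of text lines.

    A row belongs to a text line if it contains at least one ink pixel;
    each run of consecutive ink rows forms one line.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y  # entering a text line
        elif count == 0 and start is not None:
            lines.append((start, y))  # leaving a text line
            start = None
    if start is not None:  # a line touching the bottom edge
        lines.append((start, len(profile)))
    return lines
```

For example, a five-row page whose rows 1-2 and row 4 contain ink yields [(1, 3), (4, 5)].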

The character information stored in the line memory 7 is segmented into individual characters by the character segmentation section 8, on the basis of inter-character spacing information obtained by analysis using, for example, an image density histogram. Each segmented character is then recognized one at a time by the single-character recognition section 9 with reference to a character dictionary 10 that stores predetermined alphanumeric characters, and is sent to the word memory 11. At the same time, the character information output from the line memory 7 is also sent to the word segmentation section 12. On the basis of the inter-word delimiter information extracted by the word segmentation section 12, the characters from the single-character recognition section 9 are stored in the word memory 11 word by word.
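
As a rough illustration of the spacing-based segmentation described above, the sketch below assumes a binarized line image, takes the column-wise ink counts (a vertical projection, standing in for the density histogram) to split characters, and treats any blank gap of at least word_gap columns as a word break. Deriving word boundaries from the same projection, the default threshold, and the function name are all assumptions; the patent says only that the word segmentation section 12 extracts inter-word delimiter information.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int]  # (left, right) column range, right exclusive


def segment_chars_and_words(line: List[List[int]],
                            word_gap: int = 3) -> List[List[Box]]:
    """Split one binarized text line into words of character boxes.

    Characters are separated by blank columns in the vertical projection;
    a blank gap of at least `word_gap` columns starts a new word.
    """
    width = len(line[0]) if line else 0
    profile = [sum(row[x] for row in line) for x in range(width)]

    chars: List[Box] = []
    start: Optional[int] = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # left edge of a character
        elif count == 0 and start is not None:
            chars.append((start, x))  # right edge reached
            start = None
    if start is not None:
        chars.append((start, width))

    words: List[List[Box]] = []
    for i, box in enumerate(chars):
        # Open a new word at the first character or after a wide gap.
        if i == 0 or box[0] - chars[i - 1][1] >= word_gap:
            words.append([])
        words[-1].append(box)
    return words
```

With the one-row line [[1, 0, 1, 0, 0, 0, 1, 1]], the one-column gap keeps the first two characters together, while the three-column gap starts a second word.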

Reference numeral 15 denotes a language processing section that performs spell checking and spell correction of words. It consists of a spell check section 16, a word dictionary 17, and a spell correction section 18.

Words are read one after another from the word memory 11 into the spell check section 16 of the language processing section 15, which performs spell-check processing on each word read. The spell check determines whether the word dictionary 17, in which predetermined English words are stored in advance, contains a word that exactly matches the word read from the word memory 11.

When the first spell check in the spell check section 16 returns "no", that is, when the word is rejected, the spell correction section 18 is activated.

Based on data such as the recognition rate of each character in the single-character recognition section 9, the spell correction section 18 corrects those characters of the rejected word that have a low recognition rate, and thereby selects a plurality of word candidates: for example, five candidates ranked first to fifth in descending order of correct-reading rate. In this specification, this candidate-selection process is called correction processing.
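
The patent does not specify how candidates are generated. A minimal sketch, assuming the single-character recognizer reports ranked alternatives with confidence scores for each position: vary only the positions whose best confidence falls below a threshold, score each combination by the product of its confidences, and keep the five best. The data layout, the 0.8 threshold, and the product scoring are hypothetical choices for illustration.

```python
from itertools import product
from math import prod
from typing import List, Tuple

# Assumed layout: per character position, alternatives ranked best-first
# as (character, confidence) pairs.
Alternatives = List[Tuple[str, float]]


def correct_candidates(word: List[Alternatives], threshold: float = 0.8,
                       max_candidates: int = 5) -> List[str]:
    """Propose up to `max_candidates` respellings for a rejected word.

    Only positions whose best confidence is below `threshold` are varied;
    confident positions keep their top choice. Candidates are ranked by
    the product of their per-character confidences.
    """
    choices = [alts if alts[0][1] < threshold else alts[:1] for alts in word]
    original = "".join(alts[0][0] for alts in word)
    scored: List[Tuple[float, str]] = []
    for combo in product(*choices):
        text = "".join(ch for ch, _ in combo)
        if text != original:  # the top-choice reading was already rejected
            scored.append((prod(conf for _, conf in combo), text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored[:max_candidates]]
```

For a word read as "cst" with an uncertain middle letter, correct_candidates([[("c", 0.95)], [("s", 0.5), ("a", 0.45), ("o", 0.3)], [("t", 0.9)]]) returns ["cat", "cot"].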

When all of the word candidates selected by the correction processing are rejected by the spell check section 16, the word is rejected on the ground that no word representing the recognition result can exist.

Reference numeral 20 denotes an output section that displays the character recognition results and outputs the result data. The processing results of the spell check section 16, that is, the recognition results, are stored in the output memory 21 of the output section 20. The recognition results are displayed on a display section 22 using, for example, a cathode-ray tube (CRT), and are also sent to a device 24 external to the recognition device, such as a translation machine or a word processor. The output memory 21 is connected to spelling correction means 23 using a keyboard or the like; through the spelling correction means 23, the operator can freely correct the spelling of a word while viewing its character image on the display screen of the display section 22.

Next, the recognition result display method of the present invention is described with reference to the operation flowchart of FIG. 2.

In step 1, a document to be read, for example a handwritten English document, is placed on the document table (not shown) of the reader and the recognition operation is started. The document surface is then read through the image scanner 2, and a signal representing the read character information is input to the image memory 3.

In step 2, the image information in the image memory 3 is read out and segmented line by line by the line segmentation section 6 using a known method, after which the character information is further segmented into individual characters. Each segmented character is recognized by a known method with reference to the character dictionary 10.

In step 3, the character string recognized character by character as described above is segmented into words by a known method through the word segmentation section 12, and each segmented word is stored in the word memory 11.

In step 4, words are read one at a time from the word memory 11 into the spell check section 16 of the language processing section 15 and spell-checked. The spell check determines whether the word dictionary 17 contains a word that matches every character of the word read. If the result is "yes", the word is deemed correctly recognized, and in step 5 its image is displayed on the display section 22 as is, without any spelling correction.

If, on the other hand, the spell check in step 4 returns "no", correction processing is performed in step 6 to select word candidates that are possible or probable. This correction processing is executed in the spell correction section 18 of the language processing section 15.

In step 7, all word candidates selected by the spell-correction processing are spell-checked in the same way as described above, and a search for a possible word is carried out. If the search finds a possible word, that is, a candidate that exactly matches one of the words registered in the word dictionary 17, that candidate is displayed on the display section 22 in step 8 with all of its characters underlined. If no possible word is found, then in step 9 all characters of the word as it stood before the spell-correction processing, that is, the word-level recognition result, are displayed in reverse video on the display section 22.
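
Steps 4 to 9 amount to a three-way classification of each word. The sketch below assumes an in-memory set as the word dictionary and takes the candidate generator as a parameter, since any function returning candidates ranked best-first will do; the generator is called only after the exact-match check fails, as in the flowchart. Choosing the first dictionary match in rank order is an assumption, because the text only says that a possible word is found. The function name and the style labels are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def classify_word(word: str, dictionary: Set[str],
                  correct: Callable[[str], List[str]]) -> Tuple[str, str]:
    """Return (text to display, display style) for one recognized word."""
    if word in dictionary:                  # step 4 "yes" -> step 5
        return word, "plain"
    for candidate in correct(word):         # step 6: correction processing
        if candidate in dictionary:         # step 7: spell-check candidates
            return candidate, "underline"   # step 8: auto-corrected word
    return word, "reverse"                  # step 9: original recognition result
```

With dictionary = {"the", "cat", "sat"} and a generator that maps "cst" to ["cot", "cat"], "the" comes back plain, "cst" comes back as the underlined "cat", and a word with no matching candidate comes back unchanged for reverse display.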

FIG. 3 shows an underlined display example (A) from step 8 and a reverse-video display example (B) from step 9.
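
The original display was a CRT. On a modern ANSI-compatible terminal, the two styles of FIG. 3 can be approximated with the SGR escape codes 4 (underline) and 7 (reverse video), as in this hedged sketch; the (text, style) input format, the style labels, and the function name render are assumptions.

```python
from typing import List, Tuple

# ANSI SGR escape sequences: 4 = underline, 7 = reverse video, 0 = reset.
STYLES = {"plain": "", "underline": "\x1b[4m", "reverse": "\x1b[7m"}
RESET = "\x1b[0m"


def render(words: List[Tuple[str, str]]) -> str:
    """Join (text, style) pairs into one line, styling each word as a whole."""
    parts = []
    for text, style in words:
        code = STYLES[style]
        parts.append(f"{code}{text}{RESET}" if code else text)
    return " ".join(parts)


if __name__ == "__main__":
    print(render([("the", "plain"), ("cat", "underline"), ("qzx", "reverse")]))
```

Styling the whole word, rather than individual characters, mirrors the patent's word-level display.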

In the above embodiment the characters to be recognized are alphanumeric, but by using appropriate recognition logic the invention can also be applied to kanji and kana characters.

(Effects) As is clear from the above description, according to the present invention, recognized character strings are spell-checked and spell-corrected word by word. Words containing characters that are almost certainly misrecognized are thereby distinguished from automatically corrected words with a high correct-reading rate, and the operator corrects on a word-by-word basis. Correction work is therefore reduced more effectively than with the conventional method, which corrects character by character.

Furthermore, words that are almost certainly misrecognized are displayed in reverse video, while automatically corrected words with a high correct-reading rate, in other words words that are likely to be correct, are displayed underlined. The operator can therefore carry out the required work quickly with a single glance at the display screen, which effectively raises the efficiency of recognition work.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an optical character reader to which the method of the present invention can be applied; FIG. 2 is an operation flowchart of the apparatus of FIG. 1 using the method of the present invention; and FIG. 3 shows examples of the reverse-video and underlined displays on the display section of the apparatus of FIG. 1 when the present invention is used.

1...input section, 5...recognition section, 6...line segmentation section, 8...character segmentation section, 9...single-character recognition section, 10...character dictionary, 12...word segmentation section, 15...language processing section, 16...spell check section, 17...word dictionary, 18...spell correction section, 20...output section, 22...display section, 23...correction means, 24...external device, A...underlined display, B...reverse-video display.

Claims

[Claims]

(1) A recognition result display method in a character recognition device, characterized in that, when results recognized character by character in a recognition unit are displayed on a display unit, the recognized character string from the recognition unit is segmented into words; spell-check and spell-correction processing are performed on each segmented word; when the spell-check processing finds that no matching word exists, all characters of that word are displayed in reverse video; and all characters of a word corrected by the spell-correction processing are displayed underlined.