JPH0259504B2 - - Google Patents

Info

Publication number
JPH0259504B2
Authority
JP
Japan
Prior art keywords
character
similarity
characters
value
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP58054481A
Other languages
Japanese (ja)
Other versions
JPS59205681A (en)
Inventor
Fumio Yoda
Keiji Kobayashi
Masataka Yamamoto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Basic Technology Research Association Corp
Priority to JP58054481A
Publication of JPS59205681A
Publication of JPH0259504B2
Granted

Landscapes

  • Character Discrimination (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a character reading device that reads characters written on a form or the like and outputs the character code of each character. More specifically, it relates to a method of major classification (first-stage classification) in such a character reading device.

When recognizing a script with a large character set, such as kanji, a hierarchical identification method is generally used: the characters to be considered are first narrowed down to a small number, and more detailed features are then used for identification. This first step is called major classification. It is desirable that the major classification process itself be simple and that the number of candidate characters it selects be as small as possible.

FIG. 1 is a block diagram showing an embodiment of the present invention; with the threshold table 7 removed, it substantially represents a conventional device. In FIG. 1, 1 is a form, 2 is a scanning means, 3 is a feature extraction means, 4 is a similarity calculation means, 5 is a recognition dictionary, 6 is a classification means, and 8 is an identification means.

FIG. 2 shows a character written inside a frame on the form 1; 9 denotes an example in which the kanji "田" is written. Since the conventional device is well known, a detailed description is omitted. The character 9 written on the form 1 is scanned by a photoelectric conversion device such as a television camera, and the result of the scan is stored as an input character pattern. The feature extraction means 3 extracts features from the input character pattern according to a predetermined rule.

Meanwhile, for every character the device is to read, features are extracted from the character pattern of that character's reference glyph shape, and those features are stored in the recognition dictionary 5 in association with the character code of the character.

The features extracted by the feature extraction means 3 are compared by the similarity calculation means 4 with the features stored in the recognition dictionary 5, and the similarity to each stored character is calculated.

FIG. 3 shows an example of the calculated similarities. 10 denotes the characters, which are represented by character codes in the recognition dictionary but are shown here as kanji for convenience; 11 denotes the similarity for each character 10; 12 denotes the kanji "田" among the characters 10; and 13 denotes the similarity for the kanji "田".

One conventional method of selecting candidate characters from the similarities 11 output by the similarity calculation means 4, such as those shown in FIG. 3, is to select the N characters with the largest similarity values. In the example of FIG. 3, with N = 5 the five characters "田", "国", "図", "間", and "女" become the candidate characters. The drawback of this method is that every time a new set of similarities is calculated, the character codes must be re-sorted in order of similarity. When the number of characters to be recognized is large, this sorting takes a long time and slows the recognition process.
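The top-N selection described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: only the similarities 0.80 ("田") and 0.78 ("国") appear in the text, and every other value, as well as the function name `select_top_n`, is an assumption chosen to reproduce the ordering given for FIG. 3.

```python
# Similarities in the style of FIG. 3. Only 0.80 and 0.78 are stated in the
# text; all other values are assumed placeholders.
similarities = {
    "田": 0.80,
    "国": 0.78,
    "図": 0.77,  # assumed
    "間": 0.75,  # assumed
    "女": 0.74,  # assumed
    "目": 0.70,  # assumed, character not named in the text
    "日": 0.65,  # assumed, character not named in the text
}


def select_top_n(scores, n):
    """Return the n characters with the highest similarity, best first."""
    # The full sort is the costly step: it is repeated for every input
    # character and grows with the size of the recognition dictionary.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


candidates = select_top_n(similarities, 5)
print(candidates)
```

The sort over the whole dictionary is what the text identifies as the bottleneck when the character set is large.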

A second conventional method finds the maximum similarity among the similarities for all characters, takes as the judgment value the result of subtracting a predetermined fixed threshold from this maximum, and selects as candidate characters those whose similarity exceeds the judgment value. This method does not require the characters to be sorted by similarity; it is enough to derive the judgment value from the maximum similarity and compare it with the similarity for each character.
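A minimal sketch of this fixed-threshold method is shown below, under stated assumptions: the fixed threshold of 0.04 and all similarities other than 0.80 and 0.78 are illustrative values, and `select_by_fixed_threshold` is a hypothetical name. `Decimal` is used only so that the subtraction and comparison are exact.

```python
from decimal import Decimal

FIXED_THRESHOLD = Decimal("0.04")  # assumed value, identical for every character

# Only 0.80 and 0.78 are stated in the text; the rest are assumed placeholders.
similarities = {
    "田": Decimal("0.80"),
    "国": Decimal("0.78"),
    "図": Decimal("0.77"),  # assumed
    "間": Decimal("0.75"),  # assumed
    "女": Decimal("0.74"),  # assumed
}


def select_by_fixed_threshold(scores, threshold=FIXED_THRESHOLD):
    """Return characters whose similarity exceeds (maximum - threshold)."""
    # First pass: find the maximum similarity and derive the judgment value.
    judgment_value = max(scores.values()) - threshold
    # Second pass: compare each similarity with the judgment value. No sort.
    return [ch for ch, s in scores.items() if s > judgment_value]


candidates = select_by_fixed_threshold(similarities)
print(candidates)
```

Two linear passes replace the sort, which is why this method avoids the cost of the top-N approach.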

However, because this second method uses the same threshold for every character, the threshold is too large for some character groups, which yields too many candidate characters, and too small for other character groups, which increases the rate of misclassification.

This invention was made to eliminate the above drawbacks of conventional devices. From the dispersion of the similarity of each character, a threshold is determined for each group of characters classified by similarity of glyph shape; this threshold is stored for the characters belonging to the group; and major classification is performed using the stored thresholds. The aim is to shorten the time required for classification without degrading classification ability.

Embodiments of the invention are described below with reference to the drawings. As stated above, FIG. 1 is a block diagram showing one embodiment of the invention, and the operation up to the output of the similarity calculation means 4 is as already described.

The contents of the threshold table 7 can be determined as follows. For a given character, a number of deformed versions of its reference glyph shape are prepared. These are deformed, but remain within the tolerance at which a human reader can still read them correctly and easily. Calculating the similarities of these deformed characters reveals the dispersion of the similarity for that character. Next, a threshold is determined from this dispersion for each character group, where the groups are formed by similarity of glyph shape. The threshold determined for each group is then stored in association with each character that makes up the group. For example, the threshold TH_M of a character group M can be set to the largest of the dispersion values of the characters belonging to M. The value of TH_M is then set as the threshold TH_Cm for each character Cm belonging to M. It is a well-known fact in this field that a character tends to be misread as particular other characters of similar shape. A set of characters that are easily misread as one another is therefore defined as a character group. As a result, the threshold can be made small for a character group in which the similarity varies little under deformation of the glyph shape. For a character group in which the similarity varies widely, the threshold can be made large so that a character is not dropped from the candidate selection even when it is written in a deformed form.
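The construction of the threshold table can be sketched as follows. This is only an illustration under stated assumptions: the sample similarities, the group names, and the grouping itself are made up, and population variance (`statistics.pvariance`) is assumed as the dispersion measure. The source does not say which dispersion measure is used or how it is scaled into threshold values such as 0.04, so the resulting numbers are not comparable to those in the figures.

```python
from statistics import pvariance

# Hypothetical similarities of deformed samples of each character, measured
# against that character's own dictionary entry (all values are made up).
deformed_similarities = {
    "田": [0.95, 0.90, 0.85, 0.80],
    "国": [0.92, 0.90, 0.88, 0.86],
    "大": [0.97, 0.96, 0.95, 0.94],
    "犬": [0.96, 0.95, 0.95, 0.94],
}

# Hypothetical character groups: characters easily misread as one another.
character_groups = {
    "M1": ["田", "国"],
    "M2": ["大", "犬"],
}


def build_threshold_table(samples, groups):
    """Map each character to the threshold of the group it belongs to."""
    # Dispersion of the similarity for each character (assumed: variance).
    dispersion = {ch: pvariance(vals) for ch, vals in samples.items()}
    table = {}
    for members in groups.values():
        # TH_M: the largest dispersion among the characters of the group.
        group_threshold = max(dispersion[ch] for ch in members)
        for ch in members:
            table[ch] = group_threshold  # TH_Cm is set to TH_M
    return table


threshold_table = build_threshold_table(deformed_similarities, character_groups)
for ch, th in threshold_table.items():
    print(ch, th)
```

Storing the group threshold under each member character means that the classifier later needs only a single lookup keyed by character code.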

A threshold is determined for the character of every character code in the recognition dictionary 5 and is stored in the threshold table 7 in association with that character code.

FIG. 4 shows the thresholds that the classification means 6 reads from the threshold table 7 for the characters of FIG. 3. The same reference numerals as in FIG. 3 have the same meaning; 14 denotes the thresholds and 15 denotes the threshold for the kanji "田".

The classification means 6 finds that the maximum similarity calculated by the similarity calculation means 4 is 0.80 (FIG. 3) and reads the threshold 15 for the character "田", which gives this similarity, from the threshold table 7 as 0.04. It takes 0.80 − 0.04 = 0.76 as the judgment value and determines the characters "田", "国", and "図", whose similarity is 0.76 or more, as the candidate characters 16 (FIG. 5).
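The classification step above can be sketched as follows. The values 0.80, 0.78, and the threshold 0.04 come from the text; the remaining similarities and thresholds, and the function name `classify`, are assumptions for illustration. `Decimal` keeps the arithmetic 0.80 − 0.04 = 0.76 exact, so that the "0.76 or more" comparison behaves as written.

```python
from decimal import Decimal

# Similarities (FIG. 3 style). Values other than 0.80 and 0.78 are assumed.
similarities = {
    "田": Decimal("0.80"),
    "国": Decimal("0.78"),
    "図": Decimal("0.77"),  # assumed
    "間": Decimal("0.75"),  # assumed
    "女": Decimal("0.74"),  # assumed
}

# Threshold table (FIG. 4 style). 0.04 for "田" is stated in the text;
# the other entries are assumed.
threshold_table = {
    "田": Decimal("0.04"),
    "国": Decimal("0.04"),
    "図": Decimal("0.04"),  # assumed
    "間": Decimal("0.06"),  # assumed
    "女": Decimal("0.02"),  # assumed
}


def classify(scores, thresholds):
    """Select candidates using the threshold of the best-matching character."""
    best_char = max(scores, key=scores.get)            # character with max similarity
    judgment_value = scores[best_char] - thresholds[best_char]
    # Keep every character whose similarity is at or above the judgment value.
    return [ch for ch, s in scores.items() if s >= judgment_value]


candidates = classify(similarities, threshold_table)
print(candidates)
```

Only the threshold of the top-scoring character is looked up, so the per-input cost stays at two linear passes plus one table read.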

Suppose the kanji "田" is written with severe deformation, or noise is present in the character pattern, so that the similarity of "田" falls to 0.80 − 0.04 = 0.76 while the other similarities remain as shown in FIG. 3. The maximum similarity is then 0.78 ("国"), the judgment value is 0.78 − 0.04 = 0.74, and "女", "図", "田", "国", and "間" become the candidate characters. The likelihood that "田" is excluded from the candidate characters is therefore extremely small.
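The deformed-input case can be checked with a short calculation. This is a sketch under assumptions: 0.76, 0.78, and the threshold 0.04 are taken from the text, while the similarities of "図", "間", and "女" and the extra entry "目" are illustrative placeholders chosen to be consistent with the candidate list given above.

```python
from decimal import Decimal

# "田" has dropped from 0.80 to 0.76; 0.78 for "国" is from the text.
similarities = {
    "田": Decimal("0.76"),
    "国": Decimal("0.78"),
    "図": Decimal("0.77"),  # assumed
    "間": Decimal("0.75"),  # assumed
    "女": Decimal("0.74"),  # assumed
    "目": Decimal("0.70"),  # assumed extra entry, not named in the text
}
threshold_for_best = Decimal("0.04")  # threshold of "国", as used in the text

best_char = max(similarities, key=similarities.get)
judgment_value = similarities[best_char] - threshold_for_best
candidates = {ch for ch, s in similarities.items() if s >= judgment_value}

print(best_char, judgment_value, sorted(candidates))
```

Even though "田" is no longer the best match, it stays two hundredths above the judgment value and so remains a candidate for the later identification stage.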

The identification means 8 receives the candidate characters and either decides on one of them or rejects the reading. This operation is the same as in conventional devices, so its description is omitted.

As described above, in this invention the threshold determined in advance for each character group, the groups being defined by similarity of glyph shape, is set as the threshold of each character, and the threshold corresponding to the character that gives the maximum similarity is used for the classification decision. Candidate characters can therefore be narrowed down effectively without increasing classification errors. In addition, because the characters need not be re-sorted by similarity, classification can be performed at high speed.

Although the examples in FIGS. 2 to 5 show only kanji, the characters that can be read by this invention are not limited to kanji.

As described above, according to the present invention, the processing time for major classification can be greatly shortened without lowering recognition accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 shows an example of an input character, FIG. 3 shows an example of the similarities calculated by the similarity calculation means of FIG. 1, FIG. 4 shows the thresholds corresponding to the characters shown in FIG. 3, and FIG. 5 shows the selected candidate characters.
1: form, 2: scanning means, 3: feature extraction means, 4: similarity calculation means, 5: recognition dictionary, 6: classification means, 7: threshold table, 8: recognition means.

Claims (1)

[Claims]
1. A character reading device comprising:
a scanning means that scans and photoelectrically converts characters written on a form or the like and stores the result as an input character pattern;
a feature extraction means that extracts features of the character pattern from the input character pattern according to a predetermined rule;
a recognition dictionary that stores, in association with the character code of each character, the features of character patterns extracted according to the predetermined rule from input character patterns of the reference glyph shapes of all characters to be read;
a similarity calculation means that calculates the similarity between the features extracted by the feature extraction means and the features stored in the recognition dictionary;
a threshold table that stores, in association with the character code of each character, thresholds determined for each character group defined in advance by similarity of glyph shape, the thresholds being based on the dispersion value of the similarity for each character stored in the recognition dictionary, the dispersion value being obtained by calculating the similarity between the features extracted from each of a plurality of deformed glyph shapes, deformed within a permissible range from the reference glyph shape of the character, and the features of that character in the recognition dictionary;
a classification means that reads from the threshold table the threshold for the character giving the maximum value among the similarities calculated by the similarity calculation means, takes as a judgment value the value obtained by subtracting the read threshold from the maximum value, and selects as candidate characters those characters whose similarity, among the similarities calculated by the similarity calculation means, is equal to or greater than the judgment value; and
a recognition means that, according to a predetermined rule, determines one character from among the candidate characters selected by the classification means or treats the input as undeterminable.
JP58054481A 1983-03-30 1983-03-30 Character reader Granted JPS59205681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58054481A JPS59205681A (en) 1983-03-30 1983-03-30 Character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58054481A JPS59205681A (en) 1983-03-30 1983-03-30 Character reader

Publications (2)

Publication Number Publication Date
JPS59205681A (en) 1984-11-21
JPH0259504B2 (en) 1990-12-12

Family

ID=12971845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58054481A Granted JPS59205681A (en) 1983-03-30 1983-03-30 Character reader

Country Status (1)

Country Link
JP (1) JPS59205681A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0241588A (en) * 1988-08-01 1990-02-09 Fuji Electric Co Ltd Rejecting method for unknown pattern recognition result
JP4649017B2 (en) * 2000-07-28 2011-03-09 株式会社東芝 Character recognition device and character recognition method

Also Published As

Publication number Publication date
JPS59205681A (en) 1984-11-21
