JP3548233B2

JP3548233B2 - Character recognition apparatus and method

Info

Publication number: JP3548233B2
Application number: JP14605394A
Authority: JP
Inventors: 徹二木
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-06-28
Filing date: 1994-06-28
Publication date: 2004-07-28
Anticipated expiration: 2019-07-28
Also published as: JPH0816728A

Description

【０００１】
【産業上の利用分野】
本発明は文字認識装置及び方法に関するものである。
【０００２】
【従来の技術】
一般に、文字認識処理にもいくつかあるが、図４に示すような例も有効である。以下、順を追って説明する。
【０００３】
先ず、ステップＳ４０１で文書画像のスキャンを行い画像情報を２値のデジタル画像データに変換する。次のステップＳ４０２では、読み取った画像データに対して、１文字ごとの領域を取り出していく文字切り出し処理を行い、ステップＳ４０３で予め定められたアルゴリズムにしたがって特徴抽出を行う。
【０００４】
例えば、一文字の画像を８×８の小領域（計６４個の領域）に分割し、その小領域内で黒画素の数を集計する。６４個の領域ごとに黒画素の特徴が計算されるので６４次元の特徴ベクトルが生成されることになる。さらに文字の大きさの影響を取り除くためにステップＳ４０４で文字の大きさによって特徴ベクトルを正規化する。元の特徴ベクトルをＸ、正規化後の特徴ベクトルをＸ’とすると、
Ｘ’＝（ａ／（ｈ×ｗ））Ｘ
という正規化を行う。ただし、ｈ，ｗはそれぞれ文字の高さと幅で、ａは定数である。
【０００５】
こうして、正規化された特徴ベクトルと認識辞書の学習パターンを照合し最適な文字種を選び出すわけだが、ここで認識精度を高めようとして、精度の高い距離計算を用いて照合を行おうとすると、必然的に計算量が増大し、処理速度が低下してしまうという問題が発生し、逆に処理速度を向上させようとする場合には簡単な距離計算を行うことになるので認識精度が低下するという問題が発生する。
【０００６】
そこで、簡単な距離計算を用いた第一次照合により文字種を絞り込み、絞り込まれた文字種（候補）に対してのみ、より精度が高く演算量の多い第二次照合を行うという処理が考えられる。
【０００７】
ステップＳ４０５では、演算量の少ない距離計算を用いて第一次照合を行う。本例ではシティ・ブロック距離を用いる。そして、個々の正規な文字ｋに対する平均ベクトルμ（ｋ）（＝μ１（ｋ），…，μ６４（ｋ））を予め記憶させた辞書を用意しておき、この平均ベクトルと認識対象の文字から得られた入力特徴ベクトルＸ（＝ｘ１（ｋ），…，ｘ６４（ｋ））との距離ｄｃ（ｋ）を下記の式で求める。
【０００８】
【数１】

全ての文字種に足してｄＣが計算され、ステップＳ４０６でｄＣの小さい方から上位１０個の文字種が選択される。こうして選択された上位１０個の文字種に対してステップＳ４０７で第二次照合を行う。第二次照合では精度を重視して、演算量は多いが、精度の高い距離計算として次式で与えられるマハラノビス距離を用いる。
【０００９】
【数２】

ここで、Ｓ（ｋ）は文字種ｋの共分散行列である。また、右肩の添え字ｔは転置を、−１は逆行列を表す。
【００１０】
こうして、ステップＳ４０８において、最も小さいｄＭを与える文字種を最終的な候補と判定しステップＳ４０９で出力する。
【００１１】
【発明が解決しようとする課題】
上記処理によれば、２段階で文字を絞り込むという点で、ある面で優れているが、これは第１次照合で絞られた候補の中に正解となる文字が含まれるというのが前提である。つまり、もし第１次照合で得られた候補内に正解となる文字が含まれていない場合には、第２次照合の精度にかかわらず正しい認識結果が得られないという問題がある。
【００１２】
【課題を解決するための手段】
本発明は上記問題点に鑑みなされたものであり、速度的に問題なく、且つ、認識対象の文字の取りこぼしの発生を抑制し、高い精度で文字を認識する文字認識装置及び方法を提供しようとするものである。
【００１３】
この課題を解決するため、例えば本発明の文字認識装置は以下に示す構成を備える。すなわち、
入力された文字画像を文字認識する文字認識装置において、
速度優先の第１の文字認識手段と、
精度優先の第２の文字認識手段と、
文字と当該文字に対する類似文字群の組みで構成された類似文字辞書と、
前記速度優先の第１の文字認識手段を用いて前記文字画像を文字認識することによって、候補文字群を抽出する第１の抽出手段と、
前記文字画像を含む原稿画像に応じて、類似文字群を抽出する対象となる候補文字の個数Ｎを変更する変更手段と、
前記抽出された候補文字群中における類似度の高い方から上位Ｎ個の候補文字それぞれに対する類似文字群を前記類似文字辞書から抽出する第２の抽出手段と、
前記第１及び第２の抽出手段で抽出された文字群を候補範囲とした前記精度優先の第２の文字認識手段を用いて前記文字画像を文字認識し、該第２の文字認識手段で文字認識した結果の候補文字を出力するように制御する制御手段とを備える。
【００１５】
また、前記第１の文字認識手段は、シティ・ブロック距離によって文字認識し、前記第２の文字認識手段は、マハラノビス距離によって文字認識することが望ましい。これによって、認識対象の候補文字を高速に絞り込むことが可能となり、尚且つ、高精度に文字を認識することが可能になる。
【００１９】
【実施例】
以下、添付図面に従って本発明に係る実施例を詳細に説明する。
【００２０】
本実施例では、ＯＣＲ（光学的文字認識装置）に適用した場合である。図１は第１の実施例の構成を表すブロック図である。
【００２１】
図１において、１０１は画像原稿に光を照射し、その反射光を読み取り電気信号に変換するスキャナ、１０２はスキャナ１０１で得られた電気信号を２値のデジタル電気信号に変換し他の装置構成要素に伝送するためのスキャナインターフェース回路、１０３はディスプレイのウィンドウ上で所望とする座標を入力するためのポインティングデバイス（マウス等）、１０４はポインティングデバイス１０３からの信号を受け、それを他の装置構成要素に伝送するためのインターフェース回路、１０５は装置全体の制御及び文字切り出し処理や認識処理を実行するためのＣＰＵ、１０６はＣＰＵ１０５が実行する制御プログラム、各種処理プログラム、認識辞書、類似文字種テーブルなどを格納しているＲＯＭ、１０７は文字画像の展開や文字認識処理のための作業領域などとして用いられるＲＡＭである。また、１０８は入力イメージや認識結果を表示するためのディスプレイ、１０９はディスプレイインターフェース回路、そして１１０は各装置構成要素を接続するバスである。
【００２２】
先ず、実施例における類似文字種テーブルの作成法を説明する。文字同士の近さの度合いを各文字種の平均特徴ベクトル間の距離で定義する。平均特徴ベクトルは同じ文字でもフォントや大きさ印刷状態などの異なるさまざまな学習データから予め求めておく。本実施例では距離はユークリッド距離を用いて類似文字種を定める。すなわち、ある文字種ｍに注目した場合、他の文字種ｋとの平均ベクトル距離Ｄ（ｍ，ｋ）は次のようにする。
【００２３】
【数３】

ここで、Ｐは文字種の総数である。そして、すべてのｋに対してＤ（ｍ，ｋ）を計算し、距離の小さい方から上位１０個を類似文字とする。ただし、自分自身はＤ＝０で最も距離が小さくなるが意味がないので除く。この計算によりすべての文字種に対する類似文字種が定義されるのでこれをテーブル状態で予め格納しておく。尚、このテーブルは、実際にはその内容が変更されることはないので、実施例ではＲＯＭ１０６に予め書き込んでおく。但し、ＲＯＭに限定されるものではなく、ハードディスク装置等の記憶装置に保持させてもよいし、電源投入時にそれら不揮発性記憶装置からＲＡＭにロードするようにしても良い。
【００２４】
以上の結果、例えば“間”という文字に着目したとき、この文字“間”に平均ベクトルの一番小さいものから１０個が瞬時に取り出せるようになっている。
【００２５】
図３は上記類似文字テーブルの構造を模式的に示したものである。先頭文字３００に連続して格納されている文字がその先頭文字に類似している文字群であり、その類似度順（平均ベクトル距離の小さい順）に配置されている。従って、このテーブルには各先頭文字に対して類似している文字が１０文字ずつ並んでいるので、テーブルの先頭位置から１１文字ごとに検索することで、注目文字を検索できるので、注目文字に対する１０個の類似文字を抽出することが可能になっている。従って、文字“間”に着目した場合には、図示の符号３８０に該当する文字（文字コード）を検索できるので、その結果、問、聞、開、…閣の計１０文字をいっきに抽出できる。
【００２６】
尚、図では文字そのもので表しているが、実際には文字コード（例えば、ＪＩＳコード等）が格納されている。
【００２７】
次に、上記テーブルを備えた装置におけるＣＰＵ１０５の処理内容を図２のフローチャートに従って説明する。尚、同図における処理手順に対応するプログラムはＲＯＭ１０６に格納されているものである。
【００２８】
ステップＳ２０１〜Ｓ２０５は従来例のＳ４０１〜Ｓ４０５と同じであるものとし、説明が重複するので簡単に説明する。
先ず、原稿画像を読み取り、その読み取った２値画像データを一旦ＲＡＭ１０７の所定エリアに格納し、文字の切り出しを行う。１つの文字は、先に説明したのと同じ個数、すなわち、８×８の小領域（全部で６４個の領域）に分割し、６４次元の特徴ベクトルを生成する。次いで、文字のサイズによる影響を除くためにその生成した特徴ベクトルを正規化する（正規化の原理は先に説明した通り）。こうして、正規化された特徴ベクトルに基づいて、辞書（ＲＯＭ１０６に格納されているものとする）を参照する
以上がステップＳ２０１〜Ｓ２０５の処理である。
【００２９】
さて、辞書を参照することで、正規化された特徴ベクトルに近い方から数文字（説明を簡単にするためここでも１０文字にする）を選び出す（ステップＳ２０６）。
【００３０】
この結果、例えば以下の文字が選択されたとしよう。
【００３１】
間問聞開闘閣閉岡向商（１）
最も距離の近かった文字は「間」なので、類似文字種テーブル（図３参照）で「間」をキーとして検索する（ステップＳ２０７）。図３の類似文字種テーブルにおいて、テーブルの先頭から１１文字ごとに照合していくことによってキーと３８０のコードが一致する。その結果、符号３８１の位置から符号３９０の前の位置まで格納されている１０個の類似文字
問聞開関闘閉閑岡商閣（２）
を得ることができる。（２）と元々の１０個の候補（１）を合わせ、重複しているものを除くと
間問聞開関闘閉閑岡商閣向（３）
という１２個の文字が得られる。
【００３２】
次に、処理はステップＳ２０８に進んで、（３）の１２個の文字に対して第二次照合を行う。第二次照合は、先に説明したのと同じように精度の高いマハラノビス距離を次式で得る。
【００３３】
【数４】

ここで、ｋは１２個の文字種を表し、Ｘは認識しようとしている文字イメージから得たベクトルである。
【００３４】
そして、最終的に、上記演算のよる１２個のｄＭ（ｋ）の中で距離の近い順に並び替え、その先頭文字を第１候補として出力する（ステップＳ２０９）。尚、表示画面に表示された第１候補を見た操作者が、キーボードやポインティングデバイスを使用して次候補表示指示を行った場合には、第２候補以下を表示画面にその順番に表示する。操作者は、この表示された候補の中から目的の文字を探し出し、選択することになる。
【００３５】
尚、ここでは１２個に対してマハラノビス距離を算出したが、第１次照合と第２次照合とでそれぞれ１０個の類似文字を用いるわけであるから、計算する個数は１０〜２０の範囲である。
【００３６】
以上の結果、本実施例によれば、認識対象の文字イメージに基づいて高速な手法によって先ず所定数の第１の候補文字（上記実施例では１０文字）を抽出する。そして、その候補文字群における一番類似している文字そのもの対する類似文字をテーブルから所定数の第２の候補文字（実施例ではこれも１０文字）を抽出する。そして、これら第１の候補文字群と第２の候補文字群の論理和された結果（重複する文字を除くという意味）に対して、より高精度の認識処理を行い、その結果に基づいて候補文字をう。従って、精度が多少落ちるものの高速に処理できる認識処理による候補文字群中に目的の文字がない場合であっても、その候補文字を越える範囲で、しかも、不要に広範囲にならずに高精度の文字認識を行えることで、処理速度の高速性を保ちながら、認識結果のとりこぼしの発生を低く抑えることが可能になる。
【００３７】
［第２の実施例の説明］
上記実施例（第１の実施例）では、類似文字種の定義として（１）式のように平均ベクトル同士の距離を用いたが、次式のように学習サンプルの中で最も近いもの同士の距離によって類似度を定義してもよい。ある文字ｍ（ｍ＝１，２，…，Ｐ）に対して、ｉ番目の学習データをａｉ（ｍ）で表し、学習データの数をｎとすると
【数５】

ここで、１≦ｉ，ｊ≦ｎなるすべてのｉ，ｊ
を文字種ｍと文字種ｋの距離とする。この場合、学習サンプルの分布も考慮して類似文字種を計算できることになる。
【００３８】
［第３の実施例の説明］
また、第１の実施例では第一次照合の結果もっとも距離の小さい文字種の類似文字種だけをテーブルから抽出し、それらを第２次照合の対象に追加していたが、第一次照合の結果距離の小さい方から上位Ｎ個の文字種の類似文字種を第二次照合の候補に追加してもよい。たとえば、第一次照合の結果上位３個の文字種が「間」「聞」「開」であった場合、この上位３個の文字種すべての類似文字種テーブルを参照することによ次のような類似文字種が得られたとする。
【００３９】
類似文字テーブルからの抽出文字
間 → 問聞開関闘閣閉岡向商
聞 → 間開問闘関閣閉向闇岡
開 → 間問聞関閉闘閣岡閑閥
参照された文字種のうち重複するものを除いて、以下のものが得られる。
【００４０】
関闘閣閉岡向商闇閑閥
そして、この候補文字群を第２次照合の候補として追加することになる（勿論、第１次照合処理と重複するものは追加しない）。
【００４１】
尚、上記Ｎは固定であっても良いが、例えば認識対象の原稿が鮮明な場合には少ない数を、不鮮明な場合には大きい数字を与えることで、適宜変更できるようにすることが望ましい。尚、ユーザに対しては、Ｎがいかなる意味を持つかを意識させないで済むように、例えば、原稿画像の鮮明度を数段階で指示することで対処させれば良い。場合によっては、手書き原稿か、ファクシミリ受信画像か、プリンタ等で印刷したものか等を設定しても良い。この場合、Ｎは手書き＞ファクシミリ受信画像＞プリンタ出力に対応する値を持つことになろう。
【００４２】
［第４の実施例の説明］
また、更に、第１の実施例においてひとつの文字に対する類似文字種の数を一定としたが、これは文字種によって可変にしてもよい。ある文字種ｍに対して他の文字種ｋの距離をｄ（ｍ，ｋ）で表すと、
ｄ（ｍ，ｋ）＜ｄ０
を満足する文字種ｋを類似文字種とする。距離の定義は第１の実施例に同じでもよくまたそれ以外の距離の定義を用いてもよい。ｄ０は実験的に定めた定数であり、例えば候補群に目的の文字が含まれる確率に基づくものでよい。
【００４３】
このように類似文字種を定めると、類似文字種が多くて間違えやすい文字に対して候補字種が増え、類似文字の少ない文字種に対して必要以上に候補文字種を増やして速度を低下させることもなく効率よく削減させることができる。
【００４４】
また、第２次照合によって得られる候補群の数も１０に限定されるものではなく、それぞれの着目文字によって異なっても良い。
【００４５】
このためには、例えば図５に示すように、テーブルを２つに分け、テーブル５０には文字コードとその文字コードに対する類似文字群を記憶しているテーブルの格納先アドレスを格納しておく。テーブル５０は例えば文字コード順に並んでいて、尚且つ、１つのレコード（文字コードとアドレスの構造体の大きさ）は固定であるので、与えられた文字コードからテーブルの位置は簡単に計算でき、瞬時に目的のアドレスを得ることが可能になる。そして、そのテーブルの次のレコードのアドレスを調べれば目的の類似文字の個数が判別できる。
【００４６】
この結果、テーブル５１からその個数分の候補文字を第２次候群として抽出する。
【００４７】
尚、本発明は、活字文字だけでなく手書き文字にも適用でき、また言語の種類を問わない。
【００４８】
また、上記実施例では、第１次照合に文字イメージを複数領域に分割し、その小領域単位の特徴ベクトルでもって第１次候補群を決定し、第２次候補群はマハラノビス距離で算出し抽出した。しかしながら、本願発明はこれによって限定されるものではない。
【００４９】
要は、第１次照合においては速度優先、第２次照合には精度優先をその根底の思想とするものだからである。従って、この思想の範疇にあるものであれば、本願発明はいかように改良変更しても構わない。
【００５０】
また、本発明はスキャナ、ディスプレイ等の個々の装置を接続したシステムとして説明したが、単独の装置内にこれらの機能を実現させる場合にも適応できることは言うまでもない。また、処理手順（プログラム）はＲＯＭに格納されているものではなく、外部から供給することで動作する場合にも適応できるのは、上記実施例からすれば容易に想到できよう。
【００５１】
更に、実施例ではスキャナ等の光学機器から原稿を読み取り、文字認識する例を説明したが、フロッピー等の記憶媒体に予め原稿画像を記憶しておいて、この記憶媒体から画像データを入力し文字認識しても構わない。また、回線を介して受信した画像（ファクシミリ受信画像）から直接文字認識するようにしても良い。
【００５２】
【発明の効果】
以上説明したように本発明によれば、少ない演算量の計算による第１次照合を行い、その結果距離が近い方から上位Ｎ個の文字に詳細な識別計算による第２次照合を行う文字認識処理において、第一次照合の上位Ｍ個（Ｍ≦Ｎ）の認識結果の類似文字種を加えて詳細な識別計算による第２次照合を行うことによって、正しい候補が第一次照合でもれてしまった場合でも類似文字種として復活する可能性が高く、精度の高い文字認識が実現できる。
【００５３】
【図面の簡単な説明】
【図１】第１の実施例の構成を表すブロック図である。
【図２】第１の実施例のフローチャートである。
【図３】類似文字テーブルの構造を説明するための図である。
【図４】従来例の処理のフローチャートである。
【図５】他の実施例における類似文字検索テーブルの構造を示す図である。
【符号の説明】
１０１スキャナ
１０２スキャナインターフェース回路
１０３ポインティングデバイス
１０４ポインティングデバイスインターフェース回路
１０５ＣＰＵ
１０６ＲＯＭ
１０７ＲＡＭ
１０８ディスプレイ
１０９ディスプレイインターフェース回路
１１０ＣＰＵバス[0001]
[Industrial applications]
The present invention relates to a character recognition device and method.
[0002]
[Prior art]
Generally, there are some character recognition processes, but the example shown in FIG. 4 is also effective. Hereinafter, description will be made in order.
[0003]
First, in step S401, a document image is scanned to convert image information into binary digital image data. In the next step S402, a character cutout process for extracting an area for each character is performed on the read image data, and feature extraction is performed in step S403 according to a predetermined algorithm.
[0004]
For example, an image of one character is divided into 8 × 8 small areas (a total of 64 areas), and the number of black pixels is counted in the small areas. Since the feature of the black pixel is calculated for each of the 64 regions, a 64-dimensional feature vector is generated. In order to remove the influence of the character size, the feature vector is normalized by the character size in step S404. Assuming that the original feature vector is X and the normalized feature vector is X ′,
X ′ = (a / (h × w)) X
Is performed. Here, h and w are the height and width of the character, respectively, and a is a constant.
[0005]
In this way, the normalized feature vector is compared with the learning pattern of the recognition dictionary to select the optimal character type.However, in order to increase the recognition accuracy, it is necessary to perform the matching using highly accurate distance calculation. However, there is a problem that the calculation amount increases and the processing speed decreases, and conversely, if the processing speed is to be improved, a simple distance calculation will be performed, and the recognition accuracy will decrease. Occurs.
[0006]
Therefore, a process of narrowing down the character types by primary collation using simple distance calculation and performing secondary collation with higher accuracy and a large amount of calculation only for the narrowed character types (candidates) is conceivable.
[0007]
In step S405, the first collation is performed using the distance calculation with a small amount of calculation. In this example, the city block distance is used. A dictionary in which an average vector μ (k) (= μ1 (k),..., Μ64 (k)) for each normal character k is prepared in advance, and a dictionary is prepared based on the average vector and the character to be recognized. The distance dc (k) from the obtained input feature vector X (= x1 (k),..., X64 (k)) is obtained by the following equation.
[0008]
(Equation 1)

DC is calculated by adding all the character types, and in step S406, the top 10 character types are selected from the smaller dC. In step S407, the second collation is performed on the top 10 character types selected in this manner. In the second collation, the Mahalanobis distance given by the following equation is used as the distance calculation with high accuracy, although the calculation amount is large, with emphasis on accuracy.
[0009]
(Equation 2)

Here, S (k) is a covariance matrix of the character type k. The suffix t on the right shoulder indicates transposition, and -1 indicates an inverse matrix.
[0010]
Thus, in step S408, the character type that gives the smallest dM is determined as a final candidate, and is output in step S409.
[0011]
[Problems to be solved by the invention]
According to the above processing, the method is excellent in that it narrows down the characters in two stages, but it is premised on that the correct character is included in the candidates narrowed down in the first collation. is there. That is, if the character obtained as the correct answer is not included in the candidates obtained in the primary collation, there is a problem that a correct recognition result cannot be obtained regardless of the accuracy of the secondary collation.
[0012]
[Means for Solving the Problems]
The present invention has been made in view of the above problems, and has as its object to provide a character recognition device and method that can recognize characters with high accuracy without causing a problem in speed, suppressing occurrence of missing characters to be recognized. Is what you do.
[0013]
In order to solve this problem, for example, the character recognition device of the present invention has the following configuration. That is,
In a character recognition device for character recognition of an input character image,
Speed-first character recognition means;
A second character recognition unit that gives priority to accuracy;
A similar character dictionary composed of a set of characters and similar character groups for the characters,
First extracting means for extracting a candidate character group by character-recognizing the character image using the first character recognizing means having the speed priority;
Changing means for changing the number N of candidate characters from which a similar character group is extracted according to a document image including the character image;
Second extracting means for extracting the similar character group from the similar character dictionary for the top N candidate characters from each of the higher degree of similarity in the candidate character group which the extracted,
The character image is character-recognized by using the second character recognition unit with the priority given to the accuracy with the character group extracted by the first and second extraction units as a candidate range, and the character is recognized by the second character recognition unit. Control means for controlling to output candidate characters as a result of the recognition.
[0015]
Further, it is preferable that the first character recognition means performs character recognition based on a city block distance , and the second character recognition means performs character recognition based on a Mahalanobis distance . As a result, it becomes possible to narrow down the candidate characters to be recognized at a high speed, and it is possible to recognize the characters with high accuracy.
[0019]
【Example】
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0020]
This embodiment is a case where the present invention is applied to an OCR (optical character recognition device). FIG. 1 is a block diagram showing the configuration of the first embodiment.
[0021]
In FIG. 1, reference numeral 101 denotes a scanner which irradiates an image original with light and reads the reflected light and converts the read light into an electric signal. A scanner interface circuit for transmitting to the element; 103, a pointing device (mouse or the like) for inputting desired coordinates on a window of the display; 104, a signal from the pointing device 103; An interface circuit for transmitting data to the elements, a CPU 105 for controlling the entire apparatus and executing character extraction processing and recognition processing, and a control program 106 for the CPU 105, various processing programs, a recognition dictionary, a similar character type table, etc. ROM 107 stores character image expansion and characters Work areas for identification process is a RAM used as such. Reference numeral 108 denotes a display for displaying an input image or a recognition result, 109 denotes a display interface circuit, and 110 denotes a bus connecting each device component.
[0022]
First, a method of creating a similar character type table in the embodiment will be described. The degree of closeness between characters is defined by the distance between the average feature vectors of each character type. The average feature vector is obtained in advance from the various learning data of the same character, such as different fonts and size printing conditions. In this embodiment, a similar character type is determined as the distance using the Euclidean distance. That is, when attention is paid to a certain character type m, the average vector distance D (m, k) from another character type k is set as follows.
[0023]
[Equation 3]

Here, P is the total number of character types. Then, D (m, k) is calculated for all k, and the top 10 characters with the smaller distance are regarded as similar characters. However, although the distance is the shortest when D = 0, it is meaningless, so it is excluded. Since similar character types for all character types are defined by this calculation, these are stored in a table state in advance. Since the contents of this table are not actually changed, they are written in the ROM 106 in advance in the embodiment. However, the present invention is not limited to the ROM, and may be stored in a storage device such as a hard disk device, or may be loaded from the nonvolatile storage device to the RAM when the power is turned on.
[0024]
As a result, for example, when attention is paid to the character “between”, ten characters having the smallest average vector can be instantaneously extracted in the character “between”.
[0025]
FIG. 3 schematically shows the structure of the similar character table. A character group of characters that are stored consecutively in the

first

character 300 is similar to the first character, it is arranged in the similarity order (mean vector distances ascending order). Therefore, in this table, since characters similar to each of the first characters are arranged in a row of 10 characters , the target character can be searched by searching every 11 characters from the top position of the table. It is possible to extract ten similar characters. Therefore, when attention is focused on the letter "between", since it searches for a character corresponding to the code 380 shown (character code), as a result, question, listen, open, can be extracted all at once a ... Pavilion total of 10 characters.
[0026]
In the figure, the characters are represented by the characters themselves, but actually a character code (for example, a JIS code or the like) is stored.
[0027]
Next, the processing contents of the CPU 105 in the apparatus having the above-described table will be described with reference to the flowchart of FIG. A program corresponding to the processing procedure in FIG.
[0028]
Steps S201 to S205 are assumed to be the same as steps S401 to S405 of the conventional example, and the description will be duplicated, and thus will be briefly described.
First, a document image is read, the read binary image data is temporarily stored in a predetermined area of the RAM 107, and characters are cut out. One character is divided into the same number as described above, that is, into 8 × 8 small regions (64 regions in total) to generate a 64-dimensional feature vector. Next, the generated feature vector is normalized to eliminate the influence of the character size (the principle of normalization is as described above). The process of steps S201 to S205 is the process of referring to the dictionary (stored in the ROM 106) based on the normalized feature vector.
[0029]
Now, by referring to the dictionary, several characters (here also 10 characters for simplicity of explanation) are selected from the one closer to the normalized feature vector (step S206).
[0030]
As a result, assume that the following characters are selected, for example.
[0031]
Interview with the Cabinet Closed Oka Mukaisho (1)
Since the closest character is “between”, a search is performed using “between” as a key in the similar character type table (see FIG. 3) (step S207). In the similar character type table of FIG. 3, the key and the code of 380 are matched by collating every 11 characters from the top of the table. As a result, the similar character Q listening open about fighting closed leisure Oka quotient pavilion 10 stored before the position of the mark 390 from the position of the mark 381 (2)
Can be obtained. Combine (2) with the original 10 candidates (1) and remove any duplicates.
12 characters are obtained.
[0032]
Next, the process proceeds to step S208, in which the second collation is performed on the twelve characters of (3). In the second collation, a Mahalanobis distance with high accuracy is obtained by the following equation as described above.
[0033]
(Equation 4)

Here, k represents 12 character types, and X is a vector obtained from the character image to be recognized.
[0034]
Then, finally, the dM (k) is rearranged in ascending order among the twelve dM (k), and the first character is output as the first candidate (step S209). When the operator who has viewed the first candidates displayed on the display screen gives a next candidate display instruction using a keyboard or a pointing device, the second and lower candidates are displayed on the display screen in that order. . The operator searches for and selects a target character from the displayed candidates.
[0035]
Here, has been calculated Mahalanobis distance for 12, since the first collation and is not used ten similar character, respectively the second collation, the number of calculations in the range of 10 to 20 is there.
[0036]
As a result, according to the present embodiment, a predetermined number of first candidate characters (10 characters in the above embodiment) are first extracted by a high-speed technique based on the character image to be recognized. Then, a predetermined number of second candidate characters (also 10 characters in the embodiment) are extracted from the table as similar characters corresponding to the most similar characters themselves in the candidate character group. Then, the result of the logical sum of the first candidate character group and the second candidate character group (meaning that overlapping characters are removed) is subjected to a higher-accuracy recognition process. Let's write letters. Therefore, even if the target character is not included in the candidate character group by the recognition processing which can be processed at high speed although the accuracy is slightly lowered, the high accuracy can be obtained without exceeding the candidate character and without unnecessarily widening. By performing the character recognition, it is possible to keep the recognition result from being lost while keeping the processing speed high.
[0037]
[Description of Second Embodiment]
In the above embodiment (first embodiment), the distance between the average vectors is used as the definition of the similar character type as in equation (1). May define the similarity. Given a character m (m = 1, 2,..., P), the i-th learning data is represented by ai (m), and the number of learning data is n.

Here, all i, j satisfying 1 ≦ i, j ≦ n
Is the distance between the character type m and the character type k. In this case, the similar character type can be calculated in consideration of the distribution of the learning samples.
[0038]
[Description of Third Embodiment]
Also, in the first embodiment, only similar character types of the character type having the shortest distance as a result of the primary collation are extracted from the table and added to the target of the secondary collation. Similar character types of the top N character types starting from the one with the smaller distance may be added to the candidates for the second collation. For example, as a result of the first collation, if the top three character types are “between”, “listening”, and “open”, the similarity type table shown in FIG. Assume that the character type is obtained.
[0039]
Characters extracted from the similar character table-> Q & A Open Sekitokaku Closes Okamu Shobun-> Open Q & A Sekikankaku closes to Dark Okaoka-> Q & A Closes Kakuoka Oka Except for the following, the following are obtained:
[0040]
Sekitokaku Shukai Okamu Shokami Yakukan and this candidate character group will be added as a candidate for the second collation (of course, those that overlap with the first collation process will not be added).
[0041]
It should be noted that the above N may be fixed, but it is desirable that a small number be given when the document to be recognized is clear and a large number is given when the document is unclear, so that it can be changed as appropriate. In order to avoid the user from being conscious of the meaning of N, for example, the user may be caused to instruct the sharpness of the document image in several steps. Depending on the case, a handwritten document, a received facsimile image, a printed image by a printer or the like may be set. In this case, N will have a value corresponding to handwriting> facsimile received image> printer output.
[0042]
[Description of Fourth Embodiment]
Further, in the first embodiment, the number of similar character types for one character is fixed, but this may be variable depending on the character type. When the distance between a certain character type m and another character type k is represented by d (m, k),
d (m, k) <d0
Is assumed to be a similar character type. The definition of the distance may be the same as in the first embodiment, or another definition of the distance may be used. d0 is an experimentally determined constant, which may be based on, for example, the probability that the candidate group contains the target character.
[0043]
When the similar character type is determined in this manner, the number of candidate character types increases for a character having many similar character types and is likely to be mistaken, and the efficiency is improved without reducing the speed by increasing the number of candidate character types more than necessary for a character type having few similar characters. Can be reduced well.
[0044]
Further, the number of candidate groups obtained by the second collation is not limited to ten, and may be different depending on each target character.
[0045]
For this purpose, for example, as shown in FIG. 5, the table is divided into two, and the table 50 stores the character code and the storage destination address of the table storing the similar character group for the character code. The table 50 is arranged, for example, in the order of the character codes, and one record (the size of the structure of the character code and the address) is fixed. Therefore, the position of the table can be easily calculated from the given character code. It becomes possible to obtain a target address instantly. By examining the address of the next record in the table, the number of target similar characters can be determined.
[0046]
As a result, that number of candidate characters are extracted from the table 51 as the second candidate group.
[0047]
Note that the present invention can be applied not only to printed characters but also to handwritten characters, and the type of language is not limited.
[0048]
In the above embodiment, the character image is divided into a plurality of regions for the first collation, the first candidate group is determined based on the feature vector of the small region unit, and the second candidate group is calculated by the Mahalanobis distance. Extracted. However, the present invention is not limited by this.
[0049]
The point is that speed is the priority in the primary collation and accuracy is the priority in the secondary collation. Therefore, the invention of the present application may be modified or changed as long as it falls within the scope of the concept.
[0050]
Although the present invention has been described as a system in which individual devices such as a scanner and a display are connected, it goes without saying that the present invention can also be applied to a case where these functions are realized in a single device. In addition, the processing procedure (program) is not stored in the ROM, and can be adapted to the case where it operates by being supplied from the outside.
[0051]
Furthermore, in the embodiment, an example has been described in which an original is read from an optical device such as a scanner and characters are recognized. However, an original image is stored in advance in a storage medium such as a floppy, and image data is input from this storage medium to input characters. You may recognize it. Alternatively, characters may be directly recognized from an image (facsimile reception image) received via a line.
[0052]
【The invention's effect】
As described above, according to the present invention, the primary recognition is performed by calculation with a small amount of calculation, and as a result, the secondary recognition is performed by performing detailed identification calculation on the top N characters from the closest distance. In the processing, by adding the similar character types of the recognition results of the top M (M ≦ N) of the primary collation and performing the secondary collation by detailed identification calculation, a correct candidate is lost in the primary collation. In such a case, there is a high possibility that the character will be restored as a similar character type, and highly accurate character recognition can be realized.
[0053]
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a first embodiment.
FIG. 2 is a flowchart of the first embodiment.
FIG. 3 is a diagram for explaining a structure of a similar character table.
FIG. 4 is a flowchart of processing of a conventional example.
FIG. 5 is a diagram showing a structure of a similar character search table in another embodiment.
[Explanation of symbols]
101 scanner 102 scanner interface circuit 103 pointing device 104 pointing device interface circuit 105 CPU
106 ROM
107 RAM
108 display 109 display interface circuit 110 CPU bus

Claims

In a character recognition device for character recognition of an input character image,
Speed-first character recognition means;
A second character recognition unit that gives priority to accuracy;
A similar character dictionary composed of a set of characters and similar character groups for the characters,
First extracting means for extracting a candidate character group by character-recognizing the character image using the first character recognizing means having the speed priority;
Changing means for changing the number N of candidate characters from which a similar character group is extracted according to a document image including the character image;
Second extracting means for extracting the similar character group from the similar character dictionary for the top N candidate characters from each of the higher degree of similarity in the candidate character group which the extracted,
The character image is character-recognized by using the second character recognition unit with the priority given to the accuracy with the character group extracted by the first and second extraction units as a candidate range, and the character is recognized by the second character recognition unit. Control means for controlling to output candidate characters as a result of the recognition.

2. The character recognition device according to claim 1, wherein the change unit changes the number N of candidate characters from which the similar character group is to be extracted, according to the definition of a document image including the character image. 3. apparatus.

2. The method according to claim 1, wherein the changing unit changes the number N of candidate characters from which the similar character group is extracted by the second extracting unit according to a type of the document image including the character image. A character recognition device according to the item.

2. The character recognition apparatus according to claim 1, wherein the first character recognition unit performs character recognition based on a city block distance, and the second character recognition unit performs character recognition based on a Mahalanobis distance.

The similar character dictionary uses the learning data of each character to determine a similar character group similar to each character based on the similarity between the learning data, and associates the obtained similar character group with each character. 2. The character recognition apparatus according to claim 1, wherein the dictionary is a dictionary stored in a form.

The character recognition device according to claim 5 , wherein the number of similar characters in the similar character group associated with each of the characters is variable.

In a character recognition method for character recognition of an input character image,
A first extraction step of extracting a candidate character group by character-recognizing the character image using a speed-priority first character recognition unit;
A changing step of changing the number N of candidate characters from which a similar character group is to be extracted according to a document image including the character image;
Second extracting the similar character group for the top N candidate characters from each of the higher degree of similarity in the candidate character group that is the extraction from the similar character dictionary composed of a set of similar character group for the character and the character Extraction process,
The character image is character-recognized by using a second character recognizing unit which gives priority to the character group extracted in the first and second extracting steps and which has a priority as a candidate range, and the character recognition is performed by the second character recognizing unit. And a control step of controlling to output a candidate character as a result of the character recognition.

In the changing step, the similar character group is changed according to the sharpness of a document image including the character image. 8. The character recognition method according to claim 7, wherein the number N of candidate characters to be extracted is changed.

8. The character according to claim 7, wherein, in the changing step, the number N of candidate characters from which the similar character group is extracted is changed according to a type of a document image including the character image. Recognition method.

Said first character recognition means, and character recognition by the city block distance, said second character recognition means, a character recognition method according to claim 7, wherein, characterized in that the character recognition by Mahalanobis distance.

The similar character dictionary uses the learning data of each character to determine a similar character group similar to each character based on the similarity between the learning data, and associates the obtained similar character group with each character. character recognition method according to claim 7, wherein which is a dictionary with Te.

The character recognition method according to claim 11 , wherein the number of similar characters in the similar character group associated with each of the characters is variable.