JP5107100B2

JP5107100B2 - Character recognition method, character recognition device, and character recognition program

Info

Publication number: JP5107100B2
Application number: JP2008059420A
Authority: JP
Inventors: 豊細金; 勲篠原; 尚金田; 伊津美間嶋; あゆみ佐藤; 大前田; 裕一村山
Original assignee: Kyodo Printing Co Ltd
Current assignee: Kyodo Printing Co Ltd
Priority date: 2008-03-10
Filing date: 2008-03-10
Publication date: 2012-12-26
Anticipated expiration: 2028-03-10
Also published as: JP2009217454A

Description

The present invention relates to a character recognition method, a character recognition device, and a character recognition program for recognizing characters contained in an image.

General-purpose optical devices such as camera-equipped mobile phones are now close at hand and can capture images anywhere, which has created a need to perform character recognition with them. Mobile phones also offer screen display, computation, and network functions, and software that uses these functions can be installed on them, so the result of character recognition can be used to trigger an action such as a network connection immediately after the image is captured.

On the other hand, these devices have the following disadvantages compared with existing methods:
・ Image capture quality is poor (distortion, low resolution).
・ Image capture quality varies greatly from model to model.
・ The shooting environment is inconsistent: magnification, light source, and shooting direction cannot be held constant, and camera-shake correction is absent or insufficient.
As a result, recognition is far more difficult than when the image is captured with a scanner or similar device.
JP 2003-256429 A

In general, the worse the condition of an image, the harder it is to recognize characters in it and the more computation recognition requires.

One conventional method of character recognition with a mobile phone is to send the image captured by the phone to a server, which has more computing power than the phone, and perform recognition there. However, images suitable for recognition tend to be large, so transmission takes a long time and communication costs rise. When recognition is instead performed on the phone, computing power is lower than on a server, so accuracy drops markedly if speed is prioritized and speed drops markedly if accuracy is prioritized.

Accordingly, an object of the present invention is to provide a character recognition method, a character recognition device, and a character recognition program that can recognize characters with a small amount of computation and high accuracy even in poor-quality images, and that can also reduce the amount of data to be sent to a server.

According to the present invention, there is provided a character recognition method comprising a statistical processing step and a determination step. The statistical processing step selects, one at a time, a pixel of interest from the pixels of a pixel group that has been determined to contain one character and that consists of binarized pixels, and performs statistical processing on the pixels of the group. In this step, consider the $a \times b$ pixels in the region of $a$ pixels vertically by $b$ pixels horizontally centered on the pixel of interest. There are ${}_{a \times b - 1}C_{c}$ ways to select $(1 + c)$ pixels consisting of the pixel of interest and any other $c$ pixels; removing the $d$ combinations whose pattern duplicates one obtained with another pixel as the center leaves ${}_{a \times b - 1}C_{c} - d$ combinations, which form the first dimension. The $(1 + c)$ pixels can take $2^{(1 + c)}$ combinations of values; removing the 2 combinations in which all pixels have the same value leaves $2^{(1 + c)} - 2$ combinations, which form the second dimension. A two-dimensional array of size $({}_{a \times b - 1}C_{c} - d) \times (2^{(1 + c)} - 2)$ is provided, and for every pixel in the pixel group the value of the array element to which that pixel corresponds is incremented by 1, yielding the distribution of element values of the two-dimensional array for the pixel group. The determination step judges that the character whose distribution of element values of the two-dimensional array is closest to the distribution obtained for the pixel group is the character contained in the pixel group.

Further, according to the present invention, there is provided a character recognition device comprising statistical processing means and determination means. The statistical processing means selects, one at a time, a pixel of interest from the pixels of a pixel group that has been determined to contain one character and that consists of binarized pixels, and performs statistical processing on the pixels of the group. In this means, consider the $a \times b$ pixels in the region of $a$ pixels vertically by $b$ pixels horizontally centered on the pixel of interest. There are ${}_{a \times b - 1}C_{c}$ ways to select $(1 + c)$ pixels consisting of the pixel of interest and any other $c$ pixels; removing the $d$ combinations whose pattern duplicates one obtained with another pixel as the center leaves ${}_{a \times b - 1}C_{c} - d$ combinations, which form the first dimension. The $(1 + c)$ pixels can take $2^{(1 + c)}$ combinations of values; removing the 2 combinations in which all pixels have the same value leaves $2^{(1 + c)} - 2$ combinations, which form the second dimension. A two-dimensional array of size $({}_{a \times b - 1}C_{c} - d) \times (2^{(1 + c)} - 2)$ is provided, and for every pixel in the pixel group the value of the array element to which that pixel corresponds is incremented by 1, yielding the distribution of element values of the two-dimensional array for the pixel group. The determination means judges that the character whose distribution of element values of the two-dimensional array is closest to the distribution obtained for the pixel group is the character contained in the pixel group.

Furthermore, according to the present invention, there is provided a character recognition program for causing a computer to execute the above character recognition method.

With the statistical processing step of the present invention, characters can be recognized from the element values of the two-dimensional array with a small amount of computation and high accuracy. In addition, when the distribution of element values of the two-dimensional array is sent from a mobile phone to a server and the determination step is performed on the server, the amount of data sent from the phone to the server can be reduced.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

In the present invention, character recognition means selecting, from the images registered on the server, the image most similar to the captured image (this is called a search) and taking the characters written in that registered image as the recognition result. It can be regarded as a kind of image recognition.

The character string to be searched is assumed to be enclosed in a rectangular frame. Recognizing this frame makes it possible to determine the rotation angle and the scaling ratio and to fix the position of the characters within the frame. Recognition therefore works even when the camera's rotation, its distance from the object, and its position are not fixed, which increases the freedom of shooting. This freedom is particularly useful for optical devices such as mobile phones, which, unlike scanners, may be used in a wide variety of shooting postures (distance from the subject, presence or absence of rotation, blur, and so on).

The search compares image feature values obtained by analyzing the captured image and the registered images. The result of comparing the feature values of the captured image and a registered image is called the distance. The smaller the distance, the more similar the two images are, so the registered image with the smallest distance is the search result. What is used as the image feature values is explained later.

The process up to image feature extraction consists of a first half, which extracts a character string image of fixed size from the image, and a second half, which computes the image feature values.

The extraction of the character string image in the first half is described first.

First, the captured color image is converted to a grayscale image. This conversion is a common operation, so its algorithm is not described here.

Next, the frame region is recognized in the grayscale image. The frame region is the region enclosed by the rectangular frame, and the character string is placed inside the frame. The recognition method relies on an invention separate from the present one, so its details are omitted.

The result is the x and y coordinates of the four vertices.

If these coordinates were reliably consistent, for example always the outermost or always the innermost points of the frame region, the frame-expansion process described next could be omitted. Here, however, the frame is recognized only roughly, and a slightly larger range is taken as the frame region.

In the frame-expansion process, the vertex coordinates are first corrected so that they point as far toward the outside of the frame as possible. For each vertex, the two sides that meet there determine the direction of expansion: the frame is expanded in the direction of the sum of the two vectors that run along each side toward the vertex, and this direction is expressed by the sign (+, 0, −) of its x and y components. If the current vertex is (X, Y) and the direction is + in x and − in y, the pixel (X+1, Y) is checked for the x component and then the pixel (X, Y−1) for the y component. If either is a frame pixel, the vertex is moved to that pixel. The same check is repeated from the new vertex until the coordinates stop moving, and this is done for all four vertices.

Finally, a constant is added to or subtracted from each vertex along its expansion direction, so that the quadrilateral connecting the vertices becomes larger than the frame.
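The vertex correction can be sketched as follows. This is only one reading of the description: `is_frame` is a hypothetical predicate that reports whether a pixel belongs to the frame, `margin` stands in for the unspecified constant, and the y check is made from the position reached after the x move.

```python
def expand_vertex(is_frame, vertex, direction, margin=2):
    """Push one frame vertex outward along `direction`.

    is_frame(x, y) -> bool is a hypothetical frame-pixel predicate.
    direction is (sx, sy), each component in {-1, 0, +1}.
    margin is an assumed constant added at the end.
    """
    x, y = vertex
    sx, sy = direction
    moved = True
    while moved:
        moved = False
        # Check the x component first, then the y component.
        if sx != 0 and is_frame(x + sx, y):
            x += sx
            moved = True
        if sy != 0 and is_frame(x, y + sy):
            y += sy
            moved = True
    # Finally shift by a constant so the quadrilateral exceeds the frame.
    return (x + sx * margin, y + sy * margin)
```

The loop stops as soon as neither neighbor in the expansion direction is a frame pixel, which matches the "until the coordinates stop moving" condition.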

Next, the image inside this quadrilateral (defined by the vertices) is extracted. If the frame is rotated, blank margins appear outside the frame. These margins are filled with gray (value 160 here).

At this point it becomes relatively easy to separate the black portions, where the frame and characters are drawn, from the white background. The binarization threshold is therefore computed from the image by discriminant analysis (Otsu's method) or a similar technique. The image is not yet converted to a binary image at this point and remains grayscale.
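For reference, a minimal pure-Python sketch of Otsu's method is shown below. The function name and the flat list input are assumptions made for illustration. The text only states that the threshold is computed by discriminant analysis or a similar technique.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in 0..255.
    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of gray levels in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

The threshold is only stored at this stage. Binarization itself is applied later, after the characters have been separated.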

Next, the rectangular shape is recognized in the image and the angle by which the rectangle is rotated is determined.

Only a rotation angle from 0 to 90 degrees can be determined directly. The full range of 0 to 360 degrees cannot be determined because the top and bottom of the frame are unknown and the long and short sides cannot be told apart. Normally, however, a frame containing written characters is wider than it is tall. Making this a rule for every character string to be searched lets the horizontal and vertical sides be identified by comparing side lengths, so rotation from 0 to 180 degrees can be determined.

Once the rotation angle is known, the image is rotated by that angle in the opposite direction, so that the frame always has the same orientation. For the reason given above, however, the frame may still be upside down.

Next, the image is enlarged or reduced so that the frame has a predetermined size.

The frame size after scaling is either fixed or chosen from several candidates. In the latter case, the candidate whose aspect ratio is closest to that of the frame is chosen. Note that the query image and the registered image must be the same size.

Next, the frame is removed from the image by trimming a few pixels from the top, bottom, left, and right edges. The number of pixels to trim is determined by the size selected during scaling.

Next, the characters are separated one by one. If the binarized image has a gap of at least one pixel between characters, each vertical line (column) is scanned to check, over all of its pixels, whether it overlaps a character. A run of consecutive columns that overlap a character is recognized as the region of one character, so one character's region extends from one separating column to the next.
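The column scan can be sketched as below, assuming the image is a list of rows in which 1 marks a character pixel and 0 marks background. The function name and the half-open `(start, end)` ranges are illustrative choices, not taken from the text.

```python
def split_by_blank_columns(binary):
    """Split a binarized text line into per-character column ranges.

    binary: list of rows, 1 = character pixel, 0 = background.
    Returns a list of (start, end) column ranges with end exclusive.
    """
    width = len(binary[0]) if binary else 0
    regions = []
    start = None
    for x in range(width):
        # A column overlaps a character if any of its pixels is set.
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x                      # a character run begins
        elif not has_ink and start is not None:
            regions.append((start, x))     # a blank column ends the run
            start = None
    if start is not None:
        regions.append((start, width))     # run reaches the right edge
    return regions
```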

Depending on the captured image, however, characters may merge when binarized and fail to separate cleanly, because of poor lens performance, poor focus, camera shake, narrow character spacing in the subject, and so on. The region of one character is therefore detected by the alternative method below.

First, for every vertical line (column), the minimum pixel value along the line is found and stored in an array whose length equals the image width.

This array has small values where there is a character and large values where there is none. Read in sequence, its values form a waveform in which the span from one crest to the next corresponds to one character: a peak (crest), valley (trough), peak (crest) sequence is one character. The character boundaries can therefore be determined by finding the valleys and peaks in the array.

The search for a valley works as follows. First a starting point in the array is chosen: the beginning of the array when searching for the first character, otherwise the peak that ended the previous character. The array value at the starting point is held as the provisional minimum. The array is then read in order from the starting point, and any value smaller than the provisional minimum becomes the new provisional minimum. After some distance the values begin to rise, because the neighborhood of the minimum is the character region and values increase where the character ends and a gap begins. When the difference between the current array value and the minimum exceeds a fixed threshold, a character and a character boundary are judged to have been found, and the minimum is fixed as the valley. With this method, when the array covers a region without characters, the values differ very little from the minimum, so no valley is fixed and the region is judged to contain no character.

Next, a peak is searched for, which is almost the reverse of the valley search. The starting point is the valley just found, and its value is the provisional maximum. The array is read in order to find the maximum. After some distance the values drop sharply, because the neighborhood of the maximum is a gap between characters and values fall where the next character region begins. When the difference between the maximum and the current array value exceeds a fixed threshold, a character and a character boundary are judged to have been found, and the maximum is fixed as the peak.

Valleys and peaks are searched for alternately in this way, and the span from one peak to the next is judged to be the region of one character.
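The alternating search can be sketched as follows. The names `column_minima`, `find_character_spans`, and `delta` (the fixed threshold) are illustrative. One point is an assumption because the text does not cover it: when the array ends before a peak is confirmed, the provisional maximum is used to close the last character.

```python
def column_minima(gray):
    """Minimum gray level of each column; gray is a list of rows."""
    return [min(col) for col in zip(*gray)]


def find_character_spans(profile, delta):
    """Alternate valley and peak searches over the column-minimum array.

    Returns (left_peak, valley, right_peak) index triples, one per character.
    """
    spans = []
    n = len(profile)
    start = 0                       # array start, later the previous peak
    while start < n:
        # Valley search: track the provisional minimum until values rise.
        valley, i, confirmed = start, start, False
        while i < n:
            if profile[i] < profile[valley]:
                valley = i
            elif profile[i] - profile[valley] > delta:
                confirmed = True
                break
            i += 1
        if not confirmed:
            break                   # flat region: no character here
        # Peak search: track the provisional maximum until values fall.
        peak, i = valley, valley
        while i < n:
            if profile[i] > profile[peak]:
                peak = i
            elif profile[peak] - profile[i] > delta:
                break
            i += 1
        spans.append((start, valley, peak))
        if i >= n:
            break                   # array ended: last character (assumed)
        start = peak
    return spans
```

Because a valley is only confirmed after a rise larger than `delta`, a stretch of pure background never produces a span.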

This character region may include background before and after the character. Columns whose array value is greater than the binarization threshold are therefore removed from the character region as background, so that the character region runs from the character's left edge to its right edge.

Once the characters have been separated, the image is binarized with the binarization threshold.

The processing so far makes it possible to extract single characters from the image. From here on, image processing is applied to each character image.

The character image is binary, but character contours are hard to extract, and separating character from background with the binarization threshold often leaves the contour jagged. This causes the three-pixel-relation image feature described later to vary from shot to shot and harms recognition accuracy, so the boundary should be as smooth as possible.

To this end, a median filter or line thickening is applied. This reduces the jaggedness of the contour somewhat and improves recognition accuracy.

Next, enclosed regions are filled in. An enclosed region is the space inside a character such as “0” or “9”. Filling makes the difference between images of characters with similar contours, such as “6” and “9”, clearer. It also stabilizes the vertical and horizontal histogram features and improves accuracy.
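The text does not say how enclosed regions are found. One common approach, shown here as an assumption, is to flood-fill the background from the image border and treat every background pixel that was not reached as a hole.

```python
from collections import deque


def fill_holes(binary):
    """Set every background pixel not reachable from the border to 1.

    binary: list of rows with 1 = character, 0 = background.
    Returns a new image; the input is left unchanged.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and binary[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # 4-connected breadth-first search through the background.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and not outside[ny][nx] and binary[ny][nx] == 0):
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Everything that is not outside background becomes character.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```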

The following describes the second half of character recognition: how the image feature values are obtained. For each character there are two kinds of feature values: a histogram of three-pixel relations, and histograms in the vertical and horizontal directions.

The histogram of three-pixel relations is described first.

Consider three pixels in total: a pixel in the image and two of the pixels in the eight directions around it. There are ${}_{8}C_{2} = 28$ ways to choose two pixels from the eight directions, but 8 of them refer to the same three pixels as a combination centered on an adjacent pixel. Omitting these 8 duplicates leaves 20 combinations (see FIG. 1).
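The count of 20 can be checked with a short enumeration. The sketch below assumes that two triples are duplicates when one is a pure translation of the other, which is one plausible reading of "the same three pixels as when centered on an adjacent pixel". The names `NEIGHBORS`, `canonical`, and `unique_triples` are illustrative.

```python
from itertools import combinations

# Offsets (dy, dx) of the eight pixels around the center pixel.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
             if (dy, dx) != (0, 0)]


def canonical(triple):
    """Translate a set of pixel offsets so its minimum row and column are 0."""
    min_y = min(y for y, _ in triple)
    min_x = min(x for _, x in triple)
    return tuple(sorted((y - min_y, x - min_x) for y, x in triple))


def unique_triples():
    """Return one (p, q) neighbor pair per translation-distinct triple."""
    seen = set()
    kept = []
    for p, q in combinations(NEIGHBORS, 2):
        shape = canonical(((0, 0), p, q))
        if shape not in seen:
            seen.add(shape)
            kept.append((p, q))
    return kept
```

Under this reading, the only triples that repeat are the L-shaped ones that fit inside a 2 × 2 block: each of the 4 orientations appears 3 times, once per pixel taken as the center, so 12 − 4 = 8 combinations are redundant.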

These three pixels can take $2^3 = 8$ combinations of values, but only 6 are considered, excluding the two patterns in which all three pixels have the same value. The two excluded patterns represent all background and all character, respectively, so they are strongly related to character area. Character area, however, changes considerably from shot to shot. For example, the stroke width of the character “l” is not constant between shots. Suppose it is 4 pixels in one shot. If in another shot the height is the same but the width is 5, the character areas are in the ratio 4:5, a large difference even though the same object was photographed. Such area differences occur frequently, and it is difficult to process the image so that the area is constant from shot to shot, particularly when the image is rotated. The two patterns related to character area are therefore excluded.

The three-pixel relation captures features of the character contour. With the two patterns above excluded, the three pixels always include both a 0 and a 1, which means the surrounding 3 × 3 pixels contain an edge. Recording this together with the positional relationship of the three pixels yields a feature closely tied to the shape of the contour.

This is done for every pixel in the image. The 20 positional patterns of the three pixels are distinguished from one another, as are the 6 value patterns, so an array of length 120 per character is the three-pixel-relation feature. For five characters the feature is an array of length 600. For each character, a count is incremented each time a pattern occurs at a pixel, over all pixels, producing a histogram in a 20 × 6 array.
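In terms of the general array size given in the summary of the invention, this embodiment corresponds to $a = b = 3$, $c = 2$, and $d = 8$. Substituting these values gives the array length per character:

```latex
\left({}_{a \times b - 1}C_{c} - d\right) \times \left(2^{(1 + c)} - 2\right)
  = \left({}_{8}C_{2} - 8\right) \times \left(2^{3} - 2\right)
  = (28 - 8) \times (8 - 2)
  = 20 \times 6
  = 120
```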

Described step by step, feature extraction first splits the image into characters and then scans every pixel of each character. For a given pixel, the three pixels of the first of the 20 positional patterns are examined. If all three have the same value the pattern is ignored. Otherwise the corresponding histogram element is incremented. The elements incremented for the first pattern are numbers 1 to 6, those for the second pattern are numbers 7 to 12, and so on for all 20 patterns, so the array for one character has length 120 (= 20 × 6). The array for the second character occupies numbers 121 to 240.

How the histogram array is built for a single character is described with reference to FIG. 2.

The processing from step S103 onward is repeated for every pixel in the region of one character.

In step S103, the processing from step S105 onward is repeated for every combination i of three pixels that includes the current pixel of interest, excluding the 8 combinations that duplicate a pattern of another pixel. The loop therefore runs 20 times.

In step S105, it is determined which of the 6 patterns the three pixels form.

If the three pixels form pattern 1, the value of array element A(i, 1) is incremented by 1 (step S107).

If the three pixels form pattern 2, the value of array element A(i, 2) is incremented by 1 (step S109).

If the three pixels form pattern 3, the value of array element A(i, 3) is incremented by 1 (step S111).

If the three pixels form pattern 4, the value of array element A(i, 4) is incremented by 1 (step S113).

If the three pixels form pattern 5, the value of array element A(i, 5) is incremented by 1 (step S115).

If the three pixels form pattern 6, the value of array element A(i, 6) is incremented by 1 (step S117).

This completes the description of how the histogram array is built for a single character.
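The loop of FIG. 2 can be sketched as below. Several details are assumptions because the text does not fix them: the order of the 20 positional patterns, the numbering of the 6 value patterns (here the 3-bit code of center, first neighbor, second neighbor, minus 1), and the treatment of pixels outside the image as background. Duplicate positional patterns are removed by treating translated copies as identical.

```python
from itertools import combinations


def neighbor_pairs():
    """The 20 translation-distinct (p, q) offset pairs around a center."""
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    seen, pairs = set(), []
    for p, q in combinations(offsets, 2):
        pts = ((0, 0), p, q)
        my, mx = min(y for y, _ in pts), min(x for _, x in pts)
        shape = tuple(sorted((y - my, x - mx) for y, x in pts))
        if shape not in seen:
            seen.add(shape)
            pairs.append((p, q))
    return pairs


def three_pixel_histogram(binary):
    """Build the 20 x 6 array A for one binarized character image."""
    h, w = len(binary), len(binary[0])
    pairs = neighbor_pairs()
    A = [[0] * 6 for _ in pairs]

    def value(y, x):
        # Assumption: pixels outside the image count as background (0).
        return binary[y][x] if 0 <= y < h and 0 <= x < w else 0

    for y in range(h):                      # every pixel of the character
        for x in range(w):
            for i, ((py, px), (qy, qx)) in enumerate(pairs):   # step S103
                code = ((value(y, x) << 2)
                        | (value(y + py, x + px) << 1)
                        | value(y + qy, x + qx))               # step S105
                if code in (0, 7):
                    continue                # all three equal: ignored
                A[i][code - 1] += 1         # steps S107 to S117
    return A
```

Flattening `A` row by row gives the length-120 array described above, with the first positional pattern in elements 1 to 6.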

Histogram arrays are created in advance by the same method for the reference characters used in recognition: the alphanumeric characters 0 to 9, A to Z, and a to z, as well as symbols and the like. The recognized character is then the reference character whose histogram is closest to the histogram of the character being recognized.

Next, the vertical and horizontal histograms are created. An array whose length is the character image's height and an array whose length is its width are prepared, and for each black (character) pixel the corresponding element of each array is incremented, producing two histograms.

This yields image feature values that reflect the position of the character.
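A minimal sketch of the two histograms, assuming character pixels are stored as 1 and background as 0 (the function name is illustrative):

```python
def projection_histograms(binary):
    """Count character pixels per row and per column.

    binary: list of rows with 1 = character (black), 0 = background.
    Returns (per_row, per_col): per_row has one entry per image row
    (length = height), per_col one entry per column (length = width).
    """
    per_row = [sum(row) for row in binary]
    per_col = [sum(col) for col in zip(*binary)]
    return per_row, per_col
```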

In addition, the number of characters and the center x-coordinate of each character in the image before separation are also obtained as image feature values. These help improve search accuracy and speed.

As a supplementary note on the search, a function is implemented that returns no answer when no matching registered image exists. The distance to the registered image is checked for each character against a threshold. If any character exceeds it, the image is not returned as the answer even though its distance is the smallest, and the user is asked to shoot again.

As another check function, the minimum distance and the second-smallest distance are compared. When
(second-smallest distance) / (smallest distance) < D
holds, that is, when the ratio is smaller than a certain threshold D, there is a high possibility of confusion with another image, so no answer is output and the image is re-photographed.
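The two rejection checks (the per-character threshold and the ratio test against D) might be combined as in the sketch below. The function name, the dictionary layout, and the use of the sum of per-character distances as the total are assumptions made for illustration. The ratio test is written as a multiplication so that a best distance of zero does not cause a division by zero.

```python
def accept_best_match(candidates, per_char_limit, ratio_limit):
    """Return the name of the best registered image, or None to request a re-shoot.

    candidates: dict mapping a registered-image name to its list of
        per-character distances (hypothetical layout for this sketch).
    per_char_limit: threshold applied to every character distance.
    ratio_limit: the threshold D for (second-smallest) / (smallest).
    """
    if not candidates:
        return None
    ranked = sorted(candidates.items(), key=lambda item: sum(item[1]))
    best_name, best_chars = ranked[0]

    # Check 1: reject if any single character is too far from the registered one.
    if any(distance > per_char_limit for distance in best_chars):
        return None

    # Check 2: reject if the runner-up is almost as close as the best match,
    # i.e. second / best < D, rewritten as second < D * best.
    if len(ranked) > 1:
        best_total = sum(best_chars)
        second_total = sum(ranked[1][1])
        if second_total < ratio_limit * best_total:
            return None
    return best_name
```

A caller would treat a `None` result as the signal to prompt the user to photograph the image again.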

Histograms are compared using the Euclidean distance, as with the three-pixel relationship. However, the city block distance or the Mahalanobis distance may be used instead of the Euclidean distance. The distances of the vertical and horizontal histograms are calculated separately, one character at a time. When the distance between the histograms of the registered image and the captured image is obtained, the histograms are compared at the same array positions, and in addition the distances obtained by shifting the array position of one of them by one position forward and by one position backward are also calculated. The minimum of these three distances is adopted as the official distance. This accounts for small image misalignments of less than one pixel.
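The shift-tolerant comparison can be sketched as follows. The specification does not say how the ends of a shifted array are handled, so this sketch assumes zero padding; it also assumes both histograms have the same length. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shift(hist, offset):
    """Shift a histogram by `offset` bins, filling vacated bins with 0.

    Zero padding at the ends is an assumption of this sketch.
    """
    n = len(hist)
    return [hist[i - offset] if 0 <= i - offset < n else 0 for i in range(n)]


def shift_tolerant_distance(registered, captured):
    """Minimum of the distances at offsets -1, 0, and +1 of the captured histogram."""
    return min(
        euclidean(registered, shift(captured, offset))
        for offset in (-1, 0, 1)
    )
```

Taking the minimum over the three offsets means that a captured character displaced by one bin is still matched against the correctly aligned registered histogram.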

The final distance is the sum, over all characters, of the distances of the three-pixel relationship, the vertical histogram, and the horizontal histogram. Alternatively, the three-pixel relationship distance, the vertical histogram distance, and the horizontal histogram distance may each be weighted before their sum is taken. It is also possible to perform character recognition using only the three-pixel relationship distance and to use the vertical and horizontal histograms only when several candidates with close three-pixel relationship distances appear.
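The plain and weighted sums described above reduce to a short function. The tuple layout and the default weights of 1.0 are assumptions for illustration; the specification does not give concrete weight values.

```python
def total_distance(per_char_distances, weights=(1.0, 1.0, 1.0)):
    """Sum the three kinds of distance over all characters.

    per_char_distances: list of (three_pixel, vertical, horizontal) distance
        tuples, one per character (hypothetical layout).
    weights: (w_three_pixel, w_vertical, w_horizontal); all 1.0 gives the
        unweighted sum.
    """
    w3, wv, wh = weights
    return sum(
        w3 * d3 + wv * dv + wh * dh
        for d3, dv, dh in per_char_distances
    )
```

Setting the vertical and horizontal weights to 0 gives a total based only on the three-pixel relationship, which corresponds to the first stage of the two-stage variant mentioned in the text.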

The character strings to be recognized may be restricted, for example, to strings whose first n characters are uppercase letters and whose next m characters are digits. In such a case, character recognition may be performed sequentially from the first character, and processing may be terminated with an error if a digit is recognized at the i-th character, where i is smaller than n.
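A sketch of this early termination is shown below, assuming the per-character recognition results arrive one at a time from an iterator. The function name is hypothetical. As a labeled extension beyond the text, the sketch rejects any character that does not fit its position, not only a digit among the first n characters, and it also rejects characters beyond position n + m.

```python
def recognize_with_format(recognized_chars, n, m):
    """Consume recognized characters in order and stop at the first format violation.

    recognized_chars: iterable yielding one recognized character at a time.
    n: number of leading uppercase letters expected.
    m: number of digits expected after them.
    Raises ValueError as the "error termination" (an assumption of this sketch).
    """
    result = []
    for i, ch in enumerate(recognized_chars):
        if i < n:
            ok = "A" <= ch <= "Z"
        elif i < n + m:
            ok = "0" <= ch <= "9"
        else:
            ok = False  # assumption: extra characters are also an error
        if not ok:
            raise ValueError(f"unexpected character {ch!r} at position {i + 1}")
        result.append(ch)
    return "".join(result)
```

Because the iterator is abandoned as soon as a violation is found, the remaining characters are never recognized, which is where the saving in computation comes from.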

In addition, to reduce the number of comparisons during a search, parts of the search are skipped based on the printed matter definition, which is described later in connection with image registration. A printed matter definition holds information about character strings that share the same format, for example, that the string has between 5 and 10 characters and uses frame candidate 1. In an image search, the comparison with the printed matter definition is performed before the character comparison and the character position comparison. The number of characters and the frame candidate are compared, and if either differs, it is known that the registered data using this printed matter definition does not contain the answer, so the comparison with the registered images using this printed matter definition can be omitted.
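The pre-filtering by printed matter definition could look like the sketch below. The class `PrintDefinition`, its field names, and the example definitions are hypothetical; only the two compared quantities (character count range and frame candidate) come from the text.

```python
from dataclasses import dataclass


@dataclass
class PrintDefinition:
    """Hypothetical record for one printed matter definition."""
    name: str
    min_chars: int
    max_chars: int
    frame_candidate: int


def matching_definitions(definitions, char_count, frame_candidate):
    """Keep only the definitions whose registered images are worth comparing."""
    return [
        definition
        for definition in definitions
        if definition.min_chars <= char_count <= definition.max_chars
        and definition.frame_candidate == frame_candidate
    ]


definitions = [
    PrintDefinition("ticket", 5, 10, 1),
    PrintDefinition("label", 3, 4, 2),
]
# A captured image with 7 characters in frame candidate 1 only needs to be
# compared against images registered under "ticket".
selected = matching_definitions(definitions, char_count=7, frame_candidate=1)
```

Only the registered images belonging to the surviving definitions then go through the more expensive character and character position comparisons.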

Next, image registration will be described.

As for the kind of image to register, there are two approaches: using the digital image that is used for printing, and using an actually photographed image.

Registered images are classified by printed matter. The features of each printed matter are registered in the DB in advance. The registered features include the minimum and maximum numbers of characters and the height and width of the frame. Other examples include the name of the printed matter and information associated with it, the font type, the character spacing, and the character pattern (for example, that the string consists of uppercase and lowercase letters).

When an image is registered, one of these printed matter definitions is selected. From the values in the definition, the frame candidate with the closest aspect ratio is selected. When the image feature values are extracted, the image is scaled using the values of this frame candidate. The extraction of image features is basically the same as described earlier, but depending on the kind of digital image being registered, the image may not need to be rotated, and the frame position may be obtained from a fixed position without performing frame extraction. This information is also decided in advance in the printed matter definition.

These processes yield the number of characters in the image. This number is compared with the definition, and if it differs, either the image or the printed matter definition is found to be in error. The same applies, when frame recognition is performed, to the comparison between the height and width of the frame and the printed matter definition. In other words, the printed matter definition also serves as an error check at registration time.

In image registration, two sets of image feature values are registered: one for the upright orientation and one for the image rotated by 180 degrees. This is done because the top and bottom of a captured image cannot be distinguished, and it allows the search to succeed regardless of the orientation of the captured image.
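Registering both orientations might be sketched as follows. The record layout and the `extract_features` callback are hypothetical placeholders; the real feature extraction is the one described earlier in the specification.

```python
def rotate_180(image):
    """Rotate a binarized image (list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def registration_entries(name, image, extract_features):
    """Build the two records to register for one image.

    extract_features: any callable that maps an image to its feature values
        (a stand-in for the feature extraction of the specification).
    """
    return [
        {"name": name, "rotation": 0, "features": extract_features(image)},
        {"name": name, "rotation": 180,
         "features": extract_features(rotate_180(image))},
    ]
```

With both records stored, a captured image taken upside down matches the 180-degree record under the same name, so no orientation detection is needed at search time.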

The image feature data registered on the server consists of a name identifying the image and the image feature values. Two sets are registered, one for 0 degrees and one for 180 degrees of rotation. When a service is provided, the event to be triggered when each registered image is selected is also specified. On a mobile phone, conceivable events include accessing the Web to reach digital content such as music and video, receiving an e-mail, triggering operations such as sound or vibration, creating a new contact or schedule entry, and placing a telephone call.

The description so far has followed the schematic diagram. Another character recognition mode is described next.

This mode is the same up to the point where the image feature values are obtained, but the search is also performed within the mobile phone. Here, the image feature values of each individual character are held in the mobile phone. The image feature values of one character from the image analysis result are compared with the image feature values stored in the mobile phone, and the character with the smallest distance is assigned to the corresponding position in the character string of the captured image. This is done for every character in the analysis result, which completes recognition of the character string in the captured image.
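The on-device, per-character search amounts to a nearest-template lookup, as in the sketch below. It assumes each character's feature values are flattened into a list of numbers and compared with the Euclidean distance; the names and the dictionary layout are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_character(features, templates):
    """Return the template character whose feature values are closest.

    templates: dict mapping a character to its stored feature values
        (hypothetical layout; all vectors must have the same length).
    """
    return min(templates, key=lambda ch: math.dist(features, templates[ch]))


def recognize_string(char_features, templates):
    """Apply the nearest-template lookup to every segmented character in order."""
    return "".join(nearest_character(f, templates) for f in char_features)
```

For example, with the toy templates `{"A": [0, 0], "B": [10, 10]}`, the feature vectors `[[1, 1], [9, 8]]` are recognized as the string "AB".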

The character recognition method described above is implemented by hardware, software, or a combination thereof.

In the above description, the relationship among three pixels is used as the image feature value, but the relationship among four pixels, or among an even larger number of pixels, may be used instead. Also, instead of selecting the pixels used to calculate the image feature value from a region of 3 pixels × 3 pixels centered on the pixel of interest, the pixels may be selected from a region of 5 pixels × 5 pixels or from a larger region.
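The claims express the size of the feature array for a general region as (C(a×b−1, c) − d) × (2^(1+c) − 2), where the region is a × b pixels, c pixels are selected in addition to the pixel of interest, and d selections are excluded as duplicate patterns. The sketch below only evaluates this arithmetic; the function name is hypothetical, and d must be supplied by the caller because the specification states it only for the 3 × 3 case (d = 8 with c = 2). `math.comb` requires Python 3.8 or later.

```python
from math import comb


def feature_array_shape(a, b, c, d):
    """Return (number of pixel selections, number of value patterns).

    a, b: height and width of the region centered on the pixel of interest.
    c: number of pixels selected in addition to the pixel of interest.
    d: number of selections excluded as duplicate patterns (caller-supplied).
    """
    selections = comb(a * b - 1, c) - d     # ways to pick the other c pixels, minus duplicates
    value_patterns = 2 ** (1 + c) - 2       # all 0/1 patterns minus all-0 and all-1
    return selections, value_patterns


# The case named in the claims: a = 3, b = 3, c = 2, d = 8.
rows, cols = feature_array_shape(3, 3, 2, 8)
```

For the claimed case this gives 28 − 8 = 20 selections and 8 − 2 = 6 value patterns, that is, an array of 120 elements per character.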

In the above description, a mobile phone is used as an example, but the present invention is not limited to mobile phones.

A diagram listing the combinations for selecting three pixels, including the center pixel, from a 3 × 3 region according to an embodiment of the present invention.
A diagram showing a method for obtaining the feature values of an image containing characters according to an embodiment of the present invention.

Claims

A character recognition method comprising a statistical processing step of statistically processing a pixel group that has been determined to contain one character and that consists of binarized pixels, by taking the pixels included in the pixel group one at a time as a pixel of interest,
wherein the statistical processing step provides a two-dimensional array of $({}_{a \times b - 1}C_{c} - d) \times (2^{1+c} - 2)$ elements, whose first dimension is the ${}_{a \times b - 1}C_{c} - d$ combinations obtained by excluding d combinations regarded as duplicate patterns from the ${}_{a \times b - 1}C_{c}$ combinations for selecting (1 + c) pixels, namely the current pixel of interest and any other c pixels, from the a × b pixels in a region of a pixels vertically by b pixels horizontally centered on the current pixel of interest, and whose second dimension is the $2^{1+c} - 2$ combinations obtained by excluding the two combinations in which all pixels have the same value from the $2^{1+c}$ combinations of values that the (1 + c) pixels including the current pixel of interest can take, and obtains a distribution of the element values of the two-dimensional array corresponding to the pixel group by performing, for all pixels included in the pixel group, an operation of incrementing by 1 the value of the element of the two-dimensional array that corresponds to each pixel included in the pixel group,
the method further comprising
a determination step of determining that the character whose distribution of element values of the two-dimensional array is closest to the distribution of element values of the two-dimensional array obtained for the pixel group is the character included in the pixel group.

The character recognition method according to claim 1, further comprising:
a vertical distribution calculation step of obtaining a vertical distribution of values obtained by adding, for each vertical position, the values of the pixels included in the pixel group along the horizontal direction; and
a horizontal distribution calculation step of obtaining a horizontal distribution of values obtained by adding, for each horizontal position, the values of the pixels included in the pixel group along the vertical direction,
wherein, in the determination step, the character included in the pixel group is determined based on the distances between the distribution of element values of the two-dimensional array, the vertical distribution, and the horizontal distribution obtained for the pixel group and the distribution of element values of the two-dimensional array, the vertical distribution, and the horizontal distribution of each candidate character.

The character recognition method according to claim 1 or 2,
wherein candidates for the character string obtained by performing character recognition on each character are limited to a plurality of predetermined character strings,
the method further comprising an error termination step of terminating with an error when, in the course of sequentially performing character recognition on each character, a partial character string is obtained that is not contained in any of the plurality of predetermined character strings.

The character recognition method according to any one of claims 1 to 3,
wherein a = 3, b = 3, c = 2, and d = 8.

A character recognition device comprising statistical processing means for statistically processing a pixel group that has been determined to contain one character and that consists of binarized pixels, by taking the pixels included in the pixel group one at a time as a pixel of interest,
wherein the statistical processing means provides a two-dimensional array of $({}_{a \times b - 1}C_{c} - d) \times (2^{1+c} - 2)$ elements, whose first dimension is the ${}_{a \times b - 1}C_{c} - d$ combinations obtained by excluding d combinations regarded as duplicate patterns from the ${}_{a \times b - 1}C_{c}$ combinations for selecting (1 + c) pixels, namely the current pixel of interest and any other c pixels, from the a × b pixels in a region of a pixels vertically by b pixels horizontally centered on the current pixel of interest, and whose second dimension is the $2^{1+c} - 2$ combinations obtained by excluding the two combinations in which all pixels have the same value from the $2^{1+c}$ combinations of values that the (1 + c) pixels including the current pixel of interest can take, and obtains a distribution of the element values of the two-dimensional array corresponding to the pixel group by performing, for all pixels included in the pixel group, an operation of incrementing by 1 the value of the element of the two-dimensional array that corresponds to each pixel included in the pixel group,
the device further comprising
determination means for determining that the character whose distribution of element values of the two-dimensional array is closest to the distribution of element values of the two-dimensional array obtained for the pixel group is the character included in the pixel group.

The character recognition device according to claim 5, further comprising:
vertical distribution calculation means for obtaining a vertical distribution of values obtained by adding, for each vertical position, the values of the pixels included in the pixel group along the horizontal direction; and
horizontal distribution calculation means for obtaining a horizontal distribution of values obtained by adding, for each horizontal position, the values of the pixels included in the pixel group along the vertical direction,
wherein the determination means determines the character included in the pixel group based on the distances between the distribution of element values of the two-dimensional array, the vertical distribution, and the horizontal distribution obtained for the pixel group and the distribution of element values of the two-dimensional array, the vertical distribution, and the horizontal distribution of each candidate character.

The character recognition device according to claim 5 or 6,
wherein candidates for the character string obtained by performing character recognition on each character are limited to a plurality of predetermined character strings,
the device further comprising error termination means for terminating with an error when, in the course of sequentially performing character recognition on each character, a partial character string is obtained that is not contained in any of the plurality of predetermined character strings.

The character recognition device according to any one of claims 5 to 7,
wherein a = 3, b = 3, c = 2, and d = 8.

A character recognition program for causing a computer to execute the character recognition method according to claim 1.