JPS6325391B2

Info

Publication number: JPS6325391B2
Application number: JP55126845A
Authority: JP
Inventors: Teruo Akyama; Isao Masuda
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1980-09-12
Filing date: 1980-09-12
Publication date: 1988-05-25
Also published as: JPS5752971A

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a two-dimensional character area extraction device, and in particular to a character area extraction device for use in an optical character reader (OCR) or similar equipment that reads characters on a sheet of paper. Even for a free-format page for which no prior information such as character position or size is given, the device corrects the tilt of the page and then automatically extracts character areas from the page using the results of density value accumulation.

When characters are read with an OCR, the tilt of the input sheet must first be corrected and the characters to be read must be cut out. Conventional tilt correction methods include printing guide marks on the sheet and correcting with reference to them, and detecting the frame of the sheet and correcting with reference to it. The former requires guide marks to be printed on the sheet in advance, while the latter cannot correct the tilt of clippings or of copies that were skewed during copying. As for cutting out characters, the position and size of the characters on the sheet have conventionally been fixed in advance, and each character is cut out according to its distance from a guide mark or from the edge of the sheet. Consequently, character areas cannot be extracted at all from the free-format documents that we ordinarily handle.

To overcome these drawbacks, the present invention aims to correct the tilt of a document in any format, even when no prior information for tilt correction or character segmentation is available, and then to extract the character areas contained in the document using the results of density value accumulation. The invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of the present invention, in which 1 is a quantization circuit that quantizes an input signal, 2 is a two-dimensional image memory that stores the quantized image, 3 is a tilt correction device, 4 is a region dividing device, 5 is a field separator detection device, 6 is an additional region dividing device, 7 is a region division result storage device, 8 is a non-blank region detection device, 9 is a non-blank region rectangularization device, and 10 is a region labeling device. Each device is described in detail below.

The quantization circuit 1 scans the original image with a photoelectric conversion device or the like and quantizes the resulting electrical signal into a plurality of levels. The quantized data are stored as they are in the two-dimensional image memory 2.

FIG. 2 illustrates the principle of the tilt correction performed by the tilt correction device 3. In FIG. 2, a is an example of image data stored in the two-dimensional image memory 2, where b₁ denotes the characters of a heading on the page and b₂ denotes one character string of the body text. The two-dimensional image memory 2 is raster-scanned in a plurality of directions, and for each scanning line the cumulative density value, i.e. the sum of the density values of the pixels on that line, is obtained. FIGS. 2e and 2f show the results as histograms: e is the histogram obtained by raster scanning in direction c, and f is the histogram obtained by raster scanning in direction d. When the original image is quantized into the two values 0 and 1, the cumulative density value equals the number of black pixels on the scanning line.

As FIG. 2 shows, when the scanning direction coincides with the direction in which the characters are lined up, distinct peaks appear in the histogram (see f), and this property is used to correct the tilt. Specifically, for example, the differences (e′, f′) between the cumulative density values of mutually adjacent scanning lines are taken, and the sum of their absolute values over all of the scans, or over part of them, is used (the sum of absolute differences of cumulative density values). This sum is computed for raster scans of the data in the two-dimensional image memory from a plurality of directions, and the scanning direction that gives the largest value is taken as the principal direction of the page. That is, the angle θ₀ is found that attains

$$\max_{\theta} \sum_{i=1}^{n-1} \left| f(l_{i+1}) - f(l_i) \right|,$$

where $f(l_i)$ is the cumulative density value of the pixels on the i-th scanning line and n is the number of scanning lines. The tilt is corrected by rotating the two-dimensional image data by −θ₀.
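The search for θ₀ can be sketched in software as follows. This is a minimal illustration rather than the patented circuit: the function names `profile`, `sharpness` and `estimate_skew`, the list-of-lists image layout, the rounding of pixels to the nearest scan line, and the sign convention of the angle are all assumptions made for the example.

```python
import math

def profile(image, theta_deg):
    """Cumulative density value per scan line, for scan lines tilted by theta_deg.

    Assumption: each pixel is assigned to the nearest scan line, whose index
    is y*cos(theta) - x*sin(theta) in image coordinates (x right, y down).
    """
    t = math.radians(theta_deg)
    s, c = math.sin(t), math.cos(t)
    h, w = len(image), len(image[0])
    diag = int(math.hypot(h, w)) + 2          # offset so bin indices stay >= 0
    bins = [0] * (2 * diag + 1)
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                bins[int(round(y * c - x * s)) + diag] += v
    return bins

def sharpness(bins):
    """Sum of absolute differences between adjacent cumulative density values."""
    return sum(abs(b - a) for a, b in zip(bins, bins[1:]))

def estimate_skew(image, candidates_deg):
    """Return the candidate angle whose profile has the largest sharpness."""
    return max(candidates_deg, key=lambda th: sharpness(profile(image, th)))
```

Under these assumptions, a binary image containing level text lines yields the largest sum at 0 degrees, and the returned angle plays the role of θ₀ in the text.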

An example of the circuit configuration is shown in FIG. 3. During a raster scan, the density value of each pixel is added to register 11. When one scanning line is finished, the absolute difference circuit 13 takes the absolute value of the difference between register 11 and buffer memory 12, which holds the cumulative density value of the preceding scanning line, and this value is added to the contents of register 14. At the same time, the contents of register 11 are written to buffer memory 12 and register 11 is cleared. When the raster scan of the whole image is finished, the contents of register 14 are stored in memory 15 and register 14 is cleared. Reference numeral 16 denotes a tilt correction circuit that examines the contents of memory 15 and corrects the tilt of the two-dimensional image data. When the image is rotated clockwise by θ degrees, the coordinates (x′, y′) of the tilt-corrected data corresponding to the coordinates (x, y) of the original data are given by

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta.$$
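The data flow of FIG. 3 can be mimicked in a few lines of software. The sketch below is only an analogy to the hardware: the variable names echo the reference numerals, `abs_diff_sum` and `rotate_point` are hypothetical names, and skipping the difference for the very first scanning line (when the buffer is still empty) is an assumption, since the text does not state the initial contents of buffer memory 12.

```python
import math

def abs_diff_sum(scan_lines):
    """Accumulate |f(l_i+1) - f(l_i)| over one raster scan of the image."""
    reg11 = 0      # running density sum of the current scanning line
    buf12 = None   # cumulative density value of the preceding scanning line
    reg14 = 0      # running sum of absolute differences
    for line in scan_lines:
        for density in line:
            reg11 += density
        if buf12 is not None:          # assumption: no difference for line 1
            reg14 += abs(reg11 - buf12)
        buf12 = reg11                  # write register 11 to the buffer
        reg11 = 0                      # clear register 11 for the next line
    return reg14                       # value that would be stored in memory 15

def rotate_point(x, y, theta_deg):
    """Apply x' = x cos(t) - y sin(t), y' = x sin(t) + y cos(t)."""
    t = math.radians(theta_deg)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))
```

In image coordinates with the y axis pointing down, this matrix turns points clockwise as seen on the page, which appears to match the convention in the text.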

FIG. 4 illustrates the principle of the region division performed by the region dividing device 4 of FIG. 1. The region dividing device 4 uses the cumulative density values of the tilt-corrected image obtained by the tilt correction device 3. It is most effective to use the cumulative density values obtained by raster-scanning the tilt-corrected image in the horizontal and vertical directions. The cumulative density values reflect the positions of the character strings in the image: when they are plotted as a histogram, peaks appear at the positions corresponding to the character strings, so the character strings are cut out using the coordinates at which the bars of the histogram rise and fall (FIG. 4). Here g₁ to g₆ and h₁ to h₆ are region dividing lines.

FIG. 5 shows an example of the peak that appears in the density histogram when a character string is present in the scanned image. If the portions i₁ and i₂ can be detected, one character string can be cut out. To detect i₁ and i₂ stably, the fine irregularities such as those shown at i₁ to i₅ in FIG. 5 must first be removed, so the histogram is modified as follows. Let $f(l_i)$ be the cumulative density value of the i-th scan. If $f(l_i) < f(l_{i+1})$, the smallest positive integer a is found such that $f(l_i) < f(l_{i+a})$ and $f(l_i) \ge f(l_{i+a+1})$. If a is smaller than a fixed value, the cumulative density values of scans i+1 through i+a are replaced with the cumulative density value of the i-th scan. This processing is applied to the cumulative density values of all scanning lines. It removes the fine irregularities without blunting the rising and falling portions i₁ and i₂ of FIG. 5 (FIG. 6).

Narrow valleys in the histogram can be filled by applying the opposite processing. That is, if $f(l_i) > f(l_{i+1})$, the smallest positive integer a′ is found such that $f(l_i) > f(l_{i+a'})$ and $f(l_i) \le f(l_{i+a'+1})$. If a′ is smaller than a fixed value, the cumulative density values of scans i+1 through i+a′ are replaced with the cumulative density value of the i-th scan.

To determine the positions of the region dividing lines, the sign of the difference between the cumulative density values $f(l_i)$ and $f(l_{i+1})$ can be used, for example. In FIG. 6, let the range m be a portion where the sign is (+). If the difference between the cumulative density value $f(l_s)$ at its left end and the cumulative density value $f(l_e)$ at its right end is at least a fixed value θ′, the range is considered to contain a rising portion of the histogram. An appropriate value θ″ is then chosen, and the positive integer j is found such that

$$f(l_{s+1}) - f(l_s) < \theta'', \quad \ldots, \quad f(l_{s+j}) - f(l_{s+j-1}) < \theta'', \quad f(l_{s+j+1}) - f(l_{s+j}) \ge \theta''.$$

The position between $l_{s+j}$ and $l_{s+j-1}$ is taken as the position at which the region dividing line n₁ is drawn. The falling portions of the histogram are processed in the same way. The results obtained by the region dividing device 4 are stored in the region division result storage device 7.
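The histogram clean-up and the search for a rising edge described above might be sketched as follows. The function names and the parameters `max_width`, `theta1` (for θ′) and `theta2` (for θ″) are hypothetical. Two details are assumptions rather than statements of the source: the replacement is applied in place from left to right, and j = 0 is also accepted so that an abrupt edge with no gentle lead-in is still found.

```python
def remove_bumps(f, max_width):
    """Flatten narrow bumps: if f[i] < f[i+1], find the smallest a with
    f[i] < f[i+a] and f[i] >= f[i+a+1]; if a < max_width, overwrite
    f[i+1..i+a] with f[i]."""
    g = list(f)
    n = len(g)
    for i in range(n - 1):
        if g[i] < g[i + 1]:
            for a in range(1, n - i - 1):
                if g[i] < g[i + a] and g[i] >= g[i + a + 1]:
                    if a < max_width:
                        for k in range(i + 1, i + a + 1):
                            g[k] = g[i]
                    break
    return g

def fill_valleys(f, max_width):
    """Opposite processing: fill narrow valleys (bump removal on -f)."""
    return [-v for v in remove_bumps([-v for v in f], max_width)]

def rising_boundaries(f, theta1, theta2):
    """For each run of positive differences whose total rise is >= theta1,
    return the index k of the first step with f[k+1] - f[k] >= theta2.
    The dividing line is then placed just before scan line k."""
    bounds = []
    i, n = 0, len(f)
    while i < n - 1:
        if f[i + 1] - f[i] > 0:
            s = e = i
            while e < n - 1 and f[e + 1] - f[e] > 0:
                e += 1                      # e is the right end of the (+) range
            if f[e] - f[s] >= theta1:
                for k in range(s, e):
                    if f[k + 1] - f[k] >= theta2:
                        bounds.append(k)
                        break
            i = e
        else:
            i += 1
    return bounds
```

Falling edges could be found by running `rising_boundaries` on the reversed histogram and mapping the indices back, which mirrors the remark that the falling portions are processed in the same way.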

The field separator detection device 5 detects the position, length and other properties of the field separators present in the two-dimensional image memory 2. "Field separator" is a general term for the figures used to partition a page. They are usually straight lines, and most of them run horizontally or vertically on the page (FIG. 7, Q₁, Q₂, R). A field separator can therefore be extracted by detecting a portion in which pixels having at least a certain density value continue in the horizontal or vertical direction. When each pixel is quantized into three or more levels, a suitable threshold is set, and a place where at least a fixed number of pixels with density values at or above the threshold occur consecutively is detected as a field separator.

In principle a straight field separator can be detected in this way. Depending on the state of the printing or the kind of separator, however, the line may be faint or may not be exactly straight, so the data are blurred in order to detect field separators stably. When detecting horizontal field separators, it is effective to blur in the vertical direction. For example, when the density value of pixel (i, j) exceeds the threshold for field separator detection, pixels (i−1, j) and (i+1, j) are also counted. In addition, a bridging operation is performed that joins field separators lying within a certain distance of one another.
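A possible software rendering of this procedure for horizontal separators is sketched below. The function name `horizontal_separators` and the parameters `threshold`, `min_len` and `max_gap` are hypothetical, as is the choice to bridge only runs on the same row. Note that, as a side effect of the vertical blur, a one-pixel-thick line is reported on three adjacent rows.

```python
def horizontal_separators(image, threshold, min_len, max_gap):
    """Return (row, first_col, last_col) for each horizontal separator found."""
    h, w = len(image), len(image[0])
    dark = [[v >= threshold for v in row] for row in image]
    found = []
    for i in range(h):
        # Vertical blur: a dark pixel also counts for the rows above and below.
        row = [dark[i][j]
               or (i > 0 and dark[i - 1][j])
               or (i < h - 1 and dark[i + 1][j])
               for j in range(w)]
        # Collect runs of at least min_len consecutive counted pixels.
        runs, start = [], None
        for j in range(w + 1):
            on = j < w and row[j]
            if on and start is None:
                start = j
            elif not on and start is not None:
                if j - start >= min_len:
                    runs.append([start, j - 1])
                start = None
        # Bridging: join separators separated by at most max_gap pixels.
        merged = []
        for run in runs:
            if merged and run[0] - merged[-1][1] - 1 <= max_gap:
                merged[-1][1] = run[1]
            else:
                merged.append(run)
        found.extend((i, a, b) for a, b in merged)
    return found
```

Vertical separators could be handled by applying the same function to the transposed image.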

The additional region dividing device 6 consults the contents of the region division result storage device 7 to check whether each field separator detected by the field separator detection device 5 has already been detected by the region dividing device 4, and divides the region anew if it has not. Specifically, if a region dividing line lies within a fixed distance α of the field separator, measured across the width of the separator, the field separator is regarded as having been detected by the device 4. Otherwise, within the smallest region that completely contains the field separator, dividing lines are added at a distance α from the detected field separator, and the contents of the device 7 are updated accordingly.

In FIG. 7, the field separator R is considered to have been detected by the region dividing lines t₅ and t₆, whereas neither of the field separators Q₁ and Q₂ has been detected. New dividing lines u₁ and u₂ are therefore drawn at a distance α from the field separator within the regions w₁, w₂ and w₃, which together contain the whole of Q₁. The same is done for Q₂.
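The check against the distance α can be reduced to one dimension for illustration. In the sketch below, `add_missing_dividers` is a hypothetical helper that works on positions measured across the separator and ignores the separator's extent along its length. Placing one new line on each side of the separator is an assumption based on the pair u₁, u₂ in FIG. 7.

```python
def add_missing_dividers(dividers, separator_pos, alpha):
    """Return the dividing-line positions after accounting for one separator.

    If an existing dividing line lies within alpha of the separator, the
    separator counts as already detected and nothing changes. Otherwise
    two lines are added at distance alpha on either side (assumption).
    """
    if any(abs(d - separator_pos) <= alpha for d in dividers):
        return sorted(dividers)
    return sorted(dividers + [separator_pos - alpha, separator_pos + alpha])
```

For example, with dividing lines at 10 and 50 and α = 3, a separator at 30 adds lines at 27 and 33, while a separator at 12 leaves the list unchanged.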

The non-blank region detection device 8 operates on the individual regions that were produced by the region dividing device 4 and the additional region dividing device 6 and whose results are stored in the region division result storage device 7. For each region it computes the sum of the density values of the pixels in the region. If the ratio of this sum to the area of the region is at or below a fixed value, the region is classified as a blank region; otherwise it is classified as a non-blank region.
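The blank test amounts to a density ratio per region. A minimal sketch follows, assuming the image is a list of rows of density values and a region is given by inclusive pixel bounds; the name `is_blank` and the parameter `max_ratio` are hypothetical.

```python
def is_blank(image, top, left, bottom, right, max_ratio):
    """True if (sum of density values) / (region area) is at most max_ratio."""
    total = sum(image[y][x]
                for y in range(top, bottom + 1)
                for x in range(left, right + 1))
    area = (bottom - top + 1) * (right - left + 1)
    return total / area <= max_ratio
```

A region for which `is_blank` returns False would be treated as non-blank in the subsequent steps.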

The non-blank region rectangularization device 9 groups mutually connected non-blank regions, among those detected by the non-blank region detection device 8, into a single region, cuts out the smallest rectangular region that contains the whole of that region, and passes the result to the region division result storage device 7 (FIG. 8). In FIG. 8a, y₁₁ to y₁₅, y₂₁, y₂₅, y₃₁, y₃₄, y₄₁ and y₄₃ to y₄₄ are blank regions, and y₂₂ to y₂₄, y₃₂ to y₃₃ and y₄₂ are non-blank regions. Here the non-blank regions are merged into one, and Z in FIG. 8, the smallest rectangular region containing all of them, is cut out and stored in the device 7.

The regions of the two-dimensional image memory 2 that are registered in the device 7 as non-blank regions are then divided further using the region dividing device 4, so that regions that could not be divided in the first pass are divided. In FIG. 7, the regions x₁, x₂ and x₃ are merged as a non-blank region and divided by a new dividing line v₁. The same applies to the dividing line v₂. This operation is repeated until no further dividing lines can be drawn.
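The merging of connected non-blank regions into a bounding rectangle can be sketched on a grid of cells. The regular grid, the use of 4-connectivity, and the name `bounding_rectangles` are assumptions made for the example; the regions in FIG. 8 need not form a regular grid.

```python
def bounding_rectangles(grid):
    """For each group of connected non-blank cells, return the smallest
    enclosing rectangle as (top, left, bottom, right), inclusive."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                seen[r][c] = True
                stack = [(r, c)]
                top = bottom = r
                left = right = c
                while stack:                      # depth-first flood fill
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                rects.append((top, left, bottom, right))
    return rects
```

With a grid loosely modelled on FIG. 8a (1 for non-blank, 0 for blank), the three partially filled rows collapse into one rectangle corresponding to Z.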

The region labeling device 10 outputs, in the form of a table, the position coordinates, size and category of each non-blank region stored in the region division result storage device 7.

As described above, according to the present invention, character areas can be cut out relatively easily by accumulating density values during raster scanning when a document is input using an OCR or the like. This holds even if no tilt correction marks are printed on the document, even if the original was skewed when it was copied, even if the material to be read is something like a clipping, and even if the document is in a free format in which the positions and sizes of the characters are not fixed in advance. The invention is particularly effective for printed matter, but it is also effective for handwritten documents and for extracting character areas from pages that contain non-text elements such as tables and graphs.

[Brief description of the drawings]

FIG. 1 is a block diagram of one embodiment of the device of the present invention. FIG. 2 is a diagram explaining the principle of the angle correction method. FIG. 3 is a circuit diagram of one embodiment of the tilt correction device shown in FIG. 1. FIG. 4 is a diagram showing the principle of region division. FIG. 5 is an example of a cumulative density value histogram. FIG. 6 is a diagram explaining the method of modifying the histogram and determining the positions of the region dividing lines. FIG. 7 is a diagram relating to region division and character area extraction. FIG. 8 is a diagram explaining the operation of the non-blank region rectangularization device shown in FIG. 1.

In the figures, 1 denotes the quantization circuit, 2 the image memory, 3 the tilt correction device, 4 the region dividing device, 5 the field separator detection device, 6 the additional region dividing device, 7 the region division result storage device, 8 the non-blank region detection device, 9 the non-blank region rectangularization device, and 10 the region labeling device.

Claims

[Claims] 1. A two-dimensional character area extraction device comprising a quantization circuit that decomposes a two-dimensional image into pixels and quantizes the electrical signal output according to the density of each pixel, and a two-dimensional image memory that stores the quantized data, the device raster-scanning the two-dimensional image memory to extract character areas, characterized by comprising: a tilt correction device that raster-scans the two-dimensional image memory from a plurality of directions, accumulates the quantized density values of the pixels on each scanning line, detects the tilt of the input image by finding the scanning direction in which the density value accumulation results change most sharply, and corrects the tilt on the basis of the detection result; a region dividing device that performs region division by detecting the portions in which the density value accumulation results, obtained by raster-scanning the tilt-corrected two-dimensional image memory from a plurality of directions, change sharply; a device that stores the results obtained using the region dividing device; a field separator detection device that detects field separators from the number of consecutive pixels in the two-dimensional image memory having at least a predetermined value; and an additional region dividing device that corrects the region division results already obtained on the basis of the field separators detected by the field separator detection device; whereby character areas are extracted.