JP2006331057A

JP2006331057A - Character information extraction device, character information extraction method, and computer program

Info

Publication number: JP2006331057A
Application number: JP2005153418A
Authority: JP
Inventors: Hisao Okuda; 尚生奥田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-05-26
Filing date: 2005-05-26
Publication date: 2006-12-07

Abstract

PROBLEM TO BE SOLVED: To suitably extract italic characters drawn in a natural image, such as a telop in a television picture, as character information on which character recognition can be performed.
SOLUTION: Exploiting the fact that a telop has higher luminance than its background, the input image is converted to YUV and edges are computed on the Y component. The edge image is then binarized so that only the edges of the character portions remain. Next, the edge image is shrunk and thinned, and straight lines are detected from the thinned edge image. Thinning the edge image reduces false detections in the straight line detection. The average slope of the detected lines is then used as the inclination angle of the italic characters.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a character information extraction device, a character information extraction method, and a computer program for extracting character information drawn in an image, and in particular to a device, method, and program that extract character information from a telop contained in an image frame obtained, for example, by recording a television broadcast.

More specifically, the present invention relates to a character information extraction device, a character information extraction method, and a computer program that take out character information drawn in an image in a form that a general-purpose OCR engine can recognize, and in particular to a device, method, and program that take out a telop drawn in italic characters on a television picture as character information on which character recognition can be performed.

In today's information society, the role of broadcasting is immeasurable. Television broadcasting in particular, which delivers video together with sound directly to viewers, has a great influence. Broadcast technology encompasses a wide range of technologies, including signal processing, signal transmission and reception, and audio and video information processing. Television sets are extremely widespread and are installed in almost every household, and the broadcast content distributed by each broadcasting station is viewed by a large, unspecified audience.

Recently, another way of viewing broadcast content has become common: received content is first recorded on a hard disk recorder or the like and played back at a convenient time. As hard disk capacity grows, several tens of hours of programs can be recorded, so it is nearly impossible for a user to watch all of the recorded video content. A style of viewing in which the user searches for only the scenes of interest and watches a digest is therefore more efficient and also makes effective use of the recorded content.

To perform scene search and digest viewing on such recorded content, the video must be indexed.

For example, in television broadcasts such as news programs and variety programs, it is common practice in program production and editing to display, in the four corners of the frame, telops that express the topic of the program explicitly or implicitly. A telop displayed in a frame is an important clue for identifying or estimating the topic of the broadcast program during the interval in which it is displayed. It is therefore considered possible to index video by extracting telops from the video content and using, as the index, the character information obtained by recognizing the characters of the telop.

For example, a broadcast program content menu creation device has been proposed that detects a telop in a frame as a characteristic image portion, extracts video data of the telop alone, and automatically creates a menu indicating the content of the broadcast program (see, for example, Patent Document 1).

Normally, to take out character information embedded in an image, character recognition must be performed with an OCR (Optical Character Reader) engine that recognizes character data from optical image data. For example, a telop information processing apparatus has been proposed that detects the region of a video in which a telop is displayed, extracts only the pixels that make up the telop characters, and recognizes them by OCR processing (see, for example, Patent Document 2).

However, while telops in television pictures often use italics, existing OCR engines do not support italic characters. As a result, most telops cannot be recognized automatically, and effective video indexing cannot be performed.

There are two main approaches to recognizing italic characters: making the OCR engine support italics, or converting the italic characters to upright characters.

The former approach is not practical, because the dictionary that the OCR engine must prepare becomes very large, and recognition also takes time in proportion to the dictionary size.

For the latter approach, the following specific methods can be cited.

(1) A method that uses the edge direction (see Non-Patent Document 1)
(2) A method that detects vertical strokes using the character contour (see Non-Patent Documents 2 and 3)
(3) A method that actually performs shearing (see Non-Patent Document 4)

In the shearing method, shearing is performed over a limited range of angles, each result is weighted, the angle that gives the largest weight is adopted as the inclination angle of the italic characters, and the character inclination is corrected.

However, most conventional character recognition techniques are designed to recognize black characters on a white background, as in printed matter, and are not suited to recognizing characters embedded in a natural image, such as telops in television pictures. The background of a telop is rarely as uniform as a white page.

Patent Document 1: JP 2004-364234 A
Patent Document 2: JP 2001-285716 A
Non-Patent Document 1: C. Sun and D. Si, "Skew and Slant Correction for Document Images Using Gradient Direction" (Proc. ICDAR, vol. 1, pp. 142-146, Aug. 1997)
Non-Patent Document 2: G. Kim and V. Govindaraju, "A lexicon driven approach to handwritten word recognition for real-time applications" (IEEE Trans. PAMI, vol. 19, pp. 366-379, April 1997)
Non-Patent Document 3: F. Kimura, M. Shridhar, and Z. Chen, "Improvements of a lexicon directed algorithm for recognition of unconstrained handwritten words" (Proc. 2nd ICDAR, pp. 18-22, Oct. 1993)
Non-Patent Document 4: E. Kavallieratou et al., "New algorithms for skewing correction and slant removal on word-level" (Proc. 6th ICECS, vol. 2, Sep. 1999)

An object of the present invention is to provide an excellent character information extraction device, character information extraction method, and computer program that can take out character information contained in an image frame in a form that a general-purpose OCR engine can recognize.

A further object of the present invention is to provide an excellent character information extraction device, character information extraction method, and computer program that can suitably take out typeset characters drawn in a natural image, such as a telop in a television picture, as character information on which character recognition can be performed.

A further object of the present invention is to provide an excellent character information extraction device, character information extraction method, and computer program that can suitably take out italic characters drawn in a natural image, such as a telop in a television picture, as character information on which character recognition can be performed.

The present invention has been made in view of the above problems. A first aspect thereof is a character information extraction device that extracts character information drawn in an input image and that, when the character information consists of italic characters, calculates the inclination angle of the character information so that the inclination can be corrected and the characters converted to upright characters before character recognition is performed, the device comprising:
edge information detection means for detecting edge information of the input image;
binarization processing means for binarizing the edge image detected by the edge information detection means using a predetermined threshold;
thinning processing means for shrinking and thinning the edge image binarized by the binarization processing means down to line segments of a predetermined thickness while preserving connectivity;
straight line detection means for detecting straight lines from the thinned edge image; and
inclination angle calculation means for calculating the inclination angle of the character information using the straight lines detected by the straight line detection means.

The present invention relates to a character information extraction device that extracts character information drawn in an image. It applies in particular to extracting the character information of telops from television pictures.

Two things make character recognition difficult here: telops drawn in television pictures often use italic characters, and the character information is drawn in a natural image. Italic characters can be made processable by a general-purpose OCR engine by converting them to upright characters, but conventional italic conversion methods are not suited to italic characters drawn in a natural image, such as telops.

Therefore, when calculating the inclination angle of typeset characters drawn in a natural image, the character information extraction device according to the present invention first detects edge information from the input image, binarizes it to extract the edge information of the characters, then shrinks and thins the edge image, detects straight lines from the thinned edge image, and uses the slope of the detected lines as the inclination angle of the italic characters.

Exploiting the fact that a telop has higher luminance than its background, the input image is converted to YUV and horizontal edges are computed on the Y component. Because the telop is brighter than the background, binarizing the edge information leaves only the edges of the character portions. Thinning the edge image then reduces false detections in the straight line detection.

Even italic characters have a limited inclination angle for reasons such as legibility (they are normally not inclined by 30 degrees or more). The straight line detection means can therefore restrict the slope of the lines it looks for and thereby reduce false detections.

A second aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer system, character information extraction processing for extracting character information drawn in an input image, wherein, when the character information consists of italic characters and the inclination angle of the character information is calculated so that the inclination can be corrected and the characters converted to upright characters before character recognition is performed, the program causes the computer system to execute:
an edge information detection procedure for detecting edge information of the input image;
a binarization processing procedure for binarizing the edge image detected in the edge information detection procedure using a predetermined threshold;
a thinning processing procedure for shrinking and thinning the edge image binarized in the binarization processing procedure down to line segments of a predetermined thickness while preserving connectivity;
a straight line detection procedure for detecting straight lines from the thinned edge image; and
an inclination angle calculation procedure for calculating the inclination angle of the character information using the straight lines detected in the straight line detection procedure.

The computer program according to the second aspect of the present invention defines a computer program written in a computer-readable format so as to realize predetermined processing on a computer system. In other words, installing the computer program according to the second aspect on a computer system produces a cooperative effect on the computer system, and the same operation and effects as those of the character information extraction device according to the first aspect can be obtained.

According to the present invention, it is possible to provide an excellent character information extraction device, character information extraction method, and computer program that can take out character information contained in an image frame in a form that a general-purpose OCR engine can recognize.

Also according to the present invention, it is possible to provide an excellent character information extraction device, character information extraction method, and computer program that can suitably take out italic characters drawn in a natural image, such as a telop in a television picture, as character information on which character recognition can be performed.

According to the present invention, the inclination angle of typeset characters drawn in a natural image can be calculated automatically and with high accuracy. In particular, the inclination angle can be obtained correctly even for Japanese, whose character shapes are complex, so correcting the character inclination accurately improves the accuracy of character recognition by OCR.

Still other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described later and the accompanying drawings.

Embodiments of the present invention are described in detail below with reference to the drawings.

The present invention relates to a character information extraction system that extracts character information drawn in an image. It applies in particular to extracting the character information of telops from television pictures. One conceivable use of this kind of system is video indexing for scene search and digest viewing of recorded television program content.

Two problems arise when recognizing the characters of telops in television pictures: telops often use italic characters, and the character information is drawn in a natural image. One way to make italic characters processable by a general-purpose OCR engine is to convert them to upright characters. That is, the inclination angle of the character information taken from the input image is calculated, the character inclination is corrected to produce upright characters, and the result is passed to the OCR engine. However, conventional italic conversion methods basically target printed characters, such as black characters on a white background, and are not suited to italic characters drawn in a natural image, such as telops.

Therefore, in the present invention, when the inclination angle of typeset characters drawn in a natural image is calculated, edge information is first detected from the input image and binarized to extract the edge information of the characters. The edge image is then shrunk and thinned, straight lines are detected from the thinned edge image, and the slope of the detected lines is used as the inclination angle of the italic characters.

Exploiting the fact that a telop has higher luminance than its background, the input image is converted to YUV and edges are computed on the Y component. Because the telop is brighter than the background, binarizing the edge information leaves only the edges of the character portions. Thinning the edge image then reduces false detections in the straight line detection.

The character information extraction system according to the present invention can therefore calculate the inclination angle of typeset characters drawn in a natural image automatically and with high accuracy. In particular, the inclination angle can be obtained correctly even for Japanese, whose character shapes are complex, so correcting the character inclination accurately improves the accuracy of character recognition by OCR.

FIG. 1 schematically shows the functional configuration of a character information extraction system 10 according to one embodiment of the present invention. The illustrated system 10 comprises three functional modules: an inclination angle calculation unit 11, a character inclination correction unit 12, and a character recognition unit 13. The system 10 can be designed as a dedicated hardware device, or it can be realized by executing a predetermined application program on a general-purpose computer system such as a personal computer (PC). Each functional module is described below.

Input image:
The image input to the system 10 is an image frame in which character information such as a telop is drawn in a natural image such as a television picture. For example, the image is input from a video storage device (not shown), such as a hard disk recorder, on which television programs have been recorded.

In television broadcasts of genres such as news and variety programs, telops are displayed in order to gain viewers' understanding and agreement, or to arouse their interest and draw them into the program. In most cases a static telop is present in one of the four corner regions of the screen, as shown in FIG. 2, so character information may be extracted from these regions.

Inclination angle calculation unit:
The inclination angle calculation unit 11 calculates the inclination angle θ of the characters in the input image. FIG. 3 shows the inclination angle calculation procedure as a flowchart. As shown there, the calculation consists of computing the horizontal edges of the input image (S1), binarization (S2), thinning (S3), straight line detection (S4), and calculation of the slope of the lines (S5). FIG. 4 shows how the input original image is processed by each of these steps.

First, in step S1, horizontal edges are computed for the input original image, which yields the edge image shown in FIG. 4(b) from the original image shown in FIG. 4(a).

When edge information is detected, the input image is converted to YUV and the edge computation is performed on the Y component. This exploits the fact that a telop has higher luminance than its background. Edges can be detected with, for example, an X-direction Sobel filter. For the Sobel filter, see, for example, Hideyuki Tamura, "Computer Image Processing" (pp. 188-191, Ohmsha). The gist of the present invention is, however, not limited to using a Sobel filter for edge detection.
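As an illustration of step S1, the following minimal Python sketch extracts the Y (luminance) plane and applies an X-direction Sobel filter. It is not taken from the patent: the function names `luminance` and `sobel_x`, the BT.601 luma weights, and the list-of-rows image representation are assumptions chosen for this example.

```python
# Hypothetical helpers for step S1; names and data layout are assumptions.
SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))

def luminance(rgb):
    """Convert rows of (r, g, b) tuples to the Y plane (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def sobel_x(y):
    """Absolute X-direction Sobel response; border pixels are left at 0."""
    h, w = len(y), len(y[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += SOBEL_X[dr + 1][dc + 1] * y[r + dr][c + dc]
            out[r][c] = abs(acc)
    return out

# Usage: a vertical dark-to-bright step gives a strong response at the boundary.
K, W = (0, 0, 0), (255, 255, 255)
edges = sobel_x(luminance([[K, K, K, W, W, W] for _ in range(5)]))
```

With this kernel a vertical brightness step responds strongly while a purely horizontal step gives zero, which is consistent with the text's point that X-direction edges pick out the vertical strokes of characters.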

For reference, FIG. 5 shows an example in which edges in the horizontal (X) direction are computed from the original image and an example in which edges in the vertical (Y) direction are computed. As the figure shows, extracting edges in the X direction detects the vertical strokes.

In the following step S2, binarization is applied to the edge image obtained in step S1: every edge pixel whose value is below the threshold is set to 0 and every edge pixel whose value is at or above the threshold is set to 1, producing the edge image shown in FIG. 4(c).

Because a telop has higher luminance than its background, performing this thresholding appropriately leaves only the edges of the character portions.

Otsu's algorithm can be used to determine the threshold. For Otsu's algorithm, see, for example, Otsu, "An Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria" (Transactions of the IECE of Japan, Vol. J63-D, No. 4, pp. 349-356, 1980). The gist of the present invention is, however, not limited to this.
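The thresholding of step S2 can be sketched as follows. This is a minimal illustration of Otsu's method (choosing the threshold that maximizes the between-class variance), not the patent's implementation. The names `otsu_threshold` and `binarize` are hypothetical, and the sketch assumes that edge magnitudes have already been scaled to the range 0 to 255; larger values are simply clamped.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value < t form class 0 and pixels with value >= t form class 1.
    """
    hist = [0] * levels
    for v in values:
        hist[min(levels - 1, max(0, int(v)))] += 1   # clamp to the histogram range
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels below t
    sum0 = 0    # sum of their values
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(edge, t):
    """Set pixels below t to 0 and pixels at or above t to 1, as in step S2."""
    return [[1 if v >= t else 0 for v in row] for row in edge]

# Usage: flatten the edge image, pick a threshold, then binarize.
edge = [[10, 200], [200, 10]]
t = otsu_threshold([v for row in edge for v in row])
binary = binarize(edge, t)
```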

Next, in step S3, thinning is applied to the binarized edge image. Thinning here means shrinking the edge portions down to line segments of thickness 1 (pixel) while preserving connectivity, and it produces the edge image shown in FIG. 4(d).

Thinning is performed to reduce false detections in the subsequent straight line detection. FIG. 6 illustrates how line detection differs with and without thinning of the edge image. When an edge of height h (pixels) has been thinned, detecting straight lines that pass through h or more points finds only the vertical line. Without thinning, the same detection of lines passing through h or more points erroneously finds multiple lines.

A mask-based algorithm can be used for thinning. See, for example, "Skeletonization and Thinning" (C Magazine, September 2000 issue, pp. 122-131). The gist of the present invention is, however, not limited to this.
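To make the thinning of step S3 concrete, here is a minimal sketch that uses the well-known Zhang-Suen thinning algorithm as a stand-in. The patent only says that a mask-based algorithm can be used, so the choice of Zhang-Suen, the function name `zhang_suen_thin`, and the list-of-rows binary image format are all assumptions for illustration. Note that this algorithm can also shorten the ends of thick strokes.

```python
def zhang_suen_thin(img):
    """Thin a binary image (rows of 0/1) to roughly one-pixel-wide lines."""
    h, w = len(img), len(img[0])
    # Pad with a zero border so every pixel has eight neighbours.
    g = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(r, c):
        # Clockwise from north: P2, P3, ..., P9 in the usual notation.
        return [g[r - 1][c], g[r - 1][c + 1], g[r][c + 1], g[r + 1][c + 1],
                g[r + 1][c], g[r + 1][c - 1], g[r][c - 1], g[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if g[r][c] == 0:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of set neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_clear.append((r, c))
            # Apply deletions only after the whole sub-iteration has been evaluated.
            for r, c in to_clear:
                g[r][c] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in g[1:-1]]

# Usage: a 3-pixel-wide vertical bar is reduced to a 1-pixel-wide line.
bar = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(5)] + [[0] * 5]
thin = zhang_suen_thin(bar)
```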

そして、ステップＳ４では、細線化処理されたエッジ画像から直線検出を行なう。図４（ｅ）では、文字「お」から直線が検出された様子を示している。 In step S4, straight line detection is performed from the thinned edge image. FIG. 4E shows a state in which a straight line is detected from the character “O”.

例えば、細線化されたエッジ画像に対し、Ｈｏｕｇｈ変換を用いて直線検出を行なうことができる。Ｈｏｕｇｈ変換に関しては、例えば、田村秀行著「コンピュータ画像処理」（ｐｐ２０４−２０６、オーム社）を参照されたい。但し、本発明の要旨はこれに限定されるものではない。 For example, straight line detection can be performed on the thinned edge image using Hough transform. Regarding Hough conversion, see, for example, “Computer Image Processing” written by Hideyuki Tamura (pp 204-206, Ohmsha). However, the gist of the present invention is not limited to this.

図７には、直線検出処理の手順をフローチャートの形式で示している。 FIG. 7 shows a procedure of straight line detection processing in the form of a flowchart.

まず、求める直線の通過する画素数Ｎの初期値として、画像の高さの２分の１を与える（ステップＳ１１）。 First, as an initial value of the number N of pixels through which a desired straight line passes, a half of the image height is given (step S11).

そして、Ｈｏｕｇｈ変換による直線検出を行ない（ステップＳ１２）、検出された直線の数をチェックする（ステップＳ１３）。 Then, straight line detection by Hough conversion is performed (step S12), and the number of detected straight lines is checked (step S13).

ここで、直線が検出されなかった場合には、求める直線の通過する画素数Ｎをデクリメントしてから（ステップＳ１５）、ステップＳ１２に戻り、直線検出を繰り返し行なう。 Here, if a straight line is not detected, the number N of pixels through which the desired straight line passes is decremented (step S15), and then the process returns to step S12 to repeat the straight line detection.

If, on the other hand, one or more straight lines are detected, it is checked whether at least one of them has an inclination within the range of 0 to 30 degrees (step S14). For reasons such as legibility, there is a limit to how far characters are slanted, and they are normally not inclined by 30 degrees or more. Restricting the inclination of the sought straight lines in this way reduces false detections.

If no straight line has an inclination within the range of 0 to 30 degrees, N is decremented (step S15), and the process returns to step S12 to repeat the straight-line detection. If, on the other hand, one or more such straight lines are detected, they are output as the result of the straight-line detection.
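The loop of FIG. 7 (steps S11 to S15) can be sketched roughly as follows. The names detect_slanted_lines and detect are hypothetical; detect(n) stands for any detector returning the inclinations, in degrees, of lines passing through at least n pixels. Stopping when N reaches zero and returning only the in-range lines are assumptions, since the text does not state a lower bound for N or say whether out-of-range lines are discarded.

```python
def detect_slanted_lines(detect, image_height, min_deg=0.0, max_deg=30.0):
    """Lower the vote threshold N until a line within the allowed slant is found.

    detect       -- callable: detect(n) -> list of line inclinations in degrees
    image_height -- height of the input image in pixels
    Returns (n, accepted_angles); (0, []) if nothing is found (assumed fallback).
    """
    n = image_height // 2                      # step S11: initial N
    while n > 0:
        angles = detect(n)                     # step S12: line detection
        # Steps S13 and S14: any lines at all, and any within 0 to 30 degrees?
        accepted = [a for a in angles if min_deg <= a <= max_deg]
        if accepted:
            return n, accepted                 # output the detection result
        n -= 1                                 # step S15: decrement N and retry
    return 0, []


# Stub detector for illustration: (inclination in degrees, pixels on the line).
candidate_lines = [(45.0, 30), (12.0, 18), (15.0, 18)]


def fake_detect(n):
    return [angle for angle, count in candidate_lines if count >= n]


n_used, angles = detect_slanted_lines(fake_detect, image_height=80)
print(n_used, angles)  # 18 [12.0, 15.0]
```

In this example the 45-degree line is found first, at N = 30, but it is rejected by the slant limit, so N keeps decreasing until the two lines within range appear at N = 18.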

Then, in step S5, the average of the inclinations of the detected straight lines is calculated and output to the subsequent character inclination correction unit 12 as the inclination angle of the character information in the input image.
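Step S5 amounts to an arithmetic mean, as in the short sketch below. The function name slant_angle and the choice to raise an error on an empty list are assumptions. Because the inclinations are limited to the range of 0 to 30 degrees, a plain mean suffices and no handling of angle wrap-around is needed.

```python
def slant_angle(angles_deg):
    """Return the mean inclination (degrees) of the detected straight lines."""
    if not angles_deg:
        # Assumed behaviour: the text does not cover the no-line case here.
        raise ValueError("no straight lines were detected")
    return sum(angles_deg) / len(angles_deg)


theta = slant_angle([12.0, 15.0, 18.0])
print(theta)  # 15.0
```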

Character inclination correction unit:
The character inclination correction unit 12 performs a shearing process using the character inclination angle θ calculated by the inclination angle calculation unit 11 to make italic characters upright. In the shearing process, the position (x′, y′) to which a pixel at coordinates (x, y) is moved is calculated by the following equations (see FIG. 8).

x′ = x − (height − y) × tan θ
y′ = y

The image is scanned by TV-style raster scanning as shown in FIG. 9, and the above calculation is performed for every pixel. If multiple pixels land on the same position after the shearing process, processing such as overwriting or averaging is applied.
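A minimal sketch of the shearing step is given below, under stated assumptions: the image is a list of rows, height is taken to be the image height in pixels, the horizontal shift is rounded to whole pixels per row, pixels moved outside the image are dropped, and collisions are resolved by overwriting. The function name shear_upright is hypothetical.

```python
import math


def shear_upright(image, theta_deg, background=0):
    """Apply x' = x - (height - y) * tan(theta), y' = y to every pixel."""
    height, width = len(image), len(image[0])
    t = math.tan(math.radians(theta_deg))
    out = [[background] * width for _ in range(height)]
    for y in range(height):                      # raster scan: top to bottom
        shift = round((height - y) * t)          # assumed whole-pixel rounding
        for x in range(width):                   # and left to right in each row
            new_x = x - shift
            if 0 <= new_x < width:               # pixels leaving the image are dropped
                out[y][new_x] = image[y][x]      # overwrite on collision
    return out


# Build a stroke slanted by 20 degrees, then make it upright again.
height, width, x0, theta = 6, 10, 2, 20.0
t = math.tan(math.radians(theta))
slanted = [[0] * width for _ in range(height)]
for y in range(height):
    slanted[y][x0 + round((height - y) * t)] = 1

upright = shear_upright(slanted, theta)
print([row.index(1) for row in upright])  # [2, 2, 2, 2, 2, 2]
```

Since y′ = y and the shift is constant within a row, each row is simply translated; in this sketch collisions can therefore only arise from rounding choices other than the per-row rounding used here.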

Character recognition unit:
The character recognition unit 13 performs character recognition on the input image and outputs character information. An existing OCR engine can be used for the character recognition process. One example is the character recognition device described in Japanese Patent No. 2893781, which has already been assigned to the present applicant.

The present invention has been described in detail above with reference to a specific embodiment. However, it is obvious that those skilled in the art can modify or substitute the embodiment without departing from the gist of the present invention.

This specification has described, as an example, the case where character information is extracted from a telop drawn in a television image, but the gist of the present invention is not limited to this. Needless to say, the present invention can equally be applied to images other than television images, including cases where character information is extracted from italic characters drawn in a natural image or from italic characters formed as ordinary printed matter.

In short, the present invention has been disclosed by way of example, and the contents of this specification should not be interpreted restrictively. To determine the gist of the present invention, the claims should be taken into consideration.

FIG. 1 is a diagram schematically showing the functional configuration of a character information extraction system 10 according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the screen layout of a television program that includes a telop area.
FIG. 3 is a flowchart showing the procedure for calculating the inclination angle.
FIG. 4 is a diagram showing how the input original image is processed by each step of the inclination angle calculation.
FIG. 5 is a diagram showing examples in which edges in the horizontal direction (X direction) and edges in the vertical direction (Y direction) are calculated from an original image.
FIG. 6 is a diagram for explaining the difference in straight-line detection between the case where the edge image is thinned and the case where it is not.
FIG. 7 is a flowchart showing the procedure of the straight-line detection process.
FIG. 8 is a diagram showing how italic characters are made upright by a shearing process using the character inclination angle θ.
FIG. 9 is a diagram showing how TV-style raster scanning is performed.

Explanation of symbols

10 ... Character information extraction system
11 ... Inclination angle calculation unit
12 ... Character inclination correction unit
13 ... Character recognition unit

Claims

A character information extraction device that extracts character information drawn in an input image,
When the character information is an italic character, when calculating the inclination angle of the character information in order to perform character recognition after correcting the character inclination and converting it to an upright character,
Edge information detection means for detecting edge information of the input image;
Binarization processing means for binarizing the edge image detected by the edge information detection means using a predetermined threshold;
Thinning processing means for reducing and thinning the edge image binarized by the binarizing processing means to a line segment having a predetermined thickness while maintaining connectivity;
A straight line detecting means for detecting a straight line from the thinned edge image;
An inclination angle calculating means for calculating an inclination angle of character information using a straight line detected by the straight line detecting means;
A character information extracting apparatus comprising:

The input image is a television image, and a telop drawn in the television image is extracted as character information.
The character information extracting apparatus according to claim 1, wherein:

The edge information detecting means detects a lateral edge of the input image;
The character information extracting apparatus according to claim 1, wherein:

The edge information detection means performs YUV conversion on the input image and performs edge detection on the Y component.
The character information extracting apparatus according to claim 1, wherein:

The straight line detecting means limits the inclination of the straight line to be detected.
The character information extracting apparatus according to claim 1, wherein:

A character information extraction method for extracting character information drawn in an input image,
When the character information is an italic character, when calculating the inclination angle of the character information in order to perform character recognition after correcting the character inclination and converting it to an upright character,
An edge information detection step for detecting edge information of the input image;
A binarization processing step of binarizing the edge image detected in the edge information detection step using a predetermined threshold;
A thinning step for thinning the edge image binarized in the binarization processing step to a line segment having a predetermined thickness while maintaining connectivity;
A straight line detection step for detecting a straight line from the thinned edge image;
An inclination angle calculating step of calculating an inclination angle of the character information using the straight line detected in the straight line detecting step;
A character information extraction method comprising:

The input image is a television image, and a telop drawn in the television image is extracted as character information.
The character information extracting method according to claim 6.

In the edge information detection step, a lateral edge of the input image is detected.
The character information extracting method according to claim 6.

In the edge information detection step, the input image is subjected to YUV conversion, and edge detection is performed on the Y component.
The character information extracting method according to claim 6.

In the straight line detection step, the inclination of the straight line to be detected is limited.
The character information extracting method according to claim 6.

A computer program written in a computer-readable format so as to execute character information extraction processing for extracting character information drawn in an input image on a computer system,
When the character information is an italic character, when calculating the inclination angle of the character information in order to perform character recognition after correcting the character inclination and converting it to an upright character, the computer system,
Edge information detection procedure for detecting edge information of the input image;
A binarization processing procedure for binarizing the edge image detected in the edge information detection procedure using a predetermined threshold;
A thinning processing procedure for reducing and thinning the edge image binarized in the binarization processing procedure to a line segment having a predetermined thickness while maintaining connectivity;
A straight line detection procedure for detecting a straight line from the thinned edge image;
An inclination angle calculation procedure for calculating an inclination angle of character information using a straight line detected in the straight line detection procedure;
A computer program for executing