KR101390561B1

KR101390561B1 - Method and apparatus for subtitles detection

Info

Publication number: KR101390561B1
Application number: KR1020130016327A
Authority: KR
Inventors: 문영식; 박민수; 박기태
Original assignee: 한양대학교 에리카산학협력단
Priority date: 2013-02-15
Filing date: 2013-02-15
Publication date: 2014-05-27
Anticipated expiration: 2033-02-15

Abstract

A subtitle detection method and an apparatus therefor are disclosed. The subtitle detection method includes: converting an input image into a grayscale image; detecting a first subtitle candidate region by applying a Harris corner detection algorithm to the grayscale image; detecting a second subtitle candidate region by performing a discrete cosine transform on the grayscale image; and determining the region where the first subtitle candidate region and the second subtitle candidate region overlap as the subtitle region.

Description

Method and apparatus for subtitle detection

The present invention relates to a method and an apparatus that can detect subtitles more efficiently in an input image containing subtitles.

With the recent development of various multimedia service technologies, the media industry's interest has grown. One of the most discussed technologies is processing that uses the subtitles embedded in video content. Such subtitles give viewers a convenient viewing environment and a variety of information about the content. Just as subtitles embedded in news reports or documentaries act as an efficient supplementary function by presenting the reported content as concrete, visible text, they have become a useful element that conveys visually even the information a viewer did not hear directly. Embedded subtitles are also useful in technologies such as video information retrieval, image retrieval, and scene retrieval.

Conventionally, subtitle regions have been extracted with texture-based, color-based, and edge-based approaches. Texture-based extraction uses the characteristics of the whole image. To capture the texture features of subtitles, feature points are typically extracted with a Harris corner detector, the input video is divided into frames, and similarity is measured to detect the subtitle region. However, texture-based detection involves many complex environment variables, and it can also detect textured regions that belong to the background, which leads to erroneous results.

Color-based detection relies on the property that subtitles are distributed in a specific color band. Because the subtitles in an image lie in the same color and brightness range and contrast with the background, the subtitle region is detected by varying the RGB channel values. However, as video compression techniques have advanced, color information is lost or the color values of the original image are difficult to preserve, which is a problem for this approach.

Finally, edge-based methods generally detect subtitles by running various detectors that extract edge masks and edge maps. These methods detect subtitles efficiently in images without complex structure, but for images with complex structure they require additional processing and are less accurate.

The present invention is intended to provide a subtitle detection method and apparatus that can accurately detect the subtitle region even in an image with a structurally complex background.

According to one aspect of the present invention, a method that can accurately and efficiently detect subtitles in an image, and a recording medium storing a program for performing the method, are provided.

According to an embodiment of the present invention, a subtitle detection method may be provided that includes: converting an input image into a grayscale image; detecting a first subtitle candidate region by applying a Harris corner detection algorithm to the grayscale image; detecting a second subtitle candidate region by performing a discrete cosine transform on the grayscale image; and determining the region where the first subtitle candidate region and the second subtitle candidate region overlap as the subtitle region.

Detecting the first subtitle candidate region may include: deriving the brightness mean and standard deviation of the grayscale image; normalizing the image according to normal-distribution values using the brightness mean and standard deviation; detecting corner points by applying the Harris corner detection algorithm to the normalized image; blurring the image in which the corner points were detected to remove noise; and labeling adjacent regions of the denoised corner points to detect the first subtitle candidate region.

Detecting the second subtitle candidate region may include: normalizing the discrete-cosine-transformed image by applying a sign function; performing an inverse discrete cosine transform on the normalized image; applying a Gaussian filter to the inverse-transformed image; and binarizing the Gaussian-filtered image with a threshold to detect the second subtitle candidate region.

The threshold is the sum of the mean and the variance of the Gaussian-filtered image.

Binarizing the Gaussian-filtered image with the threshold may convert each pixel value of the Gaussian-filtered image to a first value if it is greater than or equal to the threshold, and to a second value if it is less than the threshold.

Detecting the second subtitle candidate region may label adjacent regions of the pixels having the second value to detect the second subtitle candidate region.

According to another aspect of the present invention, an apparatus that can accurately and efficiently detect subtitles in an image is provided.

According to an embodiment of the present invention, a subtitle detection apparatus may be provided that includes: a preprocessing unit that converts an input image into a grayscale image; a first signal processing unit that detects a first subtitle candidate region by applying a Harris corner detection algorithm to the grayscale image; a second signal processing unit that detects a second subtitle candidate region by performing a discrete cosine transform on the grayscale image; and a subtitle detection unit that detects the region where the first subtitle candidate region and the second subtitle candidate region overlap as the subtitle region.

The first signal processing unit may derive the brightness mean and standard deviation of the grayscale image, normalize the image according to normal-distribution values using the brightness mean and standard deviation, detect corner points by applying the Harris corner detection algorithm to the normalized image, perform blurring, and then label adjacent regions of the corner points to detect the first subtitle candidate region.

The second signal processing unit may normalize the discrete-cosine-transformed image by applying a sign function, perform an inverse discrete cosine transform, apply a Gaussian filter to the inverse-transformed image, and then binarize the result with a threshold to detect the second subtitle candidate region.

By providing the subtitle detection method and apparatus according to an embodiment of the present invention, subtitles can be detected efficiently and accurately even in an image with a structurally complex background.

FIG. 1 is a block diagram schematically showing the internal configuration of a subtitle detection apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram comparing the subtitle candidate region detection results of a conventional method and of the first signal processing unit according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating subtitle candidate region detection by the first signal processing unit according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating subtitle candidate region detection by the second signal processing unit according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a subtitle detection result according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a method by which the subtitle detection apparatus detects subtitles according to an embodiment of the present invention.
FIG. 7 is a table comparing the subtitle detection performance of a conventional method and of an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the invention, detailed descriptions of related known art are omitted where they could obscure the gist of the invention.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically showing the internal configuration of a subtitle detection apparatus according to an embodiment of the present invention. FIG. 2 compares the subtitle candidate region detection results of a conventional method and of the first signal processing unit. FIG. 3 illustrates subtitle candidate region detection by the first signal processing unit. FIG. 4 illustrates subtitle candidate region detection by the second signal processing unit. FIG. 5 illustrates a subtitle detection result according to an embodiment of the present invention.

Referring to FIG. 1, the subtitle detection apparatus 100 according to an embodiment of the present invention includes a preprocessing unit 110, a first signal processing unit 120, a second signal processing unit 130, and a subtitle detection unit 140.

The preprocessing unit 110 receives an input image, converts it into a grayscale image, and outputs the grayscale image to both the first signal processing unit 120 and the second signal processing unit 130. When the input is a video, the preprocessing unit 110 may convert it to grayscale frame by frame and output each frame to the first signal processing unit 120 and the second signal processing unit 130.
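The grayscale conversion performed by the preprocessing unit can be sketched as follows. The source does not state which conversion formula is used; the ITU-R BT.601 luma weights, the function name `to_grayscale`, and the nested-list frame representation are all assumptions made for illustration.

```python
def to_grayscale(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to grayscale.

    The BT.601 luma weights are an assumption; the source does not
    say which conversion it uses.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


# Hypothetical 1x3 frame: a red, a white, and a black pixel.
frame = [[(255, 0, 0), (255, 255, 255), (0, 0, 0)]]
gray = to_grayscale(frame)
```

For a video, the same function would be applied to each frame in turn before the two signal processing branches run.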

The first signal processing unit 120 normalizes the grayscale image received from the preprocessing unit 110 and then detects a subtitle candidate region using the Harris corner detection algorithm. For ease of understanding and explanation, the subtitle candidate region detected by the first signal processing unit 120 is referred to in this specification as the first subtitle candidate region.

In general, subtitles embedded in an image differ strongly in brightness and color from the background and from other objects. Exploiting this property, the unit first computes the mean and standard deviation of the brightness of the grayscale image and normalizes the image according to normal-distribution values.

For example, the first signal processing unit 120 may normalize the grayscale image using Equation 1 below.

[Equation 1]
$$O(x, y) = \frac{I(x, y) - \mathrm{AVG}(I)}{\mathrm{STD}(I)}$$

Here, I denotes the pixel values of the grayscale image, and AVG() denotes the mean brightness of the grayscale image. STD() denotes the standard deviation computed from the brightness values of the grayscale image, and O(x, y) denotes the resulting pixel values obtained by the standardization used with the standard normal distribution.
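A minimal sketch of this normalization, assuming it takes the usual standardization form O(x, y) = (I(x, y) - AVG(I)) / STD(I). The population standard deviation, the zero-contrast guard, and the name `normalize` are assumptions that are not taken from the source.

```python
import math


def normalize(gray):
    """Standardize pixel values: subtract the image mean, divide by the image std."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # Population standard deviation (an assumption; the source does not say).
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0:
        # Flat image: nothing to normalize (guard added for this sketch).
        return [[0.0 for _ in row] for row in gray]
    return [[(p - mean) / std for p in row] for row in gray]


# Hypothetical image: one bright pixel on a dark background.
gray = [[10, 10, 10], [10, 200, 10]]
norm = normalize(gray)
```

After normalization the image has zero mean and unit variance, so the bright pixel maps to a large positive value and the background to small negative values.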

Using the image normalized in this way, the first signal processing unit 120 detects corner points with the Harris corner detection algorithm. The approach relies on the fact that subtitles, whose corner points are in theory densely clustered, contain more corner points than other objects. Because the Harris corner detection algorithm is well known to those skilled in the art, a description of the algorithm itself is omitted.
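The source treats the Harris detector as known art and does not describe it. For orientation only, the sketch below follows the textbook formulation, with response R = det(M) - k * trace(M)^2 over a structure tensor M. The central-difference gradients, the unweighted 3x3 window, k = 0.04, and the function name `harris_response` are assumptions, not details from the patent.

```python
def harris_response(img, k=0.04):
    """Return the Harris response R = det(M) - k * trace(M)^2 per pixel.

    M is the structure tensor summed over a 3x3 window; gradients use
    central differences. Border pixels keep a response of 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


# Hypothetical test image: an 8x8 bright square on a 16x16 dark background.
img = [[1.0 if 4 <= y <= 11 and 4 <= x <= 11 else 0.0 for x in range(16)]
       for y in range(16)]
resp = harris_response(img)
```

In this formulation the response is positive at a corner of the square, negative along a straight edge, and zero in flat areas, so keeping only pixels with a large positive response yields the corner points.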

After detecting corner points with the Harris corner detection algorithm, the first signal processing unit 120 may blur the resulting image and then label it to detect the subtitle candidate region. The blurring minimizes the number of regions that are picked up as subtitle candidates.

In FIG. 2, 210 is the result of applying the Harris corner detection algorithm to the original input image without normalization, and 220 is the result of applying the algorithm after the input image has been normalized.

As result image 210 of FIG. 2 shows, when the background has a complex structure or contains subtitle-like textures, applying the Harris corner detection algorithm yields corner points scattered across the image.

By contrast, as result image 220 shows, when the grayscale image is normalized before the Harris corner detection algorithm is applied, as in the first signal processing unit 120, the corner points are distributed only in the subtitle region, apart from a few corner points caused by small noise. The first signal processing unit 120 can derive the subtitle candidate region by labeling adjacent regions of the corner points.
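The blur-then-label step can be sketched as below. The source does not specify the blur kernel, any density threshold, or the connectivity used for labeling, so the 3x3 mean filter, the 0.4 cutoff, the 8-connectivity, and the names `box_blur` and `label_regions` are all assumptions.

```python
from collections import deque


def box_blur(mask, radius=1):
    """Mean filter over a (2*radius+1)^2 window, zero-padded at the border."""
    h, w = len(mask), len(mask[0])
    area = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
            out[y][x] = total / area
    return out


def label_regions(binary):
    """Group 8-connected foreground pixels and return one bounding box
    (top, left, bottom, right) per group."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical corner map: a dense cluster plus one isolated noise point.
corners = [[0] * 10 for _ in range(6)]
for y, x in [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]:
    corners[y][x] = 1
corners[4][8] = 1  # isolated corner point, treated as noise

blurred = box_blur(corners)
dense = [[1 if v >= 0.4 else 0 for v in row] for row in blurred]
boxes = label_regions(dense)
```

The isolated point is averaged away by the blur, while the dense cluster survives and is returned as a single bounding box.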

In FIG. 3, 310 is the original input image, 320 shows the grayscale image normalized by the first signal processing unit 120, and 330 shows the result of applying the Harris corner detection algorithm to that normalized grayscale image.

The second signal processing unit 130 detects a subtitle candidate region by performing a discrete cosine transform (hereinafter, DCT) on the grayscale image.

First, the second signal processing unit 130 performs a DCT on the grayscale image (420 in FIG. 4 shows the result of applying the DCT to part of the input image). Next, exploiting the property that subtitle regions generally have higher contrast and more high-frequency content than the background or other object regions, the unit normalizes the result by applying the sign function (sign()) to reduce the low-frequency components (see 420 in FIG. 4). The unit then performs an inverse discrete cosine transform on the normalized image, applies a Gaussian filter (see 440 in FIG. 4) to blur the image and remove noise, determines a threshold from the mean and variance, and uses that threshold to detect the subtitle candidate region (see 450 in FIG. 4).

For example, the second signal processing unit 130 may detect the subtitle candidate region based on the DCT using Equation 2 below.


Here, I denotes the grayscale image, and x and y denote the pixel coordinates along the X and Y axes of the image. sign() is the sign function used to reduce the low-frequency components; it returns -1 for a negative input, 0 for zero, and 1 for a positive input.

abs() denotes a function that returns the absolute value, G denotes the Gaussian filter, and DCT^-1 denotes the inverse discrete cosine transform. These are well known to those skilled in the art, so further description is omitted.
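The chain of operations named here (DCT, sign(), inverse DCT, abs(), Gaussian filter) can be sketched in pure Python as follows. The exact form of Equation 2 is not reproduced in this text, so the order in which abs() and the Gaussian filter are applied, the orthonormal DCT scaling, the 3x3 Gaussian kernel, the rounding tolerance in `sign`, and all function names are assumptions.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def apply_2d(img, f):
    """Apply a 1-D transform along rows, then along columns."""
    rows = [f(list(r)) for r in img]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def sign(v, eps=1e-9):
    """Return -1, 0, or 1; magnitudes below eps count as 0 (rounding guard)."""
    if v > eps:
        return 1
    if v < -eps:
        return -1
    return 0


def gaussian_blur_3x3(img):
    """Blur with the kernel [1 2 1; 2 4 2; 1 2 1] / 16, replicating borders."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def sign_dct_map(gray):
    """DCT -> sign() -> inverse DCT -> abs() -> Gaussian filter."""
    coeffs = apply_2d(gray, dct_1d)
    signed = [[sign(c) for c in row] for row in coeffs]
    recon = apply_2d(signed, idct_1d)
    return gaussian_blur_3x3([[abs(v) for v in row] for row in recon])


# Hypothetical 8x8 grayscale image with a small bright patch.
gray = [[200.0 if 2 <= y <= 3 and 3 <= x <= 5 else 20.0 for x in range(8)]
        for y in range(8)]
response = sign_dct_map(gray)
```

Keeping only the sign of each DCT coefficient discards magnitude information, which is how the text describes suppressing the dominant low-frequency content before the inverse transform. The naive transforms above are quadratic per row and are meant only for small illustrative inputs.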

To detect the subtitle candidate region, the second signal processing unit 130 binarizes the Gaussian-filtered image: it derives a threshold from the mean and variance and uses that threshold to detect the candidate region. For example, the unit may convert pixel values greater than or equal to the threshold to a first value (for example, 255) and pixel values below the threshold to a second value (for example, 0), and then detect the pixels having the second value as the subtitle candidate region. The unit may then connect adjacent pixels within the detected candidate regions and label each group to detect the subtitle candidate region.
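A minimal sketch of this binarization, with the threshold set to the mean plus the variance of the filtered image as stated in the text. The use of the population variance and the name `binarize` are assumptions; the default values 255 and 0 are the examples given in the text.

```python
def binarize(filtered, first_value=255, second_value=0):
    """Binarize with threshold = mean + variance of the filtered image.

    Pixels >= threshold get first_value, the rest get second_value.
    """
    pixels = [p for row in filtered for p in row]
    mean = sum(pixels) / len(pixels)
    # Population variance (an assumption; the source does not say).
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    threshold = mean + variance
    binary = [[first_value if p >= threshold else second_value for p in row]
              for row in filtered]
    return binary, threshold


# Hypothetical Gaussian-filtered response with one strong pixel.
filtered = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2]]
binary, threshold = binarize(filtered)
```

Because the variance is added directly to the mean, the threshold depends on the scale of the pixel values, so the same image expressed in a 0 to 255 range and in a 0 to 1 range would be thresholded differently.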

For ease of understanding and explanation, the subtitle candidate region detected by the second signal processing unit 130 is referred to in this specification as the second subtitle candidate region.

The subtitle detection unit 140 receives the first subtitle candidate region and the second subtitle candidate region from the first signal processing unit 120 and the second signal processing unit 130, respectively, and detects the region that the two have in common as the final subtitle region.

In FIG. 5, 510 shows the result of detecting subtitle candidate regions using the DCT in the second signal processing unit 130, and 530 shows the subtitle candidate regions labeled by connecting adjacent pixels. 520 shows the corner points detected by the first signal processing unit 120 using the Harris corner detection algorithm, and 540 shows the subtitle candidate regions obtained by labeling those corner points. As FIG. 5 shows, the first subtitle candidate region and the second subtitle candidate region detected by the two units differ from each other.

Accordingly, the subtitle detection unit 140 compares the first subtitle candidate region and the second subtitle candidate region detected by the first signal processing unit 120 and the second signal processing unit 130, and detects the overlapping region as the subtitle region, as shown at 550 in FIG. 5.
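One simple way to realize this overlap step is a pixel-wise AND of two candidate masks, sketched below. The source does not say whether the overlap is computed per pixel or per labeled region, so the pixel-wise approach, the 0/1 mask representation, and the name `overlap` are assumptions.

```python
def overlap(first_mask, second_mask):
    """Keep only pixels marked as candidates in both masks (pixel-wise AND)."""
    return [[1 if a and b else 0 for a, b in zip(row1, row2)]
            for row1, row2 in zip(first_mask, second_mask)]


# Hypothetical candidate masks from the two branches (1 = candidate pixel).
first = [[0, 1, 1, 0],
         [0, 1, 1, 1]]
second = [[1, 1, 0, 0],
          [0, 1, 1, 0]]
subtitle = overlap(first, second)
```

Regions found by only one branch, such as background texture that triggers the corner detector alone, are dropped by the intersection.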

FIG. 6 is a flowchart showing a method by which the subtitle detection apparatus detects subtitles according to an embodiment of the present invention, and FIG. 7 is a table comparing the subtitle detection performance of a conventional method and of an embodiment of the present invention. Each step described below is performed by an internal component of the subtitle detection apparatus, but for ease of understanding and explanation the steps are described as being performed by the subtitle detection apparatus as a whole.

In step 610, the subtitle detection apparatus 100 receives an input image. The input image may be a video.

In step 615, the subtitle detection apparatus 100 converts the input image to grayscale frame by frame.

The subtitle detection apparatus 100 may then perform 601 and 602 in parallel to detect the respective subtitle candidate regions. For ease of understanding and explanation, method 601, which detects a subtitle candidate region using the Harris corner detection algorithm, is described first.

In step 620, the subtitle detection apparatus 100 derives the brightness mean and standard deviation of the grayscale image.

In step 625, the subtitle detection apparatus 100 normalizes the image according to normal-distribution values using the derived mean and standard deviation.

In step 630, the subtitle detection apparatus 100 detects corner points by applying the Harris corner detection algorithm to the normalized grayscale image.

In step 635, the subtitle detection apparatus 100 labels adjacent regions of the detected corner points to detect a subtitle candidate region (referred to as the first subtitle candidate region).

For example, the subtitle detection apparatus 100 may blur the detected corner points to remove noise and then label adjacent regions to detect the first subtitle candidate region.

Next, method 602, which detects a subtitle candidate region using the DCT, is described.

In step 640, the subtitle detection apparatus 100 performs a DCT on the grayscale image.

In step 645, the subtitle detection apparatus 100 normalizes the DCT-transformed image by applying the sign function.

In step 650, the subtitle detection apparatus 100 performs an inverse DCT (IDCT) on the normalized image.

In step 655, the subtitle detection apparatus 100 blurs the IDCT-transformed image by applying a Gaussian filter, determines a threshold from the mean and variance, and detects candidate regions using the threshold. The apparatus may then label the candidate regions by connecting adjacent pixels to detect a subtitle candidate region (the second subtitle candidate region).

In step 660, the subtitle detection apparatus 100 compares the first subtitle candidate region and the second subtitle candidate region and detects the overlapping region as the final subtitle region.

FIG. 7 is a table comparing the subtitle detection performance of a conventional method with that of an embodiment of the present invention. The detection accuracy and efficiency shown in FIG. 7 were derived using Equation (3). Referring to FIG. 7, for news, documentaries, and subtitled drama footage, in which the subtitles have a uniform structure and are easily distinguished from the background, the embodiment detects subtitle regions more efficiently than the conventional method.

Meanwhile, the subtitle detection method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various means for processing information electronically, and may be recorded in a storage medium. The storage medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the storage medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the software field. Examples of storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed, using an interpreter or the like, by a device that processes information electronically, for example a computer.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: Subtitle detection apparatus
110: Preprocessing unit
120: First signal processing unit
130: Second signal processing unit
140: Subtitle detection unit

Claims

A subtitle detection method comprising:
converting an input image into a grayscale image;
detecting a first caption candidate region by applying a Harris corner detection algorithm to the grayscale-converted image;
detecting a second caption candidate region by performing a discrete cosine transform on the grayscale-converted image; and
determining a region in which the first caption candidate region and the second caption candidate region overlap as a caption region.

The method according to claim 1,
wherein detecting the first caption candidate region comprises:
deriving a brightness average and a standard deviation of the grayscale-converted image;
normalizing the image to normal distribution values using the brightness average and the standard deviation;
detecting corner points by applying the Harris corner detection algorithm to the normalized image;
blurring the image in which the corner points are detected to remove noise; and
labeling neighboring regions of the noise-removed corner points to detect the first caption candidate region.

The method according to claim 1,
wherein detecting the second caption candidate region comprises:
normalizing the discrete-cosine-transformed image by applying a sine function;
performing an inverse discrete cosine transform on the normalized image;
applying a Gaussian filter to the inverse-discrete-cosine-transformed image; and
binarizing the Gaussian-filtered image with a threshold value to detect the second caption candidate region.

The method according to claim 3,
wherein the threshold value is the sum of the average and the variance of the Gaussian-filtered image.

The method according to claim 3,
wherein binarizing the Gaussian-filtered image with the threshold value comprises
converting a pixel value of the Gaussian-filtered image to a first value if the pixel value is greater than or equal to the threshold value, and to a second value if the pixel value is less than the threshold value.

The method according to claim 4,
wherein detecting the second caption candidate region comprises
labeling adjacent regions of the pixel values corresponding to the second value to detect the second caption candidate region.

A recording medium on which a program for performing a subtitle detection method according to any one of claims 1 to 6 is recorded.

A subtitle detection apparatus comprising:
a preprocessing unit for converting an input image into a grayscale image;
a first signal processing unit for detecting a first caption candidate region by applying a Harris corner detection algorithm to the grayscale-converted image;
a second signal processing unit for detecting a second caption candidate region by performing a discrete cosine transform on the grayscale-converted image; and
a subtitle detection unit for detecting a region in which the first caption candidate region and the second caption candidate region overlap as a caption region.

The apparatus according to claim 8,
wherein the first signal processing unit
derives a brightness average and a standard deviation of the grayscale-converted image, normalizes the image to normal distribution values using the brightness average and the standard deviation, detects corner points by applying the Harris corner detection algorithm to the normalized image, blurs the result, and labels neighboring regions of the corner points to detect the first caption candidate region.

The apparatus according to claim 8,
wherein the second signal processing unit
normalizes the discrete-cosine-transformed image by applying a sine function, performs an inverse discrete cosine transform on the normalized image, applies a Gaussian filter to the inverse-discrete-cosine-transformed image, and binarizes the Gaussian-filtered image with a threshold value to detect the second caption candidate region.