CN106846339A - Image detection method and device

Info

Publication number: CN106846339A
Application number: CN201710076259.8A
Authority: CN
Inventors: 李红匣
Original assignee: Guangzhou Shiyuan Electronics Technology Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Technology Co Ltd
Priority date: 2017-02-13
Filing date: 2017-02-13
Publication date: 2017-06-13
Also published as: WO2018145470A1

Abstract

Embodiments of the invention disclose an image detection method and device. The image detection method comprises: obtaining an image to be detected; extracting a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region; and filtering the MSER region to obtain a text region in the image to be detected. MSER regions are extracted from the image to be detected as candidate regions by partitioning the image into connected regions; the extracted MSER regions are then filtered and screened, which finally yields the text region in the image to be detected.

Description

An image detection method and device

Technical field

The present invention relates to the technical field of image processing, and in particular to an image detection method and device.

Background

With the maturity and popularization of digital camera equipment, people can conveniently and quickly record every aspect of the real world from different viewpoints. Text, as the visual form of human language, holds a special and irreplaceable position in human activities. Text detection in natural scenes is one of the important research topics of computer vision and pattern recognition in the field of object detection and recognition. The technique aims to accurately detect text information in captured natural scene images, and it has broad application prospects in natural scene understanding and analysis, robot-assisted navigation, video retrieval, reading assistance for the blind, and text translation.

Currently, natural scene text detection methods fall into two types: sliding-window-based methods and connected-region-based methods.

A sliding-window-based method slides multi-scale windows across the image from left to right and from top to bottom, and classifies the image content inside each window to determine whether it is a text region. To detect all text regions, such a method usually requires a large number of sliding windows, which increases the computational complexity and prevents the method from meeting real-time requirements.

A connected-region-based method clusters pixels by similarity according to inherent properties of text, such as color, texture and stroke width, to generate a large number of connected regions. It then extracts features of the connected regions (such as character height, width and spacing) and filters out non-text regions, thereby completing text detection. Compared with a sliding-window-based method, the amount of computation is reduced, but the method places high demands on connected-region extraction, namely that the extracted connected regions must cover all text regions, and it is difficult for such a method to handle complex backgrounds effectively.

Summary of the invention

To solve the related technical problems, the present invention provides an image detection method and device that can quickly and accurately detect text regions in complex natural scenes.

To achieve the above purpose, embodiments of the present invention adopt the following technical solutions:

In a first aspect, an embodiment of the present invention provides an image detection method, comprising:

obtaining an image to be detected;

extracting a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region;

filtering the MSER region to obtain a text region in the image to be detected.

In a second aspect, an embodiment of the present invention correspondingly provides an image detection device, comprising:

a to-be-detected image acquisition module, configured to obtain an image to be detected;

an MSER region extraction module, configured to extract a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region;

an MSER region filtering module, configured to filter the MSER region to obtain a text region in the image to be detected.

The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:

In this technical solution, an image to be detected is obtained, a maximally stable extremal region (MSER region) is extracted from the image to be detected, wherein the MSER region is a connected region, and the MSER region is filtered to obtain a text region in the image to be detected. MSER regions are extracted from the image to be detected as candidate regions by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text region in the image to be detected. Partitioning the image into regions helps to reduce the amount of computation and improve detection efficiency, while extracting MSER regions reduces interference from the image background and improves accuracy when the image to be detected has a complex background.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the content of the embodiments and these drawings without creative effort.

FIG. 1 is a schematic flowchart of an image detection method provided in Embodiment One of the present invention;

FIG. 2A is a schematic flowchart of an image detection method provided in Embodiment Two of the present invention;

FIG. 2B is a schematic flowchart of an optional implementation of S250 in FIG. 2A;

FIG. 2C is a schematic structural diagram of the convolutional neural network model used in Embodiment Two of the present invention;

FIG. 3 is a schematic architecture diagram of an image detection device provided in Embodiment Three of the present invention;

FIG. 4A is a schematic architecture diagram of an image detection device provided in Embodiment Four of the present invention;

FIG. 4B is a schematic architecture diagram of an optional implementation of the MSER region filtering module 450 in FIG. 4A.

Detailed description

To make the technical problems solved, the technical solutions adopted and the technical effects achieved by the present invention clearer, the technical solutions of the embodiments are described in further detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment One

Please refer to FIG. 1, which is a schematic flowchart of an image detection method provided in Embodiment One of the present invention. The method of this embodiment can be executed by a mobile device equipped with a camera, such as a smartphone, tablet computer or notebook computer, and is suitable for detecting and recognizing text regions in natural scene images.

The image detection method provided in this embodiment may include the following steps:

S110: Obtain an image to be detected.

For example, in the embodiments of the present invention, the image to be detected may be an original image or an image obtained by preprocessing the original image. In one embodiment of the present invention, the original image is preferably preprocessed to obtain the image to be detected.

S120: Extract a maximally stable extremal region (MSER region) from the image to be detected.

For example, a maximally stable extremal region (Maximally Stable Extremal Regions, MSER) is a connected region formed as the image to be detected undergoes a range of threshold changes. Multiple MSER regions can be extracted from the image to be detected, and each MSER region can be represented by the minimum bounding rectangle of its connected region. Features such as color, texture and character stroke width are essentially the same within one connected region.
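Representing a connected region by its minimum bounding rectangle can be sketched as follows. This is only an illustration: the function name `bounding_rect`, the `(x, y)` pixel-coordinate convention and the choice of an axis-aligned rectangle are assumptions, not details given in the text.

```python
def bounding_rect(pixels):
    """Return (x, y, w, h) of the smallest axis-aligned rectangle covering `pixels`.

    `pixels` is a non-empty collection of (x, y) coordinates belonging to one
    connected region. The axis-aligned choice is an assumption.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)


# Hypothetical region made of four pixels.
region = {(3, 2), (4, 2), (4, 3), (5, 5)}
rect = bounding_rect(region)
```

The resulting rectangle is what later steps treat as the candidate frame for the region.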

Each rectangular frame displayed in the image to be detected represents one MSER region. Multiple MSER regions may be extracted from the image to be detected, or none at all, in which case the image to be detected contains no text region.

S130: Filter the MSER region to obtain a text region in the image to be detected.

For example, there are many ways to filter MSER regions, such as filtering according to the regional features of the MSER regions. Embodiment Two of the present invention provides an optional implementation of filtering MSER regions, so the details are not repeated here.

In summary, in this technical solution, an image to be detected is obtained, a maximally stable extremal region (MSER region) is extracted from the image to be detected, wherein the MSER region is a connected region, and the MSER region is filtered to obtain a text region in the image to be detected. MSER regions are extracted from the image to be detected as candidate regions by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text region in the image to be detected. Partitioning the image into regions helps to reduce the amount of computation and improve detection efficiency, while extracting MSER regions reduces interference from the image background and improves accuracy when detecting images with complex backgrounds.

Embodiment Two

Please refer to FIG. 2A, FIG. 2B and FIG. 2C, where FIG. 2A is a schematic flowchart of an image detection method provided in Embodiment Two of the present invention, FIG. 2B is a schematic flowchart of an optional implementation of S250 in FIG. 2A, and FIG. 2C is a schematic structural diagram of the convolutional neural network model used in Embodiment Two of the present invention. The main difference between this embodiment and Embodiment One is that S210, S220, S260 and S270 are added on the basis of Embodiment One, and an optional implementation of S250 is further provided.

The image detection method provided in this embodiment may include the following steps:

S210: Receive an initial image.

For example, the initial image may be an image obtained by photographing a natural scene with a camera, and is usually an RGB image.

S220: Perform color space conversion on the initial image to obtain the image to be detected.

For example, color space conversion of the initial image yields images of seven channels, R, G, B, grayscale, H, S and V, which serve as the images to be detected; all subsequent steps operate on these seven images.
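The seven-channel decomposition can be illustrated with a minimal sketch that uses only the Python standard library. The function name `to_seven_channels`, the nested-list image layout, the grayscale weights (0.299, 0.587, 0.114) and the scaling of H, S and V to 0..255 are all assumptions; the text does not specify them.

```python
import colorsys


def to_seven_channels(rgb_image):
    """Split an RGB image into R, G, B, Gray, H, S and V single-channel images.

    `rgb_image` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Every output channel is a list of rows with integer values in 0..255.
    """
    names = ("R", "G", "B", "Gray", "H", "S", "V")
    channels = {name: [] for name in names}
    for row in rgb_image:
        out = {name: [] for name in names}
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            gray = 0.299 * r + 0.587 * g + 0.114 * b  # assumed luma weights
            values = (r, g, b, round(gray),
                      round(h * 255), round(s * 255), round(v * 255))
            for name, value in zip(names, values):
                out[name].append(value)
        for name in names:
            channels[name].append(out[name])
    return channels


# Hypothetical 1x2 image: one pure red pixel and one white pixel.
seven = to_seven_channels([[(255, 0, 0), (255, 255, 255)]])
```

Each of the seven channel images would then be passed to the extraction step independently.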

S230: Obtain the image to be detected.

S240: Extract a maximally stable extremal region (MSER region) from the image to be detected.

For example, MSER regions can be extracted from the image to be detected with the MSER algorithm. The main process is: binarize the image to be detected while varying the binarization threshold over the range [0, 255]; when the area variation V(i) of a connected region is less than a set variation value, the connected region is determined to be an MSER region. For instance, when the grayscale image of the image to be detected is binarized, every pixel whose value is less than the binarization threshold is set to 0 and every pixel whose value is not less than the binarization threshold is set to 255. As the threshold is swept across this range, the corresponding binarized image passes from all black at one extreme to all white at the other (like an aerial view of a steadily rising water level). During this process, the area of some connected regions changes very little as the binarization threshold changes, that is, V(i) is less than the set variation value (for example 0.25); such connected regions are MSER regions.

Here, Q_i denotes the area of the connected region when the binarization threshold is i; Δ denotes a small change in the binarization threshold; and the area variation V(i) denotes the degree to which the area of the connected region changes when the binarization threshold i changes slightly.
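The stability test can be sketched in code. The expression for V(i) is not reproduced in this text, so the sketch assumes the standard MSER stability measure V(i) = |Q_(i+Δ) - Q_(i-Δ)| / |Q_i|, which is consistent with the definitions of Q_i and Δ above. The function names, the use of 4-connectivity and the choice to track dark regions (pixels below the threshold) are further assumptions.

```python
from collections import deque


def region_area(gray, seed, threshold):
    """Area Q of the 4-connected set of pixels with value < threshold containing `seed`.

    `gray` is a list of rows of integers in 0..255; `seed` is an (x, y) pair.
    Returns 0 if the seed pixel itself is not below the threshold.
    """
    height, width = len(gray), len(gray[0])
    sx, sy = seed
    if gray[sy][sx] >= threshold:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in seen and gray[ny][nx] < threshold:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return len(seen)


def area_variation(gray, seed, i, delta):
    """Assumed stability measure V(i) = |Q(i+delta) - Q(i-delta)| / Q(i)."""
    q_i = region_area(gray, seed, i)
    if q_i == 0:
        return float("inf")
    grown = region_area(gray, seed, i + delta)
    shrunk = region_area(gray, seed, i - delta)
    return abs(grown - shrunk) / q_i


# Hypothetical image: a dark 2x2 blob (value 20) on a bright background (value 200).
gray = [
    [200, 200, 200, 200],
    [200, 20, 20, 200],
    [200, 20, 20, 200],
    [200, 200, 200, 200],
]
stable = area_variation(gray, (1, 1), 100, 50) < 0.25   # blob area does not change
unstable = area_variation(gray, (1, 1), 200, 10) < 0.25  # blob floods the background
```

A full MSER implementation tracks every component across all thresholds with a component tree; the flood fill from a single seed above is only meant to make the quantity V(i) concrete.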

S250: Filter the MSER region to obtain a text region in the image to be detected.

Optionally, as shown in FIG. 2B, filtering the MSER region may include four steps, S251, S252, S253 and S254, wherein:

S251: Count the pixel value (that is, the number of pixels) or the aspect ratio of the MSER region.

For example, in practice, captured natural scene images almost never contain a text image of fewer than 30 pixels, and the aspect ratio of a text region generally lies within a certain range, for example 0.3 to 3. Non-text regions among the MSER regions can therefore be preliminarily filtered out according to the number of pixels in, or the aspect ratio of, the rectangular frame determined for each MSER region.

S252: Filter out MSER regions whose pixel value is less than a preset pixel threshold or whose aspect ratio is outside a preset range.

For example, MSER regions with fewer than 30 pixels, or with an aspect ratio outside the range 0.3 to 3, are filtered out.
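The preliminary filter of S251 and S252 can be sketched as a single predicate. The thresholds 30 and 0.3 to 3 come from the example above; the function name `keep_candidate`, the `(x, y, w, h)` frame layout and the definition of aspect ratio as width divided by height are assumptions.

```python
def keep_candidate(rect, pixel_count, min_pixels=30, min_ratio=0.3, max_ratio=3.0):
    """Return True if an MSER candidate survives the size and aspect-ratio filter.

    `rect` is (x, y, w, h) with h > 0; `pixel_count` is the number of pixels
    in the region. Aspect ratio is taken as w / h, which is an assumption.
    """
    _, _, w, h = rect
    if pixel_count < min_pixels:
        return False
    return min_ratio <= w / h <= max_ratio


# Hypothetical candidates as (frame, pixel count) pairs.
candidates = [((0, 0, 10, 12), 80), ((5, 5, 4, 4), 12), ((0, 20, 100, 10), 600)]
survivors = [c for c in candidates if keep_candidate(*c)]
```

Because both tests are cheap, running them before the neural network keeps the number of classifier evaluations small.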

In addition, when one text region is covered by multiple rectangular frames, one of them can be chosen to represent the text region in order to reduce computation. For example, for any rectangular frame A, when the ratio of the overlap area of another rectangular frame B and frame A to the total area of the union of frames A and B is greater than 0.8, frames A and B are considered to lie at the same position and to represent the same text region, and they are merged. All remaining rectangular frames are traversed, and every frame that meets this merging condition is merged with frame A. The other rectangular frames in the image to be detected are processed in the same way, which keeps the amount of subsequent computation to a minimum.
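The overlap criterion described above is the ratio of intersection area to union area. A minimal sketch follows; the 0.8 threshold comes from the text, while the function names and the decision to keep the first frame encountered as the representative of each group are assumptions.

```python
def overlap_ratio(a, b):
    """Intersection area divided by union area for two (x, y, w, h) frames."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_duplicates(rects, threshold=0.8):
    """Keep one representative frame per group of near-identical frames."""
    kept = []
    for rect in rects:
        # A frame is dropped when it overlaps an already kept frame too strongly.
        if all(overlap_ratio(rect, k) <= threshold for k in kept):
            kept.append(rect)
    return kept


# Hypothetical frames: the first two cover almost the same area.
frames = [(0, 0, 10, 10), (0, 0, 10, 9), (30, 0, 10, 10)]
unique_frames = merge_duplicates(frames)
```

Other merge policies, such as replacing the group by the union rectangle, would fit the description equally well.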

S253: Successively apply convolution and down-sampling to the MSER regions remaining after filtering to obtain feature maps.

For example, this embodiment trains the convolutional neural network model with binarized images of the extracted MSER regions. As shown in FIG. 2C, a 32*32 image is first input and convolved with six 5*5 kernel matrices to obtain six 28*28 feature maps in layer C1; the C1 feature maps are down-sampled, with every 4 pixels (2*2) yielding one value, to obtain six 14*14 feature maps in layer S2; the S2 feature maps are then convolved with 5*5 kernel matrices to obtain sixteen 10*10 feature maps in layer C3; as with S2, the C3 feature maps are down-sampled to obtain sixteen 5*5 feature maps in layer S4; the S4 feature maps are convolved with 5*5 kernel matrices to obtain 120 1*1 feature maps in layer C5; similarly, the C5 feature maps are down-sampled to obtain 84 1*1 feature maps in layer F6.
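The feature-map sizes quoted above follow from simple arithmetic, sketched below. The sketch assumes stride-1 convolutions without padding and non-overlapping 2*2 down-sampling, which reproduces the sizes in the text. The step from C5 to F6 is modeled as a fully connected mapping from 120 to 84 values, as in LeNet-5-style networks; this is an assumption, since 2*2 down-sampling cannot be applied to a 1*1 map.

```python
def conv_size(size, kernel):
    """Side length after an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1


def pool_size(size, window):
    """Side length after non-overlapping window x window down-sampling."""
    return size // window


def layer_shapes(input_size=32):
    """Return (layer name, number of maps, side length) for each layer."""
    shapes = []
    size = conv_size(input_size, 5)
    shapes.append(("C1", 6, size))
    size = pool_size(size, 2)
    shapes.append(("S2", 6, size))
    size = conv_size(size, 5)
    shapes.append(("C3", 16, size))
    size = pool_size(size, 2)
    shapes.append(("S4", 16, size))
    size = conv_size(size, 5)
    shapes.append(("C5", 120, size))
    # F6 is treated as fully connected (assumption): 120 values in, 84 out.
    shapes.append(("F6", 84, size))
    return shapes


for name, maps, side in layer_shapes():
    print(f"{name}: {maps} maps of {side}*{side}")
```

The number of maps per layer is a design choice of the network, not something the arithmetic determines; only the side lengths are derived.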

S254: Input the feature maps into a classifier, and determine, according to the output of the classifier, that the MSER region is a text region.

For example, the F6 feature maps obtained in S253 are input into a softmax classifier; when the output of the softmax classifier indicates that the input image is a text image, the corresponding MSER region is determined to be a text region. In other embodiments, other classifiers such as an SVM may also be used.
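The final decision can be sketched as a softmax over two class scores. How the 84 F6 values are mapped to class scores is not described in the text, so the sketch starts from the scores themselves; the two-class layout, the function names and the convention that index 1 means "text" are assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_text(scores, text_index=1):
    """Return True when the 'text' class has the highest probability."""
    probs = softmax(scores)
    return probs.index(max(probs)) == text_index


# Hypothetical scores in the order (non-text, text).
decision = is_text([-0.3, 1.7])
```

Swapping in another classifier such as an SVM would only replace `is_text`; the rest of the pipeline is unaffected.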

After the MSER regions have been classified by the convolutional neural network model, the regions of individual characters or text in the image to be detected can essentially be determined: the rectangular frames of non-text regions are largely filtered out, and the rectangular frames of text regions are retained.

S260: Merge adjacent text regions in the horizontal direction.

For example, for an image to be detected that contains English words, the individual characters also need to be combined into words. The distances between all adjacent character regions are calculated, and the average distance is computed. The leftmost unprocessed character region is found, and the character region nearest to it is then sought successively in the horizontal direction. When the height ratio of two adjacent character regions is within a preset range, for example between 0.5 and 2, the two character regions are merged; when the distance between two adjacent character regions is greater than a set distance (for example 3 times the above average distance), the iteration stops. In this way the text regions lying on the same line can be delimited.

S270: Perform intra-region word segmentation on the merged text regions.

For example, within each group of text regions merged in S260, if the distance between two adjacent character regions is greater than the above average distance, the two adjacent character regions are split apart; in this way the different words on the same line can be separated.
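Word segmentation within one merged line can be sketched as a single left-to-right pass that starts a new word whenever the gap exceeds the average distance. The function name `split_words`, the `(x, y, w, h)` frame layout and the use of the edge-to-edge horizontal gap as the distance are assumptions.

```python
def split_words(line, average_distance):
    """Split a left-to-right sorted list of character frames into words.

    A new word starts whenever the gap between two adjacent frames is
    greater than `average_distance`.
    """
    if not line:
        return []
    words = [[line[0]]]
    for previous, frame in zip(line, line[1:]):
        gap = frame[0] - (previous[0] + previous[2])
        if gap > average_distance:
            words.append([frame])
        else:
            words[-1].append(frame)
    return words


# Hypothetical line with gaps 2, 12 and 2 between its four characters.
line = [(0, 0, 8, 10), (10, 0, 8, 10), (30, 0, 8, 10), (40, 0, 8, 10)]
words = split_words(line, average_distance=16 / 3)
```

In this example the average gap is 16/3, so only the middle gap of 12 exceeds it and the line splits into two words of two characters each.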

S260 and S270 are repeated until all text regions have been processed.

It should be noted that the MSER region described in the embodiments of the present invention also refers to the region image corresponding to the MSER region.

In summary, in this technical solution, an initial image is received, color space conversion is performed on the initial image, the image to be detected is obtained, a maximally stable extremal region (MSER region) is extracted from the image to be detected, wherein the MSER region is a connected region, the MSER region is filtered to obtain a text region in the image to be detected, and the text regions are further merged across regions and segmented into words within each region. MSER regions are extracted from the image to be detected as candidate regions by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text region in the image to be detected. Partitioning the image into regions helps to reduce the amount of computation and improve detection efficiency, while extracting MSER regions reduces interference from the image background and improves accuracy when detecting images with complex backgrounds.

The following are embodiments of an image detection device provided by the embodiments of the present invention. The image detection device and the image detection method described above belong to the same inventive concept; for details not described exhaustively in the device embodiments, please refer to the method embodiments above.

Embodiment Three

Please refer to FIG. 3, which is a schematic architecture diagram of an image detection device provided in Embodiment Three of the present invention.

An image detection device 300 provided in this embodiment may include the following:

A to-be-detected image acquisition module 310, configured to obtain an image to be detected.

An MSER region extraction module 320, configured to extract a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region.

An MSER region filtering module 330, configured to filter the MSER region to obtain a text region in the image to be detected.
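The three-module structure can be sketched as a small class that chains three replaceable callables. The class name `ImageDetectionDevice`, the method name `detect` and the stub modules in the usage example are hypothetical; they only illustrate how modules 310, 320 and 330 hand data to one another.

```python
class ImageDetectionDevice:
    """Chains acquisition, MSER extraction and MSER filtering (modules 310, 320, 330)."""

    def __init__(self, acquire, extract, filter_regions):
        self.acquire = acquire                # module 310: source -> image
        self.extract = extract                # module 320: image -> candidate regions
        self.filter_regions = filter_regions  # module 330: candidates -> text regions

    def detect(self, source):
        image = self.acquire(source)
        candidates = self.extract(image)
        return self.filter_regions(candidates)


# Stub modules for illustration: candidates are (x, y, w, h, pixel_count) tuples.
device = ImageDetectionDevice(
    acquire=lambda source: source,
    extract=lambda image: [(0, 0, 10, 12, 80), (0, 0, 2, 2, 4)],
    filter_regions=lambda regions: [r for r in regions if r[4] >= 30],
)
text_regions = device.detect("image placeholder")
```

Keeping each module behind a plain callable makes it straightforward to add further stages, such as color space conversion before acquisition or merging after filtering.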

Embodiment Four

Please refer to FIG. 4A and FIG. 4B, where FIG. 4A is a schematic architecture diagram of an image detection device provided in Embodiment Four of the present invention, and FIG. 4B is a schematic architecture diagram of an optional implementation of the MSER region filtering module 450 in FIG. 4A. The main difference between this embodiment and Embodiment Three is that an initial image receiving module 410, a color space conversion module 420, a text region merging module 460 and a word segmentation module 470 are added on the basis of Embodiment Three, and an optional implementation of the MSER region filtering module 450 is further provided.

An image detection device 400 provided in this embodiment may include the following:

An initial image receiving module 410, configured to receive an initial image.

A color space conversion module 420, configured to perform color space conversion on the initial image to obtain the image to be detected.

A to-be-detected image acquisition module 430, configured to obtain the image to be detected.

An MSER region extraction module 440, configured to extract a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region.

Preferably, the MSER region extraction module 440 is specifically configured to:

binarize the image to be detected while varying the binarization threshold over the range [0, 255], and, when the area variation V(i) of a connected region is less than a set variation value, determine that the connected region is an MSER region.

Here, Q_i denotes the area of the connected region when the binarization threshold is i, and Δ denotes a small change in the binarization threshold.

An MSER region filtering module 450, configured to filter the MSER region to obtain a text region in the image to be detected.

Optionally, as shown in FIG. 4B, the MSER region filtering module 450 may include a statistics unit 451, a filtering unit 452, a feature map obtaining unit 453 and a text region determination unit 454, wherein:

The statistics unit 451 is configured to count the pixel value (that is, the number of pixels) or the aspect ratio of the MSER region.

The filtering unit 452 is configured to filter out MSER regions whose pixel value is less than a preset pixel threshold or whose aspect ratio is outside a preset range.

The feature map obtaining unit 453 is configured to successively apply convolution and down-sampling to the MSER regions remaining after filtering to obtain feature maps.

The text region determination unit 454 is configured to input the feature maps into a classifier and to determine, according to the output of the classifier, that the MSER region is a text region.

A text region merging module 460, configured to merge adjacent text regions in the horizontal direction.

A word segmentation module 470, configured to perform intra-region word segmentation on the merged text regions.

In summary, in this technical solution, an initial image is received, color space conversion is performed on the initial image, the image to be detected is obtained, a maximally stable extremal region (MSER region) is extracted from the image to be detected, wherein the MSER region is a connected region, the MSER region is filtered to obtain a text region in the image to be detected, and the text regions are further merged across regions and segmented into words within each region. MSER regions are extracted from the image to be detected as candidate regions by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text region in the image to be detected. Partitioning the image into regions helps to reduce the amount of computation and improve detection efficiency, while extracting MSER regions reduces interference from the image background and improves accuracy when detecting images with complex backgrounds.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims

1. An image detection method, characterized by comprising:

obtaining an image to be detected;

extracting a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region;

filtering the MSER region to obtain a text region in the image to be detected.

2. The method according to claim 1, wherein, before obtaining the image to be detected, the method further comprises:

receiving an initial image;

performing color space conversion on the initial image to obtain the image to be detected.

3. The method according to claim 2, wherein extracting the maximally stable extremal region (MSER region) from the image to be detected comprises:

binarizing the image to be detected while varying the binarization threshold within the range [0, 255], and, when the area variation V(i) of a connected region is less than a set variation value, determining that the connected region is the MSER region;

wherein Q_i denotes the area of the connected region when the binarization threshold is i, and Δ denotes a small change in the binarization threshold.

4. The method according to claim 3, wherein filtering the MSER region to obtain the text region in the image to be detected comprises:

counting pixel values or region aspect ratios of the MSER region;

filtering out MSER regions whose pixel values are less than a preset pixel threshold or whose aspect ratio is not within a preset range.

5. The method according to claim 4, wherein, after filtering out the MSER regions whose pixel values are less than the preset pixel threshold or whose aspect ratio is not within the preset range, the method further comprises:

successively applying convolution and down-sampling to the MSER regions remaining after filtering to obtain feature maps;

inputting the feature maps into a classifier, and determining, according to the output of the classifier, that the MSER region is a text region.

6. The method according to any one of claims 1-5, wherein, after filtering the MSER region to obtain the text region in the image to be detected, the method further comprises:

merging adjacent text regions in the horizontal direction;

performing intra-region word segmentation on the merged text regions.

7. An image detection device, characterized by comprising:

a to-be-detected image acquisition module, configured to obtain an image to be detected;

an MSER region extraction module, configured to extract a maximally stable extremal region (MSER region) from the image to be detected, wherein the MSER region is a connected region;

an MSER region filtering module, configured to filter the MSER region to obtain a text region in the image to be detected.

8. The device according to claim 7, further comprising:

an initial image receiving module, configured to receive an initial image;

a color space conversion module, configured to perform color space conversion on the initial image to obtain the image to be detected;

a text region merging module, configured to merge adjacent text regions in the horizontal direction;

a word segmentation module, configured to perform intra-region word segmentation on the merged text regions.

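9. The device according to claim 8, wherein the MSER region extraction module is specifically configured to:

binarize the image to be detected while varying the binarization threshold within the range [0, 255], and, when the area variation V(i) of a connected region is less than a set variation value, determine that the connected region is the MSER region;

wherein Q_i denotes the area of the connected region when the binarization threshold is i, and Δ denotes a small change in the binarization threshold.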

10. The device according to claim 9, wherein the MSER region filtering module comprises:

a statistics unit, configured to count pixel values or region aspect ratios of the MSER region;

a filtering unit, configured to filter out MSER regions whose pixel values are less than a preset pixel threshold or whose aspect ratio is not within a preset range;

a feature map obtaining unit, configured to successively apply convolution and down-sampling to the MSER regions remaining after filtering to obtain feature maps;

a text region determination unit, configured to input the feature maps into a classifier and to determine, according to the output of the classifier, that the MSER region is a text region.