CN106203417A - Method for recognizing RMB serial numbers with separable touching characters

Info

Publication number: CN106203417A
Application number: CN201610544250.0A
Authority: CN
Inventors: 周晖; 彭云峰; 朱国君
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2016-07-12
Filing date: 2016-07-12
Publication date: 2016-12-07

Abstract

The invention discloses a method for recognizing RMB serial numbers in which touching characters can be separated. In a fine horizontal segmentation step, any character image whose width is greater than the maximum single-character width D_max is judged to be an abnormal character image, that is, a touching-character image. The vertical projection curve of the abnormal image is computed, the minimum point within a set sub-interval is taken as the segmentation point, and the image is cut vertically there. A resulting character image is kept if its width is greater than the minimum character image width D_min. Touching characters are thereby separated, which improves the character segmentation accuracy of the preprocessing stage and reduces the serial number misrecognition rate.

Description

A method for recognizing Renminbi (RMB) serial numbers in which touching characters can be separated

Technical Field

The invention belongs to the technical field of pattern recognition and optical character recognition (OCR). More specifically, it relates to a method for recognizing RMB serial numbers in which touching characters can be separated, applicable to serial number recognition in banknote counting, authentication and sorting machines.

Background Art

In 2009 the working group revising the national standard "General Technical Conditions for RMB Discriminators" published the new banknote detector standard GB 16999-2010, which requires banknote detectors to add a serial number recognition function. Embedding serial number recognition in banknote counting machines has therefore become a current research topic.

Because of limited machine performance or soiling of the banknote itself, interference arises when a banknote detector photographs or scans the serial number, and characters can end up touching one another. The detector then cannot recognize the RMB serial number correctly and fails the requirement in "General Technical Conditions for RMB Discriminators" that the serial number misrecognition rate be below 0.03%. Here the serial number misrecognition rate is the ratio of the number of banknotes whose serial number was misrecognized to the number of banknotes actually recognized.

Summary of the Invention

The object of the invention is to overcome the deficiencies of the prior art by providing a method for recognizing RMB serial numbers in which touching characters can be separated, so that touching characters are segmented effectively and the serial number misrecognition rate is reduced.

To achieve this object, the method of the invention for recognizing RMB serial numbers with separable touching characters comprises the following steps:

(1) Character segmentation

1.1) Coarse horizontal segmentation of characters

The input is the grayscale serial number image captured by the image sensor. The image is binarized and its vertical projection is then computed. The image is cut vertically at the horizontal positions where the projection value is 0, which yields coarsely segmented character images;

1.2) Fine horizontal segmentation of characters

1.2.1) Check the width of each character image obtained by coarse segmentation. If the width is greater than the maximum width D_max of a single character image, treat the image as an abnormal character image and go to step 1.2.2). Otherwise treat it as a single character image and continue with the next character image, until the last character image has been checked and character segmentation ends;

1.2.2) Compute the vertical projection curve of the abnormal character image. Within the sub-interval [x_l + s, x_r - s] of the image's horizontal extent, which contains the horizontal center coordinate, find the minimum point of the vertical projection curve and take it as the touching-character segmentation point. Here x_l and x_r are the left and right horizontal coordinates of the coarsely segmented character image, and s is a sub-interval parameter set according to the application: the larger s is, the smaller the sub-interval;

1.2.3) Cut the abnormal character image vertically at the segmentation point obtained in step 1.2.2) and check the widths of the resulting left and right character images. Discard a character image whose width is less than the minimum character image width D_min and keep it otherwise. Then return to step 1.2) and continue with the next character image;

1.3) Vertical segmentation of characters

Compute the horizontal projection of each character image obtained in step 1.2) and remove the rows whose projection value is 0, that is, cut away the blank regions above and below the character;

(2) Character recognition

Normalize the size of each single character image obtained by the character segmentation of step (1), then feed the images, in the order of horizontal segmentation, into a trained convolutional neural network for recognition to obtain the corresponding characters;

(3) Combine all recognized characters in segmentation order to obtain the serial number.

The object of the invention is achieved as follows.

In the method of the invention, fine horizontal segmentation works as follows. A character image whose width is greater than the maximum single-character width D_max is judged to be an abnormal character image, that is, a touching-character image. On the vertical projection curve of the abnormal image, the minimum point within the set sub-interval is taken as the segmentation point, and the image is cut vertically there. A resulting character image is kept if its width is greater than the minimum character image width D_min. Touching characters are thereby separated, which improves the character segmentation accuracy of the preprocessing stage and reduces the serial number misrecognition rate.

Brief Description of the Drawings

Fig. 1 is a flow chart of one embodiment of the method of the invention for recognizing RMB serial numbers with separable touching characters;

Fig. 2 is an example of a grayscale serial number image containing touching characters;

Fig. 3 is the vertical projection curve of the grayscale serial number image shown in Fig. 2;

Fig. 4 shows the character images obtained by fine horizontal segmentation of the grayscale serial number image shown in Fig. 2;

Fig. 5 is a structure diagram of one embodiment of the convolutional neural network shown in Fig. 1.

Detailed Description

Embodiments of the invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they would obscure the main content of the invention.

Fig. 1 is a flow chart of one embodiment of the method of the invention for recognizing RMB serial numbers with separable touching characters.

In this embodiment, as shown in Fig. 1, the method of the invention for recognizing RMB serial numbers with separable touching characters comprises the following steps:

1. Character segmentation

S101. The input is an 8-bit grayscale serial number image (the input image) captured by the image sensor. It is binarized to obtain a binary image. In this embodiment the binary image is then inverted, that is, 0 becomes 255 and 255 becomes 0, so that the characters are white and the background is black;

S102. Compute the vertical projection of the binary image. Cut the image vertically at the horizontal positions where the projection value is 0, that is, where the column contains no pixels of value 255, to obtain coarsely segmented character images;

In this embodiment, steps S101 and S102 constitute the coarse horizontal segmentation of characters.
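Steps S101 and S102 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the text does not say how the binarization threshold is chosen, so the fixed `threshold=128` is an assumption, and the function names are hypothetical. Images are lists of rows of 8-bit values.

```python
def binarize_inverted(gray, threshold=128):
    """Binarize and invert: dark ink becomes 255, light background becomes 0.

    The fixed threshold is an assumption; the source does not specify one.
    """
    return [[255 if px < threshold else 0 for px in row] for row in gray]


def vertical_projection(binary):
    """Count the 255-valued (character) pixels in each column."""
    width = len(binary[0])
    return [sum(1 for row in binary if row[x] == 255) for x in range(width)]


def coarse_segments(projection):
    """Return inclusive (x_l, x_r) column ranges between zero-valued columns."""
    segments, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a run of character columns begins
        elif count == 0 and start is not None:
            segments.append((start, x - 1))  # the run ended at the previous column
            start = None
    if start is not None:                  # run reaches the right image border
        segments.append((start, len(projection) - 1))
    return segments


# Tiny 3x8 example: two ink blobs separated by blank columns.
gray = [
    [255, 0, 0, 255, 255, 0, 255, 255],
    [255, 0, 255, 255, 255, 0, 255, 255],
    [255, 255, 0, 255, 255, 0, 255, 255],
]
projection = vertical_projection(binarize_inverted(gray))
segments = coarse_segments(projection)
```

Each returned pair gives the x_l and x_r coordinates that the fine segmentation step works with.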

S103. Detection and splitting of touching characters, that is, fine horizontal segmentation of characters;

In this embodiment, as shown in Fig. 2, noise and soiling during image capture may cause characters in the serial number image to touch. If the traditional vertical projection method, that is, steps S101 and S102 alone, were used for segmentation, U and F would be cut out as a single character. That segmentation result is plainly wrong and leads to a failed or incorrect recognition. In addition, noise makes the last character, 0, wider than D_max, so it too is treated as an abnormal character.

In the invention, fine horizontal segmentation of characters is carried out as follows:

S1031. Check the width of each character image obtained by coarse segmentation. If the width is greater than the maximum width D_max of a single character image, treat the image as an abnormal character image and go to step S1032. Otherwise treat it as a single character image and continue with the next character image, until the last character image has been checked and character segmentation ends;

S1032. Compute the vertical projection curve of the abnormal character image. Within the sub-interval [x_l + s, x_r - s] of the image's horizontal extent, which contains the horizontal center coordinate, find the minimum point of the vertical projection curve, that is, the column containing the fewest pixels of value 255, and take it as the touching-character segmentation point. Here x_l and x_r are the left and right horizontal coordinates of the coarsely segmented character image, and s is a sub-interval parameter set according to the application: the larger s is, the smaller the sub-interval. In this embodiment character image sizes are measured in pixels and s = 3, that is, 3 pixels;

S1033. Cut the abnormal character image vertically at the segmentation point obtained in step S1032 and check the widths of the resulting left and right character images. Discard a character image whose width is less than the minimum character image width D_min and keep it otherwise. Then return to step S1031 and continue with the next character image;

With this fine segmentation method, the segmentation coordinates marked with thick black lines in Fig. 3 are obtained for the two abnormal character images. Splitting the first abnormal character image gives two characters whose widths are both greater than the minimum character width D_min, so both are kept. Splitting the second abnormal character image also gives two pieces: the region to the left of the segmentation point is wider than D_min and is kept as a character, while the region to the right is narrower than D_min and is discarded. The result is the ten correctly segmented characters shown in Fig. 4.
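Steps S1031 to S1033 can be sketched as below, operating on a vertical projection (a list of per-column counts) and on inclusive (x_l, x_r) ranges from coarse segmentation. This is a hedged sketch: the function names are hypothetical, the D_max and D_min values in the example are made up for illustration, and assigning the cut column itself to the right-hand piece is an assumption because the text does not say which side it belongs to. As in the text, each abnormal image is split once.

```python
def split_point(projection, x_l, x_r, s=3):
    """Column with the smallest projection value inside [x_l + s, x_r - s]."""
    lo, hi = x_l + s, x_r - s
    if lo > hi:
        raise ValueError("sub-interval is empty; s is too large for this segment")
    # min() returns the leftmost column when several share the minimum.
    return min(range(lo, hi + 1), key=lambda x: projection[x])


def fine_segments(projection, segments, d_max, d_min, s=3):
    """Split over-wide segments at the projection minimum; drop narrow pieces."""
    result = []
    for x_l, x_r in segments:
        if x_r - x_l + 1 <= d_max:          # S1031: normal single character
            result.append((x_l, x_r))
            continue
        cut = split_point(projection, x_l, x_r, s)   # S1032
        # S1033: cut column goes to the right piece (assumption).
        for left, right in ((x_l, cut - 1), (cut, x_r)):
            if right - left + 1 >= d_min:   # discard only if narrower than D_min
                result.append((left, right))
    return result


# Two touching characters: the valley at column 5 separates them.
touching = [5, 6, 7, 6, 5, 1, 5, 6, 7, 6, 5]
both_kept = fine_segments(touching, [(0, 10)], d_max=8, d_min=3, s=3)

# One character widened by noise on its right: the narrow remainder is dropped.
noisy = [4, 5, 6, 5, 4, 5, 1, 1, 1]
one_kept = fine_segments(noisy, [(0, 8)], d_max=7, d_min=4, s=2)
```

The second example mirrors the case of the last character in Fig. 2, where the piece to the right of the cut is narrower than D_min and is discarded.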

S104. Vertical segmentation of characters

Compute the horizontal projection of each character image obtained in step S103 and remove the rows whose projection value is 0, that is, cut away the blank regions above and below the character.
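A minimal sketch of step S104, assuming the inverted binary convention of step S101 (255 for character pixels) and a hypothetical function name. It trims only the blank rows above and below the character, so any blank row inside the character is kept.

```python
def trim_rows(char_img):
    """Cut away all-background rows above and below the character."""
    # Horizontal projection: number of 255-valued pixels in each row.
    row_projection = [sum(1 for px in row if px == 255) for row in char_img]
    inked = [y for y, count in enumerate(row_projection) if count > 0]
    if not inked:
        return []                      # nothing but background
    return char_img[inked[0]:inked[-1] + 1]


char_img = [
    [0, 0],
    [0, 255],
    [0, 0],      # blank row inside the character: kept
    [255, 0],
    [0, 0],
]
trimmed = trim_rows(char_img)
```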

2. Character recognition

S201. Normalize the size of each single character image obtained by the character segmentation of part 1. In this embodiment the normalized size is 14x14.

S202. Feed the size-normalized character images, in the order of horizontal segmentation, into the trained convolutional neural network for recognition to obtain the corresponding characters.
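The size normalization of step S201 could look like the sketch below. The text gives only the target size of 14x14 and not the interpolation method, so nearest-neighbor sampling is an assumption chosen for simplicity, and `resize_nearest` is a hypothetical name.

```python
def resize_nearest(img, out_h=14, out_w=14):
    """Resize a 2D list to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


small = [[0, 255],
         [255, 0]]
normalized = resize_nearest(small)   # 14 rows of 14 pixels each
```

Integer arithmetic only is used here, which would also suit a fixed-point hardware implementation, although the source does not describe how this step is realized.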

Compared with traditional template matching and with recognition methods based on manually extracted character features, a convolutional neural network achieves a high recognition rate and automatically extracts deep features of the character images. In this embodiment the traditional convolutional neural network structure is simplified in order to reduce resource consumption and increase recognition speed while maintaining serial number recognition accuracy (that is, keeping the serial number misrecognition rate low).

In this embodiment, as shown in Fig. 5, the convolutional neural network is based on the LeNet5 architecture, simplified as follows:

1. Not counting the input layer, a LeNet5 convolutional neural network has 7 layers, whereas the network in this embodiment has 5: layers C5 and F6 are removed. C5 is a convolutional layer made up of 120 feature maps and F6 is a fully connected layer. Removing these two layers eliminates a large number of trainable parameters and saves computing and memory resources. The S4 feature maps are fully connected directly to the output layer, with no intermediate hidden layer;

2. The number of feature maps per layer is reduced. In LeNet5, layer C3 contains 16 feature maps, whereas layer C3 of the invention contains only 6;

3. The S2 feature maps are fully connected to the C3 feature maps, instead of the partial connection scheme used in LeNet5. This adds some connection parameters but makes the FPGA implementation easier;

4. In this embodiment the downsampling layers use average pooling, which has no trainable weights; only one bias parameter needs to be trained per layer.
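To give a feel for the savings described in the list above, the following sketch counts trainable parameters. It rests on several assumptions: the kernel size of 3x3 is inferred from the feature map sizes given for this embodiment (14 to 12 and 6 to 4, which fit a 3x3 kernel with stride 1 and no padding), the number of output classes is not stated in the text so `n_classes=36` is purely a placeholder, and the pooling-layer biases are left out. The LeNet5 figures use its standard 5x5 kernels, 16 S4 maps and 84 F6 units.

```python
def conv_params(in_maps, out_maps, kernel):
    """Weights plus one bias per output map, with full connectivity."""
    return in_maps * out_maps * kernel * kernel + out_maps


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


# Layers removed from LeNet5 (standard LeNet5 dimensions).
lenet_c5 = conv_params(16, 120, 5)      # 120 maps over 16 S4 maps of 5x5
lenet_f6 = dense_params(120, 84)

# Simplified network of this embodiment (3x3 kernels inferred, see above).
c1 = conv_params(1, 6, 3)
c3 = conv_params(6, 6, 3)               # S2 to C3 fully connected
n_classes = 36                          # placeholder, not given in the source
output = dense_params(6 * 2 * 2, n_classes)

removed = lenet_c5 + lenet_f6
simplified_total = c1 + c3 + output
```

Under these assumptions the two removed layers alone hold far more parameters than the whole simplified network, which is consistent with the resource argument made in item 1.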

Specifically, in this embodiment, recognition by the convolutional neural network proceeds as follows:

S2021. Convolve the size-normalized 14x14 single character image to obtain layer C1, consisting of 6 feature maps of size 12x12 each;

S2022. Downsample the C1 feature maps to obtain layer S2, consisting of 6 feature maps of size 6x6 each;

S2023. Convolve the S2 feature maps to obtain layer C3, consisting of 6 feature maps of size 4x4 each;

S2024. Downsample the C3 feature maps to obtain layer S4, consisting of 6 feature maps of size 2x2 each;

S2025. The S4 feature maps are fully connected directly to the output layer, which outputs the corresponding character.
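The layer sizes in steps S2021 to S2025 can be checked with a small forward pass in plain Python. This is a structural sketch only: the weights are random rather than trained, the 3x3 kernel size is inferred from the stated sizes (stride 1, no padding), the activation functions and pooling biases are omitted because the text does not specify them, and the class count passed to `forward` is a placeholder. All function names are hypothetical.

```python
import random


def conv_valid(maps, kernels, biases):
    """Stride-1, no-padding convolution; each output map sees every input map."""
    k = len(kernels[0][0])
    h, w = len(maps[0]), len(maps[0][0])
    return [[[biases[o] + sum(kernels[o][i][dy][dx] * maps[i][y + dy][x + dx]
                              for i in range(len(maps))
                              for dy in range(k) for dx in range(k))
              for x in range(w - k + 1)]
             for y in range(h - k + 1)]
            for o in range(len(biases))]


def avg_pool(maps, window=2):
    """Non-overlapping average pooling applied to each feature map."""
    area = window * window
    return [[[sum(m[y * window + dy][x * window + dx]
                  for dy in range(window) for dx in range(window)) / area
              for x in range(len(m[0]) // window)]
             for y in range(len(m) // window)]
            for m in maps]


def random_kernels(rng, n_out, n_in, k=3):
    """Untrained placeholder kernels indexed as [out][in][row][col]."""
    return [[[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
             for _ in range(n_in)] for _ in range(n_out)]


def forward(image, n_classes, seed=0):
    """Run one 14x14 image through C1, S2, C3, S4 and the output layer."""
    rng = random.Random(seed)
    c1 = conv_valid([image], random_kernels(rng, 6, 1), [0.0] * 6)
    s2 = avg_pool(c1)
    c3 = conv_valid(s2, random_kernels(rng, 6, 6), [0.0] * 6)
    s4 = avg_pool(c3)
    flat = [v for m in s4 for row in m for v in row]      # 6 * 2 * 2 values
    weights = [[rng.uniform(-0.1, 0.1) for _ in flat] for _ in range(n_classes)]
    scores = [sum(wi * v for wi, v in zip(w, flat)) for w in weights]
    shapes = [(len(t), len(t[0]), len(t[0][0])) for t in (c1, s2, c3, s4)]
    return shapes, scores


image = [[(x + y) % 2 for x in range(14)] for y in range(14)]
shapes, scores = forward(image, n_classes=36)   # 36 is a placeholder class count
```

In a trained network the recognized character would be the class with the highest output score.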

3. Combine all recognized characters in segmentation order to obtain the serial number.

In this embodiment, the convolutional neural network is first built and trained on a PC host software platform. The trained network is then downloaded to the FPGA chip of the banknote detector, where it recognizes the character images obtained by capture and segmentation.

Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To a person of ordinary skill in the art, variations are obvious as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. A method for recognizing RMB serial numbers with separable touching characters, characterized in that it comprises the following steps:

(1) character segmentation

1.1) coarse horizontal segmentation of characters

the input is the grayscale serial number image captured by the image sensor; the image is binarized and its vertical projection is then computed; the image is cut vertically at the horizontal positions where the projection value is 0, which yields coarsely segmented character images;

1.2) fine horizontal segmentation of characters

1.2.1) checking the width of each character image obtained by coarse segmentation: if the width is greater than the maximum width D_max of a single character image, treating the image as an abnormal character image and going to step 1.2.2); otherwise treating it as a single character image and continuing with the next character image, until the last character image has been checked and character segmentation ends;

1.2.2) computing the vertical projection curve of the abnormal character image; within the sub-interval [x_l + s, x_r - s] of the image's horizontal extent, which contains the horizontal center coordinate, finding the minimum point of the vertical projection curve and taking it as the touching-character segmentation point, where x_l and x_r are the left and right horizontal coordinates of the coarsely segmented character image, and s is a sub-interval parameter set according to the application, a larger value of s giving a smaller sub-interval;

1.2.3) cutting the abnormal character image vertically at the segmentation point obtained in step 1.2.2) and checking the widths of the resulting left and right character images; discarding a character image whose width is less than the minimum character image width D_min and keeping it otherwise; then returning to step 1.2) and continuing with the next character image;

1.3) vertical segmentation of characters

computing the horizontal projection of each character image obtained in step 1.2) and removing the rows whose projection value is 0, that is, cutting away the blank regions above and below the character;

(2) character recognition

normalizing the size of each single character image obtained by the character segmentation of step (1), then feeding the images, in the order of horizontal segmentation, into a trained convolutional neural network for recognition to obtain the corresponding characters;

(3) combining all recognized characters in segmentation order to obtain the serial number.

2. The recognition method according to claim 1, characterized in that, in step (2), recognition by the convolutional neural network comprises:

2.1) convolving the size-normalized 14x14 single character image to obtain layer C1, consisting of 6 feature maps of size 12x12 each;

2.2) downsampling the C1 feature maps to obtain layer S2, consisting of 6 feature maps of size 6x6 each;

2.3) convolving the S2 feature maps to obtain layer C3, consisting of 6 feature maps of size 4x4 each;

2.4) downsampling the C3 feature maps to obtain layer S4, consisting of 6 feature maps of size 2x2 each;

2.5) fully connecting the S4 feature maps directly to the output layer, which outputs the corresponding character.