
CN115565165A - Semantic segmentation and Transformer-based container number detection and identification method - Google Patents


Info

Publication number
CN115565165A
CN115565165A
Authority
CN
China
Prior art keywords
container number
container
network
detection
loss
Prior art date
Legal status
Granted
Application number
CN202211103809.8A
Other languages
Chinese (zh)
Other versions
CN115565165B (en)
Inventor
陈平平
游索
陈宏辉
陈锋
林志坚
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202211103809.8A
Publication of CN115565165A
Application granted
Publication of CN115565165B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/15Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques


Abstract

The invention provides a container number detection and identification method based on semantic segmentation and Transformer, which comprises the following steps: S1, constructing a container number detection and identification data set; S2, constructing and training a box number detection network based on semantic segmentation; S3, performing text correction on the box number area obtained by detection; S4, constructing and training a container number recognition network; S5, detecting and identifying the container number by using the trained detection and identification networks. Through this technical scheme, the container number area in an image can be segmented efficiently and accurately, three different types of container numbers are classified, and the corrected container number area image is obtained through TPS text correction.

Description

A Container Number Detection and Recognition Method Based on Semantic Segmentation and Transformer

Technical Field

The invention relates to the technical field of computer vision, and in particular to a container number detection and recognition method based on semantic segmentation and Transformer.

Background Art

With the rapid development of international trade and the social economy, China's demand for the logistics and transportation industry is growing daily. As an important carrier for cargo transportation, the container occupies a pivotal position in the entire transportation system. To realize the automation, informatization and intelligence of large-scale container transportation and management, it is necessary to design an efficient and accurate container number recognition system.

Traditional container number detection methods fall into three categories: edge-detection-based, mathematical-morphology-based, and maximally stable extremal region (MSER)-based. These traditional methods all rely mainly on hand-crafted features. After an image is acquired, the traditional pipeline first converts it to grayscale; image features are then extracted with the histogram of oriented gradients and the scale-invariant feature transform; candidate positions of the container number and of its characters are then generated by sliding-window search or by methods based on image-region similarity; in the recognition stage, the detected number regions are recognized with decision trees, support vector machines, template matching and similar methods. Because of complex backgrounds, image noise and other factors, these traditional image-processing methods inevitably have limitations in container number detection, and their detection speed is relatively low.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a container number detection and recognition method based on semantic segmentation and Transformer, which efficiently and accurately segments the container number region from an image, classifies the three different types of container numbers, and obtains the corrected number-region image through TPS text correction.

To achieve the above purpose, the present invention adopts the following technical solution: a container number detection and recognition method based on semantic segmentation and Transformer, comprising the following steps:

Step S1: construct the container number detection and recognition data set;

Step S2: construct and train the semantic-segmentation-based number detection network;

Step S3: perform text correction on the detected number regions;

Step S4: construct and train the container number recognition network;

Step S5: detect and recognize container numbers with the trained detection and recognition networks.

In a preferred embodiment, step S1 is specifically:

Step S11: analyze the types of container numbers to be detected and recognized, and select pictures containing this kind of information as training pictures;

Step S12: collect a container data set in the field at a port;

Step S13: label the container number regions with the Labelme annotation tool, and save the quadrilateral-box position information, the classification information and the number transcription in JSON files to obtain the initial container number data set;

Step S14: apply data augmentation to the initial container number data set to obtain the container number data set.
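A Labelme-style JSON file of the kind produced in step S13 can be parsed into training records with a few lines of code. The sketch below follows the standard Labelme JSON layout (`shapes`, `label`, `points`); the class names and the convention of packing "type:number" into the `label` field are illustrative assumptions, not the patent's actual schema.

```python
import json

def load_annotation(json_str):
    """Parse one Labelme-style annotation into (polygon, type, number) records."""
    data = json.loads(json_str)
    records = []
    for shape in data["shapes"]:
        # hypothetical convention: label carries "<box type>:<container number>"
        box_type, _, number = shape["label"].partition(":")
        records.append({
            "points": shape["points"],  # 4 points (horizontal/vertical) or 6 (double-row)
            "type": box_type,           # e.g. "horizontal", "vertical", "double"
            "number": number,           # 11-character container number
        })
    return records

sample = json.dumps({
    "imagePath": "container_0001.jpg",
    "shapes": [{
        "label": "horizontal:CSQU3054383",
        "points": [[120, 40], [480, 40], [480, 90], [120, 90]],
        "shape_type": "polygon",
    }],
})
recs = load_annotation(sample)
```

Each record keeps 4 coordinates for a horizontal or vertical number and 6 for a double-row one, matching the labeling scheme described above.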

In a preferred embodiment, the types of container numbers to be detected and recognized include horizontal, vertical and double-row numbers, where horizontal and vertical numbers are labeled with 4 coordinates and double-row numbers with 6 coordinates.

In a preferred embodiment, step S2 is specifically:

Step S21: use a Swin Transformer as the backbone network to obtain feature maps C1, C2, C3 and C4 at 1/4, 1/8, 1/16 and 1/32 of the original image size, respectively;

Step S22: pass the feature maps C1, C2, C3 and C4 through an FPN structure to obtain the feature map F;

Step S23: pass the feature map F through the semantic segmentation prediction head to obtain feature maps predicting the three different number types;

Step S24: compute the loss value from the three predicted maps obtained in step S23 with the designed loss function;

Step S25: start training, and save the weight file when training finishes;

Step S26: detect the pictures to be recognized with the weight file saved in step S25 to obtain number coordinate data from the semantic-segmentation detection network.
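The multi-scale layout of steps S21-S22 can be sketched at the shape level: the backbone yields maps at 1/4, 1/8, 1/16 and 1/32 resolution, and an FPN-style top-down pathway upsamples each coarser map by 2x and adds it to the finer one. Giving every map the same channel count (so the FPN lateral 1x1 projections can be omitted) is our simplification for illustration.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling of a (C, H, W) feature map
    return x.repeat(2, axis=1).repeat(2, axis=2)

H = W = 64
C = 8
# simulated backbone outputs at strides 4, 8, 16, 32
c1, c2, c3, c4 = (np.random.rand(C, H // s, W // s) for s in (4, 8, 16, 32))

# top-down FPN fusion: upsample the coarser level and add the finer one
p4 = c4
p3 = c3 + upsample2x(p4)
p2 = c2 + upsample2x(p3)
f = c1 + upsample2x(p2)   # fused feature map F at 1/4 resolution
```

The fused map `f` sits at 1/4 of the input resolution, which is where the prediction head of step S23 operates.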

In a preferred embodiment, the loss value is calculated as follows:

Loss = ∑_{k=1}^{3} l_k

l_k = (1/n) ∑_{i=1}^{n} (y_{ki} − f_{ki}(x))²

where Loss denotes the total loss, the sum of the losses of the 3 maps; l_k (k = 1, 2, 3) denotes the loss for predicting the map of the k-th number type, k indexing the maps; y_k denotes the ground-truth pixel segmentation map and f_k(x) the predicted number segmentation map; y_{ki} denotes the i-th pixel value of the ground-truth heat map, f_{ki}(x) the i-th pixel value of the predicted heat map, and n the total number of pixels. The loss functions of the 3 predicted number segmentation heat maps are added to obtain the total loss function of the detection network.
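The detection loss (a sum over the three per-type segmentation maps) can be sketched in NumPy. The source only specifies that each l_k is computed from the n pixel values of the ground-truth and predicted heat maps; the per-pixel squared-error form used here is an assumption for illustration.

```python
import numpy as np

def detection_loss(pred_maps, true_maps):
    """Sum of per-map, per-pixel losses over the 3 box-number-type maps.

    pred_maps, true_maps: arrays of shape (3, H, W) with values in [0, 1].
    """
    n = pred_maps[0].size
    # assumed per-pixel squared error, averaged over the n pixels of each map
    losses = [np.sum((t - p) ** 2) / n for p, t in zip(pred_maps, true_maps)]
    return sum(losses), losses

pred = np.zeros((3, 4, 4))
true = np.zeros((3, 4, 4))
true[0, 1:3, 1:3] = 1.0          # a small ground-truth region in the first map
total, per_map = detection_loss(pred, true)
```

Only the first map contributes here (4 mismatching pixels out of 16), so the total loss is 0.25.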

In a preferred embodiment, the number-region correction method in step S3 is based on thin plate spline (TPS) interpolation.

In a preferred embodiment, the container number recognition network in step S4 is trained as follows:

Step S41: build the container number recognition network based on the multi-head attention mechanism;

Step S42: preprocess the input container number pictures and number transcriptions;

Step S43: design the loss function of the network, compute the loss and update the recognition network parameters according to the loss;

Step S44: set the number of training epochs, train the network to completion with the number transcriptions as supervision, and save the network structure and parameters of the trained container number recognition part.

In a preferred embodiment, in step S43, the container recognition network consists of 6 convolutional layers, a 2-layer multi-head attention network, 4 max-pooling layers and 3 fully connected layers.

In a preferred embodiment, in step S43, the loss value is calculated as follows:

l_rec = −(1/|w|) ∑_{j=1}^{|w|} log P(w_j = y_j)

where l_rec denotes the recognition loss, w is the character sequence encoded from the number transcription, |w| is the number of characters in the training target sequence (here |w| = 11), w_j is the j-th character of the predicted character sequence, and y_j is the j-th character of the ground-truth character sequence.
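A character-level cross-entropy consistent with the description above (|w| = 11 characters, one prediction per position) can be sketched in NumPy. The averaging over positions and the vocabulary size are illustrative assumptions.

```python
import numpy as np

def recognition_loss(char_probs, target_ids):
    """Mean negative log-likelihood of the ground-truth character at each position.

    char_probs: (|w|, V) softmax outputs, one row per character position.
    target_ids: length-|w| list of ground-truth character indices.
    """
    w = len(target_ids)
    return -sum(np.log(char_probs[j, target_ids[j]]) for j in range(w)) / w

V = 37  # e.g. 26 letters + 10 digits + 1 padding symbol (vocabulary size is an assumption)
# build 11 rows that each put 0.9 probability mass on the correct character
probs = np.full((11, V), (1.0 - 0.9) / (V - 1))
probs[np.arange(11), np.arange(11)] = 0.9
loss = recognition_loss(probs, list(range(11)))
```

With 0.9 mass on every ground-truth character, the loss reduces to −log(0.9) ≈ 0.105.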

Compared with the prior art, the present invention has the following beneficial effects: with a Swin Transformer as the backbone, the invention designs a container number detection network based on semantic segmentation; this detection network can efficiently and accurately segment the container number region from an image and classify the three different types of numbers, and the corrected number-region image is obtained through TPS text correction. In the number recognition stage, the network combines a convolutional neural network with the multi-head attention mechanism to form a sequence-prediction network designed specifically for container numbers, which effectively handles blurred, damaged and joined characters in the number region.

Description of the Drawings

Fig. 1 is a structural flow chart of a preferred embodiment of the present invention.

Fig. 2 shows part of the data set collected in step S1 in the preferred embodiment.

Fig. 3 shows examples of the 3 different number types in step S13 in the preferred embodiment.

Fig. 4 is the number detection network of step S2 in the preferred embodiment.

Fig. 5 is an example of TPS number correction in step S3 in the preferred embodiment.

Fig. 6 is the number recognition network of step S4 in the preferred embodiment.

Fig. 7 is the CNN feature-extraction structure of the recognition network in step S4 in the preferred embodiment.

Fig. 8 is an example of overall number detection and recognition in step S5 in the preferred embodiment.

Detailed Description

The present invention is further described below in conjunction with the accompanying drawings and embodiments.

It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should be noted that the terms used here are only for describing specific embodiments and are not intended to limit exemplary embodiments according to the present application; as used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

Referring to Figs. 1-8, the present invention provides a container number detection and recognition method based on semantic segmentation and Transformer, comprising the following steps:

Step S1: construct the container number detection and recognition data set;

Step S2: construct and train the semantic-segmentation-based number detection network;

Step S3: perform text correction on the detected number regions;

Step S4: construct and train the container number recognition network;

Step S5: detect and recognize container numbers with the trained detection and recognition networks.

In this embodiment, step S1 is specifically:

Step S11: analyze the types of container numbers to be detected and recognized, and select pictures containing this kind of information as training pictures;

Step S12: as shown in Fig. 2, collect a container data set in the field at a port;

Step S13: label the container number regions with the Labelme annotation tool, and save the quadrilateral-box position information, the classification information and the number transcription in JSON files to obtain the initial container number data set; as shown in Fig. 3, the types of container numbers to be detected and recognized include horizontal, vertical and double-row numbers, where horizontal and vertical numbers are labeled with 4 coordinates and double-row numbers with 6 coordinates;

Step S14: apply data augmentation to the images of the initial container number data set and dictionary-encode the transcriptions of the data set to obtain the container number data set.

In this embodiment, step S2 is specifically:

Step S21: as shown in Fig. 4, use a Swin Transformer as the backbone network to obtain feature maps C1, C2, C3 and C4 at 1/4, 1/8, 1/16 and 1/32 of the original image size, respectively;

Step S22: pass the feature maps C1, C2, C3 and C4 through an FPN structure to obtain the feature map F;

Step S23: pass the feature map F through the semantic segmentation prediction head to obtain feature maps predicting the three different number types;

Step S24: compute the loss value from the three predicted maps obtained in step S23 with the designed loss function;

Step S25: calculate the loss value:

Loss = ∑_{k=1}^{3} l_k

l_k = (1/n) ∑_{i=1}^{n} (y_{ki} − f_{ki}(x))²

where Loss denotes the total loss, the sum of the losses of the 3 maps; l_k (k = 1, 2, 3) denotes the loss for predicting the map of the k-th number type, k indexing the maps; y_k denotes the ground-truth pixel segmentation map and f_k(x) the predicted number segmentation map; y_{ki} denotes the i-th pixel value of the ground-truth heat map, f_{ki}(x) the i-th pixel value of the predicted heat map, and n the total number of pixels; the loss functions of the 3 predicted number segmentation heat maps are added to obtain the total loss function of the detection network;

Step S26: start training, and save the weight file when training finishes;

Step S27: detect the pictures to be recognized with the weight file saved in step S26 to obtain number coordinate data from the semantic-segmentation detection network.

In this embodiment, step S3 is specifically:

Step S31: as shown in Fig. 5, convert the quadrilateral box obtained by precise localization in the input image into evenly spaced input control points C'; then, according to the coordinates of the detection box, set a rectangular region of suitable length and width as the output image and set K output control points;

Step S32: linearly project the control points C of the output image to the control points C' of the input image; the conversion relation is given by the following formula:

c'_i = T [1, c_i^T, φ(‖c_i − c_1‖), …, φ(‖c_i − c_K‖)]^T

where:

φ(r) = r² log(r)

T = [C′ O_{2×3}] ΔC^{−1}

r = ‖c_i − c_k‖ denotes the Euclidean distance between output control points c_i and c_k, and the transformation matrix T is the target matrix to be solved; ΔC is calculated as follows:

ΔC = [ [1_{K×1}, C, R]; [0, 0, 1_{1×K}]; [0, 0, C^T] ]

where C⁺ denotes the first block row, the matrix composed of [1_{K×1}, C, R], with R_{ik} = φ(‖c_i − c_k‖).

Step S33: the correspondence between all pixel coordinates p′ of the input image and p of the output image likewise satisfies the TPS conversion relation above; the mapping between p′ and p is given by the following formula:

p′ = T [1, p^T, φ(‖p − c_1‖), …, φ(‖p − c_K‖)]^T

Step S34: the two-dimensional coordinate p′ in the input image corresponding to the two-dimensional coordinate p in the predicted image is computed with the coordinate conversion formula above; the sampler computes the value of the predicted image at p by bilinear interpolation over the pixels adjacent to p′, finally yielding the corrected scene-text image.
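The TPS mapping of steps S31-S34 can be sketched in NumPy using the standard thin plate spline formulation: φ(r) = r² log r, a (K+3) x (K+3) system ΔC built from the K output control points C, and T recovered from C'. The control-point layout (two rows of points on a unit rectangle) and the exact matrix conventions are illustrative assumptions; they agree with the formulas above up to transposition.

```python
import numpy as np

def phi(r):
    # TPS radial basis; the 1e-12 guard keeps log() finite at r == 0
    return np.where(r > 0, r * r * np.log(r + 1e-12), 0.0)

def tps_transform(C, C_prime):
    """Return a function mapping output-image points to input-image points."""
    K = len(C)
    R = phi(np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2))
    delta = np.zeros((K + 3, K + 3))
    delta[:K, 0] = 1.0          # first block row: [1, C, R]
    delta[:K, 1:3] = C
    delta[:K, 3:] = R
    delta[K, 3:] = 1.0          # constraint rows: weights sum to zero ...
    delta[K + 1:, 3:] = C.T     # ... and have no affine component
    rhs = np.vstack([C_prime, np.zeros((3, 2))])   # (K+3, 2)
    T = np.linalg.solve(delta, rhs).T              # (2, K+3)

    def apply(p):
        feat = np.concatenate([[1.0], p, phi(np.linalg.norm(C - p, axis=1))])
        return T @ feat
    return apply

# sanity check: identical control points should give the identity mapping
C = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
              [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
warp = tps_transform(C, C.copy())
p_in = warp(np.array([0.3, 0.7]))
```

In the rectification pipeline, `warp` would be evaluated on every pixel of the output rectangle, and the sampler would bilinearly interpolate the input image at the resulting coordinates, as described in step S34.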

In this embodiment, step S4 is specifically:

Step S41: as shown in Fig. 6, build the container number recognition network based on the multi-head attention mechanism;

Step S42: preprocess the input container number pictures and number transcriptions;

Step S43: design the loss function of the network, compute the loss and update the recognition network parameters according to the loss;

Step S44: calculate the recognition loss function:

l_rec = −(1/|w|) ∑_{j=1}^{|w|} log P(w_j = y_j)

where l_rec denotes the recognition loss, w is the character sequence encoded from the number transcription, |w| is the number of characters in the training target sequence (here |w| = 11), w_j is the j-th character of the predicted character sequence, and y_j is the j-th character of the ground-truth character sequence;

Step S45: set the number of training epochs, train the network to completion with the number transcriptions as supervision, and save the network structure and parameters of the trained container number recognition part.

In this embodiment, step S5 is specifically:

Step S51: join the number detection network and the number recognition network together;

Step S52: load the saved trained network weights;

Step S53: input the number picture to be detected;

Step S54: as shown in Fig. 8, obtain the recognition result for the container number in the picture.
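The joined pipeline of steps S51-S54 can be sketched as detection, then TPS correction, then recognition. The three stage functions below are placeholders standing in for the trained networks described above; their names, signatures and return values are ours, not the patent's.

```python
def detect_regions(image):
    # would run the semantic-segmentation detection network;
    # returns (polygon, number-type) pairs
    return [([(120, 40), (480, 40), (480, 90), (120, 90)], "horizontal")]

def rectify(image, polygon):
    # would apply the TPS correction of step S3;
    # returns a cropped, rectified number patch
    return ("patch", polygon)

def recognize(patch):
    # would run the attention-based recognition network;
    # returns an 11-character container number string
    return "CSQU3054383"

def read_container_numbers(image):
    """End-to-end inference: detect every number region, rectify it, read it."""
    results = []
    for polygon, box_type in detect_regions(image):
        patch = rectify(image, polygon)
        results.append((box_type, recognize(patch)))
    return results

out = read_container_numbers("some_image.jpg")
```

With the stub stages above, the pipeline returns one (type, number) pair per detected region.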

The above are only preferred embodiments of the present invention; all equal changes and modifications made within the scope of the patent application of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A container number detection and identification method based on semantic segmentation and Transformer, characterized by comprising the following steps:
S1, detecting container numbers and constructing an identification data set;
S2, constructing a box number detection network based on semantic segmentation and training;
S3, performing text correction on the box number area obtained by detection;
S4, constructing a container number recognition network and training;
S5, detecting and identifying the container number by using the trained detection and identification network.
2. The semantic segmentation and Transformer-based container number detection and identification method according to claim 1, wherein the step S1 specifically comprises:
S11, analyzing the type of the container number to be detected and identified, and determining the picture containing the type information as a training picture;
S12, collecting a container data set at a field port;
S13, labeling a container number area by using Labelme labeling software, and storing position information, classification information and container number labeling information of a quadrilateral frame in the number area in a json file form to obtain an initial container number data set;
S14, performing data enhancement processing on the initial container number data set to obtain a container number data set.
3. The semantic segmentation and Transformer-based container number detection and identification method according to claim 2, wherein the types to be detected and identified by the container number include a horizontal container number, a vertical container number, and a double-row container number, wherein the horizontal container number and the vertical container number are labeled with 4 coordinates, and the double-row container number with 6 coordinates.
4. The semantic segmentation and Transformer-based container number detection and identification method according to claim 1, wherein the step S2 specifically comprises:
S21, using a Swin Transformer as a backbone network to respectively obtain feature maps C1, C2, C3 and C4 at 1/4, 1/8, 1/16 and 1/32 of the original image size;
S22, obtaining a feature map F from the feature maps C1, C2, C3 and C4 through an FPN structure;
S23, passing the feature map F through a semantic segmentation prediction head to obtain feature maps predicting three different box number types;
S24, calculating a loss value from the three maps predicted in step S23 with a designed loss function;
S25, starting training, and saving a weight file after the training is finished;
S26, detecting the picture to be identified by using the weight file stored in step S25 to obtain the box number coordinate data of the detection network based on semantic segmentation.
5. The semantic segmentation and Transformer-based container number detection and identification method according to claim 4, wherein the loss value is calculated as follows:
Loss = ∑_{k=1}^{3} l_k
l_k = (1/n) ∑_{i=1}^{n} (y_{ki} − f_{ki}(x))²
where Loss denotes the total loss, the sum of the losses of the 3 maps; l_k (k = 1, 2, 3) denotes the loss of predicting the 3 different box number type maps, k denoting the k-th map; y_k represents the ground-truth pixel segmentation map, f_k(x) the predicted box number segmentation map, y_{ki} the i-th pixel value of the ground-truth heat map, f_{ki}(x) the i-th pixel value of the predicted heat map, and n the total number of pixels; the predicted 3 box number segmentation heat map loss functions are added to obtain the total loss function of the detection network.
6. The container number detection and identification method based on semantic segmentation and Transformer according to claim 1, wherein the box number region correction method in step S3 is a thin-plate spline interpolation-based method.
7. The semantic segmentation and Transformer-based container number detection and identification method according to claim 1, wherein the step of training the container number identification network in step S4 is as follows:
step S41: constructing a container number identification network based on a multi-head attention mechanism;
s42, preprocessing the input container number picture and the container number label;
s43, designing a loss function of the network, calculating loss and updating and identifying network parameters according to the loss;
and S44, setting the number of network training rounds, training the network to the end by taking the box number label as monitoring information, and storing the network frame and the network parameters of the trained container number identification part.
8. The semantic segmentation and Transformer-based container number detection and identification method according to claim 7, wherein in step S43, the container number identification network consists of 6 convolutional layers, 2 multi-head attention layers, 4 max-pooling layers, and 3 fully connected layers.
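A hedged PyTorch sketch of a network matching the layer counts in claim 8 (6 conv, 2 multi-head attention, 4 max-pool, 3 fully connected). The input size, channel widths, head count, and the 37-class output (26 letters + 10 digits + blank) are illustrative guesses, since the claim fixes only the layer counts:

```python
import torch
import torch.nn as nn

class BoxNumberRecognizer(nn.Module):
    """6 conv + 4 max-pool + 2 multi-head attention + 3 FC layers."""

    def __init__(self, num_chars=11, num_classes=37):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # conv1, pool1
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # conv2, pool2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),                   # conv3
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # conv4, pool3
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),                  # conv5
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # conv6, pool4
        )
        self.attn1 = nn.MultiheadAttention(512, num_heads=8, batch_first=True)
        self.attn2 = nn.MultiheadAttention(512, num_heads=8, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(8 * 512, 1024), nn.ReLU(),   # fc1
            nn.Linear(1024, 512), nn.ReLU(),       # fc2
            nn.Linear(512, num_chars * num_classes),  # fc3
        )
        self.num_chars, self.num_classes = num_chars, num_classes

    def forward(self, x):                      # x: (B, 1, 32, 128) grayscale crop
        f = self.features(x)                   # (B, 256, 2, 8) after 4 poolings
        b = f.size(0)
        seq = f.permute(0, 3, 1, 2).reshape(b, 8, 512)  # image width as sequence
        seq, _ = self.attn1(seq, seq, seq)
        seq, _ = self.attn2(seq, seq, seq)
        out = self.fc(seq.reshape(b, -1))
        return out.view(b, self.num_chars, self.num_classes)

model = BoxNumberRecognizer()
logits = model(torch.zeros(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 11, 37]): 11 characters, 37 classes each
```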
9. The method for detecting and identifying container number based on semantic segmentation and Transformer according to claim 7, wherein in the step S43, the loss value is calculated as follows:
l_{rec} = -\frac{1}{|w|} \sum_{j=1}^{|w|} y_j \log w_j

where l_rec denotes the recognition loss value; w is the encoded character sequence of the box number label and |w| is the number of characters in the training target sequence, with |w| = 11; w_j is the jth character of the predicted sequence data and y_j is the jth character of the ground-truth sequence data.
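The recognition loss of claim 9 reads as a per-character cross-entropy averaged over the 11-character target sequence. A plain-Python sketch under that assumption, with predictions given as per-character probability distributions:

```python
import math

def recognition_loss(pred_probs, truth_indices):
    """l_rec: average negative log-probability that the network assigns to
    each ground-truth character over the |w| positions of the target sequence."""
    w = len(truth_indices)
    total = sum(-math.log(pred_probs[j][truth_indices[j]]) for j in range(w))
    return total / w

# Toy 3-class alphabet with |w| = 2 positions (real box numbers use |w| = 11,
# matching the ISO-style 4-letter + 7-digit container number format).
pred = [[0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1]]
truth = [0, 1]
print(round(recognition_loss(pred, truth), 4))  # 0.2899
```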
CN202211103809.8A 2022-09-09 2022-09-09 A container number detection and recognition method based on semantic segmentation and Transformer Active CN115565165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211103809.8A CN115565165B (en) 2022-09-09 2022-09-09 A container number detection and recognition method based on semantic segmentation and Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211103809.8A CN115565165B (en) 2022-09-09 2022-09-09 A container number detection and recognition method based on semantic segmentation and Transformer

Publications (2)

Publication Number Publication Date
CN115565165A true CN115565165A (en) 2023-01-03
CN115565165B CN115565165B (en) 2025-06-13

Family

ID=84740860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211103809.8A Active CN115565165B (en) 2022-09-09 2022-09-09 A container number detection and recognition method based on semantic segmentation and Transformer

Country Status (1)

Country Link
CN (1) CN115565165B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI883388B (en) * 2023-02-18 2025-05-11 Chunghwa Telecom Co., Ltd. Method for identifying container number

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336683A1 (en) * 2017-05-18 2018-11-22 Mitsubishi Electric Research Laboratories, Inc. Multi-Label Semantic Boundary Detection System
CN110942057A (en) * 2018-09-25 2020-03-31 杭州海康威视数字技术股份有限公司 A kind of container box number identification method, device and computer equipment
CN113792780A (en) * 2021-09-09 2021-12-14 福州大学 Container number recognition method based on deep learning and image post-processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336683A1 (en) * 2017-05-18 2018-11-22 Mitsubishi Electric Research Laboratories, Inc. Multi-Label Semantic Boundary Detection System
CN110942057A (en) * 2018-09-25 2020-03-31 杭州海康威视数字技术股份有限公司 A kind of container box number identification method, device and computer equipment
CN113792780A (en) * 2021-09-09 2021-12-14 福州大学 Container number recognition method based on deep learning and image post-processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YOU Suo et al.: "Heat-map-aware container number detection algorithm for complex port scenes", Radio Engineering, 5 November 2023 (2023-11-05) *


Also Published As

Publication number Publication date
CN115565165B (en) 2025-06-13

Similar Documents

Publication Publication Date Title
CN109325398B (en) Human face attribute analysis method based on transfer learning
CN109949316B (en) A weakly supervised instance segmentation method for power grid equipment images based on RGB-T fusion
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN112818951B (en) A method of ticket identification
WO2021022571A1 (en) Interactive modeling-based multi-label distance metric learning method
CN112541491B (en) End-to-end text detection and recognition method based on image character area perception
CN114973226A (en) Training method for text recognition system in natural scene of self-supervision contrast learning
CN109800629A (en) A kind of Remote Sensing Target detection method based on convolutional neural networks
CN112883795B (en) Rapid and automatic table extraction method based on deep neural network
CN114694165A (en) Intelligent PID drawing identification and redrawing method
WO2022127384A1 (en) Character recognition method, electronic device and computer-readable storage medium
WO2023273572A1 (en) Feature extraction model construction method and target detection method, and device therefor
CN107704867A (en) Based on the image characteristic point error hiding elimination method for weighing the factor in a kind of vision positioning
CN114758135B (en) An unsupervised image semantic segmentation method based on attention mechanism
CN115424254A (en) License plate recognition method, system, equipment and storage medium
Cerman et al. A mobile recognition system for analog energy meter scanning
CN110659637A (en) Electric energy meter number and label automatic identification method combining deep neural network and SIFT features
CN107291813B (en) Example searching method based on semantic segmentation scene
CN118298240A (en) Automatic identification method and device for state of transformer substation switch cabinet element
CN118429702A (en) Anti-unmanned aerial vehicle data acquisition and intelligent labeling system based on multiple modes and operation method thereof
CN114330247A (en) Automatic insurance clause analysis method based on image recognition
CN111539417B (en) Text recognition training optimization method based on deep neural network
CN120822115B (en) Intelligent contract analysis method and system based on multi-modal feature fusion algorithm
CN115565165A (en) Semantic segmentation and Transformer-based container number detection and identification method
CN111951287A (en) Two-dimensional code detection and recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant