TWI869931B - License plate recognition system and method thereof

Publication number: TWI869931B · Application: TW112126376A (filed 2023-07-14, published 2025-01-11) · Authority: TW (Taiwan)

Description
The present invention relates to a license plate recognition system and method, and in particular to a recognition system and method capable of recognizing license plates bearing two rows of characters.
A license plate generally carries a plate number. Because plate dimensions are constrained, plates with the characters arranged in two rows are common. Traditional methods for recognizing double-row plates must first locate the characters, then apply an OCR module to recognize the characters at each position, and finally concatenate the per-position results into the final reading.
For this type of plate with two rows of characters, the traditional approach is to annotate the upper and lower rows separately and recognize them separately, which is very time-consuming; annotating a double-row plate as a single unit for recognition is difficult and often degrades accuracy. Overcoming this problem would therefore substantially reduce the time and cost of recognition.
This application therefore uses a relatively deep neural network as the backbone. Through training, the network not only extracts character features from the image but also rearranges the double-row image character features (single-row features work as well) to some degree, so that the feature map lays out the characters of the image in reading order; by the time the final output layer is reached, the features can be processed as an ordinary recognition problem to obtain the recognized plate character content. The present invention should therefore be an optimal solution.
The license plate recognition system of the present invention comprises at least one server device, which comprises at least: a license plate data storage module, storing at least one plate image file awaiting recognition and a plurality of plate sample image files, each sample file bearing an annotated region whose image content is an image character feature, the feature being a double row of characters; a neural network learning module, connected to the license plate data storage module, which performs deep learning training on the annotated regions of the sample files to produce a neural network model; a license plate recognition module, connected to the data storage module and the learning module, which feeds the image file awaiting recognition into the neural network model to output analysis result information; and a decoding output module, connected to the recognition module, which decodes the analysis result information with a decoding algorithm to obtain the recognized plate character content.
More specifically, the recognized plate character content corresponds to the plate number in the image file awaiting recognition; the plate number may be a single or a double row of characters.
More specifically, the decoding algorithm is a greedy algorithm or a beam search algorithm.
More specifically, the neural network model has at least a plurality of convolutional layers and a deconvolutional layer. The convolutional layers extract the image content of the annotated region into a feature map, rearranging the character features in the image content, and the deconvolutional layer enlarges the feature map to raise the upper limit on the length of the recognizable string; the feature map corresponds to a plurality of temporal feature regions.
More specifically, the deconvolutional layer is followed by a character feature extraction layer that, according to a plurality of character classes, extracts a character feature matrix for every temporal feature region of the feature map. The matrix comprises at least output-channel-count information, vertical feature information, and horizontal feature information, where the output channel count is the number of character classes, the vertical feature information is the height of the temporal feature region, and the horizontal feature information is its width.
More specifically, the character feature extraction layer is followed by an averaging dimension-reduction layer that averages out all the vertical feature information of the character feature matrix to output a reduced-dimension character feature matrix.
More specifically, the averaging dimension-reduction layer is followed by an output layer that processes the reduced-dimension character feature matrix with a connectionist temporal classification method; the decoding output module then uses the decoding algorithm to recognize each temporal feature region as one character and removes repeated characters and blanks to obtain the recognized plate character content.
A license plate recognition method, with the steps of: (1) a server device stores at least one plate image file awaiting recognition and a plurality of plate sample image files, each sample file bearing an annotated region whose image content is an image character feature, the feature being a double row of characters; (2) the server device performs deep learning training on the annotated regions of the sample files to produce a neural network model; (3) the server device feeds the image file awaiting recognition into the neural network model to output analysis result information; (4) the server device decodes the analysis result information with a decoding algorithm to obtain the recognized plate character content.
More specifically, the neural network model has at least a plurality of convolutional layers, a deconvolutional layer, a character feature extraction layer, an averaging dimension-reduction layer, and an output layer. The convolutional layers extract the image content of the annotated region into a feature map with a plurality of temporal feature regions, rearranging the character features in the image; the deconvolutional layer enlarges the feature map; the character feature extraction layer extracts a character feature matrix for each temporal feature region of the feature map; the averaging dimension-reduction layer reduces the dimension of the character feature matrix and outputs a reduced-dimension character feature matrix; and the output layer processes that matrix with a connectionist temporal classification method, after which the decoding algorithm recognizes each temporal feature region as one character and removes repeated characters and blanks to obtain the recognized plate character content.
1: Server device
11: Processor
12: Computer-readable recording medium
121: Application program
1211: License plate data storage module
1212: Neural network learning module
1213: License plate recognition module
1214: Decoding output module
2: License plate body
21: Annotated region
[Fig. 1] Overall architecture of the license plate recognition system and method of the present invention.
[Fig. 2] Architecture of the application program of the license plate recognition system and method of the present invention.
[Fig. 3] Image content of a license plate sample image file of the license plate recognition system and method of the present invention.
[Fig. 4A] Network architecture of the neural network model of the license plate recognition system and method of the present invention.
[Fig. 4B] Network architecture of the bottleneck layer of the neural network model.
[Fig. 4C] Network architecture of the character feature extraction layer of the neural network model.
[Fig. 5] Flow of the license plate recognition system and method of the present invention.
Other technical content, features, and effects of the present invention will become clear in the following detailed description of preferred embodiments, read together with the drawings.
Referring to Fig. 1, the overall architecture of the license plate recognition system and method of the present invention, the system comprises a server device 1 that includes at least one processor 11 and at least one computer-readable recording medium 12. The medium 12 stores at least one application program 121 and further stores computer-readable instructions which, when executed by the processor(s) 11, cause the application program 121 to run.
As shown in Fig. 2, the application program 121 comprises a license plate data storage module 1211, a neural network learning module 1212, a license plate recognition module 1213, and a decoding output module 1214.
The license plate data storage module 1211 stores at least one plate image file awaiting recognition and a plurality of plate sample image files (the sample files serve as the dataset for deep learning training). As shown in Fig. 3, a sample file contains a plate body 2 whose surface bears an annotated region 21; the region's image content is an image character feature, which is a double or a single row of characters. In this application, plate images with single-row characters and plate images with double-row characters are both used for training.
The recognized plate character content corresponds to the plate number on the image file awaiting recognition; the number is a single or a double row of characters, and each plate contains at least six characters.
The neural network learning module 1212 is connected to the license plate data storage module 1211 and performs deep learning training on the annotated regions of the sample files to produce a neural network model. As shown in Fig. 4, this embodiment builds and trains a neural network with resnet50 as its backbone.
When a plate sample image file is fed into the neural network learning module 1212, the network input is 240 pixels wide by 96 pixels high, i.e. an input tensor of dimensions 1x3x96x240; after passing through the network, the output-layer matrix has dimensions 1x36x30.
In this embodiment the set of image character features comprises the digits 0-9, the uppercase letters A-Z excluding I and O, and the dash (-), which together with the CTC blank introduced below gives 36 character classes in total; the scheme is not limited to this set, and other character types can be applied within the technical architecture of this application.
This embodiment uses resnet50 as the backbone (the backbone is not limited to resnet50; the lighter resnet34 can also be used), retaining its first N convolutional layers as the image feature extractor: input -> 2-D convolution (conv2D) -> batch normalization (BN) -> rectified linear unit (ReLU) -> 2-D max pooling (MaxPooling2D) -> bottleneck layers, C=64 -> bottleneck layers, C=128 -> bottleneck layers, C=256. The *3, *4, *6 beside the dashed boxes indicate that the corresponding block is repeated 3, 4, and 6 times.
Passing through these convolutional layers extracts the image content of the annotated region into a feature map, which rearranges the character features in the image content.
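To make the backbone concrete, the following is a minimal sketch of this truncation, assuming PyTorch/torchvision (the patent does not name a framework); `conv1` through `layer3` are torchvision's standard resnet50 attribute names, which match the C=64 (x3), C=128 (x4), and C=256 (x6) stages of Fig. 4A:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

net = resnet50(weights=None)
backbone = nn.Sequential(
    net.conv1,    # 7x7 conv2D
    net.bn1,      # batch normalization (BN)
    net.relu,     # ReLU
    net.maxpool,  # 2-D max pooling
    net.layer1,   # bottleneck blocks, C=64,  repeated 3x
    net.layer2,   # bottleneck blocks, C=128, repeated 4x
    net.layer3,   # bottleneck blocks, C=256, repeated 6x
)

x = torch.randn(1, 3, 96, 240)   # one 96x240 plate crop, as in the text
features = backbone(x)
print(features.shape)            # torch.Size([1, 1024, 6, 15])
```

Note that the 15-column feature map matches the 15 timesteps that, per the text below, the deconvolution layer later raises to 30.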
As shown in Fig. 4B, the bottleneck layer of the network architecture is: input -> conv2D, 1x1 kernel, C output channels -> BN -> ReLU -> conv2D, 3x3 kernel, C output channels -> BN -> ReLU -> conv2D, 3x3 kernel, Cx4 output channels -> BN -> feature addition (Add) -> ReLU -> output, with a shortcut branch: input -> conv2D, 1x1 kernel, Cx4 output channels -> BN -> Add.
Continuing Fig. 4A, the backbone is followed by a 2-D transposed-convolution layer (TransposeConv2d), a character feature extraction layer, an averaging dimension-reduction layer (ReduceMean), and the output layer (Output).
The deconvolution layer (2-D transposed convolution) enlarges the feature map to raise the upper limit on the length of the recognizable string; the feature map corresponds to multiple temporal feature regions (timesteps). In this embodiment it raises the output layer's timesteps from 15 to 30, so that the output of the subsequent character feature extraction layer has more timesteps, giving connectionist temporal classification (CTC) a timestep for the character at each horizontal position of the feature map.
To elaborate, the main purpose of the deconvolution layer is to raise the theoretical maximum length of the recognizable string.
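Continuing the backbone sketch above, one way this enlargement could look in PyTorch is shown below. The patent does not give the kernel and stride of its TransposeConv2d layer; the values here are assumptions chosen so that the 6x15 backbone map becomes the 13x30 map described later in the text:

```python
# Assumed configuration: kernel (3, 2), stride 2, no padding, so that
# H: (6-1)*2 + 3 = 13 and W: (15-1)*2 + 2 = 30.
deconv = nn.ConvTranspose2d(1024, 1024, kernel_size=(3, 2), stride=2)
enlarged = deconv(features)      # (1, 1024, 6, 15) -> (1, 1024, 13, 30)
```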
To elaborate, in sequence-processing models a timestep usually refers to a temporal feature region of the sequence (also called a time step), each corresponding to one element of the sequence. For example, in the sentence 「我喜歡吃蘋果」 ("I like to eat apples"), each character can be treated as one timestep, so the sentence has six timesteps.
Connectionist temporal classification (CTC) is a technique for handling sequence data, particularly suited to cases where the input and output lengths differ; since the output at each timestep (each region of the feature map) corresponds to one possible character, CTC is what decides which character each timestep should map to.
The deconvolution layer is followed by the character feature extraction layer, which, according to the multiple character classes, extracts a character feature matrix for each temporal feature region of the feature map. The matrix comprises at least output-channel-count information, vertical feature information, and horizontal feature information, where the output channel count is the number of character classes, the vertical feature information is the height of the temporal feature region, and the horizontal feature information is its width.
As shown in Fig. 4C, the character feature extraction layer of the network architecture is: input -> conv2D, 13x1 kernel, 1024 output channels -> ReLU -> conv2D, 1x30 kernel, 36 output channels -> ReLU -> output. This embodiment uses a 13x1 kernel and a 1x30 kernel to extract the vertically and horizontally oriented information, respectively. The output channel count is 36, corresponding to the 35 possible characters plus blank, the special character required when training with the CTC loss function, and the output matrix dimensions are 1x36x13x30.
The 13x1 kernel chiefly captures features along the vertical axis of the image, while the 1x30 kernel captures features along the horizontal axis.
An output channel count of 36 means the output holds 36 independent feature maps, each a different representation of the original input. Here the 36 channels correspond to the 35 possible characters plus one special character representing blank; the blank character is required by the CTC loss function and marks the separation between characters.
The output channel count is 36 because this embodiment uses 35 possible characters plus one special blank symbol; the 35 characters are those of the character set described above (digits, uppercase letters, and the dash), and may differ for other license plate systems.
The output matrix dimensions are 1x36x13x30. The first dimension (1) is the batch size, the number of images processed at once; in this case we process one image at a time.
The second dimension (36) is the channel count, corresponding, as above, to the 35 possible characters plus one blank character.
The third and fourth dimensions (13 and 30) are the height and width of the feature map: 13 distinct vertical positions and 30 distinct horizontal positions, so 13x30 = 390 positions in all, each represented by a 36-dimensional vector.
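A sketch of this Fig. 4C head follows, continuing the code above. The patent states the output shape (1x36x13x30) but not the padding scheme; stride-1 'same' padding is assumed here so the 13x30 grid is preserved, and the 1024 input channels are assumed to match the deconvolution output:

```python
char_head = nn.Sequential(
    nn.Conv2d(1024, 1024, kernel_size=(13, 1), padding='same'),  # vertical features
    nn.ReLU(),
    nn.Conv2d(1024, 36, kernel_size=(1, 30), padding='same'),    # 35 chars + blank
    nn.ReLU(),
)
char_features = char_head(enlarged)
print(char_features.shape)       # torch.Size([1, 36, 13, 30])
```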
The character feature extraction layer is followed by the averaging dimension-reduction layer (ReduceMean), which averages out all the vertical feature information (the height) of the character feature matrix and outputs a reduced-dimension character feature matrix.
To elaborate, the averaging dimension-reduction layer (ReduceMean) averages the previous layer's output over the third dimension (height), reducing its dimensionality; as the network's final output layer this yields a matrix of dimensions 1x36x30, where 36 corresponds to the possible character classes and 30 to the timesteps of the feature map.
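Continuing the sketch, this reduction is a single mean over the height axis:

```python
# ReduceMean: average over dim 2 (height) of the NCHW tensor,
# (1, 36, 13, 30) -> (1, 36, 30): one 36-way score vector per timestep.
logits = char_features.mean(dim=2)
print(logits.shape)              # torch.Size([1, 36, 30])
```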
The averaging dimension-reduction layer is followed by an output layer that processes the reduced-dimension character feature matrix with a connectionist temporal classification method; the output layer represents the raw result of the network's computation, which must then be converted into the corresponding string by decoding.
The loss function in this embodiment is connectionist temporal classification (CTC), and the network is trained with the Adam optimizer: initial learning rate 0.0001, learning rate decayed exponentially at a rate of 0.5 every 20 epochs, batch size 128 images, validation dataset evaluated every 2 epochs, 100 epochs in total; the weights with the highest validation accuracy during training are saved and become the final model weights.
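A hedged sketch of this training recipe is given below, continuing the PyTorch sketches above; `model` and `train_loader` are assumed names, the blank index (35, the last of the 36 classes) is an assumption, and the "decay of 0.5 every 20 epochs" is realized here with a step scheduler:

```python
criterion = nn.CTCLoss(blank=35)                 # assumed blank index
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(100):
    for images, targets, target_lengths in train_loader:        # batches of 128
        # model output (N, 36, 30) -> CTC expects (T=30, N, C=36) log-probs
        log_probs = model(images).permute(2, 0, 1).log_softmax(2)
        input_lengths = torch.full((images.size(0),), 30, dtype=torch.long)
        loss = criterion(log_probs, targets, input_lengths, target_lengths)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
    # every 2 epochs: run the validation set and keep the best-scoring weights
```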
Table 1 below gives a 4 x T matrix as an example of the analysis result information output by the output layer (the per-timestep feature values), and further illustrates how the network's 36x30 output matrix is decoded into the final output string. In this example the vertical axis lists the characters and the horizontal axis the timesteps; there are four possible character classes (A-C plus blank), T timesteps, ε denotes the blank character, and the numbers in the matrix are probabilities, each timestep corresponding to one column whose entries sum to 1 (the probabilities of all possibilities sum to 1). Table 1 shows the matrix values from the network's output layer (the analysis result information) before conversion to the final string, so decoding is still required to turn these results into the corresponding string.
The license plate recognition module 1213 is connected to the license plate data storage module 1211 and to the neural network model, and feeds the image file awaiting recognition into the model to output analysis result information (the per-timestep feature values).
The decoding output module 1214 is connected to the license plate recognition module 1213 and decodes the analysis result information with a decoding algorithm to obtain the recognized plate character content; the decoding algorithm is a computation following a fixed set of rules, for example a greedy algorithm or a beam search algorithm.
Through the decoding algorithm, the decoding output module 1214 recognizes each temporal feature region output by the output layer as one character, then removes repeated characters and blanks to obtain the recognized plate character content.
To elaborate, the decoding algorithm processes each timestep in order, taking the character corresponding to the maximum value as the character recognized for that timestep; after every timestep has been processed, the characters are concatenated, and consecutive repeated characters and blanks are removed to give the final output.
The decoding technique of this application can be explained with the example of Table 1. Step one: at each timestep, take the character with the highest probability; at the first timestep the highest probability is 0.7, corresponding to A, at the second timestep the most probable character is ε, and so on, recording the most probable character of every timestep. Step two: collapse runs of the same character into one, e.g. AAA is corrected to "A" and AABBBCC to "ABC". Step three: remove the blank characters to obtain the final result.
In the example of Table 1, the first three timesteps give the string AεC after step one, so the final result is "AC".
Taking the highest probability directly in step one is the characteristic of the greedy algorithm, while steps two and three follow the CTC rules and are the same under beam search.
Likewise, if the string obtained in step one is AεCCCεBB, step two collapses the consecutive repeated characters (AεCCCεBB => AεCεB), and step three removes the blank characters to give the final result (ACB).
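A minimal sketch of this greedy decoding follows. The probability values are illustrative, in the spirit of Table 1 (only the 0.7 for A at the first timestep is taken from the text; the rest are assumptions):

```python
import numpy as np

def greedy_ctc_decode(scores, charset, blank):
    """scores: (num_classes, T) matrix of per-timestep probabilities."""
    best = scores.argmax(axis=0)             # step 1: most probable class per timestep
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:     # steps 2-3: collapse repeats, drop blanks
            out.append(charset[idx])
        prev = idx
    return "".join(out)

# Three timesteps over the classes A, B, C, blank (each column sums to 1).
probs = np.array([[0.7, 0.1, 0.2],    # A
                  [0.1, 0.1, 0.1],    # B
                  [0.1, 0.2, 0.6],    # C
                  [0.1, 0.6, 0.1]])   # blank (ε)
print(greedy_ctc_decode(probs, "ABC", blank=3))   # step 1 gives AεC -> "AC"
```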
In this experimental example, the neural network was trained on 13,630 plate images and tested on 8,800 plate images, of which 2,300 were double-row plate images; accuracy was 98.04% on single-row plate images and 94.69% on double-row plate images.
The license plate recognition method of this application, shown in Fig. 5, proceeds as follows: (1) a server device stores at least one plate image file awaiting recognition and multiple plate sample image files, each sample file bearing an annotated region whose image content is an image character feature, the feature being a double row of characters (501); (2) the server device performs deep learning training on the annotated regions of the sample files to produce a neural network model (502); (3) the server device feeds the image file awaiting recognition into the neural network model to output analysis result information (503); (4) the server device decodes the analysis result information with a decoding algorithm to obtain the recognized plate character content (504), as sketched below.
Compared with other conventional techniques, the license plate recognition system and method provided by the present invention offer the following advantages:
(1) The invention uses a relatively deep neural network as the backbone. Through training, the network not only extracts character features from the image but also rearranges single-row and double-row image character features to some degree, so that the feature map lays the characters of the image out in order; by the final output layer they can be processed in the ordinary manner of single-line text recognition.
(2) The neural network used in the invention applies not only to single-row plates; it retains a high recognition capability for double-row plates.
(3) Unlike traditional methods for recognizing double-row plates, this application does not recognize the upper and lower rows separately but recognizes both rows at once, effectively reducing the time and cost of recognition.
The present invention has been disclosed through the embodiments above, which are not intended to limit it; anyone with ordinary skill in this technical field may, after understanding the foregoing technical features and embodiments, make minor changes and refinements without departing from the spirit and scope of the invention. The scope of patent protection of the present invention is therefore defined by the claims appended to this specification.