Disclosure of Invention
The invention aims to solve the technical problem of meter-end digit identification by providing a Cortex-M3-based method built on a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
The invention is realized by the following technical scheme: a Cortex-M3-based meter-end digit identification method comprises a water meter reader which is arranged at the upper end of a water meter and used for identifying images of the water meter dial. The water meter reader adopts a Cortex-M3 processor, and the processor runs a network model based on a deep learning convolutional network algorithm; by optimizing the depth and width of the deep learning network, the model can run on the Cortex-M3 processor, thereby building a meter-end digit identification algorithm.
Preferably, the network model is a small network designed for 128 KB of RAM and is named DigitalNet; it mainly uses group convolution and channel shuffling techniques.
As a preferable technical solution, the basic unit of DigitalNet is improved on the basis of a residual unit: the dense 1x1 convolution is replaced by a 1x1 group convolution, a channel shuffle operation is added after the first 1x1 convolution, and a 3x3 depthwise convolution follows;
for the stride-2 residual unit, a 3x3 average pooling with stride 2 is applied to the original input, so that a feature map with the same size as the output is obtained, and the obtained feature map is then concatenated with the output instead of being added to it.
As a preferred technical solution, the whole network structure of DigitalNet is composed of three basic units. The input picture size is 64 × 48, and there are three stages in total; each stage is divided into two parts: the first part is a feature-map-halving stage with stride 2, and the second part consists of residual modules with group convolution, which can be stacked continuously, with precision increasing with depth. The last modules are global max pooling and a fully connected layer, which output the final recognition result.
As a preferable technical solution, the network training method of the DigitalNet network model adopts data enhancement; the data enhancement adopts an online enhancement method comprising five techniques: Gaussian enhancement, erosion enhancement, dilation enhancement, scaling enhancement and mixup data enhancement.
As a preferred technical solution:
Gaussian enhancement: the template coefficients of the Gaussian filter decrease as the distance from the template center increases, so the Gaussian filter blurs the image less than a mean filter; Gaussian enhancement is used to enhance the compatibility of the algorithm with different fonts;
erosion enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ANDed with the binary image it covers; if both are 1, the resulting pixel is 1, otherwise 0; erosion enhancement increases the compatibility of the model with variations in font stroke thickness;
dilation enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ORed with the binary image it covers; if both are 0, the resulting pixel is 0, otherwise 1; dilation enhancement increases the compatibility of the model with variations in font stroke thickness;
scaling enhancement: the image is scaled using bilinear interpolation; scaling enhancement increases the compatibility of the model with variations in digit size;
mixup data enhancement: mixup regularizes the network by encouraging simple linear behavior between training samples.
The invention has the following beneficial effects: the invention innovatively provides a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
In the description of the present invention, it is to be understood that the terms "one end", "the other end", "outside", "upper", "inside", "horizontal", "coaxial", "central", "end", "length", "outer end", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the present invention.
Further, in the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The use of terms such as "upper," "above," "lower," "below," and the like in describing relative spatial positions herein is for the purpose of facilitating description to describe one element or feature's relationship to another element or feature as illustrated in the figures. The spatially relative positional terms may be intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the exemplary term "below" can encompass both an orientation of above and below. The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
In the present invention, unless otherwise explicitly specified or limited, the terms "disposed," "sleeved," "connected," "penetrating," "plugged," and the like are to be construed broadly, e.g., as a fixed connection, a detachable connection, or an integral part; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
As shown in figure 1, the Cortex-M3-based meter-end digit recognition method comprises a water meter reader which is arranged at the upper end of a water meter and used for recognizing images of the water meter dial. The water meter reader adopts a Cortex-M3 processor, which runs a network model based on a deep learning convolutional network algorithm; by optimizing the depth and width of the deep learning network, the model is built so that it can run on the Cortex-M3 processor.
In view of the characteristics of the Cortex-M3 device, the present invention designs a small network for 128 KB of RAM, named DigitalNet.
The design goal of DigitalNet is to achieve the best model accuracy with limited computational resources, which requires a good balance between speed and accuracy. At present there are two main approaches to reducing CNN model cost: compact model structure design and compression of a trained model. Here we use a dedicated network structure to obtain a small, fast model, rather than compressing or distilling a large trained model. DigitalNet takes ShuffleNet as its blueprint and, like ShuffleNet, maintains accuracy while greatly reducing the model's computation and size. DigitalNet mainly uses group convolution and channel shuffling techniques.
The standard convolution technique: if the size of the input feature map is C × H × W and the number of convolution kernels is N, the number of output feature maps is N (the same as the number of kernels), the size of each convolution kernel is C × K × K, and the total number of parameters of the N kernels is N × C × K × K. The connection mode of the input and output maps is shown in FIG. 1: each output feature of the convolutional network is related to all input features.
Group convolution technique: group convolution, as the name implies, groups the input feature maps, and each group is then convolved separately. Assuming the size of the input feature maps is C × H × W and the number of output feature maps is N, if they are divided into G groups, the number of input feature maps per group is C/G and the number of output feature maps per group is N/G. The size of each convolution kernel is C/G × K × K; the total number of kernels is still N, with N/G kernels per group, and each kernel is convolved only with the input maps of its own group. The total number of parameters is N × C × K × K / G, i.e. reduced to 1/G of the original; the connection mode is shown in fig. 2.
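As an illustrative sketch (not part of the claimed method), the parameter arithmetic above can be checked numerically; the function below, with hypothetical example values for C, N, K and G, computes the weight count of a standard and a grouped convolution.

```python
def conv_params(c_in, n_out, k, groups=1):
    """Weight count of a (possibly grouped) k x k convolution.

    Each of the n_out kernels sees only c_in / groups input maps,
    so the total is n_out * (c_in / groups) * k * k.
    """
    assert c_in % groups == 0 and n_out % groups == 0
    return n_out * (c_in // groups) * k * k

# Example: C = 64 input maps, N = 128 kernels, 3x3 kernels.
dense = conv_params(64, 128, 3)              # standard convolution
grouped = conv_params(64, 128, 3, groups=4)  # split into G = 4 groups

print(dense, grouped, dense // grouped)      # grouped uses 1/G of the parameters
```

With G = 4 the grouped layer holds exactly a quarter of the dense layer's weights, matching the 1/G reduction stated in the text.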
Channel shuffling technique: in small networks we can use group convolution instead of ordinary convolution, which greatly reduces the model size and the amount of computation. However, if multiple group convolutions are stacked together, a given output channel is derived from only a small fraction of the input channels, as shown in FIG. 3(a); this property reduces the information flow between channel groups and weakens the representation capacity. If we let each group convolution receive input data from different groups, i.e. the effect shown in fig. 3(b), then the input and output channels become fully related. Specifically, for the output channels of the previous layer, we can perform a shuffle operation, as shown in fig. 3(c), and then divide the output channels into several groups for input to the next layer. In this way, we can reduce the model size and the amount of computation while maintaining accuracy.
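As a minimal sketch of the shuffle operation just described (the function name and list-of-channel-indices representation are illustrative, not from the source): reshape the channels to (groups, n/groups), transpose, and flatten, so consecutive output channels come from different groups.

```python
def channel_shuffle(channels, groups):
    """Pure-Python channel shuffle: reshape to (groups, n // groups),
    transpose, flatten."""
    n = len(channels)
    assert n % groups == 0
    per_group = n // groups
    # After the shuffle, adjacent output channels originate from
    # different groups, so the next group convolution mixes groups.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

print(channel_shuffle([0, 1, 2, 3, 4, 5], 3))  # -> [0, 2, 4, 1, 3, 5]
```

No channel is lost or duplicated; only the ordering changes, which is why the operation is free in terms of parameters.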
The basic unit of DigitalNet is improved on the basis of a residual unit. As shown in fig. 4(a), a residual unit comprises 3 layers: first a 1x1 convolution, then a 3x3 depthwise convolution, where the 3x3 convolution is the bottleneck layer, followed by another 1x1 convolution, and finally a shortcut connection that adds the input directly to the output. The following modifications are now made: the dense 1x1 convolution is replaced by a 1x1 group convolution, a channel shuffle operation is added after the first 1x1 convolution, and a 3x3 depthwise convolution follows. The modification is shown in fig. 4(b).
For the residual unit, if stride is 1 the input and output shapes are consistent and can be added directly; when stride is 2, the number of channels increases and the feature map size decreases, so the input and output do not match. Typically a 1x1 convolution would be used to map the input to the same shape as the output. However, DigitalNet adopts a different strategy, as shown in fig. 4(c): a 3x3 average pooling with stride 2 is applied to the original input, yielding a feature map with the same size as the output, and the resulting feature map is then concatenated (concat) with the output instead of being added to it. The purpose of this is mainly to reduce the amount of computation and the number of parameters.
The overall structure of DigitalNet is made up of the above three basic units, as shown in table 1. The input picture size is 64 × 48, and there are three stages in total. Each stage is divided into two parts: the first part is a feature-map-halving stage with stride 2, and the second part consists of residual modules with group convolution, which can be stacked continuously, with precision increasing with depth. The final modules are global max pooling and a fully connected layer, which output the final recognition result.
Table 1. DigitalNet overall network architecture.
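As a small illustrative check (the function is hypothetical, not part of the patent), the 64 × 48 input and the three stride-2 halving stages imply the feature-map sizes below:

```python
def stage_sizes(h, w, num_stages):
    """Each stage starts with a stride-2 layer that halves the feature map."""
    sizes = [(h, w)]
    for _ in range(num_stages):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(stage_sizes(64, 48, 3))  # -> [(64, 48), (32, 24), (16, 12), (8, 6)]
```

The 8 × 6 map after stage 3 is what the global max pooling collapses before the fully connected layer.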
The network training combines data enhancement with a training method. Data enhancement is an effective way to expand the scale of the data samples. Deep learning is a big-data-based method: the larger the scale and the higher the quality of the data, the better the recognition result and generalization ability of the model.
However, when data is actually collected it is often difficult to cover all scenes, such as lighting conditions, the attitude of the actual object, the shooting angle, etc. We can address this problem by applying data enhancement to existing data. Data enhancement has two modes: offline enhancement and online enhancement. Because online enhancement yields more data than offline enhancement and does not occupy hard disk space, online enhancement is adopted here, combining Gaussian enhancement, erosion enhancement, dilation enhancement, scaling enhancement and mixup data enhancement.
Gaussian enhancement: the gaussian filter is a linear filter, and can effectively suppress noise and smooth an image. The coefficients of the template of the gaussian filter decrease with increasing distance from the center of the template. Therefore, the gaussian filter has a smaller degree of image blurring than the mean filter, and gaussian enhancement is used to enhance the compatibility of the algorithm with different fonts.
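To illustrate the coefficient fall-off just described (the helper below is a sketch, not taken from the source), a normalized Gaussian template can be built directly from the squared distance to the center:

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized size x size Gaussian template; coefficients decrease
    with squared distance from the template center."""
    c = size // 2
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]

k = gaussian_kernel()
print(k[1][1] > k[0][1] > k[0][0])  # center > edge > corner -> True
```

A mean filter would weight all nine positions equally, which is why it blurs more than the Gaussian template.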
Erosion enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ANDed with the binary image it covers; if both are 1, the resulting pixel is 1, otherwise 0. Erosion enhancement increases the compatibility of the model with variations in font stroke thickness.
Dilation enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ORed with the binary image it covers; if both are 0, the resulting pixel is 0, otherwise 1. Dilation enhancement increases the compatibility of the model with variations in font stroke thickness.
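The two morphological rules above can be sketched in one pure-Python routine (a simplified illustration with zero padding at the border; the function and image are hypothetical examples, not the patent's implementation):

```python
def morph(img, op):
    """3x3 binary erosion ('and') / dilation ('or') with zero padding,
    following the per-pixel rule described above."""
    h, w = len(img), len(img[0])
    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = min(win) if op == "and" else max(win)
    return out

# A 3x3 block of ones: erosion keeps only the center, dilation grows it.
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(sum(map(sum, morph(img, "and"))))  # erosion thins the stroke -> 1
print(sum(map(sum, morph(img, "or"))))   # dilation thickens it -> 25
```

Erosion thins digit strokes and dilation thickens them, which is exactly the font-weight variation the two enhancements simulate.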
Scaling enhancement: the image is scaled using bilinear interpolation. Scaling enhancement mainly increases the compatibility of the model with variations in digit size.
Mixup data enhancement: mixup regularizes the network by encouraging simple linear behavior between training samples; the mixup data enhancement formula is as follows:
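The formula itself is absent from the source text; the standard mixup formulation, which the description matches, is:

```latex
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha)
```

where (x_i, y_i) and (x_j, y_j) are two randomly drawn training samples and λ is sampled per mixed pair.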
the training method comprises the following steps:
the batch _ size of the training parameter is 1000, the learning rate is 0.1, and the training can be completed by running 40 epochs, which is about 20 hours.
Table 2. Network model test results.
During training it was found that some digits were easy to identify while digits at other positions were easily misidentified. As shown in fig. 5, the first '8', disturbed by light, was easily identified as '6'; the second '8.5', disturbed by light, was easily identified as '9.5'; the third '6', disturbed by light, was easily identified as '5'; and the fourth and fifth '1.5', disturbed by dust, were easily identified as '7.5'.
To solve this problem, the loss is changed from the Softmax cross-entropy loss to the Focal Loss. The Focal Loss automatically performs hard-example mining, adjusting the weight of the loss according to each sample's difficulty; the Focal Loss formula is as follows, where α is 0.25 and γ is 2.
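The formula is not reproduced in the source text; the standard Focal Loss, consistent with the stated α = 0.25 and γ = 2, is:

```latex
\mathrm{FL}(p_t) = -\alpha \, (1 - p_t)^{\gamma} \log(p_t), \qquad \alpha = 0.25, \ \gamma = 2
```

where p_t is the model's estimated probability for the true class; the factor (1 − p_t)^γ down-weights easy samples, so hard samples dominate the gradient.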
As shown in fig. 6, a standard data set of about 50 million pictures of 20 commercially available meter models was collected and divided into a test set (60%), a training set (20%), and a validation set (20%).
Using the validation set as the evaluation standard, the influence of augmentation and loss on identification precision was tested separately. The error rate (Pw) is the proportion of samples identified incorrectly with an identification confidence above 0.5 among all samples, and the success rate (Pr) is the proportion of samples identified correctly with a confidence above 0.5 among all samples.
We can see that data enhancement reduces the error rate and strengthens the generalization capability of the model, but also lowers the recognition success rate. The automatic hard-example mining of the Focal Loss reduces the error rate and improves the recognition success rate at the same time, and the DigitalNet recognition precision meets practical application requirements.
The invention innovatively provides a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that can be conceived without inventive effort should be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.