Disclosure of Invention
The invention aims to solve the technical problem of meter-end digit identification by providing a Cortex-M3-based method built on a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
The invention is realized by the following technical scheme: a Cortex-M3-based meter-end digit identification method comprises a water meter reader which is arranged at the upper end of a water meter and used for identifying images of the water meter dial. The water meter reader adopts a Cortex-M3 processor, and the processor runs a network model based on a deep learning convolutional network algorithm; by optimizing the depth and width of the deep learning network, the model can run on the Cortex-M3 processor, thereby building a meter-end digit identification algorithm.
Preferably, the network model is a small network designed for 128 KB of RAM and is named DigitalNet; it mainly uses group convolution and channel shuffling techniques.
As a preferable technical solution, the basic unit of DigitalNet is improved on the basis of a residual unit: the dense 1x1 convolution is replaced by a 1x1 group convolution, a channel shuffle operation is added after the first 1x1 convolution, and a 3x3 depthwise convolution follows;
for the stride-2 residual unit, a 3x3 average pooling with stride 2 is applied to the original input, so that a feature map with the same size as the output is obtained, and the obtained feature map is then concatenated with the output instead of being added to it.
As a preferred technical solution, the whole network structure of DigitalNet is composed of three basic units. The input picture size is 64 × 48, and there are three stages in total; each stage is divided into two parts: the first part is a feature-map-halving stage with stride 2, and the second part consists of residual modules with group convolution, which can be stacked continuously, with precision increasing with depth. The last modules are global max pooling and a fully connected layer, which output the final recognition result.
As a preferable technical solution, the network training method of the DigitalNet network model adopts data enhancement; the data enhancement adopts an online enhancement method comprising five techniques: Gaussian enhancement, erosion enhancement, dilation enhancement, scaling enhancement and mixup data enhancement.
As a preferred technical solution:
Gaussian enhancement: the template coefficients of the Gaussian filter decrease as the distance from the template center increases, so the Gaussian filter blurs the image less than a mean filter; Gaussian enhancement is used to enhance the compatibility of the algorithm with different fonts;
erosion enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ANDed with the binary image it covers; if both are 1, the resulting pixel is 1, otherwise 0; erosion enhancement increases the compatibility of the model with variations in font stroke thickness;
dilation enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ORed with the binary image it covers; if both are 0, the resulting pixel is 0, otherwise 1; dilation enhancement increases the compatibility of the model with variations in font stroke thickness;
scaling enhancement: the image is scaled using bilinear interpolation; scaling enhancement increases the compatibility of the model with variations in digit size;
mixup data enhancement: mixup regularizes the network by encouraging simple linear behavior between training samples.
The invention has the following beneficial effects: the invention innovatively provides a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
In the description of the present invention, it is to be understood that the terms "one end", "the other end", "outside", "upper", "inside", "horizontal", "coaxial", "central", "end", "length", "outer end", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the present invention.
Further, in the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The use of terms such as "upper," "above," "lower," "below," and the like in describing relative spatial positions herein is for the purpose of facilitating description to describe one element or feature's relationship to another element or feature as illustrated in the figures. The spatially relative positional terms may be intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the exemplary term "below" can encompass both an orientation of above and below. The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
In the present invention, unless otherwise explicitly specified or limited, the terms "disposed," "sleeved," "connected," "penetrating," "plugged," and the like are to be construed broadly, e.g., as a fixed connection, a detachable connection, or an integral part; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
As shown in figure 1, the Cortex-M3-based meter-end digit recognition method comprises a water meter reader which is arranged at the upper end of a water meter and used for recognizing images of the water meter dial. The water meter reader adopts a Cortex-M3 processor, which runs a network model based on a deep learning convolutional network algorithm; by optimizing the depth and width of the deep learning network, the model is built so that it can run on the Cortex-M3 processor.
In view of the characteristics of the Cortex-M3 device, the present invention designs a small network for 128 KB of RAM, named DigitalNet.
The design goal of DigitalNet is to achieve the best model accuracy with limited computational resources, which requires a good balance between speed and accuracy. At present there are two main approaches to reducing CNN model cost: compact model structure design and compression of a trained model. Here we use a dedicated network structure to obtain a small, fast model, rather than compressing or distilling a large trained model. DigitalNet takes ShuffleNet as its blueprint and, like ShuffleNet, maintains accuracy while greatly reducing the model's computation and size. DigitalNet mainly uses group convolution and channel shuffling techniques.
The standard convolution technique: if the size of the input feature map is C × H × W and the number of convolution kernels is N, the number of output feature maps is N (the same as the number of kernels), the size of each convolution kernel is C × K × K, and the total number of parameters of the N kernels is N × C × K × K. The connection mode of the input and output maps is shown in FIG. 1: each output feature of the convolutional network is related to all input features.
Group convolution technique: group convolution, as the name implies, groups the input feature maps, and each group is then convolved separately. Assuming the size of the input feature maps is C × H × W and the number of output feature maps is N, if they are divided into G groups, the number of input feature maps per group is C/G and the number of output feature maps per group is N/G. The size of each convolution kernel is C/G × K × K; the total number of kernels is still N, with N/G kernels per group, and each kernel is convolved only with the input maps of its own group. The total number of parameters is N × C × K × K / G, i.e. reduced to 1/G of the original; the connection mode is shown in fig. 2.
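As an illustrative sketch (not part of the claimed method), the parameter arithmetic above can be checked numerically; the function below, with hypothetical example values for C, N, K and G, computes the weight count of a standard and a grouped convolution.

```python
def conv_params(c_in, n_out, k, groups=1):
    """Weight count of a (possibly grouped) k x k convolution.

    Each of the n_out kernels sees only c_in / groups input maps,
    so the total is n_out * (c_in / groups) * k * k.
    """
    assert c_in % groups == 0 and n_out % groups == 0
    return n_out * (c_in // groups) * k * k

# Example: C = 64 input maps, N = 128 kernels, 3x3 kernels.
dense = conv_params(64, 128, 3)              # standard convolution
grouped = conv_params(64, 128, 3, groups=4)  # split into G = 4 groups

print(dense, grouped, dense // grouped)      # grouped uses 1/G of the parameters
```

With G = 4 the grouped layer holds exactly a quarter of the dense layer's weights, matching the 1/G reduction stated in the text.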
Channel shuffling technique: in small networks we can use group convolution instead of ordinary convolution, which greatly reduces the model size and the amount of computation. However, if multiple group convolutions are stacked together, a given output channel is derived from only a small fraction of the input channels, as shown in FIG. 3(a); this property reduces the information flow between channel groups and weakens the representation capacity. If we let each group convolution receive input data from different groups, i.e. the effect shown in fig. 3(b), then the input and output channels become fully related. Specifically, for the output channels of the previous layer, we can perform a shuffle operation, as shown in fig. 3(c), and then divide the output channels into several groups for input to the next layer. In this way, we can reduce the model size and the amount of computation while maintaining accuracy.
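As a minimal sketch of the shuffle operation just described (the function name and list-of-channel-indices representation are illustrative, not from the source): reshape the channels to (groups, n/groups), transpose, and flatten, so consecutive output channels come from different groups.

```python
def channel_shuffle(channels, groups):
    """Pure-Python channel shuffle: reshape to (groups, n // groups),
    transpose, flatten."""
    n = len(channels)
    assert n % groups == 0
    per_group = n // groups
    # After the shuffle, adjacent output channels originate from
    # different groups, so the next group convolution mixes groups.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

print(channel_shuffle([0, 1, 2, 3, 4, 5], 3))  # -> [0, 2, 4, 1, 3, 5]
```

No channel is lost or duplicated; only the ordering changes, which is why the operation is free in terms of parameters.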
The basic unit of DigitalNet is improved on the basis of a residual unit. As shown in fig. 4(a), a residual unit comprises 3 layers: first a 1x1 convolution, then a 3x3 depthwise convolution, where the 3x3 convolution is the bottleneck layer, followed by another 1x1 convolution, and finally a shortcut connection that adds the input directly to the output. The following modifications are now made: the dense 1x1 convolution is replaced by a 1x1 group convolution, a channel shuffle operation is added after the first 1x1 convolution, and a 3x3 depthwise convolution follows. The modification is shown in fig. 4(b).
For the residual unit, if stride is 1 the input and output shapes are consistent and can be added directly; when stride is 2, the number of channels increases and the feature map size decreases, so the input and output do not match. Typically a 1x1 convolution would be used to map the input to the same shape as the output. However, DigitalNet adopts a different strategy, as shown in fig. 4(c): a 3x3 average pooling with stride 2 is applied to the original input, yielding a feature map with the same size as the output, and the resulting feature map is then concatenated (concat) with the output instead of being added to it. The purpose of this is mainly to reduce the amount of computation and the number of parameters.
The overall structure of DigitalNet is made up of the above three basic units, as shown in table 1. The input picture size is 64 × 48, and there are three stages in total. Each stage is divided into two parts: the first part is a feature-map-halving stage with stride 2, and the second part consists of residual modules with group convolution, which can be stacked continuously, with precision increasing with depth. The final modules are global max pooling and a fully connected layer, which output the final recognition result.
Table 1. DigitalNet overall network architecture.
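As a small illustrative check (the function is hypothetical, not part of the patent), the 64 × 48 input and the three stride-2 halving stages imply the feature-map sizes below:

```python
def stage_sizes(h, w, num_stages):
    """Each stage starts with a stride-2 layer that halves the feature map."""
    sizes = [(h, w)]
    for _ in range(num_stages):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(stage_sizes(64, 48, 3))  # -> [(64, 48), (32, 24), (16, 12), (8, 6)]
```

The 8 × 6 map after stage 3 is what the global max pooling collapses before the fully connected layer.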
The network training combines data enhancement with a training method. Data enhancement is an effective way to expand the scale of the data samples. Deep learning is a big-data-based method: the larger the scale and the higher the quality of the data, the better the recognition result and generalization ability of the model.
However, when data is actually collected it is often difficult to cover all scenes, such as lighting conditions, the attitude of the actual object, the shooting angle, etc. We can address this problem by applying data enhancement to existing data. Data enhancement has two modes: offline enhancement and online enhancement. Because online enhancement yields more data than offline enhancement and does not occupy hard disk space, online enhancement is adopted here, combining Gaussian enhancement, erosion enhancement, dilation enhancement, scaling enhancement and mixup data enhancement.
Gaussian enhancement: the gaussian filter is a linear filter, and can effectively suppress noise and smooth an image. The coefficients of the template of the gaussian filter decrease with increasing distance from the center of the template. Therefore, the gaussian filter has a smaller degree of image blurring than the mean filter, and gaussian enhancement is used to enhance the compatibility of the algorithm with different fonts.
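To illustrate the coefficient fall-off just described (the helper below is a sketch, not taken from the source), a normalized Gaussian template can be built directly from the squared distance to the center:

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized size x size Gaussian template; coefficients decrease
    with squared distance from the template center."""
    c = size // 2
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]

k = gaussian_kernel()
print(k[1][1] > k[0][1] > k[0][0])  # center > edge > corner -> True
```

A mean filter would weight all nine positions equally, which is why it blurs more than the Gaussian template.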
Erosion enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ANDed with the binary image it covers; if both are 1, the resulting pixel is 1, otherwise 0. Erosion enhancement increases the compatibility of the model with variations in font stroke thickness.
Dilation enhancement: each pixel of the image is scanned with a 3x3 structuring element, which is ORed with the binary image it covers; if both are 0, the resulting pixel is 0, otherwise 1. Dilation enhancement increases the compatibility of the model with variations in font stroke thickness.
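The two morphological rules above can be sketched in one pure-Python routine (a simplified illustration with zero padding at the border; the function and image are hypothetical examples, not the patent's implementation):

```python
def morph(img, op):
    """3x3 binary erosion ('and') / dilation ('or') with zero padding,
    following the per-pixel rule described above."""
    h, w = len(img), len(img[0])
    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = min(win) if op == "and" else max(win)
    return out

# A 3x3 block of ones: erosion keeps only the center, dilation grows it.
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(sum(map(sum, morph(img, "and"))))  # erosion thins the stroke -> 1
print(sum(map(sum, morph(img, "or"))))   # dilation thickens it -> 25
```

Erosion thins digit strokes and dilation thickens them, which is exactly the font-weight variation the two enhancements simulate.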
Scaling enhancement: the image is scaled using bilinear interpolation. Scaling enhancement mainly increases the compatibility of the model with variations in digit size.
Mixup data enhancement: mixup regularizes the network by encouraging simple linear behavior between training samples; the mixup data enhancement formula is as follows:
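The formula itself is absent from the source text; the standard mixup formulation, which the description matches, is:

```latex
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha)
```

where (x_i, y_i) and (x_j, y_j) are two randomly drawn training samples and λ is sampled per mixed pair.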
the training method comprises the following steps:
the batch _ size of the training parameter is 1000, the learning rate is 0.1, and the training can be completed by running 40 epochs, which is about 20 hours.
Table 2. Network model test results.
During training it was found that some digits were easy to identify while digits at other positions were easily misidentified. As shown in fig. 5, the first '8', disturbed by light, was easily identified as '6'; the second '8.5', disturbed by light, was easily identified as '9.5'; the third '6', disturbed by light, was easily identified as '5'; and the fourth and fifth '1.5', disturbed by dust, were easily identified as '7.5'.
To solve this problem, the loss is changed from the Softmax cross-entropy loss to the Focal Loss. The Focal Loss automatically performs hard-example mining, adjusting the weight of the loss according to each sample's difficulty; the Focal Loss formula is as follows, where α is 0.25 and γ is 2.
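The formula is not reproduced in the source text; the standard Focal Loss, consistent with the stated α = 0.25 and γ = 2, is:

```latex
\mathrm{FL}(p_t) = -\alpha \, (1 - p_t)^{\gamma} \log(p_t), \qquad \alpha = 0.25, \ \gamma = 2
```

where p_t is the model's estimated probability for the true class; the factor (1 − p_t)^γ down-weights easy samples, so hard samples dominate the gradient.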
As shown in fig. 6, a standard data set of about 50 million pictures of 20 commercially available meter models was collected and divided into a test set (60%), a training set (20%), and a validation set (20%).
Using the validation set as the evaluation standard, the influence of augmentation and loss on identification precision was tested separately. The error rate (Pw) is the proportion of samples identified incorrectly with an identification confidence above 0.5 among all samples, and the success rate (Pr) is the proportion of samples identified correctly with a confidence above 0.5 among all samples.
We can see that data enhancement reduces the error rate and strengthens the generalization capability of the model, but also lowers the recognition success rate. The automatic hard-example mining of the Focal Loss reduces the error rate and improves the recognition success rate at the same time, and the DigitalNet recognition precision meets practical application requirements.
The invention innovatively provides a deep learning network model, which can greatly reduce the cost of a collector or meter reader and greatly helps the design and popularization of water meter, gas meter and heat meter data collection.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that can be conceived without inventive effort should be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.