Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort shall fall within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for identifying a flipped image according to an embodiment of the present invention. The method is applied to a server and is executed by application software installed in the server.
As shown in FIG. 1, the method includes steps S10 to S50.
Step S10, acquiring sample images and generating first label information corresponding to each sample image, wherein the first label information is a multidimensional vector and at least comprises category information and real spectrum information of the sample images.
Image flipping refers to reproducing a document or scene by re-photographing it: for example, an image obtained by displaying an existing image on a display, or laser-printing it, and then capturing it again with an imaging device is a flipped (recaptured) image. In the quick-service restaurant industry, for example, an enterprise may schedule inspectors to patrol its stores and upload live photographs to a system as evidence. However, falsification often occurs: an inspector may flip an image from the screen of another electronic device and upload the flipped image to the system. Such flipping cheating not only causes direct economic loss to the enterprise but also feeds misleading information into its marketing management. Identifying whether an image uploaded by a store is flipped is therefore an urgent need of the quick-service industry, and the same need exists in other industries as well.
In this embodiment, a sample image is obtained by shooting with a terminal camera, from a local album, or from a server on the network side. The sample images comprise real images and flipped images, and each sample image is processed into label information expressed as a multidimensional vector for training a convolutional neural network model. The label information comprises the real category information and the real spectrum information of the sample image, where the category information indicates whether the sample image is a real image or a flipped image. Because real images and flipped images differ to some extent in lighting, environment, and the like, their frequency distributions and energy distributions on the spectrogram differ considerably, so the intrinsic frequency-domain characteristics of an image can be extracted through its spectrum information.
In one embodiment, as shown in FIG. 2, step S10 includes:
Step S11, acquiring a sample image, and carrying out graying processing on the sample image to obtain a gray-scale image of the sample image;
Step S12, carrying out Fourier transform on the gray-scale image according to a preset rule to obtain an optimal Fourier spectrogram;
Step S13, converting a frequency domain matrix corresponding to the optimal Fourier spectrogram into a multidimensional vector to obtain the real spectrum information of the sample image;
Step S14, obtaining the real category information of the sample image, and combining the real category information with the real spectrum information to obtain the first label information of the sample image.
In this embodiment, because real images and flipped images are captured under different environments and lighting, their frequency distributions and energy distributions on the spectrogram differ considerably. The spectrum information of real and flipped images is therefore extracted to produce the label information for training the convolutional neural network model.
First, an image contains three channels: a red channel, a green channel, and a blue channel, so each pixel in the image has three channel component values, the red (R), green (G), and blue (B) component values. Graying processing is applied to each pixel of a sample image, and the gray value of each pixel is obtained by the weighted-average formula Y = 0.299R + 0.587G + 0.114B, where Y is the gray component value of a pixel and R, G, and B are its red, green, and blue component values, thereby obtaining the gray-scale image of the sample image. After the gray-scale image of each sample image is obtained, Fourier transform is performed on it according to a preset rule to obtain the optimal Fourier spectrogram of the image. The Fourier transform formula is as follows:
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}

where f(x, y) represents an image matrix of size M×N, with x = 0, 1, 2, ..., M−1 and y = 0, 1, 2, ..., N−1; F(u, v) represents the frequency domain matrix after the discrete Fourier transform, with u = 0, 1, 2, ..., M−1 and v = 0, 1, 2, ..., N−1; e is the base of the natural logarithm; and j is the imaginary unit.
The frequency domain matrix corresponding to the obtained optimal Fourier spectrogram is converted into a multidimensional vector, which serves as the spectrum information of the sample image. Finally, the category information of the sample image is represented by a vector of a preset dimension and combined with the multidimensional spectrum vector to obtain the label information of the sample image. For example, the category information of a sample image is represented by a 2-dimensional vector and its spectrum information by a 100-dimensional vector; combining the two yields a 102-dimensional label vector representing the sample image.
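By way of illustration only, the label-making pipeline of steps S11 to S14 can be sketched in a few lines of Python/numpy. This is a minimal sketch rather than the claimed implementation: the shifted magnitude spectrum, the one-hot class encoding, and the helper normalize_and_resize (a hypothetical function, sketched after the embodiment of steps S121 to S122 below) are assumptions the description leaves open.

```python
import numpy as np

def make_first_label(image_rgb, is_real, spec_size=(10, 10)):
    """Build the first label information (a 102-dim vector) for one sample image."""
    # Step S11: graying by the weighted-average method, Y = 0.299R + 0.587G + 0.114B.
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b

    # Step S12: two-dimensional discrete Fourier transform of the gray-scale image.
    # Using the magnitude spectrum with the zero frequency shifted to the center
    # is an assumption; the description only specifies "a preset rule".
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))

    # Steps S121-S122 (hypothetical helper, sketched after that embodiment below):
    # normalize to [0, 1] and adjust to the preset 10x10 size specification.
    spectrum = normalize_and_resize(spectrum, spec_size)

    # Step S13: flatten the 10x10 frequency domain matrix into a 100-dim vector.
    spectrum_vec = spectrum.flatten()

    # Step S14: 2-dim category vector ([1, 0] = real, [0, 1] = flipped; the
    # one-hot encoding is an assumption), concatenated into 2 + 100 = 102 dims.
    class_vec = np.array([1.0, 0.0]) if is_real else np.array([0.0, 1.0])
    return np.concatenate([class_vec, spectrum_vec])
```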
In one embodiment, as shown in FIG. 3, step S12 includes:
Step S121, carrying out Fourier transform on the gray-scale image to obtain an initial Fourier spectrogram;
Step S122, carrying out normalization processing on the initial Fourier spectrogram, and adjusting the initial Fourier spectrogram according to a preset size specification to obtain the optimal Fourier spectrogram.
In this embodiment, the range of the initial Fourier spectrogram data obtained from the gray-scale image is relatively large, which is not conducive to training the convolutional neural network model. Therefore, the initial Fourier spectrogram needs to be normalized to narrow the data range. The data of the initial Fourier spectrogram are converted into the range 0-1 using maximum-minimum normalization, with the following formula:
X_norm = (X − X_min) / (X_max − X_min)

where X_norm is the normalized datum, X is an original datum in the initial Fourier spectrogram, and X_max and X_min are respectively the maximum and minimum values among all original data. After normalization, to facilitate the data representation of the initial Fourier spectrogram, its frequency domain matrix is adjusted to a preset size specification to obtain the optimal Fourier spectrogram. The preset size specification may be m×n, preferably 10×10.
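Under the same assumptions as above, the normalization and size adjustment of steps S121 and S122 may be sketched as follows. The min-max formula is the one given above; block averaging as the method of "adjusting" to the 10×10 specification is an assumption, since the description does not fix the resizing method.

```python
import numpy as np

def normalize_and_resize(spectrum, spec_size=(10, 10)):
    """Steps S121-S122: maximum-minimum normalization to [0, 1], then
    adjustment of the frequency domain matrix to the preset m x n size."""
    x_min, x_max = spectrum.min(), spectrum.max()
    norm = (spectrum - x_min) / (x_max - x_min + 1e-8)  # epsilon avoids division by zero

    # "Adjusting" to the preset size is read here as block averaging
    # (average pooling); other resizing methods would also fit the text.
    m, n = spec_size
    bh, bw = spectrum.shape[0] // m, spectrum.shape[1] // n
    trimmed = norm[: bh * m, : bw * n]                  # trim to a divisible shape
    return trimmed.reshape(m, bh, n, bw).mean(axis=(1, 3))
```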
Step S20, based on the sample image, carrying out model training on a preset convolutional neural network model to obtain theoretical label information of the sample image, wherein the theoretical label information at least comprises theoretical category information and theoretical spectrum information of the sample image;
Step S30, calculating the category loss and the spectrum loss by using a preset loss function based on the first label information and the theoretical label information, and optimizing the model parameters of the convolutional neural network model according to the category loss and the spectrum loss to obtain an optimal convolutional neural network model.
In this embodiment, to avoid excessive errors in the model parameters of the convolutional neural network model, the preset convolutional neural network model is trained with the sample images to obtain the theoretical label information of each sample image; then, based on the first (real) label information and the theoretical label information of the sample image, the category loss and the spectrum loss are calculated according to a loss function, and the model parameters are optimized accordingly. It should be noted that the convolutional neural network model outputs a multidimensional vector comprising the theoretical category information and the theoretical spectrum information. For example, in a 102-dimensional output vector, the first 2 dimensions store the theoretical category information of the image and the remaining 100 dimensions store its theoretical spectrum information.
In one embodiment, the network structure of the convolutional neural network model is shown in Table 1.

TABLE 1
Module                  Layers                                    Kernel   Pooling   Stride   Padding   Channels
Convolution module 1    2 convolution layers + 1 pooling layer    3×3      2×2       1        1         64
Convolution module 2    2 convolution layers + 1 pooling layer    3×3      2×2       2        1         128
Convolution module 3    2 convolution layers + 1 pooling layer    3×3      2×2       2        1         256
Convolution module 4    2 convolution layers + 1 pooling layer    3×3      2×2       2        1         512
Output module           2 fully-connected layers                  -        -         -        -         -
The convolutional neural network model comprises 4 convolution modules and 1 output module. Each convolution module comprises 2 convolution layers and 1 pooling layer, and the output module comprises 2 fully-connected layers. Each convolution layer uses a 3×3 convolution kernel, and each pooling layer uses a 2×2 pooling size. The stride of the convolution layers in the first convolution module is 1, and the stride of the convolution layers in the second, third, and fourth convolution modules is 2. The padding of each convolution layer is 1. The numbers of convolution layer channels of the first, second, third, and fourth convolution modules are 64, 128, 256, and 512, respectively. It should be noted that the structure of the convolutional neural network model can be designed as needed.
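A possible PyTorch rendering of this architecture is sketched below, for illustration only. The fixed values (3×3 kernels, 2×2 pooling, strides, padding, channel counts, two fully-connected layers, 102 output dimensions) follow the description above; the ReLU activation, the application of the module stride to its first convolution only, the hidden width of the first fully-connected layer, and the placement of the batch normalization and Dropout layers (anticipating the two paragraphs that follow) are assumptions.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch, stride):
    """One convolution module of Table 1: two 3x3 convolution layers
    (padding 1) and one 2x2 pooling layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Dropout(0.25),  # discarding layer after the pooling layer (see below)
    )

class FlipNet(nn.Module):
    """4 convolution modules (64/128/256/512 channels) plus an output
    module of 2 fully-connected layers producing the 102-dim label vector."""
    def __init__(self, num_outputs=102):
        super().__init__()
        self.features = nn.Sequential(
            conv_module(3, 64, stride=1),    # first module: stride 1
            conv_module(64, 128, stride=2),  # second to fourth modules: stride 2
            conv_module(128, 256, stride=2),
            conv_module(256, 512, stride=2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256),              # hidden width 256 is an assumption
            nn.ReLU(inplace=True),
            nn.Dropout(0.25),                # discarding layer after the fully-connected layer
            nn.Linear(256, num_outputs),
        )

    def forward(self, x):
        return self.head(self.features(x))
```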
In this embodiment, to mitigate the shift in the distribution of intermediate-layer data of the convolutional neural network model during training, the output of each convolution layer is subjected to batch normalization processing. This accelerates training, improves model training precision, and also has a regularization effect. The calculation formulas for batch normalization are as follows:
\mu = \frac{1}{m} \sum_{i=1}^{m} x_i, \quad \sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2, \quad x'_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}}, \quad y_i = \gamma x'_i + \beta
where x_i is the i-th input datum, m is the number of input data per batch, μ is the mean, σ² is the variance, ε is a very small number, x'_i is the normalized datum, γ is the scale factor, β is the offset (γ and β are parameters to be learned), and y_i is the batch-normalized datum.
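A minimal numpy sketch of these formulas follows, for one batch at training time; the running statistics that a deployed batch normalization layer keeps for inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization exactly as in the formulas above."""
    mu = x.mean(axis=0)                    # mean over the m inputs of the batch
    var = x.var(axis=0)                    # variance sigma^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized data x'_i
    return gamma * x_hat + beta            # y_i = gamma * x'_i + beta

batch = np.random.randn(32, 64)            # m = 32 inputs, 64 features each
out = batch_norm(batch, gamma=np.ones(64), beta=np.zeros(64))
```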
In an embodiment, since the output of the model comprises two parts, category information and spectrum information, the loss function likewise comprises two parts, a category loss and a spectrum loss: the category loss uses cross entropy, and the spectrum loss uses a squared loss function. The initial network parameters are optimized through the loss function. When the convolutional neural network model is trained, a sample image is input into the model to obtain its theoretical label information; the loss between the theoretical label information and the real label information of the sample image is then calculated according to the loss function, and the model parameters are optimized accordingly. The loss function formula is as follows:
L = -\frac{1}{N} \sum_{i=1}^{N} \left[ c_i \log p_i + (1 - c_i) \log(1 - p_i) \right] + \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \hat{y}_{ij})^2

where N is the number of sample images, M is the vector dimension of the real spectrum information, p_i is the probability output by the model that the i-th sample image is a real image, c_i takes the value 1 if the sample image is a real image and 0 otherwise, y_{ij} is the real spectrum information corresponding to the sample image, and \hat{y}_{ij} is the theoretical spectrum information output by the model.
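A sketch of this two-part loss in PyTorch, assuming the two terms are equally weighted (the description does not specify a weighting) and that the 102-dim output is split as described above:

```python
import torch
import torch.nn.functional as F

def flip_loss(output, class_index, spectrum_target):
    """Category loss (cross entropy) plus spectrum loss (squared loss)
    over the 102-dim model output."""
    class_logits = output[:, :2]      # theoretical category information
    spectrum_pred = output[:, 2:]     # theoretical spectrum information
    # class_index: 0 = real, 1 = flipped, matching the one-hot labels above.
    category_loss = F.cross_entropy(class_logits, class_index)   # c_i / p_i term
    spectrum_loss = F.mse_loss(spectrum_pred, spectrum_target)   # (y - y_hat)^2 term
    return category_loss + spectrum_loss
```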
Further, to avoid overfitting of the convolutional neural network model, a Dropout layer (discarding layer) is added after the pooling layers and the fully-connected layer, and the parameter of the Dropout layer is customized according to actual needs. For example, with the Dropout layer parameter set to 0.25, each neuron of that layer has a 25% probability of being discarded (deactivated) and not participating in training at each training iteration.
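As a small illustration of this behavior (not part of the claimed method), note that PyTorch's nn.Dropout also rescales the surviving activations by 1/(1 − p) during training, so no adjustment is needed at inference:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.25)  # each neuron is deactivated with probability 0.25
drop.train()               # dropout is only active in training mode
x = torch.ones(8)
print(drop(x))             # about a quarter of the entries are zeroed; survivors
                           # are rescaled by 1/(1 - 0.25) (inverted dropout)
drop.eval()
print(drop(x))             # at inference the layer is an identity mapping
```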
Step S40, inputting a detection image into the convolutional neural network model, and calculating second label information of the detection image.
In this embodiment, after the optimal convolutional neural network model is determined through the sample images, the convolutional neural network model, and the loss function of the above embodiments, the label information of a detection image is calculated by the convolutional neural network model with the determined optimal model parameters.
Step S50, judging whether the detection image is a flipped image according to the category information of the second label information.
In this embodiment, in the second label information output by the convolutional neural network model, the multidimensional vector representing the category information is identified to determine whether the detection image is a real image or a flipped image.
For example, the convolutional neural network model is preset to output label information as a 102-dimensional vector, in which the first 2 dimensions represent the category information: the first position represents the probability that the image is a real image, and the second position represents the probability that the image is a flipped image. If the value of the first position is larger than that of the second position, the image is judged to be a real image; otherwise, it is judged to be a flipped image.
In one embodiment, as shown in FIG. 4, step S50 includes:
Step S51, identifying the category information of the second label information, wherein the category information comprises a first probability that the image is a real image and a second probability that the image is a flipped image;
Step S52, judging whether the first probability is larger than the second probability;
Step S531, if the first probability is larger than the second probability, judging that the detection image is a real image;
Step S532, if the first probability is smaller than the second probability, judging that the detection image is a flipped image.
In this embodiment, the category information representing the image category in the second label information of the detection image is identified; the category information may be a real-image probability (i.e., the first probability) and a flipped-image probability (i.e., the second probability), and the detection image is determined to be a real image or a flipped image by comparing the two. Specifically, if the real-image probability is larger than the flipped-image probability, the detection image is judged to be a real image; if the real-image probability is smaller than the flipped-image probability, the detection image is judged to be a flipped image.
For example, if the probability representing a real image in the category information of the current detection image is 1 and the probability representing a flipped image is 0, the current detection image is a real image; if the probability representing a real image is 0 and the probability representing a flipped image is 1, the current detection image is a flipped image.
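Steps S40 to S50 can be sketched end to end as follows, again as an illustration only. Applying a softmax to the two category dimensions is an assumption; the raw outputs could equally be compared directly, and the comparison result is the same either way.

```python
import torch

def judge(model, image_tensor):
    """Steps S40-S50: compute the second label information of a detection
    image and compare the first (real) and second (flipped) probabilities."""
    model.eval()
    with torch.no_grad():
        label = model(image_tensor.unsqueeze(0))[0]   # 102-dim second label information
    p_real, p_flip = torch.softmax(label[:2], dim=0)  # first / second probability
    return "real image" if p_real > p_flip else "flipped image"
```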
The above method ensures the accuracy of the convolutional neural network model in identifying flipped images and improves the generalization of the model.
The embodiment of the invention further provides an apparatus for identifying a flipped image, which is used to execute any embodiment of the foregoing method for identifying a flipped image. Specifically, referring to FIG. 5, FIG. 5 is a schematic block diagram of a flipped image identification apparatus according to an embodiment of the present invention. The apparatus 100 for identifying a flipped image may be configured in a server.
As shown in FIG. 5, the apparatus 100 for identifying a flipped image includes an acquisition unit 110, a first calculation unit 120, an adjustment unit 130, a second calculation unit 140, and a judgment unit 150.
The acquisition unit 110 is configured to acquire sample images and generate first label information corresponding to each sample image, where the first label information includes at least the category information and the real spectrum information of the sample image.
The first calculation unit 120 is configured to perform model training on a preset convolutional neural network model based on the sample images to obtain theoretical label information of the sample images, where the theoretical label information includes at least the theoretical category information and the theoretical spectrum information of the sample image.
The adjustment unit 130 is configured to calculate the category loss and the spectrum loss based on the first label information and the theoretical label information by using a preset loss function, and to optimize the model parameters of the convolutional neural network model according to the category loss and the spectrum loss to obtain the optimal convolutional neural network model.
The second calculation unit 140 is configured to input a detection image into the convolutional neural network model and calculate the second label information of the detection image.
The judgment unit 150 is configured to judge whether the detection image is a flipped image according to the category information of the second label information.
In one embodiment, as shown in FIG. 6, the acquisition unit 110 includes:
An obtaining subunit 111, configured to obtain a sample image, and perform graying processing on the sample image to obtain a gray-scale image of the sample image;
A transformation subunit 112, configured to perform Fourier transform on the gray-scale image according to a preset rule to obtain an optimal Fourier spectrogram;
A conversion subunit 113, configured to convert the frequency domain matrix corresponding to the optimal Fourier spectrogram into a multidimensional vector to obtain the real spectrum information of the sample image;
And a combining subunit 114, configured to obtain the real category information of the sample image, and combine the real category information with the real spectrum information to obtain the first label information of the sample image.
In one embodiment, as shown in FIG. 7, the transformation subunit 112 includes:
A Fourier transform subunit 1121, configured to perform Fourier transform on the gray-scale image to obtain an initial Fourier spectrogram;
And a normalization subunit 1122, configured to normalize the initial Fourier spectrogram and adjust it according to a preset size specification to obtain the optimal Fourier spectrogram.
In one embodiment, as shown in FIG. 8, the judgment unit 150 includes:
An identifying subunit 151, configured to identify the category information of the second label information, where the category information includes a first probability that the image is a real image and a second probability that the image is a flipped image;
And a determining subunit 152, configured to determine whether the first probability is greater than the second probability, determine that the detection image is a real image if the first probability is greater than the second probability, and determine that the detection image is a flipped image if the first probability is less than the second probability.
The specific content of the embodiments of the apparatus for identifying a flipped image corresponds to that of the embodiments of the method for identifying a flipped image described above; for details, reference may be made to the foregoing description, which is not repeated here.
In another embodiment of the present invention, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for identifying a flipped image as described above when executing the computer program.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: acquiring sample images and generating first label information corresponding to each sample image; carrying out model training on a preset convolutional neural network model based on the sample images to obtain theoretical label information of the sample images; calculating the category loss and the spectrum loss by using a preset loss function based on the first label information and the theoretical label information, and optimizing the model parameters of the convolutional neural network model according to the category loss and the spectrum loss to obtain an optimal convolutional neural network model; inputting a detection image into the convolutional neural network model and calculating the second label information of the detection image; and judging whether the detection image is a flipped image according to the category information of the second label information.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.