WO2017166586A1 - Picture identification method, system and electronic device based on a convolutional neural network - Google Patents
- Publication number
- WO2017166586A1 (application PCT/CN2016/096031)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- convolution
- layers
- pooling
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Definitions
- The invention relates to the field of convolutional neural network technology, and in particular to a picture identification method, system and electronic device based on a convolutional neural network.
- CNN: Convolutional Neural Network.
- The basic structure of a convolutional neural network includes a plurality of convolutional layers, each provided with a plurality of neurons. The input of each neuron is connected to a local receptive field of the previous convolutional layer, and the features of that local receptive field are extracted by convolving its data. Once a local feature is extracted, its positional relationship to other features is also determined. Feature information is then obtained through local averaging (also known as pooling) and secondary feature extraction in a feature-mapping stage, and is output to the next convolutional layer for further processing until the last layer (the output layer) produces the final output.
- Feature mapping usually uses the sigmoid function as the activation function of the convolutional neural network.
- In a convolutional neural network, the neurons of one convolutional layer share weights with the other neurons in the same layer, which reduces the number of free parameters of the network.
- An activation function can be applied to each output data value to determine whether a threshold is reached; the resulting value is then used as an input to the next convolutional layer.
- A convolutional neural network calculation model used for identification includes convolutional layers, pooling layers, fully connected layers and a subsequent classifier. By training on existing sample data, a better calculation model can be obtained. To identify a new target, the target data only needs to be input into the calculation model.
- When an existing convolutional neural network calculation model is used for target identification, the calculation usually follows an existing fixed model architecture such as AlexNet, VGG or GoogLeNet, in which the parameters and architecture of the convolutional layers, pooling layers, fully connected layers and activation functions are already fixed.
- Although these models are versatile, they give poor recognition results when applied to specific scenarios. For example, they perform poorly at identifying pornographic content in videos or pictures.
- the object of the present invention is to provide a picture identification method and system based on convolutional neural network, which can greatly improve the speed and accuracy of picture authentication.
- a method for discriminating a picture based on a convolutional neural network comprising:
- the picture data to be identified is input into at least two convolution layers connected in series for continuous feature extraction, to obtain the extracted feature data of the picture, wherein the kernel sizes of the at least two convolution layers are no more than 5 × 5;
- the two-dimensional feature values are classified by a classifier to obtain a discrimination result of the picture.
- The at least two convolution layers connected in series comprise four convolution layers C1, C2, C3 and C4 connected in sequence, with kernel sizes of 3 × 3 for C1, 3 × 3 for C2, 5 × 5 for C3 and 5 × 5 for C4.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The step of repeatedly performing dimension reduction and feature extraction on the extracted feature data through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, includes:
- the pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0;
- the convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; the convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The at least one fully connected layer consists of the fully connected layers fc9, fc10, fc11 and fc12 connected in sequence, with 2048, 2048, 2048 and 2 nodes respectively; all fully connected layers use the dropout method for data processing.
- The picture data to be identified is processed in sequence by the convolution layers C1, C2, C3 and C4, the pooling layer P4, the convolution layer C5, the pooling layer P5, the convolution layers C6, C7 and C8, the pooling layer P8 and the fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the SVM classifier for classification to obtain the identification result of the picture.
- all of the convolutional layers and all of the fully connected layers perform activation processing of data by using an activation function LEAKY RELU.
- the invention also provides a picture identification system based on convolutional neural network, comprising:
- a data extraction module configured to input the picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, obtain the extracted feature data of the picture, and send it to the data dimension reduction module, wherein the kernel sizes of the at least two convolution layers are no more than 5 × 5;
- a data dimension reduction module configured to receive the feature data sent by the data extraction module, repeatedly perform dimension reduction and feature extraction on it through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture, and send that data to the full connection module, wherein the pooling layer uses average pooling;
- a full connection module configured to receive the dimension-reduced feature data sent by the data dimension reduction module, input it into at least one fully connected layer to obtain a two-dimensional feature value of the picture data, and send that value to the classification module;
- a classification module configured to receive the two-dimensional feature value sent by the full connection module and classify it with a classifier to obtain the picture identification result.
- the data extraction module includes:
- the kernel size of the C1 layer is 3 × 3
- the kernel size of the C2 layer is 3 × 3
- the kernel size of the C3 layer is 5 × 5
- the kernel size of the C4 layer is 5 × 5.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of the C1 and C2 layers are both 1, and the pad values of the C3 and C4 layers are both 2.
- the data dimension reduction module includes:
- a pooling layer P4, a convolution layer C5, a pooling layer P5, a convolution layer C6, a convolution layer C7, a convolution layer C8 and a pooling layer P8 connected in sequence; wherein the pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; the convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and the convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- the fully connected module includes:
- The system includes the convolution layers C1, C2, C3 and C4, the pooling layer P4, the convolution layer C5, the pooling layer P5, the convolution layers C6, C7 and C8, the pooling layer P8 and the fully connected layers fc9, fc10, fc11 and fc12 connected in sequence, followed by the SVM classifier, which performs the classification to obtain the identification result of the picture.
- all of the convolutional layers and all of the fully connected layers perform activation processing of data by using an activation function LEAKY RELU.
- Embodiments of the invention further disclose an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: input the picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, to obtain the extracted feature data of the picture; repeatedly perform dimension reduction and feature extraction on the extracted feature data through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the picture identification result.
- the at least two convolution layers connected in series comprise four convolution layers C1, C2, C3 and C4 connected in sequence, and the kernel sizes of the convolution layers are respectively:
- the kernel size of the C1 layer is 3 × 3
- the kernel size of the C2 layer is 3 × 3
- the kernel size of the C3 layer is 5 × 5
- the kernel size of the C4 layer is 5 × 5.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of the C1 and C2 layers are both 1, and the pad values of the C3 and C4 layers are both 2.
- the step of repeatedly performing dimension reduction and feature extraction on the extracted feature data through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, includes: passing the extracted feature data through the sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein the pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; the convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and the convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- the at least one fully connected layer consists of the fully connected layers fc9, fc10, fc11 and fc12 connected in sequence, with 2048, 2048, 2048 and 2 nodes respectively; all fully connected layers use the dropout method for data processing.
- the picture data to be identified passes through the convolution layer C1, the convolution layer C2, the convolution layer C3, the convolution layer C4, the pooling layer P4, the convolution layer C5, and the pooling layer P5.
- The present invention also discloses a non-volatile computer storage medium storing computer-executable instructions that, when executed by an electronic device, cause the electronic device to: input the picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, to obtain the extracted feature data of the picture; repeatedly perform dimension reduction and feature extraction on the extracted feature data through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the picture identification result.
- in the above storage medium, the at least two convolution layers connected in series comprise four convolution layers C1, C2, C3 and C4 connected in sequence, and the kernel sizes of the convolution layers are respectively:
- the kernel size of the C1 layer is 3 × 3
- the kernel size of the C2 layer is 3 × 3
- the kernel size of the C3 layer is 5 × 5
- the kernel size of the C4 layer is 5 × 5.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of the C1 and C2 layers are both 1, and the pad values of the C3 and C4 layers are both 2.
- the step of repeatedly performing dimension reduction and feature extraction on the extracted feature data through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, includes: passing the extracted feature data through the sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein the pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; the convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and the convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- the at least one fully connected layer consists of the fully connected layers fc9, fc10, fc11 and fc12 connected in sequence, with 2048, 2048, 2048 and 2 nodes respectively; all fully connected layers use the dropout method for data processing.
- the picture data to be identified passes through the convolution layer C1, the convolution layer C2, the convolution layer C3, the convolution layer C4, the pooling layer P4, the convolution layer C5, and the pooling layer P5.
- Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of any of the above.
- The convolutional neural network based picture identification method and system provided by the embodiments of the present invention connect a plurality of small-window convolution layers (that is, convolution layers with small kernels) in sequence, so that local features of the picture are extracted quickly and combined quickly into higher-level features, which can greatly improve the speed and efficiency of picture identification.
- The method and system also use average pooling and fully connected layers so that the final output for the picture data has two features, which makes the classifier both faster and more accurate when it performs the classification.
- FIG. 1 is a flowchart of an embodiment of a method for discriminating a picture based on a convolutional neural network according to the present invention
- FIG. 2 is a schematic structural diagram of a convolutional neural network calculation model provided by the present invention.
- FIG. 3 is a schematic structural diagram of an embodiment of a convolutional neural network-based picture authentication system according to the present invention.
- FIG. 4 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention.
- connection or integral connection; may be mechanical connection or electrical connection; may be directly connected, may also be indirectly connected through an intermediate medium, or may be internal communication of two components, may be wireless connection, or may be wired connection.
- Referring to FIG. 1, a flow chart of an embodiment of the convolutional neural network based picture identification method provided by the present invention is shown.
- the method for image identification based on a convolutional neural network includes:
- Step 101: Input the picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, to obtain the extracted feature data of the picture, wherein the kernel sizes of the at least two convolution layers are preferably no more than 5 × 5;
- the picture data to be identified may be direct picture data information, or may be picture information acquired in the video, that is, the method according to the present invention is also applicable to video identification.
- the convolution layer is used to extract the local block features of the input picture data to obtain a higher level of feature data, and multiple convolution operations are performed in each convolution layer.
- The kernel of a convolution layer has an n × n structure (m × n may also be used). The smaller the kernel, the more features can be extracted, but the amount of resulting feature data is correspondingly larger.
- Step 102: The extracted feature data of the picture is repeatedly subjected to dimension reduction and feature extraction through at least one pooling layer and at least one convolution layer, to obtain the dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling.
- The pooling layer performs dimension reduction on the feature data output by the convolution layer, that is, it greatly reduces the amount of data while preserving the validity of the data.
- "Repeatedly" here refers to repeating the pooling or convolution process, for example: pooling layer - convolution layer - pooling layer - convolution layer. It is of course also possible for several pooling layers or several convolution layers to appear consecutively at some point in the middle.
- Average pooling means taking the average of the data within the window of the pooling kernel as the pooled output.
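The averaging described above can be illustrated with a minimal pure-Python sketch. The function name `average_pool_2d` and the 5 × 5 example input are hypothetical; the kernel size of 3 and stride of 2 follow the values the text gives for the pooling layers, and no padding is applied.

```python
def average_pool_2d(feature_map, kernel=3, stride=2):
    """Average-pool a 2-D list of numbers with a square window and no padding."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - kernel + 1, stride):
        row = []
        for left in range(0, width - kernel + 1, stride):
            # Collect every value covered by the pooling window.
            window = [feature_map[top + i][left + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 5 x 5 input whose entry at (r, c) is r * 5 + c.
example = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = average_pool_2d(example)  # 2 x 2 result
```

Each output value is the mean of nine inputs, so a 5 × 5 map shrinks to 2 × 2 while every input still contributes to some output.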
- Step 103: Input the dimension-reduced feature data of the picture into at least one fully connected layer to obtain a two-dimensional feature value of the picture data.
- The last fully connected layer outputs two-dimensional feature data, which makes classification and identification more accurate.
- Step 104: Classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- The convolutional neural network based picture identification method connects a plurality of small-window convolution layers (that is, convolution layers with small kernels) in sequence, so that the local features of the picture are extracted better and faster and are combined quickly into higher-level features, which can greatly improve the speed and efficiency of picture identification.
- The method and system according to the present invention also use average pooling and fully connected layers so that the final output for the picture data has two features, which makes the classifier both faster and more accurate when it performs the classification.
- The at least two convolution layers connected in series comprise four convolution layers C1, C2, C3 and C4 connected in sequence, with kernel sizes of 3 × 3 for C1, 3 × 3 for C2, 5 × 5 for C3 and 5 × 5 for C4.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of the C1 and C2 layers are both 1, and the pad values of the C3 and C4 layers are both 2.
- The stride of a convolution layer is the distance its kernel moves at each step. The pad value indicates whether rings of extra data are added around the input data to take part in the operation, and its magnitude is the number of rings added. In this way, the processing efficiency and speed of the convolution layers can be further improved, thereby improving the efficiency of picture identification.
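The effect of kernel size, stride and pad on the spatial size of a layer's output can be checked with the standard output-size formula. The sketch below is illustrative only: the function name `output_size` and the input width of 64 are hypothetical, and rounding down is an assumption (some frameworks round up for pooling layers).

```python
def output_size(input_size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer (rounding down)."""
    return (input_size + 2 * pad - kernel) // stride + 1


width = 64  # hypothetical input width; the text does not state one

# Kernel 3 with pad 1 and kernel 5 with pad 2, both at stride 1,
# leave the spatial size unchanged.
after_3x3 = output_size(width, kernel=3, stride=1, pad=1)
after_5x5 = output_size(width, kernel=5, stride=1, pad=2)

# A pooling layer with kernel 3, stride 2 and pad 0 roughly halves it.
after_pool = output_size(width, kernel=3, stride=2, pad=0)
```

With these settings the four convolution layers C1 to C4 preserve the resolution of their input, so all spatial reduction is left to the pooling layers.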
- Step 102, in which the extracted feature data of the picture is repeatedly subjected to dimension reduction and feature extraction through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data, includes: passing the extracted feature data through the sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein the pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; the convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and the convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
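As a rough illustration of what these kernel sizes and kernel counts imply, the sketch below counts the trainable parameters of each convolution layer. It is not taken from the source: it assumes a 3-channel (RGB) input to C1, ordinary (ungrouped) convolutions and one bias per kernel, and the names `CONV_LAYERS` and `conv_parameter_count` are hypothetical.

```python
# (name, kernel size, number of kernels) as listed in the text.
CONV_LAYERS = [
    ("C1", 3, 96), ("C2", 3, 96), ("C3", 5, 96), ("C4", 5, 96),
    ("C5", 5, 256), ("C6", 3, 384), ("C7", 3, 384), ("C8", 3, 256),
]


def conv_parameter_count(kernel, in_channels, out_channels):
    """Weights plus one bias per kernel for an ungrouped 2-D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


counts = {}
in_channels = 3  # assumption: RGB input; the text does not state the input depth
for name, kernel, out_channels in CONV_LAYERS:
    counts[name] = conv_parameter_count(kernel, in_channels, out_channels)
    in_channels = out_channels  # pooling layers do not change the channel count

total_conv_parameters = sum(counts.values())
```

Under these assumptions the 5 × 5 layers C3 to C5 and the wide 3 × 3 layers C6 to C8 dominate the convolutional parameter budget, while C1 is comparatively tiny.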
- The at least one fully connected layer consists of the fully connected layers fc9, fc10, fc11 and fc12 connected in sequence, with 2048, 2048, 2048 and 2 nodes respectively; all fully connected layers use the dropout method for data processing.
- The number of nodes can also be understood as the number of features.
- The dropout method randomly keeps a certain portion of the data active and discards the rest, which effectively prevents over-fitting and thereby improves the efficiency of identification.
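A minimal sketch of dropout applied to one layer's activations during training is given below. The rescaling of the kept values by `1 / keep_prob` (so-called inverted dropout) is an assumption that the source does not describe, and the function name and sample input are hypothetical.

```python
import random


def dropout(values, keep_prob=0.5, rng=None):
    """Randomly keep each value with probability keep_prob and zero the rest.

    Kept values are divided by keep_prob (inverted dropout, an assumption)
    so that the expected sum of the activations is unchanged.
    """
    rng = rng or random.Random()
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0] * 1000                       # hypothetical layer output
dropped = dropout(activations, keep_prob=0.5, rng=random.Random(0))
kept = sum(1 for v in dropped if v != 0.0)       # roughly half of the entries
```

At inference time dropout is normally switched off, so the full set of activations is passed on unchanged.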
- FIG. 2 is a schematic structural diagram of a convolutional neural network calculation model provided by the present invention.
- The picture data to be identified is processed in sequence by the convolution layers C1, C2, C3 and C4, the pooling layer P4, the convolution layer C5, the pooling layer P5, the convolution layers C6, C7 and C8, the pooling layer P8 and the fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the SVM classifier for classification to obtain the picture identification result.
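The layer sequence above can be traced as a series of tensor shapes. The sketch below is only an illustration under stated assumptions: the 224 × 224 RGB input is hypothetical (the text gives no input size), output sizes are rounded down, and the names `LAYERS`, `FC_NODES` and `trace_shapes` are invented for this example.

```python
LAYERS = [
    # (name, kind, kernel, stride, pad, number of kernels or None)
    ("C1", "conv", 3, 1, 1, 96),
    ("C2", "conv", 3, 1, 1, 96),
    ("C3", "conv", 5, 1, 2, 96),
    ("C4", "conv", 5, 1, 2, 96),
    ("P4", "pool", 3, 2, 0, None),
    ("C5", "conv", 5, 1, 2, 256),
    ("P5", "pool", 3, 2, 0, None),
    ("C6", "conv", 3, 1, 1, 384),
    ("C7", "conv", 3, 1, 1, 384),
    ("C8", "conv", 3, 1, 1, 256),
    ("P8", "pool", 3, 2, 0, None),
]
FC_NODES = [("fc9", 2048), ("fc10", 2048), ("fc11", 2048), ("fc12", 2)]


def trace_shapes(height, width, channels=3):
    """Return (layer name, output shape) pairs for the whole network."""
    shapes = []
    for name, kind, kernel, stride, pad, out_channels in LAYERS:
        height = (height + 2 * pad - kernel) // stride + 1
        width = (width + 2 * pad - kernel) // stride + 1
        if kind == "conv":
            channels = out_channels  # pooling keeps the channel count
        shapes.append((name, (channels, height, width)))
    for name, nodes in FC_NODES:
        shapes.append((name, (nodes,)))
    return shapes


shapes = trace_shapes(224, 224)  # hypothetical 224 x 224 RGB input
```

The final entry, the two-node output of fc12, is the two-dimensional feature value that is handed to the SVM classifier.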
- All of the convolutional layers and the fully connected layer fc9, the fully connected layer fc10, and the fully connected layer fc11 are processed by the activation function LEAKY RELU, so that the data of the previous layer can be transferred to the next layer.
- The activation function computes a new output from the output data of the previous layer, and this new output is used as the input data of the next layer.
- the invention makes it more suitable for the identification of the binary problem by selecting the classifier SVM (Support Vector Machine).
- Compared with the conventional activation function RELU, the activation function LEAKY RELU used in the present invention still produces an output when the function value is less than zero, so that data in the region below zero can also take part in training.
- In that region the output is the value multiplied by a coefficient a, which is preferably a fixed value.
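The difference between the two activation functions can be written out directly. In this sketch the default `a=0.01` is the coefficient value the text gives for training; the function names are hypothetical.

```python
def relu(x):
    """Conventional RELU: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, a=0.01):
    """LEAKY RELU: negative inputs are scaled by the fixed coefficient a."""
    return x if x > 0 else a * x


positive = leaky_relu(2.0)       # positive inputs pass through unchanged
negative = leaky_relu(-3.0)      # a small negative value instead of zero
clamped = relu(-3.0)             # the conventional RELU output for comparison
```

Because the negative branch has a non-zero slope, its gradient does not vanish, which is why such inputs can still contribute to training.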
- all of the convolutional layers and all of the fully connected layers perform activation processing of data using an activation function LEAKY RELU.
- The last fully connected layer may not require an activation function. In this way, data transfer can be made more efficient.
- In the present invention, 100 hours of video are prepared as positive and negative training samples, and 1.1 million pictures are captured from the video: 500,000 positive training pictures, 500,000 negative training pictures, and 100,000 test pictures, of which 50,000 are positive and 50,000 are negative.
- the convolutional layer in the network is initialized with a Gaussian distribution with a standard deviation of 0.01.
- the coefficient a parameter of the LEAKY RELU function is 0.01.
- the parameters in the fully connected layer are initialized with a Gaussian distribution with a standard deviation of 0.002.
- the dropout module has a parameter of 0.5.
- the training process uses the back propagation algorithm (BP algorithm) to train and update the parameters. A total of 300,000 iterations are trained in the present invention.
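The training settings listed above can be gathered into one configuration, together with a sketch of the Gaussian initialization. A zero mean is an assumption (the text states only the standard deviations), and the names `TRAINING_CONFIG` and `gaussian_init` are hypothetical.

```python
import random

# Values stated in the text, collected under hypothetical key names.
TRAINING_CONFIG = {
    "conv_init_std": 0.01,      # Gaussian std for convolutional layers
    "fc_init_std": 0.002,       # Gaussian std for fully connected layers
    "leaky_relu_a": 0.01,       # coefficient a of LEAKY RELU
    "dropout": 0.5,             # dropout parameter
    "iterations": 300_000,      # back-propagation training iterations
}


def gaussian_init(count, std, rng):
    """Draw `count` weights from a zero-mean Gaussian (zero mean is assumed)."""
    return [rng.gauss(0.0, std) for _ in range(count)]


rng = random.Random(0)
# One 3 x 3 kernel bank for a hypothetical 3-channel input and 96 kernels.
conv_weights = gaussian_init(3 * 3 * 3 * 96, TRAINING_CONFIG["conv_init_std"], rng)
fc_weights = gaussian_init(2048 * 2, TRAINING_CONFIG["fc_init_std"], rng)
```

Small initial weights of this kind keep the early activations close to zero, which is a common starting point before back propagation begins updating the parameters.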
- FIG. 3 is a schematic structural diagram of an embodiment of a picture identification system based on a convolutional neural network according to the present invention.
- the convolutional neural network based picture authentication system includes:
- the data extraction module 201 is configured to input the picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, obtain the extracted feature data of the picture, and send it to the data dimension reduction module 202, wherein the kernel sizes of the at least two convolution layers are no more than 5 × 5;
- the data dimension reduction module 202 is configured to receive the feature data sent by the data extraction module 201, repeatedly perform dimension reduction and feature extraction on it through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture, and send that data to the full connection module 203, wherein the pooling layer uses average pooling;
- the full connection module 203 is configured to receive the dimension-reduced feature data sent by the data dimension reduction module 202, input it into at least one fully connected layer to obtain a two-dimensional feature value of the picture data, and send that value to the classification module 204;
- the classification module 204 is configured to receive the two-dimensional feature value sent by the full connection module 203 and classify it with a classifier to obtain the picture identification result.
- As the above embodiment shows, the picture identification system convolves the data in the data extraction module 201 and thereby extracts the features of the picture data, reduces the dimensionality of the features in the data dimension reduction module 202, obtains the two-dimensional feature value of the picture data in the fully connected module 203, and finally identifies the picture data in the classification module 204.
- By using convolution layers with small kernels, the picture identification system extracts feature data effectively, which not only improves the efficiency and speed of picture identification but also effectively prevents overfitting.
- The data extraction module 201 includes four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- The stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The data dimension reduction module 202 includes sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The fully connected module 203 includes sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- The system includes sequentially connected convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, followed by the classifier SVM, which performs classification to obtain the identification result of the picture.
- all of the convolutional layers and all of the fully connected layers perform activation processing of data using an activation function LEAKY RELU.
- As shown in FIG. 4, an embodiment of the present invention further discloses an electronic device including at least one processor 810 and a memory 800 communicably connected to the at least one processor 810; wherein the memory 800 stores instructions executable by the at least one processor 810, and the instructions, when executed by the at least one processor 810, enable the at least one processor 810 to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- the electronic device also includes an input device 830 and an output device 840 that are electrically coupled to the memory 800 and the processor, the electrical connections preferably being connected by a bus.
- Preferably, in the electronic device of this embodiment, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- Preferably, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- Preferably, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer comprises: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- Preferably, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- Preferably, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
- Preferably, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.
- Embodiments of the present invention also disclose a non-volatile computer storage medium, wherein the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- Preferably, in the storage medium of this embodiment, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- Preferably, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- Preferably, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer comprises: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- Preferably, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- Preferably, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
- all of the convolutional layers and all of the fully connected layers perform activation processing of data using an activation function LEAKY RELU.
- Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method described in the above embodiments.
- embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
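The present invention discloses a picture identification method, system and electronic device based on a convolutional neural network, including: inputting picture data into at least two convolution layers connected in series to extract features and obtain extracted feature data, wherein the kernel size of each convolution layer is no larger than 5×5; passing the extracted feature data through pooling layers and convolution layers to reduce its dimensionality and extract features, obtaining dimension-reduced feature data, wherein the pooling layers use average pooling; inputting the dimension-reduced feature data of the picture into fully connected layers to obtain a two-dimensional feature value of the picture data; and classifying the two-dimensional feature value with a classifier to obtain the identification result of the picture. The invention also discloses a picture identification system based on a convolutional neural network. By extracting feature data with small-kernel convolution layers, the method and system extract the local features of a picture better and faster, which improves the speed and efficiency of picture identification.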
Description
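Cross-Reference
This application claims priority to Chinese patent application No. 201610195777.7, filed with the Chinese Patent Office on March 30, 2016 and entitled "Picture Identification Method and System Based on Convolutional Neural Network", the entire contents of which are incorporated herein by reference.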
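The present invention relates to the technical field of convolutional neural networks, and in particular to a picture identification method, system and electronic device based on a convolutional neural network.
The convolutional neural network (CNN) is an efficient recognition method that has been developed in recent years and has attracted wide attention. CNNs have become a research focus in many scientific fields, especially pattern classification: because the network avoids complex image preprocessing and can take the original image as direct input, it has been applied ever more widely.
In general, the basic structure of a convolutional neural network includes multiple convolution layers, each containing multiple neurons. The input of each neuron is connected to a local receptive field of the previous convolution layer, and a convolution over the data in that local receptive field extracts its features; once a local feature has been extracted, its positional relationship to other features is fixed as well. Feature mapping is then performed through local averaging (also called pooling) and secondary feature extraction to obtain feature information, which is output to the next convolution layer for further processing until the last layer (the output layer) is reached and the final output is obtained. Feature mapping usually uses the sigmoid function as the activation function of the convolutional neural network. In a convolutional neural network, the neurons of a convolution layer share weights with the other neurons of the same layer, which reduces the number of free parameters of the network. In a convolutional neural network model, an activation function can be applied to each output data value to determine whether a threshold is reached, and the resulting data value serves as the input to the next convolution layer.
Typically, a convolutional neural network model used for recognition includes convolution layers, pooling layers, fully connected layers and a subsequent classifier. A good model can be obtained by training on existing sample data; when a new target needs to be recognized, its data only needs to be input into the model.
However, when existing convolutional neural network models are used for target identification, computation usually follows a fairly fixed existing architecture such as AlexNet, VGG or GoogLeNet. In these models the convolution layers, pooling layers, fully connected layers, activation functions and other parameters and structures are already fixed. Although this makes the models general-purpose, it also means that their recognition results are poor when they are applied to specific scenarios, for example when detecting pornographic content in videos or pictures.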
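Summary of the Invention
The object of the present invention is to provide a picture identification method and system based on a convolutional neural network that can greatly improve the speed and accuracy of picture identification.
To this end, the present invention provides a picture identification method based on a convolutional neural network, including:
inputting picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture, wherein the kernel sizes of the at least two convolution layers are all no larger than 5×5;
passing the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling;
inputting the dimension-reduced feature data of the picture into at least one fully connected layer to obtain a two-dimensional feature value of the picture data;
classifying the two-dimensional feature value with a classifier to obtain the identification result of the picture.
Optionally, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
Further, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
Optionally, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture includes:
passing the extracted feature data of the picture through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
Optionally, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
Optionally, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
Optionally, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.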
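The present invention also provides a picture identification system based on a convolutional neural network, including:
a data extraction module, configured to input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, obtain the extracted feature data of the picture, and send it to the data dimension reduction module; wherein the kernel sizes of the at least two convolution layers are all no larger than 5×5;
a data dimension reduction module, configured to receive the extracted feature data sent by the data extraction module, repeatedly reduce its dimensionality and extract features through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture, and send the dimension-reduced feature data to the fully connected module; wherein the pooling layer uses average pooling;
a fully connected module, configured to receive the dimension-reduced feature data sent by the data dimension reduction module, input it into at least one fully connected layer to obtain a two-dimensional feature value of the picture data, and send the two-dimensional feature value to the classification module;
a classification module, configured to receive the two-dimensional feature value sent by the fully connected module and classify it with a classifier to obtain the identification result of the picture.
Optionally, the data extraction module includes:
four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
Further, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
Optionally, the data dimension reduction module includes:
sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
Optionally, the fully connected module includes:
sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
Optionally, the system includes sequentially connected convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, followed by the classifier SVM, which performs classification to obtain the identification result of the picture.
Optionally, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.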
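An embodiment of the present invention further discloses an electronic device including at least one processor and a memory communicably connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
In the above electronic device, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
In the above electronic device, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
In the above electronic device, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer includes: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
In the above electronic device, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
In the above electronic device, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
In the above electronic device, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.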
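The present invention also discloses a non-volatile computer storage medium, wherein the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
In the above storage medium, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
In the above storage medium, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
In the above storage medium, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer includes: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
In the above storage medium, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
In the above storage medium, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
In the above storage medium, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.
An embodiment of the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the methods described above.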
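As can be seen from the above, the picture identification method and system provided by the embodiments of the present invention first connect several small-window convolution layers (convolution layers with small kernels), so that the local features of a picture are extracted better and faster and are quickly combined into high-level features, which greatly improves the speed and efficiency of picture identification.
In addition, by using average pooling and fully connected layers, the method and system reduce the picture data to a final output of 2 features, so that classification by the classifier is both faster and more accurate.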
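To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the picture identification method based on a convolutional neural network provided by the present invention;
FIG. 2 is a schematic structural diagram of the convolutional neural network model provided by the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of the picture identification system based on a convolutional neural network provided by the present invention.
FIG. 4 is a schematic diagram of the hardware structure of the electronic device in an embodiment of the present invention.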
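The technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on those shown in the drawings. They are used only to simplify the description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore cannot be understood as limiting the present invention. In addition, the terms "first", "second" and "third" are used for description only and cannot be understood as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "connected to" and "connected" are to be understood broadly. For example, a connection may be fixed, detachable or integral; mechanical or electrical; direct or indirect through an intermediate medium; internal communication between two elements; wireless or wired. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.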
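FIG. 1 is a flowchart of an embodiment of the picture identification method based on a convolutional neural network provided by the present invention. The method includes:
Step 101: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture, wherein, preferably, the kernel sizes of the at least two convolution layers are all no larger than 5×5;
Here, the picture data to be identified may be picture data supplied directly or picture information obtained from a video, so the method of the present invention applies equally to the identification of video. The convolution layers extract local block features from the input picture data to obtain higher-level feature data, and each convolution layer performs many convolution operations. The kernel of a convolution layer usually has an n×n structure (m×n is also possible); the smaller the kernel, the more features can be extracted, but the more feature data results.
Step 102: repeatedly reduce the dimensionality of, and extract features from, the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to obtain dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling;
Here, the pooling layer reduces the dimensionality of the feature data output by the convolution layers, that is, it greatly reduces the amount of data while preserving its validity. "Repeatedly" refers to repeating the pooling or convolution process, for example: pooling layer, convolution layer, pooling layer, convolution layer; of course, several pooling layers or convolution layers may also appear consecutively at some point in the sequence. Average pooling means that, following the pooling principle, the average of the data within the pooling kernel is taken as the pooled output.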
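The average pooling described above can be illustrated with a minimal sketch in plain Python. It assumes a single-channel feature map stored as a list of lists and no padding; the function name `avg_pool2d` and the toy 5×5 input are made up for illustration and are not taken from the patent.

```python
def avg_pool2d(x, kernel=3, stride=2):
    """Average pooling over a 2-D list of numbers (no padding)."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - kernel + 1, stride):
        row = []
        for j in range(0, w - kernel + 1, stride):
            # Collect every value inside the kernel-sized window.
            window = [x[i + di][j + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

# Toy 5x5 feature map holding the values 0..24 in row-major order.
feature_map = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
pooled = avg_pool2d(feature_map)  # kernel 3, stride 2: 5x5 shrinks to 2x2
```

With a kernel size of 3 and a stride of 2 (the values the document gives for P4, P5 and P8), each output value summarizes a 3×3 neighbourhood and the map roughly halves in each dimension.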
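Step 103: input the dimension-reduced feature data of the picture into at least one fully connected layer to obtain a two-dimensional feature value of the picture data;
No matter how many fully connected layers are used, the last fully connected layer outputs 2-dimensional feature data, which makes the subsequent classification more accurate.
Step 104: classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
As the above embodiment shows, the method connects several small-window convolution layers (convolution layers with small kernels) in sequence, so that the local features of a picture are extracted better and faster and are quickly combined into high-level features, which greatly improves the speed and efficiency of picture identification. In addition, by using average pooling and fully connected layers, the method and system of the present invention reduce the picture data to a final output of 2 features, so that classification by the classifier is both faster and more accurate.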
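To make the final classification step concrete, here is a small sketch of a two-class classifier operating on 2-dimensional feature values. The document names an SVM but does not specify its kernel or training procedure, so the linear model, the hinge-loss sub-gradient training, the hyperparameters and the toy data below are all assumptions for illustration.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss; labels are +1 / -1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # The regularisation term shrinks w on every step.
            w = [w[0] * (1 - lr * reg), w[1] * (1 - lr * reg)]
            if margin < 1:  # inside the margin or misclassified
                w = [w[0] + lr * y * x1, w[1] + lr * y * x2]
                b += lr * y
    return w, b

def predict(w, b, point):
    """Return +1 or -1 depending on the side of the separating line."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Made-up 2-D feature values standing in for the output of the last layer.
features = [(2.0, 0.5), (1.5, 0.2), (2.2, 0.8), (0.3, 1.9), (0.1, 2.4), (0.6, 1.7)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(features, labels)
```

Because the network already reduces each picture to two numbers, the classifier only has to place one separating line in a plane, which is consistent with the document's claim that classification becomes fast.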
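As a preferred embodiment of the present invention, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4. Connected in sequence in this way, the convolution layers extract the feature data of the picture more effectively and also reduce the number of parameters of the neural network model, which does much to improve identification speed and prevent overfitting.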
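The effect of kernel size on parameter count can be checked with a few lines of arithmetic. The sketch below assumes a 3-channel RGB input to C1 (the document does not state the input channel count) and counts one bias per output channel; the 11×11 comparison layer is hypothetical and only shows how quickly the count grows with kernel size.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

# (name, kernel size, input channels, output channels)
# Assumed: C1 receives a 3-channel RGB picture.
front_end = [
    ("C1", 3, 3, 96),
    ("C2", 3, 96, 96),
    ("C3", 5, 96, 96),
    ("C4", 5, 96, 96),
]
counts = {name: conv_params(k, cin, cout) for name, k, cin, cout in front_end}

# For comparison only: one hypothetical 11x11 layer with the same channel widths.
large_kernel = conv_params(11, 96, 96)

for name, n in counts.items():
    print(f"{name}: {n} parameters")
print(f"11x11 comparison layer: {large_kernel} parameters")
```

Under these assumptions a 96-to-96 layer needs 83,040 parameters at 3×3 and 230,496 at 5×5, against more than 1.1 million at 11×11.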
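As a further embodiment of the present invention, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2. Here, the stride of a convolution layer is the step by which its kernel moves each time, and the pad value indicates whether rings of data are added around the input data to take part in the computation, its magnitude being the number of rings added. This further improves the processing efficiency and speed of the convolution layers and thus the efficiency of picture identification.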
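The roles of stride and pad can be seen in a naive single-channel convolution. This is a sketch under stated assumptions: the padding is filled with zeros (the document does not say which values are added), the operation is the cross-correlation that CNN frameworks commonly call convolution, and the names `zero_pad` and `conv2d` are hypothetical.

```python
def zero_pad(x, pad):
    """Surround a 2-D list with `pad` rings of zeros."""
    width = len(x[0]) + 2 * pad
    top = [[0.0] * width for _ in range(pad)]
    bottom = [[0.0] * width for _ in range(pad)]
    body = [[0.0] * pad + list(row) + [0.0] * pad for row in x]
    return top + body + bottom

def conv2d(x, kernel, stride=1, pad=0):
    """Single-channel cross-correlation with a square kernel."""
    x = zero_pad(x, pad)
    k = len(kernel)
    out = []
    for i in range(0, len(x) - k + 1, stride):      # kernel moves `stride` rows
        row = []
        for j in range(0, len(x[0]) - k + 1, stride):
            row.append(sum(x[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

image = [[1.0] * 6 for _ in range(6)]
box3 = [[1.0] * 3 for _ in range(3)]
same = conv2d(image, box3, stride=1, pad=1)      # 6x6 in, 6x6 out
shrunk = conv2d(image, box3, stride=2, pad=0)    # 6x6 in, 2x2 out
```

With a stride of 1, a 3×3 kernel with pad 1 (and likewise a 5×5 kernel with pad 2) keeps the spatial size unchanged, which matches the pad values chosen for C1 to C4.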
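As a preferred embodiment of the present invention, step 102 of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture includes: passing the extracted feature data of the picture through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.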
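As a rough check on how these settings shrink the data, the sketch below traces the feature-map shape through layers C1 to P8. It makes two assumptions that the document does not state: the output size is rounded down (some frameworks round pooling sizes up), and the input is a 3-channel 224×224 picture, a size chosen purely for illustration.

```python
def out_size(size, kernel, stride, pad):
    """Output width/height of a convolution or pooling window (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad, output channels; None means pooling keeps channels)
LAYERS = [
    ("C1", 3, 1, 1, 96), ("C2", 3, 1, 1, 96),
    ("C3", 5, 1, 2, 96), ("C4", 5, 1, 2, 96),
    ("P4", 3, 2, 0, None),
    ("C5", 5, 1, 2, 256),
    ("P5", 3, 2, 0, None),
    ("C6", 3, 1, 1, 384), ("C7", 3, 1, 1, 384), ("C8", 3, 1, 1, 256),
    ("P8", 3, 2, 0, None),
]

def trace(size, channels=3):
    """Return (layer name, channels, spatial size) after every layer."""
    shapes = []
    for name, k, s, p, out_ch in LAYERS:
        size = out_size(size, k, s, p)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, channels, size))
    return shapes

shapes = trace(224)  # hypothetical 224x224 input
for name, ch, size in shapes:
    print(f"{name}: {ch} x {size} x {size}")
```

Under these assumptions only the three pooling layers change the spatial size (224 to 111, 55 and finally 27), while the convolution layers change only the channel count.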
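As another preferred embodiment of the present invention, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout. Here, the number of nodes can also be understood as the number of features. Dropout randomly keeps a certain number of data values active and discards the rest, which effectively prevents overfitting and thereby improves identification efficiency.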
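Dropout as described here can be sketched in a few lines. The version below is the common "inverted dropout" variant, which rescales the kept values during training so that nothing needs to change at inference time; the rescaling, the function name and the fixed random seed are assumptions, since the document only says that a random subset of values is kept and the rest discarded.

```python
import random

def dropout(values, p_drop=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p_drop, rescale the rest."""
    if not training or p_drop == 0.0:
        return list(values)          # inference: pass everything through unchanged
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)               # fixed seed, for reproducibility only
activations = [1.0] * 8
dropped = dropout(activations, p_drop=0.5, rng=rng)
print(dropped)                       # each entry is either 0.0 or 2.0
```

With a drop probability of 0.5, about half of the activations are zeroed on each training pass, so no single node can be relied on exclusively.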
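FIG. 2 is a schematic structural diagram of the convolutional neural network model provided by the present invention. The picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture. All convolution layers and fully connected layers fc9, fc10 and fc11 process data through the activation function LEAKY RELU, so that the data of one layer can be passed to the next. The activation function computes a new output from the previous layer's output, and this new output serves as the input to the next layer. By choosing an SVM (support vector machine) as the classifier, the present invention is better suited to identification as a two-class problem. In addition, compared with the conventional activation function RELU, the LEAKY RELU used here still produces a non-zero output when its input is below zero, so data in that range can also take part in training. Here, when the input is below 0, the output is the input multiplied by a coefficient a, which is preferably a fixed value.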
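The difference between the two activation functions is easy to see on a handful of values. This is a minimal sketch; the coefficient a = 0.01 is the value the document gives for its training setup, and the sample inputs are made up.

```python
def relu(x):
    """Conventional ReLU: negative inputs are cut to zero."""
    return x if x > 0 else 0.0

def leaky_relu(x, a=0.01):
    """Leaky ReLU: negative inputs are scaled by the coefficient a instead."""
    return x if x > 0 else a * x

samples = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(v) for v in samples]
leaky_out = [leaky_relu(v) for v in samples]

for v, r, l in zip(samples, relu_out, leaky_out):
    print(f"x={v:5.1f}  relu={r:6.3f}  leaky_relu={l:6.3f}")
```

Because the negative branch has slope a rather than 0, a gradient still flows for negative inputs, which is why that part of the data can contribute to training.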
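Preferably, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data, although the last fully connected layer does not need an activation function.
In this way, data is passed between layers more effectively.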
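In some optional embodiments, the present invention prepares 100 hours of video in total as positive and negative training samples and captures 1.1 million pictures from the video: 500,000 positive training pictures, 500,000 negative training pictures, and 100,000 test pictures (50,000 positive and 50,000 negative). The convolution layers of the network are initialized from a Gaussian distribution with a standard deviation of 0.01. The coefficient a of the LEAKY RELU function is 0.01. The parameters of the fully connected layers are initialized from a Gaussian distribution with a standard deviation of 0.002. The dropout parameter is 0.5. Training uses the back-propagation (BP) algorithm to train and update the parameters, for a total of 300,000 iterations.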
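The initialization settings above can be sketched with the standard library alone. The sketch assumes a zero mean (the document gives only the standard deviations) and a 3-channel input to C1; the matrix shapes, the seed and the helper names are illustrative, not the patent's implementation.

```python
import random

def gaussian_init(shape, std, rng):
    """Fill a (rows, cols) weight matrix with zero-mean Gaussian samples."""
    rows, cols = shape
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def sample_std(matrix):
    """Population standard deviation of all entries, used as a sanity check."""
    flat = [v for row in matrix for v in row]
    mean = sum(flat) / len(flat)
    return (sum((v - mean) ** 2 for v in flat) / len(flat)) ** 0.5

rng = random.Random(42)  # fixed seed, for reproducibility only

# C1 viewed as a matrix: 96 kernels, each 3*3*3 weights (RGB input assumed).
conv_w = gaussian_init((96, 27), std=0.01, rng=rng)
# fc12 viewed as a matrix: 2048 inputs mapped to 2 outputs.
fc_w = gaussian_init((2, 2048), std=0.002, rng=rng)

print(f"conv std ~ {sample_std(conv_w):.4f}, fc std ~ {sample_std(fc_w):.4f}")
```

Small initial weights of this kind keep early activations close to zero, so the first back-propagation updates start from a near-linear regime.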
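FIG. 3 is a schematic structural diagram of an embodiment of the picture identification system based on a convolutional neural network provided by the present invention. The system includes:
a data extraction module 201, configured to input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, obtain the extracted feature data of the picture, and send it to the data dimension reduction module 202; wherein the kernel sizes of the at least two convolution layers are all no larger than 5×5;
a data dimension reduction module 202, configured to receive the extracted feature data sent by the data extraction module 201, repeatedly reduce its dimensionality and extract features through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture, and send the dimension-reduced feature data to the fully connected module 203; wherein the pooling layer uses average pooling;
a fully connected module 203, configured to receive the dimension-reduced feature data sent by the data dimension reduction module 202, input it into at least one fully connected layer to obtain a two-dimensional feature value of the picture data, and send the two-dimensional feature value to the classification module 204;
a classification module 204, configured to receive the two-dimensional feature value sent by the fully connected module 203 and classify it with a classifier to obtain the identification result of the picture.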
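As the above embodiment shows, the picture identification system convolves the data in the data extraction module 201 and thereby extracts the features of the picture data, reduces the dimensionality of the features in the data dimension reduction module 202, obtains the two-dimensional feature value of the picture data in the fully connected module 203, and finally identifies the picture data in the classification module 204. By using convolution layers with small kernels, the system extracts feature data effectively, which not only improves the efficiency and speed of picture identification but also effectively prevents overfitting.
As a preferred embodiment of the present invention, the data extraction module 201 includes four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
As a further embodiment of the present invention, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
As another preferred embodiment of the present invention, the data dimension reduction module 202 includes sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
In some optional embodiments, the fully connected module 203 includes sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
In other optional embodiments of the present invention, the system includes sequentially connected convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, followed by the classifier SVM, which performs classification to obtain the identification result of the picture.
Preferably, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.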
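As shown in FIG. 4, an embodiment of the present invention further discloses an electronic device including at least one processor 810 and a memory 800 communicably connected to the at least one processor 810; wherein the memory 800 stores instructions executable by the at least one processor 810, and the instructions, when executed by the at least one processor 810, enable the at least one processor 810 to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture. The electronic device further includes an input device 830 and an output device 840 electrically connected to the memory 800 and the processor, the electrical connection preferably being made through a bus.
Preferably, in the electronic device of this embodiment, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
Preferably, in the electronic device of this embodiment, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
Preferably, in the electronic device of this embodiment, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer includes: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
Preferably, in the electronic device of this embodiment, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
Preferably, in the electronic device of this embodiment, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
Preferably, in the electronic device of this embodiment, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.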
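An embodiment of the present invention also discloses a non-volatile computer storage medium, wherein the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
Preferably, in the storage medium of this embodiment, the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
Preferably, in the storage medium of this embodiment, the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
Preferably, in the storage medium of this embodiment, the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer includes: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
Preferably, in the storage medium of this embodiment, the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
Preferably, in the storage medium of this embodiment, the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
Preferably, in the storage medium of this embodiment, all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.
An embodiment of the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method described in the above embodiments.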
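Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. A person of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious changes or variations derived from them remain within the scope of protection of the present invention.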
Claims (29)
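- A picture identification method based on a convolutional neural network, applied to a terminal, characterized by comprising: inputting picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; passing the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; inputting the dimension-reduced feature data of the picture into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classifying the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- The method according to claim 1, characterized in that the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- The method according to claim 2, characterized in that the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The method according to claim 1, characterized in that the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data of the picture through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture comprises: passing the extracted feature data of the picture through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The method according to claim 1, characterized in that the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- The method according to claim 1, characterized in that the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
- The method according to any one of claims 1 to 6, characterized in that all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.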
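- A picture identification system based on a convolutional neural network, characterized by comprising: a data extraction module, configured to input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction, obtain the extracted feature data of the picture, and send it to the data dimension reduction module, wherein the kernel sizes of the at least two convolution layers are all no larger than 5×5; a data dimension reduction module, configured to receive the extracted feature data sent by the data extraction module, repeatedly reduce its dimensionality and extract features through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture, and send the dimension-reduced feature data to the fully connected module, wherein the pooling layer uses average pooling; a fully connected module, configured to receive the dimension-reduced feature data sent by the feature dimension reduction module, input it into at least one fully connected layer to obtain a two-dimensional feature value of the picture data, and send the two-dimensional feature value to the classification module; and a classification module, configured to receive the two-dimensional feature value sent by the fully connected module and classify it with a classifier to obtain the identification result of the picture.
- The system according to claim 8, characterized in that the data extraction module comprises four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- The system according to claim 9, characterized in that the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The system according to claim 8, characterized in that the data dimension reduction module comprises sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The system according to claim 8, characterized in that the fully connected module comprises sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- The system according to claim 8, characterized in that the system comprises sequentially connected convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, followed by the classifier SVM, which performs classification to obtain the identification result of the picture.
- The system according to any one of claims 8 to 13, characterized in that all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.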
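- An electronic device, characterized by comprising at least one processor and a memory communicably connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- The electronic device according to claim 15, characterized in that the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- The electronic device according to claim 16, characterized in that the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The electronic device according to claim 15, characterized in that the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture comprises: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The electronic device according to claim 15, characterized in that the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- The electronic device according to claim 15, characterized in that the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
- The electronic device according to any one of claims 15 to 20, characterized in that all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.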
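- A non-volatile computer storage medium, characterized in that the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to: input picture data to be identified into at least two convolution layers connected in series for continuous feature extraction to obtain extracted feature data of the picture; pass the extracted feature data through at least one pooling layer and at least one convolution layer to reduce its dimensionality and extract features, obtaining dimension-reduced feature data of the picture, wherein the pooling layer uses average pooling; input the dimension-reduced feature data into at least one fully connected layer to obtain a two-dimensional feature value of the picture data; and classify the two-dimensional feature value with a classifier to obtain the identification result of the picture.
- The storage medium according to claim 22, characterized in that the at least two convolution layers connected in series comprise four sequentially connected convolution layers C1, C2, C3 and C4, whose kernel sizes are: 3×3 for C1, 3×3 for C2, 5×5 for C3 and 5×5 for C4.
- The storage medium according to claim 23, characterized in that the stride of each of the four sequentially connected convolution layers is 1; each of the four convolution layers has 96 convolution kernels; the pad values of C1 and C2 are both 1, and the pad values of C3 and C4 are both 2.
- The storage medium according to claim 22, characterized in that the step of repeatedly reducing the dimensionality of, and extracting features from, the extracted feature data through at least one pooling layer and at least one convolution layer to obtain the dimension-reduced feature data of the picture comprises: passing the extracted feature data through sequentially connected pooling layer P4, convolution layer C5, pooling layer P5, convolution layer C6, convolution layer C7, convolution layer C8 and pooling layer P8; wherein pooling layers P4, P5 and P8 each have a kernel size of 3, a stride of 2 and a pad value of 0; convolution layer C5 has a kernel size of 5, a stride of 1, a pad value of 2 and 256 convolution kernels; and convolution layers C6, C7 and C8 each have a kernel size of 3, a stride of 1 and a pad value of 1, with 384, 384 and 256 convolution kernels respectively.
- The storage medium according to claim 22, characterized in that the at least one fully connected layer consists of sequentially connected fully connected layers fc9, fc10, fc11 and fc12, with 2048, 2048, 2048 and 2 nodes respectively; and all fully connected layers process data using dropout.
- The storage medium according to claim 22, characterized in that the picture data to be identified is processed in sequence by convolution layers C1, C2, C3 and C4, pooling layer P4, convolution layer C5, pooling layer P5, convolution layers C6, C7 and C8, pooling layer P8 and fully connected layers fc9, fc10, fc11 and fc12, and is then passed to the classifier SVM for classification to obtain the identification result of the picture.
- The storage medium according to any one of claims 22 to 27, characterized in that all of the convolution layers and all of the fully connected layers use the activation function LEAKY RELU to activate the data.
- A computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, characterized in that the program instructions, when executed by a computer, cause the computer to perform the method according to any one of the preceding claims.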
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610195777.7 | 2016-03-30 | | |
| CN201610195777.7A CN105868785A (zh) | 2016-03-30 | 2016-03-30 | Picture identification method and system based on convolutional neural network |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017166586A1 (zh) | 2017-10-05 |
Family
ID=56626701
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/096031 Ceased WO2017166586A1 (zh) | 2016-03-30 | 2016-08-19 | Picture identification method, system, and electronic device based on convolutional neural network |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN105868785A (zh) |
| WO (1) | WO2017166586A1 (zh) |
Cited By (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
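| CN108257180A (zh) * | 2018-02-07 | 2018-07-06 | 北京深度奇点科技有限公司 | Welding gap positioning method and device |
| CN109658489A (zh) * | 2018-12-17 | 2019-04-19 | 清华大学 | Neural network-based three-dimensional mesh data processing method and system |
| CN109858497A (zh) * | 2019-01-18 | 2019-06-07 | 五邑大学 | Improved residual network and feature extraction method and device thereof |
| CN110378424A (zh) * | 2019-07-23 | 2019-10-25 | 国网河北省电力有限公司电力科学研究院 | Infrared image recognition method for transformer bushing faults based on convolutional neural network |
| CN110674488A (zh) * | 2019-09-06 | 2020-01-10 | 深圳壹账通智能科技有限公司 | Neural network-based verification code recognition method, system, and computer equipment |
| CN111145169A (zh) * | 2019-12-31 | 2020-05-12 | 成都理工大学 | Subway station passenger number scheduling system and method based on multi-column asynchronous neural network |
| CN111222529A (zh) * | 2019-09-29 | 2020-06-02 | 上海上实龙创智慧能源科技股份有限公司 | GoogLeNet-SVM-based foam recognition method for sewage aeration tanks |
| CN111291627A (zh) * | 2020-01-16 | 2020-06-16 | 广州酷狗计算机科技有限公司 | Face recognition method, device, and computer equipment |
| CN111709389A (zh) * | 2020-06-24 | 2020-09-25 | 山东省食品药品检验研究院 | Intelligent identification method and system for traditional Chinese medicine powder based on microscopic images |
| CN111709390A (zh) * | 2020-08-11 | 2020-09-25 | 山东省食品药品检验研究院 | Intelligent identification method and system for calcium oxalate crystals based on microscopic images |
| CN112215243A (zh) * | 2020-10-30 | 2021-01-12 | 百度(中国)有限公司 | Image feature extraction method, device, equipment, and storage medium |
| CN112437926A (zh) * | 2019-06-18 | 2021-03-02 | 神经技术Uab公司 | Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network |
| CN113204659A (zh) * | 2021-03-26 | 2021-08-03 | 北京达佳互联信息技术有限公司 | Label classification method and device for multimedia resources, electronic equipment, and storage medium |
| CN113554581A (zh) * | 2020-04-21 | 2021-10-26 | 北京灵汐科技有限公司 | Three-dimensional medical image recognition method and system |
| US11620772B2 (en) | 2016-09-01 | 2023-04-04 | The General Hospital Corporation | System and method for automated transform by manifold approximation |
| CN111666865B (zh) * | 2020-06-02 | 2023-05-23 | 上海数创医疗科技有限公司 | Multi-lead electrocardiogram signal convolutional neural network classification method and method of use thereof |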
Families Citing this family (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
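| CN105868785A (zh) * | 2016-03-30 | 2016-08-17 | 乐视控股(北京)有限公司 | Picture identification method and system based on convolutional neural network |
| CN107886344A (zh) * | 2016-09-30 | 2018-04-06 | 北京金山安全软件有限公司 | Fraudulent advertisement page recognition method and device based on convolutional neural network |
| WO2018099473A1 (zh) * | 2016-12-02 | 2018-06-07 | 北京市商汤科技开发有限公司 | Scene analysis method and system, and electronic device |
| CN108229263B (zh) * | 2016-12-22 | 2021-03-02 | 杭州光启人工智能研究院 | Target object recognition method and device, and robot |
| CN106855944B (zh) * | 2016-12-22 | 2020-01-14 | 浙江宇视科技有限公司 | Pedestrian marker recognition method and device |
| CN108256544B (zh) * | 2016-12-29 | 2019-07-23 | 杭州光启人工智能研究院 | Picture classification method and device, and robot |
| CN107247949B (zh) * | 2017-08-02 | 2020-06-19 | 智慧眼科技股份有限公司 | Deep learning-based face recognition method, device, and electronic equipment |
| CN109840584B (zh) * | 2017-11-24 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Image data classification method and device based on convolutional neural network model |
| CN108009592A (zh) * | 2017-12-15 | 2018-05-08 | 云南大学 | Automatic classification method for diabetic retinal images |
| CN109740482A (zh) * | 2018-12-26 | 2019-05-10 | 北京科技大学 | Image text recognition method and device |
| CN110309707A (zh) * | 2019-05-08 | 2019-10-08 | 昆明理工大学 | Deep learning-based method for recognizing coffee fruit maturity |
| CN116486185A (zh) * | 2022-01-11 | 2023-07-25 | 中国石油化工股份有限公司 | Oil well working condition recognition method, device, equipment, and storage medium |
| CN116883716A (zh) * | 2023-06-05 | 2023-10-13 | 中国银行股份有限公司 | Image processing method and device, computer equipment, storage medium, and program product |
| CN116959477B (zh) * | 2023-09-19 | 2023-12-19 | 杭州爱华仪器有限公司 | Method and device for noise source classification based on convolutional neural network |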
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
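| CN104850836A (zh) * | 2015-05-15 | 2015-08-19 | 浙江大学 | Automatic pest image recognition method based on deep convolutional neural network |
| CN105184271A (zh) * | 2015-09-18 | 2015-12-23 | 苏州派瑞雷尔智能科技有限公司 | Deep learning-based automatic vehicle detection method |
| CN105354568A (zh) * | 2015-08-24 | 2016-02-24 | 西安电子科技大学 | Vehicle logo recognition method based on convolutional neural network |
| CN105868785A (zh) * | 2016-03-30 | 2016-08-17 | 乐视控股(北京)有限公司 | Picture identification method and system based on convolutional neural network |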
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104866524A (zh) * | 2015-04-10 | 2015-08-26 | 大连交通大学 | Fine-grained classification method for commodity images |
2016
- 2016-03-30: CN application CN201610195777.7A, patent CN105868785A (zh), status: active, Pending
- 2016-08-19: WO application PCT/CN2016/096031, patent WO2017166586A1 (zh), status: not active, Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
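| CN104850836A (zh) * | 2015-05-15 | 2015-08-19 | 浙江大学 | Automatic pest image recognition method based on deep convolutional neural network |
| CN105354568A (zh) * | 2015-08-24 | 2016-02-24 | 西安电子科技大学 | Vehicle logo recognition method based on convolutional neural network |
| CN105184271A (zh) * | 2015-09-18 | 2015-12-23 | 苏州派瑞雷尔智能科技有限公司 | Deep learning-based automatic vehicle detection method |
| CN105868785A (zh) * | 2016-03-30 | 2016-08-17 | 乐视控股(北京)有限公司 | Picture identification method and system based on convolutional neural network |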
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11620772B2 (en) | 2016-09-01 | 2023-04-04 | The General Hospital Corporation | System and method for automated transform by manifold approximation |
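| CN108257180B (zh) * | 2018-02-07 | 2023-08-04 | 北京深度奇点科技有限公司 | Welding gap positioning method and device |
| CN108257180A (zh) * | 2018-02-07 | 2018-07-06 | 北京深度奇点科技有限公司 | Welding gap positioning method and device |
| CN109658489A (zh) * | 2018-12-17 | 2019-04-19 | 清华大学 | Neural network-based three-dimensional mesh data processing method and system |
| CN109658489B (zh) * | 2018-12-17 | 2023-06-30 | 清华大学 | Neural network-based three-dimensional mesh data processing method and system |
| CN109858497A (zh) * | 2019-01-18 | 2019-06-07 | 五邑大学 | Improved residual network and feature extraction method and device thereof |
| CN109858497B (zh) * | 2019-01-18 | 2023-09-01 | 五邑大学 | Improved residual network and feature extraction method and device thereof |
| CN112437926B (zh) * | 2019-06-18 | 2024-05-31 | 神经技术Uab公司 | Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network |
| CN112437926A (zh) * | 2019-06-18 | 2021-03-02 | 神经技术Uab公司 | Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network |
| CN110378424A (zh) * | 2019-07-23 | 2019-10-25 | 国网河北省电力有限公司电力科学研究院 | Infrared image recognition method for transformer bushing faults based on convolutional neural network |
| CN110674488B (zh) * | 2019-09-06 | 2024-04-26 | 深圳壹账通智能科技有限公司 | Neural network-based verification code recognition method, system, and computer equipment |
| CN110674488A (zh) * | 2019-09-06 | 2020-01-10 | 深圳壹账通智能科技有限公司 | Neural network-based verification code recognition method, system, and computer equipment |
| CN111222529A (zh) * | 2019-09-29 | 2020-06-02 | 上海上实龙创智慧能源科技股份有限公司 | GoogLeNet-SVM-based foam recognition method for sewage aeration tanks |
| CN111145169A (zh) * | 2019-12-31 | 2020-05-12 | 成都理工大学 | Subway station passenger number scheduling system and method based on multi-column asynchronous neural network |
| CN111291627A (zh) * | 2020-01-16 | 2020-06-16 | 广州酷狗计算机科技有限公司 | Face recognition method, device, and computer equipment |
| CN111291627B (zh) * | 2020-01-16 | 2024-04-19 | 广州酷狗计算机科技有限公司 | Face recognition method, device, and computer equipment |
| CN113554581A (zh) * | 2020-04-21 | 2021-10-26 | 北京灵汐科技有限公司 | Three-dimensional medical image recognition method and system |
| CN111666865B (zh) * | 2020-06-02 | 2023-05-23 | 上海数创医疗科技有限公司 | Multi-lead electrocardiogram signal convolutional neural network classification method and method of use thereof |
| CN111709389A (zh) * | 2020-06-24 | 2020-09-25 | 山东省食品药品检验研究院 | Intelligent identification method and system for traditional Chinese medicine powder based on microscopic images |
| CN111709390A (zh) * | 2020-08-11 | 2020-09-25 | 山东省食品药品检验研究院 | Intelligent identification method and system for calcium oxalate crystals based on microscopic images |
| CN112215243A (zh) * | 2020-10-30 | 2021-01-12 | 百度(中国)有限公司 | Image feature extraction method, device, equipment, and storage medium |
| CN113204659A (zh) * | 2021-03-26 | 2021-08-03 | 北京达佳互联信息技术有限公司 | Label classification method and device for multimedia resources, electronic equipment, and storage medium |
| CN113204659B (zh) * | 2021-03-26 | 2024-01-19 | 北京达佳互联信息技术有限公司 | Label classification method and device for multimedia resources, electronic equipment, and storage medium |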
Also Published As
| Publication number | Publication date |
|---|---|
| CN105868785A (zh) | 2016-08-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
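| WO2017166586A1 (zh) | | Picture identification method, system, and electronic device based on convolutional neural network |
| US11093805B2 (en) | | Image recognition method and apparatus, image verification method and apparatus, learning method and apparatus to recognize image, and learning method and apparatus to verify image |
| CN110070067B (zh) | | Video classification method, training method and device for its model, and electronic equipment |
| CN108764164B (zh) | | Face detection method and system based on deformable convolutional network |
| CN108710847B (zh) | | Scene recognition method and device, and electronic equipment |
| US10726244B2 (en) | | Method and apparatus detecting a target |
| CN107529650B (zh) | | Loop closure detection method and device, and computer equipment |
| CN111797893A (zh) | | Neural network training method, image classification system, and related equipment |
| CN113705769A (zh) | | Neural network training method and device |
| CN111797983A (zh) | | Neural network construction method and device |
| EP3035246A2 (en) | | Image recognition method and apparatus, image verification method and apparatus, learning method and apparatus to recognize image, and learning method and apparatus to verify image |
| JP7769610B2 (ja) | | System and method for object detection and recognition |
| CN111709285A (zh) | | UAV-based epidemic prevention monitoring method, device, and storage medium |
| WO2018036276A1 (zh) | | Picture quality detection method, device, server, and storage medium |
| CN113936302B (zh) | | Training method and device for pedestrian re-identification model, computing equipment, and storage medium |
| CN108734673B (zh) | | Descreening system training method, descreening method, device, equipment, and medium |
| CN117456389B (zh) | | Improved YOLOv5s-based method, system, equipment, and medium for recognizing dense and small targets in UAV aerial images |
| JP2019536120A (ja) | | System and method for verifying the authenticity of ID photos |
| CN110222718B (zh) | | Image processing method and device |
| CN113361549A (zh) | | Model updating method and related device |
| CN113095199B (zh) | | High-speed pedestrian recognition method and device |
| CN108875540A (zh) | | Image processing method, device, and system, and storage medium |
| CN108875482B (zh) | | Object detection method and device, and neural network training method and device |
| CN105095902A (zh) | | Picture feature extraction method and device |
| CN110968734A (zh) | | Pedestrian re-identification method and device based on deep metric learning |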
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16896358; Country of ref document: EP; Kind code of ref document: A1 |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16896358; Country of ref document: EP; Kind code of ref document: A1 |