Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking an example of the method performed in a mobile terminal, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to an embodiment of the present invention. As shown in fig. 1, the mobile terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the image recognition method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In the present embodiment, an image recognition method operating in the mobile terminal is provided, and fig. 2 is a flowchart of the image recognition method according to the embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, obtaining N groups of initial scaling factor sets of an original convolutional neural network, wherein each group of initial scaling factor sets comprises M scaling factors, each initial scaling factor corresponds to at least one convolutional layer of the original convolutional neural network, and N and M are greater than or equal to 1;
step S204, optimizing the N groups of initial scaling factor sets by using an evolutionary algorithm to obtain a group of optimized scaling factor sets, wherein each optimized scaling factor in the group of optimized scaling factor sets is used for representing the convolution kernel pruning proportion of the corresponding convolution layer;
step S206, pruning the convolution kernel of the original convolution neural network according to the set of optimized scaling factors to obtain a light-weight convolution neural network;
and S208, identifying the target image by using the lightweight convolutional neural network to obtain an identification result of the target image.
Through the above steps, N groups of initial scaling factor sets of the original convolutional neural network are obtained; the N groups of initial scaling factor sets are optimized by using an evolutionary algorithm to obtain a group of optimized scaling factor sets; the convolution kernels of the original convolutional neural network are pruned according to the group of optimized scaling factor sets to obtain a lightweight convolutional neural network, which can be deployed on a resource-limited mobile terminal and used there to identify the target image. This solves the problem of low image recognition accuracy on terminals with limited resources, and achieves the effect of improving the image recognition accuracy.
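As a minimal, hypothetical sketch (the helper names and the toy fitness criterion are illustrative, not part of the embodiment), steps S202 to S206 reduce to: generate N random scaling factor sets, select an optimized set, and compute per-layer pruning counts:

```python
import random

def evolve(initial_sets, fitness):
    # Toy stand-in for step S204: simply keep the fittest individual
    # (the real embodiment also applies crossover and mutation).
    return max(initial_sets, key=fitness)

def prune_counts(kernel_counts, factors):
    # Step S206: each factor is the pruning proportion of its layer,
    # so int(n * a) kernels are removed from a layer with n kernels.
    return [int(n * a) for n, a in zip(kernel_counts, factors)]

random.seed(0)
m_layers, n_sets = 3, 5
kernel_counts = [10, 20, 30]                      # kernels per convolutional layer
# Step S202: N groups of initial scaling factor sets, M factors each.
initial_sets = [[random.random() for _ in range(m_layers)] for _ in range(n_sets)]
# Illustrative fitness: favour the individual that prunes the least.
best = evolve(initial_sets, fitness=lambda s: -sum(s))
removed = prune_counts(kernel_counts, best)       # kernels to prune per layer
```

Step S208 would then run the pruned network on the target image; that part depends on the concrete model and is omitted here.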
Alternatively, the execution subject of the above steps may be a terminal or the like, but is not limited thereto.
As an optional implementation, deep convolutional networks can be divided, according to their composition characteristics, into a non-block type and a block type. Non-block networks include early networks such as AlexNet and VGGNet; fig. 3 is a schematic model structure diagram of a non-block deep convolutional neural network according to an optional embodiment of the present invention. As shown in fig. 3, a non-block network is formed by stacking single-layer convolutional networks in series. Block-type networks include ResNet, MobileNet, ShuffleNet, and the like; fig. 4 is a schematic diagram of the model structure of a block-type deep convolutional neural network according to an optional embodiment of the present invention. As shown in fig. 4, a block-type deep neural network is formed by stacking a plurality of stages containing blocks, each block is composed of several convolutional layers, and the blocks are connected by residual connections.
As an optional implementation, when the convolutional neural network CNN is of the non-block type, a scaling factor α ∈ [0, 1] is introduced into each convolutional layer of the convolutional neural network to form a scaling factor set, where each scaling factor in the set corresponds to one convolutional layer. For example, if a convolutional neural network comprises M layers, then M scaling factors constitute a scaling factor set. When the convolutional neural network CNN is of the block type, a scaling factor is introduced for each block of the convolutional neural network, where each scaling factor corresponds to one block and each block includes a plurality of convolutional layers. For example, if the convolutional neural network includes M blocks, then the scaling factors corresponding to the M blocks constitute a scaling factor set.
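The distinction can be sketched as follows (a hypothetical helper; the count passed in is the number of layers for a non-block network, or the number of blocks for a block-type network):

```python
import random

def init_scaling_factor_set(count, seed=None):
    # One scaling factor in [0, 1] per convolutional layer (non-block CNN),
    # or per block (block-type CNN, where the factor is shared by every
    # convolutional layer inside that block).
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(count)]

non_block_set = init_scaling_factor_set(8, seed=1)  # M = 8 layers -> 8 factors
block_set = init_scaling_factor_set(4, seed=2)      # M = 4 blocks -> 4 factors
```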
As an optional implementation, taking a non-block convolutional neural network as an example, assume that the original convolutional neural network model CNN_orig has M convolutional layers. For each convolutional layer l, the number of convolution kernels is N_l, the number of channels of each kernel is C_l, the height of the two-dimensional convolution is H_l, and its width is W_l, so the kernels of layer l may be parameterized by W_l ∈ R^(N_l×C_l×H_l×W_l). An evolutionary algorithm may be employed to search for the optimal scaling factors; the search process comprises the following steps:
In step S21, population initialization is performed. The number N of individuals in the population may be preset, where each individual represents a group of scaling factor sets, and a group of scaling factor sets is obtained by initializing one scaling factor for each convolutional layer in the convolutional neural network. Initializing the scaling factors N times for each convolutional layer yields N scaling factor sets, i.e., a population of N individuals. Since the convolutional neural network model has M convolutional layers, each scaling factor set includes M scaling factors.
Step S22, performing crossover and mutation on the scaling factors by using an evolutionary algorithm to obtain a group of optimized scaling factor sets, where each group of optimized scaling factor sets includes M scaling factors, and each scaling factor represents the pruning proportion of the convolution kernels in the corresponding convolutional layer. For example, if a1 = 0.3 in a group of scaling factors and the convolutional layer corresponding to a1 has 10 convolution kernels, then 3 convolution kernels of that layer need to be pruned.
And step S23, according to each scaling factor in the group of optimized scaling factor sets, pruning the convolution kernels in the convolutional layer corresponding to each scaling factor. Assume that the convolutional neural network includes two convolutional layers, the first with 10 convolution kernels and the second with 20 convolution kernels. One group of optimized scaling factors includes a1 = 0.3 and a2 = 0.5, where a1 corresponds to the first convolutional layer and a2 corresponds to the second; the first convolutional layer then needs to prune 3 convolution kernels and the second needs to prune 10. Pruning each convolutional layer yields the lightweight convolutional neural network.
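The pruning counts in the two-layer example above follow directly from T = int(a · N); a small sketch:

```python
def kernels_to_prune(num_kernels, factor):
    # factor is the pruning proportion of the layer: T = int(factor * N).
    return int(factor * num_kernels)

# a1 = 0.3 on a 10-kernel layer, a2 = 0.5 on a 20-kernel layer.
first_layer = kernels_to_prune(10, 0.3)    # 3 kernels pruned
second_layer = kernels_to_prune(20, 0.5)   # 10 kernels pruned
```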
Optionally, optimizing the N sets of initial scaling factor sets by using an evolutionary algorithm to obtain a set of optimized scaling factor sets, including: pruning the original convolutional neural network by using each group of initial scaling factor sets in the N groups of initial scaling factor sets to obtain N first convolutional neural networks; and under the condition that a convolutional neural network meeting a preset convergence condition exists in the N first convolutional neural networks, determining the convolutional neural network meeting the convergence condition as the light-weight convolutional neural network, wherein the preset convergence condition is used for indicating that the output of the fitness function is within a threshold range.
As an optional embodiment, optimizing the N sets of initial scaling factor sets using an evolutionary algorithm may include the steps of:
Step S31, for each individual, i.e., each group of initial scaling factor sets, the original convolutional neural network CNN_orig is pruned, and a lightweight model CNN_light_pre is extracted from the original network. Performing this extraction with each of the N groups of initial scaling factor sets yields N lightweight models CNN_light_pre, i.e., N first convolutional neural networks. To select the convolution kernels to be pruned in each convolutional layer, the norm of each convolution kernel in the layer may be obtained, and the kernels of the layer may be sorted by the magnitude of their norms, for example in ascending order. The convolution kernels of the layer are then pruned according to the layer's scaling factor. For example, if a convolutional layer includes 30 convolution kernels, sorted in ascending order of norm, and the scaling factor corresponding to the layer is 0.1, then 3 convolution kernels need to be pruned: the first three in the sorting are removed and the remaining kernels are retained.
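A sketch of the norm-based kernel selection in step S31, assuming l1 norms and NumPy (the array shapes are illustrative):

```python
import numpy as np

def prune_layer(kernels, factor):
    # kernels: array of shape (N, C, H, W) holding one layer's convolution kernels.
    norms = np.abs(kernels).sum(axis=(1, 2, 3))   # l1 norm of each kernel
    order = np.argsort(norms)                     # ascending: smallest norms first
    t = int(factor * kernels.shape[0])            # number of kernels to prune
    keep = np.sort(order[t:])                     # retain all but the t smallest
    return kernels[keep]

rng = np.random.default_rng(0)
layer = rng.standard_normal((30, 16, 3, 3))       # 30 kernels, 16 channels, 3x3
pruned = prune_layer(layer, 0.1)                  # prune 3 kernels, keep 27
```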
Step S32, determining whether the N first convolutional neural networks satisfy a preset convergence condition. The preset convergence condition may be convergence of the output value of the fitness function f = γ·accuracy + (1 − γ)·cost, where accuracy is the accuracy of the output of the first convolutional neural network; specifically, the verification data set is input into the first convolutional neural network, and the estimated classification result output by the network is compared with the known classification result of the verification data set to obtain the accuracy. cost is the ratio of the pruned parameters to the total parameters of the model, or alternatively the ratio of the number of pruned convolution kernels to the total number of convolution kernels of the original convolutional neural network model. γ is a hyper-parameter used to adjust the weight of the accuracy in evaluating an individual. The larger the fitness function, the better the individual. If a first convolutional neural network satisfying the preset convergence condition exists among the N first convolutional neural networks, it is determined as the lightweight convolutional neural network model. If no first convolutional neural network among the N satisfies the convergence condition, step S33 is performed.
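The fitness function of step S32 can be written down directly (γ = 0.7 is an illustrative value, not one fixed by the embodiment):

```python
def fitness(accuracy, pruned_params, total_params, gamma=0.7):
    # f = gamma * accuracy + (1 - gamma) * cost, where cost is the ratio of
    # pruned parameters to the model's total parameters; gamma weights the
    # accuracy term. A larger f indicates a fitter individual.
    cost = pruned_params / total_params
    return gamma * accuracy + (1.0 - gamma) * cost

# 91% accuracy after pruning 40% of the parameters:
f = fitness(accuracy=0.91, pruned_params=400_000, total_params=1_000_000)
```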
Step S33, when the preset crossover probability prob_cross is satisfied, multi-point paired crossover is performed on the N groups of initial scaling factor sets; the preset crossover probability prob_cross may be a value between 0 and 1, and its specific size may be determined according to actual conditions. When the preset mutation probability prob_mutate is satisfied, single-point mutation is performed on one or more of the N initial scaling factor sets; the preset mutation probability prob_mutate may likewise be a value between 0 and 1. The original convolutional neural network CNN_orig is pruned with the N groups of scaling factor sets obtained after crossover and/or mutation, and steps S31 and S32 are repeated until the fitness function of the convolutional neural network model obtained by pruning converges.
Optionally, the method further comprises: under the condition that no convolutional neural network meeting a preset convergence condition exists in the N first convolutional neural networks, crossing scaling factors at preset cross points in any two groups of initial scaling factor sets, and/or carrying out variation on one or more initial scaling factors at preset variation points in the scaling factor sets to obtain N groups of first scaling factor sets; pruning the original convolutional neural network by using each group of first scaling factor sets in the N groups of first scaling factor sets respectively to obtain N second convolutional neural networks; determining a convolutional neural network of the N second convolutional neural networks that satisfies the convergence condition as the lightweight convolutional neural network.
As an optional implementation manner, in the case that there is no convolutional neural network that satisfies the preset convergence condition, the following steps may be iteratively performed until the convolutional neural network satisfies the preset convergence condition:
In step S41, preset crossover points and mutation points are set. Each scaling factor may be an eight-bit binary number. Specifically, all the scaling factors α may be binary-coded; assuming the number of coded bits is b = 8, the coding space is code ∈ [00000000, …, 11111111]: when the scaling factor is 0 in decimal the binary code is 00000000, and when it is 1 in decimal the binary code is 11111111, so a decimal α may be represented by an 8-bit binary code. The preset crossover points and mutation points may be determined according to actual conditions; for example, the first bit of the 8-bit binary code may be selected as the crossover point and the second bit as the mutation point, or the first and second bits may be selected as crossover points and the third and fourth bits as mutation points. Both the number and the positions of the crossover points and mutation points can be determined according to actual conditions.
Step S42, performing crossover and/or mutation processing on the N groups of initial scaling factor sets according to the preset crossover points and mutation points to obtain N groups of first scaling factor sets. For example, assume that the first scaling factor in a first group of scaling factors is 11100000 and the first scaling factor in a second group is 00000111. With the first and second bits as crossover points, after crossover the first scaling factor in the first group becomes 00100000 and the first scaling factor in the second group becomes 11000111. Which scaling factor sets are crossed may be determined according to actual conditions, and crossover may be performed between corresponding scaling factors in any two of the N groups of scaling factor sets. Taking mutation as an example, assume that one of the scaling factors in a group is 11100000 and the preset mutation points are the first and third bits; the mutation of 11100000 is then 01000000.
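The bitwise operations of step S42 can be sketched on 8-bit code strings (bit positions are 1-based, matching the example above):

```python
def crossover(code_a, code_b, points):
    # Swap the bits of two binary code strings at the given crossover points.
    a, b = list(code_a), list(code_b)
    for p in points:
        a[p - 1], b[p - 1] = b[p - 1], a[p - 1]
    return "".join(a), "".join(b)

def mutate(code, points):
    # Flip the bits of a binary code string at the given mutation points.
    bits = list(code)
    for p in points:
        bits[p - 1] = "1" if bits[p - 1] == "0" else "0"
    return "".join(bits)

# Crossover at the first and second bits:
x, y = crossover("11100000", "00000111", points=(1, 2))
# Mutation at the first and third bits:
m = mutate("11100000", points=(1, 3))
```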
And step S43, pruning the original convolutional neural network by using the scaling factor sets obtained after the crossover and mutation processing. Whether the pruned convolutional neural network satisfies the preset convergence condition is then judged; if so, the pruned convolutional neural network is the lightweight convolutional neural network. If not, steps S41 and S42 are repeated until the condition is satisfied.
Optionally, pruning the convolution kernel of the original convolutional neural network according to the set of optimized scaling factors, including: determining a T value according to an optimized scaling factor corresponding to each convolutional layer in the original convolutional neural network and the number of convolutional kernels in each convolutional layer, wherein the T value is the number of convolutional kernels needing pruning, and T is an integer; sorting the convolution kernels in each convolution layer according to the norm of the convolution kernels; and pruning the convolution kernels ranked as the first T in each convolution layer.
As an optional embodiment, to select the convolution kernels to be pruned in each convolutional layer, the norm of each convolution kernel in the layer may be obtained, and the kernels may be sorted by the magnitude of their norms, for example in ascending order. The convolution kernels of the layer are then pruned according to the layer's scaling factor. For example, if a convolutional layer includes 30 convolution kernels, sorted in ascending order of norm, and the scaling factor corresponding to the layer is 0.1, then 3 convolution kernels need to be pruned: the first three in the sorting are removed and the remaining kernels are retained.
Optionally, obtaining N sets of initial scaling factor sets of the original convolutional neural network, including: and randomly generating the N groups of initial scaling factor sets according to the number of convolution layers of the original convolutional neural network, wherein each initial scaling factor is a binary code, and each binary code corresponds to at least one convolution layer of the original convolutional neural network.
As an alternative embodiment, the initial scaling factors in the N groups of initial scaling factor sets may be randomly generated, and each scaling factor may be an eight-bit binary number. Specifically, all the scaling factors α may be binary-coded; assuming the number of coded bits is b = 8, the coding space is code ∈ [00000000, …, 11111111]: when the scaling factor is 0 in decimal the binary code is 00000000, and when it is 1 in decimal the binary code is 11111111, so a decimal α may be represented by an 8-bit binary code.
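Random generation of binary-coded factors, and the decoding back to a decimal factor via α = decimal(code)/255, can be sketched as follows (helper names are illustrative):

```python
import random

def encode(alpha):
    # Map a scaling factor in [0, 1] to an 8-bit binary code string.
    return format(round(alpha * 255), "08b")

def decode(code):
    # Inverse mapping: alpha = decimal(code) / 255.
    return int(code, 2) / 255

def random_initial_sets(n_sets, m_layers, seed=None):
    # N groups of initial scaling factor sets, one 8-bit code per layer.
    rng = random.Random(seed)
    return [[format(rng.randrange(256), "08b") for _ in range(m_layers)]
            for _ in range(n_sets)]

sets = random_initial_sets(n_sets=4, m_layers=6, seed=0)
```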
Optionally, the method further comprises: generating image feature data of a specified category using a conditional generation countermeasure network; training the lightweight convolutional neural network using the specified class of image feature data.
As an alternative implementation, fig. 5 is a schematic diagram of a convolutional neural network model structure according to an optional embodiment of the present invention; the depth model is composed of a feature extractor and a classifier. For the classifier, pseudo features of a specified class can be generated by a conditional generative adversarial network, expanding the image feature data of that class, so that the classifier can be further trained and its performance optimized and improved. Data augmentation can thus be performed at the feature level. A Conditional Generative Adversarial Network (CGAN) may take class information as input, so that the generating network produces, from noise, pseudo features that can be used to train the classifier. Suppose the i-th classification class is C_i. First, the conditional generative adversarial network is used to generate, for the given class C_i, image feature data F_fake(i) that is indistinguishable from real features. These are combined with the real features F_real(i) to construct the image feature data of the class, and finally the classifier is fine-tuned using SGD.
As an optional implementation, a block-type convolutional neural network model is handled in the same way as a non-block model, except that a scaling factor is introduced for each block rather than for each layer; the other processes are the same. After the scaling factor of each block is solved, all convolutional layers under that block are pruned in the same proportion, given by the block's scaling factor.
The present application is described below through a specific example, taking the non-block type as an example:
Suppose the original convolutional neural network model CNN_orig has L convolutional layers. A scaling factor a_l ∈ [0, 1] is introduced for each convolutional layer in the network. Let the convolution kernels of layer l be parameterized by W_l ∈ R^(N_l×C_l×H_l×W_l), where N_l represents the number of convolution kernels, C_l the number of channels of each kernel, H_l the height of the two-dimensional convolution, and W_l its width. An evolutionary algorithm is adopted to search for the optimal scaling factors, and the search process is as follows:
a. Population initialization: all the scaling factors α are binary-coded. Assuming the number of coded bits is b = 8, the coding space is code ∈ [00000000, …, 11111111]: when α = 0 the code is 00000000, and when α = 1 the code is 11111111, i.e., α is represented by an 8-bit binary code. The conversion relationship is α = decimal(code) × 1/255. The population matrix is then p ∈ {0, 1}^(P×(b×L)), where P is the number of individuals in the population and the code length of each individual is b × L.
b. Constructing the fitness function: for each individual, a lightweight model CNN_light_pre is extracted from CNN_orig. For example, for the i-th convolutional layer, let a_i = 0.72, a_(i−1) = 0.51, N_i = N_(i−1) = 32, and H_i = W_i = 3; the pruned parameter matrix is then W_i ∈ R^(int(N_i·a_i)×int(N_(i−1)·a_(i−1))×3×3). It should be noted that the convolution kernel selection here is not random: the kernels are first sorted by their l1 norms, and the top int(N_i·a_i) kernels in that ordering are picked to compose CNN_light_pre.
For each obtained CNN_light_pre, the accuracy on the proxy data set Cifar10 is obtained, together with the computation amount flops and the parameter amount params, and the fitness function f = γ·accuracy + (1 − γ)·cost is constructed. The cost function is computed from the flops or params of the searched-out model; here it refers to the parameter removal ratio of the model, i.e., the ratio of the pruned parameters to the total parameters of the model. γ is a hyper-parameter used to adjust the weight of the accuracy in evaluating an individual. The larger the fitness function, the better the individual.
c. Crossover operator: set the crossover probability prob_cross ∈ [0, 1], and perform multi-point paired crossover when the crossover probability is satisfied;
d. Mutation operator: set the mutation probability prob_mutate ∈ [0, 1], and perform single-point mutation when the mutation probability is satisfied;
e. Selection: according to the performance parameters of the mobile terminal device to be deployed, select the individuals that simultaneously satisfy the computation amount and parameter amount constraints for the next iteration;
f. Repeat steps b to e until a termination condition (number of iterations reached or convergence of the fitness function) is met.
After the optimal scaling factors are solved on the proxy data set, the pruning operation is executed, and finally fine-tuning is carried out on the target data set to obtain CNN_light_pre.
According to the method, the resource parameters of the hardware are fused into the automatic search process, so that under the given constraints a lightweight model meeting the device resource requirements can be obtained directly by automatic search, avoiding complicated and tedious manual work by algorithm engineers. A novel data enhancement method is also provided: image feature data of the specified category is generated directly by the conditional generative adversarial network, data expansion is carried out at the feature level, and a performance gain is obtained by fine-tuning the classifier.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, an image recognition apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and the description of the apparatus is omitted for brevity. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 6 is a block diagram of the structure of an image recognition apparatus according to an embodiment of the present invention, as shown in fig. 6, the apparatus including: an obtaining module 62, configured to obtain N sets of initial scaling factors of an original convolutional neural network, where each set of initial scaling factors includes M scaling factors, each initial scaling factor corresponds to at least one convolutional layer of the original convolutional neural network, and N and M are greater than or equal to 1; an optimizing module 64, configured to optimize the N sets of initial scaling factor sets by using an evolutionary algorithm to obtain a set of optimized scaling factor sets, where each optimized scaling factor in the set of optimized scaling factor sets is used to represent a convolution kernel pruning proportion of a corresponding convolution layer; the processing module 66 is configured to prune the convolution kernel of the original convolution neural network according to the set of optimized scaling factor sets to obtain a lightweight convolution neural network; and the identification module 68 is configured to identify the target image by using the lightweight convolutional neural network, so as to obtain an identification result of the target image.
Optionally, the optimization module comprises: a first processing unit, configured to prune the original convolutional neural network using each initial scaling factor set of the N initial scaling factor sets to obtain N first convolutional neural networks; and a determining unit, configured to determine, when a convolutional neural network that satisfies a preset convergence condition exists in the N first convolutional neural networks, the convolutional neural network that satisfies the convergence condition as the lightweight convolutional neural network, where the preset convergence condition is used to indicate that an output of the fitness function is within a threshold range.
Optionally, the apparatus is further configured to, in a case that there is no convolutional neural network that satisfies a preset convergence condition in the N first convolutional neural networks, intersect scaling factors at a preset intersection in any two sets of the initial scaling factor sets, and/or vary one or more initial scaling factors at a preset variation point in the scaling factor sets, to obtain N sets of first scaling factor sets; pruning the original convolutional neural network by using each group of first scaling factor sets in the N groups of first scaling factor sets respectively to obtain N second convolutional neural networks; determining a convolutional neural network of the N second convolutional neural networks that satisfies the convergence condition as the lightweight convolutional neural network.
Optionally, the apparatus is further configured to determine a T value according to an optimized scaling factor corresponding to each convolutional layer in the original convolutional neural network and the number of convolutional kernels in each convolutional layer, where the T value is the number of convolutional kernels that need pruning, and T is an integer; sorting the convolution kernels in each convolution layer according to the norm of the convolution kernels; and pruning the convolution kernels ranked as the first T in each convolution layer.
Optionally, the apparatus is further configured to randomly generate the N sets of initial scaling factors according to the number of convolutional layers of the original convolutional neural network, where each initial scaling factor is a binary code, and each binary code corresponds to at least one convolutional layer of the original convolutional neural network.
Optionally, the apparatus is further configured to generate image feature data of a specified category using the conditional generation countermeasure network; training the lightweight convolutional neural network using the specified class of image feature data.
It should be noted that the above modules may be implemented by software or by hardware; in the latter case, this may be achieved in, but is not limited to, the following manner: the modules are all located in the same processor, or the modules are located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in the present embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring N sets of initial scaling factors of the original convolutional neural network, where each set of initial scaling factors includes M scaling factors, each initial scaling factor corresponds to at least one convolutional layer of the original convolutional neural network, and N and M are greater than or equal to 1;
S2, optimizing the N sets of initial scaling factors using an evolutionary algorithm to obtain a set of optimized scaling factors, where each optimized scaling factor in the set represents the convolution kernel pruning ratio of the corresponding convolutional layer;
S3, pruning the convolution kernels of the original convolutional neural network according to the set of optimized scaling factors to obtain a lightweight convolutional neural network;
S4, recognizing a target image using the lightweight convolutional neural network to obtain a recognition result of the target image.
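Steps S1 and S2 can be combined into one evolutionary search loop, sketched below. The fitness function, population size, generation count, and bit width are illustrative assumptions; in practice the fitness of a candidate set would be, for example, the validation accuracy of the network pruned with that set:

```python
import random

def search_pruning_ratios(evaluate_fitness, num_layers, pop_size=8,
                          generations=20, num_bits=4):
    """Evolutionary search over binary-coded scaling factor sets.
    `evaluate_fitness` scores a candidate set; higher is better."""
    # S1: randomly generate the initial population of scaling factor sets.
    population = [[random.getrandbits(num_bits) for _ in range(num_layers)]
                  for _ in range(pop_size)]
    # S2: evolve the population by selection, crossover, and mutation.
    for _ in range(generations):
        population.sort(key=evaluate_fitness, reverse=True)
        survivors = population[:pop_size // 2]      # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            point = random.randrange(1, num_layers)      # crossover point
            child = a[:point] + b[point:]
            i = random.randrange(num_layers)             # mutation point
            child[i] ^= 1 << random.randrange(num_bits)  # flip one bit
            children.append(child)
        population = survivors + children
    return max(population, key=evaluate_fitness)
```

The returned set of optimized scaling factors would then drive the per-layer kernel pruning of S3.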
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where both the transmission device and the input/output device are connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform, by means of the computer program, the following steps:
S1, acquiring N sets of initial scaling factors of the original convolutional neural network, where each set of initial scaling factors includes M scaling factors, each initial scaling factor corresponds to at least one convolutional layer of the original convolutional neural network, and N and M are greater than or equal to 1;
S2, optimizing the N sets of initial scaling factors using an evolutionary algorithm to obtain a set of optimized scaling factors, where each optimized scaling factor in the set represents the convolution kernel pruning ratio of the corresponding convolutional layer;
S3, pruning the convolution kernels of the original convolutional neural network according to the set of optimized scaling factors to obtain a lightweight convolutional neural network;
S4, recognizing a target image using the lightweight convolutional neural network to obtain a recognition result of the target image.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principles of the present invention shall fall within the protection scope of the present invention.