HK40057952B - Processing method and device for image recognition model - Google Patents
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for processing an image recognition model.
Background
With the development of machine learning technology, deep learning has made great breakthroughs in tasks such as image classification, image detection and face recognition, and on some tasks the recognition capability of models even exceeds that of human beings. However, studies have shown that state-of-the-art classification models may lose their recognition capability under perturbation. In other words, when a disturbing factor is present in a picture, the machine-learning image recognition result may be erroneous. For example, the ImageNet-C data set illustrates that an image classifier may behave differently when identifying samples perturbed by noise from rain, snow, fog and similar weather, which raises robustness problems for the image recognition model. Model robustness issues may cause security problems or asset losses. For example, errors in digit recognition by an OCR model may result in erroneous entries in a financial system, and an autopilot system that misidentifies the road may execute wrong instructions and cause an irretrievable traffic accident. Therefore, improving the robustness and performance of an image recognition model is a very valuable research problem.
Disclosure of Invention
One or more embodiments of the present specification describe a processing method and apparatus for an image recognition model, so as to solve one or more problems mentioned in the background art.
According to a first aspect, there is provided a processing method for an image recognition model, the method comprising: obtaining a first image at a first resolution based on a first sample image in a sample set, the first sample image corresponding to a first recognition label;
generating, for the first image, a first adversarial sample image using an image recognition model pre-trained based on the sample set; increasing the resolution of the first adversarial sample image to obtain a second image at a second resolution, wherein the second resolution is higher than the first resolution; and further training the pre-trained image recognition model by using the second image and the first recognition label as a first correction sample to obtain a corrected image recognition model for image recognition.
In one embodiment, said deriving a first image at a first resolution based on a first sample image in a sample set comprises: obtaining the first sample image from a sample set; and performing resolution reduction operation on the first sample image to obtain a first image under the first resolution, wherein the resolution reduction operation comprises at least one of convolution operation, pixel merging operation and pixel screening operation.
In one embodiment, the generating, for the first image, a first adversarial sample image using an image recognition model pre-trained based on the sample set comprises: adding a current disturbance to the first image to obtain a current disturbance image, wherein the image features of the current disturbance image are determined by superimposing the image features of the first image and the disturbance values of the current disturbance; determining the model loss of the pre-trained image recognition model in processing the current disturbance image; and adjusting the current disturbance with the goal of increasing the model loss along its gradient with respect to the image features of the current disturbance image, so as to generate the first adversarial sample image.
In a further embodiment, the model loss comprises a recognition loss determined based on a comparison of an output result of the pre-trained image recognition model processing the current perturbation image with the recognition label of the first sample image.
In another further embodiment, the model loss comprises a visual loss determined based on a feature comparison of the current perturbed image and the first image.
In a further embodiment, the features of the current perturbed image and the first image are compared by: extracting feature maps of the current disturbance image and of the first image respectively through a first neural network; comparing the feature map of the current disturbance image with the feature map of the first image point by point to obtain corresponding difference values; and determining a comparison result according to the difference values corresponding to the respective feature points.
In one embodiment, the features of the current perturbed image and the first image are compared by: respectively extracting feature vectors of the current disturbance image and the first image through a first neural network; and determining a comparison result according to the vector similarity of the feature vector of the current disturbance image and the feature vector of the first image.
In one embodiment, the resolution of the first adversarial sample image is increased by up-sampling using one of an interpolation method, a super-resolution model, and a transposed convolution.
According to a second aspect, there is provided a processing apparatus for an image recognition model, the apparatus comprising:
an acquisition unit configured to obtain a first image at a first resolution based on a first sample image in a sample set, wherein the first sample image corresponds to a first recognition label;
a generating unit configured to generate, for the first image, a first adversarial sample image using an image recognition model pre-trained based on the sample set;
an expansion unit configured to increase the resolution of the first adversarial sample image to obtain a second image at a second resolution, wherein the second resolution is higher than the first resolution;
a correction unit configured to further train the pre-trained image recognition model using the second image and the first recognition label as a first correction sample, so that the corrected image recognition model can be used for image recognition.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
According to the method and apparatus provided by the embodiments of this specification, for an image recognition model trained on a sample set, model vulnerabilities are discovered by generating low-resolution adversarial samples, and high-resolution disturbance samples are then obtained by increasing the resolution of the adversarial sample images. In this way, the vulnerabilities in the image recognition model can be repaired by using the high-resolution disturbance samples together with the correct recognition labels as correction samples, thereby improving the robustness of the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 illustrates a process flow diagram for an image recognition model in one embodiment of the present description;
FIG. 3 shows a block schematic diagram of a processing device for an image recognition model according to one embodiment of the present description.
Detailed Description
The scheme provided by the specification is described in the following with reference to the attached drawings.
FIG. 1 depicts a schematic diagram of an embodiment of the present disclosure for processing an image recognition model. It can be understood that, in the process of identifying a target, the model may make wrong decisions on some abnormal samples, such as adversarial samples or samples perturbed by physical-world illumination, fog, and the like; a case in which the model decides incorrectly is referred to in this specification as a model vulnerability. The processing scheme for the image recognition model provided by this specification aims to automatically find such model vulnerabilities and repair them with an effective scheme. The implementation architecture may be carried out by a computing platform, for example any terminal, device, computer or device cluster with sufficient data processing capability.
Referring to FIG. 1, in an implementation architecture for processing an image recognition model, an image recognition model may first be pre-trained on a sample set. The sample set may contain a plurality of images related to the current recognition service, such as face images for a face recognition service, images of candidate objects (such as cats and dogs) for a classification service, various obstacle images for a vehicle obstacle recognition service, and the like. The images in the sample set may have various resolutions. The model structure of the image recognition model may be predetermined, for example the number of layers of a convolutional neural network and the number of nodes in each layer. A model whose structure has been determined may contain one or more model parameters, such as weight parameters and bias parameters, which serve as undetermined parameters to be adjusted during model training; the pre-training process is exactly this process of adjusting the undetermined parameters. It is called pre-training because the model still needs to be further corrected at a later stage. During pre-training, each undetermined parameter may be adjusted via gradient descent on the model loss, and details are not repeated herein.
Further, for the pre-trained image recognition model, adversarial samples can be determined from the sample set using an adversarial sample generation method. It can be understood that, since an adversarial sample is a sample to which a slight disturbance has been added to a normal sample so as to cause an erroneous image recognition result, the process of determining adversarial samples can also be regarded as a process of finding model vulnerabilities of the image recognition model. Under the technical idea of this specification, low-resolution, highly transferable adversarial samples can be generated using the sample images in the sample set, because universal vulnerabilities are easier to find in low-resolution images. Next, correction samples may be generated using the adversarial samples. A correction sample is a sample used to repair a vulnerability in the pre-trained image recognition model, that is, to correct the misrecognized result of the adversarial sample. Since an adversarial sample is an image formed by adding a disturbance to an original sample image, the sample label of the original sample image can be used as the correction label of the adversarial sample to form the correction sample.
Although low-resolution adversarial samples expose model-generic vulnerabilities and have high attack transferability, such samples rarely exist in the physical world and cannot be directly integrated into the model training data set. Therefore, under the technical idea of this specification, in the process of generating correction samples from adversarial samples, the low-resolution images are restored to high-resolution images. These high-resolution samples have practical significance in the physical world and can be added to the model training data set as training samples for the image recognition model. Thus, correction samples can be determined using the high-resolution images and the sample labels of the original sample images, the pre-trained image recognition model can be trained with a plurality of such correction samples, and the vulnerabilities in the image recognition model can be repaired, thereby improving its robustness.
In short, the technical concept of this specification searches for model vulnerabilities at low resolution and, through the transfer of vulnerabilities from low resolution to high resolution, repairs them at high resolution, so that common vulnerabilities are effectively found and repaired and the robustness of the image recognition model is improved.
The technical details of the design concept of the present specification are described below in conjunction with specific embodiments.
Referring to FIG. 2, FIG. 2 shows a processing flow for an image recognition model according to an embodiment. The flow may be performed after pre-training a predetermined image recognition model with the sample set. During pre-training of the image recognition model, the model loss can be determined by comparing the recognition result of a sample image with its sample label, and the undetermined parameters are adjusted with the goal of minimizing the model loss until predetermined end conditions are met, for example the accuracy exceeding an accuracy threshold or the F1 score exceeding a predetermined threshold.
As shown in FIG. 2, for the pre-trained image recognition model, the following processing is further performed: step 201, obtaining a first image at a first resolution based on a first sample image in a sample set, wherein the first sample image corresponds to a first recognition label; step 202, generating, for the first image, a first adversarial sample image using the pre-trained image recognition model; step 203, increasing the resolution of the first adversarial sample image to obtain a second image at a second resolution, wherein the second resolution is higher than the first resolution; and step 204, further training the pre-trained image recognition model by using the second image and the first recognition label as a first correction sample to obtain a corrected image recognition model for image recognition.
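As an illustrative sketch only (not part of the patent text), the four steps above can be outlined in a few lines of Python; every helper name and the trivial downscale/perturb/upscale implementations here are hypothetical stubs for demonstration:

```python
import numpy as np

# Hypothetical stand-ins for steps 201-204; not the patent's implementation.
def reduce_resolution(img, factor=2):
    # step 201: pixel screening -> first image at the first resolution
    return img[::factor, ::factor]

def generate_adversarial(img, eps=0.1):
    # step 202 (stub): add a small perturbation of magnitude eps
    return img + eps * np.ones_like(img)

def increase_resolution(img, factor=2):
    # step 203: nearest-neighbour up-sampling -> second image
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

sample = np.random.default_rng(0).random((8, 8))  # first sample image
label = 1                                         # first recognition label

low = reduce_resolution(sample)       # first image
adv = generate_adversarial(low)       # first adversarial sample image
high = increase_resolution(adv)       # second image (higher resolution)
correction_sample = (high, label)     # step 204: first correction sample

assert low.shape == (4, 4)
assert high.shape == (8, 8)
assert correction_sample[1] == label
```

The stubs only show the data flow; each step is elaborated in the remainder of this description.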
In step 201, a first image at a first resolution is obtained based on a first sample image in a sample set. It can be understood that the sample set may include a plurality of sample images, each of which may correspond to a recognition label (also called a service label, sample label, or the like) related to the image recognition service. The first sample image may be any sample image in the sample set; it may be selected according to a predetermined rule or at random, which is not limited herein. The recognition label corresponding to the first sample image may be referred to as the first recognition label.
The size of an image can generally be described by its number of pixels, i.e., its resolution; for example, a 1024 × 996 image has 1024 × 996 pixels. It can be understood that, under the technical idea of this specification, adversarial samples need to be generated at a low resolution in order to detect model vulnerabilities, and the predetermined low resolution is usually small, for example 64 × 64. Accordingly, a corresponding low-resolution image, referred to as the first image, may be determined based on the first sample image. In the case where the resolution of the first sample image already satisfies the predetermined low resolution, the first image may be the first sample image itself. In practice, however, low-resolution images usually do not accurately reflect the physical world; that is, the image resolution in the sample set is typically much higher than the predetermined low resolution. To generate adversarial samples at low resolution, the resolution of the first sample image may therefore be reduced to obtain the first image.
The operations for reducing resolution include, but are not limited to, one or more down-sampling operations such as convolution, pixel merging, and pixel screening. The convolution operation may be implemented by one or more convolution layers: for example, convolving the feature map formed by the color values of the first sample image extracts features from the image and yields a feature map of lower resolution, forming a low-resolution image. The pixel-merging operation merges a number of pixels into fewer pixels. In a specific example, several (e.g., 5) pixels in a row may be merged into a smaller number (e.g., 2) of pixels, and the color values of the merged pixels may retain the color values of the corresponding original pixels or may be the mean of the color values of neighboring pixels; for instance, the 2nd pixel may be retained with its color value set to the mean of the color values of the original 1st to 3rd pixels. The pixel-screening operation selects pixels according to a predetermined rule, for example keeping one pixel out of every 3. In other embodiments, other methods of reducing the number of pixels may be used, which are not described herein again. After the number of pixels is reduced, an image at the first resolution is obtained; the image obtained by reducing the resolution of the first sample image is referred to as the first image. It will be understood by those skilled in the art that the term "first" here does not substantively limit the image itself. Other low-resolution images can likewise be obtained for other sample images, which is not described herein again.
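The pixel-merging and pixel-screening operations described above can be sketched with NumPy; this is an illustrative reading of the two operations (block averaging and strided selection), with hypothetical function names:

```python
import numpy as np

def downsample_merge(img, factor):
    """Pixel merging: average each factor x factor block into one pixel
    (one plausible reading of the merging operation; illustrative only)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor]          # crop to a whole number of blocks
    return img.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def downsample_screen(img, factor):
    """Pixel screening: keep every factor-th pixel per row and column."""
    return img[::factor, ::factor]

img = np.arange(64, dtype=float).reshape(8, 8)     # a tiny 8x8 "sample image"
low = downsample_merge(img, 2)

assert low.shape == (4, 4)
assert low[0, 0] == (0 + 1 + 8 + 9) / 4            # mean of the top-left block
assert downsample_screen(img, 2).shape == (4, 4)
```

A real down-sampling step would operate per color channel; the single-channel case above is kept minimal.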
In step 202, a first adversarial sample image is generated for the first image using the pre-trained image recognition model. Here, the pre-trained image recognition model may be a model obtained by training the determined model structure on the sample set and adjusting its undetermined parameters. The pre-training process may be a complete training process, which ends after the model performance (such as accuracy, model loss, or F1 score) reaches a predetermined condition.
Adversarial samples (adversarial examples) generally refer to input samples formed by deliberately adding subtle perturbations to data, which can cause the model to give an erroneous output with high confidence. Adversarial samples can be used to discover model vulnerabilities. Methods for generating adversarial samples generally rely on gradients; examples include FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent). In the generation of an adversarial sample, the attack is typically regarded as adding a disturbance that increases the model loss, so the attack sample is generated along the gradient direction. When training the image recognition model, the undetermined parameters are adjusted, for fixed samples, in the direction in which the gradient of the model loss with respect to the parameters decreases; when generating an adversarial sample, by contrast, the undetermined parameters are fixed (to the values adjusted during pre-training) and the sample is changed in the direction in which the gradient of the model loss with respect to the sample features increases. That is, with the goal of making the model loss as large as possible, the sample feature values are adjusted by the gradient of the model loss with respect to the sample features, and the new sample thus generated is the adversarial sample.
For the first image, in the process of generating the first adversarial sample, a current disturbance may be added to the first image to obtain a current disturbance image, where the image features of the current disturbance image are determined by superimposing the image features of the first image and the disturbance values of the current disturbance. The model loss of the pre-trained image recognition model in processing the current disturbance image is then determined, and the current disturbance is adjusted with the goal of increasing the model loss along its gradient with respect to the image features of the current disturbance image, thereby generating the first adversarial sample image. The image features may be, for example, the color values on the respective color channels, i.e., the values of the pixels on each channel in the corresponding color mode: the gray values in a grayscale image, the chrominance values on the R, G, or B channel of a color image in RGB mode, the chrominance values on the C, M, Y, or K channel of a color image in CMYK mode, and so on. The model loss may be consistent with the loss used in pre-training the image recognition model, or may be a redefined model loss. In practice, the image features and the current disturbance are fused into the current disturbance image, and since the image features of the current disturbance image are used when generating the adversarial sample image, the current disturbance can be adjusted by adjusting the image features of the current disturbance image.
For example, suppose the image feature x11 corresponding to a certain pixel of the first image becomes x11' = x11 + r11 after the current disturbance is added, where r11 is the disturbance value corresponding to that pixel in the current disturbance. The current disturbance can then be adjusted either by adjusting r11 or by directly adjusting x11' (which in essence also adjusts r11), with consistent effect; this is not described herein again. The initial r11 may be 0 (no perturbation added) or a small random value.
Taking FGSM as an example, in one specific example, the adversarial sample may be generated by:
x' = x + ε · sign(∇x J(x, y))
where x represents the sample features of the original sample, J(x, y) represents the model loss of the pre-trained image recognition model y = f(x), ∇x represents the gradient with respect to the features x, ε represents the step size of the added perturbation, and x' represents the sample features after the perturbation is added to the original sample, i.e., the new sample (the adversarial sample). For example, the image features of the original sample may be the color values of the image on each channel, and the color values obtained after adding the disturbance to each pixel correspond to a new image with the disturbance added to the original sample image. It can be understood that the sign function here ensures that the perturbation increases the model loss along the gradient direction, so that the recognition result of the pre-trained image recognition model on the adversarial sample is wrong.
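The FGSM update can be illustrated on a toy differentiable model. Here a hand-written logistic "recognizer" stands in for the pre-trained image recognition model f, and the gradient with respect to the input is computed analytically; this is a hedged sketch under those assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)     # fixed "pre-trained" weights (illustrative)
x = rng.normal(size=16)     # flattened low-resolution first image
y = 1.0                     # first recognition label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(v):
    """Cross-entropy loss J(v, y) of the fixed toy model f(v) = sigmoid(w.v)."""
    p = np.clip(sigmoid(w @ v), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(v):
    """Analytic gradient of J with respect to the input features v."""
    return (sigmoid(w @ v) - y) * w

eps = 0.1                                  # perturbation step size epsilon
x_adv = x + eps * np.sign(grad_x(x))       # FGSM: x' = x + eps * sign(grad J)

assert loss(x_adv) > loss(x)   # the one-step perturbation raises the loss
```

The final assertion checks exactly the property the formula targets: moving along the sign of the input gradient increases the fixed model's loss.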
FGSM increases the loss along the gradient in only a single step on the image. However, this may fail when attacking a complex nonlinear model: such a model may vary drastically within a very small range, so a single large gradient step may not succeed. PGD therefore splits the single large step of FGSM into multiple small steps. For example, let the initial perturbed sample coincide with the original image x and be denoted x^0, and let the perturbed sample after t rounds of updating be denoted x^t. Then:
x^(t+1) = clip_{x,ε}( x^t + α · sign(∇x J(x^t, y)) )
where α is the small step size and clip_{x,ε} projects the result back into the ε-neighborhood of x, preventing the accumulated small steps from becoming too large.
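The multi-step update with clipping can be sketched on the same kind of toy logistic model as before (an illustrative stand-in for the pre-trained recognizer; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)     # fixed pre-trained weights (illustrative)
x = rng.normal(size=16)     # original image features
y = 1.0                     # recognition label

def loss(v):
    p = np.clip(1.0 / (1.0 + np.exp(-w @ v)), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(v):
    return (1.0 / (1.0 + np.exp(-w @ v)) - y) * w

eps, alpha, steps = 0.1, 0.02, 10   # ball radius, small step size, rounds
x_t = x.copy()                      # x^0 coincides with the original x
for _ in range(steps):
    x_t = x_t + alpha * np.sign(grad_x(x_t))   # one small FGSM-style step
    x_t = x + np.clip(x_t - x, -eps, eps)      # clip back into the eps-ball

assert np.max(np.abs(x_t - x)) <= eps + 1e-12  # stays in the eps-neighborhood
assert loss(x_t) > loss(x)                     # loss increased over the rounds
```

The clip step is what distinguishes PGD from simply repeating FGSM: however many rounds run, the perturbed sample never leaves the ε-neighborhood of the original image.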
The model loss J(x, y) may be determined by the loss function used in pre-training, or in other ways, for example by generating adversarial samples with the C&W algorithm. The C&W algorithm is an optimization-based attack that measures the difference between input and output through a specially designed loss function. This loss function contains an adjustable hyper-parameter, as well as a parameter that controls the confidence of the generated adversarial samples; by choosing appropriate values for these two parameters, high-quality adversarial samples can be generated.
As a specific example, the size of the disturbance may be left unconstrained, i.e., there is no need to consider whether the disturbance is invisible to the naked eye, small enough, or visually meaningful. Generating the perturbation may then take into account two types of loss: a recognition loss and a perceptual similarity loss (LPIPS, which may also be referred to as a visual loss). Taking the recognition loss to be the classification error loss L_cls as an example, the model loss J can be set as follows:
J = L_cls(f(x + x_adv), y) + L_lpips(f*(x + x_adv), f*(x))
where x_adv represents the disturbance to be adjusted, x + x_adv represents the image features of the current disturbance image obtained by adding the current disturbance to the first image, f(x + x_adv) represents the recognition result output by the image recognition model for the sample with the current disturbance added, and y represents the recognition label corresponding to the original sample image (e.g., the first sample image). L_cls(f(x + x_adv), y) is the classification error loss, which may be determined, for example, by comparing the model output with the classification label of the original sample image. In the visual loss L_lpips(f*(x + x_adv), f*(x)), f* is used to extract features from the original sample and the perturbed sample; it may be entirely different from the network used for the recognition loss, or it may be the first few layers (e.g., the first 2 layers) of the neural network f. The visual loss is determined by comparing the features of the current disturbance image and the first image, i.e., by comparing f*(x + x_adv) and f*(x).
The result of extracting features from the current disturbance image and the first image may be feature maps over several channels (e.g., feature maps produced by a convolutional neural network) or feature vectors. In the case of feature maps, difference values can be obtained by comparing the values at each feature point, and the visual loss is then determined from these difference values, for example as their sum; generally, the larger the difference values, the larger the visual loss, and vice versa. In the case of feature vectors, the visual loss can be determined from the vector similarity of the two feature vectors: the smaller the similarity, the larger the visual loss, and the larger the similarity, the smaller the visual loss.
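Both feature-comparison variants can be sketched minimally; the fixed 3x3 averaging "extractor" below is a purely illustrative stand-in for f* (a real f* would be the learned early layers of a network):

```python
import numpy as np

def feature_extractor(img):
    """Illustrative stand-in for f*: a fixed 3x3 averaging convolution
    producing a single-channel feature map."""
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = img[i:i + 3, j:j + 3].mean()
    return out

def visual_loss_maps(img_a, img_b):
    """Sum of point-by-point differences between the two feature maps."""
    return np.abs(feature_extractor(img_a) - feature_extractor(img_b)).sum()

def visual_loss_vectors(img_a, img_b):
    """1 - cosine similarity between the flattened feature vectors."""
    a = feature_extractor(img_a).ravel()
    b = feature_extractor(img_b).ravel()
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(2)
img = rng.random((8, 8))                          # the "first image"
near = img + 0.01 * rng.standard_normal((8, 8))   # mild disturbance
far = img + 0.5 * rng.standard_normal((8, 8))     # heavy disturbance

assert visual_loss_maps(img, img) == 0.0
assert visual_loss_maps(img, near) < visual_loss_maps(img, far)
assert abs(visual_loss_vectors(img, img)) < 1e-9
```

As the assertions show, both variants vanish for identical inputs and grow with the size of the disturbance, which is exactly the behavior wanted from a visual loss term.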
In further embodiments, the adversarial sample may also be generated in other reasonable ways. In this specification, an adversarial sample generated using the first image is referred to as the first adversarial sample image. By the nature of adversarial samples, it represents a vulnerability of the image recognition model, that is, the image recognition model cannot correctly recognize it.
Next, in step 203, the resolution of the first adversarial sample image is increased to obtain a second image at a second resolution. To obtain a high-resolution effective sample image, the resolution of the first adversarial sample image at the first resolution may be increased. It will be appreciated that the second resolution is often much higher than the first resolution. The second resolution may be consistent with the resolution of the original sample image, or may be another preset resolution; an image at such a resolution can generally reflect real physical-world objects, e.g., 1280 × 960 or 800 × 600. For convenience of description, the image obtained by raising the first adversarial sample image to the second resolution is referred to as the second image.
Methods for increasing resolution include up-sampling methods such as interpolation, super-resolution models, and transposed convolution. Interpolation methods include linear interpolation, nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, and the like. Taking linear interpolation as an example, a pixel may be inserted at a certain position, with its color value set to the average of the neighboring pixels. In one embodiment, several pixels may be inserted in each row and then in each column. Since the second resolution may be much higher than the first resolution, in one example at most one pixel is inserted between two pixels at a time and the interpolation is performed multiple times to reach the second resolution; in another example, several pixels are inserted between two pixels at once, with their color values forming an arithmetic progression whose endpoints are the two original pixel values, increasing or decreasing uniformly. Nearest-neighbor interpolation, for example, assigns the interpolated pixel the color value of the nearest valid pixel. In summary, the various interpolation methods determine color values for the interpolated pixels in different ways so as to expand the resolution of the image.
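The interpolation-based up-sampling can be sketched as follows; `np.interp` performs the one-dimensional linear interpolation along rows and then columns, and the helper names are illustrative:

```python
import numpy as np

def upsample_linear(img, factor):
    """Linear interpolation: stretch each row, then each column."""
    h, w = img.shape
    new_cols = np.linspace(0, w - 1, w * factor)
    rows = np.array([np.interp(new_cols, np.arange(w), r) for r in img])
    new_rows = np.linspace(0, h - 1, h * factor)
    return np.array([np.interp(new_rows, np.arange(h), c) for c in rows.T]).T

def upsample_nearest(img, factor):
    """Nearest neighbour: copy each pixel into a factor x factor block."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

low = np.array([[0.0, 1.0],
                [2.0, 3.0]])           # a tiny 2x2 "adversarial sample image"
high = upsample_linear(low, 2)         # 4x4 second image

assert high.shape == (4, 4)
assert high[0, 0] == 0.0 and high[-1, -1] == 3.0   # corner values preserved
assert upsample_nearest(low, 2).shape == (4, 4)
```

Between the preserved corner values, the interpolated pixels vary smoothly, whereas the nearest-neighbour version repeats blocks; both simply trade off smoothness against fidelity to the original pixels.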
Interpolation can be regarded as copying pixels according to the magnification factor and then deconvolving with a certain fixed convolution kernel. A Super-Resolution method based on a convolutional neural network, in contrast, can learn the convolution kernel: the network parameters are updated according to the difference between the constructed super-resolution image and the real high-resolution image (the ground truth), so that a clearer and more realistic image can be recovered.
A GAN (Generative Adversarial Network) is a typical super-resolution model: a deep neural network architecture composed of two competing networks, such as a generator and a discriminator, or an auto-encoder and a variational auto-encoder. Taking a generator and a discriminator as an example, the generator is a neural network used to generate new data instances, and the discriminator is a neural network used to evaluate their authenticity, i.e., to decide whether each data instance it examines belongs to the actual training data set. According to one example, a generative adversarial network operates as follows: the generator takes a series of numbers as input and returns an image (the generated image); the generated image, together with an image stream from the real data set, is sent to the discriminator; the discriminator receives the real and fake images and returns a probability between 0 and 1, where 1 represents a prediction of authenticity and 0 a prediction of fakeness. In this way a double feedback loop is obtained: the discriminator is in a feedback loop with the ground truth of the images, and the generator is in a feedback loop with the discriminator. By this principle, a GAN can generate more realistic images. Thus, in one specific example, step 203 may employ a GAN-based high (super) resolution image generation method, such as networks like SRGAN and ESRGAN, to convert low-resolution images. Instead of a GAN, a neural network such as SRResNet may also be used directly for super-resolution generation, which is not limited herein.
Further, in step 204, the pre-trained image recognition model is further trained using the second image and the first identification label as the first correction sample. It can be understood that the second image is obtained by increasing the resolution of the first adversarial sample image; since that adversarial sample was incorrectly recognized by the pre-trained image recognition model, it represents a model vulnerability. Therefore, the high-resolution image, together with the correct identification label, can constitute a correction sample with which the pre-trained image recognition model is further trained, so that the model vulnerability is repaired and the trained image recognition model can perform image recognition services more accurately.
In this specification, the second image is obtained by raising the resolution of the first adversarial sample image, which is generated by adding noise (i.e., a perturbation) to the first image, and the first image in turn corresponds to the first sample image; therefore, the correct recognition result of the second image should coincide with that of the first sample image. Accordingly, the second image and the first identification label can form a correctly labeled training sample for correcting the image recognition model, so that the image recognition model learns to correctly recognize the second image formed by adding the perturbation. A correct training sample composed of the second image and the first identification label may thus also be referred to as a first correction sample.
In practice, a single correction sample has little effect in correcting the model. Therefore, through steps 201 and 202, a plurality of adversarial sample images may be generated using a plurality of sample images in the sample set (a single sample image may correspond to one or more adversarial sample images), so that a plurality of correction samples are obtained through steps 203 and 204. The process of correcting the pre-trained image recognition model with a plurality of correction samples is similar to the process of pre-training the image recognition model, and is not repeated herein.
Reviewing the above process: for an image recognition model trained on the sample set, model vulnerabilities are discovered by generating low-resolution adversarial samples, and correction samples for repairing those vulnerabilities are constructed from high-resolution images obtained by raising the resolution of the adversarial sample images. A vulnerability found in a low-resolution sample reflects a vulnerability the model exhibits on the image in general, i.e., a more ubiquitous and persistent one, so model vulnerabilities can be discovered by searching over low-resolution samples (e.g., with a resolution of 64 × 64). However, directly putting low-resolution perturbed images into the training set of the model produces no repairing effect and may even degrade model performance. This is because most training data characterizing the physical world has a higher resolution; if low-resolution data are directly merged into the training sample set, they may prevent the model from learning recognition information from the normal high-resolution training data. Therefore, the present specification proposes the technical concept of converting a low-resolution perturbed sample (an adversarial sample) into a high-resolution perturbed sample, a process equivalent to migrating a vulnerability at low resolution to the corresponding vulnerability at high resolution. The vulnerability of the image recognition model can then be repaired by taking the high-resolution perturbed sample together with the correct identification label as a correction sample, thereby improving the robustness of the model.
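The data flow reviewed above can be sketched end to end. All networks are stubbed out here; only the pipeline (downsample, perturb, upsample, form correction sample) is illustrated, and every function name, the random-sign perturbation standing in for a gradient-based attack, and the toy sizes are illustrative assumptions.

```python
import numpy as np

def downsample(img):                       # step 201: reduce resolution (2x2 mean pooling)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def perturb(img, eps=0.05, rng=np.random.default_rng(0)):
    # step 202 stand-in: a real attack follows the model-loss gradient;
    # random signs stand in for it here.
    return np.clip(img + eps * rng.choice([-1.0, 1.0], img.shape), 0.0, 1.0)

def upsample(img):                         # step 203: raise resolution (nearest neighbour)
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

first_sample = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # plays the first sample image
label = 7                                                # plays the first identification label

low = downsample(first_sample)             # first image, at the lower first resolution
adv = perturb(low)                         # first adversarial sample image
second = upsample(adv)                     # second image, back at the higher resolution
correction_sample = (second, label)        # step 204: retrain the model on this pair

assert low.shape == (4, 4) and second.shape == first_sample.shape
```

The correction sample keeps the original (correct) label while carrying the perturbation migrated to high resolution, which is the pairing used for the further training step.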
According to another aspect, embodiments of the present specification further provide a processing apparatus for an image recognition model. The device can process the pre-trained image recognition model, find the model bugs and repair the bugs, thereby improving the robustness of the image recognition model.
As an embodiment, as shown in fig. 3, a processing apparatus 300 for an image recognition model includes: an obtaining unit 31 configured to obtain a first image at a first resolution based on a first sample image in the sample set, where the first sample image corresponds to a first identification label; a generating unit 32 configured to generate, for the first image, a first adversarial sample image using an image recognition model pre-trained on the sample set; an expanding unit 33 configured to increase the resolution of the first adversarial sample image to obtain a second image at a second resolution, where the second resolution is higher than the first resolution; and a correction unit 34 configured to further train the pre-trained image recognition model using the second image and the first identification label as a first correction sample, to obtain a corrected image recognition model for image recognition.
In one embodiment, the obtaining unit 31 is further configured to:
Obtaining a first sample image from a sample set;
and performing a resolution reduction operation on the first sample image to obtain the first image at the first resolution, wherein the resolution reduction operation comprises one of a convolution operation, a pixel merging operation, and a pixel screening operation.
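Two of the resolution reduction operations named above can be sketched directly in a few lines; the toy 4 × 4 image is an illustrative assumption. Pixel merging averages non-overlapping blocks, while pixel screening simply keeps a strided subset of pixels.

```python
import numpy as np

img = np.arange(16.0).reshape(4, 4)

# Pixel merging: average each non-overlapping 2x2 block into one pixel.
merged = img.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Pixel screening: keep every second pixel along each axis.
screened = img[::2, ::2]

assert merged.shape == screened.shape == (2, 2)
```

A convolution operation with stride greater than 1 reduces resolution in the same way, with the merging weights given by the kernel instead of a uniform average.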
In one embodiment, the generating unit 32 is further configured to:
adding a current perturbation to the first image to obtain a current perturbed image, wherein the image features of the current perturbed image are determined by the superposition of the first image and the perturbation values of the current perturbation;
determining the model loss of the pre-trained image recognition model in processing the current perturbed image;
and adjusting the current perturbation with the goal of increasing the model loss along its gradient with respect to the image features of the current perturbed image, so as to generate the first adversarial sample image.
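The perturbation-adjustment step above can be sketched in the style of a single gradient-sign attack. The toy logistic classifier standing in for the pre-trained recognition model, the closed-form gradient, and the step size `eps` are illustrative assumptions; the original disclosure does not fix a particular attack.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(16)          # stand-in for pre-trained model weights
x = rng.random(16)                   # first image (flattened); y = 1 is its label
y = 1.0
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

def model_loss(img):
    # Cross-entropy of the toy recognizer against the true label.
    p = sigmoid(w @ img)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gradient of the loss w.r.t. the image features: (p - y) * w for this model.
grad = (sigmoid(w @ x) - y) * w
eps = 0.1
# Step in the loss-increasing direction, keeping pixel values valid.
x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)

assert model_loss(x_adv) >= model_loss(x)
```

Iterating this adjustment until the model's output flips yields an adversarial sample image of the kind the generating unit produces.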
In one embodiment, the model loss comprises a recognition loss determined based on a comparison between the output result of the pre-trained image recognition model processing the current perturbed image and the identification label of the first sample image.
In one embodiment, the model loss comprises a visual loss determined based on a feature comparison between the current perturbed image and the first image.
In one embodiment, the generating unit 32 compares the features of the current perturbed image and the first image by:
extracting feature maps of the current perturbed image and the first image respectively through a first neural network;
comparing the feature map of the current perturbed image and the feature map of the first image point by point to obtain corresponding difference values;
and determining a comparison result according to the difference values corresponding to the respective feature points.
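The feature-map comparison above can be sketched as follows. The single hand-rolled convolution layer standing in for the "first neural network", the random kernels, and the mean-squared aggregation of the point-wise differences are all illustrative assumptions.

```python
import numpy as np

def feature_maps(img, kernels):
    # Stand-in "first neural network": one valid-mode convolution layer,
    # producing one 2-D feature map per kernel.
    h, w = img.shape
    kh, kw = kernels.shape[1:]
    maps = np.empty((len(kernels), h - kh + 1, w - kw + 1))
    for k, ker in enumerate(kernels):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                maps[k, i, j] = np.sum(img[i:i+kh, j:j+kw] * ker)
    return maps

rng = np.random.default_rng(2)
kernels = rng.standard_normal((3, 3, 3))
first = rng.random((8, 8))                                      # the first image
perturbed = np.clip(first + 0.05 * rng.choice([-1.0, 1.0], first.shape), 0, 1)

# Point-by-point differences between corresponding feature maps, aggregated
# into one visual-loss value (mean squared difference here).
diff = feature_maps(perturbed, kernels) - feature_maps(first, kernels)
visual_loss = float(np.mean(diff ** 2))
assert visual_loss >= 0.0
```

Keeping this visual loss small while the recognition loss grows steers the attack toward perturbations that remain visually close to the first image.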
In one embodiment, the generating unit 32 compares the features of the current perturbed image and the first image by:
extracting feature vectors of the current perturbed image and the first image respectively through a first neural network;
and determining a comparison result according to the vector similarity between the feature vector of the current perturbed image and the feature vector of the first image.
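The vector-similarity variant can be sketched with cosine similarity; the randomly drawn 128-dimensional feature vectors stand in for the output of the first neural network, and the `1 - similarity` formulation of the visual loss is an illustrative assumption.

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(3)
feat_first = rng.standard_normal(128)        # feature vector of the first image
# A small perturbation leaves the feature vector nearly unchanged.
feat_perturbed = feat_first + 0.01 * rng.standard_normal(128)

sim = cosine_similarity(feat_first, feat_perturbed)
visual_loss = 1.0 - sim    # small when the perturbed image still resembles the original
assert -1.0 <= sim <= 1.0
```

Other vector similarities (e.g. negative Euclidean distance) can serve the same comparison role.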
In one embodiment, the expanding unit 33 increases the resolution of the first adversarial sample image by one of an interpolation method, a super-resolution model, and a transposed convolution.
It should be noted that the embodiment of the apparatus shown in fig. 3 corresponds to the embodiment of the method shown in fig. 2, and therefore, the corresponding description for fig. 2 is also applicable to the embodiment shown in fig. 3, and is not repeated herein.
According to an embodiment of yet another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2 described above.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments are only intended to be specific embodiments of the technical concept of the present disclosure, and should not be used to limit the scope of the technical concept of the present disclosure, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical concept of the embodiments of the present disclosure should be included in the scope of the technical concept of the present disclosure.
Claims (18)
1. A processing method for an image recognition model, the method comprising:
obtaining a first image at a first resolution based on a first sample image in a sample set, wherein the first sample image corresponds to a first identification label;
generating a first adversarial sample image by using a model loss obtained by processing the first image with an image recognition model pre-trained on the sample set;
increasing the resolution of the first adversarial sample image to obtain a second image at a second resolution, wherein the second resolution is higher than the first resolution;
and further training the pre-trained image recognition model by using the second image and the first identification label as a first correction sample, to obtain a corrected image recognition model for image recognition.
2. The method of claim 1, wherein the obtaining a first image at a first resolution based on a first sample image in a sample set comprises:
obtaining the first sample image from a sample set;
and performing a resolution reduction operation on the first sample image to obtain the first image at the first resolution, wherein the resolution reduction operation comprises at least one of a convolution operation, a pixel merging operation, and a pixel screening operation.
3. The method of claim 1, wherein the generating a first adversarial sample image by using a model loss obtained by processing the first image with an image recognition model pre-trained on the sample set comprises:
adding a current perturbation to the first image to obtain a current perturbed image, wherein the image features of the current perturbed image are determined by the superposition of the first image and the perturbation values of the current perturbation;
determining the model loss of the pre-trained image recognition model in processing the current perturbed image;
and adjusting the current perturbation with the goal of increasing the model loss along its gradient with respect to the image features of the current perturbed image, so as to generate the first adversarial sample image.
4. The method of claim 3, wherein the model loss comprises a recognition loss determined based on a comparison between the output result of the pre-trained image recognition model processing the current perturbed image and the identification label of the first sample image.
5. The method of claim 3 or 4, wherein the model loss comprises a visual loss determined based on a feature comparison between the current perturbed image and the first image.
6. The method of claim 5, wherein the features of the current perturbed image and the first image are compared by:
extracting feature maps of the current perturbed image and the first image respectively through a first neural network;
comparing the feature map of the current perturbed image and the feature map of the first image point by point to obtain corresponding difference values;
and determining a comparison result according to the difference values corresponding to the respective feature points.
7. The method of claim 5, wherein the features of the current perturbed image and the first image are compared by:
extracting feature vectors of the current perturbed image and the first image respectively through a first neural network;
and determining a comparison result according to the vector similarity between the feature vector of the current perturbed image and the feature vector of the first image.
8. The method of claim 1, wherein the increasing the resolution of the first adversarial sample image is achieved by up-sampling using one of an interpolation method, a super-resolution model, and a transposed convolution.
9. A processing apparatus for an image recognition model, the apparatus comprising:
an acquisition unit configured to obtain a first image at a first resolution based on a first sample image in a sample set, the first sample image corresponding to a first identification label;
a generating unit configured to generate a first adversarial sample image by using a model loss obtained by processing the first image with an image recognition model pre-trained on the sample set;
an expansion unit configured to increase the resolution of the first adversarial sample image to obtain a second image at a second resolution, wherein the second resolution is higher than the first resolution;
and a correction unit configured to further train the pre-trained image recognition model by using the second image and the first identification label as a first correction sample, to obtain a corrected image recognition model for image recognition.
10. The apparatus of claim 9, wherein the obtaining unit is further configured to:
obtaining the first sample image from a sample set;
and performing a resolution reduction operation on the first sample image to obtain the first image at the first resolution, wherein the resolution reduction operation comprises at least one of a convolution operation, a pixel merging operation, and a pixel screening operation.
11. The apparatus of claim 9, wherein the generating unit is further configured to:
adding a current perturbation to the first image to obtain a current perturbed image, wherein the image features of the current perturbed image are determined by the superposition of the first image and the perturbation values of the current perturbation;
determining the model loss of the pre-trained image recognition model in processing the current perturbed image;
and adjusting the current perturbation with the goal of increasing the model loss along its gradient with respect to the image features of the current perturbed image, so as to generate the first adversarial sample image.
12. The apparatus of claim 11, wherein the model loss comprises a recognition loss determined based on a comparison between the output result of the pre-trained image recognition model processing the current perturbed image and the identification label of the first sample image.
13. The apparatus of claim 11 or 12, wherein the model loss comprises a visual loss determined based on a feature comparison between the current perturbed image and the first image.
14. The apparatus of claim 13, wherein the generating unit compares the features of the current perturbed image and the first image by:
extracting feature maps of the current perturbed image and the first image respectively through a first neural network;
comparing the feature points of the feature map of the current perturbed image and the feature map of the first image one by one to obtain corresponding difference values;
and determining a comparison result according to the difference values corresponding to the respective feature points.
15. The apparatus of claim 13, wherein the generating unit compares the features of the current perturbed image and the first image by:
extracting feature vectors of the current perturbed image and the first image respectively through a first neural network;
and determining a comparison result according to the vector similarity between the feature vector of the current perturbed image and the feature vector of the first image.
16. The apparatus of claim 9, wherein the expansion unit is configured to increase the resolution of the first adversarial sample image by up-sampling using one of an interpolation method, a super-resolution model, and a transposed convolution.
17. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-8.
18. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-8.
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40057952A HK40057952A (en) | 2022-04-22 |
| HK40057952B true HK40057952B (en) | 2022-10-14 |