CN117934379A - Noise detection method, device and storage medium - Google Patents
Noise detection method, device and storage medium
- Publication number
- CN117934379A (application number CN202311781685.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- processed
- denoising
- image processing
- diffusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Image Processing (AREA)
Abstract
The application discloses a noise detection method, device and storage medium. The noise detection method includes the following steps: acquiring related text information of an image to be processed by using a language model; using the image to be processed as the input of the forward diffusion of a diffusion model, using the noise-added image output by the forward process together with the related text information as the input of the backward diffusion of the diffusion model, and taking the output result of the backward process as a target denoising image; performing image processing on the image to be processed and the target denoising image by using an image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image; and determining that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference. With this scheme, whether noise exists in the image to be processed can be determined.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a noise detection method, apparatus, and storage medium.
Background
With the rapid development of technology and the rapid construction of intelligent systems, the stable performance of intelligent systems has become increasingly important. The attack methods that specifically target current intelligent algorithms show that these algorithms still have certain weaknesses. If such attacks are launched against an intelligent algorithm, the performance of the intelligent system may degrade and the system may lose its original function.
Disclosure of Invention
The application provides at least a noise detection method, device and storage medium.
The application provides a noise detection method, which includes the following steps: acquiring related text information of an image to be processed by using a language model; using the image to be processed as the input of the forward diffusion of a diffusion model, using the noise-added image output by the forward process and the related text information as the input of the backward diffusion of the diffusion model, and taking the output result of the backward process as a target denoising image; performing image processing on the image to be processed and the target denoising image by using an image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image; and determining that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference.
The application provides a noise detection device, including a first processing module, a second processing module, an image processing module and a noise confirmation module. The first processing module is configured to acquire related text information of an image to be processed by using a language model; the second processing module is configured to use the image to be processed as the input of the forward diffusion of a diffusion model, use the noise-added image output by the forward process and the related text information as the input of the backward diffusion of the diffusion model, and take the output result of the backward process as a target denoising image; the image processing module is configured to perform image processing on the image to be processed and the target denoising image by using an image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image; and the noise confirmation module is configured to determine that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference.
The application provides an electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the noise detection method.
The present application provides a computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above-described noise detection method.
According to this scheme, the related text information of the image to be processed is obtained using the language model and is then used as an input to the backward diffusion of the diffusion model, so that the diffusion model can refer to the text information of the image to be processed during backward diffusion. The target denoising image obtained in this way retains the main information of the image to be processed, which makes the determination of whether noise exists in the image to be processed more accurate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart of an embodiment of a noise detection method of the present application;
FIG. 2 is a schematic diagram of a training process of a diffusion model according to an embodiment of the noise detection method of the present application;
FIG. 3 is a schematic workflow diagram of a diffusion model according to an embodiment of the noise detection method of the present application;
FIG. 4 is a schematic flow chart of an embodiment of a noise detection method according to the present application;
FIG. 5 is a schematic diagram of a noise detecting device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 7 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
In the present application, an execution subject for realizing the noise detection method described in the present application may be a noise detection apparatus, an electronic device, or the like. For example, the noise detection apparatus may be provided in a terminal device or a server or other processing device, where the terminal device may be a security check device, an electronic device, a User Equipment (UE), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the noise detection method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, fig. 1 is a flow chart of an embodiment of a noise detection method according to the present application.
As shown in fig. 1, the noise detection method provided by the embodiment of the disclosure may include the following steps:
Step S11: and acquiring the related text information of the image to be processed by using the language model.
The image to be processed may be a two-dimensional image or a three-dimensional image. It may be a security-check image or another kind of image; the type of the image to be processed is not particularly limited here. The language model (LM) plays an important role in natural language processing; its task is to predict the probability of a sentence occurring in a language. Language models have developed successively from grammar-rule language models through statistical language models to neural-network language models, and they come in many types. The large language model may be an open-source large model; a general large model that has been iterated many times over large amounts of data and parameters is more robust and less affected by attacks. It may also be a large model that has a certain defensive capability after targeted fine-tuning, for example a model with targeted defenses for the basic self-attention structure of a Transformer, which works better. The language model is able to convert an input image into corresponding text information, which can be understood as descriptive information of the image to be processed. Optionally, the related text information is related to the image processing model: if the image processing model is a target classification model, the related text information may be classification information of a target object in the image; if the image processing model is a target detection model, the related text information may be positioning information of the target object in the image; and if the image processing model is an event detection model, the related text information may be event description information in the image.
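As a concrete illustration of step S11 (not part of the patent itself), the sketch below uses an off-the-shelf BLIP captioning model as a stand-in for the language model; the prompt text, file name and model choice are assumptions for illustration only.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("image_to_be_processed.png").convert("RGB")   # the image to be processed
prompt = "the type of animal appearing in this figure is"        # task-specific prompt words
inputs = processor(image, prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
related_text = processor.decode(output_ids[0], skip_special_tokens=True)  # saved as T_in
print(related_text)
```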
Step S12: and taking the image to be processed as a diffusion model to carry out forward diffusion input, taking the noise adding image output by the forward process and related text information as a diffusion model to carry out backward diffusion input, and taking the output result of the backward process as a target denoising image.
A diffusion model is a generative model that includes a forward process, specifically the forward diffusion described above, and a backward process, specifically the backward diffusion described above. There are many kinds of diffusion models; this embodiment takes a Stable Diffusion model as an example. The forward process (diffusion process) generally adds noise (with a standard Gaussian distribution) to the input data over successive iterations to obtain a final noise-added image; the number of iterations is not particularly limited in this scheme. The backward process is the reverse of the forward process and gradually reverts to the original, noise-free picture, which can be called a denoising process. In this scheme, forward diffusion iteratively adds noise to the image to be processed, and the backward process refers to the text information of the image to be processed while denoising the noise-added image, so as to obtain a denoised image. Specifically, the related text information of the image to be processed can be input as the text prompt of the Stable Diffusion model. The related text information may indicate a denoising direction during the denoising process, such that the related text information of the denoised image is similar to the related text information of the image to be processed.
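The following is a minimal, self-contained sketch of the forward and backward processes of step S12 in a DDPM style. The noise predictor `eps_model`, the text embedding and all hyperparameters are hypothetical stand-ins; an actual Stable Diffusion pipeline denoises in latent space with a trained U-Net conditioned on encoded text.

```python
import torch

T = 50                                   # number of diffusion iterations (illustrative)
betas = torch.linspace(1e-4, 0.02, T)    # noise schedule
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def forward_diffusion(x0):
    """Forward process: iteratively add Gaussian noise, returning X_0 ... X_T."""
    xs = [x0]
    x = x0
    for t in range(T):
        noise = torch.randn_like(x)
        x = torch.sqrt(alphas[t]) * x + torch.sqrt(betas[t]) * noise
        xs.append(x)
    return xs

def backward_diffusion(x_T, text_emb, eps_model):
    """Backward process: iteratively denoise X_T, guided by the related text information."""
    x = x_T
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t, text_emb)              # text-conditioned noise prediction
        mean = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps_hat) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x                                             # the target denoising image

# Shape-check with dummy stand-ins:
x0 = torch.rand(1, 3, 64, 64)                            # image to be processed
text_emb = torch.rand(1, 77, 768)                        # stand-in text embedding of T_in
eps_model = lambda x, t, c: torch.zeros_like(x)          # placeholder noise predictor
x_T = forward_diffusion(x0)[-1]
denoised = backward_diffusion(x_T, text_emb, eps_model)
```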
Step S13: and performing image processing on the image to be processed and the target denoising image by using the image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image.
The image processing model may be an object recognition model, an object detection model, an event detection model, or another model; the kind of image processing model is not particularly limited here. If the image processing model is an object recognition model, the first image processing result is the result obtained by performing object recognition on the image to be processed, and the second image processing result is the corresponding result for the target denoising image. The image processing model may process the image to be processed and the target denoising image sequentially or simultaneously.
Step S14: and determining that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference.
The target denoising image may be regarded as an image without noise; if the difference between the image processing result of the target denoising image and the image processing result of the image to be processed is large, noise is regarded as being present in the image to be processed.
According to this scheme, the related text information of the image to be processed is obtained using the language model and is then used as an input to the backward diffusion of the diffusion model, so that the diffusion model can refer to the text information of the image to be processed during backward diffusion. The target denoising image obtained in this way retains the main information of the image to be processed, which makes the determination of whether noise exists in the image to be processed more accurate.
It should be noted that using a large language model can ensure that the obtained related text information retains the main characteristics of the image to be processed, so that the denoising image obtained by diffusion on the basis of these main characteristics also retains them, and the denoising image can be used for downstream image processing tasks such as target recognition or detection.
In some embodiments, the following steps may be performed before performing step S11 described above:
the above related text information may be related to an image processing model, and the language big model may convert the image to be processed into related text information corresponding to a subsequent image processing model, where the image processing model is taken as an image recognition model as an example.
It is assumed that the task of the image processing model is to identify cats and dogs. The image-to-text conversion associated with the image processing model can be constructed through different prompt words; for example, the prompt constructed in this case may be "the type of animal appearing in this figure, among cat and dog, is _____". The language model then processes the image to be processed to obtain the related text information; in the cat-and-dog recognition task the output text will be "cat" or "dog", and the saved output text is denoted T_in. If there is a cat in the image to be processed, T_in is "the type of animal appearing in this figure, among cat and dog, is cat". Then the noise-added image output by the forward process and T_in are used as the input of the backward diffusion of the diffusion model, and the output result of the backward process is the target denoising image.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a training flow of a diffusion model according to an embodiment of the noise detection method of the present application. In some embodiments, the noise detection method may further comprise a training step of the diffusion model, the training step comprising:
Step S21: and acquiring the related text information of the sample image by using the language model.
The related text information may be description information of the sample image; for example, it may be the category description of the target object obtained by classifying the target object in the sample image. The manner of acquiring the related text information of the sample image may refer to step S11 above and is not specifically limited here. Optionally, in addition to the output text, the related text information may include a probability feature f, where f may be used to represent the confidence level of the output text; f is a feature vector or feature map from which a specific confidence level can be obtained after it passes through a fully connected layer.
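As a small illustration of how a concrete confidence value could be read out of the probability feature f through a fully connected layer (the feature dimension and the two-class setting are assumptions, not specified by the patent):

```python
import torch
import torch.nn as nn

fc = nn.Linear(256, 2)        # fully connected layer mapping f to class logits (cat / dog)
f = torch.rand(1, 256)        # probability feature returned alongside the output text
confidence = torch.softmax(fc(f), dim=-1).max().item()
print(f"confidence of the output text: {confidence:.3f}")
```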
Step S22: and taking the sample image as a diffusion model to carry out forward diffusion input, taking the sample noise adding image output by the forward process and the related text information of the sample image as the diffusion model to carry out backward diffusion input, and taking the output result of the backward process as a sample noise removing image.
The number of iterations in forward diffusion is the same as the number of iterations in backward diffusion.
Referring to fig. 3, fig. 3 is a schematic workflow diagram of a diffusion model according to an embodiment of the noise detection method of the present application. As shown in FIG. 3, the input image I can be regarded as the image to be processed or the sample image. X_0 ... X_{t-1}, X_t ... X_T are obtained by iteratively adding noise to the input image in forward diffusion; T_in and X_T corresponding to the sample image are then input to backward diffusion, where X_T is denoised over multiple iterations through the corresponding intermediate images, finally yielding the output image I*, which is the target denoising image or the sample denoising image. The images generated by the intermediate iterations may be regarded as temporary images.
Step S23: similarity between the first temporary image generated by forward diffusion and the second temporary image generated by backward diffusion at each iteration is lost.
With continued reference to FIG. 3, the first temporary image may be X_t and the second temporary image may be the image generated by backward diffusion at the corresponding iteration; likewise, the first temporary image may be X_{t-1} and the second temporary image its backward-diffusion counterpart. The similarity loss is computed between X_t and its backward counterpart, between X_{t-1} and its backward counterpart, and so on. The similarity loss may be calculated with an L1 loss or any other loss capable of judging the similarity of different features, such as L2, cos or sin functions. By constraining the similarity loss at each iteration of the diffusion process, this method makes the generated output image I* similar to the input image I.
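A minimal sketch of step S23, assuming the forward and backward temporary images are collected in two lists indexed by the same iteration step and compared with an L1 loss (any of the other similarity losses mentioned above could be substituted):

```python
import torch.nn.functional as F

def iteration_similarity_losses(forward_images, backward_images):
    """L1 similarity loss between each forward temporary image X_t and the
    backward temporary image produced at the same iteration index."""
    return [F.l1_loss(x_bwd, x_fwd)
            for x_fwd, x_bwd in zip(forward_images, backward_images)]
```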
Step S24: and adjusting parameters in the diffusion model by utilizing the similarity losses.
Optionally, the similarity losses may be weighted and fused to adjust the parameters in the diffusion model, or the similarity loss of each iteration may be used only to adjust the parameters used in that iteration of the diffusion model; the manner of adjusting the parameters in the diffusion model by using the similarity losses is not specifically limited here.
Wherein, the training process of the diffusion model can further comprise the following steps:
Firstly, the related text information of the sample denoising map is acquired using the language model. Then, the information loss between the related text information of the sample image and the related text information of the sample denoising map is acquired. On this basis, step S24 may specifically be: adjusting the parameters in the diffusion model by using the similarity losses and the information loss.
The manner of acquiring the related text information of the sample denoising map with the language model may refer to steps S11 and S21 above and is not repeated here. Specifically, the T_in and f* of the sample denoising map can be obtained, and the T_in of the sample denoising map is then compared with the T_in of the sample image; if they are different, the difference between the sample image and the sample denoising map is too large, and in that case the information loss may be a preset value. If the sample denoising map and the sample image have the same T_in, the information loss can be determined from the similarity between the f corresponding to the sample image and the f* corresponding to the sample denoising map; the information loss may be determined with a supervised learning method suitable for smoothing, such as weak supervision or distillation learning, for example a KD loss (knowledge distillation) or soft labels. The parameters in the diffusion model may be adjusted with the similarity losses and the information loss by first adjusting the diffusion model with the similarity losses and then with the information loss, or by weighting and fusing the similarity losses and the information loss into a target loss and then adjusting the diffusion model with the target loss; the specific implementation of step S24 is not limited here.
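One possible sketch of the information loss, assuming a soft-label, KD-style comparison of the probability features f and f* and an illustrative preset penalty when the output texts differ; the weighting used to fuse it with the similarity losses is likewise only an example:

```python
import torch
import torch.nn.functional as F

def information_loss(f_sample, f_denoised, texts_match, preset_value=1.0, temperature=2.0):
    """KD-style loss between the probability feature f of the sample image and f* of the
    sample denoising map; falls back to a preset value when the output texts T_in differ."""
    if not texts_match:
        return torch.tensor(preset_value)
    log_p = F.log_softmax(f_denoised / temperature, dim=-1)
    q = F.softmax(f_sample / temperature, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

def total_loss(similarity_losses, info_loss, w_sim=1.0, w_info=1.0):
    """Weighted fusion of the per-iteration similarity losses and the information loss."""
    return w_sim * sum(similarity_losses) + w_info * info_loss
```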
In some embodiments, the first image processing result includes a first image feature of the image to be processed and a first prediction result of the image to be processed, and the second image processing result includes a second image feature of the target denoising image and a second prediction result of the target denoising image. The image features may be feature maps or feature vectors and may be used to represent the confidence level of the corresponding prediction result; for example, after an image feature passes through a fully connected layer, the confidence level of the prediction result can be obtained. The prediction result is related to the task executed by the image processing model: if the task is target recognition, the prediction result is whether a target exists in the image; if the task is target detection, the prediction result may be information such as the position of the target in the image.
The noise detection method further comprises the following steps: and determining that the difference between the first image processing result and the second image processing result is greater than or equal to a preset difference in response to the similarity between the first image feature and the second image feature being less than or equal to a preset similarity and/or the first prediction result and the second prediction result being different.
In some application scenarios, in response to the similarity between the first image feature and the second image feature being less than or equal to a preset similarity, it is determined that the difference between the first image processing result and the second image processing result is greater than or equal to the preset difference. In some application scenarios, in response to the first prediction result and the second prediction result being different, it is determined that the difference between the first image processing result and the second image processing result is greater than or equal to the preset difference. In some application scenarios, in response to the similarity between the first image feature and the second image feature being less than or equal to the preset similarity and the first prediction result and the second prediction result being different, it is determined that the difference between the first image processing result and the second image processing result is greater than or equal to the preset difference. The preset similarity can be determined by learning; for example, it may be an empirical value obtained from statistics over different attack modes, and it can be adjusted appropriately according to the prediction results in an online-learning manner. There are many ways to determine the similarity between two image features, which are not particularly limited here.
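A small sketch of the difference criterion, assuming cosine similarity over flattened image features and an illustrative preset similarity value:

```python
import torch.nn.functional as F

def results_differ(feat_1, pred_1, feat_2, pred_2, preset_similarity=0.8):
    """True when the feature similarity is at or below the preset similarity
    or when the two prediction results disagree; noise is then deemed present."""
    feature_sim = F.cosine_similarity(feat_1.flatten(), feat_2.flatten(), dim=0)
    return bool(feature_sim <= preset_similarity) or (pred_1 != pred_2)
```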
In some embodiments, after it is determined that noise exists in the image to be processed, the noise may be further evaluated and its magnitude determined; if the noise is too large, alarm processing may be performed, and if the noise is small, subsequent image processing may be performed. Referring to fig. 4, fig. 4 is another flow chart of an embodiment of the noise detection method of the present application. As shown in fig. 4, the noise detection method further includes the following steps after step S14 is executed:
Step S31: denoising the image to be processed by using a preset number of denoising modes to obtain a preset number of reference denoising pictures.
The denoising processing may use deep-learning-based image restoration or other denoising structures, such as CNN-based image defogging algorithms. Optionally, step S31 may include the following steps: first, perform image enhancement processing on the image to be processed, where the image enhancement processing includes at least one of color transformation and random white noise; then denoise the enhanced image using a preset number of denoising modes to obtain the preset number of reference denoising pictures. The image enhancement processing may be any conventional data augmentation method, for example image rotation. The denoising processing may also include the denoising scheme used in the backward diffusion process of the diffusion model.
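A sketch of step S31 under the assumption that classical OpenCV filters serve as the preset denoising modes and that the input is a 3-channel uint8 image; the particular filters, parameters and the simple white-noise enhancement are illustrative choices, not those prescribed by the patent:

```python
import cv2
import numpy as np

def enhance(img):
    """Simple image enhancement: add a small amount of random white noise."""
    noise = np.random.normal(0.0, 2.0, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def reference_denoising_pictures(img):
    """Apply a preset number of denoising modes to the enhanced image."""
    enhanced = enhance(img)
    return [
        cv2.GaussianBlur(enhanced, (5, 5), 0),
        cv2.medianBlur(enhanced, 5),
        cv2.bilateralFilter(enhanced, 9, 75, 75),
        cv2.fastNlMeansDenoisingColored(enhanced, None, 10, 10, 7, 21),
    ]
```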
Step S32: and respectively determining the similarity between each reference denoising picture and the image to be processed.
Optionally, the image features obtained from each reference denoising picture through the image processing model and the image features obtained from the image to be processed through the image processing model can be determined first, and the similarity between the two image features is used as the similarity between the reference denoising picture and the image to be processed. Alternatively, the similarity between the reference denoising picture and the image to be processed can be calculated directly.
In this embodiment, as an example, the image features obtained from each reference denoising picture through the image processing model and the image features obtained from the image to be processed through the image processing model are determined first, and the similarity between the two image features is taken as the similarity between the reference denoising picture and the image to be processed.
Step S33: and carrying out weighted fusion on the similarity to obtain the noise fraction of the image to be processed.
Assuming that N denoising methods are used in total, the noise score F_score may be determined according to formula (1):

F_score = Σ_{i=1}^{N} k_i · dis(M_T(I), M_T(N_i(I)))    (1)
Here M_T(·) denotes the image features output by the image processing model, so M_T(I) represents the image features obtained by the image processing model from the image I to be processed; N_i denotes the i-th denoising process, and N_i(I) represents the reference denoising picture obtained by applying the i-th denoising process to the image I to be processed. dis(·) is a distance metric used to determine similarity, which may be, but is not limited to, an L1, L2 or cos distance metric. k_i is a learnable weight, meaning that the weights k corresponding to different denoising processes may differ.
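A sketch of formula (1), assuming M_T is a feature extractor returning a tensor, dis is the L2 distance and the weights k_i are given as plain numbers (in practice they would be learnable parameters):

```python
import torch

def noise_score(image, feature_extractor, denoisers, weights):
    """Weighted fusion of the distances between M_T(I) and M_T(N_i(I)), as in formula (1)."""
    f_orig = feature_extractor(image).flatten()
    score = torch.tensor(0.0)
    for k_i, n_i in zip(weights, denoisers):
        f_den = feature_extractor(n_i(image)).flatten()
        score = score + k_i * torch.dist(f_orig, f_den, p=2)   # k_i * dis(M_T(I), M_T(N_i(I)))
    return score
```

If the resulting score exceeds the threshold thr_β, alarm processing is performed (step S34); otherwise the reference denoising pictures are passed on to the image processing model (step S35).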
Step S34: and executing preset alarm processing to remind the image to be processed to be an attack image in response to the noise score being greater than the preset noise score.
The preset noise score thr_β may be learned. In the present application, different attack methods are introduced to optimize the learnable weights k_i, and the parameter threshold thr_β is learned according to the actual requirements of the image processing model. When F_score > thr_β, the image processing model cannot normally recognize n (n < N) of the denoised images, where n is an empirical value acceptable for the specific algorithm item; in that case the noise is judged to be too large, the image to be processed is prohibited from being fed into the image processing model, and a warning is issued that the algorithm is suspected of being under malicious attack.
Step S35: and determining the image processing result of each reference denoising picture by the image processing model in response to the noise score being less than or equal to the preset noise score.
And in response to the noise fraction being smaller than or equal to the preset noise fraction, inputting the N reference denoising pictures into an image processing model to obtain N results. The image processing results may include a prediction result and image features that may be used to further calculate a confidence level of the prediction result.
Step S36: and determining an image processing result of the image to be processed from the image processing results of the reference denoising pictures.
The implementation of step S36 may be as follows: determine the number of votes for the image processing result of each reference denoising picture based on a preset voting mechanism; then take the image processing result corresponding to the highest number of votes as the image processing result of the image to be processed.
There are many possible voting methods. For example, the prediction results may be grouped by category, the category with the largest number of votes selected as the target category, and the prediction result with the highest confidence under the target category used as the image processing result. Alternatively, the voting may calculate the similarity between the image features corresponding to the reference denoising pictures, compute a similarity score for the image features of each reference denoising picture, select the image feature corresponding to the highest similarity score as the target image feature, and take the prediction result corresponding to the target image feature as the image processing result. For a given image feature, its similarity score may be the sum of the similarities between that image feature and the other image features.
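A sketch of the first voting variant described above, assuming each image processing result is a (predicted label, confidence) pair; the tie-breaking behaviour is an implementation detail, not specified by the patent:

```python
from collections import Counter

def vote(results):
    """results: one (predicted_label, confidence) pair per reference denoising picture."""
    labels = [label for label, _ in results]
    target_label = Counter(labels).most_common(1)[0][0]          # category with the most votes
    best = max((r for r in results if r[0] == target_label),
               key=lambda r: r[1])                               # highest confidence in that category
    return best                                                  # used as the final image processing result
```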
By the voting mode, attack noise can be effectively removed, and the accuracy of an algorithm is ensured to the greatest extent.
The noise detection method provided by the application can identify a variety of potentially threatening adversarial-noise methods and complete the denoising operation; it has strong robustness and extensibility and can also achieve a good detection effect on types of noise that have not appeared before. The noise detection method provided by the application can be applied to adversarial defense in intelligent scenarios, can effectively detect attacked images, and has strong generalization.
In addition, a large language model is innovatively introduced to guide the diffusion model, and the diffusion model is trained for the image processing model to be protected, so that various types of adversarial noise can be identified without special training for each new attack method, and the defense cost is relatively low.
Through the robust large model, the main-subject characteristics of the picture are preserved throughout the algorithm, so that a generated image retaining the main characteristics is obtained, and the downstream task is guaranteed to complete its set task (such as recognition or detection) effectively from these preserved characteristics. Reconstructing the input picture while preserving only its main-subject characteristics destroys the fine structure of the input picture, which strongly interferes with attack modes based on small perturbations, such as adversarial noise, and renders the attack ineffective. Therefore the method can identify potentially threatening noise and complete the denoising operation; because it is highly destructive to the noise, it is insensitive to how the noise was generated, so effective detection and removal of various kinds of noise can be achieved.
In addition, the denoising structure is combined with noise detection: denoising operations are carried out within the noise detection process and noise evaluation is introduced, so that the accuracy of the model to be protected is ensured to the greatest extent. Combining denoising with noise detection also improves training efficiency and saves computing resources.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a noise detection device according to an embodiment of the application. The noise detection device 30 may perform the noise detection method described above. The noise detection device 30 includes a first processing module 31, a second processing module 32, an image processing module 33, and a noise confirmation module 34. The first processing module 31 is configured to acquire related text information of an image to be processed using a language model; the second processing module 32 is configured to use the image to be processed as the input of the forward diffusion of a diffusion model, use the noise-added image output by the forward process and the related text information as the input of the backward diffusion of the diffusion model, and take the output result of the backward process as the target denoising image; the image processing module 33 is configured to perform image processing on the image to be processed and the target denoising image using an image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image; and the noise confirmation module 34 is configured to determine that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference.
According to this scheme, the related text information of the image to be processed is obtained using the language model and is then used as an input to the backward diffusion of the diffusion model, so that the diffusion model can refer to the text information of the image to be processed during backward diffusion. The target denoising image obtained in this way retains the main information of the image to be processed, which makes the determination of whether noise exists in the image to be processed more accurate.
The functions of each module may be described in the embodiments of the noise detection method, which is not described herein.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 40 includes a memory 41 and a processor 42, the processor 42 being configured to execute program instructions stored in the memory 41 to implement the steps of any of the noise detection method embodiments described above. In one particular implementation scenario, the electronic device 40 may include, but is not limited to, a monitoring device, a microcomputer or a server; the electronic device 40 may also be a notebook computer, a tablet computer or another carrier device, which is not limited here.
In particular, the processor 42 is configured to control itself and the memory 41 to implement the steps of any of the noise detection method embodiments described above. The processor 42 may also be referred to as a CPU (Central Processing Unit). The processor 42 may be an integrated circuit chip having signal processing capabilities. The processor 42 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 42 may be implemented jointly by integrated circuit chips.
According to this scheme, the related text information of the image to be processed is obtained using the language model and is then used as an input to the backward diffusion of the diffusion model, so that the diffusion model can refer to the text information of the image to be processed during backward diffusion. The target denoising image obtained in this way retains the main information of the image to be processed, which makes the determination of whether noise exists in the image to be processed more accurate.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 50 stores program instructions 51 which, when executed by a processor, implement the steps of any of the noise detection method embodiments described above.
According to this scheme, the related text information of the image to be processed is obtained using the language model and is then used as an input to the backward diffusion of the diffusion model, so that the diffusion model can refer to the text information of the image to be processed during backward diffusion. The target denoising image obtained in this way retains the main information of the image to be processed, which makes the determination of whether noise exists in the image to be processed more accurate.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments focuses on the differences between them; for parts that are the same or similar, the embodiments may refer to one another, and those parts are not repeated here for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is merely a logical functional division, and there may be other divisions in actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed between components may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or as software functional units. The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Claims (10)
1. A noise detection method, comprising:
Acquiring related text information of an image to be processed by using a language model;
Taking the image to be processed as the input of forward diffusion of a diffusion model, and taking the noise-added image output by the forward process and the related text information as the input of backward diffusion of the diffusion model, wherein the output result of the backward process is a target denoising image;
Performing image processing on the image to be processed and the target denoising image by using an image processing model to obtain a first image processing result of the image to be processed and a second image processing result of the target denoising image;
And determining that noise exists in the image to be processed in response to the difference between the first image processing result and the second image processing result being greater than or equal to a preset difference.
2. The method of claim 1, wherein after the determining that the image to be processed is noisy in response to a difference between the first image processing result and the second image processing result being greater than or equal to a preset difference, the method further comprises:
denoising the image to be processed by using a preset number of denoising modes to obtain a preset number of reference denoising pictures;
respectively determining the similarity between each reference denoising picture and the image to be processed;
Weighting and fusing the similarity to obtain the noise fraction of the image to be processed;
And executing preset alarm processing to indicate that the image to be processed is an attack image, in response to the noise score being greater than the preset noise score.
3. The method according to claim 2, wherein the denoising the image to be processed by using a preset number of denoising modes to obtain a preset number of reference denoising graphs includes:
performing image enhancement processing on the image to be processed, wherein the image enhancement processing comprises at least one of color conversion and random white noise;
denoising the image subjected to the image enhancement processing by using a preset number of denoising modes to obtain a preset number of reference denoising pictures.
4. The method according to claim 2, wherein the method further comprises:
determining an image processing result of each reference denoising picture by the image processing model in response to the noise score being less than or equal to a preset noise score;
and determining the image processing result of the image to be processed from the image processing results of the reference denoising pictures.
5. The method of claim 4, wherein determining the image processing result of the image to be processed from the image processing results of each of the reference denoising figures comprises:
determining the number of votes of the image processing results of each reference denoising graph based on a preset voting mechanism;
and taking the image processing result corresponding to the highest number of votes as the image processing result of the image to be processed.
6. The method according to any one of claims 1 to 5, further comprising a training step of the diffusion model, the training step comprising:
acquiring related text information of a sample image by using a language model;
Taking the sample image as the input of forward diffusion of the diffusion model, and taking the sample noise-added image output by the forward process and the related text information of the sample image as the input of backward diffusion of the diffusion model, wherein the output result of the backward process is a sample denoising image, and the number of iterations in the forward diffusion is the same as the number of iterations in the backward diffusion;
determining a similarity loss between a first temporary image generated by the forward diffusion and a second temporary image generated by the backward diffusion at each iteration, respectively;
And adjusting parameters in the diffusion model by utilizing the similarity losses.
7. The method of claim 6, wherein the method further comprises:
acquiring related text information of the sample denoising graph by using the language model;
acquiring information loss between the related text information of the image to be processed and the related text information of the sample denoising graph;
The adjusting parameters in the diffusion model by using the similarity losses comprises:
and adjusting parameters in the diffusion model by utilizing the similarity loss and the information loss.
8. The method according to any one of claims 1 to 5, wherein the first image processing result includes a first image feature of the image to be processed and a first prediction result of the image to be processed, the second image processing result includes a second image feature of the target denoising map and a second prediction result of the target denoising map, the method further comprising:
And determining that the difference between the first image processing result and the second image processing result is greater than or equal to a preset difference in response to the similarity between the first image feature and the second image feature being less than or equal to a preset similarity and/or the first prediction result and the second prediction result being different.
9. An electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the method of any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon program instructions, which when executed by a processor, implement the method of any of claims 1 to 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311781685.3A CN117934379A (en) | 2023-12-21 | 2023-12-21 | Noise detection method, device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117934379A true CN117934379A (en) | 2024-04-26 |
Family
ID=90765756
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311781685.3A Pending CN117934379A (en) | 2023-12-21 | 2023-12-21 | Noise detection method, device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117934379A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118629427A (en) * | 2024-06-05 | 2024-09-10 | 科大讯飞股份有限公司 | Abnormal sound detection method, device, electronic equipment, storage medium and product |
| CN120580174A (en) * | 2025-07-30 | 2025-09-02 | 昆明理工大学 | Real scene image dehazing method based on text guidance and disturbance resistance |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |