
CN118967424B - Screen shot image robust watermarking method based on attention mechanism and contrast learning - Google Patents

Screen shot image robust watermarking method based on attention mechanism and contrast learning Download PDF

Info

Publication number
CN118967424B
Authority
CN
China
Prior art keywords
feature map
image
layer
discriminator
watermark
Prior art date
Legal status
Active
Application number
CN202411448920.XA
Other languages
Chinese (zh)
Other versions
CN118967424A (en)
Inventor
高光勇
李力
陈晓安
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202411448920.XA priority Critical patent/CN118967424B/en
Publication of CN118967424A publication Critical patent/CN118967424A/en
Application granted granted Critical
Publication of CN118967424B publication Critical patent/CN118967424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/0021 Image watermarking
    • G06T 1/005 Robust watermarking, e.g. average attack or collusion attack resistant
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2201/00 General purpose image data processing
    • G06T 2201/005 Image watermarking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a robust watermarking method for screen-shot images based on an attention mechanism and contrastive learning. An encoder generates an encoded image containing watermark information; the encoded image and the carrier image are input into a discriminator, which outputs a predicted value; the encoded image undergoes distortion simulation, and the distorted encoded image is input into a decoder that extracts the watermark information hidden within it; the encoder, the discriminator, and the decoder are trained according to the predicted value of the discriminator and a joint loss function; and the trained encoder and decoder form a screen-shot watermark model, which encodes and decodes screen-shot images. By optimizing the watermark image encoding process, the method effectively enhances the robustness of the watermark model in real scenes while guaranteeing the invisibility of the encoded image, and better preserves the integrity of the watermark information under screen-shot noise.

Description

Screen shot image robust watermarking method based on attention mechanism and contrast learning
Technical Field
The invention relates to screen-shot image watermarking technology, and in particular to a robust watermarking method for screen-shot images based on an attention mechanism and contrastive learning.
Background
With the rapid iteration of intelligent devices, the circulation and spread of digital information have become unprecedentedly convenient. Digital copyright protection has matured under fast-developing multimedia technology, but copyright protection in screen-shooting scenarios still has gaps, such as piracy of film and television works, medical data, and military secrets. Traditional electronic watermarking technology cannot effectively prevent such malicious behavior, which poses new problems and challenges for the field of image watermarking.
Most traditional digital watermarking schemes are designed for electronic-channel propagation: their goal is to keep the watermark extractable when the watermarked image faces electronic-channel noise such as Gaussian noise, JPEG compression, and color distortion. However, the imaging principles of shooting in real scenes differ greatly from the distortion principles of electronic channels, so the watermarks of traditional schemes can hardly resist such distortion. The screen-shooting scenario introduces many physical distortions, such as lens distortion, illumination distortion, motion blur, and moiré effects, that do not exist in the traditional electronic-channel domain. Screen-shooting robust watermarking therefore emerged; its purpose is to allow the hidden watermark information to be read smoothly after the watermarked image is photographed from a screen, ensuring the security of user data.
The advent of deep learning has powerfully aided research on screen-shooting robust watermarking. Generative adversarial networks in particular offer a new idea for watermarking research: through adversarial training of the encoder and the discriminator, the embedding positions of the watermark information can be fitted to the image features as closely as possible. In addition, a noise layer simulates specific noise that may occur in real scenes, enhancing the decoder's robustness to that noise. However, the prior art still has the following problems:
1. Insufficient robustness: after an encoded image generated by the prior art is subjected to a screen-shooting attack, the watermark extraction rate may still be insufficient.
2. Insufficient invisibility: the embedding positions of the watermark in encoded images generated by the prior art fit the image features poorly, so the similarity to the original image is low and a user can easily observe with the naked eye that the image may contain a watermark.
3. Deficient model architecture: prior-art model architectures suffer from problems such as mode collapse and vanishing gradients, which cap the upper limit of model performance and affect the overall performance of the model.
Disclosure of Invention
In view of the above problems, the invention aims to provide a screen-shot image robust watermarking method based on an attention mechanism and contrastive learning.
The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning disclosed by the invention comprises the following steps:
Inputting the carrier image and the watermark information into an encoder, and generating an encoded image containing the watermark information by the encoder;
inputting the encoded image and the carrier image into a discriminator, and outputting a predicted value using the discriminator;
Performing distortion simulation on the coded image to obtain a distorted coded image;
Inputting the distorted coded image into a decoder, and extracting watermark information hidden in the distorted coded image;
Performing model training on the encoder, the discriminator, and the decoder according to the predicted value of the discriminator and a joint loss function, forming a screen-shot watermark model from the trained encoder and decoder, and encoding and decoding screen-shot images with the screen-shot watermark model.
Further, before inputting the carrier image and the watermark information into the encoder, the method comprises:
randomly extracting n_0 numerical elements from a standard uniform distribution on the interval [0, 1), setting values greater than 0.5 to 1 and values not greater than 0.5 to 0, to form binary watermark ciphertext as the watermark information;
resizing the original carrier image to n_1 × n_1 as the carrier image.
Further, inputting the carrier image and the watermark information into the encoder comprises:
performing three downsampling operations on the carrier image to obtain local feature maps F1, F2, and F3 in sequence, with a max-pooling operation after each downsampling; obtaining a local feature map F4 after the third max-pooling operation, then performing one global average pooling to obtain a global feature map F5; concatenating the local feature map F4 and the global feature map F5 to obtain a feature map F6;
passing the watermark information through a fully connected layer to obtain a watermark tensor M whose dimensions are the same as those of the feature map F6;
concatenating the local feature map F4, the feature map F6, and the watermark tensor M in the channel dimension and performing one upsampling to obtain a feature map D4;
concatenating the local feature map F3, the feature map D4, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D3;
concatenating the local feature map F2, the feature map D3, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D2;
concatenating the local feature map F1, the feature map D2, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and a 1×1 convolution layer to obtain the encoded image D1.
Further, performing the three downsampling operations on the carrier image comprises:
at each downsampling, passing the carrier image sequentially through a 3×3 convolution layer, a batch normalization layer, a first activation function layer, a 3×3 convolution layer, a batch normalization layer, a second activation function layer, and an HWC attention module;
the HWC attention module comprising an HAttention module, a WAttention module, and a CAttention module;
in the HAttention module, the input feature map x undergoes an adaptive max-pooling operation and an adaptive global average-pooling operation that compress the width to 1, yielding a feature map max_h and a feature map avg_h; max_h passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(max_h); avg_h passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(avg_h); se(max_h) and se(avg_h) are concatenated and passed through an activation function layer to obtain the feature map A_h;
in the WAttention module, the input feature map x undergoes adaptive max pooling and adaptive global average pooling that compress the height to 1, yielding a feature map max_w and a feature map avg_w; max_w passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(max_w); avg_w passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(avg_w); se(max_w) and se(avg_w) are concatenated and passed through an activation function layer to obtain the feature map A_w;
in the CAttention module, the input feature map x is compressed to one channel by an adaptive max-pooling layer and an adaptive global average-pooling layer respectively; the two resulting tensors are concatenated in the channel dimension, reduced to one channel by a 1×1 convolution layer, and passed through an activation function layer to obtain the feature map A_c;
the input feature map x is multiplied by the feature map A_h and the feature map A_w in the height and width dimensions, then by the feature map A_c in the channel dimension, and the result is added to the input feature map x as the output of the HWC attention module.
Further, the discriminator is a spectrally normalized generative adversarial network; the carrier image and the watermarked encoded image are input to the spectrally normalized generative adversarial network, the output of the discriminator is true or false, the output result of the discriminator is input to the encoder, and the encoder loss function L_C and the discriminator loss function L_E are calculated as:
L_C = α·L_nce(Ĉ, E, X_r, X_w) + β·L_gan(Ĉ, E, X_w),
L_E = L_gan(C, Ê, X_r, X_w),
where α and β are training hyper-parameters, X_r and X_w represent the carrier image and the encoded image respectively, L_nce and L_gan represent the NCE loss and the hinge loss respectively, C represents the weight parameters of the discriminator, Ĉ represents the weight parameters of the fixed discriminator, E represents the weight parameters of the encoder, and Ê represents the weight parameters of the fixed encoder.
Further, performing distortion simulation on the encoded image comprises:
randomly perturbing the four corners of the encoded image using a perspective transformation, then bilinearly resampling the encoded image to create a perspective-warped image;
performing illumination distortion and moiré distortion simulation on the perspective-warped image using an illumination simulation function and a moiré simulation function;
and simulating interference in real scenes using Gaussian noise.
Further, the decoder includes 3 single convolution blocks, 3 residual convolution blocks, 1 single convolution block, 6 residual convolution blocks, 1 single convolution block, and 1 full connection layer, which are sequentially arranged.
Further, after extracting the watermark information hidden in the distorted encoded image, the method comprises:
calculating a decoder loss function L_D from the decoded watermark information and the original watermark information:
L_D = MSE(M, M_d) = MSE(M, D(γ_D, I_n)),
where M represents the original watermark information, M_d represents the decoded watermark information, γ_D represents the decoding hyper-parameters, I_n represents the distorted encoded image, and D(γ_D, I_n) represents the decoder decoding the distorted encoded image.
Further, the joint loss function is composed of the encoder loss function, the discriminator loss function, and the decoder loss function:
L = λ_1·L_C + λ_2·L_E + λ_3·L_D,
where λ_1, λ_2, and λ_3 are the weight parameters of the corresponding loss functions.
Further, model training the encoder, the discriminator, and the decoder based on the predictor of the discriminator and the joint loss function comprises:
The encoder, the discriminator and the decoder are input to an Adam optimizer for iterative training, the maximum iteration number is set, and the joint loss function is utilized for back propagation.
Compared with the prior art, the invention has the following notable advantages:
1. the invention improves encoder performance by designing a new multi-branch convolution and HWC attention mechanism module, thereby improving the invisibility of the encoded watermark;
2. the invention alleviates the mode-collapse problem of traditional model architectures through contrastive learning and a multi-discriminator module, raising the upper limit of model training and enhancing the performance of the trained model;
3. by providing a screen-shot distortion simulation layer, the invention simulates several distortions that may occur during screen shooting in real scenes, trains the decoder with this layer, and improves the decoder's robustness to screen-shot distortion.
Drawings
FIG. 1 is a flow diagram of the screen-shot image robust watermarking method based on an attention mechanism and contrastive learning;
FIG. 2 is a schematic diagram of the structure of the HWC attention mechanism module;
FIG. 3 is a schematic diagram of positive and negative sample pairing in the contrastive learning process;
FIG. 4 shows moiré images under noise interference of different intensities;
FIG. 5 shows screen-shot images under interference at different angles.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent.
The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning in this embodiment includes at least steps 1 to 5; the flowchart is shown in fig. 1.
Step 1, inputting a carrier image and watermark information into an encoder, and generating an encoded image containing the watermark information by using the encoder;
Step 2, inputting the coded image and the carrier image into a discriminator, and outputting a predicted value by using the discriminator;
step 3, performing distortion simulation on the coded image to obtain a distorted coded image;
Step 4, inputting the distorted coded image into a decoder, and extracting watermark information hidden in the distorted coded image;
Step 5, performing model training on the encoder, the discriminator, and the decoder according to the predicted value of the discriminator and the joint loss function, forming a screen-shot watermark model from the trained encoder and decoder, and encoding and decoding screen-shot images with the screen-shot watermark model.
Further, before inputting the carrier image and the watermark information into the encoder, the method comprises:
randomly extracting n_0 numerical elements from a standard uniform distribution on the interval [0, 1), setting values greater than 0.5 to 1 and values not greater than 0.5 to 0, to form binary watermark ciphertext as the watermark information;
resizing the original carrier image to n_1 × n_1 as the carrier image.
In one example, 30 elements may be randomly extracted from a standard uniform distribution on the interval [0, 1), matching the input dimension of the encoder, to form the watermark information to be embedded into the carrier image; n_1 takes the value 128, i.e., the original carrier image is resized to 128×128. The processed watermark information and carrier image are then input into the encoder for encoding, yielding an encoded image containing the watermark information.
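The following minimal PyTorch sketch illustrates this preprocessing step; the helper names make_watermark and prepare_carrier are illustrative and not from the patent:

```python
import torch
import torch.nn.functional as F

def make_watermark(n_bits: int = 30) -> torch.Tensor:
    """Draw n_bits samples from U[0, 1) and threshold at 0.5 into {0, 1} bits."""
    u = torch.rand(n_bits)      # standard uniform on [0, 1)
    return (u > 0.5).float()    # values > 0.5 -> 1, otherwise 0

def prepare_carrier(img: torch.Tensor, size: int = 128) -> torch.Tensor:
    """Resize an RGB carrier image (C, H, W) to size x size."""
    return F.interpolate(img.unsqueeze(0), size=(size, size),
                         mode="bilinear", align_corners=False).squeeze(0)

watermark = make_watermark(30)                       # 30-bit message
carrier = prepare_carrier(torch.rand(3, 256, 256))   # stand-in carrier image
```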
Further, in step 1, inputting the carrier image and the watermark information into the encoder comprises:
in the encoder, performing three downsampling operations on the carrier image to obtain local feature maps F1, F2, and F3 in sequence, with a max-pooling operation after each downsampling; obtaining a local feature map F4 after the third max-pooling operation, then performing one global average pooling to obtain a global feature map F5; concatenating the local feature map F4 and the global feature map F5 to obtain a feature map F6;
passing the watermark information through a fully connected layer to obtain a watermark tensor M whose dimensions are the same as those of the feature map F6;
concatenating the local feature map F4, the feature map F6, and the watermark tensor M in the channel dimension and performing one upsampling to obtain a feature map D4;
concatenating the local feature map F3, the feature map D4, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D3;
concatenating the local feature map F2, the feature map D3, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D2;
concatenating the local feature map F1, the feature map D2, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and a 1×1 convolution layer to obtain the encoded image D1.
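The fusion pattern of one such decoding-path step can be sketched as follows; the channel widths and spatial sizes are assumptions chosen only for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

msg_len, c, h, w = 30, 64, 16, 16
fc = nn.Linear(msg_len, c * h * w)                   # fully connected layer
message = torch.randint(0, 2, (1, msg_len)).float()  # binary watermark bits
m = fc(message).view(1, c, h, w)                     # watermark tensor M, shaped like F6
f4 = torch.rand(1, c, h, w)                          # stand-in local feature map F4
f6 = torch.rand(1, c, h, w)                          # stand-in feature map F6
d4 = F.interpolate(torch.cat([f4, f6, m], dim=1),    # channel-dimension concatenation
                   scale_factor=2, mode="nearest")   # one upsampling -> D4
print(d4.shape)  # torch.Size([1, 192, 32, 32])
```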
Specifically, performing the downsampling operation on the carrier image three times includes:
at each downsampling, the carrier image is sequentially passed through a 3×3 convolution layer, a batch normalization layer, a first activation function layer, a 3×3 convolution layer, a batch normalization layer, a second activation function layer, and a HWC attention module.
As shown in fig. 2, the HWC attention module includes HAttention, WAttention, and CAttention modules;
In the HAttention module, the input feature map x of the HWC attention module undergoes an adaptive max-pooling operation and an adaptive global average-pooling operation that compress the width to 1, i.e., the feature map size changes from (B, C, H, W) to (B, C, H, 1), where B, C, H, and W denote the training batch, channel, height, and width dimensions respectively, yielding a feature map max_h and a feature map avg_h. The feature map max_h passes sequentially through a 1×1 convolution layer, a ReLU activation function layer, and a 1×1 convolution layer to obtain the feature map se(max_h); the feature map avg_h passes through the same sequence to obtain the feature map se(avg_h). The feature maps se(max_h) and se(avg_h) are concatenated, and the feature map A_h is obtained after a Sigmoid activation function layer. The number of channels is reduced from C1 to C2 by the first 1×1 convolution layer and restored from C2 to the original C1 by the second.
In the WAttention module, the input feature map x undergoes adaptive max pooling and adaptive global average pooling that compress the height to 1, i.e., the feature map size changes from (B, C, H, W) to (B, C, 1, W), yielding a feature map max_w and a feature map avg_w. The feature map max_w passes sequentially through a 1×1 convolution layer, a ReLU activation function layer, and a 1×1 convolution layer to obtain the feature map se(max_w); the feature map avg_w passes through the same sequence to obtain the feature map se(avg_w). The feature maps se(max_w) and se(avg_w) are concatenated, and the feature map A_w is obtained after a Sigmoid activation function layer. As in the HAttention module, the number of channels is reduced from C1 to C2 by the first convolution layer and restored from C2 to C1 by the second.
In the CAttention module, the input feature map x is compressed to one channel by an adaptive max-pooling layer and an adaptive global average-pooling layer respectively, i.e., the feature map size changes from (B, C, H, W) to (B, 1, H, W); the two resulting tensors are concatenated in the channel dimension, reduced to one channel by a 1×1 convolution layer, and passed through a Sigmoid activation function layer to obtain the feature map A_c.
The input feature map x is multiplied by the feature map A_h and the feature map A_w in the height and width dimensions, then by the feature map A_c in the channel dimension, and the result is added to the input feature map x as the output of the HWC attention module.
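A compact PyTorch sketch of this module follows. Two details are assumptions: the max- and average-pooled branches share one 1×1 bottleneck per direction, and the two branch outputs are fused by element-wise addition before the Sigmoid (the text above splices them before the activation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HWCAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        def bottleneck(c):  # 1x1 conv -> ReLU -> 1x1 conv (C1 -> C2 -> C1)
            return nn.Sequential(
                nn.Conv2d(c, c // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(c // reduction, c, 1))
        self.se_h = bottleneck(channels)   # HAttention branch
        self.se_w = bottleneck(channels)   # WAttention branch
        self.conv_c = nn.Conv2d(2, 1, 1)   # CAttention fusion conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # HAttention: compress width to 1 -> (B, C, H, 1)
        a_h = torch.sigmoid(self.se_h(F.adaptive_max_pool2d(x, (h, 1)))
                            + self.se_h(F.adaptive_avg_pool2d(x, (h, 1))))
        # WAttention: compress height to 1 -> (B, C, 1, W)
        a_w = torch.sigmoid(self.se_w(F.adaptive_max_pool2d(x, (1, w)))
                            + self.se_w(F.adaptive_avg_pool2d(x, (1, w))))
        # CAttention: compress channels to 1 -> (B, 1, H, W), then fuse
        max_c = torch.max(x, dim=1, keepdim=True).values
        avg_c = torch.mean(x, dim=1, keepdim=True)
        a_c = torch.sigmoid(self.conv_c(torch.cat([max_c, avg_c], dim=1)))
        # Apply the three attention maps by broadcasting, with a residual add
        return x * a_h * a_w * a_c + x

y = HWCAttention(64)(torch.rand(2, 64, 32, 32))  # output shape (2, 64, 32, 32)
```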
In the encoder, the convolution layers use Kaiming normal initialization for their weights; the batch normalization layers initialize weights to the constant 1 and biases to the constant 0; and the fully connected layers use normal initialization with a standard deviation of 0.001 and biases initialized to the constant 0.
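A sketch of this initialization scheme, applied via Module.apply (zeroing the convolution bias is an assumption, since the text does not mention it):

```python
import torch.nn as nn

def init_weights(m: nn.Module) -> None:
    """Kaiming-normal conv weights, constant 1/0 batch-norm weight/bias,
    N(0, 0.001^2) weights and zero bias for fully connected layers."""
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:          # bias handling assumed
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1.0)
        nn.init.constant_(m.bias, 0.0)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.001)
        nn.init.zeros_(m.bias)

# usage: encoder.apply(init_weights)
```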
The main function of the HWC module is to let the encoder perform attention operations in the three dimensions of width, height, and channel through the three sub-modules HAttention, WAttention, and CAttention, which enhances the perception of the features of the input carrier image, better determines the watermark embedding positions, and improves the quality of the encoded image. In addition, the weight initialization described above prevents conditions such as vanishing and exploding gradients from interfering with the normal training process.
The discriminator is a spectrally normalized generative adversarial network SNGAN; the carrier image and the watermarked encoded image are input to the spectrally normalized generative adversarial network, the output of the discriminator is true or false, the output result of the discriminator is input to the encoder, and the encoder loss function L_C and the discriminator loss function L_E are calculated as:
L_C = α·L_nce(Ĉ, E, X_r, X_w) + β·L_gan(Ĉ, E, X_w),
L_E = L_gan(C, Ê, X_r, X_w),
where α and β are training hyper-parameters, X_r and X_w represent the carrier image and the encoded image respectively, L_nce and L_gan represent the NCE loss and the hinge loss respectively, C represents the weight parameters of the discriminator, Ĉ represents the weight parameters of the fixed discriminator, E represents the weight parameters of the encoder, and Ê represents the weight parameters of the fixed encoder.
The discriminator SNGAN comprises multiple local feature discriminators and a global feature discriminator which, after extracting the local and global features, project them into a higher-dimensional reproducing kernel Hilbert space (RKHS), capturing the similarity between global and local features with linearly evaluated values, as shown in fig. 1. These projected features then pass through a contrastive-learning sample-pairing stage that creates positive/negative sample pairs, as shown in fig. 3: the local and global features of an M×M input image form a positive pair, while other images in the same batch and images from different batches form negative pairs. These positive and negative pairs are used to compute the NCE loss in the loss function, and finally true or false is output and fed into the next round of the encoder.
The multiple local feature discriminators and the global feature discriminator in this example adopt a multi-discriminator structure, alleviating the catastrophic forgetting that easily occurs in a traditional single discriminator, while contrastive-learning sample pairing and the NCE loss prevent mode collapse in the encoder. These two methods address two common problems that prevent normal training in the prior art, optimize the architecture of the training model, and raise the upper limit of model performance.
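A minimal sketch of the NCE loss over such positive/negative pairs, assuming the RKHS projection heads have already produced one global and one pooled local feature vector per image (the multi-discriminator wiring is omitted):

```python
import torch
import torch.nn.functional as F

def info_nce(global_feat: torch.Tensor, local_feat: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Each image's (global, local) projection pair is the positive;
    all other images in the batch serve as negatives."""
    g = F.normalize(global_feat, dim=1)    # (B, D)
    l = F.normalize(local_feat, dim=1)     # (B, D)
    logits = g @ l.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(g.size(0), device=g.device)  # diagonal positives
    return F.cross_entropy(logits, labels)

loss_nce = info_nce(torch.rand(16, 128), torch.rand(16, 128))
```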
Further, in step 3, performing distortion simulation on the encoded image includes:
S31, randomly perturbing the four corners of the encoded image using a perspective transformation, then bilinearly resampling the encoded image to create a perspective-warped image;
S32, performing illumination distortion and moiré distortion simulation on the perspective-warped image using an illumination simulation function and a moiré simulation function;
S33, simulating interference in real scenes using Gaussian noise.
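The perspective and Gaussian-noise parts of this distortion layer can be sketched as follows; simulate_screen_shot, max_jitter, and noise_std are illustrative names, and the patent's illumination and moiré simulation functions are not reproduced here:

```python
import random
import torch
import torchvision.transforms.functional as TF

def simulate_screen_shot(encoded: torch.Tensor, max_jitter: int = 8,
                         noise_std: float = 0.02) -> torch.Tensor:
    """Randomly perturb the four corners (perspective warp with bilinear
    resampling), then add Gaussian noise; illumination and moire
    simulation would be applied between these two steps."""
    h, w = encoded.shape[-2], encoded.shape[-1]
    corners = [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]]
    jittered = [[x + random.randint(-max_jitter, max_jitter),
                 y + random.randint(-max_jitter, max_jitter)]
                for x, y in corners]
    warped = TF.perspective(encoded, corners, jittered)  # bilinear by default
    return warped + noise_std * torch.randn_like(warped)

distorted = simulate_screen_shot(torch.rand(3, 128, 128))
```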
In step 4, the watermark information hidden in the distorted encoded image is extracted using the decoder. The decoder comprises 3 single convolution blocks, 3 residual convolution blocks, 1 single convolution block, 6 residual convolution blocks, 1 single convolution block, and 1 fully connected layer arranged in sequence.
The single convolution block consists of a 3×3 convolution, batch normalization, and an activation function arranged in sequence. The residual convolution block consists of a 3×3 convolution, batch normalization, an activation function, a 3×3 convolution, batch normalization, and a skip connection arranged in sequence; its final output is the sum of the skip connection and the result of the convolution path.
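Under the assumption that the activation function is ReLU (the text does not name it), the two block types can be sketched as:

```python
import torch
import torch.nn as nn

def single_conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """3x3 convolution -> batch normalization -> activation."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ResidualConvBlock(nn.Module):
    """3x3 conv -> BN -> activation -> 3x3 conv -> BN, with the skip
    connection added to the convolution-path result."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

out = ResidualConvBlock(32)(torch.rand(1, 32, 64, 64))  # shape preserved
```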
Further, after extracting the watermark information hidden in the distorted encoded image, the method comprises:
calculating a decoder loss function L_D from the decoded watermark information and the original watermark information:
L_D = MSE(M, M_d) = MSE(M, D(γ_D, I_n)),
where M represents the original watermark information, M_d represents the decoded watermark information, γ_D represents the decoding hyper-parameters, I_n represents the distorted encoded image, and D(γ_D, I_n) represents the decoder decoding the distorted encoded image.
Further, the joint loss function is composed of the encoder loss function, the discriminator loss function, and the decoder loss function:
L = λ_1·L_C + λ_2·L_E + λ_3·L_D,
where λ_1, λ_2, and λ_3 are the weight parameters of the corresponding loss functions. In this example, λ_1, λ_2, and λ_3 can be set to 0.5, 0.001, and 3, respectively.
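The weighted combination with the example weights can be written directly:

```python
import torch

def joint_loss(loss_c: torch.Tensor, loss_e: torch.Tensor,
               loss_d: torch.Tensor,
               lambdas: tuple = (0.5, 0.001, 3.0)) -> torch.Tensor:
    """L = lambda1*L_C + lambda2*L_E + lambda3*L_D."""
    l1, l2, l3 = lambdas
    return l1 * loss_c + l2 * loss_e + l3 * loss_d

print(joint_loss(torch.tensor(0.2), torch.tensor(1.1), torch.tensor(0.05)))
```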
In step 5, performing model training on the encoder, the discriminator, and the decoder according to the predicted value of the discriminator and the joint loss function comprises:
inputting the encoder, the discriminator, and the decoder into an Adam optimizer for iterative training, setting the maximum number of iterations, and performing back propagation with the joint loss function to obtain the final trained model.
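A hypothetical training skeleton is sketched below. The encoder, discriminator, decoder, and loader objects, the learning rate, and the simplification of updating the discriminator with a hinge loss before the encoder/decoder step are all assumptions; simulate_screen_shot and joint_loss refer to the sketches above:

```python
import itertools
import torch
import torch.nn.functional as F

# encoder, discriminator, decoder, and loader are assumed to be defined
opt = torch.optim.Adam(itertools.chain(encoder.parameters(),
                                       decoder.parameters()), lr=1e-4)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
max_iters = 10_000  # illustrative maximum iteration count

for step, (carrier, message) in enumerate(loader):
    if step >= max_iters:
        break
    encoded = encoder(carrier, message)
    # discriminator step: hinge loss on carrier (real) vs encoded (fake)
    loss_e = (F.relu(1 - discriminator(carrier)).mean()
              + F.relu(1 + discriminator(encoded.detach())).mean())
    opt_disc.zero_grad(); loss_e.backward(); opt_disc.step()
    # encoder/decoder step driven by the joint loss
    decoded = decoder(simulate_screen_shot(encoded))
    loss_c = -discriminator(encoded).mean()   # adversarial term for encoder
    loss_d = F.mse_loss(decoded, message)     # watermark recovery term
    loss = joint_loss(loss_c, loss_e.detach(), loss_d)  # detach: no grad to D
    opt.zero_grad(); loss.backward(); opt.step()
```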
The trained encoder and decoder form the screen-shot watermark model. The carrier image and watermark information are input into the encoder of the screen-shot watermark model to obtain an encoded image, which is shown on a display; after the screen is photographed with a mobile phone, the decoder of the screen-shot watermark model decodes the watermark information in the screen-shot image, yielding the decoded watermark information.
To verify the effectiveness of the method of the invention, the method of the invention was tested as follows:
For the experiments, 15000 images were randomly selected from the COCO training set as the training set and 500 images were randomly selected from the COCO test set as the test set; PyTorch was chosen as the programming framework, an NVIDIA RTX 3070 GPU served as the training device, AOC LV273HUPR and CSO 1609 displays were used for the experiments, and Realme RMX3366 and HUAWEI DBY-W09 devices were used for shooting. The number of training rounds was set to 100 and the batch size to 16. "The invention" in tables 1 to 4 below denotes the screen-shot watermark model proposed herein; the other models compared are existing models, including the StegaStamp (StegaStamp: Invisible Hyperlinks in Physical Photographs) model, the RIHOOP (RIHOOP: Robust Invisible Hyperlinks in Offline and Online Photographs) model, and the PIMoG (PIMoG: An Effective Screen-Shooting Noise-Layer Simulation for Deep-Learning-Based Watermarking Network) model. The average bit correct extraction rate is selected as the evaluation index for the robustness experiments.
Table 1 gives the robustness results of the proposed model against moiré noise compared with several other models. Since moiré noise intensity has no fixed evaluation index, the consistency of the moiré noise was ensured in this example by fixing the distance and angle of the device: the experiments were carried out at a fixed distance of 20 cm and an angle of 0 degrees, i.e., facing the screen. Moiré images under strong, medium, and weak noise interference are shown in fig. 4.
TABLE 1 Moire noise test results at different intensities
From table 1 it can be seen that the screen-shot watermark model of the invention shows superior performance, with an average extraction rate against moiré noise of up to 99.019%.
Table 2 shows the robustness experiments of the screen-shot watermark model and other models under interference at 40 degrees left, 20 degrees left, 0 degrees, 20 degrees right, and 40 degrees right; screen-shot images under interference at different angles are shown in fig. 5. The data in table 2 show that the proposed model performs best, with a higher extraction rate than the other existing models under all five angles.
Table 2 results of robustness experiments at different angles
Table 3 shows the robustness experiments of the screen-shot watermark model and other comparison models under different devices; the experimental results show that the proposed model outperforms existing schemes on devices of different brands and has higher robustness.
TABLE 3 results of experiments with different devices
Table 4 compares the image quality results of the proposed model with the other comparison models in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); the image quality of the invention is superior to existing schemes, indicating the effectiveness of the proposed scheme.
TABLE 4 results of image quality experiments
The above comparisons show that the watermarking method of the invention guarantees the quality of the encoded image while achieving a higher extraction rate than the other models, providing both invisibility and robustness.

Claims (9)

1. A screen-shot image robust watermarking method based on an attention mechanism and contrastive learning, characterized by comprising:
inputting a carrier image and watermark information into an encoder, and generating an encoded image containing the watermark information with the encoder;
inputting the encoded image and the carrier image into a discriminator, and outputting a predicted value with the discriminator;
performing distortion simulation on the encoded image to obtain a distorted encoded image;
inputting the distorted encoded image into a decoder, and extracting the watermark information hidden in the distorted encoded image;
performing model training on the encoder, the discriminator, and the decoder according to the predicted value of the discriminator and a joint loss function, forming a screen-shot watermark model from the trained encoder and decoder, and encoding and decoding screen-shot images with the screen-shot watermark model;
wherein performing model training on the encoder, the discriminator, and the decoder according to the predicted value of the discriminator and the joint loss function comprises:
inputting the encoder, the discriminator, and the decoder into an Adam optimizer for iterative training, setting a maximum number of iterations, and performing back propagation with the joint loss function.
2. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 1, characterized in that before inputting the carrier image and the watermark information into the encoder, the method comprises:
randomly extracting n_0 numerical elements from a standard uniform distribution on the interval [0, 1), setting values greater than 0.5 to 1 and values not greater than 0.5 to 0, to form binary watermark ciphertext as the watermark information;
resizing the original carrier image to n_1 × n_1 as the carrier image.
3. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 2, characterized in that inputting the carrier image and the watermark information into the encoder comprises:
performing three downsampling operations on the carrier image to obtain local feature maps F1, F2, and F3 in sequence, with a max-pooling operation after each downsampling; obtaining a local feature map F4 after the third max-pooling operation, then performing one global average pooling to obtain a global feature map F5; concatenating the local feature map F4 and the global feature map F5 to obtain a feature map F6;
passing the watermark information through a fully connected layer to obtain a watermark tensor M whose dimensions are the same as those of the feature map F6;
concatenating the local feature map F4, the feature map F6, and the watermark tensor M in the channel dimension and performing one upsampling to obtain a feature map D4;
concatenating the local feature map F3, the feature map D4, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D3;
concatenating the local feature map F2, the feature map D3, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and an upsampling layer to obtain a feature map D2;
concatenating the local feature map F1, the feature map D2, and the watermark tensor M in the channel dimension and passing them through a secondary convolution layer and a 1×1 convolution layer to obtain the encoded image D1.
4. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 3, characterized in that performing three downsampling operations on the carrier image comprises:
at each downsampling, passing the carrier image sequentially through a 3×3 convolution layer, a batch normalization layer, a first activation function layer, a 3×3 convolution layer, a batch normalization layer, a second activation function layer, and an HWC attention module;
the HWC attention module comprising an HAttention module, a WAttention module, and a CAttention module;
wherein in the HAttention module, the input feature map x undergoes an adaptive max-pooling operation and an adaptive global average-pooling operation that compress the width to 1, yielding a feature map max_h and a feature map avg_h; max_h passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(max_h); avg_h passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(avg_h); se(max_h) and se(avg_h) are concatenated and passed through an activation function layer to obtain a feature map A_h;
in the WAttention module, the input feature map x undergoes adaptive max pooling and adaptive global average pooling that compress the height to 1, yielding a feature map max_w and a feature map avg_w; max_w passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(max_w); avg_w passes sequentially through a 1×1 convolution layer, an activation function layer, and a 1×1 convolution layer to obtain a feature map se(avg_w); se(max_w) and se(avg_w) are concatenated and passed through an activation function layer to obtain a feature map A_w;
in the CAttention module, the input feature map x is compressed to one channel by an adaptive max-pooling layer and an adaptive global average-pooling layer respectively; the two resulting tensors are concatenated in the channel dimension, reduced to one channel by a 1×1 convolution layer, and passed through an activation function layer to obtain a feature map A_c;
the input feature map x is multiplied by the feature map A_h and the feature map A_w in the height and width dimensions, then by the feature map A_c in the channel dimension, and the result is added to the input feature map x as the output of the HWC attention module.
5. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 4, characterized in that the discriminator is a spectrally normalized generative adversarial network; the carrier image and the watermarked encoded image are input to the spectrally normalized generative adversarial network, the output of the discriminator is true or false, the output result of the discriminator is input to the encoder, and the encoder loss function L_C and the discriminator loss function L_E are calculated as:
L_C = α·L_nce(Ĉ, E, X_r, X_w) + β·L_gan(Ĉ, E, X_w),
L_E = L_gan(C, Ê, X_r, X_w),
where α and β are training hyper-parameters, X_r and X_w represent the carrier image and the encoded image respectively, L_nce and L_gan represent the NCE loss and the hinge loss respectively, C represents the weight parameters of the discriminator, Ĉ represents the weight parameters of the fixed discriminator, E represents the weight parameters of the encoder, and Ê represents the weight parameters of the fixed encoder.
6. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 5, characterized in that performing distortion simulation on the encoded image comprises:
randomly perturbing the four corners of the encoded image using a perspective transformation, then bilinearly resampling the encoded image to create a perspective-warped image;
performing illumination distortion and moiré distortion simulation on the perspective-warped image using an illumination simulation function and a moiré simulation function;
simulating interference in real scenes using Gaussian noise.
7. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 6, characterized in that the decoder comprises 3 single convolution blocks, 3 residual convolution blocks, 1 single convolution block, 6 residual convolution blocks, 1 single convolution block, and 1 fully connected layer arranged in sequence.
8. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 7, characterized in that after extracting the watermark information hidden in the distorted encoded image, the method comprises:
calculating a decoder loss function L_D from the decoded watermark information and the original watermark information:
L_D = MSE(M, M_d) = MSE(M, D(γ_D, I_n)),
where M represents the original watermark information, M_d represents the decoded watermark information, γ_D represents the decoding hyper-parameters, I_n represents the distorted encoded image, and D(γ_D, I_n) represents the decoder decoding the distorted encoded image.
9. The screen-shot image robust watermarking method based on an attention mechanism and contrastive learning according to claim 8, characterized in that the joint loss function is composed of the encoder loss function, the discriminator loss function, and the decoder loss function:
L = λ_1·L_C + λ_2·L_E + λ_3·L_D,
where λ_1, λ_2, and λ_3 are the weight parameters of the corresponding loss functions.
CN202411448920.XA 2024-10-17 2024-10-17 Screen shot image robust watermarking method based on attention mechanism and contrast learning Active CN118967424B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411448920.XA | 2024-10-17 | 2024-10-17 | Screen shot image robust watermarking method based on attention mechanism and contrast learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411448920.XA | 2024-10-17 | 2024-10-17 | Screen shot image robust watermarking method based on attention mechanism and contrast learning

Publications (2)

Publication Number | Publication Date
CN118967424A (en) | 2024-11-15
CN118967424B (en) | 2025-03-14

Family

ID=93393397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411448920.XA Active CN118967424B (en) 2024-10-17 2024-10-17 Screen shot image robust watermarking method based on attention mechanism and contrast learning

Country Status (1)

Country Link
CN (1) CN118967424B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120374346B (en) * 2025-06-26 2025-08-26 Nanjing University of Information Science and Technology Anti-screen robust watermarking method based on generation of antagonism network and multiple tokens

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200710A (en) * 2020-10-08 2021-01-08 东南数字经济发展研究院 Self-adaptive invisible watermark synchronous detection method based on deep learning
CN116992407A (en) * 2023-08-18 2023-11-03 湖南大学 An anti-screen distortion watermarking method based on reversible bijection structure

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648436A (en) * 2022-03-16 2022-06-21 南京信息工程大学 Screen shot resistant text image watermark embedding and extracting method based on deep learning
CN118037518A (en) * 2024-01-17 2024-05-14 武汉大学 Traceable anti-watermark generation method and system for face image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200710A (en) * 2020-10-08 2021-01-08 东南数字经济发展研究院 Self-adaptive invisible watermark synchronous detection method based on deep learning
CN116992407A (en) * 2023-08-18 2023-11-03 湖南大学 An anti-screen distortion watermarking method based on reversible bijection structure

Also Published As

Publication number Publication date
CN118967424A (en) 2024-11-15

Similar Documents

Publication Publication Date Title
Zhang et al. Robust invisible video watermarking with attention
CN111491170B (en) Method for embedding watermark and watermark embedding device
Fang et al. Encoded feature enhancement in watermarking network for distortion in real scenes
CN115131188B (en) Robust image watermarking method based on generation countermeasure network
CN118967424B (en) Screen shot image robust watermarking method based on attention mechanism and contrast learning
CN114445256A (en) Training method, device, equipment and storage medium for digital watermark
Fu et al. Waverecovery: Screen-shooting watermarking based on wavelet and recovery
He et al. Robust blind video watermarking against geometric deformations and online video sharing platform processing
CN114862645B (en) Anti-printing digital watermarking method and device based on combination of U-Net network and DFT optimal quality radius
Cao et al. Screen-shooting resistant image watermarking based on lightweight neural network in frequency domain
CN120374346B (en) Anti-screen robust watermarking method based on generation of antagonism network and multiple tokens
CN118279119B (en) Image watermark information processing method, device and equipment
CN117455749A (en) A robust screen watermarking method based on wavelet domain cascade network and reverse recovery
Liao et al. GIFMarking: The robust watermarking for animated GIF based deep learning
Zhang et al. Embedding Guided End‐to‐End Framework for Robust Image Watermarking
Liu et al. Hiding functions within functions: Steganography by implicit neural representations
Zhang et al. A convolutional neural network-based blind robust image watermarking approach exploiting the frequency domain
CN117611422A (en) Image steganography method based on Moire pattern generation
KR101169826B1 (en) Bit-accurate film grain simulation method based on pre-computed transformed coefficients
CN115526758A (en) Hadamard transform screen-shot-resistant watermarking method based on deep learning
CN114119330B (en) Robust digital watermark embedding and extracting method based on neural network
Liu et al. Screen shooting resistant watermarking based on cross attention
Zhong et al. Enhanced attention mechanism-based image watermarking with simulated JPEG compression
Chen et al. Rowsformer: A robust watermarking framework with swin transformer for enhanced geometric attack resilience
CN118741149B (en) A watermark updating method suitable for multi-stage transmission process

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant