
WO2025056974A1 - Digital watermarking of video frames and images for robust payload extraction from partial frames or images - Google Patents


Info

Publication number
WO2025056974A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video frame
pixelgroup
pixelgroups
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IB2024/000509
Other languages
French (fr)
Inventor
Alexander Solonsky
Michael Stattmann
Niels Thorwirth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Castlabs GmbH
Original Assignee
Castlabs GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Castlabs GmbH filed Critical Castlabs GmbH


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G06T1/0028 Adaptive watermarking, e.g. Human Visual System [HVS]-based watermarking
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0051 Embedding of the watermark in the spatial domain
    • G06T2201/0061 Embedding of the watermark in each block of the image, e.g. segmented watermarking

Definitions

  • the present disclosure relates generally to the field of embedding and extracting of digital watermarks in imaging or video frame content.
  • Digital watermarking is a process for embedding information in digital media, such as images and video frames, in a way that is imperceptible to human viewers but can be detected and extracted by computers using specialized algorithms. Digital watermarking is used for a variety of purposes, including copyright protection, content authentication, and tracking the distribution of digital media. Digital watermarking can be performed in the spatial domain, by, for example, directly modifying pixel values, or in the frequency domain, by applying a transform such as the Discrete Cosine Transform (DCT) or Fast Fourier Transform (FFT) to the media and embedding the watermark in the resulting coefficients.
  • the embedded information can be made robust to common signal processing operations, such as compression, scaling, and addition of noise, so that the watermark can still be detected after the media has been transformed.
  • Watermarking techniques can include the use of perceptual models to reduce visibility and error correction codes to improve the reliability of information extraction.
  • Embodiments are directed to digital watermarking of video frames and images.
  • Information can be embedded in the frames and images such that a payload can be extracted from a portion of a frame or image.
  • An image or video frame to be protected can be selected.
  • the image or video frame can be divided into groups of pixels (i.e., so-called Pixelgroups).
  • the Pixelgroups may be spread throughout the frame or image.
  • the Pixelgroups may have different shapes and sizes associated with individual ones of the Pixelgroups.
  • a payload message may be received, and a payload sequence may be derived from the payload message.
  • the payload message may include information about the recipient, creator, devices, or other information about the content flow and processing that will help identify ownership, copyright and sources of illegal content distribution and sharing.
  • Pixelgroups are modified based on a payload sequence and a seed value.
  • a strength value indicates how much to modify the Pixelgroups, and a sign, which can be positive, negative, or neutral, sets the direction of modification.
  • the content of each Pixelgroup may be modified based on the strength values, with considerations for visibility and error correction.
  • the modified image may be embedded with the watermark, which may be applied to luminance and/or pixel color values.
  • the extraction process may include identifying the watermarked image or image sequence (the image, photograph or video recording containing the area surrounding the watermarked frame or image), optionally identifying the original content, and restoring the frame to correct any distortions.
  • the Pixelgroups where information has been embedded may be identified along with the strength values that were used.
  • the information can be extracted by understanding the sign used for embedding. For this, the modification of each Pixelgroup may be estimated by subtracting the original image from the watermarked version. With an estimation of the sign of each Pixelgroup, error correction, decoding and aggregation may be applied to reverse the embedding processes and to determine the embedded payload.
  • the Pixelgroups may have different patterns and shapes.
  • the arrangement of the pixels in Pixelgroups may be used to change the average luminance of the area to which they are assigned and estimate the original luminance.
  • the patterns may also be used to identify the original locations of the Pixelgroups, which may be used to restore the original shape of the content as it was watermarked.
  • the embedding areas may be larger than the image or video frame, such as the entire desktop or browser window.
  • Primary and secondary watermarks may be embedded together as combined watermark patterns.
  • the primary watermark may be embedded by modifying the average luminance, and the secondary watermark may be embedded using a noise pattern.
  • the embedding may be performed in the luminance, color, and/or frequency domain.
  • the embedding may be performed using algorithms implemented on a central processing unit (CPU) or graphics processing unit (GPU), via graphic overlay with alpha blending, or other suitable technologies.
  • the extraction process may include image segmentation, alignment using known elements, and filtering to enhance the watermark signal. Multiple frames may be combined to improve readout accuracy.
  • the watermarking technique may also be used for image authentication, where the presence or absence of the watermark indicates potential tampering.
  • the resulting process enables an imperceptible, high-capacity watermark with robustness against various analogue to digital conversions that may include camera pictures and printouts, enabled by the ability to recover from distortions.
  • the same approach furthermore may protect an image against image tampering by highlighting modifications to the marked area.
  • the embedding process is designed to be highly efficient and applicable to various hardware and software architectures, which may be critical for applications like video processing.
  • FIG. 1 is a block diagram of an exemplary system diagram for the embedding of digital watermarks, in accordance with the principles of the present disclosure.
  • FIG. 2 is an exemplary embedding workflow for the embedding of digital watermarks, in accordance with the principles of the present disclosure.
  • Fig. 3 is an exemplary extraction workflow for the extraction of digital watermarks, in accordance with the principles of the present disclosure.
  • FIG. 4 illustrates various Pixelgroup patterns for use with the exemplary system of FIG. 1, in accordance with the principles of the present disclosure.
  • FIG. 5 illustrates an exemplary Pixelgroup shape, in accordance with the principles of the present disclosure.
  • FIG. 6 illustrates an exemplary watermark applied to video, in accordance with the principles of the present disclosure.
  • FIG. 7 illustrates the extending of a watermark to a website background, in accordance with the principles of the present disclosure.
  • FIG. 8 illustrates the extending of a watermark to a desktop background, in accordance with the principles of the present disclosure.
  • FIG. 9 illustrates an exemplary combined watermark pattern, in accordance with the principles of the present disclosure.
  • FIG. 1 An embodiment of a system in accordance with the present disclosure for embedding a hidden watermark in media is shown in FIG. 1.
  • the system 10 in the illustrated embodiment is an internet network 20 connecting devices for video processing, distribution and playback.
  • the devices able to execute one or more aspects of the present disclosure, may include a personal computer 25, a cloud server such as a video processing workstation, a remote workstation or video encoding server 30, a video encoder 40, a mobile phone 50, and/or a consumer electronics device 60 such as a set top box, a game console, video decoder, video dongle, or an Internet connected television.
  • Video content may originate from the devices, be transmitted between any of the devices or stem from a dedicated media source 70, such as a camera, TV station broadcast, or other video stream.
  • Connections of devices to the network may include a wired connection 100, as well as wireless connections (e.g., WiFi 110 or mobile connections 120).
  • FIG. 2 shows the embedding workflow in one embodiment of the present disclosure.
  • the process may be applied on a central processing unit (CPU), graphics processing unit (GPU) or other types of graphics processing units implemented in hardware or software.
  • the process starts with selecting an image which requires protection 210.
  • This may be an image in a format like TIFF, JPEG or PNG; it may also be video from which frames are selected for processing.
  • the frame is split into groups of pixels (i.e., so-called Pixelgroups) that are distributed over the frame, and, in some implementations, pixels of the frame are assigned to Pixelgroups.
  • the shape of the Pixelgroups is square and resembles compression macroblocks used in image or video compression of, for example, 8x8 or 16x16 groups of pixels.
  • the number of Pixelgroups in a given frame or image is fixed and the size of each of the Pixelgroups varies with the dimension of the content.
  • the shape may not be fixed to be square for all images, but in an alternative implementation may vary with the aspect ratio of the content.
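The division into Pixelgroups described above can be sketched as follows; the 16x16 group size, the 2-pixel gap of unmodified pixels between neighboring groups, and the row-major ordering are illustrative assumptions, not values fixed by the disclosure:

```python
# Sketch: divide a frame into square Pixelgroups on a regular grid,
# leaving a gap of unmodified pixels between neighboring groups.
# Group size and gap width are illustrative assumptions.

def pixelgroup_grid(width, height, group=16, gap=2):
    """Return (x, y, size) tuples, one per Pixelgroup, in row-major order."""
    step = group + gap
    groups = []
    for y in range(0, height - group + 1, step):
        for x in range(0, width - group + 1, step):
            groups.append((x, y, group))
    return groups

grid = pixelgroup_grid(1920, 1080, group=16, gap=2)
```

A fixed group count with content-dependent group size, as described above, would instead derive `group` from the frame dimensions.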
  • a payload is chosen.
  • the payload may identify one or more of a recipient of the video or image content, or information about the creator of the video or image content, such as an individual, company, institution, or image creation system that may be using, for example, artificial intelligence (AI) or machine learning.
  • the payload may also identify or point to a manifest that contains more information about the image such as the origin and authenticity.
  • This manifest may, for example, follow the standard of the "Coalition for Content Provenance and Authenticity" that is published at https://c2pa.org/.
  • the robust link provided by this watermark can ensure that information about the creator and processing of the image is available, even after meta information has been stripped, as is frequently the case during upload to social media.
  • the same mark that is used to pinpoint the source of the content can also be used to identify regions of the image where the mark is not intact and that have been tampered with.
  • a sequence is derived from the payload of the video or image content.
  • the sequence may consist of a string of bits or digits from a binary system or other numeric systems with a different base such as decimal.
  • the sequence may have been previously encrypted and encoded using error detection or error correction codes such as cyclic redundancy check (CRC), Bose-Chaudhuri-Hocquenghem (BCH), low-density parity-check (LDPC), or Reed-Solomon codes.
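A sketch of preparing the payload sequence with an error-detecting code; a standard-library CRC-32 stands in here for the CRC/BCH/LDPC/Reed-Solomon options named above, and the payload bytes are illustrative:

```python
# Sketch: append an error-detecting checksum to the payload and expand
# the result into the bit sequence that will be assigned to Pixelgroups.
# CRC-32 is used as a stand-in for the codes listed in the disclosure.

import zlib

def payload_to_bits(payload: bytes):
    """Append a CRC-32 checksum and expand payload + checksum into bits."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    data = payload + crc
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

bits = payload_to_bits(b"user-42")
```

During extraction the decoder recomputes the checksum over the recovered payload bytes and compares it to the embedded one.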
  • Parts of the payload sequence are assigned to Pixelgroups. For example, each bit in the payload sequence is assigned to multiple Pixelgroups such that all groups have a bit assigned.
  • the bit is translated into a positive, neutral, or negative sign value that is used to embed the mark.
  • a neutral sign signifies no modification in that Pixelgroup.
  • a neutral sign is introduced to reduce the amount of marking for reduced visibility but may also be used as a signal itself and provides extra variation to the watermark.
  • the assignment may depend on a secret key in a manner such that the assignment is not known without the key, and the key is required for correct assignment of Pixelgroups to payload parts, without which the watermark is not readable. Information like user profiles, personal information or company specific information may be used as a key.
  • the assignment of payload bits to Pixelgroups can remain the same for an asset or asset collection. Some Pixelgroups are not used for embedding, and the sign value may be neutral in this case.
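A minimal sketch of the key-dependent assignment of payload bits to Pixelgroups and their translation into positive, negative, or neutral signs; the SHA-256 key derivation, the 20% neutral fraction, and the simple repetition of the payload over the groups are illustrative assumptions:

```python
# Sketch: assign payload bits to Pixelgroups using a secret key, and map
# each bit to an embedding sign (+1, -1), with some groups left neutral (0).
# Key derivation, neutral fraction and repetition scheme are assumptions.

import hashlib
import random

def assign_signs(payload_bits, n_groups, key, neutral_fraction=0.2):
    """Return one sign per Pixelgroup: +1 (lighten), -1 (darken), 0 (neutral)."""
    # Derive a deterministic RNG from the secret key so that the same key
    # always yields the same assignment during embedding and extraction.
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    signs = []
    for g in range(n_groups):
        if rng.random() < neutral_fraction:
            signs.append(0)                            # unused group: no modification
        else:
            bit = payload_bits[g % len(payload_bits)]  # payload repeated over groups
            signs.append(1 if bit else -1)
    return signs

signs = assign_signs([1, 0, 1, 1], n_groups=100, key="secret")
```

Without the key, the mapping from groups back to payload positions cannot be reproduced, which is what makes the watermark unreadable to third parties.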
  • a seed is derived (240) that may affect the embedding.
  • the seed may be derived using an input such as an identifier of the frame such as a frame number, picture order count, time code or a hash code calculated over the frame content.
  • a seed may also be a number derived from the image content that changes in value along with changes of the image content, such as, for example, a sum of pixel values of a reduced image size, possibly by image sections.
  • One advantage of such an implementation is that the watermark only changes if the image content changes.
  • These derived seeds may also repeat and cycle at regular intervals. Additionally, unmarked locations in the image or video content may be derived from the seed, in some implementations in combination with strength values assigned at 250.
  • These seeds may determine the strength of each Pixelgroup in a frame and may change from frame to frame or with differences within frames.
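A sketch of one such content-derived seed, computed as the sum of block averages of a reduced-size version of the frame so that the seed (and thus the watermark) changes only when the image content changes; the block-averaging scheme and the 8x downscale factor are assumptions:

```python
# Sketch: derive a content-dependent seed as the sum of pixel values of a
# reduced-size version of the frame. Downscaling by block averaging is an
# assumption; the disclosure only requires a content-following value.

def content_seed(pixels, width, height, factor=8):
    """pixels: flat list of luminance values, row-major."""
    total = 0
    for by in range(0, height, factor):
        for bx in range(0, width, factor):
            # Average one factor x factor block of the frame.
            acc = cnt = 0
            for y in range(by, min(by + factor, height)):
                for x in range(bx, min(bx + factor, width)):
                    acc += pixels[y * width + x]
                    cnt += 1
            total += acc // cnt
    return total

flat = [128] * (64 * 64)        # toy 64x64 frame of uniform luminance
seed = content_seed(flat, 64, 64)
```

Because the averaging discards fine detail, small pixel-level changes (e.g. compression noise) tend not to disturb the seed, while genuine content changes do.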
  • strength values are assigned to each Pixelgroup (250).
  • the payload sequence may be used to assign a positive or negative strength value for each Pixelgroup depending on the value of the payload sequence.
  • the assigned value may also be neutral, resulting in no modification to the Pixelgroup during the embedding process.
  • This effect is also known as the so-called dirty window effect, as it resembles a static impurity on a window that becomes visible when the content seen through the window is moving. Additional reasons for these variations include enhanced security resulting from watermarking each frame differently.
  • the values are assigned repeatedly, such that the payload sequence is embedded multiple times. Repetitions may be identical or contain variations that will aid during extraction with error correction coding.
  • a strength value is assigned that will determine the amount of changes applied in the Pixelgroup.
  • Different strength values within a Pixelgroup may be used by the decoder to estimate the unmarked, original input image from the watermarked image and use it to discern precisely which Pixelgroups have been embedded and use those for extraction only. If this information is not available during extraction because the original image is not available or cannot be identified, then the embedded information may still be readable, as the areas that have not been used for embedding will be noise that can be compensated for even if they are not known.
  • the pattern created by variation in strength or presence (neutral sign) of Pixelgroups can be used to locate a group of Pixelgroups and thereby understand the location in the image even if it has been changed by transformations such as cropping.
  • Pixels in the group may be changed with different strengths and some may not be changed at all (i.e., so-called dithering).
  • This dithering may be used to achieve an overall luminance change between integer values of discrete luminance changes. For example, it is possible to change the average luminance by .8 units by setting 8 out of 10 values to 1, with the remaining values set to 0.
  • the dithering pattern (e.g., which pixels of the aforementioned 8 out of 10 values) may be regular like a checkerboard pattern or pseudo random, i.e. noise like as shown in FIG. 4.
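The 8-out-of-10 example above can be sketched as follows; the flat ordering of the deltas stands in for the checkerboard or noise-like arrangement a real embedder would use:

```python
# Sketch: change the average luminance of a Pixelgroup by a fractional
# amount (here 0.8 units) by adding 1 to 8 out of every 10 pixels and
# leaving the rest unchanged, as in the example in the text.

def dither_delta(n_pixels, target=0.8):
    """Return a per-pixel list of 0/1 deltas whose mean approximates target."""
    ones = round(n_pixels * target)
    deltas = [1] * ones + [0] * (n_pixels - ones)
    # A real embedder would arrange these in a checkerboard or noise-like
    # pattern; the ordering here is deliberately left simple.
    return deltas

deltas = dither_delta(10, target=0.8)
```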
  • the pattern may also be chosen to identify the Pixelgroup or signal additional information derived by, for example, another payload used for a different purpose.
  • the identification can be used to determine the original location of the Pixelgroup when reading from an image where the location has changed due to geometric variations like cropping or resizing. In this case the pattern in the Pixelgroup is compared to a known list of patterns and their locations.
  • the unique pattern may also serve to identify small changes to the image when verifying the structure of the pattern. This may be useful for the use case of image authentication.
  • pixels that are not assigned to a Pixelgroup are not changed, such that there are unmodified pixels between any neighboring Pixelgroups within a given image or frame. These pixels may help during the reading of the watermark when used to estimate the unmarked content of the Pixelgroups and then derive the most likely watermarking depending on whether the marked content is lighter or darker than that estimate. The gap may also be helpful to conceal changes as it widens and thereby blurs the transition between light and dark Pixelgroups.
  • the image is modified (270) by altering it according to the Pixelgroups' strength, shape and content.
  • the changes are applied to the luminance of the pixels, and in alternative implementations, color values like blue for reduced visibility or green for increased strength may be chosen. Embedding in the color domain may be considered in cases where color transformations are unlikely or do not need to maintain a readable watermark.
  • the value is clamped to the range (16, 240) when it is required to comply with the BT.709 specification. Other specifications with different ranges can be supported as well, including, for example, BT.2020.
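A sketch of the clamped per-pixel modification; the function name and sample values are illustrative, with the (16, 240) bounds taken from the BT.709 range cited above:

```python
# Sketch: apply a signed strength to one pixel's luminance and clamp the
# result to the (16, 240) range mentioned in the text. The sample values
# and the function shape are illustrative assumptions.

def embed_pixel(luma, sign, strength, lo=16, hi=240):
    """Modify one luminance value by sign * strength, clamped to [lo, hi]."""
    return max(lo, min(hi, luma + sign * strength))

clamped = embed_pixel(235, +1, 10)   # would be 245, clamped to 240
```

For other specifications such as BT.2020 the same function applies with different `lo`/`hi` bounds.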
  • This step may be applied in a different processor or module, and application may also include the use of a computer graphics shader or on-screen display (OSD) as is commonly available in video playback environments.
  • the watermark to be applied is rendered as an image with different levels of dark and light influence according to the sign values, and with alpha values indicating the strength of the overlaying image according to the strength values.
  • video frames may be modified with synchronization marks (280). These may have a known shape of dark corners or known noise patterns that can be located by searching for a high correlation. Ordinarily the resulting image has an unchanged visual appearance, and the mark is embedded in a hidden manner; however, depending on the strength and required survivability of the mark, some artifacts may become noticeable.
  • the extraction as shown in FIG. 3 consists of multiple steps.
  • the embedding mechanism enables the extraction after severe degradation such as detection from a single frame of video that has been recorded using a smartphone camera or after the watermarked image has been printed and scanned again.
  • the first step 310 is to identify a watermarked image or image sequence. This may be done when confidential or copyrighted material is illegally distributed or if content is submitted for verification.
  • a next, optional step 320 is to identify the original content by, for example, matching it with the frame number and content of the unmarked video; it may also be an image out of an image archive. Identification methods like fingerprinting (also known as automated content recognition), perceptual hashes, other watermarking information and manual identification may be applied. Similar algorithms may also be applied in the next step 330 for frame restoration. The identification may not require procurement of the entire image but may use elements or high-level information thereof that are useful for restoration.
  • the frame restoration aims to identify the content location and size used for watermarking in order to understand distortions such as rotation, shifting, cropping, resizing, bending, etc. Matching content from the original may be used to understand the distortions, as may identification of synchronization marks which reveal known locations. Filtering or trained artificial intelligence networks can be used to allow for estimation of changes in size and rotation. Identification of the Pixelgroup patterns or their unique strength profile can help to identify the location of Pixelgroups in the image and, as these can be fixed or known, can be used for geometric content restoration.
  • the next step 340 is to determine embedding locations in the same way as during embedding (250 and 260) by determining the location and use of Pixelgroups. Additional filtering, comparison with the original, or trained artificial intelligence networks may help to emphasize the watermark signal (350) so that it becomes stronger compared to the noise introduced from the underlying image or copying process.
  • Step 360 determines strength values for each of the Pixelgroups. As in the embedding stage, the seed may be used to determine what strength values were used for the watermarking; the assignment of payload parts (without yet knowing their value) to Pixelgroups is also the same as during embedding. Knowing what strength values were used may improve extraction performance but is not required if the information cannot be determined.
  • step 370 is the actual readout step, estimating the embedded value of each Pixelgroup used for watermarking.
  • the change applied by the watermark such as change in luminance added to the unmarked image is estimated by comparing or subtracting the original in the same geometric transformation or, alternatively by estimating the unmarked (original) luminance using unmarked pixels in the Pixelgroup such as, for example, the border pixels.
  • a value such as a bit is derived from each Pixelgroup, and from the resulting bitstring the payload is decoded 380. This consists of reversing the processes applied during embedding, like error detection coding and encryption.
  • the payload decode may be followed by observations on the probability of correct detection and variations of any previous steps may be used to find the most likely payload.
  • Reliability observations may be done using the output of error-correction or -detection coding as well as consistency of Pixelgroup readings that represent the same information.
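The readout step can be sketched as comparing a Pixelgroup's marked luminance against an estimate of its unmarked value taken from the untouched border pixels, as described above; the decision threshold and the flat-list group layout are assumptions:

```python
# Sketch: estimate a Pixelgroup's unmarked luminance from its untouched
# border pixels, then read the embedded sign by comparing the marked
# group against that estimate. Threshold and layout are assumptions.

def read_sign(group_pixels, border_pixels, threshold=0.5):
    """Return +1, -1 or 0 depending on how the group compares to its border."""
    est = sum(border_pixels) / len(border_pixels)   # estimated original luminance
    marked = sum(group_pixels) / len(group_pixels)  # observed marked luminance
    diff = marked - est
    if diff > threshold:
        return 1
    if diff < -threshold:
        return -1
    return 0

sign = read_sign([103] * 16, [100] * 12)
```

When the original image is available, `est` would instead come from the co-located original pixels after geometric restoration.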
  • each Pixelgroup is used to modify the pixel values of the area assigned to it, shown here in an example with luminance modification to change the average luminance value in the Pixelgroup. The content and composition of which pixels are changed is determined by the Pixelgroup's pattern.
  • FIG. 4 shows several variations. 410 is an example of a squared Pixelgroup of 16x16 pixels with a solid center of pixel size 14x14. This represents a variation that is used to darken the image according to the darkness of the represented pixels such that the white border 420 results in no modification of the watermarked image. The center pixels will be modified by a small percentage of luminance only but are represented stronger for visualization.
  • Pixelgroup 430 shows a dithered version that reduces the overall average luminance less and can be used to change it by a fraction of a digital luminance step.
  • 440 shows a Pixelgroup being filled with a random pattern that can be used for secondary storage of information or as a seal for image authentication. In the blurred version 450 this pattern is less noticeable and includes a high frequency removal that may occur anyway with common compression techniques applied to images and video for storage and transmission.
  • 460 is the pattern shown in 410 with the edges blurred in a dithered method such that the blocks are less pronounced and thereby less visible. When a random pattern like 440 is used, the corresponding softening of the transition to the outside of the Pixelgroup is depicted in 470.
  • FIG. 5 shows Pixelgroup shapes that are rectangular. This way, they resemble macroblocks or other rectangular structures used during video compression in a way that sometimes creates blocky artifacts. In giving the Pixelgroups a similar structure, they resemble these compression artifacts, and the watermarking therefore is not obvious as such.
  • the Pixelgroups can be combined into a pattern 510 covering the image.
  • Some Pixelgroups may be combined into repeating blocks as in 520 that, due to the repetitive nature, allow identification using auto correlation. While in general the pattern repeats, some variations may be performed to, for example, disable the watermarking of some locations as in 530.
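The identification of repeating blocks via auto-correlation, as mentioned above, can be sketched in one dimension; a real detector would work on the two-dimensional luminance field, and the toy signal below is an assumption:

```python
# Sketch: locate a repeating Pixelgroup block via auto-correlation of a
# 1-D luminance profile; the lag with the highest correlation reveals the
# block period. Pure-Python and 1-D for brevity only.

def best_period(signal, min_lag=2, max_lag=None):
    """Return the lag whose auto-correlation is highest."""
    n = len(signal)
    max_lag = max_lag or n // 2
    mean = sum(signal) / n
    centred = [s - mean for s in signal]

    def corr(lag):
        # Auto-correlation of the mean-centred signal at the given lag.
        return sum(centred[i] * centred[i + lag] for i in range(n - lag))

    return max(range(min_lag, max_lag + 1), key=corr)

pattern = [0, 5, 1, 4, 0, 3, 2, 1]   # one block, period 8
profile = pattern * 6                # repeated blocks along a row
period = best_period(profile)
```

In two dimensions the same peak structure of the auto-correlation surface reveals both the horizontal and vertical block spacing.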
  • the white and dark areas signify reduction or increase in luminance when applied, while the medium gray will not result in changes to the image that is watermarked.
  • FIG. 6 shows the output (610) of a device capable of video playback such as a TV, PC or smartphone and the video area that contains a watermark (with increased strength for emphasis) that is used to watermark the rendered output video.
  • the watermark may contain a number identifying the recipient such that leaked content from this recipient can be identified. Watermarking may have been performed on a server when the video was processed and delivered as watermarked content to the client. Alternatively, the watermark may have been applied on the client side.
  • FIG. 7 shows an output where the web browser output window (710) is completely covered and protected by the watermark such that if part of this area is leaked along with the valuable video content it can also serve to read the watermark, allowing for a more reliable detection.
  • client-side technologies like shaders may be applied.
  • FIG. 8 shows a watermarking of the entire desktop (810) to extend the protection further.
  • the application may be on the client-side, or it may be on the server side, where the desktop output of a remote workstation using a remote desktop connection is watermarked in order to be secured before transmission to a remote terminal. Protection in this case includes any application or combination of multiple applications that show confidential information that require protection such that potential leaks can be identified.
  • the application to the desktop may occur in locations that process the video output; this includes a graphics processor, graphics card or a virtual display adapter of a virtual machine.
  • FIG. 9 shows an example of a combined watermark pattern (910) where the primary watermark consists of darker (920), lighter (930) or neutral (940) areas, while the secondary watermark is also embedded with an independent noise pattern.
  • the primary watermark uses the average luminance value of the square Pixelgroup as a means to encode information by darkening and lightening the underlying image, while the composition of the noise pattern and its luminance value in relation to the other elements of the same noise pattern form a secondary watermark.
  • the embedding domain can vary to allow for additional layers of independent watermarks or improve invisibility or robustness for a certain category of images or attacks.
  • Embedding, in the preferred embodiment, is applied to the luminance domain, as it is very robust and therefore suitable as a carrier for a robust watermark.
  • the watermark may also be applied to individual color channels, such as other channels of the YUV or RGB image representations, or channels of alternative color representation formats like ICtCp.
  • the modification is applied to an image that has been transformed into the frequency domain, for example using a Discrete Cosine Transform (DCT) or Fast Fourier Transform (FFT).
  • Different components of the payload might be applied in different domains. For instance, error correction information may be applied in a color domain while the main watermark is embedded in a luminance domain. In another example, information from different parties such as studio and distributor are embedded in different domains.
  • Layers of different embedded watermarks do not have to vary in the embedding domain and can be in the same domain and overlap each other.
  • Techniques like error correction coding and different assignments of Pixelgroups (by, for example, using different passwords and random variations of the location) can help to keep the watermarks apart. This is particularly relevant if the same embedder is used by different parties on the same content without an orchestrated effort or knowledge of each other.
  • Pixelgroups, in the preferred embodiment, have a square shape but, in an alternative embodiment, may also be rectangular.
  • a rectangular shape may have the advantage that a rotation of, for example, 90° may be easier to detect when extracting the regular patterns created by embedded Pixelgroups.
  • the size and location of Pixelgroups may be aligned with the size and location of macro blocks or encoding blocks that are used during video encoding, if that is known at the time of embedding.
  • the advantage is that the watermark is better maintained during compression after watermarking because the luminance values of lower frequencies, which hold most of the watermark's information, are maintained with a higher precision and uniform variation. If, in contrast, a Pixelgroup is split between macroblocks, the values have a higher likelihood of being rounded off and of being affected more strongly by compression. Consequently, the alignment with the same grid or size of macroblocks preserves more of the embedded information.
  • Patterns within a Pixelgroup may represent individual bits or symbols that can represent one of several values (e.g., each pattern is chosen from a group of 8 different alternative patterns). This is applied in addition to, or as an alternative to, luminance variations to store additional information.
  • Patterns within a Pixelgroup may be a noise pattern that is detectable with cross correlation approaches or a repeating pattern detectable with auto correlation approaches.
  • the noise pattern may span several Pixelgroups to embed a secondary signal.
  • either the darker pixels or the lighter pixels in the noise pattern are taken to shape the watermark to be embedded, at a strength given by the considerations in 250.
  • the entire noise pattern of the secondary watermark is reduced according to the luminance requirement of the primary watermark. The result is an overall noise structure of the secondary pattern while the average luminance of the Pixelgroups is determined by the primary watermark.
  • the change in Pixelgroup strength values may also vary between areas with strong vs very light embedding such that in the very lightly embedded areas the pattern is not visible.
  • This process may consider the visual properties of the image using a perceptual model that may use different properties for flat vs. noisy image regions. These properties can be reconstructed using the original image at extraction time, or estimated from the watermarked image, to estimate the embedding strength used, as this can help readout.
  • a perceptual model may be used that depends on values derived from the encoded content, such as the size of the frame (as an indicator of the frame type) and the size of encoded motion vectors, which serve as a proxy for similarity to previous frames and an indication of the type of motion in this frame.
  • Regular motion may increase the dirty window effect, and more variation of the watermark may be important.
  • Many different motion vectors take more compression space and may leave less data to encode error residuals; hence the compression for this frame may be lossier than for the average frame, and the watermark needs to be embedded more strongly to survive.
  • This method may be combined with an overlay approach that adds a watermark image to the content without access to individual pixels. Global frame statistics like this help to estimate some frame or video properties and change the watermark accordingly to enable perceptual shaping of the watermark in space or time.
  • the creation of a watermark pattern can be separated from its perceptual shaping (i.e. adjusting strength of the watermark pattern depending on its visibility in an area given the unwatermarked image).
  • the pattern creation may occur on a server and is transmitted to the client where it is applied and, if possible, adjusted to the content.
  • erasure codes such as Fountain codes or Online codes (e.g. Tornado or RaptorQ code) can be used to extend the sequence, creating redundancy with a more uniform distribution, allowing for reduced errors at readout as well as a similar amount of information bits in the payload sequence.
  • baseband video before compression can be used.
  • An overlay image may be applied to embed the watermark onto the content. It may change less frequently but can also vary for security and invisibility. Modification can occur using technologies like shaders (WebGL) or OSDs (on-screen displays) overlaid onto the image transparently, typically on the client side with video playback in video players including a web browser. These approaches may be used to apply the pattern to an image larger than the video image (see below).
  • the watermark can be removed by reversing the embedding process and inverting the pixel changes. This may be used for instance in use cases where the watermark is only temporary or needs to be replaced with a different watermark.
  • When a picture is taken of a video, it may contain elements other than the image content, such as the frame of the TV displaying the video.
  • Image segmentation using for example object detection techniques may help to find regions that may contain the relevant content only and that can be used to attempt extraction.
  • Candidates could for example be images inside a television frame or on a phone display.
  • Alignment of watermarked images during extraction to the original shape and dimension can be performed using known elements, without use of the original image content. This is done by looking for the watermark pattern, including synchronization marks that are semitransparent or only appear periodically. Additionally, known visual elements within the screen, such as broadcast logos, captions, or navigation buttons from online players that are always displayed at the same known location on the screen, can be used.
  • logos may have been introduced for synchronization purposes but may also carry a visual signal such as "do not copy".
  • These logos may be designed with synchronization alignment and recognition properties in mind with, for example, a unique shape or pattern that is easier to recognize and identify. Patterns may consist of a barcode-like structure that can store additional information for each marked copy or is fixed to allow for identification using correlation approaches.
  • Blending occurs when several video frames are displayed during the time the camera image is exposed. Frames may be mixed unevenly depending on the sequence in which pixels are displayed by the display device as well as when pixels are recorded by the camera. For example, some cameras record the upper part of the image first, resulting in the top of the image recording frame n and the lower part of the image recording more of frame n+1. Other cameras record all pixels at the same time, and the resulting image may be an even blend of two or more frames. This can be exploited by testing for different ratios of blended frames; for example, after a principal frame has been identified and the ratio is established, the original frames can be blended accordingly to allow for better synchronization and therefore better extraction, in particular when subtracting the original from the watermarked copy.
  • the identification of the original frame is performed with fingerprinting. Additional methods of identification include fingerprinting that identifies relevant frames by detecting all objects in a video at the time of watermarking using automated labeling techniques. This will create a dictionary of all elements in each frame, which may also include object properties like color, size and frequency. At extraction time, the same or similar automated labeling, segmentation or object detection techniques are applied to derive the information used to search the database of known content.
  • a robust hash or output of a fingerprinting system can be used for identification of watermarked images. This may include different levels of precision where some areas are decided with a robust hash and others with a fine grain hash to ensure that some areas can be identified from an approximation of the original image e.g. from the watermarked copy, and at the same time ensure sufficient modification of the image can be tolerated.
  • Fingerprinting can be implemented with methods including a luminosity histogram or a scaled-down version of the image as a fingerprint.
  • Image filtering before extraction that enhances the watermark location and suppresses the image content can improve extraction performance.
  • High-pass filters tuned to the expected size of the Pixelgroups (i.e., a high-pass or sharpening filter matched to the Pixelgroup size, e.g., 16x16 if no size change is suspected) can be used to emphasize Pixelgroups of that shape.
  • extraction can be performed by correlating the watermarked original with a suspected watermark Pixelgroup pattern to find the highest correlation score which is an indication of the likelihood of the embedded pattern.
  • Areas that potentially contain a lower quality image than the rest are identified and ignored during extraction. This can include for example areas that are overexposed due to glare from a light source that is reflected in the monitor displaying the image as it is captured. In another embodiment this may include areas that are underexposed or exhibit a strong Moire pattern.
  • multiple frames are combined to assemble the embedded information and improve the readout.
  • the combination of frames may be on the pixel, Pixelgroup or payload level.
  • Application for authentication: the present disclosure and the pattern that is applied may also be used to authenticate an image and make it tamper-proof. While, for a robust watermark, the task is to identify the information even if it is only read in parts of the image, for the authentication use case any parts of the image that do not contain the watermark as embedded are suspicious. Those regions where the watermark is weaker or not present are more likely to have been modified by a process other than compression and may indicate that the image has been tampered with. Authentication may be performed in a two-step process during which, in a first step, the most likely watermark is read and, in a second step, the integrity of this watermark is verified throughout the image.
  • Increased security is accomplished by using a secret key to encrypt the payload or decision on how to assign parts of the payload information sequence to individual Pixelgroups.
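The key-dependent assignment mentioned above can be sketched as follows. This is an illustrative, non-limiting example: the use of HMAC-SHA256 to seed a pseudo-random generator and the one-payload-bit-per-Pixelgroup mapping are assumptions made for the sketch, not requirements of the disclosure.

```python
import hmac
import hashlib
import random

def keyed_assignment(secret_key: bytes, num_pixelgroups: int, payload_bits: int):
    """Derive a payload-bit index for each Pixelgroup from a secret key.

    Without the key the mapping of payload bits to Pixelgroups cannot be
    reproduced, so the watermark cannot be read correctly.
    """
    digest = hmac.new(secret_key, b"pixelgroup-assignment", hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Each Pixelgroup carries one payload bit; repetition covers all groups.
    return [rng.randrange(payload_bits) for _ in range(num_pixelgroups)]

a = keyed_assignment(b"studio-secret", 12, 4)
b = keyed_assignment(b"studio-secret", 12, 4)   # same key -> same mapping
c = keyed_assignment(b"other-key", 12, 4)       # different key -> different mapping
```

Two parties holding the same key reproduce the same mapping; without the key, a reader cannot tell which payload bit a given Pixelgroup carries.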


Abstract

Methods and systems for embedding a watermark in an image or video frame are disclosed. In one embodiment, the method includes splitting the image or video frame into a plurality of Pixelgroups; deriving a payload sequence for a payload message associated with the image or video frame; assigning each Pixelgroup of the plurality of Pixelgroups a positive, negative or neutral sign value using the derived payload sequence; deriving a seed value from the image or video frame; assigning each Pixelgroup of the plurality of Pixelgroups a strength value using the derived seed value; and modifying the image or video frame using the assigned positive, negative or neutral sign value and the assigned strength value associated with respective ones of the plurality of Pixelgroups to modify pixels within the respective Pixelgroup that do not reside at a border region of the respective Pixelgroup. Methods of extracting the watermark are also disclosed.

Description

DIGITAL WATERMARKING OF VIDEO FRAMES AND IMAGES FOR ROBUST
PAYLOAD EXTRACTION FROM PARTIAL FRAMES OR IMAGES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application Serial No. 63/582,761 filed on Sept 14, 2023, and entitled "Imperceptible watermarking of video frames or images, enabling full payload extraction from a single frame or image", the contents of which are incorporated herein by reference in their entirety.
FIELD
[0002] The present disclosure relates generally to the field of embedding and extracting of digital watermarks in imaging or video frame content.
BACKGROUND
[0003] Digital watermarking is a process for embedding information in digital media, such as images and video frames, in a way that is imperceptible to human viewers but can be detected and extracted by computers using specialized algorithms. Digital watermarking is used for a variety of purposes, including copyright protection, content authentication, and tracking the distribution of digital media. Digital watermarking can be performed in the spatial domain, by, for example, directly modifying pixel values, or in the frequency domain, by applying a transform such as the Discrete Cosine Transform (DCT) or Fast Fourier Transform (FFT) to the media and embedding the watermark in the resulting coefficients. The embedded information can be made robust to common signal processing operations, such as compression, scaling, and addition of noise, so that the watermark can still be detected after the media has been transformed. Watermarking techniques can include the use of perceptual models to reduce visibility and error correction codes to improve the reliability of information extraction.
[0004] While a variety of approaches for digital watermarking have been proposed in the past, traditional approaches have not been able to imperceptibly watermark an image with sufficiently robust embedded information that is reliably readable from a camera picture or printout, while also embedding unique numbers and allowing for the low-latency embedding needed during processing of large image quantities.
SUMMARY
[0005] Embodiments are directed to digital watermarking of video frames and images. Information can be embedded in the frames and images such that a payload can be extracted from a portion of a frame or image. An image or video frame to be protected can be selected. The image or video frame can be divided into groups of pixels (i.e., so-called Pixelgroups). The Pixelgroups may be spread throughout the frame or image. The Pixelgroups may have different shapes and sizes associated with individual ones of the Pixelgroups. A payload message may be received, and a payload sequence may be derived from the payload message. The payload message may include information about the recipient, creator, devices, or other information about the content flow and processing that will help identify ownership, copyright and sources of illegal content distribution and sharing.
[0006] Pixelgroups are modified based on a payload sequence and a seed value. A strength value indicates how much to modify each Pixelgroup, and a sign, which can be positive, negative, or neutral, sets the direction of modification. The content of each Pixelgroup may be modified based on the strength values, with considerations for visibility and error correction. The resulting image carries the embedded watermark, which may be applied to luminance and/or pixel color values.
[0007] The extraction process may include identifying the watermarked image, or image sequence for the image, photograph or video recording containing the area surrounding the watermarked frame or image, optionally identifying the original content, and restoring the frame to correct any distortions. With the known original orientation, the Pixelgroups where information has been embedded may be identified along with the strength values that were used. The information can be extracted by understanding the sign used for embedding. For this, the original value of each Pixelgroup may be estimated by subtracting the original image from the watermarked version. With an estimation of the sign of each Pixelgroup, error correction, decoding and aggregation may be applied for reversing the embedding processes and to determine the embedded payload.
[0008] The Pixelgroups may have different patterns and shapes. The arrangement of the pixels in Pixelgroups may be used to change the average luminance of the area to which they are assigned and estimate the original luminance. The patterns may also be used to identify the original locations of the Pixelgroups, which may be used to restore the original shape of the content as it was watermarked. The embedding areas may be larger than the image or video frame, such as the entire desktop or browser window.
[0009] Primary and secondary watermarks may be embedded together as combined watermark patterns. The primary watermark may be embedded by modifying the average luminance, and the secondary watermark may be embedded using a noise pattern. The embedding may be performed in the luminance, color, and/or frequency domain. The embedding may be performed using algorithms implemented on a central processing unit (CPU) or graphics processing unit (GPU), via graphic overlay with alpha blending, or other suitable technologies.
[0010] The extraction process may include image segmentation, alignment using known elements, and filtering to enhance the watermark signal. Multiple frames may be combined to improve readout accuracy. The watermarking technique may also be used for image authentication, where the presence or absence of the watermark indicates potential tampering.
[0011] The resulting process enables an imperceptible, high-capacity watermark with robustness against various analogue to digital conversions that may include camera pictures and printouts, enabled by the ability to recover from distortions. The same approach furthermore may protect an image against image tampering by highlighting modifications to the marked area. Lastly, the embedding process is designed to be highly efficient and applicable to various hardware and software architectures, which may be critical for applications like video processing.
BRIEF DESCRIPTION OF DRAWINGS
[0012] FIG. 1 is a block diagram of an exemplary system diagram for the embedding of digital watermarks, in accordance with the principles of the present disclosure.
[0013] FIG. 2 is an exemplary embedding workflow for the embedding of digital watermarks, in accordance with the principles of the present disclosure.
[0014] Fig. 3 is an exemplary extraction workflow for the extraction of digital watermarks, in accordance with the principles of the present disclosure.
[0015] FIG. 4 illustrates various Pixelgroup patterns for use with the exemplary system of FIG. 1, in accordance with the principles of the present disclosure.
[0016] FIG. 5 illustrates an exemplary Pixelgroup shape, in accordance with the principles of the present disclosure.
[0017] FIG. 6 illustrates an exemplary watermark applied to video, in accordance with the principles of the present disclosure.
[0018] FIG. 7 illustrates the extending of a watermark to a website background, in accordance with the principles of the present disclosure.
[0019] FIG. 8 illustrates the extending of a watermark to a desktop background, in accordance with the principles of the present disclosure.
[0020] FIG. 9 illustrates an exemplary combined watermark pattern, in accordance with the principles of the present disclosure.
DETAILED DESCRIPTION
[0021] EMBEDDING SYSTEM -
[0022] An embodiment of a system in accordance with the present disclosure for embedding a hidden watermark in media is shown in FIG. 1. The system 10, in the illustrated embodiment, is an internet network 20 connecting devices for video processing, distribution and playback. The devices, able to execute one or more aspects of the present disclosure, may include a personal computer 25, a cloud server such as a video processing workstation, a remote workstation or video encoding server 30, a video encoder 40, a mobile phone 50, and/or a consumer electronics device 60 such as a set top box, a game console, video decoder, video dongle, or an Internet connected television. Although specific playback devices are illustrated in FIG. 1, any of a variety of client devices capable of processing video or images for purposes of encoding or decoding may be utilized in accordance with embodiments of the present disclosure. Video content may originate from the devices, be transmitted between any of the devices, or stem from a dedicated media source 70, such as a camera, TV station broadcast, or other video stream. Connections of devices to the network may include a wired connection 100, as well as wireless connections (e.g., WiFi 110 or mobile connections 120).
[0023] EMBEDDING WORKFLOW -
[0024] FIG. 2 shows the embedding workflow in one embodiment of the present disclosure. The process may be applied on a central processing unit (CPU), graphics processing unit (GPU) or other types of graphics processing units implemented in hardware or software.
[0025] The process starts with selecting an image which requires protection 210. This may be an image that requires protection in a format like tiff, jpeg or png; it may also be video from which frames are selected for processing.
[0026] In 220, the frame is split into groups of pixels (i.e., so-called Pixelgroups) that are distributed over the frame, and, in some implementations, pixels of the frame are assigned to Pixelgroups. In some implementations, the shape of the Pixelgroups is square and resembles compression macroblocks used in image or video compression of, for example, 8x8 or 16x16 groups of pixels. In other implementations, the number of Pixelgroups in a given frame or image is fixed and the size of each of the Pixelgroups varies with the dimension of the content. Similarly, the shape may not be fixed to be square for all images, but in an alternative implementation may vary with the aspect ratio of the content. Next, a payload is chosen. Depending on the application, the payload may identify a recipient of the video or image content, or information about the creator of the video or image content, such as an individual, company, institution, or image creation system that may be using, for example, artificial intelligence (AI) or machine learning. In another application, the payload may also identify or point to a manifest that contains more information about the image such as the origin and authenticity. This manifest may, for example, follow the standard of the "Coalition for Content Provenance and Authenticity" that is published at https://c2pa.org/. The robust link provided by this watermark can ensure that information about the creator and processing of the image is available, even after meta information has been stripped, as is frequently the case during upload to social media. In addition, the same mark that is used to pinpoint the source of the content can also be used to identify regions of the image where the mark is not intact and that have been tampered with.
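The splitting in 220 can be sketched as follows, assuming square 16x16 Pixelgroups aligned to a regular grid (other shapes and content-dependent sizes are possible, as described above):

```python
import numpy as np

def split_into_pixelgroups(frame: np.ndarray, group: int = 16):
    """Split a luminance frame into square group x group Pixelgroups,
    resembling 16x16 compression macroblocks.  Returns a dict mapping
    (row, col) block indices to pixel views.  Edge pixels that do not
    fill a complete group are left unassigned (and hence unmodified)."""
    h, w = frame.shape
    groups = {}
    for by in range(h // group):
        for bx in range(w // group):
            groups[(by, bx)] = frame[by*group:(by+1)*group, bx*group:(bx+1)*group]
    return groups

frame = np.zeros((64, 48), dtype=np.uint8)
groups = split_into_pixelgroups(frame)   # 4 x 3 grid of 16x16 blocks
```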
[0027] In 230, a sequence is derived from the payload of the video or image content. The sequence may consist of a string of bits or digits from a binary system or other numeric systems with a different base such as decimal. The sequence may have been previously encrypted and encoded using error detection or error correction codecs such as cyclic redundancy check (CRC), Bose-Chaudhuri-Hocquenghem (BCH), low-density parity-check (LDPC), or Reed-Solomon codecs. Parts of the payload sequence are assigned to Pixelgroups. For example, each bit in the payload sequence is assigned to multiple Pixelgroups such that all groups have a bit assigned. The bit is translated into a positive, neutral, or negative sign value that is used to embed the mark. A neutral sign signifies no modification in that Pixelgroup. A neutral sign is introduced to reduce the amount of marking for reduced visibility but may also be used as a signal itself and provides extra variation to the watermark. The assignment may depend on a secret key in a manner such that the assignment is not known without the key; the key is then required for correct assignment of Pixelgroups to payload parts, without which the watermark is not readable. Information like user profiles, personal information or company specific information may be used as a key. The assignment of payload bits to Pixelgroups can remain the same for an asset or asset collection. Some Pixelgroups are not used for embedding, and the sign value may be neutral in this case.
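The derivation of a payload sequence and its translation into sign values might look as follows. CRC-32 stands in for the error detection codecs named above, and the rule for choosing neutral Pixelgroups is a hypothetical placeholder:

```python
import zlib

def derive_payload_sequence(payload: bytes) -> list:
    """Append a CRC-32 checksum for error detection (one of the codec
    families the disclosure names) and expand the result to a bit list."""
    framed = payload + zlib.crc32(payload).to_bytes(4, "big")
    return [(byte >> (7 - i)) & 1 for byte in framed for i in range(8)]

def bits_to_signs(bits, num_pixelgroups, skip_every=5):
    """Map payload bits repeatedly onto Pixelgroups as +1 / -1 sign values.
    Every skip_every-th group is left neutral (0) so the mark is not fully
    static; this particular skip rule is an illustrative assumption."""
    signs = []
    for g in range(num_pixelgroups):
        if skip_every and g % skip_every == skip_every - 1:
            signs.append(0)                      # neutral: no modification
        else:
            signs.append(1 if bits[g % len(bits)] else -1)
    return signs

bits = derive_payload_sequence(b"ID:42")   # 5 payload bytes + 4 CRC bytes
signs = bits_to_signs(bits, 200)           # payload repeated over 200 groups
```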
[0028] In addition, a seed is derived (240) that may affect the embedding. The seed may be derived using an input such as an identifier of the frame, for example a frame number, picture order count, time code or a hash code calculated over the frame content. A seed may also be a number derived from the image content that changes in value along with changes of the image content, such as, for example, a sum of pixel values of a reduced image size, possibly by image sections. One advantage of such an implementation is that the watermark only changes if the image content changes. These derived seeds may also repeat and cycle in regular intervals. Additionally, unmarked locations in the image or video content may be derived from the seed, in some implementations in combination with the strength values assigned at 250. These seeds may determine the strength of each Pixelgroup in a frame and may change with the frame or with differences within frames. Using the derived seed value, strength values are assigned to each Pixelgroup (250). The payload sequence may be used to assign a positive or negative strength value for each Pixelgroup depending on the value of the payload sequence. The assigned value may also be neutral, resulting in no modification to the Pixelgroup during the embedding process. In some implementations, it may be important to omit some embeddings of Pixelgroups such that the embedded watermark does not appear static. A static watermark commonly results in visible artifacts, because human vision tends to notice a difference in motion between the embedding modification, located at a constant position, and video content that is moving, for example during panning. This effect is also known as the so-called dirty window effect, as it resembles a static impurity on a window that becomes visible when the content seen through the window is moving. Additional reasons for these variations include enhanced security resulting from watermarking each frame differently.
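A content-derived seed and seed-driven strength assignment can be sketched as below. The interleaved subsampling used as a stand-in for per-section pixel sums, and the strength range of 1 to 4 luminance units, are illustrative assumptions:

```python
import numpy as np

def derive_seed(frame: np.ndarray, sections: int = 4) -> int:
    """Derive a seed from reduced image content: sum pixel values over
    interleaved subsamplings of the frame, so the seed changes only when
    the image content changes."""
    sums = [int(frame[y::sections, x::sections].sum()) % 65521
            for y in range(sections) for x in range(sections)]
    seed = 0
    for s in sums:
        seed = (seed * 65521 + s) % (2**31 - 1)
    return seed

def assign_strengths(seed: int, num_pixelgroups: int, max_strength: int = 4):
    """Assign a per-Pixelgroup embedding strength (in luminance units)
    deterministically from the seed, so the pattern varies per frame and
    avoids a static mark."""
    rng = np.random.default_rng(seed)
    return rng.integers(1, max_strength + 1, size=num_pixelgroups)

frame = np.full((32, 32), 128, dtype=np.uint8)
strengths = assign_strengths(derive_seed(frame), 100)
```

Because the seed is a pure function of the content, an extractor with the same frame (or a good estimate of it) can reproduce the same strength values.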
[0029] If the size of the payload sequence is shorter than the amount of Pixelgroups, the values are assigned repeatedly, such that the payload sequence is embedded multiple times. Repetitions may be identical or contain variations that will aid during extraction with error correction coding.
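Beyond plain repetition, the erasure codes mentioned elsewhere in this disclosure (e.g. fountain codes) spread redundancy more uniformly. The toy below illustrates only the "peeling" decoding idea behind such codes, with hand-picked XOR symbols; it is a didactic sketch, not an implementation of an actual Fountain, Tornado, or RaptorQ code:

```python
def peel_decode(symbols, k):
    """Toy erasure decoding by 'peeling', the mechanism behind fountain
    codes: each received symbol pairs a set of source-bit indices with the
    XOR of those bits.  A symbol with exactly one unknown index reveals
    that bit, which in turn helps resolve the remaining symbols."""
    symbols = [(set(idx), val) for idx, val in symbols]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for idx, val in symbols:
            unknown = idx - set(recovered)
            if len(unknown) == 1:
                bit = val
                for j in idx & set(recovered):
                    bit ^= recovered[j]       # peel off already-known bits
                recovered[unknown.pop()] = bit
                progress = True
    return [recovered.get(i) for i in range(k)]

# Source bits s0..s3 = 1, 0, 1, 1.  The plain copies of s1 and s2 were
# lost in transit, but the XOR symbols let peeling recover them.
received = [({0}, 1), ({3}, 1), ({0, 1}, 1), ({1, 2}, 1), ({2, 3}, 0)]
decoded = peel_decode(received, 4)
```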
[0030] In addition to the aforementioned positive, neutral, or negative sign for each Pixelgroup, a strength value is assigned that will determine the amount of change applied in the Pixelgroup.
[0031] Different strength values within a Pixelgroup may be used by the decoder to estimate the unmarked, original input image from the watermarked image and use it to discern precisely which Pixelgroups have been embedded and use only those for extraction. If this information is not available during extraction because the original image is not available or cannot be identified, then the embedded information may still be readable, as the areas that have not been used for embedding will contribute noise that can be compensated for even if they are not known.
[0032] The pattern created by variation in strength or presence (neutral sign) of Pixelgroups can be used to locate a group of Pixelgroups and thereby determine the location in the image even if it has been changed by transformations such as cropping.
[0033] The value of pixels in each Pixelgroup, i.e. their shape and content, is decided in step 260. Pixels in the group may be changed with different strengths and some may not be changed at all (i.e., so-called dithering). This dithering may be used to achieve an overall luminance change between integer values of discrete luminance changes. For example, it is possible to change the average luminance by 0.8 units by setting 8 out of 10 values to 1, with the remaining values set to 0. The dithering pattern (e.g., which pixels of the aforementioned 8 out of 10 values are set) may be regular like a checkerboard pattern or pseudo random, i.e. noise-like, as shown in FIG. 4. The pattern may also be chosen to identify the Pixelgroup or signal additional information derived from, for example, another payload used for a different purpose. The identification can be used to determine the original location of the Pixelgroup when reading from an image where the location has changed due to geometric variations like cropping or resizing. In this case the pattern in the Pixelgroup is compared to a known list of patterns and their locations.
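The dithering idea from the paragraph above — changing the average luminance by a fractional amount by modifying only part of the pixels — can be sketched as:

```python
import numpy as np

def dither_mask(shape, fraction, rng):
    """Build a pseudo-random dithering mask that raises `fraction` of the
    pixels by one luminance unit, achieving a fractional average change
    (e.g. +0.8 units by raising 8 out of every 10 pixels by 1)."""
    n = shape[0] * shape[1]
    mask = np.zeros(n, dtype=np.int32)
    mask[: round(n * fraction)] = 1
    rng.shuffle(mask)              # noise-like pattern rather than a block
    return mask.reshape(shape)

rng = np.random.default_rng(7)
mask = dither_mask((10, 10), 0.8, rng)   # average luminance change: +0.8
```

A regular checkerboard-style mask could be produced the same way by replacing the shuffle with a fixed index pattern.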
[0034] The unique pattern may also serve to identify small changes to the image when verifying the structure of the pattern. This may be useful for the use case of image authentication.
[0035] In an additional variation, pixels that are not assigned to a Pixelgroup are not changed, such that there are unmodified pixels between any neighboring Pixelgroups within a given image or frame. These pixels may help during the reading of the watermark when used to estimate the unmarked content of the Pixelgroups; the most likely watermark value is then derived depending on whether the marked Pixelgroup is lighter or darker than that estimate. The gap may also be helpful to conceal changes as it widens, and thereby blurs, the transition between light and dark Pixelgroups.
[0036] Ultimately the image is modified (270) by altering it according to the Pixelgroups' strength, shape and content. In some implementations the changes are applied to the luminance of the pixels, and in alternative implementations, color values like blue for reduced visibility or green for increased strength may be chosen. Embedding in the color domain may be considered in cases where color transformations are unlikely or do not need to maintain a readable watermark. The value is clamped to the range (16, 240) when it is required to comply with the BT.709 specification. Other specifications with different ranges, including, for example, BT.2020, can be supported as well. This step may be applied in a different processor or module, and application may also include the use of a computer graphics shader or on-screen display (OSD), as is commonly available in video playback environments. In this example, the watermark to be applied is rendered as an image with different levels of dark and light influence according to the sign values, and alpha values indicating the strength of the overlaid image according to the strength value.
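Step 270 for a single Pixelgroup might be sketched as below, combining sign, strength and a dithering mask, and clamping to the (16, 240) range the description cites for BT.709 compliance (other ranges would be clamped analogously):

```python
import numpy as np

def embed_pixelgroup(block, sign, strength, mask, lo=16, hi=240):
    """Modify one Pixelgroup's luminance: shift the dithered pixels by
    sign * strength, then clamp to the permitted signal range."""
    if sign == 0:                       # neutral sign: leave the group as-is
        return block.copy()
    out = block.astype(np.int32) + sign * strength * mask
    return np.clip(out, lo, hi).astype(block.dtype)

block = np.full((4, 4), 235, dtype=np.uint8)
mask = np.ones((4, 4), dtype=np.int32)
marked = embed_pixelgroup(block, +1, 10, mask)   # 245 would exceed the range
```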
[0037] Depending on the extraction process and required robustness against geometric distortions stemming from, for example a digital picture being taken with a camera, non-blind extraction that allows comparison to an unmarked frame may not be available. Hence, video frames may be modified with synchronization marks (280). These may have a known shape of dark corners or known noise patterns that can be located by searching for a high correlation. Ordinarily the resulting image has an unchanged visual appearance, and the mark is embedded in a hidden manner; however, depending on the strength and required survivability of the mark, some artifacts may become noticeable.
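Locating synchronization marks "by searching for a high correlation" can be illustrated with a brute-force normalized cross-correlation. A practical detector would use FFT-based correlation for speed, and the random pattern below is only a stand-in for an actual synchronization mark:

```python
import numpy as np

def best_match(image: np.ndarray, template: np.ndarray):
    """Exhaustive normalized cross-correlation search for a known pattern.
    Returns the (row, col) offset with the highest correlation score; a
    high score indicates the likely position of a synchronization mark."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -2.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y+th, x:x+tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

rng = np.random.default_rng(0)
template = rng.standard_normal((8, 8))     # stand-in for a known sync mark
image = rng.standard_normal((32, 32)) * 0.1
image[10:18, 5:13] += template             # mark hidden at offset (10, 5)
pos, score = best_match(image, template)
```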
[0038] EXTRACTION WORKFLOW -
[0039] The extraction as shown in FIG. 3 consists of multiple steps. The embedding mechanism enables extraction even after severe degradation, such as detection from a single frame of video that has been recorded using a smartphone camera, or after the watermarked image has been printed and scanned again.
[0040] The first step 310 is to identify a watermarked image or image sequence. This may be done when confidential or copyrighted material is illegally distributed or if content is submitted for verification. A next, optional step 320 is to identify the original content by, for example, matching it with the frame number and content of the unmarked video; it may also be an image out of an image archive. Identification methods like fingerprinting (also known as automated content recognition), perceptual hashes, other watermarking information and manual identification may be applied. Similar algorithms may also be applied in the next step 330 for frame restoration. The identification may not require procurement of the entire image but may use elements or high-level information thereof that are useful for restoration. The frame restoration aims to identify the content location and size used for watermarking in order to understand distortions such as rotation, shifting, cropping, resizing, bending, etc. Matching content from the original may be used to understand the distortions, as may identification of synchronization marks, which reveal known locations. Filtering or trained artificial intelligence networks can be used to allow for estimation of changes in size and rotation. Identification of the Pixelgroup patterns or their unique strength profile can help to locate Pixelgroups in the image; as these locations can be fixed or known, they can be used for geometric content restoration. The next step 340 is to determine embedding locations in the same way as during embedding (250 and 260) by determining the location and use of Pixelgroups. Additional filtering, comparison with the original or trained artificial intelligence networks may help to emphasize the watermark signal (350) so that it becomes stronger compared to the noise introduced by the underlying image or copying process.
[0041] Step 360 determines strength values for each of the Pixelgroups: as in the embedding stage, the seed may be used to determine what strength values were used for the watermarking, and the assignment of payload parts (without yet knowing their values) to Pixelgroups is also the same as during embedding. Knowledge of the strength values used for the watermarking may improve extraction performance but is not required if that information cannot be determined.
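The seed-based assignment described in this step can be illustrated with a minimal sketch. The function name, the choice of pseudo-random generator, the strength range and the payload repetition scheme are illustrative assumptions and are not prescribed by this disclosure:

```python
import random

def assign_pixelgroups(num_groups, payload_bits, seed):
    """Deterministically derive, from a shared seed, which payload part
    and which strength value each Pixelgroup receives. The PRNG and
    strength range are assumptions for illustration."""
    rng = random.Random(seed)
    assignments = []
    for g in range(num_groups):
        bit_index = g % len(payload_bits)  # payload parts repeat over groups
        strength = rng.uniform(0.5, 2.0)   # assumed embedding strength range
        assignments.append((bit_index, strength))
    return assignments

# An extractor holding the same seed reproduces the identical assignment
# without knowing the payload bit values themselves:
a1 = assign_pixelgroups(100, [0] * 32, seed=1234)
a2 = assign_pixelgroups(100, [1] * 32, seed=1234)
assert a1 == a2
```

Because the assignment depends only on the seed and the payload length, the extractor can reconstruct both the strength values and the mapping of payload parts to Pixelgroups before reading any bit values.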
[0042] This allows the actual readout step 370 to estimate the embedded value of each Pixelgroup used for watermarking. To do so, the change applied by the watermark, such as the change in luminance added to the unmarked image, is estimated by comparing with or subtracting the original in the same geometric transformation or, alternatively, by estimating the unmarked (original) luminance using unmarked pixels in the Pixelgroup such as, for example, the border pixels. From the Pixelgroup values that have been read, a value such as a bit is derived for each Pixelgroup, and from the resulting bitstring the payload is decoded 380. This consists of reversing processes applied during embedding, like error-detection coding and encryption. The payload decode may be followed by observations on the probability of correct detection, and variations of any previous steps may be used to find the most likely payload. Reliability observations may be made using the output of error-correction or error-detection coding as well as the consistency of Pixelgroup readings that represent the same information.
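The border-pixel variant of this readout can be sketched as follows. This is a simplified illustration assuming a square Pixelgroup with a one-pixel unmarked border and a positive luminance change encoding a 1-bit:

```python
def read_pixelgroup(block):
    """Estimate the embedded sign of one Pixelgroup without the original
    image: the unmodified border pixels approximate the unmarked
    luminance, which is compared against the mean of the center pixels.
    `block` is a square 2-D list of luminance values."""
    n = len(block)
    border, center = [], []
    for y in range(n):
        for x in range(n):
            if y in (0, n - 1) or x in (0, n - 1):
                border.append(block[y][x])
            else:
                center.append(block[y][x])
    estimated_unmarked = sum(border) / len(border)
    delta = sum(center) / len(center) - estimated_unmarked
    return 1 if delta > 0 else 0  # sign of the applied luminance change

# 4x4 group: border stays at 100, center was raised to 103 by embedding.
blk = [[100] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        blk[y][x] = 103
assert read_pixelgroup(blk) == 1
```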
[0043] Different image modifications, such as copying via camera, digital filtering or reduction in size, may be estimated from image properties such as the type of noise or the number of fine details, and the steps above may be varied to achieve better accuracy of payload detection for the estimated modification.
[0044] Multiple images that are known to have the same watermark can be extracted and results can be combined at any stage to increase the probability of the correct readout.
[0045] PIXELGROUP PATTERN -
[0046] The Pixelgroups are used to modify the pixel values of the areas assigned to them, shown here in an example with luminance modification to change the average luminance value in the Pixelgroup. Which pixels are changed, and how, is determined by the Pixelgroup pattern. FIG. 4 shows several variations. 410 is an example of a square Pixelgroup of 16x16 pixels with a solid center of pixel size 14x14. This represents a variation that is used to darken the image according to the darkness of the represented pixels, such that the white border 420 results in no modification of the watermarked image. The center pixels will be modified by only a small percentage of luminance but are represented more strongly for visualization. Pixelgroup 430 shows a dithered version that reduces the overall average luminance less and can be used to change it by a fraction of a digital luminance step. 440 shows a Pixelgroup filled with a random pattern that can be used for secondary storage of information or as a seal for image authentication. In the blurred version 450 this pattern is less noticeable and includes a high-frequency removal that may occur anyway with common compression techniques applied to images and video for storage and transmission. 460 is the pattern shown in 410 with the edges blurred in a dithered manner such that the blocks are less pronounced and thereby less visible. When a random pattern like 440 is used, the corresponding softening of the transition to the outside of the Pixelgroup is depicted in 470.
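The solid and dithered variants (cf. 410 and 430) can be illustrated with a small sketch that generates pattern masks, where 1.0 marks pixels to be modified and 0.0 leaves the border untouched. The sizes and the checkerboard dither are illustrative choices:

```python
def solid_pattern(size=16, border=1):
    """16x16 Pixelgroup with a solid 14x14 center (cf. 410)."""
    return [[1.0 if border <= y < size - border and border <= x < size - border
             else 0.0 for x in range(size)] for y in range(size)]

def dithered_pattern(size=16, border=1):
    """Checkerboard-dithered center (cf. 430): only every other center
    pixel is modified, roughly halving the average luminance change and
    allowing adjustment by a fraction of a digital luminance step."""
    p = solid_pattern(size, border)
    for y in range(size):
        for x in range(size):
            if (x + y) % 2:
                p[y][x] = 0.0
    return p

def average(p):
    return sum(map(sum, p)) / (len(p) * len(p))

# The dithered variant changes the group average by less than the solid one.
assert average(solid_pattern()) == 14 * 14 / 256
assert average(dithered_pattern()) < average(solid_pattern())
```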
[0047] PIXELGROUP SHAPE -
[0048] FIG. 5 shows Pixelgroup shapes that are rectangular. This way, they resemble macroblocks or other rectangular structures used during video compression, which sometimes create blocky artifacts. By giving the Pixelgroups a similar structure, they resemble these compression artifacts, and the watermarking therefore is not obvious as such.
[0049] They also do not touch each other but have a border around them, which allows detection techniques to estimate the modification and is also useful because it leaves a regular, grid-like pattern that can be used to identify the watermark's position and structure using image processing.
[0050] As shown in FIG. 5, the Pixelgroups can be combined into a pattern 510 covering the image. Some Pixelgroups may be combined into repeating blocks as in 520 that, due to the repetitive nature, allow identification using auto correlation. While in general the pattern repeats, some variations may be performed to, for example, disable the watermarking of some locations as in 530. For this pattern, the white and dark areas signify reduction or increase in luminance when applied, while the medium gray will not result in changes to the image that is watermarked.
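The tiling of Pixelgroups into a repeating pattern with selectively disabled locations (cf. 510, 520, 530) can be sketched as follows. The 2x2 repeating unit and the disabled set are illustrative choices, not values from the disclosure:

```python
def tile_signs(img_w, img_h, group=16, block=None, disabled=None):
    """Tile a repeating block of sign values (+1 darker, -1 lighter,
    0 neutral) across the grid of Pixelgroups covering the image, with
    selected grid locations disabled (set to neutral, cf. 530)."""
    block = block or [[+1, -1], [-1, +1]]  # hypothetical 2x2 repeating unit
    disabled = disabled or set()
    bh, bw = len(block), len(block[0])
    cols, rows = img_w // group, img_h // group
    return [[0 if (gy, gx) in disabled else block[gy % bh][gx % bw]
             for gx in range(cols)] for gy in range(rows)]

grid = tile_signs(64, 64, disabled={(0, 0)})
assert grid[0][0] == 0            # disabled location stays neutral
assert grid[2][2] == grid[0][2]   # pattern repeats every two groups
```

The repetitive structure of such a grid is what permits identification via auto-correlation during extraction.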
[0051] EMBEDDING AREAS -
[0052] FIG. 6 shows the output (610) of a device capable of video playback, such as a TV, PC or smartphone, and the video area that contains a watermark (shown with increased strength for emphasis) used to watermark the rendered output video. The watermark may contain a number identifying the recipient such that leaked content from this recipient can be identified. Watermarking may have been performed on a server when the video was processed and delivered as watermarked content to the client. Alternatively, the watermark may have been applied on the client side.
[0053] FIG. 7 shows an output where the web browser output window (710) is completely covered and protected by the watermark, such that if part of this area is leaked along with the valuable video content, it can also serve to read the watermark, allowing for more reliable detection. For application on the client side, technologies like shaders may be used.
[0054] FIG. 8 shows a watermarking of the entire desktop (810) to extend the protection further. The application may be on the client side, or it may be on the server side, where the desktop output of a remote workstation using a remote desktop connection is watermarked in order to be secured before transmission to a remote terminal. Protection in this case covers any application, or combination of multiple applications, showing confidential information that requires protection such that potential leaks can be identified. The application to the desktop may occur in locations that process the video output, including a graphics processor, a graphics card or a virtual display adapter of a virtual machine.
[0055] COMBINED WATERMARK PATTERN -
[0056] FIG. 9 shows an example of a combined watermark pattern (910) where the primary watermark consists of darker (920), lighter (930) or neutral (940) areas, while a secondary watermark is also embedded as an independent noise pattern. The primary watermark uses the average luminance value of the square Pixelgroup as a means to encode information by darkening and lightening the underlying image, while the composition of the noise pattern and its luminance values in relation to the other elements of the same noise pattern form the secondary watermark. As mentioned above, the first is read by estimating the luminance applied to the area of a Pixelgroup and the second by estimating what noise pattern is present using cross- or auto-correlation.
[0057] EMBEDDING PROCESS VARIATIONS -
[0058] The embedding domain can vary to allow for additional layers of independent watermarks or improve invisibility or robustness for a certain category of images or attacks.
[0059] Embedding, in the preferred embodiment, is applied to the luminance domain as it is very robust and is therefore suitable as a carrier for a robust watermark. In alternative embodiments the watermark may also be applied to individual color channels, such as other channels of the YUV representation, channels of the RGB image representation, or channels of alternative color representation formats like ICtCp. In other embodiments the modification is applied to an image that has been transformed into the frequency domain, including using a Discrete Cosine Transform (DCT) or Fast Fourier Transform (FFT).
[0060] Different components of the payload might be applied in different domains. For instance, error correction information may be applied in a color domain while the main watermark is embedded in a luminance domain. In another example, information from different parties such as studio and distributor are embedded in different domains.
[0061] Layers of different embedded watermarks, however, do not have to vary in the embedding domain and can be in the same domain and overlap each other. Techniques like error correction coding, different assignments of Pixelgroups by, for example, using different passwords and random variations of the location can help to keep the watermarks apart. This is particularly relevant if the same embedder is used by different parties on the same content without an orchestrated effort or knowledge of each other.
[0062] Pixelgroups, in the preferred embodiment, have a square shape but, in an alternative embodiment, may also be rectangular. A rectangular shape may have the advantage that a rotation of, for example, 90° may be easier to detect when extracting the regular patterns created by embedded Pixelgroups.
[0063] The size and location of Pixelgroups may be aligned with the size and location of macroblocks or encoding blocks that are used during video encoding, if that is known at the time of embedding. The advantage is that the watermark is better maintained during compression after watermarking, because the luminance values of lower frequencies, which hold most of the watermark's information, are maintained with higher precision and uniform variation. If, in contrast, a Pixelgroup is split between macroblocks, the values have a higher likelihood of being rounded off and are affected more strongly by compression. Consequently, alignment with the same grid or size of macroblocks preserves more of the embedded information.
[0064] Patterns within a Pixelgroup may represent individual bits or symbols that can represent one of several values (e.g., each pattern is chosen from a group of 8 different alternative patterns). This is applied in addition to, or as an alternative to, luminance variations to store additional information.
[0065] Patterns within a Pixelgroup may be a noise pattern that is detectable with cross-correlation approaches or a repeating pattern detectable with auto-correlation approaches. For this purpose, the noise pattern may span several Pixelgroups to embed a secondary signal. In order to maintain both the structure given by the noise pattern and the luminance change for this Pixelgroup demanded by the primary watermark, either only the darker pixels or only the lighter pixels in the noise pattern are taken to shape the watermark to be embedded, at a strength given by the considerations in 250. Alternatively, the entire noise pattern of the secondary watermark is reduced according to the luminance requirement of the primary watermark. The result is an overall noise structure of the secondary pattern while the average luminance of the Pixelgroups is determined by the primary watermark.
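The second alternative, adjusting the whole noise pattern so the Pixelgroup still meets the primary watermark's average-luminance requirement, can be sketched on a tiny 2x2 example. The shift-and-scale rule below is one possible interpretation of "reduced according to the luminance requirement", not a prescribed formula:

```python
def combine_primary_secondary(noise, target_mean, strength=1.0):
    """Shape a secondary noise pattern so that the Pixelgroup keeps the
    average luminance change demanded by the primary watermark: the noise
    is scaled, then shifted so its mean equals the primary's target."""
    n = len(noise)
    cur_mean = sum(map(sum, noise)) / (n * n)
    shift = target_mean - cur_mean * strength
    return [[v * strength + shift for v in row] for row in noise]

noise = [[+1.0, -1.0], [-1.0, +1.0]]
out = combine_primary_secondary(noise, target_mean=0.5)
mean = sum(map(sum, out)) / 4
assert abs(mean - 0.5) < 1e-9   # primary: average luminance change preserved
assert out[0][0] > out[0][1]    # secondary: noise structure preserved
```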
[0066] The change in Pixelgroup strength values may also vary between areas with strong versus very light embedding, such that in the very lightly embedded areas the pattern is not visible. This process may consider the visual properties of the image using a perceptual model that may use different properties for flat versus noisy image regions. These properties can be reconstructed using the original image at extraction time, or estimated from the watermarked image, to estimate the embedding strength used, which can help the readout.
[0067] In applications where pixel access is not available, a perceptual model can be used that depends on values derived from the encoded content, such as the size of the frame (indicating the frame type), the size of encoded motion vectors as a proxy for similarity to previous frames, and an indication of the type of motion in the frame. Regular motion may increase the dirty-window effect, and more variation of the watermark may be important. Many different motion vectors will take more compression space and may leave less data to encode error residuals; hence the compression for this frame may be more lossy than for the average frame, and the watermark needs to be embedded more strongly to survive. This method may be combined with an overlay approach that adds a watermark image to the content without access to individual pixels. Global frame statistics like these help to estimate some frame or video properties and change the watermark accordingly to enable perceptual shaping of the watermark in space or time.
[0068] The creation of a watermark pattern can be separated from its perceptual shaping (i.e. adjusting strength of the watermark pattern depending on its visibility in an area given the unwatermarked image). In particular, the pattern creation may occur on a server and is transmitted to the client where it is applied and, if possible, adjusted to the content.
[0069] Instead of applying the same payload sequence multiple times to leverage correlation to enhance robustness, erasure codes such as Fountain codes or Online codes (e.g., Tornado or RaptorQ codes) can be used to extend the sequence, creating redundancy with a more uniform distribution, allowing for fewer errors at readout while keeping a similar number of information bits in the payload sequence.
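The idea of spreading redundancy with fountain-style erasure coding can be illustrated with a toy LT-code sketch. This is a didactic simplification, not RaptorQ or a production Online code: each output symbol is the XOR of a random subset of payload blocks, and a peeling decoder recovers the blocks from whichever symbols survive:

```python
import random

def lt_encode(blocks, n_symbols, seed=42):
    """Toy fountain-style encoder: each symbol is the XOR of a random
    subset of payload blocks. The degree distribution is arbitrary."""
    rng = random.Random(seed)
    symbols = []
    for _ in range(n_symbols):
        degree = rng.choice([1, 1, 2, 3])
        idxs = frozenset(rng.sample(range(len(blocks)), degree))
        val = 0
        for i in idxs:
            val ^= blocks[i]
        symbols.append((idxs, val))
    return symbols

def lt_decode(symbols, k):
    """Peeling decoder: repeatedly resolve symbols with exactly one
    unknown block and substitute the recovered value elsewhere."""
    pending = [(set(idxs), val) for idxs, val in symbols]
    known = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for idxs, val in pending:
            unknown = idxs - set(known)
            if len(unknown) == 1:
                i = unknown.pop()
                acc = val
                for j in idxs - {i}:
                    acc ^= known[j]
                known[i] = acc
                progress = True
    return [known.get(i) for i in range(k)]

# Hand-built symbols decode deterministically: 3, 3^14, 14^7, 7^1.
manual = [({0}, 3), ({0, 1}, 13), ({1, 2}, 9), ({2, 3}, 6)]
assert lt_decode(manual, 4) == [3, 14, 7, 1]
```

Any block the decoder recovers is guaranteed correct; generating more symbols than blocks makes full recovery from a partial frame increasingly likely.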
[0070] To apply any watermark modification, baseband video before compression can be used. An overlay image may be applied to embed the watermark onto the content. It may change less frequently but can also vary for security and invisibility. Modification can occur using technologies like shaders (WebGL) or OSDs (on-screen displays) overlaid onto the image transparently, typically on the client side with video playback in video players including a web browser. These approaches may be used to apply the pattern to an image larger than the video image (see below).
[0071] The watermark can be removed by reversing the embedding process and inverting the pixel changes. This may be used for instance in use cases where the watermark is only temporary or needs to be replaced with a different watermark.
[0072] EXTRACTION PROCESS VARIATIONS -
[0073] When a picture is taken of a video, it may contain elements other than the image content, such as the frame of the TV displaying the video. Image segmentation using, for example, object detection techniques may help to find regions that contain only the relevant content and that can be used to attempt extraction. Candidates could, for example, be images inside a television frame or on a phone display.
[0074] Alignment of watermarked images during extraction to the original shape and dimensions can be performed using known elements, without use of the original image content. This is done by looking for the watermark pattern, including synchronization marks that are semitransparent or only appear periodically. Additionally, known visual elements within the screen, such as broadcast logos, captions, or navigation buttons from online players that are always displayed at the same known location on the screen, can be used.
[0075] They may have been introduced for synchronization purposes but may also carry a visual message such as "do not copy". These logos may be designed with synchronization, alignment and recognition properties in mind, with, for example, a unique shape or pattern that is easier to recognize and identify. Patterns may consist of a barcode-like structure that can store additional information for each marked copy, or that is fixed to allow for identification using correlation approaches.
[0076] When identifying or aligning the original image with the watermarked copy stemming from a camera copy, the blending of neighboring frames can be taken into account. Blending occurs when several video frames are displayed during the time the camera image is exposed. Frames may be mixed unevenly depending on the sequence in which pixels are displayed by the display device as well as when pixels are recorded by the camera. For example, some cameras record the upper part of the image first, resulting in the top of the image recording frame n and the lower part of the image recording more of frame n+1. Other cameras record all pixels at the same time, and the resulting image may be an even blend of two or more frames. This can be exploited by testing different ratios of blended frames, for example after a principal frame has been identified; once the ratio is established, the original frames can be blended accordingly to allow for better synchronization and thus better extraction, in particular when subtracting the original from the watermarked copy.
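Testing candidate blend ratios against a camera capture can be sketched as follows. This is a 1-D, global-ratio illustration; a rolling-shutter camera would require per-region ratios, and the search grid is an arbitrary choice:

```python
def estimate_blend_ratio(capture, frame_a, frame_b, steps=10):
    """Search for the blend ratio between two original frames that best
    matches the camera capture, by minimizing the squared error over a
    coarse grid of candidate ratios."""
    def sq_err(alpha):
        return sum((alpha * a + (1 - alpha) * b - c) ** 2
                   for a, b, c in zip(frame_a, frame_b, capture))
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=sq_err)

# Capture is a 70/30 mix of frames n and n+1.
fa, fb = [10.0, 20.0, 30.0], [50.0, 60.0, 70.0]
cap = [0.7 * a + 0.3 * b for a, b in zip(fa, fb)]
assert estimate_blend_ratio(cap, fa, fb) == 0.7
```

Once the ratio is found, the original frames can be blended with the same ratio before subtraction from the watermarked copy.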
[0077] The identification of the original frame is performed with fingerprinting. Additional methods of identification include fingerprinting that identifies relevant frames by detecting all objects in a video at the time of watermarking using automated labeling techniques. This creates a dictionary of all elements in each frame, which may also include object properties like color, size and frequency. At extraction time, the same or similar automated labeling, segmentation or object detection techniques are applied to produce the information used to search the database of known content.
[0078] Alternatively, a robust hash or the output of a fingerprinting system can be used for identification of watermarked images. This may include different levels of precision, where some areas are decided with a robust hash and others with a fine-grain hash, to ensure that some areas can be identified from an approximation of the original image, e.g., from the watermarked copy, and at the same time ensure sufficient modification of the image can be tolerated. Fingerprinting can be implemented with methods including a luminosity histogram or a scaled-down version of the image as a fingerprint.
[0079] Image filtering before extraction that enhances the watermark location and suppresses the image content can improve extraction performance. High-pass filters matched to the expected size of the Pixelgroups, i.e., a high-pass or sharpening filter whose kernel matches the size of the Pixelgroup (e.g., 16x16 if no size change is suspected), can be used to emphasize Pixelgroups of that shape.
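A matched high-pass filter of this kind can be sketched in one dimension as a moving-average subtraction whose window equals the Pixelgroup size. This is an illustrative simplification of the 2-D case:

```python
def emphasize_pixelgroups(row, group=16):
    """1-D sketch of the matched high-pass filter: subtract a moving
    average whose window equals the Pixelgroup size, suppressing slowly
    varying image content while keeping block-sized luminance steps."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - group // 2), min(n, i + group // 2)
        local_mean = sum(row[lo:hi]) / (hi - lo)
        out.append(row[i] - local_mean)
    return out

# A gentle ramp (slow image content) is suppressed, while a Pixelgroup-sized
# luminance step survives in the filtered signal.
ramp = [i * 0.1 for i in range(64)]
step = [0.0] * 32 + [5.0] * 32
assert max(abs(v) for v in emphasize_pixelgroups(ramp)) < 1.0
assert max(abs(v) for v in emphasize_pixelgroups(step)) > 1.0
```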
[0080] In some embodiments extraction can be performed by correlating the watermarked image with a suspected watermark Pixelgroup pattern to find the highest correlation score, which is an indication of the likelihood of the embedded pattern.
[0081] Areas that potentially contain a lower-quality image than the rest are identified and ignored during extraction. This can include, for example, areas that are overexposed due to glare from a light source reflected in the monitor displaying the image as it is captured. In another embodiment this may include areas that are underexposed or exhibit a strong Moiré pattern.
[0082] If multiple frames are available, they are combined to assemble the embedded information and improve the readout. The combination of frames may be on the pixel, Pixelgroup or payload level.
[0083] OTHER VARIATIONS -
[0084] Application for authentication: the present disclosure and the pattern that is applied may also be used to authenticate an image and tamper-proof it. While, for a robust watermark, the task is to identify the information even if it is only read from parts of the image, for the authentication use case any parts of the image that do not contain the watermark as embedded are suspicious. Regions where the watermark is weaker or not present are more likely to have been modified by a process other than compression and may indicate that the image has been tampered with. Authentication may be performed in a two-step process during which, in a first step, the most likely watermark is read and, in a second step, the integrity of this watermark is verified throughout the image.
[0085] Increased security is accomplished by using a secret key to encrypt the payload or decision on how to assign parts of the payload information sequence to individual Pixelgroups.
[0086] Where certain elements of these implementations can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present disclosure are described, and detailed descriptions of other portions of such known components are omitted so as not to obscure the disclosure.
[0087] In the present specification, an implementation showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other implementations including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
[0088] Further, the present disclosure encompasses present and future known equivalents to the components referred to herein by way of illustration.
[0089] It will be recognized that while certain aspects of the technology are described in terms of a specific sequence of steps of a method, these descriptions are only illustrative of the broader methods of the disclosure and may be modified as required by the particular application. Certain steps may be rendered unnecessary or optional under certain circumstances. Additionally, certain steps or functionality may be added to the disclosed implementations, or the order of performance of two or more steps permuted. All such variations are considered to be encompassed within the disclosure disclosed and claimed herein.
[0090] While the above detailed description has shown, described, and pointed out novel features of the disclosure as applied to various implementations, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the art without departing from the disclosure. The foregoing description is of the best mode presently contemplated of carrying out the principles of the disclosure. This description is in no way meant to be limiting, but rather should be taken as illustrative of the general principles of the technology. The scope of the disclosure should be determined with reference to the claims.

Claims

1. A method for embedding a robust watermark in an image or video frame, the method comprising:
splitting the image or video frame into a plurality of Pixelgroups;
deriving a payload sequence for a payload message associated with the image or video frame;
assigning each Pixelgroup of the plurality of Pixelgroups a positive, negative or neutral sign value using the derived payload sequence;
deriving a seed value from the image or video frame;
assigning each Pixelgroup of the plurality of Pixelgroups a strength value using the derived seed value; and
modifying the image or video frame using the assigned positive, negative or neutral sign value and the assigned strength value associated with respective ones of the plurality of Pixelgroups to modify pixels within the respective Pixelgroup that do not reside at a border region of the respective Pixelgroup.
2. The method of Claim 1, wherein the image or video frame comprises a plurality of video frames and the method further comprises varying the seed value between each of the plurality of video frames.
3. The method of Claim 1, wherein the deriving of the seed value further comprises using information identifying the image or video frame.
4. The method of Claim 1, wherein the modifying of the image or video frame using the assigned positive or negative sign value and the assigned strength value associated with the given Pixelgroup comprises modifying an average luminance value for the given Pixelgroup.
5. The method of Claim 1, wherein the modifying of the image or video frame comprises modifying pixels within the given Pixelgroup using a dithering pattern.
6. The method of Claim 1, wherein each of the plurality of Pixelgroups comprises a rectangular image space of pixels.
7. The method of Claim 1, wherein the deriving of the payload sequence further comprises deriving from identifying information of a manifest that contains information about origin and authenticity of the image or video frame.
8. The method of Claim 1, further comprising using a desktop output from a computer as the image or video frame.
9. The method of Claim 1, further comprising using a graphics overlay for the modifying of the image or video frame.
10. The method of Claim 1, further comprising using a web browser output window as the image or video frame.
11. The method of Claim 1, further comprising performing the steps of the method using a graphics processing unit.
12. A non-transitory computer-readable storage apparatus comprising a plurality of instructions, that when executed by a processor apparatus, are configured to:
split an image or video frame into a plurality of Pixelgroups;
derive a payload sequence for a payload message associated with the image or video frame;
assign each Pixelgroup of the plurality of Pixelgroups a positive, negative or neutral sign value using the derived payload sequence;
derive a seed value from the image or video frame;
assign each Pixelgroup of the plurality of Pixelgroups a strength value using the derived seed value; and
modify the image or video frame using the assigned positive, negative or neutral sign value and the assigned strength value associated with respective ones of the plurality of Pixelgroups to modify pixels within the respective Pixelgroup that do not reside at a border region of the respective Pixelgroup.
13. The non-transitory computer-readable storage apparatus of Claim 12, wherein the image or video frame comprises a plurality of video frames and the plurality of instructions, when executed by the processor apparatus, varies the seed value between each of the plurality of video frames.
14. The non-transitory computer-readable storage apparatus of Claim 12, wherein the derivation of the seed value further comprises use of information identifying the image or video frame.
15. The non-transitory computer-readable storage apparatus of Claim 12, wherein the modification of the image or video frame using the assigned positive or negative sign value and the assigned strength value associated with the given Pixelgroup comprises modification of an average luminance value for the given Pixelgroup.
16. The non-transitory computer-readable storage apparatus of Claim 12, wherein the modification of the image or video frame comprises modification of pixels within the given Pixelgroup using a dithering pattern.
17. The non-transitory computer-readable storage apparatus of Claim 12, wherein each of the plurality of Pixelgroups comprises a rectangular image space of pixels.
18. The non-transitory computer-readable storage apparatus of Claim 12, wherein the derivation of the payload sequence further comprises derivation from identifying information of a manifest that contains information about origin and authenticity of the image or video frame.
19. The non-transitory computer-readable storage apparatus of Claim 12, further comprising use of a desktop output from a computer as the image or video frame.
20. The non-transitory computer-readable storage apparatus of Claim 12, further comprising use of a graphics overlay for the modification of the image or video frame.
PCT/IB2024/000509 2023-09-14 2024-09-13 Digital watermarking of video frames and images for robust payload extraction from partial frames or images Pending WO2025056974A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363582761P 2023-09-14 2023-09-14
US63/582,761 2023-09-14

Publications (1)

Publication Number Publication Date
WO2025056974A1 true WO2025056974A1 (en) 2025-03-20

Family

ID=93119431



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010022848A1 (en) * 1994-03-17 2001-09-20 Rhoads Geoffrey B. Method of producing a security document
US20050129271A1 (en) * 2003-12-05 2005-06-16 Yun-Qing Shi System and method for robust lossless data hiding and recovering from the integer wavelet representation
EP2410726A2 (en) * 2004-02-02 2012-01-25 Nippon Telegraph And Telephone Corporation Digital watermark detection apparatus, method and program
US20210150538A1 (en) * 2019-11-15 2021-05-20 Ck&B Co., Ltd. Genuine-product certification content creation device and integrated certification system using the same



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24790665

Country of ref document: EP

Kind code of ref document: A1