HK1150090B - Methods and apparatus for embedding watermarks - Google Patents
- Publication number: HK1150090B (application number HK11103846.7A)
- Authority: HK (Hong Kong)
Description
The present application is a divisional application of the patent application having a filing date of June 14, 2004, application number 200480020200.8, international application number PCT/US2004/018953, and the title "Methods and Apparatus for Embedding Watermarks".
Technical Field
The present invention relates generally to media metering and, more particularly, to a method and apparatus for embedding watermarks in compressed digital data streams.
Background
In modern television or radio broadcast stations, compressed digital data streams are generally used to carry the video and/or audio data to be transmitted. For example, the Advanced Television Systems Committee (ATSC) standard for Digital Television (DTV) broadcasts in the United States employs a Moving Picture Experts Group (MPEG) standard (e.g., MPEG-1, MPEG-2, MPEG-3, MPEG-4, etc.) for carrying video content and a digital audio compression standard (e.g., AC-3, also known as Dolby Digital) for carrying audio content (i.e., the ATSC standard: Digital Audio Compression (AC-3), Revision A, August 2001). The AC-3 compression standard is based on perceptual digital audio coding techniques that reduce the amount of data required to reproduce the original audio signal while minimizing perceptible distortion. In particular, the AC-3 compression standard recognizes that the human ear cannot perceive a change in spectral energy at a particular spectral frequency that is less than the masking energy at that spectral frequency. The masking energy is a characteristic of an audio segment that depends on the tonal and noise-like characteristics of the segment. Different well-known psychoacoustic models may be used to determine the masking energy at a particular spectral frequency. In addition, the AC-3 compression standard provides a multi-channel digital audio format (e.g., a 5.1-channel format) for Digital Television (DTV), High Definition Television (HDTV), Digital Versatile Disc (DVD), digital cable, and satellite transmission that enables the broadcast of special sound effects (e.g., surround sound).
Existing television or radio broadcast stations employ watermarking techniques to embed watermarks within video and/or audio data streams compressed according to compression standards such as the AC-3 compression standard and the MPEG Advanced Audio Coding (AAC) compression standard. Typically, watermarks are digital data that uniquely identify a broadcaster and/or program. Typically, a watermark is extracted using a decoding operation at one or more receiving points (e.g., a household or other media consumption point), whereby the watermark can be used to evaluate the viewing characteristics of individual households and/or groups of households to generate ratings information.
However, many existing watermarking techniques were designed for use with analog broadcast systems. Specifically, existing watermarking techniques convert analog program data into a decompressed digital data stream, insert the watermark data into the decompressed digital data stream, and convert the watermarked data stream into an analog format prior to transmission. With the ongoing transition to an all-digital broadcast environment, in which compressed video and audio streams are transmitted over a broadcast network to local affiliates, it may be necessary to embed or insert the watermark data directly into the compressed digital data stream. Existing watermarking techniques may decompress the compressed digital data stream into time-domain samples, insert the watermark data into the time-domain samples, and recompress the watermarked time-domain samples into a watermarked compressed digital data stream. Such decompression/compression may degrade the quality of the media content in the compressed digital data stream. Furthermore, existing decompression/compression techniques require additional equipment and introduce delays in the broadcast audio component that may be unacceptable in some situations. In addition, the manner in which local affiliates receive compressed digital data streams from their parent networks and insert local content through sophisticated splicing equipment does not permit the compressed digital data streams to be converted into time-domain (decompressed) signals and then recompressed.
Drawings
FIG. 1 is a block diagram representation of an example media monitoring system;
FIG. 2 is a block diagram representation of an example watermark embedding system;
FIG. 3 is a block diagram representation of an example decompressed digital data stream associated with the example watermark embedding system of FIG. 2;
FIG. 4 is a block diagram representation of an example embedding device that may be used to implement the example watermark embedding system of FIG. 2;
FIG. 5 illustrates an example compressed digital data stream associated with the example embedding device of FIG. 4;
FIG. 6 illustrates an example quantization look-up table that may be used to implement the example watermark embedding system of FIG. 2;
FIG. 7 illustrates another example decompressed digital data stream that may be compressed and then processed using the example watermark embedding system of FIG. 2;
FIG. 8 illustrates an example compressed digital data stream associated with the example decompressed digital data stream of FIG. 7;
FIG. 9 illustrates one manner in which the example watermark embedding system of FIG. 2 may be configured to embed a watermark;
FIG. 10 illustrates one manner in which the modification process of FIG. 9 may be implemented;
FIG. 11 illustrates one manner in which a data frame may be processed;
FIG. 12 illustrates one manner in which a watermark may be embedded in a compressed digital data stream;
FIG. 13 illustrates an example encoding frequency index table that may be used to implement the example watermark embedding system of FIG. 2; and
FIG. 14 is a block diagram representation of an example processor system that may be used to implement the example watermark embedding system of FIG. 2.
Detailed Description
Generally, methods and apparatus for embedding a watermark in a compressed digital data stream are disclosed. The methods and apparatus disclosed herein may be used to embed watermarks in compressed digital data streams without prior decompression of the compressed digital data streams. Thus, the methods and apparatus disclosed herein eliminate the need to subject the compressed digital data stream to multiple decompression/compression cycles, which can significantly degrade the quality of the media content in the compressed digital data stream and are therefore generally unacceptable for, for example, simulcast stations of a television broadcast network.
Prior to broadcast, the methods and apparatus disclosed herein may be used, for example, to unpack Modified Discrete Cosine Transform (MDCT) coefficient sets associated with a compressed digital data stream formatted in accordance with a digital audio compression standard, such as the AC-3 compression standard. The mantissas of the unpacked MDCT coefficient sets may be modified to embed a watermark that imperceptibly augments the compressed digital data stream. Upon receipt of the compressed digital data stream, a receiving device (such as a set-top television metering device at a media consumption site) may extract the embedded watermark information from the decompressed analog output (e.g., the output emanating from the speakers of a television set). The extracted watermark information may be used to identify the media source and/or program (e.g., a broadcast station) associated with the media currently being consumed (e.g., viewed, listened to, etc.) at the media consumption site. The source and program identification information may then be used in a known manner to generate ratings information and/or any other information that may be used to assess the viewing characteristics associated with individual households and/or groups of households.
Referring to FIG. 1, an example broadcast system 100 that is metered using an audience measurement system includes a service provider 110, a television 120, a remote control device 125, and a receiving device 130. The components of the broadcast system 100 may be coupled in any known manner. For example, the television 120 is positioned in a viewing area 150 located in a household having one or more individuals, referred to as family members 160, some or all of whom have agreed to participate in an audience measurement research study. The receiving device 130 may be a set-top box (STB), a video tape recorder, a digital video recorder, a personal computer, a digital video disc player, or the like, coupled to the television 120. The viewing area 150 includes the area in which the television 120 is located and from which one or more family members 160 located in the viewing area 150 can view the television 120.
In the illustrated example, the metering device 140 is configured to identify viewing information based on the video/audio output signals conveyed from the receiving device 130 to the television 120. The metering device 140 provides this viewing information, as well as other tuning and/or demographic data, to a data collection facility 180 via a network 170. The network 170 may be implemented using any desired combination of hardwired and wireless communication links, including for example the Internet, an Ethernet connection, a Digital Subscriber Line (DSL), a telephone line, a cellular telephone system, a coaxial cable, etc. The data collection facility 180 may be configured to process and/or store the data received from the metering device 140 to generate ratings information.
The service provider 110 may be implemented by any service provider, such as a cable television service provider 112, a radio frequency (RF) television service provider 114, and/or a satellite television service provider 116. The television 120 receives a plurality of television signals transmitted via a plurality of channels by the service provider 110 and may be adapted to process and display television signals provided in any format, such as the National Television Standards Committee (NTSC) television signal format, the High Definition Television (HDTV) signal format, the Advanced Television Systems Committee (ATSC) television signal format, the Phase Alternating Line (PAL) television signal format, the Digital Video Broadcasting (DVB) television signal format, the Association of Radio Industries and Businesses (ARIB) television signal format, etc.
The user-operated remote control device 125 allows a user (e.g., a family member 160) to cause the television 120 to tune to and receive signals transmitted on a desired channel, and to process and present or play the programming or media content contained in the signals transmitted on the desired channel. The processing performed by the television 120 may include, for example, extracting the video and/or audio components conveyed via the received signal, causing the video components to be displayed on a screen/display associated with the television 120, and causing the audio components to be emitted by speakers associated with the television 120. The programming content contained in the television signal may include, for example, television programs, movies, advertisements, video games, web pages, still images, and/or previews of other programming content currently offered or to be offered in the future by the service provider 110.
Although the components shown in FIG. 1 are depicted as separate structures within the broadcast system 100, the functions performed by some of these structures may be integrated within a single unit or may be implemented using two or more separate structures. For example, although the television 120 and the receiving device 130 are depicted as separate structures, the television 120 and the receiving device 130 may be integrated into a single unit (e.g., an integrated digital television). In another example, the television 120, the receiving device 130, and/or the metering device 140 may be integrated into a single unit.
To evaluate the viewing characteristics of individual household members 160 and/or household groups, a watermark embedding system, such as watermark embedding system 200 of fig. 2, may encode a watermark for uniquely identifying a broadcaster and/or program into a broadcast signal from service provider 110. The watermark embedding system may be implemented at the service provider 110 such that each of a plurality of media signals (e.g., television signals) transmitted by the service provider 110 includes one or more watermarks. Depending on the selection of the family member 160, the receiving device 130 may tune to a desired channel and receive the media signal transmitted on the desired channel and cause the television 120 to process and present the program content contained in the signal transmitted on the desired channel. Metering device 140 may identify watermark information based on the video/audio output signal transmitted from receiving device 130 to television 120. Thus, the metering device 140 may provide the watermark information and other tuning and/or demographic data to the data collection facility 180 via the network 170.
In FIG. 2, an example watermark embedding system 200 includes an embedding device 210 and a watermark source 220. The embedding device 210 is configured to insert watermark information 230 from the watermark source 220 into a compressed digital data stream 240. The compressed digital data stream 240 may be compressed in accordance with an audio compression standard, such as the AC-3 compression standard and/or the MPEG-AAC compression standard, each of which processes an audio signal in blocks having a predetermined number of digitized samples per block. A source (not shown) of the compressed digital data stream 240 may be sampled at a rate of, for example, 48 kilohertz (kHz) to form audio blocks as described below.
Typically, audio compression techniques, such as those based on the AC-3 compression standard, use overlapping audio blocks and the MDCT algorithm to convert an audio signal into a compressed digital data stream (e.g., the compressed digital data stream 240 of FIG. 2). Two different block sizes (i.e., short and long blocks) may be used depending on the dynamics of the sampled audio signal. For example, AC-3 short blocks may be used to minimize pre-echo for transient segments of the audio signal, while AC-3 long blocks may be used to achieve high compression gain for non-transient segments of the audio signal. In accordance with the AC-3 compression standard, an AC-3 long block corresponds to a block of 512 time-domain audio samples, while an AC-3 short block corresponds to 256 time-domain audio samples. Based on the overlapping structure of the MDCT algorithm used in the AC-3 compression standard, in the case of AC-3 long blocks, a 512-sample audio block is created by concatenating the 256 time-domain samples of a preceding (old) block with the 256 time-domain samples of a current (new) block. The AC-3 long block is then transformed using the MDCT algorithm to generate 256 transform coefficients. An AC-3 short block is similarly obtained from a pair of consecutive time-domain audio blocks in accordance with the same standard and is transformed using the MDCT algorithm to generate 128 transform coefficients. The two sets of 128 transform coefficients corresponding to two adjacent short blocks are then interleaved to generate a set of 256 transform coefficients. Thus, processing either an AC-3 long block or a pair of AC-3 short blocks results in the same number of MDCT coefficients. As another example, under the MPEG-AAC compression standard, a short block contains 128 samples and a long block contains 1024 samples.
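The long-block and short-block coefficient counts described above can be sketched in Python. This is an illustrative, unwindowed MDCT sketch only; the even/odd interleaving order of the two short-block coefficient sets is an assumption for illustration, not taken from the AC-3 specification:

```python
import numpy as np

def mdct(block):
    """Sketch of a forward MDCT: 2N time-domain samples -> N coefficients (unwindowed)."""
    n2 = len(block)
    n = n2 // 2
    ns, ks = np.arange(n2), np.arange(n)
    # Standard MDCT basis with the (n + 1/2 + N/2) phase offset.
    basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
    return basis @ block

rng = np.random.default_rng(0)
long_block = rng.standard_normal(512)                # an AC-3 long block
short_a = rng.standard_normal(256)                   # two adjacent AC-3 short blocks
short_b = rng.standard_normal(256)

long_coeffs = mdct(long_block)                       # 256 coefficients
interleaved = np.empty(256)
interleaved[0::2] = mdct(short_a)                    # 128 coefficients each, interleaved
interleaved[1::2] = mdct(short_b)
```

Either path yields 256 coefficients per 512 input samples, which is why long and short blocks produce the same number of MDCT coefficients per frame slot.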
In the example of FIG. 3, the decompressed digital data stream 300 includes a plurality of 256-sample time-domain audio blocks 310, shown generally as A0, A1, A2, A3, A4, and A5. The MDCT algorithm processes the audio blocks 310 to generate MDCT coefficient sets 320, shown for example as MA0, MA1, MA2, MA3, MA4, and MA5 (where MA5 is not shown). For example, the MDCT algorithm may process the audio blocks A0 and A1 to generate the MDCT coefficient set MA0. The audio block A0 is concatenated with A1 to generate a 512-sample audio block (e.g., an AC-3 long block), which is transformed using the MDCT algorithm to generate the MDCT coefficient set MA0 comprising 256 MDCT coefficients. Similarly, the audio blocks A1 and A2 may be processed to generate the MDCT coefficient set MA1. Thus, the audio block A1 is an overlapping audio block because it is used to generate both of the MDCT coefficient sets MA0 and MA1. In a similar manner, the MDCT algorithm transforms the audio blocks A2 and A3 to generate the MDCT coefficient set MA2, the audio blocks A3 and A4 to generate the MDCT coefficient set MA3, the audio blocks A4 and A5 to generate the MDCT coefficient set MA4, and so on. Thus, the audio block A2 is an overlapping audio block used to generate the MDCT coefficient sets MA1 and MA2, the audio block A3 is an overlapping audio block used to generate the MDCT coefficient sets MA2 and MA3, the audio block A4 is an overlapping audio block used to generate the MDCT coefficient sets MA3 and MA4, and so on. Together, the plurality of MDCT coefficient sets 320 form the compressed digital data stream 240.
As described in detail below, the embedding device 210 of fig. 2 may embed or insert watermark information or watermark 230 from a watermark source 220 into a compressed digital data stream 240. For example, watermark 230 may be used to uniquely identify a broadcaster and/or program such that media consumption information (e.g., viewing information) and/or ratings information may be generated. Thus, the embedding device 210 generates a watermarked compressed digital data stream 250 for transmission.
In the example of FIG. 4, the embedding device 210 includes an identification unit 410, an unpacking unit 420, a modification unit 430, and a repacking unit 440. Although the operation of the embedding device 210 is described below in accordance with the AC-3 compression standard, the embedding device 210 may alternatively be implemented to operate with one or more other compression standards, such as the MPEG-AAC compression standard. The operation of the embedding device 210 is described in greater detail in connection with FIG. 5.
First, the identification unit 410 is configured to identify one or more frames 510 associated with the compressed digital data stream 240, some of which are shown as frame A and frame B in FIG. 5. As previously described, the compressed digital data stream 240 may be a digital data stream compressed in accordance with the AC-3 standard (hereinafter the "AC-3 data stream"). Although the AC-3 data stream 240 may include multiple channels, the following example describes the AC-3 data stream 240 as including only one channel for simplicity. In the AC-3 data stream 240, each frame 510 includes a plurality of MDCT coefficient sets 520. In accordance with the AC-3 compression standard, for example, each frame 510 includes six MDCT coefficient sets (i.e., six "audio blocks"). For example, frame A includes the MDCT coefficient sets MA0, MA1, MA2, MA3, MA4, and MA5, and frame B includes the MDCT coefficient sets MB0, MB1, MB2, MB3, MB4, and MB5.
The identification unit 410 is further configured to identify header information associated with each frame 510, e.g., the number of channels associated with the AC-3 data stream 240. Although the example AC-3 data stream 240 includes only one channel as described above, an example compressed digital data stream having multiple channels is described below in connection with FIGS. 7 and 8.
Referring to FIG. 5, the unpacking unit 420 is configured to unpack the MDCT coefficient sets 520 to determine the compression information, such as the parameters of the original compression process (i.e., the manner in which an audio compression technique compressed the audio signal or audio data to form the compressed digital data stream 240). For example, the unpacking unit 420 may determine how many bits are used to represent each MDCT coefficient within the MDCT coefficient sets 520. In addition, the compression parameters may include information that limits the extent to which the AC-3 data stream 240 may be modified, to ensure that the media content conveyed by the AC-3 data stream 240 remains at a sufficiently high quality level. The embedding device 210 then uses the compression information identified by the unpacking unit 420 to embed/insert the desired watermark information 230 into the AC-3 data stream 240, thereby ensuring that the watermark insertion is performed in a manner consistent with the compression information carried in the signal.
The compression information also includes the mantissas and exponents associated with each MDCT coefficient, as described in detail in the AC-3 compression standard. The AC-3 compression standard employs techniques to reduce the number of bits used to represent each MDCT coefficient, and psychoacoustic masking is one factor exploited by these techniques. For example, acoustic energy E_k present at a particular frequency index k (e.g., a tone) or spread across a frequency band near that frequency (e.g., a noise-like characteristic) produces a masking effect. That is, the human ear cannot perceive a change in energy at frequency index k, or in a spectral region spanning a band near that frequency, if the change is smaller than a given energy threshold ΔE_k. Because of this property of the human ear, the MDCT coefficient m_k associated with frequency index k can be quantized with a step size comparable to ΔE_k without risking any humanly perceptible change to the audio content. In the AC-3 data stream 240, each MDCT coefficient m_k is represented as a mantissa M_k and an exponent X_k such that m_k = M_k · 2^(−X_k). The number of bits used to represent the mantissa M_k of each MDCT coefficient in the MDCT coefficient sets 520 may be determined from a known quantization look-up table published in the AC-3 compression standard (e.g., the quantization look-up table 600 of FIG. 6). In the example of FIG. 6, the quantization look-up table 600 gives the mantissa codes (bit patterns) of MDCT coefficients represented by four bits and the corresponding mantissa values. As described in detail below, the mantissa M_k may be changed (e.g., increased) to represent a modified value of the MDCT coefficient and thereby embed the watermark in the AC-3 data stream 240.
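The mantissa/exponent representation m_k = M_k · 2^(−X_k) can be illustrated with a simplified Python sketch. Note that this is an assumption-laden illustration: real AC-3 exponent coding shares exponents across groups of coefficients according to an exponent strategy, and the normalization range below is chosen only for demonstration:

```python
def to_mantissa_exponent(m_k, max_exp=24):
    """Decompose a coefficient as m_k = M_k * 2**(-X_k), scaling the mantissa
    into the range [0.5, 1) in magnitude (a simplified normalization rule)."""
    x_k = 0
    mag = abs(m_k)
    while 0 < mag < 0.5 and x_k < max_exp:
        mag *= 2
        x_k += 1
    return m_k * 2 ** x_k, x_k

M_k, X_k = to_mantissa_exponent(0.01)   # -> mantissa 0.64, exponent 6
```

Because scaling by a power of two is exact in binary floating point, M_k · 2^(−X_k) recovers the original coefficient exactly; in AC-3 it is only the mantissa that is coarsely quantized, which is precisely what the watermarking scheme below exploits.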
Returning to FIG. 5, the modification unit 430 is configured to perform an inverse transform on each MDCT coefficient set 520 to generate time-domain audio blocks 530, shown for example as TA0', TA3", TA4', TA4", TA5', TA5", TB0', TB0", TB1', TB1", and TB5' (TA0" through TA3' and TB2' through TB4' are not shown). The modification unit 430 performs the inverse transform operation to generate, for each MDCT coefficient set, a previous (old) time-domain audio block (denoted by a prime symbol) and a current (new) time-domain audio block (denoted by a double-prime symbol) associated with the 256-sample time-domain audio blocks that were concatenated to form the MDCT coefficient sets 520 of the AC-3 data stream 240. For example, the modification unit 430 inversely transforms the MDCT coefficient set MA5 to generate the time-domain blocks TA4" and TA5', inversely transforms the MDCT coefficient set MB0 to generate TA5" and TB0', inversely transforms the MDCT coefficient set MB1 to generate TB0" and TB1', and so on. In this manner, the modification unit 430 generates reconstructed time-domain audio blocks 540 that reproduce the original time-domain audio blocks compressed to form the AC-3 data stream 240. To generate the reconstructed time-domain audio blocks 540, the modification unit 430 may add overlapping time-domain audio blocks, for example, in accordance with the well-known Princen-Bradley time-domain aliasing cancellation (TDAC) technique described in Princen et al., "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-34, No. 5, pp. 1153-1161 (1986).
For example, the modification unit 430 may reconstruct the time-domain audio block TA5 (i.e., TA5R) by adding the prime time-domain audio block TA5' and the double-prime time-domain audio block TA5" using the Princen-Bradley TDAC technique. Similarly, the modification unit 430 may reconstruct the time-domain audio block TB0 (i.e., TB0R) by adding the prime time-domain audio block TB0' and the double-prime time-domain audio block TB0" using the Princen-Bradley TDAC technique. In this manner, the original time-domain audio blocks used to form the AC-3 data stream 240 are reconstructed so that the watermark 230 may be embedded or inserted directly into the AC-3 data stream 240.
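The inverse-transform and overlap-add reconstruction described above can be sketched with a textbook windowed MDCT/IMDCT in Python. This is an illustrative sketch, not the patent's implementation: AC-3 defines its own analysis window, whereas the sine window below is simply one window that satisfies the Princen-Bradley condition. Adding the second half of one inverse transform to the first half of the next (the prime and double-prime blocks) recovers the overlapping 256-sample block exactly:

```python
import numpy as np

N = 256                                   # coefficients per transform; blocks span 2N = 512 samples
n_idx = np.arange(2 * N)
k_idx = np.arange(N)
BASIS = np.cos(np.pi / N * (n_idx[None, :] + 0.5 + N / 2) * (k_idx[:, None] + 0.5))
WIN = np.sin(np.pi / (2 * N) * (n_idx + 0.5))    # sine window: w[n]^2 + w[n+N]^2 = 1

def mdct(x):
    """512 windowed time-domain samples -> 256 MDCT coefficients."""
    return BASIS @ (WIN * x)

def imdct(X):
    """256 coefficients -> 512 windowed, time-aliased time-domain samples."""
    return WIN * (2.0 / N) * (BASIS.T @ X)

rng = np.random.default_rng(1)
a0, a1, a2 = rng.standard_normal(N), rng.standard_normal(N), rng.standard_normal(N)

# Analysis: overlapping 512-sample blocks, as in the AC-3 long-block structure.
ma0 = mdct(np.concatenate([a0, a1]))
ma1 = mdct(np.concatenate([a1, a2]))

# Synthesis: second half of IMDCT(MA0) plus first half of IMDCT(MA1)
# (the "prime" and "double-prime" halves) cancels the time-domain aliasing.
a1_rec = imdct(ma0)[N:] + imdct(ma1)[:N]
```

The reconstructed overlapping block (playing the role of, e.g., TA5R in the text) is then available in the time domain for watermark insertion without ever leaving the compressed-domain workflow for the rest of the stream.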
The modification unit 430 is further configured to insert the watermark 230 into the reconstructed time-domain audio blocks 540 to generate watermarked time-domain audio blocks 550, shown for example as TA0W, TA4W, TA5W, TB0W, TB1W, and TB5W (the blocks TA1W, TA2W, TA3W, TB2W, TB3W, and TB4W are not shown). To insert the watermark 230, the modification unit 430 generates a modifiable time-domain audio block by concatenating two adjacent reconstructed time-domain audio blocks to create a 512-sample audio block. For example, the modification unit 430 may concatenate the reconstructed time-domain audio blocks TA5R and TB0R (each a 256-sample audio block) to form a 512-sample audio block. The modification unit 430 may then insert the watermark 230 into the 512-sample audio block formed from the reconstructed time-domain audio blocks TA5R and TB0R to generate the watermarked time-domain audio blocks TA5W and TB0W. The watermark 230 may be inserted into the reconstructed time-domain audio blocks 540 using an encoding process such as those described in U.S. Pat. Nos. 6,272,176; 6,504,870; and 6,621,881, the entire disclosures of which are hereby incorporated by reference herein.
In the example encoding methods and apparatus described in U.S. Pat. Nos. 6,272,176; 6,504,870; and 6,621,881, a watermark may be inserted into a 512-sample audio block, with each 512-sample audio block carrying one bit of the embedded or inserted data of the watermark 230. In particular, the spectral power at the frequencies associated with indices f1 and f2 may be modified to insert a data bit of the watermark 230. For example, to insert a binary "1", the power at the first spectral frequency associated with index f1 is enhanced or increased so that it becomes the spectral power maximum within its frequency neighborhood (e.g., the neighborhood defined by the indices f1−2, f1−1, f1, f1+1, and f1+2), while the power at the second spectral frequency associated with index f2 is attenuated or decreased so that it becomes the spectral power minimum within its frequency neighborhood (e.g., the neighborhood defined by the indices f2−2, f2−1, f2, f2+1, and f2+2). Conversely, to insert a binary "0", the power at the first spectral frequency associated with index f1 is attenuated to make it a local spectral power minimum, while the power at the second spectral frequency associated with index f2 is enhanced to make it a local spectral power maximum.
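The neighborhood-maximum/minimum encoding rule above can be sketched as follows. This is an illustrative Python sketch operating on a magnitude spectrum: the function names, the use of the ±2 neighborhood quoted in the text, and the 1.1/0.9 scale factors are assumptions for illustration, not values taken from the cited patents:

```python
import numpy as np

def embed_bit(mags, bit, f1, f2):
    """Embed one bit: for bit=1, make index f1 the maximum and f2 the minimum
    of their respective +/-2 neighborhoods; for bit=0, the roles are swapped."""
    m = mags.copy()
    hi, lo = (f1, f2) if bit == 1 else (f2, f1)
    m[hi] = 1.1 * m[hi - 2 : hi + 3].max()   # boost just above the neighborhood maximum
    m[lo] = 0.9 * m[lo - 2 : lo + 3].min()   # attenuate just below the neighborhood minimum
    return m

def read_bit(mags, f1, f2):
    """Decode: the bit is 1 when f1 holds the maximum of its neighborhood."""
    return 1 if mags[f1] == mags[f1 - 2 : f1 + 3].max() else 0

rng = np.random.default_rng(2)
spectrum = rng.uniform(0.5, 1.0, 256)        # stand-in magnitude spectrum of a 512-sample block
marked = embed_bit(spectrum, 1, f1=10, f2=40)
```

A decoder at the receiving site would apply the same neighborhood comparison at f1 and f2 to recover each embedded bit.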
Returning to FIG. 5, the modification unit 430 generates watermarked MDCT coefficient sets 560 from the watermarked time-domain audio blocks 550, shown for example as MA0W, MA4W, MA5W, MB0W, and MB5W (the sets MA1W, MA2W, MA3W, MB1W, MB2W, MB3W, and MB4W are not shown). Following the above example, the modification unit 430 generates the watermarked MDCT coefficient set MA5W from the watermarked time-domain audio blocks TA5W and TB0W. In particular, the modification unit 430 concatenates the watermarked time-domain audio block TA5W with TB0W to form a 512-sample audio block and converts the 512-sample audio block into the watermarked MDCT coefficient set MA5W which, as described in greater detail below, may be used to modify the original MDCT coefficient set MA5.
The differences between the MDCT coefficient sets 520 and the watermarked MDCT coefficient sets 560 represent the changes to the AC-3 data stream 240 that result from embedding or inserting the watermark 230. As described in connection with FIG. 6, for example, the modification unit 430 may modify the mantissa values in the MDCT coefficient set MA5 according to the differences between the coefficients in the corresponding watermarked MDCT coefficient set MA5W and the coefficients in the original MDCT coefficient set MA5. A quantization look-up table (such as the look-up table 600 of FIG. 6) may be used to determine the new mantissa values associated with the MDCT coefficients of the watermarked MDCT coefficient sets 560 that replace the old mantissa values associated with the MDCT coefficients of the MDCT coefficient sets 520. Thus, the new mantissa values represent the changes or augmentation to the AC-3 data stream 240 that result from embedding or inserting the watermark 230. Notably, in this example implementation, the exponents of the MDCT coefficients are unchanged. Changing the exponents would require recalculating the underlying compressed signal representation, i.e., a true decompression/compression cycle on the compressed signal. If modifying only the mantissas is insufficient to fully reflect the difference between the watermarked and original MDCT coefficients, the affected MDCT mantissas are set to the maximum or minimum value, as appropriate. In the presence of such encoding constraints, the redundancy built into the watermarking process still allows the correct watermark to be decoded.
Returning to FIG. 6, the example quantization look-up table 600 includes 15 quantization levels of mantissa codes and corresponding mantissa values for an example mantissa M_k in the range of −0.9333 to +0.9333. Although the example quantization look-up table 600 gives the mantissa information associated with MDCT coefficients represented using 4 bits, the AC-3 compression standard provides quantization look-up tables associated with other suitable numbers of bits per MDCT coefficient. To illustrate one manner in which the modification unit 430 may modify the mantissa M_k of a particular MDCT coefficient m_k contained in the MDCT coefficient set MA5, assume an original mantissa value of −0.2666 (i.e., −4/15). Using the quantization look-up table 600, the mantissa code corresponding to the particular MDCT coefficient m_k in the MDCT coefficient set MA5 is determined to be 0101. The watermarked MDCT coefficient set MA5W includes watermarked MDCT coefficients wm_k having mantissa values WM_k. Further, assume that the corresponding watermarked MDCT coefficient wm_k in the watermarked MDCT coefficient set MA5W has a mantissa value of −0.4300, which falls between the values associated with the mantissa codes 0011 and 0100. In other words, in this example, the watermark 230 results in a difference of −0.1634 between the original mantissa value of −0.2666 and the watermarked mantissa value of −0.4300.
To embed or insert the watermark 230 into the AC-3 data stream 240, the modification unit 430 may use the watermarked MDCT coefficient set MA5W to modify or augment the MDCT coefficients in the MDCT coefficient set MA5. Following the example above, because the watermarked mantissa WMk associated with the corresponding watermarked MDCT coefficient wmk lies between the mantissa codes 0011 and 0100 (since the mantissa value corresponding to the watermarked MDCT coefficient wmk is -0.4300), either the mantissa code 0011 or the mantissa code 0100 may replace the mantissa code 0101 associated with the MDCT coefficient mk. The mantissa value corresponding to mantissa code 0011 is -0.5333 (i.e., -8/15), and the mantissa value corresponding to mantissa code 0100 is -0.4 (i.e., -6/15). In this example, because the mantissa value -0.4 corresponding to the mantissa code 0100 is closest to the desired watermarked mantissa value -0.4300, the modification unit 430 selects the mantissa code 0100 rather than the mantissa code 0011 to replace the mantissa code 0101 associated with the MDCT coefficient mk. As a result, the new mantissa bit pattern 0100, which corresponds to the watermarked mantissa WMk of the watermarked MDCT coefficient wmk, replaces the original mantissa bit pattern 0101. Similarly, each MDCT coefficient in the MDCT coefficient set MA5 may be modified in the manner described above. If a watermarked mantissa value is outside the mantissa value quantization range (i.e., greater than +0.9333 or less than -0.9333), the positive limit 1110 or the negative limit 0000, as appropriate, is selected as the new mantissa code. Furthermore, although the mantissa codes associated with the respective MDCT coefficients of the MDCT coefficient sets may be modified as described above, the exponents associated with the MDCT coefficients remain unchanged.
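The nearest-code selection just described can be checked numerically. The snippet below is an illustrative Python sketch, not the literal AC-3 tables: it assumes a symmetric 15-level quantizer with values -14/15 to +14/15 in steps of 2/15, matching the example values quoted above.

```python
# Illustrative 15-level symmetric quantizer matching the example values above
# (-14/15 ... +14/15 in steps of 2/15); not the literal AC-3 tables.

def mantissa_levels(n_levels=15):
    """Quantized mantissa values for codes 0..n_levels-1 (0000b..1110b)."""
    return [(2 * i - (n_levels - 1)) / n_levels for i in range(n_levels)]

def nearest_code(value, levels):
    """Code whose level is closest to `value`, clamped to the table limits
    (code 0 for values below the range, the top code for values above it)."""
    value = max(min(value, levels[-1]), levels[0])
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))

levels = mantissa_levels()
assert abs(levels[5] - (-4 / 15)) < 1e-12      # original code 0101b -> -4/15
new_code = nearest_code(-0.4300, levels)       # watermarked target mantissa
assert new_code == 4                           # 0100b, i.e. -6/15 = -0.4
```

The clamping branch in `nearest_code` corresponds to the positive/negative limit selection (codes 1110 and 0000) described above for watermarked mantissa values outside the quantization range.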
The repacking unit 440 is configured to repack the watermarked MDCT coefficient sets 560 associated with the frames of the AC-3 data stream 240 to be transmitted. In particular, the repacking unit 440 identifies the location of each MDCT coefficient set within a frame of the AC-3 data stream 240 so that the corresponding watermarked MDCT coefficient set may be used to modify the MDCT coefficient set. For example, to reconstruct the watermarked frame a, the repacking unit 440 may identify the locations of the MDCT coefficient sets MA0 through MA5 and modify the MDCT coefficient sets MA0 through MA5 according to the corresponding watermarked MDCT coefficient sets MA0W through MA5W at the corresponding identified locations. With the unpacking, modifying, and repacking processes described herein, the AC-3 data stream 240 remains a compressed digital data stream while the watermark 230 is embedded or inserted into the AC-3 data stream 240. As a result, the embedding device 210 inserts the watermark 230 into the AC-3 data stream 240 without additional decompression/compression cycles that may degrade the quality of the media content in the AC-3 data stream 240.
For simplicity, an AC-3 data stream 240 comprising a single channel is described in connection with FIG. 5. However, as described below, the methods and apparatus disclosed herein may be applied to a compressed digital data stream having audio blocks associated with a plurality of channels, such as 5.1 channels (i.e., five full-bandwidth channels plus a low-frequency effects channel). In the example of fig. 7, the decompressed digital data stream 700 may include a plurality of audio block sets 710. Each audio block set 710 may include audio blocks associated with multiple channels 720 and 730, the channels 720 and 730 including, for example, a front left channel, a front right channel, a center channel, a surround left channel, a surround right channel, and a Low Frequency Effects (LFE) channel (e.g., a subwoofer channel). For example, the audio block set AUD0 includes an audio block A0L associated with the front left channel, an audio block A0R associated with the front right channel, an audio block A0C associated with the center channel, an audio block A0SL associated with the surround left channel, an audio block A0SR associated with the surround right channel, and an audio block A0LFE associated with the LFE channel. Similarly, the audio block set AUD1 includes an audio block A1L associated with the front left channel, an audio block A1R associated with the front right channel, an audio block A1C associated with the center channel, an audio block A1SL associated with the surround left channel, an audio block A1SR associated with the surround right channel, and an audio block A1LFE associated with the LFE channel.
Each audio block associated with a particular channel in the set of audio blocks 710 may be processed in a manner similar to that described above in connection with fig. 5 and 6. For example, a plurality of audio blocks associated with the center channel 810 of fig. 8 (e.g., as shown by A0C, A1C, A2C, and A3C) may be transformed to generate MDCT coefficient sets 820 associated with the compressed digital data stream 800. As indicated above, each MDCT coefficient set 820 may be derived from a 512-sample audio block formed by concatenating a previous (old) 256-sample audio block with a current (new) 256-sample audio block. The MDCT algorithm may then process the time-domain audio blocks 810 (e.g., A0C through A5C) to generate MDCT coefficient sets (e.g., M0C through M5C).
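The construction of one MDCT coefficient set from two concatenated 256-sample blocks can be sketched as follows. This is a direct O(N^2) statement of the usual MDCT definition with a sine window; the actual AC-3 transform and window differ in detail, so treat the `mdct` and `sine_window` helpers as illustrative assumptions rather than the standard's definition.

```python
import math

def sine_window(n2):
    """Princen-Bradley-compatible sine window of length n2 (illustrative)."""
    return [math.sin(math.pi / n2 * (m + 0.5)) for m in range(n2)]

def mdct(block, window):
    """MDCT of a 512-sample block (two concatenated 256-sample audio blocks),
    yielding 256 coefficients; a direct O(N^2) form of the usual definition."""
    n2 = len(block)
    n = n2 // 2
    x = [b * w for b, w in zip(block, window)]
    return [sum(x[m] * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                for m in range(n2))
            for k in range(n)]

# Previous (old) 256-sample block, e.g. A0C, and current (new) block, e.g. A1C.
old = [0.0] * 256
new = [math.sin(2 * math.pi * 10 * m / 512) for m in range(256)]
coeffs = mdct(old + new, sine_window(512))   # one MDCT coefficient set
assert len(coeffs) == 256
```

Because the test tone completes 10 cycles per 512 samples, its energy lands near MDCT bins 9 and 10, illustrating how a time-domain component maps to a small cluster of 512-sample frequency indices.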
From the MDCT coefficient sets 820 of the compressed digital data stream 800, the identification unit 410 identifies a plurality of frames (not shown) and header information associated with each frame as described above. The header information includes compression information associated with the compressed digital data stream 800. For each frame, the unpacking unit 420 unpacks the MDCT coefficient sets 820 to determine the compression information associated with the MDCT coefficient sets 820. For example, the unpacking unit 420 may identify the number of bits used by the original compression process to represent the mantissa of each MDCT coefficient in each MDCT coefficient set 820. Such compression information may be used to embed the watermark 230 as described above in connection with fig. 6. The modification unit 430 then generates inverse transformed time-domain audio blocks 830, e.g., as shown by TA0C", TA1C', TA1C", TA2C', TA2C", and TA3C'. The time-domain audio blocks 830 include a previous (old) set of time-domain audio blocks (represented as prime blocks) and a current (new) set of time-domain audio blocks (represented as double-prime blocks). The original time-domain audio blocks (i.e., the reconstructed time-domain audio blocks 840) that were compressed to form the AC-3 digital data stream 800 may be reconstructed by adding the corresponding prime and double-prime blocks, for example, according to the Princen-Bradley TDAC technique. For example, the modification unit 430 may add the time-domain audio blocks TA1C' and TA1C" to reconstruct the time-domain audio block TA1C (i.e., TA1CR). Similarly, the modification unit 430 may add the time-domain audio blocks TA2C' and TA2C" to reconstruct the time-domain audio block TA2C (i.e., TA2CR).
To insert the watermark 230 from the watermark source 220, the modification unit 430 concatenates two adjacent reconstructed time-domain audio blocks to create a 512-sample audio block (i.e., a modifiable time-domain audio block). For example, the modification unit 430 may concatenate the reconstructed time-domain audio blocks TA1CR and TA2CR (each being a block of 256 samples) to form a 512-sample audio block. The modification unit 430 then inserts the watermark 230 into the 512-sample audio block formed from the reconstructed time-domain audio blocks TA1CR and TA2CR to generate the watermarked time-domain audio blocks TA1CW and TA2CW.
From the watermarked time-domain audio blocks 850, the modification unit 430 may generate watermarked MDCT coefficient sets 860. For example, the modification unit 430 may concatenate the watermarked time-domain audio blocks TA1CW with TA2CW to generate the watermarked MDCT coefficient set M1 CW. The modification unit 430 modifies the MDCT coefficient sets 820 based on a corresponding one of the watermarked MDCT coefficient sets 860. For example, the modification unit 430 may use the watermarked MDCT coefficient set M1CW to modify the original MDCT coefficient set M1C. The modification unit 430 may then repeat the above process for the audio blocks associated with each channel to insert the watermark 230 into the compressed digital data stream 800.
Fig. 9 is a flow diagram illustrating one manner in which the example watermark embedding system of fig. 2 may be configured to embed or insert a watermark into a compressed digital data stream. The example process of fig. 9 may be implemented as machine-accessible instructions utilizing any of many different programming codes stored on any combination of machine-accessible media such as a volatile or non-volatile memory or other mass storage device (e.g., a floppy disk, a CD, or a DVD). For example, the machine-accessible instructions may be embodied in a machine-accessible medium such as a programmable gate array, an Application Specific Integrated Circuit (ASIC), an erasable programmable read-only memory (EPROM), a read-only memory (ROM), a Random Access Memory (RAM), magnetic media, optical media, and/or any other suitable type of media. Further, while FIG. 9 illustrates a particular order of actions, these actions may be performed in other temporal sequences. Moreover, the flow chart 900, presented and described in connection with figs. 2 through 5, is merely one example of a way to construct a system to embed a watermark in a compressed digital data stream.
In the example of fig. 9, the process begins with the identification unit 410 (fig. 4) identifying a frame, such as frame A (fig. 5), associated with the compressed digital data stream 240 (fig. 2) (block 910). The identified frame may include a plurality of MDCT coefficient sets formed by overlapping and concatenating a plurality of audio blocks. For example, according to the AC-3 compression standard, a frame may include 6 MDCT coefficient sets (i.e., 6 "audblks"). In addition, the identifying unit 410 (fig. 4) also identifies header information associated with the frame (block 920). For example, the identification unit 410 may identify the number of channels associated with the compressed digital data stream 240.
The unpacking unit 420 then unpacks the plurality of MDCT coefficient sets to determine compression information associated with the original compression process used to generate the compressed digital data stream 240 (block 930). Specifically, the unpacking unit 420 identifies the mantissa Mk and the exponent Xk of each MDCT coefficient mk in each MDCT coefficient set. The exponents of the MDCT coefficients may then be grouped in a manner that is compatible with the AC-3 compression standard. The unpacking unit 420 (fig. 4) also determines the number of bits used to represent the mantissa of each MDCT coefficient so that the plurality of MDCT coefficient sets may be modified or augmented using a suitable quantization look-up table specified by the AC-3 compression standard, as described above in connection with fig. 6. Control then proceeds to block 940, which is described in more detail below in conjunction with FIG. 10.
As shown in fig. 10, the modification process 940 begins by performing an inverse transform on the MDCT coefficient sets with the modification unit 430 (fig. 4) to generate inverse transformed time-domain audio blocks (block 1010). In particular, the modification unit 430 generates a previous (old) time-domain audio block (e.g., represented as a prime block in fig. 5) and a current (new) time-domain audio block (represented as a double-prime block in fig. 5) associated with each of the 256-sample original time-domain audio blocks used to generate the corresponding MDCT coefficient set. As described in connection with fig. 5, for example, the modification unit 430 may generate TA4" and TA5' from the MDCT coefficient set MA5, TA5" and TB0' from the MDCT coefficient set MB0, and TB0" and TB1' from the MDCT coefficient set MB1. For each time-domain audio block, the modification unit 430 adds the corresponding prime and double-prime blocks to reconstruct the time-domain audio block, e.g., according to the Princen-Bradley TDAC technique (block 1020). Following the above example, the prime block TA5' and the double-prime block TA5" may be added to reconstruct the time-domain audio block TA5 (i.e., the reconstructed time-domain audio block TA5R), while the prime block TB0' and the double-prime block TB0" may be added to reconstruct the time-domain audio block TB0 (i.e., the reconstructed time-domain audio block TB0R).
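The prime/double-prime overlap-add reconstruction of blocks 1010 and 1020 can be sketched end to end. The `mdct`, `imdct`, and `reconstruct` helpers below form a generic Princen-Bradley TDAC pair with a sine window, not the exact AC-3 transform, and the block length is shortened from AC-3's 256 samples so the O(N^2) transforms stay quick.

```python
import math
import random

def sine_window(n2):
    return [math.sin(math.pi / n2 * (m + 0.5)) for m in range(n2)]

def mdct(block, window):
    n2 = len(block)
    n = n2 // 2
    x = [b * w for b, w in zip(block, window)]
    return [sum(x[m] * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                for m in range(n2)) for k in range(n)]

def imdct(coeffs):
    n = len(coeffs)
    return [2.0 / n * sum(c * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                          for k, c in enumerate(coeffs)) for m in range(2 * n)]

def reconstruct(prev_coeffs, cur_coeffs, window):
    """Add the windowed second half of IMDCT(prev) (the prime block, e.g. TA5')
    to the windowed first half of IMDCT(cur) (the double-prime block, TA5")."""
    n = len(prev_coeffs)
    a = [y * w for y, w in zip(imdct(prev_coeffs), window)]
    b = [y * w for y, w in zip(imdct(cur_coeffs), window)]
    return [a[n + m] + b[m] for m in range(n)]

N = 64   # 256 in AC-3; shortened here so the O(N^2) transforms stay quick
rng = random.Random(0)
b0, b1, b2 = ([rng.uniform(-1, 1) for _ in range(N)] for _ in range(3))
w = sine_window(2 * N)
ma5 = mdct(b0 + b1, w)       # plays the role of MA5 in the text
mb0 = mdct(b1 + b2, w)       # plays the role of MB0
b1r = reconstruct(ma5, mb0, w)   # the reconstructed block, like TA5R
assert max(abs(x - y) for x, y in zip(b1r, b1)) < 1e-9
```

The final assertion is the TDAC perfect-reconstruction property: adding the two windowed halves cancels the time-domain aliasing, which is why the embedding device can recover modifiable time-domain audio blocks without a full decompression cycle.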
To insert the watermark 230, the modification unit 430 generates a modifiable time-domain audio block using the reconstructed time-domain audio block (block 1030). The modification unit 430 generates a modifiable 512-sample time-domain audio block using two adjacent reconstructed time-domain audio blocks. For example, the modification unit 430 may generate the modifiable time-domain audio blocks by concatenating the reconstructed time-domain audio blocks TA5R of fig. 5 with TB 0R.
The modification unit 430 inserts the watermark 230 from the watermark source 220 into the modifiable time-domain audio blocks by implementing an encoding process, such as one or more of the encoding methods and apparatus described in U.S. Pat. Nos. 6,272,176, 6,504,870, and/or 6,621,881 (block 1040). For example, the modification unit 430 may insert the watermark 230 into 512-sample time-domain audio blocks generated using the reconstructed time-domain audio blocks TA5R and TB0R to generate watermarked time-domain audio blocks TA5W and TB 0W. Based on the watermarked time-domain audio blocks and the compression information, the modification unit 430 generates watermarked MDCT coefficient sets (block 1050). As indicated above, two watermarked time-domain audio blocks (where each block includes 256 samples) may be used to generate the watermarked MDCT coefficient sets. For example, the watermarked time-domain audio block TA5W may be concatenated with TB0W and then used to generate the watermarked MDCT coefficient set MA 5W.
As described above in connection with fig. 6, the modification unit 430 calculates mantissa values associated with each watermarked MDCT coefficient in the watermarked MDCT coefficient set MA5W based on compression information associated with the compressed digital data stream 240. In this manner, the modification unit 430 can use the watermarked MDCT coefficient sets to modify or augment the original MDCT coefficient sets to embed or insert the watermark 230 into the compressed digital data stream 240 (block 1060). Following the above example, the modification unit 430 may replace the original MDCT coefficient set MA5 with the watermarked MDCT coefficient set MA5W of fig. 5. For example, the modification unit 430 may replace the original MDCT coefficients in the MDCT coefficient set MA5 with corresponding watermarked MDCT coefficients (which have augmented mantissa values) from the watermarked MDCT coefficient set MA5W. Alternatively, the modification unit 430 may calculate the difference between the mantissa codes associated with the original MDCT coefficients and the corresponding watermarked MDCT coefficients (i.e., ΔMk = Mk - WMk) and modify the original MDCT coefficients according to the difference ΔMk. In either case, after the original MDCT coefficient sets have been modified, the modification process 940 ends and control returns to block 950.
Returning to fig. 9, the repacking unit 440 repacks the frames of the compressed digital data stream (block 950). The repacking unit 440 identifies the locations of the MDCT coefficient sets within the frame so that the modified MDCT coefficient sets can replace the original MDCT coefficient sets at those locations to reconstruct the frame. At block 960, if the embedding device 210 determines that additional frames of the compressed digital data stream 240 need to be processed, control returns to block 910. If, on the other hand, all frames of the compressed digital data stream 240 have been processed, the process 900 ends.
As noted above, known watermarking techniques typically decompress a compressed digital data stream into decompressed time-domain samples, insert a watermark into the time-domain samples, and recompress the watermarked time-domain samples into a watermarked compressed digital data stream. In contrast, the digital data stream 240 remains compressed during the example unpacking, modifying, and repacking processes described herein. As a result, the watermark 230 is embedded in the compressed digital data stream 240 without additional decompression/compression cycles that may degrade the quality of the media content in the compressed digital data stream 240.
To further illustrate the example modification processes of fig. 9 and 10, fig. 11 shows one manner in which a data frame (e.g., an AC-3 frame) may be processed. Example frame processing 1100 starts with the embedding device 210 reading header information of the obtained frame (e.g., an AC-3 frame) (block 1110) and initializing an MDCT coefficient set count to 0 (block 1120). In the case of AC-3 frames being processed, each AC-3 frame includes 6 MDCT coefficient sets (e.g., MA0, MA1, MA2, MA3, MA4, and MA5 of fig. 5, also referred to as "audblk" in the AC-3 standard) having compressed-domain data. Accordingly, the embedding device 210 determines whether the MDCT coefficient set count is equal to 6 (block 1130). If the MDCT coefficient set count is not yet equal to 6, indicating that at least one more MDCT coefficient set needs to be processed, the embedding device 210 extracts the exponents (block 1140) and mantissas (block 1150) associated with the MDCT coefficients of the frame (i.e., the original mantissas Mk described above in connection with FIG. 6). The embedding device 210 computes new mantissas (e.g., WMk, described above in connection with FIG. 6) associated with the code symbol read at block 1220 (block 1160) and modifies the original mantissas associated with the frame based on the new mantissas (block 1170). For example, the original mantissas may be modified based on the difference between the new mantissas and the original mantissas (but limited to the range associated with the bit representation of the original mantissas). The embedding device 210 increments the MDCT coefficient set count by 1 (block 1180) and control returns to block 1130. Although the example process of fig. 11 above is described as including 6 MDCT coefficient sets (e.g., the threshold for the MDCT coefficient set count is 6), a process utilizing more or fewer MDCT coefficient sets may also be used.
At block 1130, if the MDCT coefficient set count is equal to 6, then all of the MDCT coefficient sets have been processed so that the watermark has been embedded and the embedding device 210 repacks the frame (block 1190).
As indicated above, many methods are known for embedding watermarks that are imperceptible to the human ear (e.g., inaudible codes) in a decompressed audio signal. For example, one known method is described in U.S. Pat. No. 6,421,445 to Jensen et al., the entire disclosure of which is incorporated herein by reference. In particular, as described in Jensen et al., a code signal (e.g., a watermark) may include information combined at 10 different frequencies that may be detected by a decoder using Fourier spectral analysis of a sequence of audio samples (e.g., a sequence of 12,288 audio samples as described in detail below). For example, the audio signal may be sampled at a rate of 48 kilohertz (kHz) to output an audio sequence of 12,288 audio samples that may be processed (e.g., using a Fourier transform) to obtain a relatively high-resolution (e.g., 3.9 Hz) frequency-domain representation of the decompressed audio signal. However, a sinusoidal code signal having a constant amplitude over the entire sequence of audio samples is not acceptable under the encoding process of the method disclosed by Jensen et al., because such a sinusoidal code signal would be perceptible to the human ear. To meet the masking energy constraint (i.e., to ensure that the sinusoidal code signal information remains imperceptible), the sinusoidal code signal is synthesized over the entire sequence of 12,288 audio samples using a masking energy analysis that determines the local sinusoidal amplitude within each block of audio samples (e.g., where each block of audio samples may include 512 audio samples). Thus, according to the masking energy analysis, the local sinusoidal waveform may be (phase) coherent over the sequence of 12,288 audio samples, but with varying amplitude.
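The decoder-side arithmetic mentioned above is easy to check numerically: 12,288 samples at 48 kHz give a bin spacing of 48000/12288 = 3.90625 Hz (the "3.9 Hz" resolution), and a tone at a transform bin concentrates its energy there. The direct single-bin DFT below is an illustrative sketch, not Jensen et al.'s decoder.

```python
import cmath
import math

FS = 48_000                      # sampling rate in Hz
N = 12_288                       # samples per analysis sequence
RESOLUTION = FS / N              # 3.90625 Hz per transform bin

def dft_bin(samples, k):
    """Magnitude of a single DFT bin k, evaluated directly (no FFT library)."""
    n = len(samples)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                   for m, s in enumerate(samples)))

# A unit-amplitude tone at transform bin 240 concentrates in that bin; a
# decoder inspecting the ten code bins can therefore recover the symbol.
tone = [math.sin(2 * math.pi * 240 * m / N) for m in range(N)]
assert dft_bin(tone, 240) > 100 * (dft_bin(tone, 250) + 1e-9)
```

Evaluating only the code bins rather than a full FFT is merely a convenience for the sketch; a real decoder would typically use an FFT of the whole 12,288-sample sequence.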
However, in contrast to the method disclosed by Jensen et al., the methods and apparatus described herein may be used to embed a watermark or other code signal into a compressed audio signal in such a way that the compressed digital data stream containing the compressed audio signal remains compressed during the unpacking, modifying, and repacking processes. Fig. 12 shows one way in which a watermark, such as the watermark disclosed by Jensen et al., may be inserted into a compressed audio signal. The example process 1200 begins by initializing a frame count to 0 (block 1210). 8 frames (e.g., AC-3 frames) representing a total of 12,288 audio samples for each audio channel may be processed to embed one or more code symbols (e.g., one or more of the symbols "0", "1", "S", and "E" as shown in fig. 13 and described by Jensen et al.) in the audio signal. Although the compressed digital data stream described herein includes 12,288 audio samples, the compressed digital data stream may have more or fewer audio samples. The embedding device 210 (fig. 2) may read the watermark 230 from the watermark source 220 to insert one or more code symbols into the sequence of frames (block 1220). The embedding device 210 may obtain one of the frames (block 1230) and proceed to the frame processing operation 1100 described above to process the obtained frame. When the example frame processing operation 1100 ends, control returns to block 1250 to increment the frame count by 1. The embedding device 210 then determines whether the frame count is 8 (block 1260). If the frame count is not 8, the embedding device 210 obtains another frame in the sequence and repeats the example frame processing operation 1100 as described above in connection with FIG. 11. If, on the other hand, the frame count is 8, the embedding device 210 returns to block 1210 to reinitialize the frame count to 0 and repeats the process 1200 to process another sequence of frames.
As noted above, a code signal (e.g., the watermark 230) may be embedded or inserted into a compressed digital data stream (e.g., an AC-3 data stream). As shown in the example table 1300 of FIG. 13 and described by Jensen et al., the code signal may combine 10 sinusoidal components corresponding to frequency indices f1 to f10 to represent one of the 4 code symbols "0", "1", "S", and "E". For example, code symbol "0" may represent a binary value of 0 and code symbol "1" may represent a binary value of 1. Further, code symbol "S" may represent the beginning of a message and code symbol "E" may represent the end of a message. Although only 4 code symbols are shown in fig. 13, more or fewer code symbols may be used. Table 1300 lists the transform bins corresponding to the center frequencies at which the 10 sinusoidal components of each symbol are approximately located. For example, the 512-sample center frequency indices (e.g., 10, 12, 14, 16, 18, 20, 22, 24, 26, and 28) are associated with a low-resolution frequency-domain representation of the compressed digital data stream, and the 12,288-sample center frequency indices (e.g., 240, 288, 336, 384, 432, 480, 528, 576, 624, and 672) are associated with a high-resolution frequency-domain representation of the compressed digital data stream.
As noted above, each code symbol may be formed from the 10 sinusoidal components associated with the frequency indices f1 to f10 shown in table 1300. For example, the code signal for inserting or embedding the code symbol "0" includes 10 sinusoidal components corresponding to the frequency indices 237, 289, 339, 383, 429, 481, 531, 575, 621, and 673, respectively. Similarly, the code signal for inserting or embedding the code symbol "1" includes 10 sinusoidal components corresponding to the frequency indices 239, 291, 337, 381, 431, 483, 529, 573, 623, and 675, respectively. As shown in the example table 1300, each of the frequency indices f1 to f10 has a unique frequency value at or near each of the 12,288-sample center frequency indices.
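The two symbol rows spelled out above can be tabulated and sanity-checked against the 12,288-sample center frequency indices (240, 288, ..., 672, in steps of 48) and the 512-sample centers (10, 12, ..., 28). Only the "0" and "1" rows are quoted in the text, so only those appear in this sketch.

```python
# Center frequency indices from table 1300: 12,288-sample centers are
# 240, 288, ..., 672 (steps of 48); 512-sample centers are 10, 12, ..., 28.
CENTERS_12288 = [240 + 48 * i for i in range(10)]
CENTERS_512 = [10 + 2 * i for i in range(10)]

# Per-symbol code frequency indices quoted in the text (the "S" and "E" rows
# of table 1300 are not reproduced in the text, so they are omitted here).
CODE_FREQS = {
    "0": [237, 289, 339, 383, 429, 481, 531, 575, 621, 673],
    "1": [239, 291, 337, 381, 431, 483, 529, 573, 623, 675],
}

# Each code frequency lies within a few transform bins of its 12,288-sample
# center, so its energy lands near the corresponding 512-sample center bin.
for freqs in CODE_FREQS.values():
    assert all(abs(f - c) <= 3 for f, c in zip(freqs, CENTERS_12288))

# 12288 / 512 = 24, so each 512-sample center maps to 24x the bin index.
assert all(c12 == 24 * c5 for c12, c5 in zip(CENTERS_12288, CENTERS_512))
```

The 24:1 ratio between the two index scales is what lets the low-resolution 512-sample MDCT bins carry components defined on the high-resolution 12,288-sample grid.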
Using the methods and apparatus described herein, each of the 10 sinusoidal components associated with the frequency indices f1 to f10 may be synthesized in the time domain. For example, the code signal for inserting or embedding the code symbol "0" may comprise the sinusoids c1(k), c2(k), c3(k), c4(k), c5(k), c6(k), c7(k), c8(k), c9(k), and c10(k). The first sinusoid c1(k) may be synthesized in the time domain as the sample sequence c1(k) = sin(2πf1k/12288), for k = 0 to 12287. However, the sinusoid c1(k) generated in this way will have a constant amplitude over the entire 12,288-sample window. Conversely, to generate a sinusoid whose amplitude may vary from audio block to audio block, the sample values of the first sinusoid c1(k) in an associated 512-sample audio block (e.g., a long AC-3 block) may be calculated as c1p(m) = w(m)·c1(256p + m), for m = 0 to 511 and p = 0 to 46, where w(m) is the window function used in the AC-3 compression described above. Those skilled in the art will appreciate that the previous formula can be used directly to calculate c1p(m), or that c1(k) may be calculated in advance and appropriate segments extracted to generate c1p(m). In either case, the MDCT transforms of the c1p(m) each include a set of MDCT coefficient values (e.g., 256 real numbers). Following the previous example, for the c1p(m) corresponding to the symbol "0", the MDCT coefficient values associated with the 512-sample frequency indices 9, 10, and 11 may be of large magnitude because c1p(m) is associated with the 12,288-sample center frequency index 240 (which corresponds to a 512-sample center frequency index of 10). For c1p(m), the MDCT coefficient values associated with the other 512-sample frequency indices will be negligible relative to the MDCT coefficient values associated with the 512-sample frequency indices 9, 10, and 11.
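The segment-extraction alternative mentioned above (precompute c1(k), then window 512-sample pieces at 256-sample hops) can be sketched as follows. The formulas c1(k) = sin(2π·237·k/12288) and c1p(m) = w(m)·c1(256p + m), and the sine window, are assumptions consistent with the description, not the exact AC-3 window.

```python
import math

N_TOTAL = 12_288
BLOCK = 512
HOP = 256                       # adjacent 512-sample blocks overlap by 256

def sine_window(n2=BLOCK):
    """Illustrative stand-in for the AC-3 window function w(m)."""
    return [math.sin(math.pi / n2 * (m + 0.5)) for m in range(n2)]

# c1(k) for the symbol "0": a sinusoid at code frequency index 237.
c1 = [math.sin(2 * math.pi * 237 * k / N_TOTAL) for k in range(N_TOTAL)]

def c1p(p, window=sine_window()):
    """Windowed 512-sample segment of c1 for block p (p = 0 to 46), i.e.
    c1p(m) = w(m) * c1(256*p + m), extracted from the precomputed c1."""
    return [window[m] * c1[HOP * p + m] for m in range(BLOCK)]

segments = [c1p(p) for p in range(47)]      # 47 blocks cover all 12,288 samples
assert len(segments) == 47 and all(len(s) == BLOCK for s in segments)
```

Note that 46·256 + 512 = 12,288, which is why p runs from 0 to 46: the last segment ends exactly at sample 12,287.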
Usually, the MDCT coefficient values associated with c1p(m) (and with the other sinusoidal components c2p(m), ..., c10p(m)) are divided by a normalization factor Q determined by the 512-sample block size and the window function w(m). This normalization is chosen so that a time-domain cosine wave of unit magnitude at the 12,288-sample center frequency index 240 generates a unit-magnitude MDCT coefficient at the 512-sample center frequency index 10.
Following the previous example, for the c1p(m) associated with the code symbol "0", the code frequency index 237 (i.e., the frequency value corresponding to the frequency index f1 associated with the code symbol "0") causes the 512-sample center frequency index 10 to have the highest MDCT magnitude relative to the 512-sample frequency indices 9 and 11, because the 512-sample center frequency index 10 corresponds to the 12,288-sample center frequency index 240 and the code frequency index 237 is close to the 12,288-sample center frequency index 240. Similarly, the second frequency index f2, corresponding to the code frequency index 289, can generate MDCT coefficients with a large MDCT magnitude in the 512-sample frequency indices 11, 12, and 13. The code frequency index 289 may cause the 512-sample center frequency index 12 to have the highest MDCT magnitude, since the 512-sample center frequency index 12 corresponds to the 12,288-sample center frequency index 288 and the code frequency index 289 is close to the 12,288-sample center frequency index 288. Similarly, the third frequency index f3, corresponding to the code frequency index 339, can generate MDCT coefficients with a large MDCT magnitude in the 512-sample frequency indices 13, 14, and 15. The code frequency index 339 may cause the 512-sample center frequency index 14 to have the highest MDCT magnitude, because the 512-sample center frequency index 14 corresponds to the 12,288-sample center frequency index 336 and the code frequency index 339 is close to the 12,288-sample center frequency index 336. Across the 10 frequency indices f1 to f10, the MDCT coefficients representing the actual watermark code signal will correspond to 512-sample frequency indices in the range from 9 to 29.
Some 512 sample frequency indices (e.g., 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, and 29) may be affected by energy spillover from two adjacent code frequency indices, where the amount of spillover is a function of the weight applied to each sinusoidal component according to the masked energy analysis. Thus, in each 512-sample audio block of the compressed digital data stream, MDCT coefficients may be computed as described below to represent the code signal.
In a compressed AC-3 data stream, for example, each AC-3 frame includes 6 MDCT coefficient sets (e.g., MA0, MA1, MA2, MA3, MA4, and MA5 of FIG. 5), where each MDCT coefficient set corresponds to a 512-sample audio block. As described above in connection with FIGS. 5 and 6, each MDCT coefficient is represented as mk = Mk·2^-Xk, where Xk is the exponent and Mk is the mantissa. The mantissa Mk is the product of a mantissa step size sk and an integer value Nk. The mantissa step size sk and the exponent Xk form the quantization step sk·2^-Xk. Referring to the look-up table 600 of FIG. 6, for example, when the original mantissa value is -0.2666 (i.e., -4/15), the mantissa step size sk is 2/15 and the integer value Nk is -2.
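The mantissa arithmetic above can be checked with the quoted numbers. Only sk = 2/15 and Nk = -2 come from the example; the exponent value used here (Xk = 3) is purely illustrative.

```python
# Numeric check of the representation described above: each MDCT coefficient
# is mk = Mk * 2**-Xk, with mantissa Mk = sk * Nk quantized in steps of sk,
# so the quantization step of the coefficient itself is sk * 2**-Xk.
s_k = 2 / 15          # mantissa step size for the 15-level table of FIG. 6
N_k = -2              # integer value for the original mantissa -4/15
X_k = 3               # example exponent (illustrative; unchanged by embedding)

M_k = s_k * N_k                  # -4/15 = -0.2666...
m_k = M_k * 2.0 ** -X_k          # the MDCT coefficient value
q_step = s_k * 2.0 ** -X_k       # smallest representable change in mk
assert abs(M_k - (-4 / 15)) < 1e-12
```

The quantization step `q_step` is the smallest amount by which a coefficient can be nudged without touching its exponent, which is exactly the granularity the embedding process must work within.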
To insert the code signal into the compressed AC-3 data stream, the mantissas Mk for k = 9 to 29 are modified. For example, consider a mantissa group Mk, where k is 9 to 29, in which the code MDCT magnitudes C9, C10, and C11 corresponding to the watermarked MDCT coefficients wm9, wm10, and wm11 are -0.3, 0.8, and 0.2, respectively (with amplitudes that vary based on the local masking energy). Furthermore, assume that the code MDCT magnitude C11, associated with the 512-sample center frequency index 11, has the lowest absolute magnitude of the entire array (Ck, k = 9 to 29) (e.g., an absolute value of 0.2). Because the code MDCT magnitude C11 has the lowest absolute magnitude, it is used to normalize the modifications of the MDCT coefficients m9, m10, and m11 (and the other MDCT coefficients in the group m9 to m29). First, C11 is normalized to 1.0, and the same factor is used to normalize, e.g., C9 and C10 to C9 = -0.3/C11 = -1.5 and C10 = 0.8/C11 = 4.0. Then, the mantissa integer value N11 corresponding to the original MDCT coefficient m11 is increased by 1, because 1 is the minimum amount (due to mantissa step-size quantization) by which m11 can be modified to reflect the addition of the watermark code corresponding to C11. Finally, the mantissa integer values N9 and N10 corresponding to the original MDCT coefficients m9 and m10 are modified relative to N11, e.g., by the rounded normalized magnitudes -1.5 and 4.0 scaled by the ratio of the respective quantization steps. Thus, the modified mantissa integer values N9, N10, and N11 (and the similarly modified mantissa integer values N12 to N29) may be used to modify the corresponding original MDCT coefficients to embed the watermark code. Also, as described above, for any MDCT coefficient, the maximum change is bounded by the upper and lower limits of its mantissa integer value Nk. For example, referring to FIG. 6, table 600 shows a lower limit value of -0.9333 and an upper limit value of +0.9333.
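The normalization step of this example can be sketched as follows. For simplicity, the sketch assumes all affected bins share one quantization step; in general each bin k has its own step sk·2^-Xk, and the ratio of steps also enters the rounded update.

```python
# Code MDCT magnitudes for bins 9-11 from the example (from masking analysis).
C = {9: -0.3, 10: 0.8, 11: 0.2}

pivot = min(C, key=lambda k: abs(C[k]))      # bin with the lowest |Ck| -> 11
scaled = {k: C[k] / C[pivot] for k in C}     # C9 -> -1.5, C10 -> 4.0, C11 -> 1.0

# N[pivot] moves by exactly one quantization step (the smallest possible
# change); with equal step sizes the other integer values move by the rounded
# scaled magnitudes. Half-way cases depend on the rounding convention, and in
# practice each delta is clamped to the limits of the mantissa table.
delta_N = {k: (1 if k == pivot else round(scaled[k])) for k in C}
```

Anchoring the update to the bin with the smallest code magnitude keeps every mantissa change as small as the quantizer allows, which minimizes the audible impact of embedding the code.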
Thus, the foregoing example illustrates how the local masking energy can be used to determine the code magnitudes of the code symbols to be embedded in a compressed digital audio data stream. Furthermore, in the encoding process of the methods and apparatus described herein, 8 consecutive frames of the compressed digital data stream are modified without decompressing the MDCT coefficients.
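The mantissa-limit bound noted above can be made concrete with a short sketch. For a 15-level symmetric quantizer the integer code spans -7 to +7, so the largest representable mantissa is ±7 · (2/15) = ±14/15 ≈ ±0.9333, matching the limits shown for table 600. The helper below is illustrative only and not part of the patent text.

```python
# Sketch: clamp an integer mantissa code to the legal range of a
# symmetric odd-level quantizer. With levels=15, codes span [-7, 7],
# bounding any mantissa change to about +/-0.9333 (i.e., 7 * 2/15).

def clamp_mantissa_code(n, levels=15):
    """Clamp integer code n to the quantizer's representable range."""
    n_max = (levels - 1) // 2
    return max(-n_max, min(n_max, n))

# A requested change beyond the range saturates at the limit:
# clamp_mantissa_code(9) yields 7, clamp_mantissa_code(-10) yields -7.
```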
Fig. 14 is a block diagram of an example processor system 2000 that may be used to implement the methods and apparatus disclosed herein. The processor system 2000 may be a desktop computer, a laptop computer, a notebook computer, a Personal Digital Assistant (PDA), a server, an internet appliance, or any other type of computing device.
The processor system 2000 illustrated in fig. 14 includes a chipset 2010, the chipset 2010 including a memory controller 2012 and an input/output (I/O) controller 2014. As is well known, a chipset typically provides memory and I/O management functions, as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by a processor 2020. The processor 2020 is implemented using one or more processors. Alternatively, other processing technology may be used to implement the processor 2020. The processor 2020 includes a cache 2022, which may be implemented using a first level unified cache (L1), a second level unified cache (L2), a third level unified cache (L3), and/or any other suitable structure to store data.
The memory controller 2012 performs functions that enable the processor 2020 to access and communicate with a main memory 2030, which includes a volatile memory 2032 and a non-volatile memory 2034, via a bus 2040. The volatile memory 2032 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 2034 may be implemented using flash memory, Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), and/or any other desired type of storage device.
The processor system 2000 also includes an interface circuit 2050 that is coupled to the bus 2040. The interface circuit 2050 may be implemented using any type of well known interface standard such as an ethernet interface, a Universal Serial Bus (USB), a third generation input/output interface (3GIO) interface, and/or any other suitable type of interface.
One or more input devices 2060 are connected to the interface circuit 2050. The input device(s) 2060 permit a user to enter data and commands into the processor 2020. For example, the input device 2060 may be implemented by a keyboard, a mouse, a touch-sensitive display, a track pad, a track ball, an isopoint, and/or a voice recognition system.
One or more output devices 2070 are also connected to the interface circuit 2050. For example, the output device(s) 2070 may be implemented by media presentation devices (e.g., a Light Emitting Diode (LED) display, a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT) display, a printer, and/or speakers). Thus, the interface circuit 2050 typically includes, among other things, a graphics driver card.
The processor system 2000 also includes one or more mass storage devices 2080 to store software and data. Examples of such mass storage devices 2080 include floppy disks and their drives, hard disk drives, optical disks and their drives, and Digital Versatile Disks (DVDs) and their drives.
The interface circuit 2050 also includes a communication device such as a modem or a network interface card to facilitate exchange of data with external computers via a network. The communication link between the processor system 2000 and the network may be any type of network connection such as an ethernet connection, a Digital Subscriber Line (DSL), a telephone line, a cellular telephone system, a coaxial cable, etc.
In a conventional manner, access to the input devices 2060, the output devices 2070, the mass storage device 2080 and/or the network is typically controlled through the I/O controller 2014. In particular, the I/O controller 2014 performs functions that enable the processor 2020 to communicate with the input device(s) 2060, the output device(s) 2070, the mass storage device(s) 2080 and/or the network via the bus 2040 and the interface circuit 2050.
While the portions shown in fig. 14 are illustrated as separate blocks within the processor system 2000, the functions performed by some of these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits. For example, although the memory controller 2012 and the I/O controller 2014 are depicted as separate blocks within the chipset 2010, the memory controller 2012 and the I/O controller 2014 may be integrated within a single semiconductor circuit.
The methods and apparatus disclosed herein are particularly well suited for use with data streams implemented according to the AC-3 standard. However, the methods and apparatus disclosed herein may be applied to other digital audio coding techniques.
Further, while the present disclosure is presented with respect to an example television system, it should be appreciated that the disclosed system is readily applicable to many other media systems. Thus, while this disclosure describes example systems and processes, the disclosed examples are not the only implementations of such systems.
Although certain example methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents. For example, although this disclosure describes example systems including, among other components, software executed on hardware, it should be noted that such systems are merely illustrative and should not be considered as limiting. In particular, it is contemplated that any or all of the disclosed hardware and software components could be embodied exclusively in dedicated hardware, exclusively in firmware, exclusively in software, or in some combination of hardware, firmware, and/or software.
Claims (13)
1. A method for embedding media identification information in a compressed media data stream, the method comprising the steps of:
reconstructing a non-compressed media data stream from the compressed media data stream, the non-compressed media data stream being separate from the compressed media data stream;
embedding the media identification information into the uncompressed media data stream to determine a watermarked uncompressed media data stream; and
modifying a first mantissa value corresponding to a first transform coefficient associated with the compressed media data stream to embed the media identification information in the compressed media data stream without decompressing the compressed media data stream, the modification of the first mantissa value being based on a difference between the first transform coefficient and a corresponding second transform coefficient, the second transform coefficient being generated from the watermarked uncompressed media data stream.
2. The method of claim 1, wherein the compressed media data stream comprises a compressed audio data stream and the uncompressed media data stream comprises a time domain audio data stream.
3. The method of claim 1, wherein the media identifying information comprises a watermark representing at least one of a program or source identifying information.
4. The method of claim 1, wherein the first and second transform coefficients comprise respective first and second Modified Discrete Cosine Transform (MDCT) coefficients.
5. The method of claim 1, wherein reconstructing a non-compressed media data stream from the compressed media data stream comprises:
determining an inverse transform of the compressed media data stream to generate first and second inverse transform data blocks; and
combining the first and second inverse transformed data blocks to form the uncompressed media data stream.
6. The method of claim 1, wherein embedding the media identification information in the uncompressed media data stream to determine a watermarked uncompressed media data stream comprises:
increasing a first frequency component of the uncompressed media data stream and decreasing a second frequency component of the uncompressed media data stream to represent a first data value associated with the media identifying information; and
decreasing the first frequency component of the uncompressed media data stream and increasing the second frequency component of the uncompressed media data stream to represent a second data value associated with the media identification information.
7. The method of claim 1, wherein embedding the media identification information into the uncompressed media data stream to determine a watermarked uncompressed media data stream comprises:
determining a plurality of code signal components to represent data values associated with the media identification information; and
combining the plurality of code signal components with the uncompressed media data stream based on the determined masking energy.
8. The method of claim 1, wherein modifying the first mantissa value corresponding to the first transform coefficient associated with the compressed media data stream comprises:
determining a second mantissa value associated with the second transform coefficient generated from the watermarked uncompressed media data stream;
quantizing the second mantissa value based on compression information associated with the first mantissa value; and
replacing the first mantissa value with a quantized second mantissa value.
9. The method of claim 1, wherein the first transform coefficient further comprises the first mantissa value and a first exponent value, and wherein the first exponent value is not modified by the embedding of the media identification information.
10. The method of claim 9, wherein the first mantissa value is set to at least one of a minimum value or a maximum value based on compression information associated with the first mantissa value when modification of the first mantissa value alone is insufficient to account for the difference between the first transform coefficient and the second transform coefficient.
11. A method for determining identification information, the method comprising the steps of:
extracting identification information embedded in the rendered media content, the identification information being embedded in a broadcasted compressed audio data stream corresponding to the rendered media content, the identification information being embedded in the compressed audio data stream without decompressing the compressed audio data stream by:
modifying a first mantissa value corresponding to a first transform coefficient associated with the compressed audio data stream to embed the identification information in the compressed audio data stream, the modification of the first mantissa value being based on a difference between the first transform coefficient and a corresponding second transform coefficient generated from a separate uncompressed version of the compressed audio data stream in which the identification information is also embedded.
12. The method of claim 11, wherein the identification information comprises a watermark representing at least one of a program or source identification information.
13. The method of claim 11, further comprising the steps of:
decompressing, at the receiving device, the broadcasted compressed audio data stream to generate a non-compressed audio data stream corresponding to the presented media content; and
extracting the identification information from an analog audio signal corresponding to the uncompressed audio data stream, the analog audio signal being provided by at least one of a speaker or an analog output of the receiving device.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US47862603P | 2003-06-13 | 2003-06-13 | |
| US60/478,626 | 2003-06-13 | ||
| US57125804P | 2004-05-14 | 2004-05-14 | |
| US60/571,258 | 2004-05-14 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1150090A1 (en) | 2011-10-28 |
| HK1150090B (en) | 2013-08-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US8078301B2 (en) | Methods and apparatus for embedding codes in compressed audio data streams | |
| US9202256B2 (en) | Methods and apparatus for embedding watermarks | |
| AU2010200873B2 (en) | Methods and apparatus for embedding watermarks | |
| CN1993700B (en) | Method and apparatus for mixing compressed digital bit streams | |
| HK1150090B (en) | Methods and apparatus for embedding watermarks | |
| HK1090476B (en) | Methods and apparatus for embedding watermarks | |
| AU2012261653B2 (en) | Methods and apparatus for embedding watermarks | |
| AU2011203047B2 (en) | Methods and Apparatus for Mixing Compressed Digital Bit Streams | |
| HK1106047B (en) | Methods and apparatus for mixing compressed digital bit streams |