
HK1164565B - Methods and apparatus to perform audio watermarking and watermark detection and extraction - Google Patents


Info

Publication number
HK1164565B
HK1164565B
Authority
HK
Hong Kong
Prior art keywords
code
audio
message
encoded
symbol
Prior art date
Application number
HK12105179.8A
Other languages
Chinese (zh)
Other versions
HK1164565A1 (en)
Inventor
Venugopal Srinivasan
Alexander Pavlovich Topchy
Original Assignee
The Nielsen Company (US), LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. application Serial No. 12/464,811 (published as US9667365B2)
Application filed by The Nielsen Company (US), LLC
Publication of HK1164565A1
Publication of HK1164565B

Description

Method and apparatus for performing audio watermark embedding and watermark detection and extraction
RELATED APPLICATIONS
This patent claims priority from U.S. provisional application Serial No. 61/174,708, entitled "Methods and Apparatus to Perform Audio Watermarking and Watermark Detection and Extraction," filed May 1, 2009, and U.S. provisional application Serial No. 61/108,380, entitled "Stacking Method for Advanced Watermark Detection," filed October 24, 2008. The disclosures of both provisional applications are incorporated herein by reference in their entirety.
Technical Field
The present invention relates generally to media monitoring and, more particularly, to methods and apparatus for performing audio watermark embedding and watermark detection and extraction.
Background
Identifying media information, and more particularly identifying audio streams (e.g., audio information), is useful for evaluating audience exposure to television, radio, or any other media. For example, in a television audience metering application, a code may be inserted into the audio or video of the media, and the code is subsequently detected at a monitoring point when the media is presented (e.g., played at a monitored residence). The payload of the code/watermark information embedded in the original signal may include a unique source identification, broadcast time information, transactional information, or additional content metadata.
Monitoring points typically include locations, such as residences, where audience members' media consumption or exposure to media is monitored. For example, at a monitoring point, codes from the audio and/or video are captured and may be associated with the audio or video stream of media associated with a selected channel, radio station, media source, or the like. The collected codes may then be sent to a central data collection facility for analysis. However, the collection of data related to media exposure or consumption need not be limited to exposure or consumption in the home.
Drawings
Fig. 1 is a schematic diagram of a broadcast audience measurement system employing a program identification code added to the audio portion of a composite television signal.
Fig. 2 is a block diagram of the example encoder of fig. 1.
Fig. 3 is a flow diagram illustrating an example encoding process that may be performed by the example encoder of fig. 2.
FIG. 4 is a flow diagram illustrating an example process that may be performed to generate a frequency index table for use in conjunction with the code frequency selector of FIG. 2.
Fig. 5 is a graph illustrating critical band indices and how they correspond to short and long block sample indices.
Fig. 6 illustrates one example of selecting frequency components that will represent a particular information symbol.
Fig. 7-9 are graphs illustrating different example code frequency configurations that may be produced by the process of fig. 4 and used in conjunction with the code frequency selector of fig. 2.
Fig. 10 illustrates a frequency relationship between audio coding indexes.
Fig. 11 is a block diagram of the example decoder of fig. 1.
Fig. 12 is a flow diagram illustrating an example decoding process that may be performed by the example decoder of fig. 11.
Fig. 13 is a flow diagram of an example process that may be performed to superimpose (stack) audio in the decoder of fig. 11.
Fig. 14 is a flow diagram of an example process that may be performed to determine symbols encoded in an audio signal in the decoder of fig. 11.
Fig. 15 is a flow diagram of an example process that may be performed to process a buffer to identify a message in the decoder of fig. 11.
FIG. 16 illustrates an example circular buffer set that can store message symbols.
FIG. 17 illustrates an example pre-existing code tag circular buffer set that can store message symbols.
Fig. 18 is a flow diagram of an example process that may be performed to validate an identified message in the decoder of fig. 11.
Fig. 19 illustrates an example filter stack that may store identified messages in the decoder of fig. 11.
Fig. 20 is a schematic diagram of an example processor platform that may be used and/or programmed to perform any or all of the processes described herein or to implement any or all of the example systems, example apparatus, and/or example methods described herein.
Detailed Description
The following description refers to audio encoding and decoding, which are also commonly referred to as audio watermarking and watermark detection, respectively. It should be noted that, in this context, audio may be any type of signal having frequencies that fall within the spectrum audible to normal humans. For example, the audio may be speech, music, the audio portion of an audio and/or video program or work (e.g., a television program, movie, internet video, radio program, commercial, etc.), a media program, noise, or any other sound.
Generally speaking, as described in detail below, encoding audio inserts one or more codes or pieces of information (e.g., watermarks) into the audio and ideally leaves the codes inaudible to a listener of the audio, although in certain situations the codes may be audible to particular listeners. The codes embedded in the audio may be of any suitable length, and any suitable technique for mapping the information to the codes may be selected.
As described below, code or information to be inserted into audio may be converted into symbols represented by code frequency signals to be embedded into the audio to represent the information. These code frequency signals comprise one or more code frequencies, wherein different code frequencies or groups of code frequencies are assigned to represent different information symbols. Techniques are also described for generating one or more tables that map symbols to representative code frequencies such that the symbols are distinguishable from each other at a decoder. Any suitable encoding or error correction technique may be used to convert the code into symbols.
By controlling the amplitude at which these code frequency signals are inserted into the original audio, their presence can be kept imperceptible to human hearing. Thus, in one example, a masking operation evaluates the energy content of the original audio at different frequencies and/or its tonal or noise-like character, and the amplitudes of the code frequency signals are set based on this evaluation.
In addition, the audio signal may have passed through a distribution chain in which, for example, the content is delivered from the content creator to a network publisher (e.g., the NBC national network) and further to a local content publisher (e.g., the NBC affiliate in Chicago). As the audio signal passes through the distribution chain, any of these publishers may encode a watermark into the audio signal according to the techniques described herein, thereby including in the audio signal an identification of the publisher or an indication of the time of distribution. The encoding described herein is very robust, so a code inserted into the audio signal is not easily removed. Thus, a subsequent publisher of the audio content may encode an audio signal that has already been encoded using the techniques described herein, such that the subsequent publisher's code remains detectable and any crediting to the subsequent publisher is properly acknowledged.
In addition, code detection may be improved by stacking, which converts the encoded audio signal into a signal with an emphasized code owing to the repetition or partial repetition of the code within the signal. When the audio signal is sampled at the monitoring location, approximately equal-sized blocks of audio samples are summed and averaged. The stacking process exploits the temporal characteristics of the audio signal so that the code signal is emphasized within it, providing increased robustness to noise or other interference. This stacking process may be useful, for example, when the decoding operation uses a microphone that picks up ambient noise in addition to the audio signal output by a speaker.
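The stacking idea can be illustrated with a toy numeric sketch (illustrative only; the block length, signal values, and function name below are hypothetical, not taken from the patent): a code that repeats every block adds coherently when blocks are averaged, while zero-mean interference cancels out.

```python
from statistics import mean

def stack_blocks(samples, block_len):
    """Average consecutive equal-sized blocks of a sampled signal.

    A code that repeats every block_len samples adds coherently,
    while interference that is zero-mean across blocks averages
    toward zero. Illustrative sketch only.
    """
    n_blocks = len(samples) // block_len
    return [
        mean(samples[b * block_len + i] for b in range(n_blocks))
        for i in range(block_len)
    ]

# A toy "code" repeating every 4 samples, plus interference that is
# zero-mean at each block position over the 4 blocks.
code = [1.0, -1.0, 0.5, -0.5]
noise = [0.3, -0.3, 0.3, -0.3, -0.3, 0.3, -0.3, 0.3] * 2
signal = [code[i % 4] + noise[i] for i in range(16)]
stacked = stack_blocks(signal, 4)   # recovers the code pattern
```

Averaging four blocks here recovers the repeating code exactly because the interference cancels; with real uncorrelated noise the cancellation is statistical rather than exact, which is why stacking raises robustness rather than guaranteeing detection.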
A further technique that adds robustness to the decoding operation described herein provides for validation of the messages identified by the decoding operation. After a message is identified in the encoded audio signal, it is added to a stack. Subsequent repetitions of the message are then compared against the stack to identify a match. A message is validated when it matches another identified message at the appropriate repetition interval. When a message partially matches another message that has already been validated, it is marked as partially validated, and subsequent messages are used to identify the portions of the message that may have been corrupted. According to this example validation technique, messages are output from the decoder only when they are validated. This technique prevents message errors due to interference and/or detection errors.
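As a hedged sketch of this validation idea (the function and message values below are hypothetical, and the patent's filter-stack logic handles partial matches as well), a message can be accepted only when an identical message was observed exactly one repetition interval earlier:

```python
def validate_messages(messages, interval):
    """Accept a message only when an identical message was seen exactly
    one repetition interval earlier. Hypothetical sketch of the
    validation idea; not the patent's exact filter-stack logic."""
    seen = {}        # message start time -> message content
    validated = []
    for t, msg in messages:
        if seen.get(t - interval) == msg:
            validated.append((t, msg))
        seen[t] = msg
    return validated

# "A1" at t=2 matches "A1" at t=0 and is validated;
# "A1" at t=6 does not match "B7" at t=4 and is withheld.
observed = [(0, "A1"), (2, "A1"), (4, "B7"), (6, "A1")]
result = validate_messages(observed, interval=2)
```

A spurious one-off detection never repeats at the right interval, so it is never output, which is the error-suppression property the text describes.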
The following examples generally relate to encoding an audio signal having information, such as a code, and obtaining the information from the audio via a decoding process. The following example encoding and decoding processes may be used for a number of different technical applications for transferring information from one place to another.
The example encoding and decoding processes described herein may be used to perform broadcast identification. In such an example, before a work is broadcast, it is encoded with a code that includes information indicating the source of the work, the broadcast time of the work, the distribution channel of the work, or any other information deemed relevant to the system operator. When the work is presented (e.g., played via television, radio, a computing device, or any other suitable device), people in the area of the presentation are exposed not only to the work but also, without their knowledge, to the code embedded in the work. Thus, a person may be provided with a decoder operating on a microphone-based platform, so that the work can be acquired by the decoder using free-field detection and processed to extract the code. These codes are then recorded and reported back to a central facility for further processing. The microphone-based decoder may be a dedicated stand-alone device, or may be implemented using a cellular telephone or any other type of device having a microphone and software to perform the decoding and code-recording operations. Alternatively, a wired system may be used whenever the work and its ancillary code are available via a hard-wired connection.
The example encoding and decoding processes described herein may be used for tracking and/or forensics related to audio and/or video works, for example, by tagging copyright-protected audio and/or associated video content with a particular code. The example encoding and decoding processes may also be used to implement a transactional encoding system in which a unique code is inserted into a work when the work is purchased by a consumer, thereby allowing the media distributor to identify the source of the work. The act of purchasing may include the purchaser physically receiving a tangible medium (e.g., an optical disc, etc.) containing the work, or may include downloading the work via a network such as the internet. In such a transactional encoding system, each purchaser of the same work receives the work, but the copy received by each purchaser is encoded with a different code. That is, the code inserted into a work may be unique to the purchaser, wherein each work purchased by that purchaser includes the purchaser's code. Alternatively, each work may be encoded with a sequentially assigned code.
Further, the example encoding and decoding techniques described herein may be used to perform control functions by hiding codes in a steganographic manner, where a hidden code is used to control a target device programmed to respond to it. For example, control data may be hidden in a speech signal or any other audio signal. A decoder in the region where the audio signal is presented processes the received audio to obtain the hidden code. After obtaining the code, the target device takes some predetermined action based on it. This may be useful, for example, where advertisements within a store are changed based on the audio presented in the store. For example, a rolling billboard advertisement within a store may be synchronized with an audio advertisement presented in the store by means of a code embedded in the audio advertisement.
An example encoding and decoding system 100 is shown in fig. 1. The example system 100 may be, for example, a television audience measurement system that will serve as background to further explain the encoding and decoding processes described herein. The example system 100 includes an encoder 102, the encoder 102 adding a code or information 103 to an audio signal 104 to produce an encoded audio signal. The information 103 may be any selected information. For example, in a media monitoring context, information 103 may represent an identification of a broadcast media program, such as a television broadcast, radio broadcast, or the like. Additionally, information 103 may include timing information indicating the time at which information 103 was inserted into the audio or the time of the media broadcast. Alternatively, the code may include control information for controlling the behavior of one or more target devices.
The audio signal 104 may be any form of audio including, for example, speech, music, commercial audio, audio associated with a television program, a live performance, and the like. In the example of fig. 1, the encoder 102 passes the encoded audio signal to the transmitter 106. The transmitter 106 transmits the encoded audio signal along with any video signals 108 associated with the encoded audio signal. The encoded audio signal does not necessarily have any associated video, although in some cases the encoded audio signal may have an associated video signal 108.
In one example, the audio signal 104 is a digitized version of an analog audio signal that has been sampled at 48 kilohertz (kHz). As described in detail below, two seconds of audio, corresponding to 96000 audio samples at the 48 kHz sampling rate, may be used to carry one message comprising a synchronization symbol and 49 bits of information. With a 7-bit-per-symbol coding scheme, the message requires the transmission of eight symbols: the synchronization symbol plus seven information symbols. Alternatively, in the overwrite context described below, one synchronization symbol is used, followed by a single information symbol that conveys one of 128 states. As described in detail below, according to one example, each 7-bit information symbol is embedded in a long block of audio corresponding to 9216 samples. In one example, such a long block comprises 36 overlapping short blocks of 512 samples each, where, in each 50%-overlapping block, 256 of the samples are old and 256 are new.
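The sample-count arithmetic in this example can be checked with a short sketch (the constant names are ours, but the values are those stated in the text):

```python
# Sample budget for one two-second message, per the values in the text.
SAMPLE_RATE = 48_000      # Hz
MESSAGE_SECONDS = 2
LONG_BLOCK = 9216         # samples carrying one 7-bit symbol
HOP = 256                 # 50% overlap: 256 new samples per 512-sample short block
INFO_BITS = 49
BITS_PER_SYMBOL = 7

samples_per_message = SAMPLE_RATE * MESSAGE_SECONDS      # 96000
symbols_per_message = 1 + INFO_BITS // BITS_PER_SYMBOL   # sync + 7 info symbols
encoded_samples = symbols_per_message * LONG_BLOCK       # samples actually encoded
short_blocks_per_long = LONG_BLOCK // HOP                # overlapping short blocks
uncoded_tail = samples_per_message - encoded_samples     # unencoded samples
```

This confirms the eight long blocks occupy 73728 of the 96000 samples, leaving an unencoded tail before the next message begins, consistent with the combiner discussion later in the text.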
Although the transmit side of the example system 100 shown in fig. 1 shows a single transmitter 106, the transmit side may be much more complex and may include multiple stages in a distribution chain through which the audio signal 104 may pass. For example, the audio signal 104 may be generated at the national network level and passed to the local network level for local distribution. Thus, although the encoder 102 is shown in the transmit lineup (lineup) prior to the transmitter 106, one or more encoders may be provided throughout the distribution chain of the audio signal 104. Thus, the audio signal 104 may be encoded at multiple levels and may include embedded codes associated with these multiple levels. Further details regarding the encoding and example encoders are provided below.
The transmitter 106 may include a radio frequency (RF) transmitter that distributes the encoded audio signal through free-space propagation (e.g., via a terrestrial or satellite communication link), a transmitter for distributing the encoded audio signal over cable, fiber optics, or the like, or both. In one example, the transmitter 106 broadcasts the encoded audio signal over a wide geographic area; in other cases, the transmitter 106 distributes the encoded audio signal over a limited geographic area. The transmission may include up-converting the encoded audio signal to radio frequency to enable propagation. Alternatively, the transmission may include distributing the encoded audio signal in the form of digital bits, or packets of digital bits, carried over one or more networks such as the internet, a wide area network, or a local area network. Thus, the encoded audio signal may be carried by a carrier signal, by information packets, or by any suitable technique for distributing audio signals.
When the encoded audio signal is received by the receiver 110 (in a media monitoring context, the receiver 110 may be located at a statistically selected measurement point 112), the audio signal portion of the received program signal is processed to recover the code, even if the presence of the code is not perceptible (or substantially imperceptible) to a listener when the encoded audio signal is presented by the speaker 114 of the receiver 110. To this end, the decoder 116 is connected directly to an audio output 118 available at the receiver 110 or to a microphone 120 arranged in the vicinity of the loudspeaker 114 used for audio reproduction. The received audio signal may be in mono or stereo form. Further details regarding decoding and example decoders are provided below.
Audio coding
As explained above, the encoder 102 inserts one or more inaudible (or substantially inaudible) codes into the audio 104 to create encoded audio. An example encoder is shown in fig. 2. In one implementation, the example encoder 102 of fig. 2 may be implemented using, for example, a digital signal processor programmed with instructions to implement the encoding lineup 202, whose operation is affected by the operation of the prior code detector 204 and the masking lineup 206; either or both of the prior code detector 204 and the masking lineup 206 may likewise be implemented using a digital signal processor programmed with instructions. Of course, any other implementation of the example encoder 102 is possible. For example, the encoder 102 may be implemented using one or more processors, programmable logic devices, or any suitable combination of hardware, software, and firmware.
Generally, during operation, the encoder 102 receives the audio 104, and the prior code detector 204 determines whether the audio 104 has previously been encoded with information (which would make it difficult for the encoder 102 to encode additional information into the previously encoded audio). For example, the previous encoding may have been performed at an earlier point in the audio distribution chain (e.g., at the national network level). The prior code detector 204 informs the encoding lineup 202 whether the audio has been previously encoded, and may itself be implemented by a decoder as described herein.
The coding lineup 202 receives the information 103, generates a code frequency signal based on the information 103, and combines the code frequency signal with the audio 104. The operation of the coding lineup 202 is affected by the output of the prior code detector 204. For example, if the audio 104 has been previously encoded and the prior code detector 204 informs the encoding lineup 202 of this fact, the encoding lineup 202 may select an alternate message to be encoded into the audio 104 and may also change the details of encoding the alternate message (e.g., different time positions within the message, different frequencies used to represent symbols, etc.).
The coding lineup 202 is also affected by the masking lineup 206. In general, the masking lineup 206 processes the audio 104 corresponding to the point in time when the encoding lineup 202 wants to encode information and determines the amplitude at which the encoding is performed. As described below, the masking lineup 206 may output a signal for controlling the amplitude of the code frequency signal to keep the code frequency signal below a threshold for human perception.
As shown in the example of fig. 2, the coding lineup includes a message generator 210, a symbol selector 212, a code frequency selector 214, a synthesizer 216, an inverse fourier transform 218, and a combiner 220. Message generator 210 responds to information 103 and outputs a message having a format shown generally at reference numeral 222. The information 103 provided to the message generator may be the current time, a television or radio station identification, a program identification, etc. In one example, message generator 210 may output a message every 2 seconds. Of course, other message transmission intervals are possible.
In one example, the message format 222 representing the message output from the message generator 210 includes a synchronization symbol 224. The synchronization symbol 224 is used by a decoder, examples of which are described below, to obtain timing information indicating the start of a message. Thus, when a decoder receives the synchronization symbol 224, it expects additional information to follow.
In the example message format 222 of fig. 2, the synchronization symbol 224 is followed by 42 bits of message information 226. This information may include a binary representation of a station identifier and coarse timing information. In one example, the timing information represented in the 42 bits of message information 226 changes every 64 seconds, or every 32 message intervals; thus, the 42 bits of message information 226 remain static for 64 seconds. The 7 bits of message information 228 may be a high-resolution time that increments every two seconds.
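The layout of this message format can be tallied in a few lines (the constant names are ours; the bit counts and intervals are those stated above):

```python
# Message layout per the format 222 description.
SYNC_SYMBOLS = 1
STATION_AND_COARSE_TIME_BITS = 42   # static for 64 seconds
HIGH_RES_TIME_BITS = 7              # increments every 2-second message
BITS_PER_SYMBOL = 7
SECONDS_PER_MESSAGE = 2
STATIC_SECONDS = 64

info_symbols = (STATION_AND_COARSE_TIME_BITS + HIGH_RES_TIME_BITS) // BITS_PER_SYMBOL
total_symbols = SYNC_SYMBOLS + info_symbols
messages_while_static = STATIC_SECONDS // SECONDS_PER_MESSAGE
```

The 49 information bits map onto seven 7-bit symbols, giving eight symbols per message with the synchronization symbol, and the 42-bit field spans 32 consecutive message intervals, matching the figures in the text.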
Message format 222 also includes pre-existing code flag information 230, which is only selectively used to transmit information. When the prior code detector 204 notifies the message generator 210 that the audio 104 has not been previously encoded, the pre-existing code flag information 230 is not used: the message output by the message generator includes only the synchronization symbol 224, the 42-bit message information 226, and the 7-bit message information 228, and the pre-existing code flag information 230 is left blank or filled with unused symbols. Conversely, when the prior code detector 204 indicates to the message generator 210 that the audio 104 into which the message information is to be encoded has been previously encoded, the message generator 210 does not output the synchronization symbol 224, the 42-bit message information 226, or the 7-bit message information 228. Instead, the message generator 210 uses only the pre-existing code flag information 230. In one example, the pre-existing code flag information includes a pre-existing code flag synchronization symbol that signals the presence of pre-existing code flag information. This symbol differs from the synchronization symbol 224 and therefore may be used to signal the start of the pre-existing code flag information. Upon receiving the pre-existing code flag synchronization symbol, the decoder may ignore any previously received information aligned in time with the synchronization symbol 224, the 42-bit message information 226, or the 7-bit message information 228. To convey information such as a channel indication, a distributor identification, or any other suitable information, a single pre-existing code flag information symbol follows the pre-existing code flag synchronization symbol.
This pre-existing code marking information may be used to provide proper crediting in the audience monitoring system.
The output from the message generator 210 is passed to the symbol selector 212, which selects symbols to represent it. When the synchronization symbol 224 is output, the symbol selector need not perform any mapping, since the synchronization symbol 224 is already in symbol format. Alternatively, when bits of information are output from the message generator 210, the symbol selector may use a direct mapping in which, for example, 7 bits output from the message generator 210 are mapped to the symbol whose value is the decimal value of those 7 bits. For example, if the value 1010101 is output from the message generator 210, the symbol selector maps those bits to symbol 85. Other conversions between bits and symbols may of course be used, and in certain examples redundancy or error coding may be used to select the symbols representing the bits. In addition, any suitable number of bits other than 7 may be selected for conversion to symbols. The number of bits used to select a symbol may be determined based on the maximum symbol space available in the communication system. For example, if the communication system can transmit only one of 4 symbols at a time, only two bits from the message generator 210 are converted to a symbol at a time.
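The direct bit-to-symbol mapping described above amounts to reading each 7-bit group as a binary number (a minimal sketch; the function name and the input validation are ours):

```python
def bits_to_symbol(bits):
    """Direct mapping: a 7-bit group becomes the symbol whose value is
    the group's decimal value. Sketch of the mapping described in the
    text; real systems may add redundancy or error coding instead."""
    if len(bits) != 7 or any(b not in "01" for b in bits):
        raise ValueError("expected a 7-bit binary string")
    return int(bits, 2)
```

Applied to the example in the text, the bit pattern 1010101 maps to symbol 85; the full 7-bit symbol space runs from 0 through 127, matching the 128 states mentioned earlier.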
The symbol from the symbol selector 212 is passed to the code frequency selector 214, which selects the code frequencies used to represent the symbol. The code frequency selector 214 may include one or more look-up tables (LUTs) 232 that map symbols to the code frequencies representing them. That is, a symbol is represented by a plurality of code frequencies that the encoder 102 emphasizes in the audio to form the transmitted encoded audio. Upon receiving the encoded audio, the decoder detects the presence of the emphasized code frequencies and decodes the pattern of emphasized code frequencies into the transmitted symbol. Thus, the decoder needs to use the same LUT for selecting code frequencies as was selected at the encoder 102. Example LUTs are described in conjunction with figs. 7-9, and example techniques for generating LUTs are provided in connection with fig. 4.
Code frequency selector 214 may select any number of different LUTs based on various criteria. For example, the code frequency selector 214 may use a particular LUT or group of LUTs in response to a particular synchronization symbol previously received. Additionally, if the pre-code detector 204 indicates that a message has been previously encoded into the audio 104, the code frequency selector 214 may select a look-up table that is unique to the pre-existing code case to avoid confusion between the frequency used to previously encode the audio 104 and the frequency used to include the pre-existing code flag information.
An indication of the code frequencies selected to represent a particular symbol is provided to the synthesizer 216. The synthesizer 216 may store three complex fourier coefficients representing each of the plurality of possible code frequencies that may be indicated by the code frequency selector 214, for each of the short blocks making up a long block. These coefficients represent the transform of a windowed sinusoidal code frequency signal whose phase angle corresponds to the starting phase angle of the code sinusoid in that short block.
Although the example code synthesizer 216 that generates sine waves or data representing sine waves is described above, other example implementations of a code synthesizer are possible. For example, rather than generating a sine wave, another example code synthesizer 216 may output fourier coefficients in the frequency domain that are used to adjust the amplitude of particular frequencies of audio provided to combiner 220. In this way, the frequency spectrum of the audio can be adjusted to include the necessary sinusoids.
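A minimal sketch of synthesizing one short block of a code-frequency signal in the time domain, assuming a sine-squared window and a code frequency centered on a DFT bin (the function name, window choice, and parameter values are illustrative assumptions, not the patent's stored coefficients):

```python
import math

def windowed_code_sinusoid(bin_index, amplitude, block_len=512, start_phase=0.0):
    """One short block of a windowed code-frequency sinusoid.

    bin_index selects the DFT bin of the code frequency; the
    sine-squared (Hann) window tapers the block edges so that
    overlapping blocks join smoothly. Illustrative sketch only.
    """
    return [
        amplitude
        * math.sin(math.pi * n / block_len) ** 2                       # Hann window
        * math.cos(2 * math.pi * bin_index * n / block_len + start_phase)
        for n in range(block_len)
    ]

block = windowed_code_sinusoid(bin_index=40, amplitude=0.01)
```

Equivalently, as the paragraph above notes, the same effect can be obtained in the frequency domain by adjusting the amplitudes of the corresponding Fourier coefficients before an inverse transform.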
The three complex amplitude-adjusted fourier coefficients corresponding to the symbol to be transmitted are provided from the synthesizer 216 to the inverse fourier transform 218, which converts the coefficients into a time-domain signal having the prescribed frequencies and amplitudes and passes it to the combiner 220 for insertion into the audio. The combiner 220 also receives the audio. Specifically, the combiner 220 inserts the signal from the inverse fourier transform 218 into a long block of audio samples. As described above, at the 48 kHz sampling rate, a long block is 9216 audio samples. In the example provided, the synchronization symbol and the 49 bits of information require a total of 8 long blocks. Since each long block is 9216 audio samples, only 73728 samples of the audio 104 are required to encode a given message. However, because a message begins every two seconds (i.e., every 96000 audio samples), many samples at the end of the 96000 remain unencoded. The combination can be performed in the digital domain or in the analog domain.
However, when a pre-existing code marker is used, it is inserted into the audio 104 after the last symbol representing the previously inserted 7-bit message information. Thus, the insertion of the pre-existing code flag information begins at sample 73729 and extends for two long blocks, or 18432 samples. Accordingly, when pre-existing code marking information is used, fewer of the 96000 audio samples remain unencoded.
The masking lineup 206 includes an overlapping short block generator 240 that generates short blocks of 512 audio samples, where 256 of the samples are old and 256 are new. That is, the overlapping short block generator 240 produces 512-sample blocks, with 256 samples shifted into or out of a buffer at a time. For example, when a first group of 256 samples enters the buffer, the oldest 256 samples are shifted out. In the subsequent iteration, the first group of 256 samples moves to the later position in the buffer, and 256 new samples are shifted in. Each time a new short block is generated in this way, it is provided to the masking evaluator 242. The 512-sample blocks output from the overlapping short block generator 240 are multiplied by an appropriate window function so that an "overlap and add" operation restores the audio samples to their correct values at the output. The synthesized code signal to be added to the audio signal is windowed similarly, to prevent abrupt transitions at block edges when the code amplitude changes from one 512-sample block to the next overlapping 512-sample block; such transitions would otherwise produce audible artifacts.
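The claim that windowed 50%-overlapping blocks can be recombined by overlap-and-add rests on overlapping window values summing to a constant. For a sine-squared (Hann) window at 50% overlap this holds exactly, as the sketch below verifies (the block size and hop match the text; the helper name and the choice of a sine-squared window are our assumptions about "an appropriate window function"):

```python
import math

BLOCK = 512
HOP = 256   # 50% overlap

def hann(n, size=BLOCK):
    """Sine-squared (Hann) window value at sample n of a size-sample block."""
    return math.sin(math.pi * n / size) ** 2

# At 50% overlap, sample n of one block coincides with sample n + HOP of
# the previous block; the two window values sum to
# sin^2(x) + sin^2(x + pi/2) = sin^2(x) + cos^2(x) = 1,
# so plain overlap-and-add reproduces the original samples.
overlap_sums = [hann(n) + hann(n + HOP) for n in range(HOP)]
```

This constant-overlap-add property is exactly what lets the windowed analysis blocks, and the similarly windowed code signal, be stitched back together without amplitude ripple at block edges.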
The masking evaluator 242 receives the samples of an overlapping short block (e.g., 512 samples) and determines the ability of that block to hide the code frequencies from human hearing. That is, the masking evaluator determines whether the code frequencies can be hidden within the audio represented by the short block by: evaluating each critical band of the audio as a whole to determine its energy; determining the noise-like or tonal attributes of each critical band; and determining the overall ability of the critical bands to mask the code frequencies. According to the illustrated example, the bandwidth of the critical bands increases with frequency. If the masking evaluator 242 determines that the code frequencies can be hidden within the audio 104, the masking evaluator 242 indicates the amplitudes at which the code frequencies can be inserted into the audio 104 while still remaining hidden, and provides that amplitude information to the synthesizer 216.
In one example, the masking evaluator 242 performs the masking evaluation by determining the maximum change in energy Eb, or masking energy level, that can occur in each critical band without the change being perceptible to a listener. The masking evaluation performed by the masking evaluator 242 may be carried out, for example, as described in the Moving Picture Experts Group - Advanced Audio Coding (MPEG-AAC) audio compression standard ISO/IEC 13818-7:1997. The acoustic energy in each critical band affects the masking energy of its neighboring critical bands, and algorithms for calculating this masking effect are described in standards documents such as ISO/IEC 13818-7:1997. These analyses may be used to determine, for each short block, the masking contribution due to tonality (e.g., how much the evaluated audio resembles a tone) and noise-like features (i.e., how much the evaluated audio resembles noise). A further analysis may evaluate temporal masking, which extends the masking ability of the audio over a short time, typically 50 to 100 milliseconds (ms). The analysis performed by the masking evaluator 242 provides, on a per-critical-band basis, a determination of the amplitudes of the code frequencies that can be added to the audio 104 without producing any noticeable degradation (e.g., without being audible).
Because a 256-sample block appears both at the beginning of one short block and at the end of the next short block, each 256-sample block is evaluated twice by the masking evaluator 242, which performs two masking evaluations that include that block. The amplitude indication provided to the synthesizer 216 is a composite of the two evaluations that include the 256-sample block, and is timed so that it controls the amplitude of the code inserted into those 256 samples as they arrive at the combiner 220.
Referring now to figs. 3-5, an example LUT 232 is shown, including one column 302 representing symbols and 7 columns 304, 306, 308, 310, 312, 314, 316 representing numbered code frequency indices. The LUT 232 includes 128 rows that are used to represent data symbols. Because the LUT 232 includes 128 different data symbols, data may be transmitted at a rate of 7 bits per symbol. The frequency indices in the table range from 180 to 656 and are based on a long block size of 9216 samples and a sampling rate of 48KHz. Thus, the frequencies corresponding to these indices lie in the range from 937.5Hz to about 3416.7Hz, which falls within the audible range of humans. Of course, other sampling rates and frequency indices may be selected. A description of a process for generating a LUT such as the table 232 is provided in connection with figs. 7-9.
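The relationship between a table index and its frequency follows directly from the long block length and sampling rate quoted above; a quick sketch of the arithmetic:

```python
LONG_BLOCK = 9216   # samples per long block (from the text)
FS = 48_000         # sampling rate in Hz

def index_to_hz(index):
    """Convert a code frequency index (a DFT bin number for the long
    block) to its frequency in Hz; bin spacing is 48000/9216 Hz."""
    return index * FS / LONG_BLOCK

low = index_to_hz(180)    # 937.5 Hz, the bottom of the table's range
high = index_to_hz(656)   # about 3416.7 Hz, the top of the range
```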
In one example operation of the code frequency selector 214, symbol 25 (e.g., binary value 0011001) is received from the symbol selector 212. The code frequency selector 214 accesses the LUT 232 and reads row 25 of the symbol column 302. From that row, the code frequency selector reads the code frequency indices 217, 288, 325, 403, 512, 548 and 655 that are to be emphasized in the audio 104 to send symbol 25 to the decoder. The code frequency selector 214 then provides an indication of these indices to the synthesizer 216, which synthesizes the code signal by outputting Fourier coefficients corresponding to the indices.
The combiner 220 receives both the output of the code synthesizer 216 and the audio 104 and combines the two to form the encoded audio. The combiner 220 may combine the output of the code synthesizer 216 with the audio 104 in analog or digital form. If the combiner 220 performs digital combining, the output of the code synthesizer 216 may be combined with the output of the sampler, rather than with the audio input to the sampler. For example, a block of audio in digital form may be combined with the sinusoids in digital form; alternatively, the combination may be performed in the frequency domain, with the frequency coefficients of the audio adjusted according to the frequency coefficients representing the sinusoids. As a further alternative, the sinusoids may be combined with the audio in analog form. The encoded audio may be output from the combiner 220 in analog or digital form. If the output of the combiner 220 is digital, it is converted to analog form before being coupled to the transmitter 106.
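A minimal sketch of the digital-domain combination described above: the code signal is a sum of sinusoids at the selected bin indices, added sample-by-sample to one long block of audio. The function names and the flat per-frequency amplitude list (standing in for the masking evaluator's output) are illustrative assumptions.

```python
import numpy as np

LONG_BLOCK = 9216  # samples per long block

def synthesize_code(indices, amplitudes, n=LONG_BLOCK):
    """Build the time-domain code signal: one cosine per code frequency
    index, each scaled by the amplitude allotted by the masking step."""
    t = np.arange(n)
    code = np.zeros(n)
    for k, a in zip(indices, amplitudes):
        code += a * np.cos(2.0 * np.pi * k * t / n)
    return code

def combine(audio_block, indices, amplitudes):
    """Add the synthesized code signal to a long block of audio samples."""
    return audio_block + synthesize_code(indices, amplitudes, len(audio_block))
```

Encoding a single code frequency into a silent block and taking the FFT of the result shows the expected spectral peak at that bin index.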
Fig. 6 illustrates an example encoding process 600. The example process 600 may be carried out by the example encoder 102 of fig. 2 or any other suitable encoder. The example process 600 begins when audio samples to be encoded are received (block 602). The process 600 then determines whether the received samples have been previously encoded (block 604). This determination is performed, for example, by the previous code detector 204 of fig. 2 or any suitable decoder configured to examine the audio to be encoded for evidence of previous encoding.
If the received samples have not been previously encoded (block 604), the process 600 generates a communication message (block 606), such as one having the format shown at reference numeral 222 in fig. 2. In one particular example, when the audio has not been previously encoded, the communication message may include a synchronization portion and one or more portions including data bits. The communication message generation is performed, for example, by the message generator 210 of fig. 2.
The communication message is then mapped to symbols (block 608). If the synchronization information is already a symbol, for example, it does not have to be mapped to a symbol. In another example, if a portion of the communication message is a series of bits, each bit or group of bits may be represented by one symbol. As described above in connection with the symbol selector 212, which is one way of performing the mapping (block 608), one or more tables or coding schemes may be used to convert the bits into symbols. For example, some techniques may include the use of error correction coding or the like to increase the robustness of the message through coding gain. In one specific example implementation having a symbol space sized to accommodate 128 data symbols, 7 bits may be converted into one symbol. Of course, other numbers of bits may be processed depending on many factors, including the available symbol space, error correction coding, and the like.
After the communication symbols have been selected (block 608), the process 600 selects the LUT (block 610) that is used to determine the code frequencies that will be used to represent the respective symbols. In one example, the selected LUT may be the example LUT232 of fig. 3-5 or may be any other suitable LUT. Additionally, the LUT may be any LUT generated as described in connection with fig. 7-9. The LUT may be selected based on a number of factors including a synchronization symbol selected during generation of the communication message (block 606).
After the symbols have been generated (block 608) and the LUT selected (block 610), the symbols are mapped to code frequencies using the selected LUT (block 612). In one example in which the LUT 232 of figs. 3-5 is selected, symbol 35 would be mapped to the frequency indices 218, 245, 360, 438, 476, 541, and 651. The data space in the LUT runs from symbol 0 to symbol 127, while symbol 128, which uses a unique set of code frequencies that matches no other code frequencies in the table, is used as the synchronization symbol. The LUT selection (block 610) and the mapping (block 612) may be performed, for example, by the code frequency selector 214 of fig. 2. After the code frequencies are selected, an indication of these code frequencies is provided, for example, to the synthesizer 216 of fig. 2.
The code signal including these code frequencies is then synthesized (block 614) at the amplitudes determined by the masking evaluation, as described in connection with blocks 240 and 242 of fig. 2 and below in connection with the process 600. In one example, the synthesis of the code frequency signals may be performed by providing appropriately scaled Fourier coefficients to an inverse Fourier process. In one particular example, three Fourier coefficients may be output to represent each of the code frequencies in the code signal. The code frequencies can thus be synthesized by inverse Fourier processing in a manner that windows the synthesized frequencies to prevent the code frequencies from spilling into other parts of the signal in which they are embedded. One example configuration that may be used to perform the synthesis of block 614 is shown in blocks 216 and 218 of fig. 2. Other implementations and configurations are certainly possible.
After the code signal including these code frequencies has been synthesized, the code signal is combined with the audio samples (block 616). As described in connection with fig. 2, combining the code signal with the audio inserts one symbol into each long block of audio samples. Thus, to transmit one synchronization symbol and 49 data bits, the information is encoded into 8 long blocks of audio information: one long block for the synchronization symbol and one long block for each 7 bits of data (assuming 7-bit/symbol encoding). These messages are inserted into the audio at 2-second intervals. Thus, the 8 long blocks of audio immediately following the start of a message are encoded, and the remaining long blocks constituting the balance of the 2 seconds of audio may remain unencoded.
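The sample counts implied by this layout can be verified with a few lines of arithmetic (all constants taken from the text):

```python
FS = 48_000                 # samples per second
LONG_BLOCK = 9216           # samples per long block
SYMBOLS_PER_MESSAGE = 8     # 1 sync symbol + 7 data symbols (49 bits at 7 bits/symbol)
MESSAGE_INTERVAL_SECONDS = 2

encoded_samples = SYMBOLS_PER_MESSAGE * LONG_BLOCK        # 73728 samples carry the message
interval_samples = FS * MESSAGE_INTERVAL_SECONDS          # 96000 samples between message starts
unencoded_samples = interval_samples - encoded_samples    # samples left unencoded per interval
```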
The insertion of the code signal into the audio may be performed by adding samples of these code signals to samples of the main audio signal, wherein such addition is done in the analog domain or the digital domain. Alternatively, the frequency components of the audio signal may be adjusted in the frequency domain with appropriate frequency alignment and registration and the adjusted spectrum converted back to the time domain.
The operation of the process 600 is described above for the case in which the process 600 determines that the received audio samples have not been previously encoded (block 604). However, where a portion of media has passed through a distribution chain and was encoded as it was processed, the received audio samples processed at block 604 already include a code. For example, a local television station using a free news clip from CNN may not be credited with viewership of its local news broadcast because of the previous encoding of the CNN clip. Accordingly, additional information is added to the local news broadcast in the form of pre-existing code flag information. If the received audio samples have been previously encoded (block 604), the process generates pre-existing code flag information (block 618). Generating the pre-existing code flag information may include generating a pre-existing code flag synchronization symbol and, for example, 7 bits of data represented by a single data symbol. The data symbol may represent a station identification, a time, or any other suitable information. For example, a media monitoring site (MMS) may be programmed to detect the pre-existing code flag information in order to credit the station identified therein.
After the pre-existing code flag information has been generated (block 618), the process 600 selects a pre-existing code flag LUT (block 620) that identifies a code frequency representing the pre-existing code flag information. In one example, the pre-existing code tag LUT may be different from other LUTs used in the case of non-pre-existing codes. In one particular example, the pre-existing code mark synchronization symbols may be represented by code frequencies 220, 292, 364, 436, 508, 580, and 652.
After the pre-existing code flag information has been generated (block 618) and the pre-existing code flag LUT selected (block 620), the pre-existing code flag symbols are mapped to code frequencies (block 612), with the remainder of the processing following that described above.
At some time prior to the synthesis of the code signal (block 614), the process 600 performs a masking evaluation to determine the amplitudes at which the code signal should be generated so that it remains inaudible or substantially inaudible to a listener. To that end, the process 600 produces overlapping short blocks of audio samples, each containing 512 audio samples (block 622). As described above, these overlapping short blocks include 50% old samples and 50% newly received samples. This operation may be performed, for example, by the overlapping short block generator 240 of fig. 2.
After the overlapping short blocks are generated (block 622), a masking evaluation is performed on each short block (block 624). This masking evaluation may be performed, for example, as described in connection with block 242 of fig. 2. The results of this masking evaluation are used by the process 600 at block 614 to determine the amplitude of the code signal to be synthesized. The overlapping short block method produces two masking evaluations for a particular 256 audio samples (one when the 256 samples are the "new samples" and another when they are the "old samples"), and the result provided to block 614 of the process 600 may be a composite of those masking evaluations. Of course, the timing of the process 600 is such that the masking evaluations for a particular audio block are used to determine the code amplitude for that audio block.
Look-up table generation
The system 700 may be implemented using hardware, software, a combination of hardware and software, firmware, or the like, and is configured to populate one or more LUTs with code frequencies corresponding to symbols. The system 700 of fig. 7 may be used to generate any number of LUTs, such as the LUTs of figs. 3-5. The system 700, operating as described below in connection with figs. 7 and 8, generates a code frequency index LUT wherein: (1) any two symbols of the table share no more than one common frequency index, (2) no more than one of the frequency indices representing a symbol resides in any one critical band, as defined, for example, in the MPEG-AAC compression standard ISO/IEC 13818-7:1997, and (3) code frequencies in adjacent critical bands are not used to represent a single symbol. Criterion (3) helps to ensure that audio quality is not compromised during the audio encoding process.
The critical band pair qualifier 702 defines a plurality (P) of critical band pairs. For example, referring to fig. 9, the table 900 includes columns representing AAC critical band indices 902, the short block indices 904 within the ranges of those AAC indices, and the long block indices 906 within the ranges of those AAC indices. In one example, the value of P may be 7, thus forming 7 critical band pairs from the AAC indices (block 802). Fig. 10 shows the frequency relationships among these AAC indices. According to one example, as shown at reference numeral 1002 in fig. 10 (where the frequencies of each critical band pair are shown separated by dashed lines), the AAC indices may be selected in the following pairs: 5 and 6, 7 and 8, 9 and 10, 11 and 12, 13 and 14, 15 and 16, and 17 and 17. The AAC index 17 comprises a wide range of frequencies, so the index 17 is shown twice, once for its low frequency portion and once for its high frequency portion.
The frequency qualifier 704 defines the number of frequencies (N) selected for use in each critical band pair. In one example, the value of N is 16, meaning that there are 16 data locations in the combination of critical bands forming each critical band pair. As shown at reference numeral 1004 in fig. 10, 17 frequency locations are identified. The circled position 4 is reserved for synchronization information and is therefore not used for data.
The number generator 706 generates numbers having one digit for each of the critical band pairs defined by the critical band pair qualifier 702. In one example, the number generator 706 generates all N^P P-digit numbers. For example, if N is 16 and P is 7, the process yields the numbers 0 through 268435455; the process may be carried out in radix 16 (hexadecimal), in which case the values run from 0 through FFFFFFF.
A redundancy reducer 708 then eliminates from the list of generated numbers all numbers that share more than one common digit in the same position with another number. This ensures compliance with criterion (1) above, as these numbers represent the frequencies selected to represent the symbols, as described below. An excess reducer 710 may then further reduce the remaining numbers from the generated list to the number of symbols required. For example, if the symbol space is 129 symbols, the remaining numbers are reduced to a count of 129. The reduction may be performed randomly, by selecting the remaining numbers with the largest Euclidean distances, or by any other suitable data reduction technique. In another example, the reduction may be performed in a pseudo-random manner.
After the foregoing reduction, the count of the number list equals the number of symbols in the symbol space. Accordingly, the code frequency qualifier 712 interprets each remaining P-digit, radix-N number as the set of frequency indices representing a symbol in the critical band pairs. For example, referring to fig. 10, the hexadecimal number F1E4B0F has 7 digits, matching P, in radix 16, matching N. The first digit of the hexadecimal number maps to a frequency component in the first critical band pair, the second digit maps to the second critical band pair, and so on. Each digit represents a frequency index that will be used to represent the symbol corresponding to the hexadecimal number F1E4B0F.
Using the first hexadecimal digit as an example of mapping to a particular frequency index, the decimal value of Fh is 15. Since position 4 of each critical band pair is reserved for non-data information, any hexadecimal digit value greater than 4 is incremented by one. Thus, 15 becomes 16, and position 16 is designated (as indicated by the asterisk in fig. 10) as the code frequency component in the first critical band pair representing the symbol corresponding to the hexadecimal number F1E4B0F. Although not shown in fig. 10, the index 1 position (e.g., the second position from the leftmost position in critical band pair 7) will also be used in representing the hexadecimal number F1E4B0F.
LUT stuffer 714 receives these symbol indications and corresponding code frequency component indications from code frequency qualifier 712 and populates the information into an LUT.
An example code frequency index table generation process 800 is shown in fig. 8. The process 800 may be implemented using the system of fig. 7 or any other suitable configuration. The process 800 of fig. 8 may be used to generate any number of LUTs, such as the LUTs of figs. 3-5. Although one example process 800 is shown, other processes may be used. The result of the process 800 is a code frequency index LUT wherein: (1) any two symbols of the table share no more than one common frequency index, (2) no more than one of the frequency indices representing a symbol resides in any one critical band, as defined, for example, in the MPEG-AAC compression standard ISO/IEC 13818-7:1997, and (3) code frequencies in adjacent critical bands are not used to represent a single symbol. Criterion (3) helps to ensure that audio quality is not compromised during the audio encoding process.
The process 800 begins by defining a plurality (P) of critical band pairs. For example, referring to fig. 9, the table 900 includes columns representing AAC critical band indices 902, the short block indices 904 within the ranges of those AAC indices, and the long block indices 906 within the ranges of those AAC indices. In one example, the value of P may be 7, thus forming 7 critical band pairs from the AAC indices (block 802). Fig. 10 shows the frequency relationships among these AAC indices. According to one example, as shown at reference numeral 1002 in fig. 10 (where the frequencies of each critical band pair are shown separated by dashed lines), the AAC indices may be selected in the following pairs: 5 and 6, 7 and 8, 9 and 10, 11 and 12, 13 and 14, 15 and 16, and 17 and 17. The AAC index 17 comprises a wide range of frequencies, so the index 17 is shown twice, once for its low frequency portion and once for its high frequency portion.
After the band pairs have been defined (block 802), the number of frequencies (N) is selected for each critical band pair (block 804). In one example, the value of N is 16, meaning that there are 16 data locations in the combination of critical bands forming each critical band pair. As indicated by reference numeral 1004 in fig. 10, 17 frequency locations are shown. The circled position 4 is reserved for synchronization information and therefore this circled position 4 is not used for data.
After defining the number of critical band pairs and the number of frequency locations in those critical band pairs, the process 800 generates all N^P P-digit numbers that share no more than one common hexadecimal digit with one another (block 806). For example, if N is 16 and P is 7, the process considers the numbers 0 through 268435455 (0 through FFFFFFF when carried out in radix 16, i.e., hexadecimal), but excludes numbers that share more than one common hexadecimal digit in the same position with another retained number. This ensures compliance with criterion (1) above, as these numbers will represent the frequencies selected to represent the symbols, as described below.
According to an example process for determining a set of numbers that satisfies criterion (1) (and any other desired criteria), the numbers from 0 to N^P - 1 are tested. First, the value zero is stored as the first member of the result set R. Then, the numbers from 1 to N^P - 1 are analyzed in turn to determine whether they satisfy criterion (1) when compared with the members of R. Each number that satisfies criterion (1) when compared with all current entries in R is added to the result set. Specifically, according to this example process, to test a number K, each hexadecimal digit of interest in K is compared with the corresponding hexadecimal digit of interest in an entry M from the current result set. Across the 7 comparisons, no more than one hexadecimal digit of K should equal the corresponding hexadecimal digit of M. After K has been compared with all the numbers currently in the result set, K is added to the result set R if it shares no more than one common hexadecimal digit with each member of the set. The algorithm is repeated over the set of possible numbers until all values that satisfy criterion (1) have been identified.
Although an example process for determining a set of numbers that satisfies criterion (1) is described above, any process or algorithm may be used, and the invention is not limited to the above process. For example, the process may use heuristic rules or the like to remove numbers from the set before iterating over it. For example, all numbers whose relevant digits begin with two 0s, two 1s, two 2s, etc., and end with two 0s, two 1s, two 2s, etc., can be removed immediately, because such numbers must have a Hamming distance of less than 6 from one another. Additionally or alternatively, the example process need not iterate over the entire set of possible numbers. For example, the process may be repeated only until a sufficient quantity of numbers is found (e.g., 128 numbers when 128 symbols are desired). In another implementation, the process may randomly select a first value from the set of possible values and then search sequentially or randomly through the remaining numbers until a value is found that satisfies the desired criteria (e.g., criterion (1)).
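The sequential test described above can be sketched as a greedy search. The parameters below are scaled down (base 4, 3 digits) so the example runs quickly; the real search uses N = 16 and P = 7, for which a full scan of the 16^7 candidates is far slower. The function name is illustrative.

```python
def greedy_criterion_one(base, p_digits, want):
    """Walk candidates 0 .. base**p_digits - 1, keeping each number that
    matches every previously kept number in at most one digit position
    (criterion (1)); stop once `want` numbers have been collected."""
    def digits(x):
        return [(x // base**i) % base for i in range(p_digits)]

    kept = []
    for k in range(base ** p_digits):
        dk = digits(k)
        # Compare position-by-position against every number kept so far.
        if all(sum(a == b for a, b in zip(dk, digits(m))) <= 1 for m in kept):
            kept.append(k)
            if len(kept) == want:
                break
    return kept
```

Every pair of kept numbers then differs in at least p_digits - 1 digit positions, which is what lets the decoder distinguish symbols even when one code frequency is corrupted.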
The process 800 then selects the desired quantity of numbers from the generated values (block 810). For example, if the symbol space is 129 symbols, the remaining numbers are reduced to a count of 129. The reduction may be performed randomly, by selecting the remaining numbers with the largest Euclidean distances, or by any other suitable data reduction technique.
After the foregoing reduction, the count of the number list equals the number of symbols in the symbol space. Each remaining P-digit, radix-N number is thus defined to represent the frequency indices representing a symbol in the critical band pairs (block 812). For example, referring to fig. 10, the hexadecimal number F1E4B0F has 7 digits, matching P, in radix 16, matching N. The first digit of the hexadecimal number maps to a frequency component in the first critical band pair, the second digit maps to the second critical band pair, and so on. Each digit represents a frequency index that will be used to represent the symbol corresponding to the hexadecimal number F1E4B0F.
Using the first hexadecimal digit as an example of mapping to a particular frequency index, the decimal value of Fh is 15. Since position 4 of each critical band pair is reserved for non-data information, any hexadecimal digit value greater than 4 is incremented by one. Thus, 15 becomes 16, and position 16 is designated (as indicated by the asterisk in fig. 10) as the code frequency component in the first critical band pair representing the symbol corresponding to the hexadecimal number F1E4B0F. Although not shown in fig. 10, the index 1 position (e.g., the second position from the leftmost position in critical band pair 7) will also be used in representing the hexadecimal number F1E4B0F.
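The digit-to-position rule (skip the reserved synchronization slot) can be sketched as:

```python
RESERVED_POSITION = 4   # slot reserved for synchronization in each critical band pair

def digit_to_position(d):
    """Map a base-16 data digit (0-15) to one of the 17 frequency
    positions (0-16) in a critical band pair, skipping the reserved
    synchronization position 4."""
    return d if d < RESERVED_POSITION else d + 1
```

The 16 digit values thus land on positions 0-3 and 5-16, leaving position 4 free, and 0xF (decimal 15) maps to position 16 as in the F1E4B0F example.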
After assigning the representative code frequency (block 812), the numbers are populated into the LUT (block 814).
Of course, the systems and processes described in connection with fig. 8-10 are merely examples that may be used to produce LUTs with desired characteristics in connection with the encoding and decoding systems described herein. Other configurations and processes may be used.
Audio decoding
In general, the decoder 116 detects the code signal that was inserted into the received audio at the encoder 102 to form the encoded audio. That is, the decoder 116 looks for a pattern of emphasis in the code frequencies it processes. Once the decoder 116 has determined which code frequencies have been emphasized, the decoder 116 determines, based on those emphasized code frequencies, the symbols present within the encoded audio. The decoder 116 may record the symbols, or may decode the symbols back into the codes that were provided to the encoder 102 for insertion into the audio.
In one implementation, the example decoder 116 of fig. 11 may be implemented, for example, using a digital signal processor that is programmed with instructions to implement the components of the decoder 116. Of course, any other implementation of the example decoder 116 is possible. For example, the decoder 116 may be implemented using one or more processors, programmable logic devices, or any suitable combination of hardware, software, and firmware.
As shown in fig. 11, the example decoder 116 includes a sampler 1102, which may be implemented using an analog-to-digital converter (A/D) or any other suitable technique, and to which the encoded audio is provided in analog format. As shown in fig. 1, the encoded audio may be provided to the receiver 110 over a wired or wireless connection. The sampler 1102 samples the encoded audio at a sampling frequency of, for example, 8KHz. Of course, other sampling frequencies may be advantageously selected to increase resolution or to reduce the computational load during decoding. At the 8KHz sampling frequency, the Nyquist frequency is 4KHz, so the entire embedded code signal is preserved because its spectral content lies below the Nyquist frequency. The FFT long block length of 9216 samples at the 48KHz sampling rate becomes 1536 samples at the 8KHz sampling rate. Even with this reduced DFT block size, the code frequency indices remain the same as in the original encoding and range from 180 to 656.
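The reason the bin indices survive the rate change is that the DFT bin spacing is identical in both configurations; a two-line check:

```python
encoder_bin_hz = 48_000 / 9216   # long block at the encoder's 48 kHz rate
decoder_bin_hz = 8_000 / 1536    # long block at the decoder's 8 kHz rate

# Same spacing (about 5.208 Hz per bin), so index 656 still means roughly
# 3416.7 Hz, safely below the 4 kHz Nyquist limit of the 8 kHz decoder.
```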
The samples from the sampler 1102 are provided to a superimposer 1104. Generally, the superimposer 1104 emphasizes the code signal relative to the audio signal information by exploiting the fact that the message is repeated, or substantially repeated (i.e., only its least significant bits change), for a period of time. For example, when the 42 bits of data 226 in the message include a station identifier and a coarse timestamp that increments once every 64 seconds, the 42 bits (226 in fig. 2) of the 49 bits (226 and 224) of the previously described example message of fig. 2 remain constant for 64 seconds (32 of the 2-second message intervals). The variable data in the last 7-bit group 232 represents time increments in seconds and thus changes from message to message. The example superimposer 1104 aggregates multiple blocks of audio signal information to emphasize the code signal within them. In an example implementation, the superimposer 1104 includes a buffer to store multiple blocks of audio information. For example, if a complete message is embedded in 2 seconds of audio, the buffer may be 12 seconds long, storing 6 messages. The example superimposer 1104 additionally includes an adder to sum the audio signal information associated with the 6 messages and a divider to divide the sum by the number of repeated messages selected (e.g., 6).
By way of example, the watermarked signal y(t) may be represented as the sum of a host signal x(t) and a watermark w(t):
y(t)=x(t)+w(t)
in the time domain, the watermark may be repeated after a known period T:
w(t)=w(t-T)
According to an example superposition method, the input signal y(t) is replaced by a superimposed signal s(t) formed by averaging n+1 message-length segments:

s(t) = (y(t) + y(t-T) + ... + y(t-nT))/(n+1)
In the superimposed signal s(t), if the period T is sufficiently large, the contribution of the host signal decreases because the sample values x(t), x(t-T), ..., x(t-nT) are independent. At the same time, the contribution of the watermark, which consists, for example, of in-phase sinusoids, increases relative to it.
Assuming that x(t), x(t-T), ..., x(t-nT) are independent random variables drawn from the same distribution X with zero mean E[X] = 0, we have:

E[(x(t) + x(t-T) + ... + x(t-nT))/(n+1)] = 0

and

Var[(x(t) + x(t-T) + ... + x(t-nT))/(n+1)] = Var(X)/(n+1)
Thus, the underlying host signal contributions x(t), x(t-T), ..., x(t-nT) effectively cancel one another, while the watermark is unchanged, so that the watermark can be more easily detected.
In the example shown, the power of the host signal contribution in the resulting signal decreases in proportion to 1/(n+1) as the number n of superimposed segments grows. Thus, averaging over multiple independent portions of the host signal reduces the effect of the interfering audio. The watermark is unaffected because it is always added in phase.
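The effect of the superposition can be demonstrated numerically. The sketch below stacks 6 message-length segments of a noisy watermarked signal; the host-noise variance shrinks toward Var(X)/(n+1) while the repeated watermark survives intact. The signal model (sinusoidal watermark, Gaussian host) is only illustrative.

```python
import numpy as np

def stack(y, period, n_copies):
    """Average n_copies consecutive message-length segments of y: the
    watermark repeats with period `period` and adds in phase, while the
    independent host contributions average toward zero."""
    segments = [y[k * period:(k + 1) * period] for k in range(n_copies)]
    return np.mean(segments, axis=0)

rng = np.random.default_rng(0)
period, n_copies = 1000, 6
w = 0.1 * np.sin(2 * np.pi * 5 * np.arange(period) / period)  # repeated watermark
x = rng.normal(0.0, 1.0, period * n_copies)                   # zero-mean host "audio"
s = stack(np.tile(w, n_copies) + x, period, n_copies)

residual_var = np.var(s - w)   # host power remaining after stacking
```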
An example process for implementing the superimposer 1104 is described in conjunction with fig. 12.
The decoder 116 may additionally include a superimposer controller 1106 to control the operation of the superimposer 1104. The example superimposer controller 1106 receives a signal indicating whether the superimposer 1104 should be enabled or disabled. For example, the superimposer controller 1106 may receive the received audio signal, determine whether the signal includes significant noise that would distort it, and, in response to that determination, cause the superimposer to be enabled. In another implementation, the superimposer controller 1106 may receive a signal from a switch that can be manually set to enable or disable the superimposer 1104 based on the placement of the decoder 116. For example, when the decoder 116 is wired to the receiver 110, or when the microphone 120 is placed in close proximity to the speaker 114, the superimposer controller 1106 may disable the superimposer 1104, because superposition would not be needed and would corrupt the rapidly changing data (e.g., the least significant bits of the timestamp) in the respective messages. Alternatively, the superimposer 1104 may be enabled by the superimposer controller 1106 when the decoder 116 is located at a distance from the speaker 114 or in another environment where significant interference may be expected. Of course, any type of desired control may be applied by the superimposer controller 1106.
The output of the superimposer 1104 is provided to a time-domain to frequency-domain converter 1108. The time-domain to frequency-domain converter 1108 may be implemented using a discrete Fourier transform (DFT) or any other suitable technique for converting time-based information to frequency-based information. In one example, the time-domain to frequency-domain converter 1108 may be implemented using a sliding long-block fast Fourier transform (FFT) in which the spectrum of the code frequencies of interest is calculated each time 8 new samples are provided to the example time-domain to frequency-domain converter 1108. In one example, the time-domain to frequency-domain converter 1108 determines the spectrum from 1536 samples of encoded audio, sliding 8 samples at a time over 192 slides. The resolution of the spectrum generated by the time-domain to frequency-domain converter 1108 increases as the number of samples used to generate the spectrum increases. Thus, the number of samples processed by the time-domain to frequency-domain converter 1108 should match the resolution used to select the indices in the tables of figs. 3-5.
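The long-block transform can be sketched as follows, assuming the 1536-sample block length and 8 kHz sampling rate given in the text; the particular code frequency used here is made up for illustration:

```python
import numpy as np

N = 1536         # samples per long block (from the text)
RATE = 8000      # assumed sampling rate in Hz

def code_spectrum(samples, code_bins):
    """Return the DFT magnitudes at the code-frequency bins of a block."""
    block = np.asarray(samples[-N:], dtype=float)  # most recent N samples
    return np.abs(np.fft.rfft(block))[code_bins]

# With N = 1536 at 8 kHz, each bin spans 8000 / 1536 ~= 5.2 Hz, so a code
# frequency f lands in bin round(f * N / RATE).
f = 1000.0
k = round(f * N / RATE)             # bin 192 for a 1 kHz tone
audio = np.sin(2 * np.pi * f * np.arange(N) / RATE)
mags = code_spectrum(audio, np.array([k]))
# A pure tone on an exact bin yields a magnitude of about N / 2.
assert mags[0] > 0.9 * N / 2
```

A production decoder would update such a spectrum incrementally as each group of 8 samples arrives (a sliding FFT) rather than recomputing the full transform, but the bin arithmetic is the same.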
The spectrum produced by the time-domain to frequency-domain converter 1108 is passed to a critical band normalizer 1110, which normalizes the spectrum within each critical band. In other words, the frequency with the largest amplitude in each critical band is set to 1, and all other frequencies within each critical band are normalized accordingly. For example, if critical band 1 includes frequencies having amplitudes 112, 56, and 56, the critical band normalizer 1110 adjusts those amplitudes to 1, 0.5, and 0.5. Of course, any desired maximum value may be used instead of 1 for this normalization. The critical band normalizer 1110 outputs a normalized score for each frequency of interest.
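The per-band normalization can be sketched directly from the example in the text. The band layout below (each band holding a few code-frequency amplitudes) is a made-up example, not the actual critical-band definitions of the system:

```python
import numpy as np

def normalize_bands(bands):
    """Scale each critical band so its largest amplitude becomes 1."""
    return [np.asarray(b, dtype=float) / max(b) for b in bands]

bands = [[112.0, 56.0, 56.0], [40.0, 80.0, 20.0]]
norm = normalize_bands(bands)

# The 112/56/56 band from the text becomes 1.0/0.5/0.5.
assert np.allclose(norm[0], [1.0, 0.5, 0.5])
assert np.allclose(norm[1], [0.5, 1.0, 0.25])
```

Normalizing each band independently makes the later symbol scores insensitive to the overall loudness of the audio: only the relative prominence of each code frequency within its band matters.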
The spectrum of scores produced by the critical band normalizer 1110 is passed to a symbol scorer 1112, which computes an overall score for each possible symbol in the table of valid symbols. In an example implementation, for each symbol in the symbol table, the symbol scorer 1112 sums the normalized scores from the critical band normalizer 1110 at each frequency of interest for that symbol to produce the score for that symbol. The symbol scorer 1112 outputs the scores for the respective symbols to a maximum score selector 1114, which selects the symbol having the maximum score and outputs that symbol and its score.
The identified symbol and score from the maximum score selector 1114 are passed to a comparator 1116, which compares the score to a threshold. When the score exceeds the threshold, the comparator 1116 outputs the received symbol. When the score does not exceed the threshold, the comparator 1116 outputs an error indication. For example, when the score does not exceed the threshold, the comparator 1116 may output a symbol indicating an error (e.g., a symbol that is not included in the valid symbol table). Thus, an error indication is provided when the message has been corrupted such that no sufficiently large score (i.e., a score exceeding the threshold) was calculated for any symbol. In an example implementation, the error indication may be provided to the superimposer controller 1106 so that the superimposer 1104 is enabled when a threshold number of errors (e.g., a number of errors over a period of time, a number of consecutive errors, etc.) is identified.
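The scoring, selection, and thresholding steps above can be combined in one short sketch. The symbol table (mapping each symbol to one code-frequency index per critical band) and the threshold value are illustrative assumptions:

```python
def best_symbol(norm_scores, symbol_table, threshold):
    """Return the highest-scoring symbol, or None to signal an error."""
    scores = {sym: sum(norm_scores[i] for i in bins)
              for sym, bins in symbol_table.items()}
    sym = max(scores, key=scores.get)          # maximum score selector
    return sym if scores[sym] > threshold else None  # comparator

norm_scores = [1.0, 0.2, 0.9, 0.3, 1.0, 0.1]   # per-frequency normalized scores
table = {0: (0, 2, 4), 1: (1, 3, 5)}            # symbol -> frequency indices

# Symbol 0 scores 1.0 + 0.9 + 1.0 = 2.9 and clears a 2.5 threshold,
# but with a 3.5 threshold an error indication (None) is produced.
assert best_symbol(norm_scores, table, threshold=2.5) == 0
assert best_symbol(norm_scores, table, threshold=3.5) is None
```

Returning a sentinel for sub-threshold scores mirrors the decoder's use of an out-of-table symbol as an error indication, which downstream stages can count to decide when to enable the superimposer.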
The identified symbol or error from the comparator 1116 is passed to both a circular buffer 1118 and a pre-existing code flag circular buffer 1120. An example implementation of the circular buffer 1118 is described in connection with fig. 15. The example circular buffer 1118 includes one circular buffer for each slide of the time-domain to frequency-domain converter 1108 (e.g., 192 buffers). Each of the circular buffers 1118 includes one storage location for each symbol of the synchronization symbol and the message (e.g., an 8-block message will be stored in an 8-location circular buffer), so that an entire message can be stored in each circular buffer. Thus, as the audio samples are processed by the time-domain to frequency-domain converter 1108, identified symbols are stored in the same location of each circular buffer until that location in each circular buffer has been filled; symbols are then stored in the next location in each circular buffer. In addition to storing symbols, the circular buffers 1118 may include a location in each circular buffer for storing a sample index that identifies the sample in the received audio signal that resulted in the identified symbol.
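One of these per-slide circular buffers can be sketched with a fixed-length deque. The 8-slot size (synchronization symbol plus 7 message symbols) follows the example above; the symbol values and sample indices are illustrative:

```python
from collections import deque

class SymbolRing:
    """One per-slide circular buffer of (symbol, sample_index) pairs."""

    def __init__(self, slots=8):
        self.buf = deque(maxlen=slots)  # oldest entry drops off automatically

    def push(self, symbol, sample_index):
        self.buf.append((symbol, sample_index))

    def symbols(self):
        return [s for s, _ in self.buf]

ring = SymbolRing()
for i, sym in enumerate([128, 57, 22, 111, 37, 23, 47, 0, 99]):
    ring.push(sym, 1536 + 8 * i)

# After 9 pushes into 8 slots, the oldest symbol (128) has been overwritten.
assert ring.symbols() == [57, 22, 111, 37, 23, 47, 0, 99]
```

The real decoder maintains one such buffer per sliding offset (e.g., 192 of them) so that a message can be recovered no matter where its boundary falls relative to the FFT slides.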
The example pre-existing code flag circular buffer 1120 is implemented in the same manner as the circular buffer 1118, except that the pre-existing code flag circular buffer 1120 includes one location for the pre-existing code flag synchronization symbol and one location for each symbol in the pre-existing code flag message (e.g., a pre-existing code flag synchronization symbol followed by one message symbol will be stored in a two-location circular buffer). The pre-existing code flag circular buffer 1120 is filled at the same time and in the same manner as the circular buffer 1118.
The example message identifier 1122 analyzes the circular buffer 1118 and the pre-existing code flag circular buffer 1120 for synchronization symbols. For example, the message identifier 1122 searches the circular buffer 1118 for a synchronization symbol and searches the pre-existing code flag circular buffer 1120 for a pre-existing code flag synchronization symbol. When a synchronization symbol is identified, the symbols following it (e.g., the 7 symbols following the synchronization symbol in the circular buffer 1118, or the one symbol following the pre-existing code flag synchronization symbol in the pre-existing code flag circular buffer 1120) are output by the message identifier 1122. In addition, a sample index identifying the last audio signal sample processed is output.
The message symbols and sample indices output by the message identifier 1122 are passed to a validator 1124, which validates the individual messages. The validator 1124 includes a filter stack that stores a plurality of consecutively received messages. Because messages are repeated (e.g., every 2 seconds, or every 16000 samples at 8 kHz), each message is compared to the other messages in the filter stack that are separated by approximately the number of audio samples in a single message to determine whether there is a match. If there is a match or an approximate match, both messages are validated. If a message cannot be validated, the message is determined to be in error and is not output from the validator 1124. Where a message may be affected by noise interference, a message may be considered a match when a subset of its symbols matches the same subset in another already-validated message. For example, a message may be identified as partially validated if 4 of the 7 symbols in the message match the same 4 symbols in another message that has already been validated. The sequence of repeated messages can then be observed to resolve the non-matching symbols in the partially validated message.
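The two validation checks described above — a partial symbol match and a message spacing near a multiple of the repetition period — can be sketched as follows. The 16000-sample period and 4-of-7 match rule come from the text; the drift tolerance is an assumption:

```python
PERIOD = 16000      # samples between message repetitions (from the text)
TOLERANCE = 160     # allowed drift in samples (illustrative assumption)

def symbols_match(a, b, needed=4):
    """True when at least `needed` symbol positions agree (4 of 7 here)."""
    return sum(x == y for x, y in zip(a, b)) >= needed

def spacing_ok(idx_a, idx_b):
    """True when the sample gap is roughly a multiple of the period."""
    gap = abs(idx_a - idx_b)
    return min(gap % PERIOD, PERIOD - gap % PERIOD) <= TOLERANCE

msg_a = ([57, 22, 111, 37, 23, 47, 0], 35672)
msg_b = ([57, 22, 111, 99, 23, 47, 0], 51670)   # 6 of 7 symbols agree

assert symbols_match(msg_a[0], msg_b[0])
assert spacing_ok(msg_a[1], msg_b[1])            # gap of 15998 ~= one period
```

Requiring both conditions keeps a chance symbol coincidence from validating a message: the match must also recur at the cadence at which the encoder actually repeats messages.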
The validated message from validator 1124 is passed to a symbol to bit converter 1126, which symbol to bit converter 1126 uses a valid symbol table to translate the individual symbols into corresponding data bits of the message.
An example decoding process 1200 is shown in fig. 12. The example process 1200 may be performed by the example decoder 116 shown in fig. 11 or by any other suitable decoder. The example process 1200 begins by sampling audio (block 1202). The audio may be obtained via an audio sensor, a hardwired connection, via an audio file, or by any other suitable technique. As described above, the sampling may be performed at 8000Hz or any other suitable frequency.
As each sample is obtained, the samples are stacked by a superimposer, such as the example superimposer 1104 of fig. 11 (block 1204). An example process for performing this superimposition is described in connection with fig. 13.
The newly superimposed audio samples from the superimposer process 1204 are inserted into a buffer and the oldest audio samples are removed (block 1206). As individual samples are obtained, a sliding time-domain to frequency-domain conversion is performed on a set of samples comprising a number of old samples and the newly added samples obtained at blocks 1202 and 1204 (block 1208). In one example, a sliding FFT may be used to process streaming input samples comprising 9215 old samples and one newly added sample. In one example, an FFT of 9216 samples is used to obtain a spectrum with a resolution of 5.2 Hz.
After the spectrum is obtained by the time-to-frequency conversion (block 1208), the transmitted symbols are determined (block 1210). An example process for determining transmitted symbols is described in connection with fig. 14.
After identifying the transmitted message (block 1210), post-buffering processing is performed to identify synchronization symbols and corresponding message symbols (block 1212). An example process for performing post-processing is described in connection with FIG. 15.
After post-processing is performed to identify the transmitted message (block 1212), message validation is performed to verify the validity of the message (block 1214). An example process for performing this message authentication is described in connection with FIG. 18.
After the message has been validated (block 1214), the message is converted from symbols to bits using the valid symbol table (block 1216). Control then returns to block 1206 to process the next set of samples.
FIG. 13 illustrates an example process for superimposing audio signal samples to emphasize the encoded code signal, to implement the superimposition process 1204 of FIG. 12. This example process may be performed by the superimposer 1104 and the superimposer controller 1106 of fig. 11. The example process begins by determining whether the superimposer is enabled (block 1302). When the superimposer is not enabled, no superimposition occurs, the process of FIG. 13 ends, and control returns to block 1206 of FIG. 12 to process the audio signal samples without superimposition.
When the superimposer is enabled, the newly received audio samples are pushed into a buffer and the oldest samples are pushed out (block 1304). The buffer stores a plurality of samples. For example, when a particular message is repeatedly encoded in an audio signal every 2 seconds and the encoded audio is sampled at 8 kHz, each message will repeat every 16000 samples, so the buffer will store some multiple of 16000 samples (e.g., a 96000-sample buffer may store 6 messages). Next, the superimposer 1104 selects substantially equal blocks of samples in the buffer (block 1306). These blocks of samples are then summed (block 1308). For example, sample 1 is added to samples 16001, 32001, 48001, 64001, and 80001; sample 2 is added to samples 16002, 32002, 48002, 64002, and 80002; and sample 16000 is added to samples 32000, 48000, 64000, 80000, and 96000.
After the audio signal samples in the buffer are summed, the resulting sequence is divided by the number of selected blocks (e.g., 6 blocks) to calculate an averaged sequence of samples (e.g., 16000 averaged samples) (block 1310). The resulting averaged sequence of samples is output by the superimposer (block 1312). The process of FIG. 13 then ends, and control returns to block 1206 of FIG. 12.
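The buffer-averaging step above reduces to a reshape-sum-divide over the sample buffer. The 96000-sample buffer and 6-block layout follow the example in the text:

```python
import numpy as np

MSG = 16000     # samples per message repetition (from the text)
BLOCKS = 6      # message repetitions held in the buffer

def stack_buffer(buf):
    """Average the buffer's BLOCKS equal segments into one MSG-long block."""
    buf = np.asarray(buf, dtype=float)
    return buf.reshape(BLOCKS, MSG).sum(axis=0) / BLOCKS

# Six identical blocks average back to a single copy of the block,
# i.e., sample i is combined with samples i + 16000k for k = 1..5.
buf = np.tile(np.arange(MSG, dtype=float), BLOCKS)
out = stack_buffer(buf)
assert out.shape == (MSG,)
assert np.allclose(out, np.arange(MSG))
```

In practice the six segments differ only in their host-signal content, so this average preserves the repeated watermark while attenuating everything that does not repeat at the message period.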
Fig. 14 illustrates an example process for implementing the symbol determination process 1210 after the received audio signal has been converted to the frequency domain. The example process of fig. 14 may be performed by the decoder 116 of fig. 1 and 11. The example process of FIG. 14 begins by normalizing the code frequencies in the various critical bands (block 1402). For example, the code frequencies may be normalized such that the frequency with the largest amplitude is set to 1 and all other frequencies in the critical band are adjusted accordingly. In the example decoder 116 of fig. 11, this normalization is performed by a critical band normalizer 1110.
After the frequencies of interest have been normalized (block 1402), the example symbol scorer 1112 selects an appropriate symbol table based on the previously determined synchronization (block 1404). For example, the system may include two symbol tables: one for normal synchronization and another for pre-existing code flag synchronization. Alternatively, the system may include a single symbol table or may include multiple symbol tables identified by synchronization symbols (e.g., cross-table synchronization symbols). The symbol scorer 1112 then calculates a symbol score for each symbol in the selected symbol table (block 1406). For example, the symbol scorer 1112 may iterate over each symbol in the symbol table and sum the normalized scores at each frequency of interest for that symbol to calculate its symbol score.
After the individual symbols are scored (block 1406), the example maximum score selector 1114 selects the symbol with the maximum score (block 1408). The example comparator 1116 then determines whether the score for the selected symbol exceeds a maximum score threshold (block 1410). When the score does not exceed the threshold, an error indication is stored in the circular buffers (e.g., the circular buffer 1118 and the pre-existing code flag circular buffer 1120) (block 1412). The process of FIG. 14 is then complete, and control returns to block 1212 of FIG. 12.
When the score exceeds the maximum score threshold (block 1410), the identified symbol is stored in the circular buffers (e.g., the circular buffer 1118 and the pre-existing code flag circular buffer 1120) (block 1414). The process of FIG. 14 is then complete, and control returns to block 1212 of FIG. 12.
FIG. 15 illustrates an example process for implementing the post-buffer process 1212 of FIG. 12. The example process of fig. 15 begins when the message identifier 1122 of fig. 11 searches the circular buffer 1118 and the pre-existing code flag circular buffer 1120 for a synchronization indication (block 1502).
For example, fig. 16 illustrates an example implementation of the circular buffer 1118, and fig. 17 illustrates an example implementation of the pre-existing code flag circular buffer 1120. In the example shown in fig. 16, the last position filled in the circular buffers is position 3, indicated by an arrow. Accordingly, the sample index indicates the position in the audio signal samples that resulted in the symbols stored in position 3. Because the row corresponding to sliding index 37 is a circular buffer, the successively identified symbols are 128, 57, 22, 111, 37, 23, 47, and 0. Because 128 in the illustrated example is a synchronization symbol, the message can be identified as the symbols following the synchronization symbol. The message identifier 1122 will wait until the 7 symbols following the synchronization symbol at sliding index 37 have been stored before outputting the message.
The pre-existing code flag circular buffer 1120 of fig. 17 includes two locations in each circular buffer because the pre-existing code flag message in the illustrated example includes one pre-existing code flag synchronization symbol (e.g., symbol 254) followed by a single message symbol. According to the illustrated example of fig. 2, the pre-existing code flag data block 230 is embedded in the two long blocks following the 7-bit timestamp long block 228. Thus, because there are two long blocks of pre-existing code flag data and each long block in the illustrated example is 1536 samples at a sampling rate of 8 kHz, a pre-existing code flag data symbol will be identified 3072 samples after the original message in the pre-existing code flag circular buffer. In the example shown in fig. 17, sliding index 37 corresponds to sample index 38744, which is 3072 samples later than sample index 35672 at sliding index 37 of fig. 16. Thus, the pre-existing code flag data symbol 68 can be determined to correspond to the message at sliding index 37 of fig. 16, indicating that the message at sliding index 37 of fig. 16 is the original encoded message (e.g., identifying the original broadcaster of the audio) and that the symbol at sliding index 37 of fig. 17 is the pre-existing code flag message (e.g., identifying the rebroadcaster of the audio).
Returning to fig. 15, upon detection of a synchronization or pre-existing code flag synchronization symbol, the messages in the circular buffer 1118 or the pre-existing code flag circular buffer 1120 are condensed to eliminate redundancy. For example, as shown in fig. 16, the same message is identified in the audio data over a period of time (sliding indices 37-39 contain the same message) due to the duration of the sliding time-domain to frequency-domain conversion and of the encoding of each message. Identical messages at consecutive sliding indices may be reduced to a single message because they represent only one encoded message. Alternatively, if desired, the condensing may be omitted and every message may be output. The message identifier 1122 then stores the condensed message in a filter stack associated with the validator 1124 (block 1506). The process of FIG. 15 then ends, and control returns to block 1214 of FIG. 12.
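The condensing step above amounts to collapsing runs of identical messages at consecutive sliding indices. A minimal sketch, with illustrative message tuples:

```python
def condense(messages):
    """Collapse consecutive duplicate messages into a single entry."""
    out = []
    for msg in messages:
        if not out or out[-1] != msg:
            out.append(msg)
    return out

# Three consecutive sliding indices decoded the same message, as in the
# fig. 16 example, followed by a different message.
decoded = [(57, 22, 111), (57, 22, 111), (57, 22, 111), (4, 8, 15)]
assert condense(decoded) == [(57, 22, 111), (4, 8, 15)]
```

Only runs are collapsed: if the same message legitimately recurs later (at the next 2-second repetition), it is kept as a separate entry for the validator to match against.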
Fig. 18 illustrates an example process for implementing the message validation process 1214 of fig. 12. This example process may be performed by the validator 1124 of FIG. 11. The example process of FIG. 18 begins when the validator 1124 reads the top message in the filter stack (block 1802).
For example, FIG. 19 illustrates an example implementation of the filter stack. The example filter stack includes a message index, 7 symbol positions for each message index, a sample index identifier, and a validation flag for each message index. New messages are added at message index M7, and the message at position M0 is the top message read at block 1802 of fig. 18. Because of sampling rate variations and variations in where message boundaries are identified, when messages are repeated every 16000 samples, matching messages are expected to be separated by approximately a multiple of 16000 samples.
Returning to FIG. 18, after selecting the top message in the filter stack (block 1802), the validator 1124 determines whether the validation flag indicates that the message has been previously validated (block 1804). For example, fig. 19 indicates that message M0 has been validated. When the message has been previously validated, the validator 1124 outputs the message (block 1812) and control proceeds to block 1816.
When the message has not been previously validated (block 1804), the validator 1124 determines whether there is another suitably matching message in the filter stack (block 1806). Messages may be suitably matched when they are identical, when a threshold number of message symbols match (e.g., 4 of 7 symbols), or when any other error-tolerant comparison indicates that two messages are similar enough to conclude that they are the same. According to the illustrated example, a message may be only partially validated against another message that has already been validated. When a suitable match is not identified, control proceeds to block 1814.
When a suitable match is identified, the validator 1124 determines whether the interval (e.g., in samples) between the matching messages is appropriate (block 1808). For example, when a message is repeated every 16000 samples, it is determined whether the interval between the two suitably matched messages is approximately a multiple of 16000 samples. When the interval is not appropriate, control proceeds to block 1814.
When the interval is appropriate (block 1808), the validator 1124 validates both messages by setting a validation flag for each message (block 1810). When a message has been fully validated (e.g., an exact match), the flag may indicate that the message is fully validated (e.g., the validated message in fig. 19). When a message is only partially validated (e.g., only 4 of 7 symbols match), the message is flagged as partially validated (e.g., the partially validated message in fig. 19). The validator 1124 then outputs the top message (block 1812), and control proceeds to block 1816.
When it is determined that there is no suitable match for the top message (block 1806) or that the interval between the suitable match(es) is not appropriate (block 1808), the top message is not validated (block 1814). Unvalidated messages are not output from the validator 1124.
After the message has been left unvalidated (block 1814) or the top message has been output (block 1812), the validator 1124 pops the filter stack to remove the top message (block 1816). Control then returns to block 1802 to process the next message at the top of the filter stack.
Although example manners of implementing any or all of the example encoder 102 and the example decoder 116 have been illustrated and described above, one or more of the data structures, elements, processes and/or devices illustrated in the figures and described above may be combined, divided, rearranged, omitted, eliminated and/or implemented in any other manner. Moreover, the example encoder 102 and the example decoder 116 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, the example encoder 102 and the example decoder 116 may be implemented by one or more circuits, programmable processors, application specific integrated circuits (ASICs), programmable logic devices (PLDs), field programmable logic devices (FPLDs), etc. For example, the decoder 116 may be implemented in software on a platform device such as a mobile phone. If any of the appended claims is read to cover a purely software implementation, at least one of the preceding code detector 204, the example message generator 210, the symbol selector 212, the code frequency selector 214, the synthesizer 216, the inverse FFT 218, the mixer 220, the overlapping short block generator 240, the masking evaluator 242, the critical band pair qualifier 702, the frequency qualifier 704, the number generator 706, the redundancy reducer 708, the over-scalar reducer 710, the code frequency qualifier 712, the LUT populator 714, the sampler 1102, the superimposer 1104, the superimposer controller 1106, the time-domain to frequency-domain converter 1108, the critical band normalizer 1110, the symbol scorer 1112, the maximum score selector 1114, the comparator 1116, the circular buffer 1118, the pre-existing code flag circular buffer 1120, the message identifier 1122, the validator 1124, and the symbol-to-bit converter 1126 is hereby expressly defined to include a tangible medium such as a memory, a DVD, a CD, etc., storing the software and/or firmware.
Also, the example encoder 102 and the example decoder 116 may include data structures, elements, processes and/or means in place of, or in addition to, those shown in the figures and described above, and/or may include more than one of any or all of the illustrated data structures, elements, processes and/or means.
Fig. 20 is a schematic diagram of an example processor platform 2000, which example processor platform 2000 may be used and/or programmed to implement any or all of the example encoder 102 and decoder 116, and/or any other components described herein. For example, processor platform 2000 may be implemented by one or more general-purpose processors, processor cores, microcontrollers, etc. In addition, the processor platform 2000 may be implemented as part of a device having other functionality. For example, the processor platform 2000 may be implemented using processing capabilities provided in a mobile phone or any other handheld device.
The processor platform 2000 in the example of fig. 20 includes at least one general purpose programmable processor 2005. The processor 2005 executes encoded instructions 2010 and/or 2012 present in main memory of the processor 2005 (e.g., within RAM 2015 and/or ROM 2020). The processor 2005 can be any type of processing unit, such as a processor core, a processor, and/or a microcontroller. Further, the processor 2005 may execute example machine accessible instructions that implement the processes described herein. The processor 2005 communicates with main memory (including ROM 2020 and/or RAM 2015) via a bus 2025. The RAM 2015 may be implemented by DRAM, SDRAM, and/or any other type of RAM device, and the ROM may be implemented by flash memory and/or any other desired type of memory device. Access to the memories 2015 and 2020 may be controlled by a memory controller (not shown).
The processor platform 2000 also includes interface circuitry 2030. The interface circuit 2030 may be implemented by any type of interface standard such as a USB interface, a Bluetooth interface, an external memory interface, a serial port, a general input/output, and the like. One or more input devices 2035 and one or more output devices 2040 are connected to the interface circuit 2030.
Although certain example apparatus, methods, and articles of manufacture have been described herein, other implementations are possible. The scope of coverage of this patent is not limited to the specific examples described herein. On the contrary, this patent covers all apparatus, methods, and articles of manufacture fairly falling within the scope of the invention.

Claims (6)

1. A method of transforming media to include encoding, the method comprising the steps of:
detecting a first encoded identification code in the received audio samples, the first encoded identification code having been encoded based on a first lookup table;
in response to the detecting, generating a pre-existing code mark comprising a pre-existing code mark synchronization symbol and a second encoded identification code, the pre-existing code mark encoded based on a second lookup table, wherein the pre-existing code mark synchronization symbol signals a start of the pre-existing code mark;
encoding the pre-existing code label information in the audio samples to transform the audio samples into encoded audio samples including the first encoded identification code and the pre-existing code label; and
the encoded audio samples are stored in tangible memory.
2. The method of claim 1, wherein the second encoded identification code is encoded by:
identifying a set of frequencies corresponding to the second encoded identification code; and
frequencies in the set of frequencies are emphasized.
3. The method of claim 2, wherein the step of emphasizing the frequency comprises the steps of:
generating a code signal having an emphasized frequency from the set of frequencies; and
adding the code signal to the audio samples.
4. The method of claim 1,
wherein the first encoded identification code identifies a first media publisher that broadcasts media of the audio sample at a first time and the second encoded identification code identifies a second media publisher that broadcasts the media at a second time later than the first time.
5. An apparatus for transforming media content to include encoding, the apparatus comprising:
a pre-code detector for detecting a first encoded identification code in the received audio samples, the first encoded identification code having been encoded based on a first look-up table;
a code frequency selector for identifying a set of frequencies corresponding to the second encoded identification code;
a code signal synthesizer for generating a pre-existing code mark in response to the detection, the pre-existing code mark including a pre-existing code mark synchronization symbol and the second encoded identification code, the pre-existing code mark being generated from a code signal having an amplified frequency from a set of frequencies, the pre-existing code mark synchronization symbol informing of a start of the pre-existing code mark; and
a mixer for combining a pre-existing code marker with the audio sample to transform the audio sample into an encoded audio sample including the first encoded identification code and the pre-existing code marker, the mixer storing the encoded audio sample in a tangible memory.
6. The apparatus of claim 5, wherein the first encoded identification code identifies a first media publisher that broadcasts media of the audio samples at a first time, and the second encoded identification code identifies a second media publisher that broadcasts the media at a second time later than the first time.
HK12105179.8A 2008-10-24 2009-10-22 Methods and apparatus to perform audio watermarking and watermark detection and extraction HK1164565B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US10838008P 2008-10-24 2008-10-24
US61/108,380 2008-10-24
US17470809P 2009-05-01 2009-05-01
US61/174,708 2009-05-01
US12/464,811 US9667365B2 (en) 2008-10-24 2009-05-12 Methods and apparatus to perform audio watermarking and watermark detection and extraction
US12/464,811 2009-05-12
PCT/US2009/061749 WO2010048458A2 (en) 2008-10-24 2009-10-22 Methods and apparatus to perform audio watermarking and watermark detection and extraction

Publications (2)

Publication Number Publication Date
HK1164565A1 HK1164565A1 (en) 2012-09-21
HK1164565B true HK1164565B (en) 2016-03-18
