METHODS AND APPARATUS TO PERFORM SPREAD SPECTRUM ENCODING AND DECODING FOR BROADCAST APPLICATIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/458,511 filed March 28, 2003, which is incorporated herein by reference.
TECHNICAL FIELD
[0002] The present disclosure pertains to broadcasting and, more particularly, to methods and apparatus to perform spread spectrum encoding and decoding for broadcast applications.
BACKGROUND
[0003] Audience metering systems uniquely identify programs transmitted by a television or radio station or other similar entity. In audience metering, the objective is to detect audio and video identifiers in homes that are part of the metered sample. The identifiers may be digital codes of 12 to 18 bits in length. One way to perform program identification is to embed a unique identifier in a program audio stream in such a manner that the presence of the identifier is imperceptible to the human ear. For example, audio codes may be inserted at a broadcast facility just prior to program transmission. Alternatively, a program originator may insert audio codes in the programming during a post-production phase.
[0004] In the metered home, a microphone and/or electrical leads connected to an audio output line of a television set and/or cable box is used to obtain the audio signal in the vicinity of a television set. Processing of the audio by suitable hardware, and extraction of the embedded codes, enables identification of the program being displayed by the television set at any given instant of time. The extracted codes, along with other information, are generally transmitted to a central office where the extracted codes are used in performing statistical analysis and report generation.
[0005] In recent years, several methods for inserting audio codes, one type of which is referred to as a watermark, have been developed. In the case of watermarks, the primary objective is to identify and authenticate the ownership of audio or video content. The watermark serves as a means of tracking illegal copies of copyrighted content. Watermarks are expected to be robust enough to be detectable even after the audio or video content has been subjected to several cycles of processing such as compression, analog-to-digital conversion and/or reproduction from tape.
[0006] One technique of audio encoding based on modification of selected spectral components such that the components have the highest or lowest power in their neighborhood of frequency bins is disclosed in U.S. Patent 6,272,176, "Broadcast Encoding System and Method." Additionally, there are many other audio watermarking methods based on spectral modifications.
[0007] Other techniques for watermarking audio include phase coding, echo hiding and spread spectrum. In phase coding, K consecutive blocks of digitized audio $b_0, b_1, \ldots, b_K$, each containing N samples, are processed by means of a Fourier Transform to obtain the spectral amplitude and phase of each frequency in each block. The phase angles for the very first block are modified to correspond to certain predetermined values that represent either a binary 1 or a binary 0. To maintain audio fidelity, the phase angles of the frequencies in the subsequent blocks are modified such that the relative phase shift for each frequency with respect to the same frequency in the previous block remains unaltered. During detection, the unique phase angles associated with the first block are recognized.
[0008] Echo hiding is a technique by which a code segment of the audio signal is attenuated using a decay function and added to the original audio signal with a known time offset, thereby making the code signal an echo of the original audio signal. Because humans are unable to perceive an echo with an offset of less than a millisecond, the offsets at which echoes are combined with original audio are typically less than one millisecond. Two unique time offsets are used to represent a binary 1 and a binary 0 respectively. Detection of these echoes in encoded audio is carried out by means of a spectrum analysis.
[0009] Spread spectrum approaches to watermarking involve the use of long pseudorandom (PN) sequences of logical 1s and 0s, wherein the 0s are often treated as having a value of -1. The individual members of the PN sequence are called chips to distinguish them from the bits of the digital data stream to be embedded. Binary PN sequences have lengths specified by the equation $L = 2^K - 1$, where K is an integer. A length-L PN sequence can be generated by an array of K shift registers with appropriate feedback. Consider a PN sequence $S_0 = \{s_0, s_1, \ldots, s_{L-1}\}$ of length L in which each element $s_k = 1$ or $-1$. Let $S_m = \{s_m, s_{m+1}, \ldots, s_{(L-1+m)\%L}\}$ be a sequence obtained by shifting the elements of $S_0$ to the left by m in a circular fashion. The % in the subscript notation implies modulo arithmetic. When the inner product $s_0 s_m + s_1 s_{m+1} + \cdots + s_{L-1} s_{(L-1+m)\%L}$ between the shifted and the unshifted sequences is computed, the result is usually a small positive number or small negative number. The exception occurs when m = 0, i.e., when both sequences are identical, in which case the inner product is a large positive value, L. Thus a PN sequence is characterized by a sharp correlation function. Even if the second sequence is corrupted by noise causing some of the elements not to match, the inner product is a fairly distinct large positive value when the shifted sequence and the unshifted sequence correlate.
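By way of illustration only, the sharp correlation property described above can be checked numerically. The following minimal sketch assumes a short length-7 sequence (K = 3) chosen purely for readability; the sequence and the function name `circular_inner_product` are illustrative assumptions and are not taken from the disclosure.

```python
def circular_inner_product(s, m):
    """Inner product of s with a copy of itself shifted left by m, circularly."""
    L = len(s)
    return sum(s[k] * s[(k + m) % L] for k in range(L))


# Length-7 PN sequence (K = 3, L = 2**3 - 1); logical 0s are mapped to -1.
# This particular sequence is an assumption made for the example.
bits = [1, 1, 1, 0, 1, 0, 0]
s0 = [1 if b else -1 for b in bits]

# One inner product per circular shift m = 0 .. L-1.
correlation = [circular_inner_product(s0, m) for m in range(len(s0))]
```

For this example sequence, `correlation` evaluates to `[7, -1, -1, -1, -1, -1, -1]`: the value L at m = 0 and a small value at every other shift.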
[0010] Communication systems such as cellular telephony use spread spectrum technology to allow a limited band of available frequencies to be used simultaneously by several subscribers. In one type of spread spectrum known as direct spread spectrum (DSS), each data bit 1 or 0 is represented by a PN sequence or its inverse (referred to hereinafter as !PN), respectively. The inverse sequence is obtained by interchanging the 1s and 0s in the original PN sequence. While correlation of the PN sequence with itself produces a large positive inner product (approaching L), correlation between the PN sequence and its inverse produces a large negative inner product approaching -L.
[0011] When consecutive chips of a PN sequence modulate a carrier wave by any well-known method such as binary phase shift keying (BPSK), the spectrum of the carrier wave spreads across a wide band of frequencies. Unlike a conventional communication system in which a receiver tunes to a carrier frequency and determines the data present as an amplitude, frequency or phase modulation, in a spread spectrum system the received signal is correlated with a reference signal, which is a carrier modulated to represent the PN sequence. Positive and negative peaks in the correlation between the received signal and the reference signal represent the data bits 1 and 0, respectively, constituting the received data packet. Even if the code signal corresponding to the PN sequence has very low amplitude and/or energy, its presence can be detected provided its length is sufficient to generate a distinct peak in the correlation.
[0012] The main criteria for selecting the length of the PN sequence are computing power available at the decoder, along with data throughput. A longer sequence requires higher processor speeds to perform continuous correlations and also decreases the effective data rate.
[0013] In the case of embedding audio watermarks using DSS, a carrier wave, such as, for example, a 4 kilohertz (KHz) cosine wave may be BPSK modulated using a 4095 chip PN sequence, which typically spreads the energy of the original 4 KHz carrier across roughly the 0-8 KHz frequency band. The modulated carrier wave is then added to the original audio with an appropriate amplitude so that the effect on any single "critical band" of the human auditory system is minimal. The addition of the modulated carrier wave is equivalent to the addition of broadband noise to the original audio. The amplitude of the carrier wave that is combined with the original audio is usually controlled by psycho-acoustic masking models, which specify the maximum energy change permissible for each critical band so that the changes are imperceptible to the human ear.
[0014] Spread spectrum methods for watermarking have been proposed. For example, U.S. Patent 5,319,735 discloses a data hiding technique for audio using techniques of spread spectrum modulation. Other references describe a spread spectrum technique that uses the Moving Picture Experts Group (MPEG) psycho-acoustic model to control the amplitude of a 511 chip PN sequence, or an audio watermarking scheme in which the amplitude of the spread spectrum signal is controlled by a psycho-acoustic model.
[0015] One significant drawback to presently known spread spectrum techniques is the need to modify the audio content of programming content frequently. Often, injecting spread spectrum audio data into an audio signal leaves a greater opportunity to adversely affect the fidelity of the audio signal into which the spread spectrum data is being injected.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] FIG. 1 is a block diagram of a spread spectrum audio encoding and decoding system.
[0017] FIG. 2 is a detailed block diagram of the code generator of FIG. 1.
[0018] FIG. 3 is a detailed block diagram of the data/synch generator of FIG. 2.
[0019] FIG. 4 is an illustration of example waveforms representative of one and zero chips of a PN signal produced by the modulator of FIG. 2.
[0020] FIG. 5 is a detailed illustration of the amplitude corrector of FIG. 2.
[0021] FIG. 6 is an illustration of an example mask amplitude chart.
[0022] FIG. 7 is an illustration of an example code spectrum average amplitude chart.
[0023] FIG. 8 is an illustration of an example setting code amplitude chart.
[0024] FIG. 9 is a diagram illustrating how information is encoded by the code generator of FIG. 1.
[0025] FIG. 10 is a detailed block diagram of the code processor of FIG. 1.
[0026] FIG. 11 includes detailed illustrations of received program content including spread spectrum signals embedded therein and the received program content after correlation with the proper PN signal.
[0027] FIG. 12 is a block diagram of a processor-based system on which a spread spectrum encoder and/or decoder may be implemented.
[0028] FIG. 13 is a flow diagram of an example encode process that may be implemented on the processor-based system of FIG. 12.
[0029] FIGS. 14A and 14B are flow diagrams of an example decode process that may be implemented on the processor-based system of FIG. 12.
DETAILED DESCRIPTION
[0030] As described below, encoded PN sequences may be used as markers to delineate data. Correlation peaks obtained with spread spectrum sequences are extremely well defined in the time domain and, therefore, a number of audio samples and/or elapsed time between adjacent markers may be used to convey information. For example, a first marker in a message packet may be a synch marker, which produces a positive correlation during detection. Following the synch marker, several data markers may be used to denote data by a sample count or elapsed time therebetween.
[0031] Turning now to FIG. 1, a spread spectrum audio encoding and decoding system 100 generally includes a transmission portion 102 and a reception portion 104 that are linked via a medium 106. The programming content handled by the system 100 may be audio programming or an audio portion of audio/video programming. As will be readily appreciated by those having ordinary skill in the art, the transmission portion 102 may be located at, for example, a broadcast location and the reception portion 104 may be located at a media consumer's home that may be a significant distance from the transmission portion 102.
[0032] The medium 106 shown in FIG. 1 may be, for example, wired or wireless media. In one example, the medium 106 may be a cable system infrastructure. As an alternative example, the transmitter 112 may broadcast information over the air at frequencies in the very high frequency (VHF) range, the ultra high frequency (UHF) range, etc. Additionally or alternatively, the medium 106 may be the amplitude modulation (AM) or frequency modulation (FM) radio signal bands. As a further example, one or more relays (not shown) may be interposed between the transmission portion 102 and the reception portion 104. For example, the transmission portion 102 may transmit information to a satellite (not shown) at a frequency in the Ku-band and the satellite may, in turn, relay received signals to the reception portion 104.
[0033] According to the example of FIG. 1, the transmission portion 102 includes a code generator 110 that is coupled to a transmitter 112, which also receives program content such as, for example, television or radio content. As described in detail below, during operation the code generator 110 receives information from a data source as well as the programming information. The information provided by a data source may include, for example, identification of a station broadcasting the programming content and may also include a timestamp identifying the time at which the programming content is being broadcast. The code generator 110 processes the data and the programming content to generate codes that are passed to the transmitter 112 and embedded into the programming content in a manner that is hardly humanly detectable, if detectable at all. The transmitter combines the output from the code generator 110 with the program content to generate signals to be sent across the medium 106 to the reception portion 104. As noted previously, the medium 106 may be any one of various wired or wireless media. The properties of the transmitter (e.g., transmission frequency, amplitude, etc.) are selected to be compatible with the medium 106.
[0034] The reception portion 104, as shown in the example of FIG. 1, includes a receiver 120 that is coupled to a code processor 122. The receiver 120 is selected to be complementary to the transmitter 112 so that the receiver 120 can receive the information broadcast from the transmitter 112. Signals received from the medium 106 by the receiver 120 are processed to output program content, such as audio, video, etc. The program content from the receiver 120 is coupled to the code processor 122, which processes such signals to obtain the code that was produced by the code generator 110 and broadcast by the transmitter 112.
[0035] As described in further detail below, the code generator 110 and the code processor 122 operate using spread spectrum techniques. In particular, as described below, the code generator 110 outputs a spread spectrum code or codes that are embedded into, for example, audio program content. The timing between consecutive spread spectrum codes corresponds to the data embedded into the program content. As the program content is received by the code processor 122, the code processor 122 searches for the presence of spread spectrum code(s) therein as samples of the received signal are processed. When a complete spread spectrum code is received, the code processor 122 begins counting received samples, or elapsed time, until a second spread spectrum code is received. The number of samples or elapsed time period between consecutive spread spectrum codes corresponds to the encoded data.
[0036] As shown in the example of FIG. 2, the code generator 110 includes a packetizer 202, a base converter 204, a multiplier 206 and a sample counter 208. The code generator 110 further includes a data/synch generator 210, a modulator 212 and an amplitude corrector 214. Input data 220, which may for example include a station identifier (ID) 222 and a time stamp 224, are coupled to the packetizer 202. In one example, the station ID 222 may be a 16-bit binary value and the time stamp may be a 32-bit binary value denoting the actual time at which the station ID was inserted at the broadcast facility.
[0037] In operation, the packetizer 202 receives the input data 220, which may be, for example, 48 bits in length (wherein the station ID 222 is 16 bits long and the time stamp 224 is 32 bits long), and converts the input data 220 into a number of 12-bit packets. For example, if the input data 220 is 48 bits in length, the packetizer 202 segments the 48 bits into four, 12-bit packets.
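By way of illustration only, the segmentation performed by the packetizer 202 may be sketched as follows. The function name `packetize` and the ordering of the fields (station ID in the most significant bits, followed by the time stamp) are assumptions made for this example; the disclosure does not specify the bit ordering.

```python
def packetize(station_id, time_stamp, packet_bits=12):
    """Split a 16-bit station ID and a 32-bit time stamp into 12-bit packets."""
    if not 0 <= station_id < 2 ** 16:
        raise ValueError("station ID must fit in 16 bits")
    if not 0 <= time_stamp < 2 ** 32:
        raise ValueError("time stamp must fit in 32 bits")
    total_bits = 48
    # Assumed layout: station ID occupies the upper 16 bits of the 48-bit word.
    word = (station_id << 32) | time_stamp
    mask = (1 << packet_bits) - 1
    # Walk from the most significant 12-bit group down to the least significant.
    return [(word >> shift) & mask
            for shift in range(total_bits - packet_bits, -1, -packet_bits)]


packets = packetize(0xABCD, 0x12345678)
```

With these example inputs, `packets` holds the four values 0xABC, 0xD12, 0x345 and 0x678, each of which lies in the range 0 to 4095.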
[0038] The 12-bit packets are coupled to the base converter 204, which converts each of the 12-bit packets from a binary value into a decimal value. For example, the four, 12-bit packets are converted into four decimal values of d0, d1, d2, d3. The decimal value of each 12-bit packet determines the separation between sets of spread spectrum markers, wherein the separation between sets of markers will be detected by the code processor 122 of FIG. 1 and used to recover the information encoded by the code generator 110.
[0039] The decimal value of each 12-bit packet is coupled to the multiplier 206, which multiplies each decimal value by a factor of N to determine the number of samples or corresponding time period between markers. In one example, the value of N is 16 and, therefore, a decimal value of dx, incremented by 1, is multiplied by a factor of 16 to yield a value of 16(dx + 1). The value of N could, of course, be selected to have another value. The multiplication is performed to allow for jitter in the transmission and decoding processes carried out downstream of the code generator 110. In the disclosed example, the maximum sample count corresponds to a 12-bit decimal value of 2047, which, if the multiplication factor N is 16, yields a value of 16(2047 + 1) = 32768. Accordingly, 32768 samples would occur between markers in the case of encoding a decimal value of 2047. Alternatively, in this case, a maximum time of approximately 0.68 seconds may elapse between markers if the samples are taken at 48 KHz.
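By way of illustration only, the arithmetic performed by the multiplier 206 may be sketched as follows. The function name `marker_separation` and the constant names are hypothetical and are used only for this example.

```python
N = 16                    # multiplication factor used in the example
SAMPLE_RATE_HZ = 48_000   # example sampling rate


def marker_separation(d, n=N):
    """Number of samples between markers for a packet value d: n * (d + 1)."""
    return n * (d + 1)


samples = marker_separation(2047)      # packet value from the example
seconds = samples / SAMPLE_RATE_HZ     # equivalent elapsed time
```

Here `samples` is 32768 and `seconds` is approximately 0.68, matching the figures given in the example. Because the separation is always a multiple of N, a decoder may tolerate a timing error of a few samples and still recover the same value.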
[0040] Each of the four multiplied decimal values is provided to the sample counter 208 along with the output from the amplitude corrector 214. The sample counter 208 periodically enables the data/synch generator 210 to output a PN sequence to be embedded into audio information. The periodicity with which the sample counter 208 enables the data/synch generator 210 is dictated by the output from the multiplier 206 and the number of samples output from the amplitude corrector 214. For example, if the decimal number output by the multiplier 206 is 32768, the sample counter 208 counts the number of samples output from the amplitude corrector 214 and, when the number of samples received by the sample counter 208 reaches 32768 since the last marker, the sample counter 208 outputs an enable signal to the data/synch generator 210. In an alternative, the sample counter 208 may function as a timer. In such a case, the enable signal would be output 0.68 seconds after the completion of the output of the prior marker.
[0041] In general, the data/synch generator 210 receives the enable signal and, in response to the enable signal, generates a PN sequence representing a synchronization signal or the inverse of the PN sequence representing a data signal. An example data/synch generator 210, as shown in FIG. 3, includes a 12-stage shift register 302 having a number of stages, four relevant stages of which are shown at reference numerals 304, 306, 308 and 310. Each of the relevant stages 304-310 is coupled to an odd parity generator 312, the output of which is coupled to an enable block 314. The shift register stage 310 is coupled to an output driver 316 that receives a synch enable signal and selectively inverts the contents of the shift register stage 310.
[0042] During operation of the data/synch generator 210, all stages of the shift register 302 are initially set to values of logical one. When the sample counter 208 enables the data/synch generator 210 via the enable block 314, the shift register 302 is shifted to the right. After each shift to the right, the bit in the first shift register stage 304 is computed by the odd parity generator 312 as the odd parity among the bit values in the relevant stages 304-310 prior to the shift. In this manner, 4095 chips are output from the shift register 302 through the output driver 316. When a data enable signal is asserted to the output driver 316, the PN output from the data/synch generator 210 will be inverted to be a data marker (!PN). Conversely, if the data enable signal is not asserted to the output driver 316, the PN signal is not inverted and is a synch marker (PN). An additional chip with a zero value is added to make the total sequence length 4096. This is done because, when performing subsequent spectral analysis by means of a Fast Fourier Transform (FFT), it is useful to have a block size that may be expressed as a power of two.
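By way of illustration only, a shift register of this kind may be sketched as follows. Several details are assumptions made for the example and are not taken from the disclosure: the tap positions (stages 1, 4, 6 and 12, a commonly tabulated maximal-length choice for a 12-stage register), the interpretation of "odd parity" as the exclusive-OR of the tapped bits, and the function name `pn_chips`.

```python
def pn_chips(n_stages=12, taps=(1, 4, 6, 12), invert=False):
    """Generate 2**n_stages - 1 chips from a shift register, plus one zero pad chip."""
    state = [1] * n_stages              # all stages start at logical one
    chips = []
    for _ in range(2 ** n_stages - 1):
        out = state[-1]                 # last stage drives the output
        chips.append(1 - out if invert else out)
        feedback = 0
        for t in taps:                  # parity (XOR) of the tapped stages,
            feedback ^= state[t - 1]    # taken before the shift
        state = [feedback] + state[:-1] # shift right, load the first stage
    chips.append(0)                     # pad chip: total length is a power of two
    return chips


synch_marker = pn_chips()               # uninverted sequence (PN)
data_marker = pn_chips(invert=True)     # inverted sequence (!PN)
```

Under these assumptions each marker contains 4096 chips, and the first 4095 chips of the data marker are the complement of the corresponding chips of the synch marker. A different set of taps could be substituted, provided the resulting sequence does not repeat before 4095 chips have been output.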
[0043] Returning to the description of the operation of the code generator 110 as shown in FIG. 2, the output of the data/synch generator 210, which consists of PN sequences of logical ones and logical zeros, is coupled to the modulator 212. In response to the logical ones and logical zeros, the modulator 212 outputs chips formed by samples. For example, as shown in FIG. 4, a graph 400 having a y-axis 402 showing magnitude and an x-axis 404 that delineates sample numbers includes a plot 406 representing a logical one chip and a plot 408 representing a logical zero chip. As will be readily appreciated by those having ordinary skill in the art, the logical zero chip and the logical one chip are binary phase shift keyed (BPSK) modulated cosine waveforms. Each chip may be, for example, one period of a 4 KHz waveform, which, when sampling at a rate of 48 KHz, may be represented by 12 samples of audio. A guard band of two samples on one side of the waveform and one sample on the other side is added to yield a total chip length of 15 samples.
[0044] In one example, when the data/synch generator 210 outputs a synch marker to the modulator 212, the output of the modulator is 4096 (the length of the marker) x 15 (the number of samples representing each marker bit) = 61440 samples, which has a duration of 1.28 seconds. As stated earlier, a data marker may be obtained by interchanging the ones and zeros of the PN sequence (i.e., inverting the PN sequence) which, when uninverted, is a synch marker.
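By way of illustration only, the chip waveforms and the resulting marker length may be sketched as follows. The use of zero-valued guard samples, the placement of the two-sample guard before the waveform, and the function names `chip_waveform` and `modulate` are assumptions made for this example.

```python
import math

SAMPLE_RATE_HZ = 48_000
CARRIER_HZ = 4_000
SAMPLES_PER_PERIOD = SAMPLE_RATE_HZ // CARRIER_HZ   # 12 samples per carrier period


def chip_waveform(chip, lead_guard=2, trail_guard=1):
    """One BPSK chip: a cosine period (inverted for a zero chip) plus guard samples."""
    sign = 1.0 if chip else -1.0
    wave = [sign * math.cos(2 * math.pi * n / SAMPLES_PER_PERIOD)
            for n in range(SAMPLES_PER_PERIOD)]
    # Guard samples are assumed to be zero-valued.
    return [0.0] * lead_guard + wave + [0.0] * trail_guard


def modulate(chips):
    """Concatenate the waveforms of a sequence of chips."""
    samples = []
    for chip in chips:
        samples.extend(chip_waveform(chip))
    return samples


marker_samples = modulate([1, 0] * 2048)            # any 4096-chip sequence
duration_s = len(marker_samples) / SAMPLE_RATE_HZ
```

Each chip occupies 15 samples, so a 4096-chip marker occupies 61440 samples, or 1.28 seconds at 48 KHz, in agreement with the figures above.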
[0045] The samples output from the modulator 212 are coupled to the amplitude corrector 214. As described in detail below, the amplitude corrector 214 receives the program content and processes the program content to determine the amplitude at which the samples should be output to the transmitter 112 of FIG. 1.
[0046] As shown in further detail in the example of FIG. 5, the amplitude corrector 214 includes an analog-to-digital converter (A/D) 502, a Fourier transform module (FFT) 504, a masking curve generator 506, and a programmable gain module 508.
[0047] During operation of the amplitude corrector 214, program content (e.g., an audio signal) is provided to the A/D 502, which samples the program content at, for example, 48 KHz, wherein each sample is represented by 16 bits. The sampled program content is collected into blocks of 512 samples, which are passed to the FFT 504. The FFT 504 combines the transform of the current block with the transforms of the two previous blocks so that a psycho-acoustic masking curve can be generated based on the variation in spectral energy distribution across the three blocks. In general, and as described in detail below, the program content spectral distribution is passed to the masking curve generator 506, which processes the spectrum provided by the FFT 504 to produce the mask amplitude of FIG. 6. The programmable gain module 508 combines the mask amplitude of FIG. 6 with a code spectrum average amplitude as shown in FIG. 7 to obtain the setting code amplitude of FIG. 8.
[0048] As shown in FIG. 6, the mask amplitude chart 600 includes a y-axis 602 representing amplitude, an x-axis 604 representing frequency, such as critical band indices, and a mask amplitude curve 606 dependent on the information provided by the FFT 504 output. The mask amplitude curve 606 is one example of a psycho-acoustic masking curve, one example of which is well known to those having ordinary skill in the art and is described in the Moving Picture Experts Group - Advanced Audio Coding (MPEG-AAC) specification. In general, the mask amplitude curve 606 represents the audio amplitude below which a particular frequency of audio content may be altered without perception of the alteration by the user. Amplitude changes to program content below the mask amplitude curve 606 are imperceptible to the unaided human ear. Accordingly, some frequency bands can tolerate more modification than can other frequency bands. For example, as shown in FIG. 6, the critical band index 20 is the band that can tolerate the smallest amplitude change.
[0049] Either before or after the mask amplitude curve 606 is determined, the programmable gain module 508 calculates a code spectrum average amplitude of the PN output from the data/synch generator 210. The PN output from the data/synch generator 210 is fixed, so the code spectrum average amplitude may be pre-calculated and stored within the programmable gain module 508. One example of a code spectrum average amplitude chart 700 is shown in FIG. 7, which has a y-axis 702 representing amplitude and an x-axis 704 representing frequency indices on which a code spectrum average amplitude curve 706 is plotted. The code spectrum average amplitude curve 706 may be calculated by splitting a unit amplitude marker signal into 120 blocks, wherein each block has 512 samples. Each block of samples is processed by a Fourier transformation, which results in 256 unique frequency indexes, each of which has an amplitude and phase. The spectral power of a frequency with an index k may be denoted by $P_k$. By computing the spectral power for the index k in each of the 120 blocks, an average value for spectral power $P_{average}$ and an average amplitude $V_{average} = \sqrt{P_{average}}$ may be determined. FIG. 7 shows the average amplitude for a range of frequency indexes $k = 0, 1, \ldots, 255$. It is seen that frequency indexes ranging from 24 to 72 have significant values and may be used in conjunction with a psycho-acoustic masking function to determine the amplitude level for the code signal. The frequency index range between 24 and 72 corresponds to the critical bands of 12 to 25 of the masking amplitude of FIG. 6.
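By way of illustration only, the averaging described above may be sketched as follows. For self-containment the sketch uses a direct (naive) discrete Fourier transform in place of an FFT; the function names `dft_power` and `average_amplitude` are hypothetical, and the small test signal is chosen only so that the example runs quickly.

```python
import cmath
import math


def dft_power(block):
    """Spectral power |X[k]|**2 for k = 0 .. len(block)//2 - 1 (naive DFT)."""
    n = len(block)
    powers = []
    for k in range(n // 2):
        x = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(block))
        powers.append(abs(x) ** 2)
    return powers


def average_amplitude(signal, block_size=512):
    """Average the power of each frequency index over all blocks, then take the root."""
    starts = range(0, len(signal) - block_size + 1, block_size)
    blocks = [signal[i:i + block_size] for i in starts]
    mean_power = [0.0] * (block_size // 2)
    for block in blocks:
        for k, p in enumerate(dft_power(block)):
            mean_power[k] += p / len(blocks)
    return [math.sqrt(p) for p in mean_power]


# Small stand-in signal: three 16-sample blocks of a cosine at frequency index 2.
test_signal = [math.cos(2 * math.pi * 2 * i / 16) for i in range(48)]
amplitudes = average_amplitude(test_signal, block_size=16)
```

For the stand-in signal, only frequency index 2 has a significant average amplitude. Applied to a 61440-sample marker with a block size of 512, the same procedure would yield 120 blocks and 256 frequency indexes, although an FFT would be preferable to the naive transform at that size.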
[0050] The combination of the code spectrum average amplitude curve 706 and the mask amplitude curve 606 yields a setting code amplitude chart 800 including a setting code amplitude curve 802 plotted on a coordinate system having a y-axis 804 representing amplitude and an x-axis 806 demarking critical band indexes. In particular, the setting code amplitude curve 802 is obtained by dividing the mask amplitude curve 606 by the code spectrum average amplitude curve 706. As will be readily appreciated by those having ordinary skill in the art, the setting code amplitude curve 802 accounts for both tonal and noise-like features of the current block of audio to calculate masking energy, which is the energy change below which modification of the program content will not be detectable by an unaided human. The mask calculations are performed on a critical band basis. Each critical band at the lower end of the spectrum contains a pair of frequencies. Higher critical bands may encompass several frequencies.
[0051] The programmable gain module 508 considers both the mask amplitude curve 606 of an incoming block of audio and the setting code amplitude curve 802 to select the amplitude at which the output of the modulator 212 should be coupled to the transmitter 112 (FIG. 1). In particular, consider a 512-sample block of audio with signal energy $E_b$ in critical band b. If the psycho-acoustic masking energy available in critical band b is $M_b$, the estimated change in amplitude sustainable by that band is $A_b = \sqrt{E_b + M_b} - \sqrt{E_b}$, assuming the worst-case scenario in which the added signal is in phase with the original signal for this spectral band. Knowing the frequency indexes encompassed by band b, a normalization factor $N_b$ can also be determined from FIG. 8. The normalization factor $N_b$ is such that the amplitude of the code signal, based on available mask energy in band b, is $A_{code}[b] = A_b / N_b$. FIG. 8 is a plot of $A_{code}[b]$ for a typical 512-sample block of audio. The code amplitude specified by critical band 20 of FIG. 8 has the lowest value of approximately 60. The combination of masking energy in critical band 20 and the code spectral energy in that band results in the worst case value for code amplitude. By choosing 60 as the code amplitude for this block of audio, it is ensured that in all critical bands the code spectral energy is lower than the masking energy available.
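By way of illustration only, the selection of a single code amplitude for a block may be sketched as follows. The sketch assumes that the sustainable amplitude change in band b is the square root of the sum of the signal energy and masking energy, less the square root of the signal energy, and that the per-band code amplitude is that change divided by a normalization factor for the band. The function names and all numeric values are hypothetical.

```python
import math


def sustainable_amplitude(e_b, m_b):
    """Worst case (code in phase with audio): sqrt(E_b + M_b) - sqrt(E_b)."""
    return math.sqrt(e_b + m_b) - math.sqrt(e_b)


def code_amplitude(energy, mask, norm):
    """Per-band code amplitude A_b / N_b and the minimum over all bands."""
    per_band = {b: sustainable_amplitude(energy[b], mask[b]) / norm[b]
                for b in energy}
    return per_band, min(per_band.values())


# Hypothetical values for two critical bands, for demonstration only.
energy = {12: 9.0, 20: 16.0}    # signal energy E_b
mask = {12: 16.0, 20: 9.0}      # masking energy M_b
norm = {12: 0.5, 20: 0.5}       # normalization factor N_b

per_band, block_amplitude = code_amplitude(energy, mask, norm)
```

With these hypothetical values the per-band amplitudes are 4.0 for band 12 and 2.0 for band 20, so 2.0 is selected for the block. Taking the minimum keeps the code within the available masking energy in every band considered.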
[0052] Once the programmable gain module 508 is set to the proper gain, the amplitude of each sample of each chip output from the modulator 212 (FIG. 2) is amplified by the programmed gain. At the transmitter 112, the signal from the amplitude corrector 214 and the program content are combined. Accordingly, in one example, the modulated PN sequence, which is a data or synch marker, is embedded into the audio portion of the program content.
[0053] FIG. 9 illustrates, at a high level, the information contained in a transmitted signal. For example, a data or synch signal may be transmitted with the program content over a first period of time or number of samples denoted by reference numeral 902. After the data or synch signal 902 is output, a number of samples of program content are broadcast over a period of time denoted at reference numeral 904. A second data or synch signal will then be transmitted over a period of time denoted at reference numeral 906. The timing or number of samples between the markers 902 and 906 is controlled by the sample counter 208 to encode packets of the input data 220. For example, as referred to above, a packet of input data may have a value of 32768, which indicates that there are 32768 samples between the end of the first data or synch 902 and the beginning of the second data or synch 906. In one example, if four packets of data are to be transmitted, the first packet may be represented by the spacing between a synch and a data signal and the three subsequent packets may be represented by the spacing between data signals.
[0054] As described in detail below, the code processor 122 of the reception portion 104 detects the presence of the data or synch signals 902, 906 and determines the number of samples that lie therebetween to determine the data that was originally encoded by the code generator 110. Alternatively, timing between markers may be used to represent the encoded information. In particular, the receiver 120 of FIG. 1 receives the broadcast signal and generates a signal that is coupled to the code processor 122. The signal may be an audio signal having information embedded therein. In the alternative, the source of the information may be a microphone amplifier that detects audio.
[0055] As shown in FIG. 10, the code processor 122 includes an analog-to-digital converter (A/D) 1002 that digitizes at, for example, 48 KHz, the signal from the receiver 120. The sampled audio is coupled to a buffer 1004, which may be implemented using a 15-sample circular buffer in which each new sample becomes the last sample entered into the buffer. The output of the buffer 1004 is coupled to a correlator 1006, which also receives a reference chip signal from a reference chip signal generator 1008. The reference chip signal may be, for example, 15 samples of either the signal 406 or 408 of FIG. 4. The correlator 1006 determines the inner product between the 15 samples in the buffer 1004 and the 15 samples provided by the reference chip signal generator 1008. If the samples in the buffer 1004 positively correlate with the samples provided by the reference chip signal generator 1008, the output from the correlator 1006 will be a relatively large positive value. If, on the other hand, the samples in the buffer 1004 correlate negatively with the samples provided by the reference chip signal generator 1008, the output from the correlator 1006 will be a relatively large negative value.
[0056] The results of the correlation are provided to a comparator 1010 that compares the correlation results to two thresholds T1 1012 and T2 1014. In one example, the threshold T1 1012 may be a small positive threshold and the threshold T2 1014 may be a small negative threshold. If the inner product is greater than the threshold T1 1012, the comparator 1010 declares the chip to have a value of +1. If the inner product is less than the threshold T2 1014, the comparator 1010 declares the chip to have a value of -1. If the inner product has a value between the thresholds T1 1012 and T2 1014, the chip is declared to have a value of zero. The chip is passed to a buffer or array 1018 to form the last element in a 4095-chip-long sequence.
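The three-level decision made by the comparator 1010 may be sketched as shown below. The function name `declare_chip` and the default threshold values are assumptions chosen for illustration; the disclosure states only that one threshold is a small positive value and the other a small negative value.

```python
def declare_chip(inner_product, t1=0.5, t2=-0.5):
    """Map an inner product to a chip value of +1, -1 or 0.

    t1 and t2 are hypothetical values for the small positive and small
    negative thresholds, respectively.
    """
    if inner_product > t1:
        return 1
    if inner_product < t2:
        return -1
    return 0
```

An inner product equal to either threshold is declared zero in this sketch, because it neither exceeds the positive threshold nor falls below the negative one.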
[0057] The array 1018 stores the chip values corresponding to the previous 61425 samples, along with the chip value passed thereto by the comparator 1010, to form a 4095-chip-long array. The contents of the array 1018 are compared, at a correlator 1020, to a reference signal provided by a reference signal generator 1022 to determine an inner product between the two. The reference signal generator 1022 may output synch or data marker signals to the correlator 1020.
[0058] The output from the correlator 1020 is coupled to a peak detector 1024, which also receives an input from a threshold generator 1026. Peaks appearing in the inner product value indicate the detection of markers. For example, a SYNCH marker correlated with the reference signal from the reference signal generator 1022 produces a large positive peak and a DATA marker correlated with that reference signal produces a large negative peak. The threshold generator 1026 determines moving averages across inner products of, for example, 1000 or more samples. Each time a peak is detected, the peak detector 1024 provides a signal to a sample counter 1028. The sample counter 1028 determines the number of samples that are processed between peaks that are detected by the peak detector 1024.
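One possible arrangement of the peak detector 1024 and the threshold generator 1026 is sketched below. The disclosure does not state how the threshold is derived from the moving average, so the use of the mean magnitude multiplied by a hypothetical `scale` factor is an assumption, as are the function name `detect_peaks` and the default parameter values.

```python
from collections import deque


def detect_peaks(inner_products, window=1000, scale=4.0):
    """Return (index, marker_type) pairs for values that exceed the threshold.

    Assumption: the threshold is `scale` times the moving average of the
    magnitudes of the previous `window` inner products.
    """
    history = deque(maxlen=window)
    running_sum = 0.0
    peaks = []
    for index, value in enumerate(inner_products):
        magnitude = abs(value)
        if len(history) == window:
            threshold = scale * running_sum / window
            if magnitude > threshold:
                # Positive peaks indicate SYNCH markers, negative peaks DATA.
                peaks.append((index, "SYNCH" if value > 0 else "DATA"))
            running_sum -= history[0]  # oldest value is about to be evicted
        history.append(magnitude)
        running_sum += magnitude
    return peaks
```

No peaks are reported in this sketch until the moving-average window has filled, which avoids declaring markers against an unreliable threshold.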
[0059] The count of the number of samples between detected peaks is passed to a divider 1030, which divides the count by N, which may be, for example, 16. In any event, the value of N will be selected to be the same as the value of N used by the multiplier 206 of the code generator 110. The counts divided by N yield the 12-bit values corresponding to the initial data groups encoded by the code generator 110 and are output from the divider 1030 as data out. At the end of four such data groups, another SYNCH marker denoting the start of a new message packet will be expected.
[0060] To aid in the understanding of the code processor 122, a plot 1100 is shown in FIG. 11. The plot 1100 includes a y-axis 1102 delineated as amplitude and an x-axis 1104 delineated as samples. The plot 1100 includes a first sub-plot 1106 showing program information (i.e., program audio) having data encoded therein. As an example, the samples shown in the first sub-plot 1106 may be a series of samples output from the A/D 1002, only a subset of which will be contained in the buffer 1004 at any given time. As will be readily appreciated by those having ordinary skill in the art, the magnitudes of the samples in the first sub-plot 1106 are dictated by program information (e.g., audio) and the markers embedded therein, wherein the number of samples between the embedded markers represents the data embedded in the first sub-plot 1106.
[0061] A second sub-plot 1108 represents the output from, for example, the correlator 1020 of FIG. 10. In particular, the positive peaks (e.g., a positive peak 1110) represent a period in time in which the marker embedded in the program information was correlated with the reference signal identical to the marker signal. In contrast, the negative peaks (e.g., a negative peak 1112) represent a period of time in which the inverse of the marker signal was correlated with the marker signal. Accordingly, for example, the positive peaks may represent the presence of a synch signal and the negative peaks may represent the presence of a data signal.
[0062] The foregoing describes various functionalities as being associated with blocks to form the code generator 110 and the code processor 122. However, the functionality of the code generator 110 and the code processor 122 can be implemented by software or firmware that is executed by one or more processor systems. In one example, an example processor system 1200 such as that shown in FIG. 12 may be used to implement one or more processes to form a code generator that may be located at, for example, a broadcast facility. An additional processor
system 1200 may be used in conjunction with programming instructions to implement a code processor. Further detail regarding instructions that may be used in conjunction with processor systems 1200 to perform code generation and code processing is provided below following a description of the example processor system 1200.
[0063] The example processor system 1200 includes a processor 1202 having associated memories 1204, such as a random access memory (RAM) 1206, a read only memory (ROM) 1208 and a flash memory 1210. The processor 1202 is coupled to an interface, such as a bus 1212, to which other components may be interfaced. In the illustrated example, the components interfaced to the bus 1212 include an input device 1214, a display device 1216, a mass storage device 1218 and a removable storage device drive 1220. The removable storage device drive 1220 may include associated removable storage media 1222, such as magnetic or optical media.
[0064] The example processor system 1200 may be, for example, a conventional desktop personal computer, a notebook computer, a workstation or any other computing device. Additionally, the example processor system may be implemented using a digital signal processor (DSP)-based architecture. In a DSP-based architecture, some of the components interfaced to the bus may be eliminated.
[0065] The processor 1202 may be any type of processing unit, such as a microprocessor, a microcontroller, a DSP or custom hardware, such as an application-specific integrated circuit.
[0066] The memories 1206-1210 that are coupled to the processor 1202 may be any suitable memory devices and may be sized to fit the storage demands of the system 1200. The memories 1206-1210 may store, for example, instructions that implement the functionality described below. The processor 1202 may recall such instructions from the memories 1204 for execution.
[0067] The input device 1214 may be implemented by a keyboard, a mouse, a touch screen, a track pad or any other device that enables a user to provide information to the processor 1202. Alternatively, the input device 1214 may be a network connection or an input port that may receive and transmit information to and from the processor 1202. For example, programming information such as audio information may be passed to and received from the processor 1202 via the input device 1214.
[0068] The display device 1216 may be, for example, a liquid crystal display (LCD) monitor, a cathode ray tube (CRT) monitor or any other suitable device that acts as an interface between the processor 1202 and a user. The display device 1216, as pictured in FIG. 12, includes any additional hardware required to interface a display screen to the processor 1202.
[0069] The mass storage device 1218 may be, for example, a conventional hard drive or any other magnetic or optical media that is readable by the processor 1202.
[0070] The removable storage device drive 1220 may, for example, be an optical drive, such as a compact disk-recordable (CD-R) drive, a compact disk-rewritable (CD-RW) drive, a digital versatile disk (DVD) drive or any other optical drive. It may alternatively be, for example, a magnetic media drive. The removable storage media 1222 is complementary to the removable storage device drive 1220, inasmuch as the media 1222 is selected to operate with the drive 1220. For example, if the removable storage device drive 1220 is an optical drive, the removable storage media 1222 may be a CD-R disk, a CD-RW disk, a DVD disk or any other suitable optical disk. On the other hand, if the removable storage device drive 1220 is a magnetic media device, the removable storage media 1222 may be, for example, a diskette or any other suitable magnetic storage media.
[0071] As will be readily appreciated by those having ordinary skill in the art, some components of the system 1200 may be omitted in certain implementations. The display device 1216, the mass storage device 1218 and the removable storage device drive 1220 are examples of components that may be omitted.
[0072] An example encode process 1300 is illustrated in FIG. 13. The encode process 1300 may be implemented using one or more software programs or sets of instructions that are stored in one or more memories (e.g., the memories 1206-1210) and executed by one or more processors (e.g., the processor 1202). However, some or all of the blocks of the encode process 1300 may be performed manually and/or by some other device. Additionally, although the encode process 1300 is described with reference to the flowchart illustrated in FIG. 13, a person of ordinary skill in the art will readily appreciate that many other methods of performing the encode process 1300 may be used. For example, the order of many of the blocks may be altered, the
operation of one or more blocks may be changed, blocks may be combined and/or blocks may be eliminated.
[0073] In general, the encode process 1300 receives programming content and information to be encoded into the programming content. The encode process 1300 uses spread spectrum techniques to embed synchronization or data signals into the programming content at intervals that are dictated by the information to be encoded into the programming content.
[0074] The encode process 1300 begins by reading a data set to be encoded and sample blocks of programming information into which the data is to be encoded (block 1302). In one example, the data set to be encoded may include a station identifier and a timestamp that are provided to the processor 1202 via the input device 1214, the sum of which may be represented by 48 bits of data. The sample blocks of programming information may include, for example, audio program content that may be from radio programming or may be the audio portion of television programming. The sample blocks may be stored in one or more of the memories 1206-1210 or the mass storage device 1218 and read by the processor 1202. In the alternative, the sample blocks may be stored remotely from the example processor system 1200 and provided to the processor 1202 via the input device 1214.
[0075] After the data and sample blocks have been read (block 1302), the frequency spectrum of a marker signal (i.e., a data or synch signal) is determined (block 1304). For example, referring to FIG. 3, the frequency spectrum of the PN or the !PN may be calculated to yield a curve as shown in FIG. 7. In the alternative, a curve similar to that of FIG. 7 for the PN or the !PN may be precalculated and stored in memory 1206-1210 and read by the processor 1202 at the block 1304.
[0076] After the frequency spectrum of the signal to be embedded into the program content is determined (block 1304), a masking curve (i.e., a code amplitude setting curve) similar to that of FIG. 6 is determined for the sample blocks (block 1306). As will be readily appreciated by those having ordinary skill in the art, the masking curve represents the energy or amplitude change below which signal alteration in particular frequency bands will be undetectable. The masking curve may be calculated according to any number of known and suitable techniques including those described in conjunction with MPEG encoding.
[0077] After the frequency spectrum of the data or synchronization spread spectrum signal is calculated and the masking curve of the sample blocks is determined (blocks 1304 and 1306), the code amplitude at which the spread spectrum signal is to be injected into the program information is determined (block 1308). As described above, the code amplitude may be determined by comparing relevant frequency bands of the code spectrum (e.g., the code spectrum shown in FIG. 7) and the masking curve (e.g., the masking curve shown in FIG. 6).
[0078] The encode process 1300 converts the data set read at block 1302 into packets (block 1310). The packets may be, for example, 12 bits in length. For example, if the data set is 48 bits in length, the encode process 1300 converts the data set into four 12-bit packets. After the packets are formed (block 1310), the packets are converted into a decimal base and each packet is stored (block 1312).
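The conversion of a 48-bit data set into four 12-bit packets may be sketched as follows. The disclosure does not specify the packet ordering, so placing the most significant bits in the first packet is an assumption, and the function names `to_packets` and `from_packets` are hypothetical.

```python
def to_packets(data_set, total_bits=48, packet_bits=12):
    """Split an integer data set into packets, most significant bits first.

    The most-significant-first ordering is an assumption of this sketch.
    """
    if not 0 <= data_set < (1 << total_bits):
        raise ValueError("data set does not fit in the stated number of bits")
    count = total_bits // packet_bits
    mask = (1 << packet_bits) - 1
    return [
        (data_set >> (packet_bits * (count - 1 - i))) & mask
        for i in range(count)
    ]


def from_packets(packets, packet_bits=12):
    """Reassemble the data set from packets produced by to_packets."""
    value = 0
    for packet in packets:
        value = (value << packet_bits) | packet
    return value
```

Each resulting packet is an integer between 0 and 4095, which corresponds to the decimal value stored at block 1312. The second function illustrates the inverse operation that a decoder might apply after recovering four packets.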
[0079] After the packets are stored (block 1312), the next packet is read (block 1314) and multiplied by a factor of, for example, 16 (block 1316). The multiplication factor need not be 16 and could be any other suitable factor. As noted above, the multiplication factor is used to improve the robustness of the system in terms of sample counting. For example, if a decimal 9 is to be transmitted, a data or synchronization signal is embedded in the program content, followed by a number of program content samples and another data or synchronization signal. If no multiplication factor were used (or the multiplication factor were one), a receiver would count 10 samples between two synchronization and/or data signals. If one of the samples were missed or double counted, there would be a 10% error in determining the information that was encoded. If, however, a multiplication factor of 16 were used and a sample were missed or double counted, the margin of error introduced by the missed or double counted sample would be 0.625%. Accordingly, the multiplication factor enhances the robustness of the system.
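The error figures in the foregoing example may be reproduced with the short sketch below. It assumes that a packet value v is represented by (v + 1) multiplied by the factor, which is consistent with a decimal 9 producing a count of 10 samples when the factor is one; the function names are hypothetical.

```python
def required_sample_count(packet_value, factor=16):
    """Number of samples between markers for a given packet value.

    Assumption: the value v maps to (v + 1) * factor samples, matching the
    example in which a decimal 9 yields a count of 10 with a factor of one.
    """
    return (packet_value + 1) * factor


def miscount_error_percent(packet_value, factor, miscounted=1):
    """Percentage error caused by missing or double counting samples."""
    return 100.0 * miscounted / required_sample_count(packet_value, factor)
```

With a factor of one, a single miscounted sample is one part in 10, or 10%. With a factor of 16, the same miscount is one part in 160, or 0.625%.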
[0080] The encode process 1300 stores the results of the multiplication as a required sample count (block 1318) and compares the required sample count to the number of samples that have been transmitted since the last data or synchronization signal was completely transmitted (block 1320). If the sample count does not equal the required sample count (block 1320), control remains at the block 1320 as samples continue to be transmitted. When the sample count reaches the required sample count (block 1320), the time domain version of the code signal (i.e., a synchronization or a
data signal) is synthesized (block 1322) and combined with the program content at the amplitude determined at the block 1308 (block 1324).
[0081] The code signal to be used depends on whether the information being encoded is the first packet of a data set or a subsequent packet of the data set. For example, the first packet may be represented by a number of samples following a synchronization signal and subsequent data packets may be represented by the number of samples following data signals. As noted previously, in one example arrangement a synchronization signal could be embedded when a new set of data is being transmitted and a data signal could be embedded between blocks of the data set. As described above, the spread spectrum signals representing data and synchronization may be inverses of one another. In the alternative, the data and synchronization signals may not be so related and may be completely different from one another.
[0082] After the code is combined with the program content (block 1324), the encode process 1300 determines if the entire data set has been transmitted (block 1326). If the entire data set has been transmitted, the process reads another data set and more sample blocks (block 1302). In the alternative, if the entire data set has not been transmitted (block 1326), the next data packet is read (block 1314). The encode process 1300 loops between the blocks 1314 and 1326 until each data packet of the data set is transmitted.
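The loop between the blocks 1314 and 1326 effectively produces a schedule of marker start positions. A minimal sketch of such a schedule follows. It assumes one synchronization marker followed by one data marker per packet, a gap of (value + 1) multiplied by the factor measured from the end of one marker to the beginning of the next, and a marker length of 4095 chips of 15 samples each; the function name `marker_schedule` is hypothetical.

```python
MARKER_SAMPLES = 4095 * 15  # 61425 samples per marker (assumed marker length)


def marker_schedule(packets, factor=16, marker_samples=MARKER_SAMPLES):
    """Return (marker_type, start_sample) pairs for one data set.

    Assumptions: the data set begins with a SYNCH marker at sample 0, each
    packet is closed by a DATA marker, and the gap for a packet value v is
    (v + 1) * factor samples.
    """
    schedule = [("SYNCH", 0)]
    position = marker_samples  # first sample after the synch marker
    for value in packets:
        start = position + (value + 1) * factor
        schedule.append(("DATA", start))
        position = start + marker_samples
    return schedule
```

The gap is counted from the sample following a completely transmitted marker, mirroring the comparison made at block 1320.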
[0083] An example decode process 1400 is illustrated in FIGS. 14A and 14B. The decode process 1400 may be implemented using one or more software programs or sets of instructions that are stored in one or more memories (e.g., the memories 1206-1210) and executed by one or more processors (e.g., the processor 1202). However, some or all of the blocks of the decode process 1400 may be performed manually and/or by some other device. Additionally, although the decode process 1400 is described with reference to the flowchart illustrated in FIGS. 14A and 14B, a person of ordinary skill in the art will readily appreciate that many other methods of performing the decode process 1400 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined and/or blocks may be eliminated.
[0084] In general, the decode process 1400 receives programming content (e.g., program audio) having spread spectrum data or synchronization signals embedded therein. The decode process 1400 detects the presence of the spread spectrum signals in the program content and counts the number of samples of program content that are received between the spread spectrum signals. The number of samples between the spread spectrum signals represents information encoded into the program content.
[0085] The decode process 1400 begins by reading received samples into a buffer (block 1402). The received samples may have been previously stored in a memory 1206-1210 or may have been stored in the mass storage 1218.
[0086] The decode process 1400 then computes an inner product between certain ones of the samples and samples of a reference chip (block 1404). For example, the decode process 1400 may compute an inner product between 15 received samples that are buffered and 15 samples of a reference chip, such as one of the reference chips shown at plots 406 or 408 of FIG. 4. As will be readily appreciated by those having ordinary skill in the art, when the buffered samples correlate with the samples of a reference chip, the inner product will be large and positive. Conversely, when the received samples correlate with the inverse of the reference chip signal, the inner product will be large and negative. However, if the received samples do not strongly positively or negatively correlate with the reference chip samples, the inner product will be small.
[0087] Accordingly, the decode process 1400 compares the inner product computed at block 1404 with a first threshold and determines if the inner product exceeds the first threshold (block 1406). If the inner product does exceed the first threshold, a value of positive one is assigned to that inner product (block 1408). In the alternative, if the inner product does not exceed the first threshold (block 1406), the decode process 1400 determines if the inner product is less than a second threshold (block 1410). If the inner product is less than the second threshold (block 1410), a value of negative one is assigned to the inner product (block 1412). In the alternative, if the inner product does not exceed the first threshold (block 1406) and is not less than the second threshold (block 1410), the inner product is assigned a value of zero (block 1414).
[0088] The values assigned to the inner products at blocks 1408, 1412 or 1414 are then written to a buffer (block 1416). As will be readily appreciated by those having ordinary skill in the art, the buffer may be in one of the memories 1206-1210 or may be within the mass storage device 1218.
[0089] After the assigned inner products have been written to the buffer (block 1416), an inner product is calculated between the buffer contents and a reference signal (block 1418). In such an arrangement, the reference signal is the marker or spread spectrum signal embedded into the program information by the encode process 1300. The results of the inner product are stored in an array (block 1420) and a threshold for comparison to the inner product results is generated (block 1422). As described above in conjunction with the threshold generator 1026 of FIG. 10, the threshold may be calculated as a moving average across inner products of, for example, 1000 or more samples.
[0090] After the inner product has been stored (block 1420) and the threshold has been generated (block 1422), the decode process 1400 determines if the inner product exceeds the threshold (block 1424). If the inner product exceeds the threshold (block 1424), a value of a sample counter divided by a factor is output (block 1426). As described above, the sample counter tallies the number of samples that have been received by the decode process 1400. The division of the sample counter output by a factor (N), followed by a decrement of 1, converts the sample count to a decimal value representative of the information embedded into the program content by the encode process 1300.
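The conversion performed at block 1426 may be sketched as shown below. The division by N and the decrement of 1 follow the description above; rounding the quotient to the nearest integer, so that a count that is off by a few samples still yields the intended value, is an assumption of this sketch, as is the function name `decode_count`.

```python
def decode_count(sample_count, factor=16):
    """Convert a sample count between markers to the encoded decimal value.

    The count is divided by the factor N and then decremented by 1. Rounding
    to the nearest integer before the decrement is an assumption.
    """
    return round(sample_count / factor) - 1
```

For example, counts of 159, 160 and 161 samples with N equal to 16 all decode to a decimal 9 in this sketch.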
[0091] After the value of the sample counter divided by a factor has been output (block 1426), the sample counter is zeroed and restarted (block 1428) and the decode process 1400 returns to execution of the block 1402.
[0092] Alternatively, if the inner product does not exceed the threshold (block 1424), the sample counter is incremented (block 1430) and the decode process 1400 returns to execution at the block 1402. The sample counter incrementing that takes place at block 1430 increments the sample counter by the number of samples that have been processed by the decode process 1400. For example, because the inner product computed at block 1404 is carried out every time a sample is received, the sample counter will be incremented at block 1430 by an amount equal to the number of inner products that were calculated at block 1404.
[0093] In one alternative, a sampling rate of 16 kHz may be used at the code processor 122 to reduce the quantity of data that needs to be processed. Accordingly, instead of 15-sample groups representing each chip, chips would be represented with 5-sample groups. As a further alternative, inner product calculation may be abandoned if the partial sum after 1024 chip values does not have the proper sign. For example, in the case of a search for a synch marker, it is expected that the partial sum will be positive, and in the case of a search for a data marker, it is expected that the partial sum will be negative.
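The early abandonment of the inner product calculation may be sketched as follows. The function name `marker_inner_product` and the use of `None` to signal an abandoned calculation are assumptions, as is the treatment of a partial sum of exactly zero as not having the proper sign.

```python
def marker_inner_product(chips, reference, expected_sign, check_after=1024):
    """Inner product of chip values with a reference marker.

    expected_sign is +1 when searching for a synch marker and -1 when
    searching for a data marker. The calculation is abandoned (returning
    None) if the partial sum after `check_after` chips does not have the
    expected sign; a partial sum of zero is treated as a mismatch.
    """
    total = 0
    for count, (chip, ref) in enumerate(zip(chips, reference), start=1):
        total += chip * ref
        if count == check_after and total * expected_sign <= 0:
            return None  # abandoned: partial sum has the wrong sign
    return total
```

In this sketch, roughly three quarters of the multiply and accumulate operations for a 4095-chip marker are skipped whenever the partial sum after 1024 chips has the wrong sign.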
[0094] In the example described above the separation between markers varies in duration between 1.28 seconds and 1.28 + 0.68 = 1.96 seconds. Given the robustness of the markers it is possible to extend the concept to the case of watermarking copyrighted content such as, for example, feature music. For example, a song with duration of 180 seconds could have a synch marker inserted at a suitable location in the initial portion of the audio. Data markers can follow this at approximate intervals of 30 seconds. The actual sample count between markers can be set to a sequence of large unique numbers which then become the watermark. The advantage of the method is that most of the audio, $\frac{180 - 8}{180} \times 100 = 95.6\%$ in the above example, would be left unmodified by the watermarking process. The degradation of the audio is therefore confined to a small fraction of the total play period.
[0095] Although the foregoing discloses example systems including, among other components, software executed on hardware, it should be noted that such systems are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of these hardware and software components could be embodied exclusively in dedicated hardware, exclusively in software, exclusively in firmware or in some combination of hardware, firmware and/or software. Accordingly, while the foregoing describes example systems, persons of ordinary skill in the art will readily appreciate that the examples are not the only way to implement such systems.