HK1142186B

HK1142186B - Methods and apparatus for characterizing media

Info

Publication number: HK1142186B
Application number: HK10108511.1A
Authority: HK
Inventors: 亚历山大‧托普奇; 韦努戈帕尔‧斯里尼瓦桑; 阿伦‧拉马斯瓦米
Original assignee: 尼尔森（美国）有限公司
Priority date: 2007-02-20
Filing date: 2008-02-20
Publication date: 2013-07-05

Description

Methods and apparatus for characterizing media

RELATED APPLICATIONS

This patent claims priority to U.S. provisional patent applications nos. 60/890,680 and 60/894,090, filed on 20/2007 and 9/2007, respectively, the entire contents of which are hereby incorporated by reference.

Technical Field

The present invention relates generally to media monitoring and, more particularly, to methods and apparatus for characterizing media and for generating signatures that identify media information.

Background

It is known to use signature matching techniques to identify media information and, more particularly, to identify audio streams (e.g., audio information). Known signature matching techniques are commonly used in television and radio audience metering applications and are implemented using several methods for generating and matching signatures. For example, in a television audience metering application, signatures are generated at a monitoring site (e.g., a monitored household) and at a reference site. Monitoring sites are typically locations, such as households, where the media consumption of audience members is monitored. For example, at a monitoring site, a monitored signature may be generated based on an audio stream associated with a selected channel, broadcast station, or the like. The monitored signature may then be sent to a central data collection facility for analysis. At the reference site, a signature (often referred to as a reference signature) is generated based on known programs provided within the broadcast area. The reference signature may be stored at the reference site and/or the central data collection facility and compared to the monitored signature generated at the monitoring site. When a monitored signature is found to match a reference signature, the known program corresponding to the matching reference signature can be identified as the program presented at the monitoring site.

Drawings

FIGS. 1A and 1B illustrate exemplary audio stream identification systems for generating signatures and identifying audio streams.

FIG. 2 is a flow diagram illustrating an exemplary signature generation process.

FIG. 3 is a flow diagram illustrating further details of the exemplary capture audio process shown in FIG. 2.

FIG. 4 is a flow diagram illustrating further details of the exemplary compute decision metric process shown in FIG. 2.

FIG. 5 is a flow diagram illustrating further details of one exemplary process for determining the relationships between the frequency bins in the frequency bands shown in FIG. 4.

FIG. 6 is a flow diagram illustrating further details of a second exemplary process for determining the relationships between the frequency bins in the frequency bands shown in FIG. 4.

FIG. 7 is a flow diagram of an exemplary signature matching process.

FIG. 8 is a diagram of how signatures are compared according to the flow diagram of FIG. 7.

FIG. 9 is a block diagram of an exemplary signature generation system that generates a signature based on an audio stream or audio block.

FIG. 10 is a block diagram of an exemplary signature comparison system for comparing signatures.

FIG. 11 is a block diagram of an exemplary processor system that may be used to implement the methods and apparatus described herein.

Detailed Description

Although the following discloses an exemplary system implemented using software executing on hardware, among other components, it should be noted that such a system is merely exemplary and should not be considered as limiting. For example, any or all of these hardware and software components may be implemented solely in hardware, solely in software, or in any combination of hardware and software. Thus, while an exemplary system is described below, those skilled in the art will readily appreciate that the examples provided are not the only way to implement the system.

The methods and apparatus described herein relate generally to generating digital signatures that can be used to identify media information. Digital signatures are audio descriptors that accurately characterize an audio signal for purposes of matching, indexing, or database retrieval. In particular, the disclosed methods and apparatus are described with respect to generating a digital signature based on an audio stream or audio block (e.g., audio information). However, the methods and apparatus described herein may also generate digital signatures based on any other type of media information (e.g., video information, web pages, still images, computer data, etc.). Further, the media information may be associated with ...

Turning to the exemplary method of FIG. 7, the exemplary process 700 includes obtaining monitored signatures and their associated timing (block 702). As shown in FIG. 8, a signature set may include a plurality of monitored signatures, three of which are shown in FIG. 8 at reference numerals 802, 804, and 806. Each signature is represented by sigma (σ). Each of the monitored signatures 802, 804, and 806 may include timing information 808, 810, 812, whether implicit or explicit.

The methods and apparatus described herein identify media information, including audio streams, based on digital signatures. The exemplary techniques described herein use blocks of audio samples to compute signatures at specific times by analyzing properties of the audio spectra in those blocks. As will be described below, a decision function or decision metric is calculated for signal bands of the audio spectrum, and signature bits are assigned to blocks of audio samples based on the values of the decision metric. The decision function or decision metric may be calculated based on a comparison between spectral bands or by convolving a band with two or more vectors. The decision function may also be derived from representations of the original signal other than the spectral representation, such as a wavelet transform, a cosine transform, etc.

The above techniques may be used at a monitoring site to generate monitored signatures based on audio streams associated with media information (e.g., monitored audio streams) consumed by an audience. For example, a monitored signature may be generated based on audio blocks of a track of a television program presented at the monitoring site. The monitored signature may then be transmitted to a central data collection facility for comparison with one or more reference signatures.

The above techniques are also used at a reference site and/or a central data collection facility to generate reference signatures based on audio streams associated with known media information. The known media information may include media broadcast within an area, media reproduced within a home, media received via the internet, and so on. Each reference signature is stored in memory along with media identifying information (e.g., song title, movie title, etc.). When a monitored signature is received at the central data collection facility, the monitored signature is compared to one or more reference signatures until a match is found. The matching information is then used to identify the media information (e.g., the monitored audio stream) from which the monitored signature was generated. For example, a look-up table or database may be referenced to retrieve a media title, program identification, episode number, etc. corresponding to the media information from which the monitored signature was generated.

In one example, the monitored signatures and the reference signatures may be generated at different rates. In such an arrangement, the difference in data rates must be accounted for when comparing monitored signatures with reference signatures. For example, if the monitored rate is 25% of the reference rate, then each successive monitored signature will correspond to every fourth reference signature.
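The rate relationship in this example can be sketched as follows. This is a minimal illustration, not the disclosed matching procedure: the function name `align_reference_indices` and the assumption that the first monitored signature lines up with reference index 0 are introduced here for illustration only.

```python
def align_reference_indices(num_monitored, rate_ratio):
    """Return the reference-signature index paired with each monitored signature.

    rate_ratio is the monitored rate divided by the reference rate
    (0.25 in the example in the text). Assumption for illustration: the
    first monitored signature is aligned with reference index 0.
    """
    step = round(1 / rate_ratio)  # 4 when the monitored rate is 25% of the reference rate
    return [i * step for i in range(num_monitored)]


print(align_reference_indices(4, 0.25))  # [0, 4, 8, 12]
```

With a ratio of 0.25, successive monitored signatures are paired with reference signatures 0, 4, 8, and 12, i.e., every fourth reference signature.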

Fig. 1A and 1B illustrate exemplary audio stream identification systems 100 and 150 for generating digital spectral signatures and identifying audio streams. Exemplary audio stream identification systems 100 and 150 may be implemented as a television broadcast information identification system and a radio broadcast information identification system, respectively. The exemplary audio stream identification system 100 includes a monitoring site 102 (e.g., a monitoring home), a reference site 104, and a central data collection facility 106.

To monitor television broadcast information, a monitored signature is generated at the monitoring site 102 based on the audio data of the television broadcast information and is transmitted to the central data collection facility 106 via the network 108. Reference signatures may be generated at the reference site 104 and may also be transmitted to the central data collection facility 106 via the network 108. The audio content represented by the monitored signature generated at the monitoring site 102 may be identified at the central data collection facility 106 by comparing the monitored signature to one or more reference signatures until a match is found. Alternatively, the monitored signature may be transmitted from the monitoring site 102 to the reference site 104 and compared to one or more reference signatures at the reference site 104. In another example, the reference signatures may be transmitted to the monitoring site 102 and compared to the monitored signature at the monitoring site 102.

The monitoring site 102 may be, for example, a household that monitors audience media consumption. In general, the monitoring site 102 may include a plurality of media delivery devices 110, a plurality of media presentation devices 112, and a signature generator 114 for generating monitored signatures associated with media presented at the monitoring site 102.

The plurality of media delivery devices 110 may include, for example, a set-top box tuner (e.g., cable tuner, satellite tuner, etc.), a DVD player, a CD player, a radio, etc. Some or all of media delivery device 110 (e.g., a set-top box tuner) may be communicably coupled to one or more broadcast information receiving devices 116, and broadcast information receiving devices 116 may include cables, satellite dishes, antennas, and/or any other suitable device for receiving broadcast information. Media delivery device 110 may be configured to reproduce media information (e.g., audio information, video information, web pages, still images, etc.) based on, for example, broadcast information and/or stored information. The broadcast information may be obtained from the broadcast information receiving device 116 and the stored information may be obtained from an information storage medium (e.g., DVD, CD, tape, etc.). The media delivery device 110 is communicably coupled to the media presentation device 112 and may be configured to communicate media information to the media presentation device 112 for presentation. The media presentation device 112 may include a television with a display device and/or a set of speakers through which audience members consume, for example, broadcast television information, music, movies, and the like.

As will be described in greater detail below, the signature generator 114 may be used to generate monitored digital signatures based on audio information. In particular, at the monitoring site 102, the signature generator 114 may be configured to generate a monitored signature based on a monitored audio stream that is reproduced by the media delivery device 110 and/or presented by the media presentation device 112. The signature generator 114 may be communicatively coupled to the media delivery device 110 and/or the media presentation device 112 via the audio monitoring interface 118. In this manner, the signature generator 114 may obtain an audio stream associated with the media information reproduced by the media delivery device 110 and/or presented by the media presentation device 112. Additionally or alternatively, the signature generator 114 may be communicatively coupled to a microphone (not shown) positioned proximate the media presentation device 112 to monitor the audio stream. The signature generator 114 may also be communicatively coupled to the central data collection facility 106 via the network 108.

The network 108 may be used to communicate signatures (e.g., digital spectrum signatures), control information, and/or configuration information between the monitoring sites 102, the reference sites 104, and the central data collection facility 106. Any wired or wireless communication system (e.g., a broadband wired network, a DSL network, a cellular telephone network, a satellite network, and/or any other communication network) may be used to implement network 108.

As shown in fig. 1A, the reference site 104 may include a plurality of broadcast information tuners 120, a reference signature generator 122, a transmitter 124, a database or memory 126, and a broadcast information receiving device 128. The reference signature generator 122 and the transmitter 124 may be communicatively coupled to a memory 126 to store reference signatures therein and/or retrieve stored reference signatures therefrom.

Broadcast information tuner 120 may be communicatively coupled to broadcast information receiving device 128, and broadcast information receiving device 128 may include a cable, an antenna, a satellite dish, and/or any other suitable device for receiving broadcast information. Each broadcast information tuner 120 may be configured to tune to a particular broadcast channel. Typically, the number of tuners at the reference site 104 is equal to the number of channels available in a particular broadcast area. In this way, a reference signature may be generated for all media information transmitted through all channels in the broadcast area. The audio portion of the tuned media information may be communicated from the broadcast information tuner 120 to the reference signature generator 122.

The reference signature generator 122 may be configured to obtain the audio portion of all media information available in a particular broadcast area. Reference signature generator 122 may then generate a plurality of reference signatures based on the audio information (as will be described in more detail below) and store the reference signatures in memory 126. Although one reference signature generator is shown in fig. 1, multiple reference signature generators may be used at the reference site 104. For example, each of the plurality of signature generators may be communicatively coupled to a respective one of the broadcast information tuners 120.

The transmitter 124 may be communicatively coupled to the memory 126 and configured to retrieve reference signatures therefrom and transmit them to the central data collection facility 106 via the network 108.

The central data collection facility 106 may be configured to compare monitored signatures received from the monitoring sites 102 with reference signatures received from the reference sites 104. Further, the central data collection facility 106 may be configured to identify the monitored audio streams by matching the monitored signatures to reference signatures and utilize the matching information to retrieve television program identification information (e.g., program title, broadcast time, broadcast channel, etc.) from the database. The central data collection facility 106 includes a receiver 130, a signature analyzer 132, and a memory 134, all of which are communicatively coupled as shown.

The receiver 130 may be configured to receive the monitored signature and the reference signature via the network 108. The receiver 130 is communicatively coupled to the memory 134 and is configured to store the monitored signature and the reference signature therein.

The signature analyzer 132 may be used to compare the reference signature with the monitored signatures. The signature analyzer 132 is communicatively coupled to the memory 134 and is configured to retrieve the monitored signature and the reference signature from the memory 134. The signature analyzer 132 may be configured to retrieve the reference signature and the monitored signature from the memory 134 and compare the monitored signature to the reference signature until a match is found. Memory 134 may be implemented using any machine-accessible information storage medium (e.g., one or more hard disk drives, one or more optical storage devices, etc.).

Although the signature analyzer 132 is located in the central data collection facility 106 in FIG. 1A, the signature analyzer 132 may instead be located at the reference site 104. In such a configuration, the monitored signatures may be transmitted from the monitoring site 102 to the reference site 104 via the network 108. Alternatively, the memory 134 may be located at the monitoring site 102, and reference signatures may be periodically added to the memory 134 via the network 108 by the transmitter 124. Additionally, although the signature analyzer 132 is shown as a separate device from the signature generators 114 and 122, the signature analyzer 132 may be integrally formed with the reference signature generator 122 and/or the signature generator 114. Additionally, although FIG. 1A illustrates a single monitoring site (i.e., monitoring site 102) and a single reference site (i.e., reference site 104), a plurality of such sites may be coupled to the central data collection facility 106 via the network 108.

The audio stream identification system 150 of FIG. 1B may be configured to monitor and identify audio streams associated with radio broadcast information. In general, the audio stream identification system 150 is used to monitor content broadcast by a plurality of radio stations in a particular broadcast area. Unlike the audio stream identification system 100, which is used to monitor television content consumed by an audience, the audio stream identification system 150 may be used to monitor music, songs, etc. broadcast within a broadcast area and the number of times they are broadcast. This type of media tracking may be used to determine royalty payments, proper use of copyrights, etc., associated with individual audio works. The audio stream identification system 150 includes a monitoring site 152, a central data collection facility 154, and the network 108.

The monitoring site 152 is configured to receive all of the radio broadcast information available in a particular broadcast area and to generate monitored signatures based on the radio broadcast information. The monitoring site 152 includes the plurality of broadcast information tuners 120, the transmitter 124, the memory 126, and the broadcast information receiving device 128, all of which are described above in connection with FIG. 1A. In addition, the monitoring site 152 includes a signature generator 156. When used in the audio stream identification system 150, the broadcast information receiving device 128 is configured to receive radio broadcast information and the broadcast information tuners 120 are configured to tune to radio broadcast stations. The number of broadcast information tuners 120 at the monitoring site 152 may be equal to the number of radio broadcast stations in a particular broadcast area.

The signature generator 156 is configured to receive the tuned audio information from the respective broadcast information tuners 120 and to generate monitored signatures for the tuned audio information. Although one signature generator (i.e., signature generator 156) is shown, the monitoring site 152 may include multiple signature generators, each of which is communicatively coupled to one of the broadcast information tuners 120. The signature generator 156 may store the monitored signatures in the memory 126. The transmitter 124 may retrieve the monitored signatures from the memory 126 and transmit them to the central data collection facility 154 via the network 108.

The central data collection facility 154 is configured to receive the monitored signatures from the monitoring site 152, generate reference signatures based on reference audio streams, and compare the monitored signatures to the reference signatures. The central data collection facility 154 includes the receiver 130, the signature analyzer 132, and the memory 134, all of which are described in detail above in connection with FIG. 1A. In addition, the central data collection facility 154 includes a reference signature generator 158.

The reference signature generator 158 is configured to generate reference signatures based on reference audio streams. The reference audio streams may be stored on any type of machine-accessible medium (e.g., CD, DVD, digital audio tape (DAT), etc.). Typically, artists and/or record companies send their audio works (i.e., music, songs, etc.) to the central data collection facility 154 to be added to the reference library. The reference signature generator 158 may read audio data from the machine-accessible medium and generate a plurality of reference signatures based on the individual audio works (i.e., the captured audio 300 in FIG. 3). The reference signature generator 158 may then store the reference signatures in the memory 134 for subsequent retrieval by the signature analyzer 132. Identification information (e.g., song title, artist name, track number, etc.) associated with each reference audio stream may be stored in a database and may be indexed based on the reference signatures. In this manner, the central data collection facility 154 includes a database with reference signatures and identification information corresponding to all known and available song titles.

The receiver 130 is configured to receive the monitored signature from the network 108 and store the monitored signature in the memory 134. The monitored signature and the reference signature are retrieved from the memory 134 by the signature analyzer 132 for use in identifying monitored audio streams broadcast within the broadcast area. The signature analyzer 132 may identify the monitored audio stream by first matching the monitored signature to a reference signature. The matching information and/or matching reference signature is then used to retrieve identification information (e.g., song title, song track, artist, etc.) from a database stored in memory 134.
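The look-up step described above can be illustrated with a minimal sketch that indexes identification information by signature. The table contents, the signature values, and the function name `identify` are hypothetical, and an exact-match dictionary look-up is only one possible way to organize such a database.

```python
# Hypothetical reference library: 24-bit reference signature -> identification record.
# All signature values, titles, and artists below are made up for illustration.
REFERENCE_DB = {
    0xA1B2C3: {"title": "Example Song A", "artist": "Example Artist"},
    0x0F0F0F: {"title": "Example Song B", "artist": "Another Artist"},
}


def identify(monitored_signature, reference_db=REFERENCE_DB):
    """Return the record whose reference signature equals the monitored one, else None."""
    return reference_db.get(monitored_signature)


print(identify(0xA1B2C3))  # {'title': 'Example Song A', 'artist': 'Example Artist'}
print(identify(0x123456))  # None
```

Indexing by the signature value itself is why an exact bit-for-bit agreement between monitored and reference signatures matters for this style of look-up.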

Although one monitoring site (e.g., monitoring site 152) is shown in FIG. 1B, multiple monitoring sites may be communicatively coupled to the network 108 and configured to generate monitored signatures. In particular, the respective monitoring sites may be located in respective broadcast areas and configured to monitor the content of broadcast stations within those broadcast areas.

An exemplary signature generation process and apparatus for creating a digital signature, for example, of length 24 bits, is described below. In one example, each signature (i.e., each 24-bit word) is derived from a long block of audio samples having a duration of about 2 seconds. Of course, the selected signature length and size of the block of audio samples are merely exemplary, and other signature lengths and block sizes may be selected.

Fig. 2 is a flow diagram representing an exemplary signature generation process 200. As shown in FIG. 2, the signature generation process 200 first captures an audio block to be characterized by a signature (block 202). Audio may be captured from an audio source via, for example, a hardwired connection to the audio source or via a wireless connection to the audio source, such as an audio sensor. If the audio source is analog, the capturing includes sampling (digitizing) the analog audio source using, for example, an analog-to-digital converter.

In one example, an incoming analog audio stream whose signature is to be determined is digitally sampled at a sampling rate (Fs) of 8 kHz. This means that the analog audio is represented by digital samples taken at a rate of 8000 samples per second, or one sample every 125 microseconds (µs). Each audio sample may be represented with a resolution of 16 bits. In general, the number of samples captured in an audio block is denoted herein by the variable N. In one example, the audio is sampled at 8 kHz for a duration of 2.048 seconds, resulting in N = 16384 time-domain samples. In this arrangement, the captured audio covers the time range t to t + N/Fs, where t is the time of the first sample. Of course, the specific sampling rate, bit resolution, sampling duration, and resulting number of time-domain samples specified above are merely one example.
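The arithmetic behind these example values can be checked with a short sketch. The constant and function names are introduced here for illustration only.

```python
FS_HZ = 8000           # example sampling rate Fs from the text
BLOCK_SECONDS = 2.048  # example block duration from the text


def sample_period_us(fs_hz):
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / fs_hz


def block_size(fs_hz, seconds):
    """Number of samples N captured in one audio block."""
    return round(fs_hz * seconds)


print(sample_period_us(FS_HZ))           # 125.0
print(block_size(FS_HZ, BLOCK_SECONDS))  # 16384
```

The block length of 16384 samples is a power of two (2 to the 14th), which is convenient for fast Fourier transform implementations.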

As shown in FIG. 3, the capture audio process 202 may be implemented by shifting the samples in an input buffer by an amount such as 256 samples (block 302) and reading new samples to fill the emptied portion of the buffer (block 304). As described in the examples below, the signature characterizing an audio block is derived from frequency bands that each include multiple frequency bins, rather than from individual frequency bins, because individual bins are more sensitive to the selection of the audio block. In some examples, it is important to ensure that the signature is stable with respect to block alignment, because the reference signature and the metered site signature (hereinafter referred to as the site unit signature) are computed from blocks of audio samples that cannot be aligned with each other in the time domain. To address this issue, in one example, reference signatures are captured at 32 millisecond intervals (i.e., an audio block of 16384 samples is updated by appending 256 new samples and discarding the oldest 256 samples). In an exemplary site unit, signatures are captured at intervals of 128 milliseconds, i.e., at increments of 1024 samples. Thus, the worst-case block offset between a reference signature and a site unit signature is 128 samples. A desirable feature of the signature is therefore robustness against shifts of 128 samples. Indeed, in the matching process described below, the site unit signature is expected to agree exactly with a reference signature in order to produce a successful "hit" in the look-up table.
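The buffer update and the 128-sample worst case can be sketched as follows. This is an illustrative reading of the paragraph above, not the disclosed implementation: the function names are hypothetical, and the worst case is modeled by assuming reference blocks start at every multiple of the 256-sample hop while a site unit block may start at any sample.

```python
REFERENCE_HOP = 256  # samples between successive reference signatures (32 ms at 8 kHz)
SITE_HOP = 1024      # samples between successive site unit signatures (128 ms at 8 kHz)


def slide_block(block, new_samples):
    """Discard the oldest len(new_samples) samples and append the new ones."""
    shift = len(new_samples)
    return block[shift:] + list(new_samples)


def nearest_reference_offset(site_start, reference_hop=REFERENCE_HOP):
    """Distance in samples from a site block start to the closest reference block start."""
    remainder = site_start % reference_hop
    return min(remainder, reference_hop - remainder)


block = list(range(8))
print(slide_block(block, [100, 101]))  # [2, 3, 4, 5, 6, 7, 100, 101]

# Worst case over every possible site block start position.
print(max(nearest_reference_offset(s) for s in range(SITE_HOP)))  # 128
```

The block length is unchanged by the update, and the largest distance to the nearest reference block is half the reference hop, which is the 128-sample shift the signature needs to tolerate.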

Referring to FIG. 2, after audio is captured (block 202), the captured audio is transformed (block 204). In one example, the transform may be a transform from the time domain to the frequency domain. For example, N samples of captured audio may be converted into an audio spectrum represented by N/2 complex Discrete Fourier Transform (DFT) coefficients including real and imaginary frequency components. Equation 1 below shows an exemplary frequency conversion equation that is performed on amplitude values in the time domain to convert them into complex-valued frequency-domain spectral coefficients X [ k ].

Formula 1

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi n k / N}, \qquad 0 \le k \le N-1$$

where x[n] denotes the time-domain sample amplitudes and X[k] is a complex number having a real component and an imaginary component, such that X[k] = X_R[k] + jX_I[k], 0 ≤ k ≤ N−1, with real part X_R[k] and imaginary part X_I[k]. The individual frequency components are identified by the bin index k. Although the above description refers to a DFT process, any suitable transform may be employed (such as a wavelet transform, a discrete cosine transform (DCT), an MDCT, a Haar transform, a Walsh transform, etc.).
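A direct evaluation of the DFT can be sketched as below, assuming the conventional definition with a negative exponent. The function name `dft` is introduced here for illustration, and the direct sum costs on the order of N squared operations, so it is suitable only for short blocks; a practical implementation for N = 16384 would normally use a fast Fourier transform.

```python
import cmath


def dft(samples):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*n*k/N), for k = 0..N-1."""
    n_total = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * n * k / n_total)
            for n, x in enumerate(samples))
        for k in range(n_total)
    ]


spectrum = dft([1.0, 0.0, -1.0, 0.0])
x_real = [c.real for c in spectrum]  # X_R[k]
x_imag = [c.imag for c in spectrum]  # X_I[k]
print([round(v, 6) for v in x_real])
print([round(v, 6) for v in x_imag])
```

The real and imaginary lists produced here correspond to the X_R[k] and X_I[k] components on which the decision metric operates.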

After the transform is complete (block 204), the process 200 computes a decision metric (block 206). As described below, the decision metric may be calculated by dividing the transformed audio into frequency bands (i.e., into several bands, each of which includes several complex-valued frequency component bins). In one example, the transformed audio may be divided into 24 bands of frequency bins. After the division, a decision metric is determined for each band, for example, based on the relationships between the spectral coefficient values in the band (by comparing them to each other, comparing them to the values of another band, or convolving them with two or more vectors). The relationships may be based on processing groups of frequency components within the respective bands. In one particular example, the groups of frequency components may be selected in an iterative manner such that every frequency component bin within a band becomes a member of a group at some point in the iteration. The calculation yields, for example, one decision metric for each band under consideration. Thus, for 24 bands of frequency bins, 24 discrete decision metrics are generated. Exemplary decision metric calculations are described below in conjunction with FIGS. 4-6.

Based on the decision metrics (block 206), the process 200 determines a digital signature (block 208). One exemplary way to construct the signature is to derive each bit from the sign (i.e., positive or negative) of the corresponding decision metric. For example, if the corresponding decision metric (hereinafter denoted D_B[p], where p identifies the band, i.e., the set of bins, being analyzed) is non-negative, the corresponding bit of the 24-bit signature is set to 1. Otherwise, if the decision metric D_B[p] is negative, that bit of the 24-bit signature is set to 0.
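The sign-to-bit rule can be sketched as follows. The function name and the bit ordering (the first band becomes the most significant bit) are assumptions made for illustration; the text does not specify an ordering.

```python
def signature_from_metrics(decision_metrics):
    """Pack one bit per band: 1 if the band's decision metric is non-negative, else 0.

    Assumption for illustration: the first band maps to the most significant bit.
    """
    signature = 0
    for metric in decision_metrics:
        bit = 1 if metric >= 0 else 0
        signature = (signature << 1) | bit
    return signature


print(signature_from_metrics([1.0, -2.0, 0.0]))  # 5, i.e. binary 101
print(hex(signature_from_metrics([0.5] * 24)))   # 0xffffff
```

With 24 decision metrics, the result is a 24-bit word; a metric of exactly zero counts as non-negative and therefore yields a 1.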

After determining the signature (block 208), the process 200 determines whether the signature generation process should be iterated (block 210). When another signature should be generated, the process 200 captures audio (block 202) and the process 200 repeats.

An exemplary process for computing the decision metric (block 206) is shown in FIG. 4. According to this example, after the audio is transformed (block 204), the transformed audio is divided into frequency bands (block 402). In one example, a 24-bit signature s(t) at time t (e.g., the time at which the last amplitude is captured) is computed by observing the spectral components (real and imaginary parts) at, for example, 3072 contiguous frequency bins starting at k = 508, which are divided into 24 frequency bands. These 3072 bins span a frequency range of, for example, about 250 Hz to about 3.25 kHz. This is the frequency range that contains most of the audio energy in typical audio content, such as speech and music. These bins are grouped to form, for example, 24 frequency bands B[p], 0 ≤ p < P, where P = 24, with each band including 128 bins. In general, in some examples, the number of bins in a band may differ from band to band.
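The grouping of bins into bands can be sketched with the example values from the text (first bin k = 508, 24 bands, 128 bins per band). The function name `band_bin_ranges` is hypothetical, and equal-width bands are used here only because that is the example given; the text notes that band widths may differ.

```python
def band_bin_ranges(first_bin=508, num_bands=24, bins_per_band=128):
    """Return one range of bin indices k per band B[p], for p = 0..num_bands-1."""
    return [
        range(first_bin + p * bins_per_band, first_bin + (p + 1) * bins_per_band)
        for p in range(num_bands)
    ]


bands = band_bin_ranges()
print(len(bands))                   # 24
print(bands[0][0], bands[0][-1])    # 508 635
print(bands[-1][0], bands[-1][-1])  # 3452 3579
print(sum(len(b) for b in bands))   # 3072
```

Each band is a contiguous, non-overlapping run of bin indices, so the 24 bands together cover exactly 3072 bins.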

After the transformed audio is divided into frequency bands (block 402), the relationships between the frequency bins in the respective bands are determined (block 404). That is, in order to characterize a spectrum using a signature, the relationships between adjacent bins in a band must be calculated in such a way that each band can be reduced to a single data bit. These relationships may be determined by grouping frequency component bins and operating on the groups. FIGS. 5 and 6 show two exemplary ways of determining the relationships between the bins in the respective bands. In some examples, the decision function calculation for a selected band may be regarded as a data reduction step, in which the values of the spectral coefficients in the band are reduced to a single bit.

In general, the decision function or metric D can be constructed without reference to the energy of the underlying bins or the magnitudes of the spectral components. To obtain a different function D, a quadratic form in the vectors of real and imaginary components of the DFT coefficients can be used. Consider the set of vectors {X_R(k), X_I(k)}, where k is the index of the DFT coefficient. The quadratic form D can be written as a linear combination of pairwise scalar products (dot products) of the vectors in the set. The relationships between the bins in the respective bands can be determined by multiplying and summing the imaginary and real components representing the bins. This is possible because, as described above, the result of the transformation includes real and imaginary components for each bin. An example of a decision metric is shown in equation 2 below. In equation 2, D[m] is a sum of products of the real and imaginary spectral components of a neighborhood or group of bins m − w, …, m, …, m + w surrounding the bin with frequency index m. Of course, the calculation of D[m] is iterated for each value of m within the band. Therefore, the calculation shown in equation 2 is iterated until the frequency bins of the entire band have been processed.

Formula 2

where α_jk, β_rs, γ_uv are the coefficients to be determined and j, k, r, s, u, v are indices spanning the entire neighborhood (i.e., spanning all bins in the band). The design objective is to determine the values of the coefficients {α, β, γ} of the quadratic form that fully specify D[m].
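Under the definitions above, a quadratic form of this kind can be written out as a sketch. The expression below is an assumption consistent with the description (a linear combination of pairwise products of real and imaginary components over the neighborhood m − w to m + w, with one coefficient set per product type); the index ranges in particular are assumed rather than taken from the text.

$$D[m] = \sum_{j,k=-w}^{w} \alpha_{jk}\, X_R[m+j]\, X_R[m+k] \;+\; \sum_{r,s=-w}^{w} \beta_{rs}\, X_I[m+r]\, X_I[m+s] \;+\; \sum_{u,v=-w}^{w} \gamma_{uv}\, X_R[m+u]\, X_I[m+v]$$

In this form no term depends on the energy of a single bin alone unless the corresponding diagonal coefficients (e.g., α_jj and β_jj) are non-zero.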

After the value of D[m] has been calculated for each value of m in the selected band, based on the bins surrounding each m, the values of D[m] are summed over all the bins constituting band p to obtain an overall decision metric D_B[p] for band p. In general, D_B[p] can be represented by a linear combination of dot products of vectors formed by the real and imaginary parts of the spectral amplitudes. Therefore, the decision function of band p can also be expressed in the form shown in equation 3. As described in connection with fig. 2, in one example, the sign of the decision metric (i.e., whether it is positive or negative) determines the signature bit assignment for the band under consideration.

Formula 3
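The per-bin metric and its summation over a band can be sketched in code. This is a minimal illustration, assuming one (2w+1) x (2w+1) coefficient table each for the real-real, imaginary-imaginary and real-imaginary products; the function names and the coefficient values used in the example are hypothetical and are not the coefficients of the described system.

```python
def quadratic_metric(xr, xi, m, alpha, beta, gamma, w=1):
    """D[m]: quadratic form over the neighborhood m-w .. m+w.

    xr, xi: real and imaginary parts of the DFT coefficients.
    alpha, beta, gamma: (2w+1) x (2w+1) coefficient tables (assumed layout).
    """
    offsets = range(-w, w + 1)
    d = 0.0
    for a, j in enumerate(offsets):
        for b, k in enumerate(offsets):
            d += alpha[a][b] * xr[m + j] * xr[m + k]
            d += beta[a][b] * xi[m + j] * xi[m + k]
            d += gamma[a][b] * xr[m + j] * xi[m + k]
    return d


def band_metric(xr, xi, first, last, alpha, beta, gamma, w=1):
    """D_B[p]: sum of D[m] over the bins first..last (inclusive) of one band."""
    return sum(quadratic_metric(xr, xi, m, alpha, beta, gamma, w)
               for m in range(first, last + 1))


# Hypothetical coefficients: only the cross term X_R[m-1] * X_I[m+1] is kept.
zeros = [[0.0] * 3 for _ in range(3)]
gamma = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
xr = [1.0, 2.0, 3.0, 4.0, 5.0]
xi = [5.0, 4.0, 3.0, 2.0, 1.0]
d_band = band_metric(xr, xi, 1, 3, zeros, zeros, gamma)
bit = 1 if d_band > 0 else 0
print(d_band, bit)
```

The bins at the edges of a band need neighbors on both sides, so the caller is assumed to pass index ranges for which m − w and m + w are valid.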

Turning to fig. 6, the relationship between the frequency bins in a frequency band may be determined in a manner different from the exemplary manner described in connection with fig. 5. As described below, this second exemplary approach derives a robust signature from the spectrum of a signal (such as an audio signal) by convolving the bins of each frequency band that represents or makes up the spectrum with a pair of M-component complex vectors.

In one such example, the decision metric may limit the width of a group to 3 frequency bins. That is, the division performed at block 402 of fig. 4 produces a plurality of groups each having 3 bins, so that a value of w = 1 can be considered. In such an arrangement, rather than calculating the coefficients α_jk, β_rs, γ_uv, in one example the 3 selected frequency bins (e.g., 3 Fourier coefficients) forming a group may be convolved with a pair of 3-element complex vectors (block 602). Exemplary vectors for the convolution are shown in equations 4 and 5 below. As described above, the group of 3 bins under consideration may be indexed and incremented until each bin in the band has been considered.

While specific exemplary vectors are shown in the following equations, it will be appreciated that any suitable vector values may be used to perform a frequency domain convolution or sliding correlation with a group of 3 frequency bins of interest (i.e., the Fourier coefficients representing the bins of interest). In other examples, vectors of length greater than 3 may be used. Thus, the following example is merely one embodiment of vectors that may be used. In one example, a pair of vectors used to generate a signature bit that takes the value 1 or 0 with equal probability must have constant energy (i.e., the sum of the squares of the elements must be identical for the two vectors). Furthermore, the number of vector elements should be small when it is desired to keep the computation simple. In one exemplary implementation, the number of elements is odd so as to create a neighborhood of equal length on either side of the frequency bin of interest. In generating the signature, it is advantageous to select different pairs of vectors for different frequency bands to obtain maximum decorrelation between the bits of the signature.

Formula 4

Formula 5

For a bin with index k, convolution with a 3-element complex vector W: [a + jb, c, d + je] yields the complex output shown in equation 6.

$$A_W[k] = (X_R[k] + jX_I[k])\,c + (X_R[k-1] + jX_I[k-1])(a + jb) + (X_R[k+1] + jX_I[k+1])(d + je)$$

Formula 6
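The per-bin convolution of equation 6 can be illustrated with a short sketch. The function name `convolve_bin` and the example vector are hypothetical; the ordering of the vector elements (first element applied to bin k − 1, middle element to bin k, last element to bin k + 1) follows equation 6.

```python
def convolve_bin(x, k, w):
    """A_W[k] for a 3-element complex vector w = [a + jb, c, d + je].

    x is a sequence of complex DFT coefficients; k must have a neighbor
    on each side.
    """
    return x[k - 1] * w[0] + x[k] * w[1] + x[k + 1] * w[2]


# Example with an arbitrary (hypothetical) vector, not one from the text.
x = [1 + 1j, 2 + 0j, 0 + 3j]
w = [1j, 1, -1j]
print(convolve_bin(x, 1, w))
```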

For the above vector pair, the difference in energy between the bin amplitudes obtained by convolution with each of the two vectors can be calculated. This difference is shown in equation 7.

$$D_{W_1W_2}[k] = |A_{W_1}[k]|^2 - |A_{W_2}[k]|^2$$

Formula 7

After expansion and simplification, the result is shown in equation 8.

$$D_{W_1W_2}[k] = 2\,(X_R[k]\,Q_k - X_I[k]\,P_k) + X_R[k-1]\,X_I[k+1] - X_R[k+1]\,X_I[k-1]$$

Formula 8

where P_k = X_R[k-1] - X_R[k+1] and Q_k = X_I[k-1] - X_I[k+1].

The above calculates, for bin k, a feature related to the energy distribution characteristics within the block of time domain samples. In this case, the feature is a measure of symmetry. If the energy differences are summed over all bins of band B_p, a corresponding measure of the distribution over the whole block can be obtained, as shown in equation 9.

Formula 9

where p_s and p_e are the starting bin index and the ending bin index of band p. The overall decision function for a band of interest may thus be a sum, over each bin belonging to that band, of products of the real and imaginary components with suitably selected numerical parameters.
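The energy difference of equation 7 and its summation over a band can be sketched as follows. The vectors `W1` and `W2` below are hypothetical equal-energy vectors chosen only for illustration; they are not the vectors of equations 4 and 5, so the closed form they produce differs from equation 8. The function names are likewise assumptions.

```python
def energy_difference(x, k, w1, w2):
    """D_W1W2[k] = |A_W1[k]|^2 - |A_W2[k]|^2 for complex coefficients x."""
    a1 = x[k - 1] * w1[0] + x[k] * w1[1] + x[k + 1] * w1[2]
    a2 = x[k - 1] * w2[0] + x[k] * w2[1] + x[k + 1] * w2[2]
    return (a1.real ** 2 + a1.imag ** 2) - (a2.real ** 2 + a2.imag ** 2)


def band_decision(x, p_s, p_e, w1, w2):
    """Sum of the per-bin energy differences over bins p_s..p_e (inclusive)."""
    return sum(energy_difference(x, k, w1, w2) for k in range(p_s, p_e + 1))


def band_bit(x, p_s, p_e, w1, w2):
    """Signature bit for the band: 1 if the summed metric is positive, else 0."""
    return 1 if band_decision(x, p_s, p_e, w1, w2) > 0 else 0


# Hypothetical vector pair with equal energy (0.25 + 1 + 0.25 each).
W1 = [0.5j, 1, -0.5j]
W2 = [-0.5j, 1, 0.5j]
x = [1 + 2j, 3 + 1j, 0 + 1j, 2 - 1j]
print(band_decision(x, 1, 2, W1, W2), band_bit(x, 1, 2, W1, W2))
```

For this particular hypothetical pair the per-bin difference reduces to 2(X_I[k] P_k − X_R[k] Q_k), which depends only on cross products between the bin of interest and its neighbors and not on the energy of any single bin.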

In order for the signature to be unique, each bit of the signature should be highly decorrelated from the other bits. This decorrelation can be achieved by using different coefficients in the convolution calculations for different frequency bands. Convolution with vectors containing symmetric complex triplets facilitates this decorrelation. In the example above, correlation products are obtained that include both the real and imaginary parts of all 3 bins involved in the convolution. This is in contrast to a simple energy measure based on squaring and adding the real and imaginary parts.

In some arrangements, one of the disadvantages is that approximately 30% of the generated signatures contain highly correlated adjacent bits. For example, the most significant 8 bits of the 24 bits may all be 1 or all be 0. Such signatures are called trivial signatures because they are derived from audio blocks in which the energy distribution is almost identical for many spectral bands, at least over a significant part of the spectrum. The highly correlated nature of the resulting bands causes long runs of signature bits to be identical to each other. Several audio waveforms that differ widely from each other may produce such signatures, which would result in false positive matches. Such trivial signatures may be rejected during the matching process, which may detect them by checking whether a long string of 1's or 0's is present.
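A check for long strings of identical bits might look like the following sketch. The run-length limit of 8 is an assumption taken from the example of 8 identical most-significant bits; the text does not state the limit used in practice, and the function names are hypothetical.

```python
def longest_run(signature, width=24):
    """Length of the longest run of identical bits in a width-bit signature."""
    bits = [(signature >> (width - 1 - i)) & 1 for i in range(width)]
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def is_trivial(signature, width=24, max_run=8):
    """Flag a signature containing a run of max_run or more identical bits."""
    return longest_run(signature, width) >= max_run


print(is_trivial(0xFF0A5A), is_trivial(0xA5A5A5))
```

Unlike the example in the text, this sketch looks for a long run anywhere in the signature rather than only in the most significant bits.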

To extract a meaningful signature from such a skewed distribution, more than two vectors need to be used to extract the band representation. In one example, 3 vectors may be used. Examples of 3 vectors that may be used are shown in equations 10-12 below.

Formula 10

Formula 11

Formula 12

A 24-bit signature can now be computed in such a way that each bit p (0 ≤ p ≤ 23) of the signature differs from its neighboring bits in the vector pair used to determine its value:

Formula 13

As an example, in the above formula, the bits or bands with p = 0, 3, 6, etc. may use m = 1, n = 2; the bits or bands with p = 1, 4, 7, etc. may use m = 1, n = 3; and the bits or bands with p = 2, 5, 8, etc. may use m = 2, n = 3. That is, these indices may be combined with any subset of the vectors. Even if adjacent bits are derived from bands immediately adjacent to each other, the convolutions are performed using different pairs of vectors, such that the bits respond to different portions of the audio block. In this way, the bits become decorrelated.
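The rotation of vector pairs across bit positions can be sketched as a simple lookup. The names `PAIR_ROTATION` and `vector_pair_for_bit` are hypothetical; the (m, n) assignments follow the example given above.

```python
# (m, n) vector indices used for bit positions p = 0, 1, 2 (mod 3).
PAIR_ROTATION = [(1, 2), (1, 3), (2, 3)]


def vector_pair_for_bit(p):
    """Return the (m, n) indices of the vector pair used for bit or band p."""
    return PAIR_ROTATION[p % 3]


pairs = [vector_pair_for_bit(p) for p in range(24)]
print(pairs[:6])
```

With this rotation, no two adjacent bits of the 24-bit signature are computed with the same vector pair.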

Of course, more than 3 vectors may be used, and they may be combined with the bit indices in any suitable manner. In some examples, using more than two vectors may reduce the occurrence of trivial signatures to 10%. In addition, some examples using more than two vectors may increase the number of successful matches by 20%.

The foregoing describes a signature technique that may be performed to determine a signature representative of a portion of captured audio. As described above, these signatures may be generated as reference signatures or as site unit signatures. Typically, reference signatures may be calculated at intervals of, for example, 32 milliseconds or 256 audio samples, and stored in a "hash table". In one example, the lookup address of the table is the signature itself. The content of that location is an index into the reference audio stream that specifies the position at which the particular signature was captured. When a site unit signature is received for matching, its value constitutes the address for entering the hash table. If the location includes a valid time index, a potential match has been detected. However, in one example, a single match based on a signature derived from a 2 second block of audio cannot be used to declare a successful match.
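The lookup table described above can be sketched with a dictionary keyed by signature value. This is a minimal illustration, assuming the time index is simply the position of the signature in the reference sequence; the function names are hypothetical.

```python
from collections import defaultdict


def build_reference_table(reference_signatures):
    """Map each signature value to the time indices at which it was captured."""
    table = defaultdict(list)
    for time_index, signature in enumerate(reference_signatures):
        table[signature].append(time_index)
    return table


def lookup(table, signature):
    """Return the stored time indices for a signature, or an empty list."""
    return table.get(signature, [])


table = build_reference_table([0xABCDEF, 0x123456, 0xABCDEF])
print(lookup(table, 0xABCDEF), lookup(table, 0x000001))
```

A Python list stands in here for the linked list of indices that a given table location may hold.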

In practice, the hash table location accessed by a site unit signature may itself include multiple indices stored as a linked list. Each such entry indicates a potential matching location in the reference audio stream. To confirm a match, subsequent site unit signatures are checked for "hits" in the hash table. Each such hit may generate an index that points to a different reference audio stream location. The site unit signatures are also time indexed.

The difference in index value between the site unit signature and the matching reference signature provides an offset value. When a successful match is observed, several site unit signatures that are separated from each other by a time step of 128 milliseconds produce hash table hits whose offset value is the same as the offset value of the previous hit. When the number of identical offsets observed in a segment of site unit signatures exceeds a threshold, a match between the two corresponding time segments in the reference stream and the site unit stream can be confirmed.
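The offset-counting confirmation described above can be sketched as follows. The table is assumed to be a plain dictionary from signature value to a list of reference time indices, and the threshold value in the example is hypothetical; the text does not specify one.

```python
from collections import Counter


def confirm_match(site_signatures, table, threshold):
    """Return (offset, hits) if one offset occurs more than threshold times.

    site_signatures: list of (site_time_index, signature) pairs.
    table: dict mapping signature -> list of reference time indices.
    """
    offsets = Counter()
    for site_index, signature in site_signatures:
        for ref_index in table.get(signature, []):
            offsets[ref_index - site_index] += 1
    if not offsets:
        return None
    offset, hits = offsets.most_common(1)[0]
    return (offset, hits) if hits > threshold else None


table = {10: [100, 500], 11: [104], 12: [108], 13: [900]}
site = [(0, 10), (4, 11), (8, 12), (12, 13)]
print(confirm_match(site, table, threshold=2))
```

Isolated hits at unrelated reference positions contribute scattered offsets, so only a run of hits sharing one offset can exceed the threshold.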

Fig. 7 illustrates an exemplary signature matching process 700 that may be used to compare a reference signature (i.e., a signature determined at a reference site) with a monitored signature (i.e., a signature determined at a monitoring site). The ultimate goal of signature matching is to find the closest match between the query audio signature (e.g., the monitored audio) and the signatures in the database (e.g., the signatures derived based on the reference audio). The comparison may be performed at a reference location, a monitoring location, or other data processing location that has access to the monitored signature and the database containing the reference signature.

Referring now specifically to the exemplary method of fig. 7, the exemplary process 700 includes obtaining monitored signatures and their associated timing (block 702). As shown in FIG. 8, a signature set may include a plurality of monitored signatures, 3 of which are shown in FIG. 8 at reference numerals 802, 804 and 806. Each signature is represented by sigma (σ). Each of the monitored signatures 802, 804, and 806 may include timing information 808, 810, 812, whether implicit or explicit.

The database containing the reference signatures is then queried (block 704) to identify the signature in the database that is the closest match. In one implementation, the measure of similarity between signatures is the Hamming distance, i.e., the number of bit positions at which the query bit string differs from the reference bit string. In fig. 8, a database of signatures and timing information is shown at reference numeral 816. Of course, database 816 may include any number of different signatures from different media presentations. An association between the program associated with the matching reference signature and the unknown signature is then established (block 706).
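The Hamming distance and the closest-match query can be sketched as follows, assuming signatures are held as integers and the database is a list of (signature, time index) pairs. The linear scan and the function names are assumptions for illustration, not the described database implementation.

```python
def hamming_distance(a, b):
    """Number of bit positions at which two integer signatures differ."""
    return bin(a ^ b).count("1")


def closest_reference(query, references):
    """Return the (signature, time_index) pair nearest to the query signature."""
    return min(references, key=lambda ref: hamming_distance(query, ref[0]))


references = [(0x0000FF, 1), (0xFFFF01, 2)]
print(hamming_distance(0b1011, 0b0010), closest_reference(0xFFFF00, references))
```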

Optionally, process 700 may then establish an offset between the monitored signature and the reference signature (block 708). This is helpful because the offset remains constant over a considerable run of consecutive query signatures (whose values are derived from continuous content). The constant offset value is itself a measure of the accuracy of the match. This information may be used to assist process 700 in further database queries.

If the descriptors of more than one reference signature are all associated with Hamming distances below a predetermined Hamming distance threshold, more than one monitored signature may need to be matched with the respective reference signatures of the possibly matching reference audio streams. It is highly unlikely that all monitored signatures generated from the monitored audio stream will match all reference signatures of more than one reference audio stream, so erroneously matching more than one reference audio stream to the monitored audio stream can be prevented.

The above-described exemplary methods, processes and/or techniques may be implemented in hardware, software and/or a combination thereof. More specifically, the exemplary method may be implemented in hardware as defined in the block diagrams of fig. 9 and 10. The example methods, processes and/or techniques may also be implemented by software executing on a processor system (e.g., the processor system 1110 of fig. 11).

Fig. 9 is a block diagram of an exemplary signature generation system 900 for generating digital spectrum signatures. In particular, the example signature generation system 900 may be used to generate monitored signatures and/or reference signatures based on the above-described sampling, transformation, and decision metric calculations. For example, the exemplary signature generation system 900 may be used to implement the signature generators 114 and 122 of FIG. 1A or the signature generators 156 and 158 of FIG. 1B. Additionally, the example signature generation system 900 may be used to implement the example methods of fig. 2-6.

As shown in fig. 9, the exemplary signature generation system 900 includes a sample generator 902, a timing device 903, a reference time generator 904, a transformer 908, a decision metric calculator 910, a signature determiner 914, a storage 916, and a data communication interface 918, all of which are communicatively coupled as shown. The example signature generation system 900 may be configured to obtain an exemplary audio stream, obtain a plurality of audio samples from the exemplary audio stream to form an audio block, and generate, from that single audio block, a signature representing the audio block.

The sample generator 902 may be configured to obtain an exemplary audio stream or media stream. The stream may be any analog or digital audio stream. If the exemplary audio stream is an analog audio stream, the sample generator 902 may be implemented using an analog-to-digital converter. If the exemplary audio stream is a digital audio stream, the sample generator 902 may be implemented using a digital signal processor. Further, the sample generator 902 may be configured to obtain and/or extract audio samples at any desired sampling frequency Fs. For example, as described above, the sample generator may be configured to acquire N samples at 8 kHz, and each sample may be represented by 16 bits. In this arrangement, N may be any number of samples (such as 16384). The sample generator 902 may also notify the reference time generator 904 when the audio sample acquisition process begins. The sample generator 902 passes the samples to the transformer 908.

The timing device 903 may be configured to generate time data and/or timestamp information, and may be implemented by a clock, a timer, a counter, and/or any other suitable device. The timing device 903 may be communicatively coupled to the reference time generator 904 and may be configured to transmit time data and/or timestamps to the reference time generator 904. The timing device 903 may also be communicatively coupled to the sample generator 902 and may assert a start signal or interrupt to instruct the sample generator 902 to begin collecting or acquiring audio sample data. In one example, the timing device 903 is implemented by a real-time clock with a period of 24 hours that tracks time at millisecond resolution. In this case, the timing device 903 may be configured to reset to 0 at midnight and to track time in milliseconds relative to midnight.

When a notification is received from the sample generator 902, the reference time generator 904 may initialize a reference time t₀. The reference time t₀ may be used to indicate the time within the audio stream at which the signature was generated. Specifically, the reference time generator 904 may be configured to read the time data and/or the timestamp value from the timing device 903 when it is notified by the sample generator 902 that the sample acquisition process has started. The reference time generator 904 may then store the timestamp value as the reference time t₀.

The transformer 908 may be configured to perform an N/2 point DFT on each audio block of 16384 samples. For example, if the sample generator obtains 16384 samples, the transformer will generate from the samples a spectrum that is represented by 8192 discrete frequency coefficients having real and imaginary components.

In one example, the decision metric calculator 910 is configured to identify several frequency bands (e.g., 24 frequency bands) within the DFT generated by the transformer 908 by grouping adjacent frequency bins to be considered. In one example, 3 bins are selected per band, and 24 bands are formed. The bands may be selected according to any technique. Of course, any suitable number of bands and of bins per band may be selected.

The decision metric calculator 910 then determines a decision metric for each frequency band. For example, the decision metric calculator 910 may multiply and sum the complex amplitudes or energies of adjacent frequency bins in a band. Alternatively, as described above, the decision metric calculator 910 may convolve the bins with two or more vectors of any dimension. For example, the decision metric calculator 910 may convolve 3 frequency bins in a band with 2 vectors (of 3 dimensions each). In another example, the decision metric calculator 910 may convolve 3 frequency bins in a band with 2 vectors selected from a set of 3 vectors, wherein the 2 vectors are selected based on the band under consideration. For example, the vectors may be selected in rotation, wherein the first and second vectors are used for a first band, the first and third vectors are used for a second band, and the second and third vectors are used for a third band, with the rotation then repeating cyclically.

The result of the decision metric calculator 910 is a single value for each band of frequency bins. For example, if there are 24 bands of frequency bins, the decision metric calculator 910 will generate 24 decision metrics.

The signature determiner 914 operates on the values obtained from the decision metric calculator 910 to generate one signature bit for each of the decision metrics. For example, a bit value of 1 may be assigned if the decision metric is positive, and a bit value of 0 may be assigned if the decision metric is negative. The signature bits are output to the storage 916.
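The bit assignment performed by the signature determiner can be sketched as follows. Two details are assumptions not stated in the text: a decision metric of exactly zero is mapped to 0, and the first band is placed in the most significant bit. The function name is hypothetical.

```python
def signature_from_metrics(metrics):
    """Pack one bit per decision metric into an integer signature.

    A positive metric yields 1 and a negative metric yields 0; a metric of
    exactly zero is treated as 0 here (an assumption).
    """
    signature = 0
    for d in metrics:
        signature = (signature << 1) | (1 if d > 0 else 0)
    return signature


print(bin(signature_from_metrics([1.5, -2.0, 0.3, -0.1])))
```

With 24 decision metrics, one per band, the result is a 24-bit signature.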

The storage 916 may be any medium suitable for storing signatures. For example, the storage 916 may be a memory such as a Random Access Memory (RAM), a flash memory, or the like. Additionally or alternatively, the storage 916 may be a mass storage such as a hard disk drive, optical storage media, tape drive, or the like.

The storage 916 is coupled to the data communication interface 918. For example, if the system 900 is located at a monitoring site (e.g., in a person's home), the signature information in the storage 916 can be communicated to a collection device, reference site, etc. using the data communication interface 918.

Fig. 10 is a block diagram of an exemplary signature comparison system 1000 for comparing digital spectrum signatures. In particular, the exemplary signature comparison system 1000 may be used to compare a monitored signature to a reference signature. For example, the exemplary signature comparison system 1000 may be used to implement the signature analyzer 132 of fig. 1A that compares monitored signatures to reference signatures. Additionally, the example signature comparison system 1000 may be used to implement the example process of FIG. 7.

The exemplary signature comparison system 1000 includes a monitored signature receiver 1002, a reference signature receiver 1004, a comparator 1006, a Hamming distance filter 1008, a media identifier 1010, and a media identification look-up table interface 1012, all of which are communicatively coupled as shown.

The monitored signature receiver 1002 may be configured to obtain a monitored signature via network 108 (fig. 1) and communicate the monitored signature to the comparator 1006. The reference signature receiver 1004 may be configured to obtain a reference signature from the memory 134 (fig. 1A and 1B) and communicate the reference signature to the comparator 1006.

The comparator 1006 and the hamming distance filter 1008 may be configured to compare the reference signature with the monitored signature using hamming distance. In particular, the comparator 1006 may be configured to compare the descriptors of the monitored signatures with the descriptors of the plurality of reference signatures to generate a value of the hamming distance for each comparison. Then, the hamming distance filter 1008 obtains the value of the hamming distance from the comparator 1006 and filters out the unmatched reference signatures based on the value of the hamming distance.
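The division of work between the comparator and the Hamming distance filter can be sketched as a single function. This is a minimal illustration that assumes integer signatures, a list of (reference signature, media identifier) pairs, and a strict "below threshold" test; the names and the threshold value are hypothetical.

```python
def filter_by_hamming(monitored, references, threshold):
    """Keep references whose Hamming distance to the monitored signature
    is below threshold, sorted from closest to farthest.

    references: list of (reference_signature, media_id) pairs.
    """
    kept = []
    for ref_signature, media_id in references:
        # Comparator step: distance between monitored and reference signature.
        distance = bin(monitored ^ ref_signature).count("1")
        # Filter step: drop references that are too far away.
        if distance < threshold:
            kept.append((distance, ref_signature, media_id))
    return sorted(kept)


references = [(0b1111, "A"), (0b1000, "B"), (0b1110, "C")]
print(filter_by_hamming(0b1111, references, threshold=2))
```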

When a matching reference signature is found, media identifier 1010 may obtain the matching reference signature and, working in conjunction with media identification look-up table interface 1012, may identify media information associated with the unrecognized audio stream. For example, the media identification look-up table interface 1012 may be communicatively coupled to a media identification look-up table or to a database used to cross-reference media identification information (e.g., movie titles, show titles, song titles, artist names, episode numbers, etc.) based on reference signatures. In this manner, media identifier 1010 may retrieve media identification information from a media identification database based on the matching reference signature.

Fig. 11 is a block diagram of an exemplary processor system 1110 that may be used to implement the apparatus and methods described herein. As shown in fig. 11, the processor system 1110 includes a processor 1112 coupled to the interconnection bus or network 1114. Processor 1112 includes a register bank or register space 1116 (shown in FIG. 11 as being entirely on-chip), but alternatively, the register bank or register space 1116 may be entirely or partially off-chip and coupled directly to processor 1112 via dedicated electrical connections and/or via an interconnecting network or bus 1114. Processor 1112 may be any suitable processor, processing unit or microprocessor. Although not shown in FIG. 11, the system 1110 may be a multi-processor system and, thus, may include one or more additional processors that are identical or similar to the processor 1112 and that are communicatively coupled to the interconnection bus or network 1114.

The processor 1112 of FIG. 11 is coupled to a chipset 1118, which chipset 1118 includes a memory controller 1120 and an input/output (I/O) controller 1122. As is well known, a chipset typically provides I/O and memory management functions as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by one or more processors coupled to the chipset. The memory controller 1120 performs functions that enable the processor 1112 (or processors if multiple processors are present) to access a system memory 1124 and a mass storage memory 1125.

The system memory 1124 may include any desired type of volatile and/or nonvolatile memory such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), flash memory, Read Only Memory (ROM), or the like. The mass storage memory 1125 may include any desired type of mass storage device, including hard disk drives, optical drives, tape storage devices, and the like.

The I/O controller 1122 performs functions that enable the processor 1112 to communicate with peripheral input/output (I/O) devices 1126 and 1128 via an I/O bus 1130. The I/O devices 1126 and 1128 may be any desired type of I/O device, such as a keyboard, video display or monitor, mouse, or the like. Although the memory controller 1120 and the I/O controller 1122 are depicted in fig. 11 as separate functional blocks within the chipset 1118, the functions performed by these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits.

The methods described herein may be implemented using instructions stored on a computer-readable medium and executed by processor 1112. A computer-readable medium may include any desired combination of solid-state, magnetic and/or optical media implemented with any desired combination of mass storage device (e.g., a disk drive), removable storage device (e.g., a floppy disk, memory card or stick, etc.), and/or integrated memory device (e.g., random access memory, flash memory, etc.).

It will be readily appreciated that the above-described signature generation and matching processes and/or methods may be implemented in any number of different ways. For example, these processes may be implemented using, among other components, software or firmware executed on hardware. However, this is merely an example, and it is contemplated that the processes may be implemented using any form of logic. This logic may include, for example, an implementation exclusively in dedicated hardware (e.g., circuits, transistors, logic gates, hard-coded processors, Programmable Array Logic (PAL), an Application Specific Integrated Circuit (ASIC), etc.), exclusively in software, exclusively in firmware, or in some combination of hardware, firmware, and/or software. For example, instructions representing some or all of the illustrated processes may be stored in one or more memories or other machine-readable media (such as hard disk drives, etc.). Such instructions may be hard-coded or changeable. In addition, some portions of the processes may be performed manually. Further, while the various processes described herein are shown in a particular order, those skilled in the art will readily recognize that this order is merely an example and that numerous other orders exist. Thus, while exemplary processes are described above, those skilled in the art will readily appreciate that these examples are not the only way to implement such processes.

Although certain methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto.

Claims

1. A method of characterizing media, the method comprising the steps of:

capturing an audio block;

converting at least a portion of the block of audio into a frequency domain representation comprising a plurality of complex-valued frequency components;

defining a frequency band of complex-valued frequency components to be considered;

determining a decision metric using the frequency band of the complex-valued frequency components by multiplying and adding real spectral components of a neighborhood or group of frequency bins in the frequency band of the complex-valued frequency components with imaginary spectral components; and

determining signature bits based on the value of the decision metric.

2. The method of claim 1, wherein capturing the block of audio comprises obtaining audio via a hardwired connection.

3. The method of claim 1, wherein capturing the block of audio comprises obtaining audio via a wireless audio sensor.

4. The method of claim 1, wherein the step of capturing the block of audio comprises the steps of: digitally sampling an audio signal and storing the digital samples in a buffer.

5. The method of claim 4, wherein the step of capturing the block of audio comprises the steps of: shifting a number of old samples out of the buffer and shifting a number of new samples into the buffer.

6. A method as defined in claim 1, wherein converting at least a portion of the block of audio into a frequency-domain representation comprises using a Fourier transform.

7. A method as defined in claim 1, wherein the step of defining a band of complex-valued frequency components comprises the step of grouping complex-valued frequency components that are adjacent in the frequency-domain representation.

8. The method of claim 7, wherein the step of defining a band of complex-valued frequency components comprises the step of grouping complex-valued frequency components in the audible frequency range.

9. A method as defined in claim 1, wherein the step of determining a decision metric using the frequency band of complex-valued frequency components comprises the step of computing a linear combination of dot products of a set of vectors representing real and imaginary components of the complex-valued frequency components in the frequency band.

10. The method of claim 9, wherein the linear combination is computed based on a set of complex-valued frequency components within the frequency band.

11. The method of claim 9, wherein the step of determining a decision metric further comprises the step of performing a summation of the linear combination over all complex-valued frequency components in the frequency band.

12. A method as defined in claim 1, wherein the step of using the frequency band of complex-valued frequency components to determine a decision metric comprises the step of convolving the complex-valued frequency components with complex vectors.

13. A method as defined in claim 12, wherein the convolving comprises convolving each complex-valued frequency component in the frequency band with a pair of complex vectors.

14. The method of claim 13, wherein a set of 3 complex-valued frequency components in the frequency band are each convolved with a pair of 3-element complex vectors.

15. The method of claim 14, wherein the step of determining a decision metric comprises the step of summing the convolutions.

16. The method of claim 15, wherein the sum of squares of the first 3-element vector is equal to the sum of squares of the second 3-element vector.

17. The method of claim 15, wherein the pair of 3-element complex vectors is selected from a set of three or more 3-element complex vectors.

18. The method of claim 17, wherein the pair of 3-element complex vectors is selected based on a frequency band being processed.

19. A method as defined in claim 12, wherein the convolution of complex-valued frequency components with a complex vector represents a symmetric energy distribution in the frequency band.

20. A method as defined in claim 12, wherein the decision metric is based on a difference between a convolution of the complex-valued frequency components and a first complex vector and a convolution of the complex-valued frequency components and a second complex vector.

21. A method as defined in claim 20, wherein the decision metric is based on a summation of differences between the convolution result of the complex-valued frequency components and a first complex vector and the convolution result of the complex-valued frequency components and a second complex vector.
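The convolution-based decision metric recited in claims 12 to 21 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the vector values `W1` and `W2`, the use of squared magnitude to turn each complex convolution result into a real number before differencing, and the rule that a positive metric yields a bit of 1. The claims themselves fix none of these choices.

```python
# Hypothetical pair of 3-element complex vectors. Their sums of squared
# magnitudes are equal (1 + 4 + 1 = 6 each), as claim 16 requires.
W1 = (-1j, 2 + 0j, 1j)
W2 = (1j, 2 + 0j, -1j)


def conv3(spectrum, k, w):
    """3-tap convolution of the spectrum with vector w, centered on bin k."""
    return sum(w[m] * spectrum[k + 1 - m] for m in range(3))


def decision_metric(spectrum, band):
    """Sum over the band of the difference between the two convolution results.

    Assumption: each complex result is reduced to its squared magnitude
    before the difference is taken.
    """
    total = 0.0
    for k in band:
        first = abs(conv3(spectrum, k, W1)) ** 2
        second = abs(conv3(spectrum, k, W2)) ** 2
        total += first - second
    return total


def signature_bit(metric):
    """Assumed rule: a positive metric maps to 1, anything else to 0."""
    return 1 if metric > 0 else 0
```

With these hypothetical vectors, a band whose content leans toward the lower-index neighbor and one whose content leans toward the higher-index neighbor give metrics of opposite sign, so the resulting bit reflects how the band's complex content is distributed around each component. Every index in `band` must have a neighbor on both sides in `spectrum`.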

22. An apparatus for characterizing media, comprising:

a sample generator to capture a block of audio;

a transformer that converts at least a portion of the block of audio into a frequency domain representation comprising a plurality of complex-valued frequency components;

a decision metric calculator to:

define a frequency band of the complex-valued frequency components to be considered; and

determine a decision metric using the frequency band of complex-valued frequency components by multiplying and adding real spectral components with imaginary spectral components of a neighborhood or group of frequency bands within the frequency band of complex-valued frequency components; and

a signature determiner to determine signature bits based on the value of the decision metric.

23. An apparatus as defined in claim 22, wherein capturing the block of audio comprises obtaining audio via a hardwired connection.

24. An apparatus as defined in claim 22, wherein capturing the block of audio comprises obtaining audio via a wireless audio sensor.

25. The apparatus of claim 22, wherein capturing the block of audio comprises digitally sampling the audio signal and storing the digital samples in a buffer.

26. An apparatus as defined in claim 25, wherein capturing the block of audio comprises shifting a number of old samples out of the buffer and a number of new samples into the buffer.

27. An apparatus as defined in claim 22, wherein converting at least a portion of the block of audio to the frequency-domain representation comprises using a Fourier transform.

28. An apparatus as defined in claim 22, wherein defining a band of complex-valued frequency components comprises grouping adjacent frequency components in the frequency-domain representation.

29. An apparatus as defined in claim 28, wherein defining the band of complex-valued frequency components comprises grouping complex-valued frequency components within an audible frequency range.

30. An apparatus as defined in claim 22, wherein determining the decision metric using the frequency bands of complex-valued frequency components comprises computing a linear combination of dot products of a set of vectors representing real and imaginary components of the complex-valued frequency components in the frequency bands.

31. The apparatus of claim 30, wherein the linear combination is computed based on a set of complex-valued frequency components within the frequency band.

32. The apparatus of claim 30, wherein determining a decision metric further comprises summing a linear combination of all complex-valued frequency components in the frequency band.

33. An apparatus as defined in claim 22, wherein determining the decision metric using the set of complex-valued frequency components comprises convolving the complex-valued frequency components with a complex vector.

34. An apparatus as defined in claim 33, wherein the convolution comprises convolving each complex-valued frequency component in the frequency band with a pair of complex vectors.

35. The apparatus of claim 34, wherein a set of 3 complex-valued frequency components in the frequency band are each convolved with a pair of 3-element complex vectors.

36. The apparatus of claim 35, wherein determining a decision metric comprises summing the convolutions.

37. The apparatus of claim 35, wherein a sum of squares of a first 3-element vector is equal to a sum of squares of a second 3-element vector.

38. The apparatus of claim 35, wherein the pair of 3-element complex vectors is selected from a set of three or more 3-element complex vectors.

39. The apparatus of claim 35, wherein the pair of 3-element complex vectors is selected based on a frequency band being processed.

40. The apparatus of claim 33, wherein a convolution of complex-valued frequency components with a complex vector represents an energy distribution that is symmetric within the frequency band.

41. An apparatus as defined in claim 33, wherein the decision metric is based on a difference between a convolution of the complex-valued frequency components and a first complex vector and a convolution of the complex-valued frequency components and a second complex vector.

42. An apparatus as defined in claim 41, wherein the decision metric is based on a summation of differences between a convolution result of the complex-valued frequency components and a first complex vector and a convolution result of the complex-valued frequency components and a second complex vector.
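The capture, transform, and band-definition elements recited in claims 25 to 28 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the apparatus itself: the helper names `shift_in`, `dft`, and `group_bands` are hypothetical, the buffer is modeled as a fixed-length `deque`, a direct discrete Fourier transform stands in for whatever Fourier transform an implementation would use, and bands are assumed to be non-overlapping groups of equal width.

```python
import cmath
from collections import deque


def shift_in(buffer, new_samples):
    """Shift new samples into a fixed-length buffer.

    A deque created with maxlen discards the same number of oldest
    samples from the other end as new samples are appended.
    """
    buffer.extend(new_samples)


def dft(block):
    """Direct discrete Fourier transform; returns complex-valued components."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]


def group_bands(num_bins, width):
    """Group adjacent bin indices into consecutive bands of `width` bins."""
    return [
        range(start, start + width)
        for start in range(0, num_bins - width + 1, width)
    ]
```

For example, a buffer built with `deque([0, 1, 2, 3], maxlen=4)` holds `[2, 3, 4, 5]` after `shift_in(buffer, [4, 5])`. Restricting the bands to an audible frequency range, as in claim 29, would additionally require the sampling rate in order to map bin indices to frequencies; that step is not shown here.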

43. A method of characterizing media, the method comprising the steps of:

capturing an audio block;

converting at least a portion of the audio block into a transform-domain representation comprising a plurality of transform-domain coefficients;

defining a frequency band of transform domain coefficients to be considered;

determining a decision metric by calculating a convolution of the transform domain coefficients with a complex vector, wherein adjacent convolutions of the transform domain coefficients use different complex vectors; and

determining signature bits based on the value of the decision metric.

44. The method of claim 43, wherein the convolution comprises convolving each transform domain coefficient in the frequency band with a pair of complex vectors.

45. The method of claim 44, wherein a set of 3 transform domain coefficients in the frequency band are each convolved with a pair of 3-element complex vectors.