HK1155863B - Methods and apparatus for generating signatures - Google Patents
Description
Technical Field
The present disclosure relates generally to media monitoring, multimedia content searching and retrieval, and more particularly to methods and apparatus for generating signatures for identifying media information.
Background
Methods for identifying media information, and more particularly audio signals (e.g., information in an audio stream), using signature matching techniques are well established. Signatures are also commonly referred to as fingerprints. Signature matching techniques are commonly used in television viewer and radio audience metering applications and are implemented using several methods of generating and matching signatures. For example, in a television audience metering application, signatures are generated at a monitoring site (e.g., a monitored household) and at a reference site. The monitoring site is typically a location, such as a home, at which the media consumption of an audience member or members is monitored. For example, at a monitoring site, a monitored signature may be generated based on an audio stream associated with a selected channel, radio station, or the like. The monitored signatures may then be sent to a central data collection facility for analysis. At the reference site, signatures, commonly referred to as reference signatures, are generated based on known programs available within the broadcast area. The reference signatures may be stored at the reference site and/or the central data collection facility and compared to the monitored signatures generated at the monitoring site. When a monitored signature matches a reference signature, the known program corresponding to the matching reference signature may be identified as the program that was presented at the monitoring site.
Drawings
FIGS. 1A and 1B illustrate an exemplary audio stream identification system for generating a signature and identifying an audio stream.
FIG. 2 is a flow diagram illustrating an exemplary signature generation process.
FIG. 3 is a time domain representation of an exemplary monitored audio stream.
Fig. 4 is a graph of an example portion of a monitored audio stream (i.e., an audio block), which is sinusoidal for illustrative purposes.
Fig. 5 is a graph of an exemplary window that may be applied to the audio block of fig. 4.
Fig. 6 is a graph of the windowed audio block resulting from applying the window of fig. 5 to the audio block of fig. 4.
Fig. 7 is a graph of a second exemplary window that may be applied to the audio block of fig. 4.
Fig. 8 is a graph of a windowed audio block resulting from applying the window of fig. 7 to the audio block of fig. 4.
Fig. 9 is a graph of the window of fig. 5 showing the corresponding frequency response of the window.
Fig. 10 is a graph of the window of fig. 7 showing the corresponding frequency response of the window.
Fig. 11 is a graph of a second alternative exemplary window and its corresponding frequency response.
Fig. 12 is a graph of a third alternative exemplary window and its corresponding frequency response.
Fig. 13 is a flow diagram of an exemplary signature matching process.
Fig. 14 is a diagram showing how signatures may be compared according to the flowchart of fig. 13.
Fig. 15 is a block diagram of an exemplary signature generation system that generates a signature based on an audio stream or audio block.
FIG. 16 is a block diagram of an exemplary signature comparison system for comparing signatures.
FIG. 17 is a block diagram of an exemplary processor system that may be used to implement the methods and apparatus described herein.
Detailed Description
Although the following discloses example systems implemented using, among other components, software executing on hardware, it should be noted that such systems are merely illustrative and should not be considered limiting. For example, it is contemplated that any or all of these hardware and software components could be embodied exclusively in hardware, exclusively in software, or in any combination of hardware and software. Thus, while exemplary embodiments are described below, persons of ordinary skill in the art will readily appreciate that the examples provided are not the only way to implement such systems.
The methods and apparatus described herein relate generally to generating digital signatures that may be used to identify media information. Digital signatures, or digital fingerprints, are signal descriptors that accurately characterize a signal for matching, indexing, or database search and retrieval. In particular, the disclosed methods and apparatus are described in terms of generating a digital audio signature based on an audio stream or audio block (e.g., audio information). However, the methods and apparatus described herein may also be used to generate digital signatures based on any other type of signal, time series data, or media information (e.g., video information, web pages, still images, computer data, etc.). Further, the media information may be associated with broadcast information (e.g., television information, radio information, etc.), information reproduced from any storage medium (e.g., a Compact Disc (CD), a Digital Versatile Disc (DVD), etc.), or any other information for which a digital signature may be generated (e.g., an audio stream, a video stream, etc.). In one particular example, an audio stream is identified based on digital signatures that include a monitored digital signature generated at a monitoring site (e.g., a monitored household) and a reference digital signature generated and/or stored at a reference site and/or a central data collection facility.
As described in detail below, the methods and apparatus described herein identify media information (including audio streams or any other media) based on digital signatures. The example techniques described herein compute a signature at a particular time instant using a single block of audio samples, but process that audio block using two or more window functions to obtain two or more windowed audio blocks. Further processing of the windowed audio blocks enables detection of the effects of the windowing on the audio spectrum of the audio block. A signature value unique or substantially unique to the audio block is derived from the effects of the two or more window functions on the audio block. That is, the example techniques described herein enable an audio signature to be calculated or determined without using time-permuted audio blocks. Of course, the choice of the window functions may be adjusted, as may the type of transform, parameters, and/or resolution used to determine the signature.
As described in detail below, after applying the window function to the block of audio samples, the frequency components of the windowed audio block are generated by transforming the windowed audio block from the time domain to the frequency domain using, e.g., a Discrete Fourier Transform (DFT) or any other suitable transform, whether or not based on a Fourier transform, such as a Discrete Cosine Transform (DCT), a Modified Discrete Cosine Transform (MDCT), a Haar transform, a Walsh transform, etc. The transform may be used to analyze the frequency components in the windowed audio block and identify the spectral power of each frequency component. The spectral powers may then be used to generate a digital signature.
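As a rough, self-contained sketch of this step (plain Python with a naive O(N²) DFT in place of the FFT a real implementation would use; the function names, block size, and test tone are illustrative choices, not taken from the patent), a windowed block can be transformed and its per-component spectral power computed as follows:

```python
import cmath
import math

def dft(block):
    """Naive O(N^2) DFT; shown only for self-containment."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def spectral_powers(windowed_block):
    """Spectral power of each of the N/2 frequency components of a
    real-valued windowed audio block."""
    coeffs = dft(windowed_block)
    return [abs(c) ** 2 for c in coeffs[:len(windowed_block) // 2]]

# Example: a 1 kHz tone sampled at 8 kHz, windowed with a bell-shaped
# (Hann) window before the transform.
fs, n = 8000.0, 64
block = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(n)]
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]
powers = spectral_powers([w * s for w, s in zip(window, block)])
# The power peaks at frequency bin k = 1000 * n / fs = 8.
```

The per-band energies used later in the signature generation process can then be accumulated from these spectral powers.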
Other techniques may be used after the window function is applied to the audio block. For example, the windowed audio blocks may be processed using a wavelet transform that transforms audio data from the time domain to the wavelet domain. In general, wavelet transforms may be used to decompose blocks or frames of data (e.g., time-domain audio samples) into multiple sub-bands, thereby enabling analysis of data sets at different scales and/or resolutions. By dividing the data into a plurality of sub-bands, wavelet transforms may be used to analyze various time intervals of the data at a desired scale or resolution.
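As a hedged sketch of this wavelet alternative, the plain-Python example below performs a single level of a Haar wavelet decomposition, splitting a block into a low-frequency (approximation) sub-band and a high-frequency (detail) sub-band at half the original resolution; the function names and test signal are illustrative assumptions:

```python
import math

def haar_level(samples):
    """One level of a Haar wavelet decomposition: split a block into a
    low-frequency (approximation) sub-band and a high-frequency (detail)
    sub-band, each at half the original resolution."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]
    approx = [(a + b) * scale for a, b in pairs]
    detail = [(a - b) * scale for a, b in pairs]
    return approx, detail

def energy(values):
    return sum(v * v for v in values)

# A slowly varying signal concentrates its energy in the approximation band.
block = [math.sin(2.0 * math.pi * t / 64.0) for t in range(64)]
approx, detail = haar_level(block)
# The transform is orthogonal, so energy(approx) + energy(detail)
# equals energy(block), and here energy(approx) dominates.
```

Applying further levels to the approximation sub-band yields the multi-scale decomposition described above.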
Alternatively, in addition to applying a time-domain windowing function to time-domain blocks of audio samples, windowing can also be implemented in the frequency domain, where the frequency response corresponding to the time-domain window can be convolved with the frequency spectrum of the audio block. If frequency domain processing including convolution is used, the conversion of the audio blocks into the frequency domain may be performed using a Fourier transform, with adjustments being made between the audio blocks to account for discontinuities. Furthermore, if the processing and application of the window is implemented in the frequency domain, a time domain window having a frequency characteristic with a plurality of non-zero elements (e.g., 3 to 5 non-zero elements) may be selected.
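The equivalence between time-domain windowing and frequency-domain convolution can be sketched as follows (plain Python with a naive DFT and a deliberately small block; all names are illustrative). By the convolution theorem, the DFT of the sample-wise product equals the circular convolution of the two spectra scaled by 1/N:

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
block = [math.sin(2.0 * math.pi * 3.0 * t / n) for t in range(n)]
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]

# Route 1: window in the time domain, then transform.
time_route = dft([w * s for w, s in zip(window, block)])

# Route 2: transform first, then circularly convolve the window's
# frequency response with the block's spectrum (scaled by 1/N).
W, X = dft(window), dft(block)
freq_route = [sum(W[m] * X[(k - m) % n] for m in range(n)) / n
              for k in range(n)]
# Both routes yield the same windowed spectrum (up to rounding error).
```

Note that this periodic Hann-style window has essentially three non-zero frequency-domain elements (bins 0, 1, and N−1), which illustrates why windows whose frequency characteristic has only a few non-zero elements (e.g., 3 to 5) keep the frequency-domain convolution inexpensive.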
The monitored signatures may be generated at a monitoring site using the techniques described above based on an audio stream associated with media information (e.g., a monitored audio stream) consumed by or presented to a viewer/listener. For example, the monitored signature may be generated based on the soundtrack of a television program or any other media presented at the monitoring site. The monitored signatures may then be transmitted to a central data collection facility for comparison with one or more reference signatures.
A reference signature is generated at a reference location and/or a central data collection facility for an audio stream associated with known media information using the techniques described above. The known media information may include media broadcast within an area, media reproduced within a home, media received over the internet, and the like. Each reference signature is stored in memory along with media identifying information such as a song title, movie title, etc. When the monitored signature is received at the central data collection facility, the monitored signature is compared to one or more reference signatures until a match is found. This matching information may then be used to identify the media information (e.g., monitored audio stream) from which the monitored signature was generated. For example, a look-up table or database may be consulted to retrieve the media title, program label, screen number, etc. corresponding to the media information from which the monitored signature was generated.
In one example, the rates at which the monitored signatures and the reference signatures are generated may differ. For example, to reduce processing or for other reasons, the monitored signatures may be generated at 25% of the data rate of the reference signatures. For instance, a 48-bit reference signature may be generated every 0.032 seconds, which results in a reference data rate of 48 bits × 31.25/second = 1500 bits/second, or 187.5 bytes/second. In this configuration, a 48-bit monitored signature may be generated every 0.128 seconds, resulting in a monitored data rate of 48 bits × 7.8125/second = 375 bits/second, or 46.875 bytes/second. Of course, in a configuration in which the data rates of the monitored signatures and the reference signatures differ, such a difference must be taken into account when comparing the monitored signatures with the reference signatures. For example, if the monitored rate is 25% of the reference rate, each successive monitored signature will correspond to every fourth reference signature.
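The arithmetic in this example can be checked with a few lines of plain Python (the function and variable names are illustrative, not from the patent):

```python
SIG_BITS = 48  # signature size in bits, from the example above

def data_rate(bits_per_sig, interval_s):
    """Return (signatures/second, bits/second, bytes/second)."""
    sigs_per_s = 1.0 / interval_s
    bits_per_s = bits_per_sig * sigs_per_s
    return sigs_per_s, bits_per_s, bits_per_s / 8.0

ref_rate = data_rate(SIG_BITS, 0.032)  # ~31.25 sig/s, 1500 bits/s, 187.5 bytes/s
mon_rate = data_rate(SIG_BITS, 0.128)  # ~7.8125 sig/s, 375 bits/s, 46.875 bytes/s

# The monitored rate is 25% of the reference rate, so successive monitored
# signatures line up with every fourth reference signature:
stride = round(0.128 / 0.032)             # 4
aligned = [i * stride for i in range(5)]  # reference indices 0, 4, 8, 12, 16
```

The `aligned` index list illustrates the comparison stride mentioned above: monitored signature i is compared against reference signature 4i.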
Fig. 1A and 1B illustrate exemplary audio stream identification systems 100 and 150 for generating digital spectral signatures and identifying audio streams. Exemplary audio stream identification systems 100 and 150 may be implemented as a television broadcast information identification system and a radio broadcast information identification system, respectively. The exemplary audio stream identification system 100 includes a monitoring site 102 (e.g., a monitored household), a reference site 104, and a central data collection facility 106.
Monitoring television broadcast information involves generating a monitored signature at the monitoring site 102 based on audio data of the television broadcast information and transmitting the monitored signature to the central data collection facility 106 via the network 108. The reference signature may be generated at the reference site 104 or may be transmitted to the central data collection facility 106 via the network 108. The audio content represented by the monitored signature generated at the monitoring site 102 may be identified at the central data collection facility 106 by comparing the monitored signature to one or more reference signatures until a match is found. Alternatively, the monitored signature may be communicated from the monitoring site 102 to the reference site 104 and compared to one or more reference signatures at the reference site 104. In another example, the reference signature may be transmitted to the monitoring site 102 and compared to the monitored signature at the monitoring site 102.
For example, the monitoring site 102 may be a household that monitors media consumption by a viewer/listener. In general, the monitoring site 102 may include a plurality of media delivery devices 110, a plurality of media presentation devices 112, and a signature generator 114 for generating monitored signatures associated with media presented at the monitoring site 102.
The plurality of media delivery devices 110 may include, for example, set-top box tuners (e.g., cable tuners, satellite tuners, etc.), PVR devices, DVD players, CD players, radios, etc. Some or all of the media delivery devices 110, such as a set-top box tuner, may be communicatively connected to one or more broadcast information receiving devices 116, which may include a cable, a satellite dish, an antenna, and/or any other suitable device for receiving broadcast information. The media delivery devices 110 may be configured to render media information (e.g., audio information, video information, web pages, still images, etc.) based on, for example, broadcast information and/or stored information. The broadcast information may be obtained from the broadcast information receiving devices 116, and the stored information may be obtained from any information storage medium (e.g., a DVD, a CD, a tape, etc.). The media delivery devices 110 are communicatively connected to the media presentation devices 112 and may be configured to transmit media information to the media presentation devices 112 for presentation. The media presentation devices 112 may include, for example, a television having a display device and/or a set of speakers by which a viewer/listener consumes broadcast television information, music, movies, etc.
As described in more detail below, the signature generator 114 may be used to generate a monitored digital signature based on audio information. In particular, at the monitoring site 102, the signature generator 114 may be configured to generate a monitored signature based on a monitored audio stream reproduced by the media delivery device 110 and/or presented by the media presentation device 112. The signature generator 114 may be communicatively connected to the media delivery device 110 and/or the media presentation device 112 via the audio monitoring interface 118. In this way, signature generator 114 may obtain an audio stream associated with media information rendered by media delivery device 110 and/or presented by media presentation device 112. Additionally or alternatively, the signature generator 114 may be communicatively connected to a microphone (not shown) placed near the media presentation device 112 to detect the audio stream. The signature generator 114 may also be communicatively connected to the central data collection facility 106 via the network 108.
The network 108 may be used to communicate signatures (e.g., digital spectrum signatures), control information, and/or configuration information between the monitoring site 102, the reference site 104, and the central data collection facility 106. Any wired or wireless communication system, such as a broadcast cable network, a DSL network, a cellular telephone network, a satellite network, and/or any other communication network, may be used to implement network 108.
As shown in fig. 1A, the reference site 104 may include a plurality of broadcast information tuners 120, a reference signature generator 122, a transmitter 124, a database or memory 126, and a broadcast information receiving device 128. The reference signature generator 122 and the transmitter 124 may be communicatively connected to the memory 126 to store reference signatures in the memory 126 and/or to retrieve stored reference signatures from the memory 126.
The broadcast information tuners 120 may be communicatively connected to the broadcast information receiving device 128, which may include a cable, an antenna, a satellite dish, and/or any other suitable device for receiving broadcast information. Each broadcast information tuner 120 may be configured to tune to a particular broadcast channel. Typically, the number of tuners at the reference site 104 is equal to the number of channels available in a particular broadcast area. In this way, reference signatures may be generated for all media information transmitted on all channels within a broadcast area. The audio portion of the tuned media information may be communicated from the broadcast information tuners 120 to the reference signature generator 122.
The reference signature generator 122 may be configured to obtain the audio portion of all media information available in a particular broadcast region. The reference signature generator 122 may then generate a plurality of reference signatures based on the audio information (e.g., using the processes described in more detail below) and store the reference signatures in the memory 126. Although one reference signature generator is shown in fig. 1A, multiple reference signature generators may be used at the reference site 104. For example, each of the plurality of signature generators may be communicatively coupled to a respective broadcast information tuner 120.
The transmitter 124 may be communicatively connected to the memory 126 and configured to retrieve the signature from the memory 126 and transmit the reference signature to the central data collection facility 106 via the network 108.
The central data collection facility 106 may be configured to compare the monitored signature received from the monitoring site 102 with the reference signature received from the reference site 104. Further, the central data collection facility 106 may be configured to identify the monitored audio streams by matching the monitored signatures to the reference signatures and utilizing the matching information to retrieve television program identification information (e.g., program title, broadcast time, broadcast channel, etc.) from the database. The central data collection facility 106 includes a receiver 130, a signature analyzer 132, and a memory 134, all of which may be communicatively coupled as shown.
The receiver 130 may be configured to receive the monitored signature and the reference signature over the network 108. The receiver 130 is communicatively connected to the memory 134 and is configured to store the monitored signature and the reference signature in the memory 134.
The signature analyzer 132 may be used to compare the reference signatures to the monitored signatures. The signature analyzer 132 is communicatively coupled to the memory 134 and is configured to retrieve the monitored signatures and the reference signatures from the memory 134 and to compare the monitored signatures to the reference signatures until a match is found. The memory 134 may be implemented using any machine-accessible information storage medium (e.g., one or more hard disks, one or more optical storage devices, etc.).
Although in fig. 1A the signature analyzer 132 is located at the central data collection facility 106, the signature analyzer 132 may instead be located at the reference site 104. In such a configuration, the monitored signatures may be transmitted from the monitoring site 102 to the reference site 104 over the network 108. Alternatively, the memory 134 may be located at the monitoring site 102, and the reference signatures may be periodically added to the memory 134 by the transmitter 124 over the network 108. Additionally, although the signature analyzer 132 is shown as a device separate from the signature generators 114 and 122, the signature analyzer 132 may be integrated with the reference signature generator 122 and/or the signature generator 114. Furthermore, although fig. 1A depicts a single monitoring site (i.e., monitoring site 102) and a single reference site (i.e., reference site 104), a plurality of such sites may be connected to the central data collection facility 106 via the network 108.
The audio stream identification system 150 of FIG. 1B may be configured to monitor and identify audio streams or any other media associated with radio broadcast information. In general, the audio stream identification system 150 is used to monitor the content broadcast by a plurality of radio stations in a particular broadcast area. Unlike the audio stream identification system 100, which is used to monitor television content consumed by a viewer, the audio stream identification system 150 may be used to monitor music, songs, etc. broadcast within a broadcast area and the number of times they are broadcast. This type of media tracking may be used to determine royalty payments owed to the owners of the various audio works, to verify proper copyright usage, etc. The audio stream identification system 150 includes a monitoring site 152, a central data collection facility 154, and the network 108.
The monitoring site 152 is configured to receive all broadcast information available in a particular broadcast area and to generate monitored signatures based on the radio broadcast information. The monitoring site 152 includes a plurality of broadcast information tuners 120, the transmitter 124, the memory 126, and the broadcast information receiving device 128, all of which are described above in connection with fig. 1A. In addition, the monitoring site 152 includes a signature generator 156. When used in the audio stream identification system 150, the broadcast information receiving device 128 is configured to receive radio broadcast information, and the broadcast information tuners 120 are configured to tune to radio broadcast stations. The number of broadcast information tuners 120 at the monitoring site 152 may be equal to the number of radio broadcast stations in a particular broadcast area.
The signature generator 156 is configured to receive the tuned audio information from each broadcast information tuner 120 and to generate monitored signatures based thereon. Although only one signature generator (i.e., signature generator 156) is shown, the monitoring site 152 may include multiple signature generators, each of which may be communicatively coupled to one of the broadcast information tuners 120. The signature generator 156 may store the monitored signatures in the memory 126. The transmitter 124 may retrieve the monitored signatures from the memory 126 and transmit them to the central data collection facility 154 via the network 108.
The central data collection facility 154 is configured to receive the monitored signature from the monitoring site 152, generate a reference signature based on the reference audio stream, and compare the monitored signature to the reference signature. The central data collection facility 154 includes the receiver 130, the signature analyzer 132, and the memory 134, all of which are described in more detail above in connection with FIG. 1A. In addition, the central data collection facility 154 includes a reference signature generator 158.
The reference signature generator 158 is configured to generate reference signatures based on reference audio streams. The reference audio streams may be stored on any type of machine-accessible medium, e.g., a CD, a DVD, a digital audio tape (DAT), etc. Typically, artists and/or record companies send their audio works (i.e., music, songs, etc.) to the central data collection facility 154 for addition to a reference library. The reference signature generator 158 may read the audio data from the machine-accessible medium and generate a plurality of reference signatures based on the respective audio works (e.g., the captured audio 300 of fig. 3). The reference signature generator 158 may then store the reference signatures in the memory 134 for subsequent retrieval by the signature analyzer 132. Identification information (e.g., song title, artist name, track number, etc.) associated with each reference audio stream may be stored in a database and may be indexed based on the reference signatures. In this way, the central data collection facility 154 includes a database of reference signatures and identification information corresponding to all known and available song titles.
The receiver 130 is configured to receive the monitored signature from the network 108 and store the monitored signature in the memory 134. The signature analyzer 132 retrieves the monitored signatures and the reference signature from the memory 134 for use in identifying monitored audio streams broadcast within the broadcast area. Signature analyzer 132 may first match the monitored signature with a reference signature to identify the monitored audio stream. The matching information and/or the matching reference signature is then utilized to retrieve identification information (e.g., song title, song track, artist, etc.) from a database stored in memory 134.
Although one monitoring site (e.g., monitoring site 152) is shown in fig. 1B, multiple monitoring sites may be communicatively connected to network 108 and configured to generate monitored signatures. In particular, each monitoring location may be located in a respective broadcast area and configured to monitor the content of broadcast stations within the respective broadcast area.
Fig. 2 is a flow diagram illustrating an exemplary signature generation process 200. As shown in FIG. 2, the signature generation process 200 first captures an audio block to be characterized by a signature (block 202). An exemplary audio time domain plot that may be captured is shown in fig. 3 at reference numeral 300. The audio may be captured from the audio source, for example, by a wired connection to the audio source or by a wireless connection to the audio source (e.g., via an audio sensor). If the audio source is analog, the capturing includes sampling the analog audio source using, for example, an analog-to-digital converter. In one example, the audio source may be sampled at an 8 kilohertz (kHz) rate, referred to as the sampling rate (Fs), meaning that the analog audio is represented by digital samples taken at a rate of 8000 samples per second, or 1 sample every 125 microseconds. Each audio sample may be represented at a monaural (mono) 16-bit resolution.
In one example, an audio block corresponding to 8192 samples is captured for processing. At the aforementioned sampling rate of 8 kHz, this corresponds to 1.024 seconds of audio. However, this is just one example, and the number of samples collected may correspond to any audio clip ranging from about 1 second to 2 seconds in duration. Generally, the number of captured samples in an audio block is denoted here by the variable N. Thus, in the above example, N = 8192 and the time range of the captured audio corresponds to t … t + N/Fs. A representation of an audio block is shown in fig. 4 at reference numeral 402, where, for illustrative purposes, the audio block corresponds to a sinusoid.
After an audio block is captured (block 202), the process 200 applies a first window function (referred to as W1) to the audio block (block 204A) to generate a first windowed audio block. In addition, the process 200 windows the audio block with a second window function (referred to as W2) (block 204B) to produce a second windowed audio block. The window may be, for example, a Gaussian or bell-shaped function, such as the window W1 shown at reference numeral 502 in FIG. 5, where the high and low ends of the window 502 have zero values and the center of the window 502 has a value of 1. In one example, windowing is a sample-wise multiplication between the values of the window function and the samples of the audio block. For example, windowing the audio block 402 with the window 502 results in a windowed audio block 602, as shown in fig. 6, where the amplitude of the windowed audio block 602 is zero at both ends of the window 502 and the center of the windowed audio block 602 has the same amplitude as the audio block 402.
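The sample-wise multiplication can be sketched in a few lines of plain Python (the function names and the choice of a Hann window as the bell-shaped function are illustrative assumptions, not taken from the patent):

```python
import math

def bell_window(n):
    """Bell-shaped window: zero at both ends, unity at the center
    (a Hann window is used here as a stand-in for window 502)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(window, block):
    """Sample-wise multiplication of the window values with the audio samples."""
    return [w * s for w, s in zip(window, block)]

n = 8192  # N samples, i.e. 1.024 s of audio at Fs = 8 kHz
audio_block = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(n)]
w1 = bell_window(n)
windowed_block = apply_window(w1, audio_block)
# The result is zero at both ends and retains the original amplitude
# near the center, as with the windowed audio block 602 of fig. 6.
```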
Alternatively, rather than applying a window function to the audio block in the time domain using a sample-wise multiplication, windowing can also be implemented in the frequency domain, where the frequency response corresponding to the time-domain window can be convolved with the frequency spectrum of the audio block. As described above, if frequency domain processing including convolution is used, the conversion of the audio blocks into the frequency domain may be performed using a Fourier transform, with adjustments being made between the audio blocks to account for discontinuities. Furthermore, if the processing and application of the window is implemented in the frequency domain, a time domain window having a frequency characteristic with a small number of non-zero elements (e.g., 3 to 5 non-zero elements) may be selected.
The windows selected for W1 and W2 may be substantially complementary. For example, if the window 502 shown in FIG. 5 is selected for W1, the window 702 shown in FIG. 7 may be selected for W2. As shown in FIG. 7, the window 702 is the inverted form of the window W1, i.e., W2(k) = 1 − W1(k), where k is the sample index in the window domain. The window W2 reaches a unity value at the high and low ends of the window 702 and has a zero value at the center of the window 702. Thus, when the window 702 is applied to the audio block 402, a windowed audio block 802 results, as shown in fig. 8. As shown in fig. 8, the windowed audio block 802 has zero amplitude at its center, but has an amplitude at both ends that substantially matches the amplitude of the audio block 402.
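The complementary relationship W2(k) = 1 − W1(k) can be illustrated in plain Python (a Hann-like window stands in for window 502; all names are illustrative). Because W1(k) + W2(k) = 1 at every sample index, the two windowed blocks sum back to the original audio block:

```python
import math

n = 1024
audio_block = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(n)]

# W1: bell-shaped window; W2: its inverted, complementary form.
w1 = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]
w2 = [1.0 - w for w in w1]  # W2(k) = 1 - W1(k)

block_w1 = [w * s for w, s in zip(w1, audio_block)]
block_w2 = [w * s for w, s in zip(w2, audio_block)]

# Summing the two complementary windowed blocks recovers the original block.
recombined = [a + b for a, b in zip(block_w1, block_w2)]
```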
As shown in fig. 9 and 10, the windows 502 and 702 have respective frequency responses 902 and 1002. Thus, applying windows 502 and 702 to an audio block (e.g., audio block 402) affects the frequency spectrum of the audio block. As described below, different effects of different windows on the audio block are detected to determine a signature representative of the audio block.
Although the windows 502, 702 selected for the above description are similar to a Hann window and an inverted Hann window, respectively, other window shapes may be used. For example, as shown in fig. 11 and 12, two asymmetric windows 1102, 1202 may be selected, where the first window 1102 occupies the upper half of the windowed space and the second window 1202 occupies the lower half of the windowed space. The frequency responses of the asymmetric windows 1102, 1202 are the same, as shown in fig. 11 and 12 at reference numerals 1104 and 1204, but because the windows act on mostly different parts of the audio block, the results of the windowing have different spectral characteristics for audio signals that are not sinusoidal.
Although specific examples of window shapes are described herein, other window shapes may also be used. For example, the window shapes for both the first and second windows (e.g., W1 and W2) may be selected arbitrarily from a set of window functions. Of course, different windows may be used at different times, as long as the monitoring and reference sites use the same windows. Further, more than two windows may be used.
Returning to FIG. 2, after windowing is completed (blocks 204A and 204B), the windowed audio blocks are each transformed (blocks 206A and 206B). In one example, the transform may be a transform from the time domain to the frequency domain. For example, the N samples of captured audio that have been windowed may be converted to an audio spectrum represented by N/2 complex DFT coefficients. Alternatively, any suitable transform may be used, for example, a wavelet transform, a DCT, an MDCT, a Haar transform, a Walsh transform, or the like.
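The windowing and transform steps (blocks 204 and 206) can be sketched as follows, assuming NumPy and the example parameters above (N = 8192 samples at 8 kHz). The 440 Hz test tone is a hypothetical input, not taken from the patent.

```python
import numpy as np

N = 8192   # samples per audio block
fs = 8000  # Hz, example sampling rate

# hypothetical captured audio block: a 440 Hz tone
t = np.arange(N) / fs
block = np.sin(2 * np.pi * 440.0 * t)

# apply a Hann-like window W1, then transform to the frequency domain,
# keeping N/2 complex DFT coefficients as described above
w1 = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)
spectrum = np.fft.rfft(block * w1)[: N // 2]

print(len(spectrum))  # → 4096
```

For N = 8192, the DFT bin spacing is fs/N ≈ 0.98 Hz, so the 440 Hz tone concentrates its energy near bin 450.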
After the transforms are completed (blocks 206A and 206B), the process 200 characterizes the results of each transform (blocks 208A and 208B). For example, the process may determine the energy within each of K + 1 different bands of each transform result. That is, the transform result obtained from the audio windowed with window W1 (block 206A) may be divided into, for example, 16 different bands, and the energy in each of the 16 different bands may be determined. This can be expressed as Ej(W1), where j ranges from 0 to 15 and W1 indicates that the energy is associated with the spectrum obtained by applying window W1 to the sampled audio (i.e., the audio block). Similarly, the transform result for the audio block windowed with window W2 (block 206B) may be divided into, for example, 16 different bands, the energies of which may be determined and denoted Ej(W2), where j ranges from 0 to 15 and W2 indicates that the energy is associated with the spectrum obtained by applying window W2. Alternatively, spectral features other than energy may be used to characterize the results. For example, the spectral flatness of the energy distribution may be used.
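Characterizing a transform result by per-band energy (blocks 208A and 208B) might look like the following sketch. The equal-width band split is an assumption for illustration, since the description leaves the band-selection technique open.

```python
import numpy as np

def band_energies(spectrum, num_bands=16):
    # sum |X|^2 over each of num_bands equal-width bands of the spectrum
    # (equal-width bands are an illustrative choice, not mandated above)
    power = np.abs(spectrum) ** 2
    return np.array([b.sum() for b in np.array_split(power, num_bands)])

# E_j(W1) for a flat 4096-coefficient spectrum: 16 equal band energies
e_w1 = band_energies(np.ones(4096, dtype=complex))
print(e_w1.shape, e_w1[0])  # → (16,) 256.0
```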
After the sets of transform results are characterized (blocks 208A and 208B), the process 200 compares the characterized results (block 210). For example, the characterizations for each band may be subtracted from each other. In one example, intermediate values may be calculated as dj = Ej(W2) - Ej(W1), where j ranges from 0 to K. For the specific example above in which K = 15, intermediate values dj may be calculated as dj = Ej(W2) - Ej(W1), with j ranging from 0 to 15. Thus, the comparison results in 16 different intermediate values (e.g., d0, d1, d2 ... d15), where each intermediate value is, for example, the difference between the characteristics of corresponding bands of the spectra obtained by transforming the windowed audio blocks.
After the intermediate values representing the comparison of the features are computed (block 210), the process 200 determines a signature based on the comparison (block 212). For example, if the intermediate value dj > 0, the signature bit Sj may be assigned a value of 1, and a value of 0 otherwise, where j ranges from 0 to K. More specifically, in the example above where K = 15, there are 16 comparisons of intermediate values to the value 0, and based on these comparisons a 16-bit signature is generated to represent the audio block captured at block 202 of FIG. 2. After the signature is determined (block 212), the process 200 loops (block 214) and captures additional audio (block 202) to obtain additional signatures.
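The comparison of block 210 and the bit assignment of block 212 can be sketched as below: dj = Ej(W2) - Ej(W1), and Sj = 1 when dj > 0. The example band energies are hypothetical, and the bit-packing order is an arbitrary illustrative choice.

```python
import numpy as np

def signature_bits(e_w1, e_w2):
    # one bit per band: S_j = 1 if d_j = E_j(W2) - E_j(W1) > 0, else 0
    d = np.asarray(e_w2) - np.asarray(e_w1)
    return (d > 0).astype(int)

def pack_signature(bits):
    # pack the bit vector into one integer, most significant bit first
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value

# hypothetical energies for K + 1 = 16 bands
e1 = np.full(16, 1.0)
e2 = np.array([2.0] * 8 + [0.5] * 8)
bits = signature_bits(e1, e2)
print(f"{pack_signature(bits):016b}")  # → 1111111100000000
```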
Although the foregoing describes selecting a first window (W1) and a second window (W2) and determining all signatures of the captured audio blocks using the selected windows, other approaches are possible. For example, a first window pair (e.g., W1 and W2) may be utilized to determine some bits of the signature representing the captured audio block, while a different window pair (e.g., W3 and W4) may be utilized to determine other bits of the signature. Further, a third window pair (e.g., W1 and W3) may be used to determine additional signature bits. In some cases, a unique window pair may be selected in a predetermined manner or in any manner to determine the value of each signature bit, so long as the same window pairs are selected at the monitoring and reference sites and applied to the same audio blocks.
The foregoing describes signature techniques that may be used to determine a signature representative of a portion of captured audio. FIG. 13 illustrates an exemplary signature matching process 1300 that may be performed to compare a reference signature (i.e., a signature determined at the reference site) to a monitored signature (i.e., a signature determined at the monitoring site). The goal of signature matching is to find the closest match between the audio signature of a query (e.g., the monitored audio) and a signature in a database (e.g., signatures taken based on the reference audio). The comparison may be performed at the reference site, the monitoring site, or any other data processing site that has access to the monitored signature and the database containing the reference signatures.
Turning now to the example process of FIG. 13 in detail, the example process 1300 begins by acquiring a monitored signature and its associated timing (block 1302). As shown in FIG. 14, a signature set may include a plurality of monitored signatures, three of which are shown in FIG. 14 at reference numerals 1402, 1404, and 1406. Each signature is represented by sigma (σ). Each monitored signature 1402, 1404, 1406 may include timing information 1408, 1410, 1412, whether implicit or explicit.
The database containing the reference signatures is then queried (block 1304) to identify the signature in the database that most closely matches. In one implementation, the similarity measure (proximity) between signatures may be taken as the Hamming distance, i.e., the number of bit positions in which the query and reference bit strings differ. In FIG. 14, a database of signatures and timing information is shown at reference numeral 1416. Of course, the database 1416 may include any number of different signatures from different media presentations. An association is then made between the program associated with the matched reference signature and the unknown signature (block 1306).
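A minimal sketch of the Hamming distance query of block 1304, assuming 16-bit integer signatures and a hypothetical in-memory list of (signature, timing) tuples standing in for the database 1416:

```python
def hamming_distance(sig_a, sig_b):
    # number of bit positions in which the two signatures differ
    return bin(sig_a ^ sig_b).count("1")

def closest_match(query_sig, reference_db):
    # return the reference entry with the smallest Hamming distance to the query
    return min(reference_db, key=lambda entry: hamming_distance(query_sig, entry[0]))

# hypothetical reference database: (signature, timing information) pairs
db = [(0b1111000011110000, "20:59:03.100"),
      (0b1010101010101010, "20:59:03.600")]

best = closest_match(0b1111000011110001, db)
print(best[1])  # → 20:59:03.100
```

In practice the query signature differs from its reference counterpart only by a few bits, so a small Hamming distance threshold separates true matches from unrelated signatures.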
Optionally, the process 1300 may then determine an offset between the monitored signature and the reference signature (block 1308). The offset value may be used to determine with greater confidence whether a block of query signatures matches the reference signatures well. For a genuine match, the offset values of all signatures in a short query block typically remain nearly constant with respect to the corresponding reference signatures.
In cases where the descriptors of more than one reference signature are associated with Hamming distances below a predetermined Hamming distance threshold, it may be desirable to require that more than one monitored signature match the reference signatures of a candidate reference audio stream. It is relatively unlikely that all of the monitored signatures generated based on the monitored audio stream will match all of the reference signatures of more than one reference audio stream; requiring multiple matches therefore helps prevent more than one reference audio stream from being erroneously matched to the monitored audio stream.
The above-described exemplary methods, processes and/or techniques may be implemented in hardware, software and/or any combination thereof. More specifically, the exemplary methods may be performed in hardware as defined in the block diagrams of FIGS. 15 and 16. The example methods, processes and/or techniques may also be implemented in software executed on a processor system, such as the example processor system 1710 of FIG. 17.
Fig. 15 is a block diagram of an exemplary signature generation system 1500 for generating digital spectrum signatures. In particular, the example signature generation system 1500 may be used to generate monitored signatures and/or reference signatures based on windowing, transforming, characterizing, and comparing audio blocks as described above. For example, the example signature generation system 1500 may be used to implement the signature generators 114 and 122 of FIG. 1A or the signature generators 156 and 158 of FIG. 1B. Additionally, the example signature generation system 1500 may be used to implement the example method of FIG. 2.
As shown in FIG. 15, the example signature generation system 1500 includes a sample generator 1502, a timing device 1503, a reference time generator 1504, a windower 1506, a transformer 1508, a characteristic determiner 1510, a comparator 1512, a signature determiner 1514, a memory 1516, and a data communication interface 1518, all of which may be communicatively coupled as shown. The example signature generation system 1500 may be configured to obtain an example audio stream, obtain a plurality of audio samples from the example audio stream to form an audio block, and generate a signature representative of the audio block.
The sample generator 1502 may be configured to obtain an exemplary audio stream, e.g., resulting in the stream of captured audio 300 of FIG. 3. The stream 300 may be any analog or digital audio stream. If the exemplary audio stream is an analog audio stream, the sample generator 1502 may be implemented using an analog-to-digital converter. If the exemplary audio stream is a digital audio stream, the sample generator 1502 may be implemented using a digital signal processor. Further, the sample generator 1502 may be configured to acquire and/or extract audio samples at any desired sampling frequency Fs. For example, as described above, the sample generator may be configured to acquire N samples at 8 kHz, with each sample represented by 16 bits. In this arrangement, N may be any number of samples, for example, N = 8192. The sample generator 1502 may also notify the reference time generator 1504 when the audio sample acquisition process begins. The sample generator 1502 passes the samples to the windower 1506.
The timing device 1503 may be configured to generate time data and/or timestamp information and may be implemented by a clock, a timer, a counter, and/or any other suitable device. The timing device 1503 may be communicatively coupled to the reference time generator 1504 and may be configured to transmit time data and/or timestamps to the reference time generator 1504. The timing device 1503 may also be communicatively coupled to the sample generator 1502 and may issue a start signal or interrupt to instruct the sample generator 1502 to begin collecting or acquiring audio sample data. In one example, the timing device 1503 may be implemented by a real-time clock having a 24-hour period and millisecond resolution. In this case, the timing device 1503 may be configured to reset to zero at midnight and record the time in milliseconds relative to midnight. However, in general, a timestamp may represent the complete year, month, day, hour, minute, and second information as a number of seconds elapsed from a predetermined time in the past, e.g., 00:00 AM on January 1, 2005. Sub-second resolution may be derived from the known acquisition rate of the collected audio signatures.
The reference time generator 1504 may initialize a reference time t0 upon receiving a notification from the sample generator 1502. The reference time t0 may be used to indicate the moment within the audio stream at which a signature was generated. In particular, the reference time generator 1504 may be configured to read time data and/or timestamp values from the timing device 1503 when notified by the sample generator 1502 of the start of the sample acquisition process. The reference time generator 1504 may then store the timestamp value as the reference time t0.
The windower 1506 applies, for example, two windows to the audio block output by the sample generator 1502, resulting in two windowed audio blocks. As described above, the windows may be any set of windows. However, complementary windows are preferred because they ensure that, on average, the two energy values are the same, resulting in a quasi-equal distribution of signature bit values.
The transformer 1508 may be configured to perform an N-point DFT on each windowed audio block, where N is the number of samples obtained by the sample generator 1502. For example, if the sample generator takes 8192 samples, the transformer will generate a spectrum from these samples, where the spectrum is represented by 4096 complex-valued fourier coefficients.
The characteristic determiner 1510 may be configured to identify several bands (e.g., 16 bands) within the DFT spectra generated by the transformer 1508. The selected frequency bands may, but preferably do not, overlap, and may be selected according to any technique. Of course, any suitable number of frequency bands (e.g., 48) may be selected. The characteristic determiner 1510 then determines a characteristic of each frequency band. For example, the characteristic determiner 1510 may determine the energy of each frequency band. Thus, the characteristic determiner 1510 derives two characteristic sets, one for each of, for example, 16 frequency bands. For example, if 16 frequency bands are selected, the output of the characteristic determiner 1510 will be 32 energy values, one for each frequency band in each DFT. These characteristics may be denoted Ej(w1) and Ej(w2), where j ranges from 0 to K (e.g., 0 to 15), and w1 and w2 denote window 1 and window 2, respectively.
The comparator 1512 compares the characteristics of the frequency bands to determine intermediate values. For example, the comparator 1512 may generate intermediate values according to dj = Ej(w2) - Ej(w1), such that the energies in corresponding frequency bands of the two DFTs are subtracted from each other.
The signature determiner 1514 operates on the values obtained from the comparator 1512 to generate one signature bit for each intermediate value. This operation may be very similar or identical to that of block 212 described above in connection with FIG. 2. That is, the signature bit value may be based on a comparison of the intermediate value to zero. The signature bits are output to the memory 1516.
The memory 1516 may be any suitable medium for storing signatures. For example, the memory 1516 may be a memory such as a random access memory (RAM), a flash memory, or the like. Additionally or alternatively, the memory 1516 may be a mass storage device such as a hard disk drive, an optical storage medium, a tape drive, or the like.
The memory 1516 is connected to the data communication interface 1518. For example, if the system 1500 is a monitoring location (e.g., at an individual's home), the signature information in the memory 1516 can be communicated to a collection device, a reference location, etc. using the data communication interface 1518.
Fig. 16 is a block diagram of an exemplary signature comparison system 1600 for comparing digital spectrum signatures. In particular, the example signature comparison system 1600 may be used to compare a monitored signature to a reference signature. For example, the example signature comparison system 1600 may be used to implement the signature analyzer 132 of fig. 1A to compare a monitored signature to a reference signature. Additionally, the example signature comparison system 1600 may be used to implement the example process of FIG. 13.
The exemplary signature comparison system 1600 includes a monitored signature receiver 1602, a reference signature receiver 1604, a comparator 1606, a hamming distance filter 1608, a media identifier 1610, and a media identifier look-up table interface 1612, all of which may be communicatively coupled as shown.
Monitored signature receiver 1602 may be configured to obtain monitored signatures over network 108 (fig. 1) and to communicate monitored signatures to comparator 1606. The reference signature receiver 1604 may be configured to obtain the reference signature from the memory 134 (fig. 1A and 1B) and transmit the reference signature to the comparator 1606.
The comparator 1606 and hamming distance filter 1608 may be configured to compare the reference signature with the monitored signature using hamming distance. In particular, the comparator 1606 can be configured to compare the descriptors of the monitored signature with descriptors from a plurality of reference signatures, and generate a hamming distance value for each comparison. The hamming distance filter 1608 may then obtain a hamming distance value from the comparator 1606 and filter out reference signatures that do not match based on the hamming distance value.
Upon finding a matching reference signature, the media identifier 1610 may retrieve the matching reference signature and, in conjunction with the media identification look-up table interface 1612, may identify media information associated with the unidentified audio stream (e.g., the exemplary monitored audio stream 300 of fig. 3). For example, the media identification look-up table interface 1612 may be communicatively connected to a media identification look-up table or database for cross-referencing media identification information (e.g., movie titles, show titles, song titles, artist names, screen numbers, etc.) based on reference signatures. As such, media identifier 1610 may retrieve media identification information from a media identification database based on the matching reference signature.
FIG. 17 is a block diagram of an example processor system 1710 that may be used to implement the apparatus and methods described herein. As shown in FIG. 17, the processor system 1710 includes a processor 1712 connected to an interconnection bus or network 1714. The processor 1712 includes a register set or register space 1716, which is depicted in FIG. 17 as being wholly on-chip, but which may alternatively be located wholly or partially off-chip and connected directly to the processor 1712 via dedicated electrical connections and/or via the interconnection bus or network 1714. The processor 1712 may be any suitable processor, processing unit, or microprocessor. Although not shown in FIG. 17, the system 1710 may be a multi-processor system and, thus, may include one or more additional processors that are identical or similar to the processor 1712 and that are communicatively coupled to the interconnection bus or network 1714.
The processor 1712 of FIG. 17 is connected to a chipset 1718, which includes a memory controller 1720 and an input/output (I/O) controller 1722. As is well known, a chipset typically provides I/O and memory management functions as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by one or more processors coupled to the chipset. The memory controller 1720 performs functions that enable the processor 1712 (or, when multiple processors are present, the multiple processors) to access a system memory 1724 and a mass storage memory 1725.
The system memory 1724 may include any desired type of volatile and/or nonvolatile memory such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), flash memory, Read Only Memory (ROM), and the like. Mass storage 1725 may include any desired type of mass storage device, including hard disk drives, optical drives, tape storage devices, and the like.
I/O controller 1722 performs functions that enable processor 1712 to communicate with peripheral input/output (I/O) devices 1726 and 1728 via I/O bus 1730. The I/O devices 1726 and 1728 may be any desired type of I/O device such as a keyboard, video display or monitor, mouse, etc. Although the memory controller 1720 and the I/O controller 1722 are depicted in fig. 17 as separate functional blocks within the chipset 1718, the functions performed by these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits.
The methods described herein may be implemented using instructions stored on a computer-readable medium that are executed by a processor 1712. The computer-readable medium may include any desired combination of solid-state, magnetic and/or optical media implemented with any desired combination of mass storage (e.g., disk drives), removable storage (e.g., floppy disks, memory cards or sticks, etc.), and/or integrated storage (e.g., random access memory, flash memory, etc.).
Although certain methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto.
Cross Reference to Related Applications
This application claims priority to U.S. provisional application 61/033,992 filed on 5.3.2008, incorporated herein by reference in its entirety.
Claims (30)
1. A method of generating a signature representing a portion of an audio signal, the method comprising the steps of:
capturing an audio signal;
applying a first window function to a portion of the captured audio signal to generate a first windowed audio block;
applying a second window function to the portion of the captured audio signal to generate a second windowed audio block;
determining a first characteristic of a frequency band in the first windowed audio block;
determining a second characteristic of the frequency band in the second windowed audio block;
comparing the first feature to the second feature; and
assigning signature bits representing the portion of the captured audio signal based on the comparison of the first and second features.
2. A method as defined in claim 1, wherein applying the first window function to the portion of the captured audio signal to generate the first windowed audio block and applying the second window function to the portion of the captured audio signal to generate the second windowed audio block comprises frequency domain processing to create a first transformed windowed audio block and a second transformed windowed audio block.
3. A method as defined in claim 2, wherein determining first and second characteristics of the frequency band in the first and second windowed audio blocks comprises processing the first and second transformed windowed audio blocks.
4. The method of claim 1, wherein capturing the audio signal comprises wireless audio capture.
5. The method of claim 4, wherein capturing the audio signal comprises digital sampling.
6. The method of claim 1, wherein the first and second window functions comprise complementary functions.
7. The method of claim 6, wherein the first window function comprises first amplitudes at high and low ends of the first window function, the first amplitudes being greater than a second amplitude at a center of the first window function.
8. The method of claim 7, wherein the second window function includes third amplitudes at high and low ends of the second window function, the third amplitudes being less than a fourth amplitude at a center of the second window function.
9. The method of claim 6, wherein the first window function comprises a first amplitude at a high end of the first window function and a second amplitude at a low end of the first window function, the second amplitude being less than the first amplitude.
10. The method of claim 9, wherein the second window function comprises a third amplitude at a high end of the second window function and a fourth amplitude at a low end of the second window function, the third amplitude being less than the fourth amplitude.
11. The method of claim 1, wherein the first window function and the second window function are selected from a set of window functions.
12. The method of claim 11, wherein the first window function and the second window function are arbitrarily selected from the set of window functions.
13. The method of claim 1, wherein applying the first window function to the portion of the captured audio signal comprises multiplying the first window function with the portion of the captured audio signal in a time domain operation.
14. The method of claim 13, wherein applying the second window function to the portion of the captured audio signal comprises multiplying the second window function with the portion of the captured audio signal in a time domain operation.
15. The method of claim 1, wherein the first and second characteristics comprise first and second energies.
16. The method of claim 15, wherein comparing the first and second characteristics comprises subtracting the first and second energies.
17. The method of claim 16, wherein assigning signature bits comprises assigning signature bit values based on a result of a subtraction of the first energy and the second energy.
18. The method of claim 15, further comprising the steps of: additional features in each of the first windowed audio block and the second windowed audio block are determined and additional bits representing the portion of the captured audio signal are determined using the additional features.
19. The method of claim 1, further comprising the steps of:
applying a third window function and a fourth window function to the portion of the captured audio to generate a third windowed audio block and a fourth windowed audio block; and
processing the third windowed audio block and the fourth windowed audio block to determine additional bits representing the portion of the captured audio signal.
20. A method as defined in claim 19, wherein the processing includes comparing one or more of a third characteristic of the third windowed audio block, a fourth characteristic of the fourth windowed audio block, the first characteristic, or the second characteristic.
21. An apparatus for generating a signature representing a portion of an audio signal, the apparatus comprising:
a sample generator for capturing an audio signal;
a windower for applying a first window function to a portion of the captured audio signal to generate a first windowed audio block and applying a second window function to the portion of the captured audio signal to generate a second windowed audio block;
a characteristic determiner for determining a first characteristic of the frequency band in the first windowed audio block and for determining a second characteristic of the frequency band in the second windowed audio block;
a comparator for comparing the first characteristic with the second characteristic; and
a signature determiner to assign signature bits representing the portion of the captured audio signal based on a comparison of the first and second features.
22. An apparatus as defined in claim 21, wherein the windower applies the first window function to the portion of the captured audio signal to generate the first windowed audio block and the second window function to the portion of the captured audio signal to generate the second windowed audio block using frequency domain processing to create first and second transformed windowed audio blocks.
23. An apparatus as defined in claim 22, wherein the feature determiner determines the first and second features of the frequency band in the first and second windowed audio blocks comprises processing the first and second transformed windowed audio blocks.
24. The apparatus of claim 21, wherein the first and second window functions comprise complementary functions.
25. The apparatus of claim 24, wherein the windower applies the first window function to the portion of the captured audio signal by multiplying the first window function and the portion of the captured audio signal in a time domain operation.
26. The apparatus of claim 25, wherein the windower applies the second window function to the portion of the captured audio signal by multiplying the second window function and the portion of the captured audio signal in a time domain operation.
27. The apparatus of claim 21, wherein the first and second characteristics comprise first and second energies.
28. The apparatus of claim 27, wherein the comparator subtracts the first energy and the second energy.
29. The apparatus of claim 28, wherein the signature determiner assigns a signature bit value based on a subtraction of the first energy and the second energy.
30. An apparatus as defined in claim 29, wherein the feature determiner determines additional features in each of the first and second windowed audio blocks and utilizes the additional features to determine additional bits representing the portion of the captured audio signal.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US3399208P | 2008-03-05 | 2008-03-05 | |
| US61/033,992 | 2008-03-05 | ||
| PCT/US2008/082657 WO2009110932A1 (en) | 2008-03-05 | 2008-11-06 | Methods and apparatus for generating signatures |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1155863A1 HK1155863A1 (en) | 2012-05-25 |
| HK1155863B true HK1155863B (en) | 2013-09-06 |