HK1116616B - Methods and apparatus to monitor audio/visual content from various sources
Description
Technical Field
The present disclosure relates generally to ratings surveys and more particularly to methods and apparatus for monitoring audio/visual content from various sources.
Background
Television ratings and measurement information is typically generated by collecting viewing records and/or other viewing information from a statistically selected group of households. Each statistically selected household typically has a data recording and processing unit, which is often referred to as a "home unit". In a home with multiple viewing sites (e.g., multiple television systems), the data recording and processing functions may be distributed between a single home unit and multiple "site units" (one for each viewing site). The home unit (or the combination of the home unit and the site units) often communicates with various accessories that provide input to, or receive output from, the home unit. For example, a source identification unit, such as a frequency detector accessory, may communicate with a television to sense the local oscillator frequency of the television tuner. In this manner, the frequency detector accessory can determine the channel to which the television is currently tuned based on the detected frequency. Additional source identification devices, such as an on-screen reader and a Light Emitting Diode (LED) display reader, for example, may be provided to determine whether the television is on (i.e., turned on) and/or to determine the channel to which the television is tuned. A people counter may be located in the viewing area of the television and in communication with the home unit, thereby enabling the home unit to detect the identities and/or number of people currently viewing a program displayed on the television.
The home unit typically processes input from the accessories (e.g., channel tuning information, viewer identities, etc.) to generate viewing records. The viewing records may be generated periodically (e.g., at fixed time intervals) or non-periodically (e.g., in response to one or more predetermined events, such as a full memory or a change in input (e.g., a change in the identity of a person viewing the television, a change in channel tuning information (i.e., a channel change), etc.)). Each viewing record typically contains channel information (e.g., a channel number and/or station identification (ID)) and the time (e.g., date and time of day) at which the channel was displayed. In cases where the program content being displayed is associated with a local audio/video content delivery device (e.g., a Digital Versatile Disk (DVD) player (also known as a digital video disk player), a Digital Video Recorder (DVR), a Video Cassette Recorder (VCR), etc.), the viewing record may include content identification (i.e., program identification) information as well as information relating to the time and manner in which the associated content was displayed. The viewing record may also contain additional information, such as the number of viewers present at the viewing time.
The home unit typically collects a plurality of viewing records and periodically (e.g., daily) transmits the collected viewing records to a central office or data processing facility for further processing or analysis. The central data processing facility receives viewing records from the home units located in some or all of the statistically selected households and analyzes the viewing records to determine viewing behavior of households in the geographic area or market of interest, particular households selected from all participating households, and/or particular groups of households. Additionally, the central data processing facility may generate measurement statistics and other parameters that represent viewing behavior associated with some or all of the participating households. This data may be extrapolated to reflect the viewing behavior of the market and/or region modeled by the statistically selected households.
To generate viewing behavior information from the viewing records, the central office or data processing facility may compare reference data, such as a program listing (e.g., a television program schedule or television guide), to the viewing records. In this manner, the central office can infer which program is displayed by comparing the time and channel information in the viewing record to the programs associated with the same time and channel in the program schedule. This collation process may be performed for each viewing record received by the central office, thereby enabling the central office to reconstruct the programs displayed by the selected household and the time at which the programs were displayed. Of course, the aforementioned collation process is not necessary in systems where the identity of the program is obtained by the home unit and included in the viewing record.
The rapid development and deployment of a wide variety of audio/video content delivery and distribution platforms has made the task of providing viewing records or information from a home unit to a central data collection facility much more complex. For example, although the above-described frequency detector device may be employed to detect channel information at a site at which a network television broadcast is being displayed (because, under normal operating conditions, the local oscillator frequency corresponds to a known network channel), such devices are generally not usable in digital broadcast systems. In particular, digital broadcast systems (e.g., satellite-based digital television systems, digital cable systems, etc.) typically include a digital receiver or set-top box at each subscriber site. The digital receiver or set-top box demodulates a multi-program data stream, parses the multi-program data stream into individual audio and/or video data packets, and selectively processes those data packets to generate the audio/video signals of a desired program. The audio and/or video output signals generated by the set-top box may be coupled directly to an audio/video input of an output device (e.g., a television, video monitor, etc.). Thus, the local oscillator frequency of the output device's tuner (if any) does not necessarily identify the channel or program currently being displayed.
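By way of illustration, the packet-selection step performed by such a digital receiver can be sketched in simplified form. The Python fragment below (a sketch only; real receivers do this in hardware, and the helper names are hypothetical) filters an MPEG-2 transport stream for the 188-byte packets carrying a chosen packet identifier (PID):

```python
def extract_pid_packets(ts_data: bytes, wanted_pid: int) -> list:
    """Return the payloads of all transport-stream packets carrying wanted_pid.

    MPEG-2 transport packets are 188 bytes: a 0x47 sync byte, then a
    13-bit packet identifier (PID) spread across bytes 1-2.
    """
    packets = []
    for offset in range(0, len(ts_data) - 187, 188):
        packet = ts_data[offset:offset + 188]
        if packet[0] != 0x47:           # lost sync; skip this packet
            continue
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid == wanted_pid:
            packets.append(packet[4:])  # strip the 4-byte header
    return packets

# Build a toy stream: two packets with PID 0x100, one with PID 0x200.
def make_packet(pid: int, fill: int) -> bytes:
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill] * 184)

stream = make_packet(0x100, 1) + make_packet(0x200, 2) + make_packet(0x100, 3)
video_payloads = extract_pid_packets(stream, 0x100)
```

In practice the selected packets would then be passed to audio/video decoders to reconstruct the desired program.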
In order to be able to generate meaningful viewing records, for example in cases where the channel tuned by the monitoring information providing device is not readily identifiable or may not uniquely correspond to the displayed program, measurement techniques based on the use of ancillary codes and/or content characteristics may be employed. Measurement techniques based on ancillary codes often encode and embed identifying information (e.g., broadcast/network channel numbers, program identification codes, broadcast timestamps, source identifiers for identifying networks and/or stations providing and/or broadcasting content, etc.) in the broadcast signal in a manner such that the codes are not noticeable to viewers. For example, a well-known technique employed in television broadcasting involves embedding auxiliary codes in the invisible Vertical Blanking Interval (VBI) of a video signal. Another example includes embedding an inaudible code in a portion of the audio signal accompanying the broadcast program. The latter technique is particularly advantageous since the embedded code can be reproduced by, for example, a television speaker and non-intrusively monitored by an external sensor such as a microphone.
Generally, feature-based program identification techniques employ one or more characteristics of currently displayed (but not yet identified) audio/video content to generate a substantially unique proxy or feature (e.g., a series of numerical values, waveforms, etc.) of the content. The characteristic information of the content being displayed may then be compared to a set of reference characteristics corresponding to a known set of programs. When a substantial match is found, the currently displayed program content may be identified with a relatively high likelihood.
Due to the current trend of incorporating multiple sources of audio/visual content into a single home viewing area, generating accurate monitoring information has become increasingly challenging. For example, a typical home entertainment system may include a cable or broadcast satellite set-top box with an integrated or discrete DVR, a DVD player, a DVD recorder, a VCR, a video game console, and the like. To generate accurate monitoring information, the audio/video content source and any associated content identification information must be accurately determined. However, individually monitoring each possible audio/video content source may result in an overly complex and/or cumbersome monitoring system. In addition, it is desirable to perform the monitoring in a manner that does not require any after-market modification of the various possible audio/video content sources.
Drawings
FIG. 1 is a block diagram of an example home entertainment system monitored by an example multi-engine meter.
FIG. 2 is a block diagram of an example multi-engine meter that may be used in the example of FIG. 1.
FIG. 3 is a block diagram of an example set of audio engines that may be used to implement the example multi-engine meter of FIG. 2.
FIG. 4 is a block diagram of an example set of video engines that may be used to implement the example multi-engine meter of FIG. 2.
FIG. 5 is a block diagram of an example set of metadata engines that may be used to implement the example multi-engine meter of FIG. 2.
FIG. 6 is a block diagram of an example decision processor that may be used to implement the example multi-engine meter of FIG. 2.
FIGS. 7A-7D together form a flowchart representative of example machine readable instructions that may be executed to implement the example decision processor of FIG. 6.
FIG. 8 is a flowchart representative of example machine readable instructions that may be executed to implement the example volume and silence detector of FIG. 3.
FIG. 9 is a flowchart representative of example machine readable instructions that may be executed to implement the example compression detector of FIG. 3.
FIG. 10 is a flowchart representative of example machine readable instructions that may be executed to implement the example ringtone detector of FIG. 3.
FIG. 11 is a flowchart representative of example machine readable instructions that may be executed to implement the example spectral shape processor of FIG. 3.
FIG. 12 is a flowchart representative of example machine readable instructions that may be executed to implement the example scene change and blank frame detector of FIG. 4.
FIG. 13 is a flowchart representative of example machine readable instructions that may be executed to implement the example macroblock detector of FIG. 4.
FIG. 14 is a flowchart representative of example machine readable instructions that may be executed to implement the example template matcher of FIG. 4.
FIG. 15 is a block diagram of an example computer that may execute the example machine readable instructions of FIGS. 7A-7D, 8-13, and/or 14 to implement the example multi-engine meter of FIG. 2.
FIGS. 16A-16F illustrate example decision metrics that may be employed by the example decision processor of FIG. 6.
Detailed Description
A block diagram of an example home entertainment system 100 with content monitoring capabilities is illustrated in FIG. 1. The example home entertainment system 100 includes a plurality of audio/visual (A/V) content sources 102, which may include any or all of a game console 104, a set-top box (STB) 106, a Digital Video Disk (DVD) player 108, a Video Cassette Recorder (VCR) 110, a Personal Video Recorder (PVR)/Digital Video Recorder (DVR) 112, and so forth. The A/V content sources 102 are connected to the inputs of an A/V switch 114, which routes the output of a selected A/V content source 102 to an input of a television 116 or other information presentation device. Additionally, a signal splitter 118 routes the input provided to the television 116 to a multi-engine meter 120 to facilitate monitoring of the A/V content provided to and presented by the television 116. The components of the home entertainment system 100 may be connected in any known manner, including the manner shown in FIG. 1.
The game console 104 may be any device capable of playing a video game. One example game console 104 is a standard dedicated game console, such as Microsoft's Xbox, Nintendo's GameCube, Sony's PlayStation, etc. Another example game console 104 is a portable dedicated gaming device, such as Nintendo's Game Boy SP or Game Boy DS, or Sony's PSP. Other example game consoles 104 include Personal Digital Assistants (PDAs), personal computers, DVD players, DVRs, PVRs, cellular/mobile phones, and the like.
The STB 106 may be any set-top box, such as a cable television converter, a Direct Broadcast Satellite (DBS) decoder, an over-the-air (OTA) Digital Television (DTV) receiver, a VCR, etc. The STB 106 receives a plurality of broadcast channels from a broadcast source (not shown). Generally, the STB 106 selects one of the plurality of broadcast channels based on user input and outputs one or more signals received via the selected broadcast channel. In the case of an analog signal, the STB 106 tunes to a particular channel to obtain the programming transmitted on that channel. In the case of a digital signal, the STB 106 may tune to a channel and decode certain data packets to obtain the program transmitted on the selected channel. For example, the STB 106 may tune to a main channel and then, through the decoding process described above, extract a program carried on a sub-channel within that main channel.
The DVD player 108 may be configured to output, for example, A/V content stored in a digital format on a DVD and/or audio content stored in a digital format on a Compact Disc (CD). The VCR 110 may be configured, for example, to output pre-recorded A/V content stored on a videocassette and/or to record A/V content provided by another A/V content source 102 for later presentation via the television 116. The PVR/DVR 112 may be configured to support time-shifted presentation of A/V content, such as that provided by the STB 106. The PVR/DVR 112 typically supports various features including: presenting live A/V content, delaying the presentation of live A/V content, fast forwarding and rewinding A/V content, pausing the presentation of A/V content, recording A/V content for later presentation while viewing a live broadcast of other A/V content, and so forth. PVRs are typically DVRs configured to adapt automatically, or respond automatically, to the viewing preferences of a particular user or group of users in a particular household. For example, many DVRs provide a telephone line connection that enables the DVR to communicate with a central service facility that receives viewer preference information from the DVRs and sends configuration information to the DVRs based on those viewer preferences. Using the configuration information, the DVR automatically configures itself to record video programs that conform to the preferences of one or more viewers associated with the DVR. TiVo™ is a well-known service that provides such PVR functionality to otherwise standard or conventional DVRs.
The A/V switch 114 is configured to route the A/V input selected by a user to the switch output. As shown in FIG. 1, the output of each of the plurality of A/V content sources 102 is routed to a respective input of the A/V switch 114. The user may then employ the A/V switch 114 to select which A/V content source 102 to connect to the television 116. The formats of the inputs and outputs of the A/V switch 114 will depend on the output formats of the A/V content sources 102 and the input format of the television 116. For example, the inputs and outputs of the A/V switch 114 may be composite audio/video, component audio/video, RF, and the like. Additionally, as one of ordinary skill in the art will recognize, the A/V switch 114 may be implemented as a standalone device or integrated into, for example, a home entertainment receiver, a television, or the like.
The output from the A/V switch 114 is fed to a signal splitter 118, which may be, for example, a composite audio/video splitter in the case of a direct composite audio/video connection between the A/V switch 114 and the television 116, or a single analog signal splitter in the case of an RF coaxial connection between the A/V switch 114 and the television 116. In the example home entertainment system 100, the signal splitter 118 generates two signals representing the output from the A/V switch 114. Of course, one of ordinary skill in the art will readily appreciate that any number of signals may be generated by the signal splitter 118.
In the illustrated example, one of the two signals from the signal splitter 118 is fed to the television 116 and the other signal is delivered to the multi-engine meter 120. The television 116 may be any type of television or television display device. For example, the television 116 may be a television and/or display device that supports the National Television Standards Committee (NTSC) standard, the Phase Alternating Line (PAL) standard, the Séquentiel Couleur à Mémoire (SECAM) standard, a standard developed by the Advanced Television Systems Committee (ATSC) (e.g., High Definition Television (HDTV)), a standard developed by the Digital Video Broadcasting (DVB) project, a multimedia computer system, or the like.
The second of the two signals from the signal splitter 118 (i.e., the signal carried by the connection 122 in FIG. 1) is coupled to an input of the multi-engine meter 120. The multi-engine meter 120 is an A/V content monitoring device capable of determining which A/V content source 102 is providing A/V content to the television 116. Such source identification information may be output via a source identification output 124. Additionally, the multi-engine meter 120 may be configured to determine content identification information (also referred to as tuning information) that depends on the content source (e.g., video game title, broadcast program title, recorded program title, original broadcast time, presentation time, trick modes used, etc.). Such content identification information may be output via a content information output 126. The multi-engine meter 120 determines the content identification information based on the signal corresponding to the A/V content output by the A/V switch 114.
To facilitate determination of the source identification information and the content identification information, the multi-engine meter 120 may also be provided with one or more sensors 128. For example, one sensor 128 may be configured to detect a signal transmitted by a remote control device 130. As shown in FIG. 1, the example home entertainment system 100 also includes a remote control device 130 to transmit control information that may be received by any or all of the A/V content sources 102, the television 116, and/or the multi-engine meter 120. One of ordinary skill in the art will recognize that the remote control device 130 may transmit this information using a variety of techniques including, but not limited to, Infrared (IR) transmission, radio frequency transmission, wired/cable connections, and the like.
FIG. 2 illustrates a block diagram of an example multi-engine meter 200 that may be used to implement the multi-engine meter 120 of FIG. 1. The example multi-engine meter 200 is configured to process a composite A/V input including stereo left and right audio input signals 204 and a video input signal 208. An audio sampler 212 samples the stereo audio input signals 204 at an appropriate sampling rate (e.g., 48 kHz) and converts them to a digital monaural audio signal. The resulting digital audio samples are stored in an audio buffer 216. The video input signal 208 is sampled by a video sampler 220 to form digital video samples that are stored in a video buffer 224. In this example, the video sampler 220 and the video buffer 224 are configured to sample the video input 208 at the NTSC frame rate of 29.97 frames/second at a resolution of 640 x 480 pixels. In addition, the input color video signal is converted to a black-and-white luminance signal. However, one of ordinary skill in the art will appreciate that other sampling rates, resolutions, and color conversions may also be employed.
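The stereo-to-mono conversion performed by an audio sampler such as 212 can be illustrated with a minimal sketch. Assuming a simple equal-weight downmix (the description does not specify the exact conversion used), each mono sample is the average of the corresponding left and right samples:

```python
def downmix_to_mono(left, right):
    """Average the left and right channels sample-by-sample to form a
    mono signal, as an audio sampler front end might do before buffering.
    Samples are assumed to be normalized floats (e.g., from a 48 kHz ADC)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

left = [0.5, -0.25, 1.0]
right = [0.5, 0.25, 0.0]
mono = downmix_to_mono(left, right)
```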
The multi-engine meter 200 includes one or more audio engines 228 to process the digital audio samples stored in the audio buffer 216. The audio engine 228 is configured to determine characteristics of the input audio signal 204 and/or information included in the input audio signal 204 that can be used to determine the A/V content source connected to the multi-engine meter 200 (e.g., which A/V content source 102 in FIG. 1 is connected to the multi-engine meter 120 and, thus, to the television 116). Additionally, the audio engine 228 may be configured to determine A/V content identification information based on the input audio signal 204. An example of the audio engine 228 will be discussed in more detail below with respect to FIG. 3.
The example multi-engine meter 200 also includes one or more video engines 232 to process the digital video samples stored in the video buffer 224. Similar to the audio engine 228, the video engine 232 is configured to determine characteristics of the input video signal 208 and/or information included in the input video signal 208 that can be used to determine the A/V content source connected to the multi-engine meter 200 (e.g., which A/V content source 102 in FIG. 1 is connected to the multi-engine meter 120 and, thus, to the television 116). Additionally, the video engine 232 may be configured to determine A/V content identification information based on the input video signal 208. An example of the video engine 232 will be discussed in more detail below with respect to FIG. 4.
To receive, decode, and process metadata that may be embedded in the input audio signal 204 and/or the input video signal 208, the example multi-engine meter 200 includes a metadata extractor 236 and one or more associated metadata engines 240. The metadata extractor 236 is configured to extract and/or process the portions of the input audio signal 204 and/or the input video signal 208 that may carry embedded metadata information. The extracted/processed signal portions are then further processed by the metadata engines 240 to determine whether metadata is present and, if so, to receive/decode such metadata. The resulting metadata may be used to determine the source of the A/V content connected to the multi-engine meter 200 and/or to determine A/V content information associated with the input signals 204, 208. An example of the metadata engine 240 will be discussed in more detail below with respect to FIG. 5.
The example multi-engine meter 200 includes a decision processor 244 to process the output information generated by the audio engines 228, the video engines 232, and the metadata engines 240. Additionally, the decision processor 244 of the example multi-engine meter 200 is configured to process a remote control signal 248 transmitted by a remote control device (e.g., the remote control device 130 of FIG. 1). As shown, the remote control signal 248 is received by a remote control detector 252 and provided as an input to the decision processor 244. The decision processor 244 processes the available input information to determine the source of the A/V content connected to the multi-engine meter 200 and outputs this information via a source identification (ID) output 256. In addition, the decision processor 244 may determine A/V content identification information and output this information via a content information (Info) output 260. An example decision processor 244 is discussed in more detail below with respect to FIG. 6.
An example set of audio engines 300 that may be used to implement the audio engine 228 of FIG. 2 is shown in FIG. 3. The audio engines 300 process input audio samples 304, such as those provided by the audio buffer 216 of FIG. 2. The input audio samples 304 correspond to the audio signal being output by an A/V content source (e.g., one of the A/V content sources 102 in FIG. 1) and provided as input to a monitored presentation device (e.g., the television 116). Each audio engine included in the set of audio engines 300 is configured to read the input audio samples 304 at a rate and frequency that depend on the processing performed by that particular audio engine. Thus, the audio engines 300 may operate autonomously, reading the input audio samples 304 and generating corresponding audio engine outputs 308 in an autonomous manner.
The example set of audio engines 300 includes an audio code detector 312, an audio feature processor 316, a volume and silence detector 320, a compression detector 324, a ringtone detector 328, and a spectral shape processor 332. The example audio code detector 312 is configured to detect and process auxiliary audio codes that may be embedded in the audio signal corresponding to the input audio samples 304. As described above, auxiliary audio codes may be used to encode and embed identifying information (e.g., broadcast/network channel numbers, program identification codes, broadcast timestamps, source identifiers used to identify networks and/or stations that provide and/or broadcast content, etc.) in, for example, an inaudible portion of the audio signal accompanying a broadcast program. Methods and apparatus for implementing the audio code detector 312 are well known in the art. For example, Srinivasan, in U.S. Patent No. 6,272,176, which is incorporated herein by reference in its entirety, discloses a broadcast encoding system and method for encoding and decoding information transmitted in an audio signal. The audio code detector 312 may be implemented using this and/or any other suitable technique.
The example audio feature processor 316 is configured to generate and process audio features corresponding to the input audio samples 304. As described above, characteristics of the audio portion of presented A/V content may be used to generate a substantially unique proxy or feature (e.g., a series of numerical values, waveforms, etc.) for the content. The characteristic information of the presented content may be compared to a set of reference characteristics corresponding to a set of known content. When a substantial match is found, the currently displayed A/V content may be identified with a relatively high likelihood. Methods and apparatus for implementing the audio feature processor 316 are well known in the art. For example, Srinivasan et al., in U.S. Patent Application Serial No. 09/427,970, which is incorporated herein by reference in its entirety, disclose audio feature extraction and related techniques. As another example, Lee et al. disclose feature-based program identification apparatus and methods for digital broadcast systems in Patent Cooperation Treaty Application Serial No. US03/22562, the entire contents of which are incorporated herein by reference. The audio feature processor 316 may be implemented using these and/or any other suitable techniques.
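The generate-and-match flow used by a feature processor such as 316 can be illustrated with a deliberately simplified sketch. Here the "feature" is merely the per-frame RMS energy of the audio, and matching selects the reference with the smallest squared distance; actual systems use far richer spectral features, and all names below are hypothetical:

```python
import math

def feature(samples, frame_len=4):
    """Collapse an audio clip to a coarse feature vector: the RMS energy
    of each fixed-length frame. Illustrative only; real feature schemes
    use spectral characteristics rather than raw frame energies."""
    sig = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        sig.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return sig

def best_match(sig, references):
    """Return the name of the reference feature vector closest to sig."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(references, key=lambda name: dist(sig, references[name]))

refs = {
    "program_a": feature([0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]),
    "program_b": feature([0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]),
}
observed = feature([0.8, 0.9, 1.0, 0.9, 0.1, 0.2, 0.1, 0.0])
match = best_match(observed, refs)
```

In a deployed meter, a match would only be declared when the distance also falls below a confidence threshold, so that unknown content is not forced onto the nearest reference.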
The example volume and silence detector 320 is configured to determine whether the input audio samples 304 correspond to an audio signal in a silent state. Additionally or alternatively, the volume and silence detector 320 may be configured to determine a volume level associated with the input audio samples 304. A decision processor (e.g., the decision processor 244 of FIG. 2) may employ knowledge of whether the audio is silent, for example, to determine which audio engine outputs 308 to process and/or how to process such outputs. Example machine readable instructions 800 that may be executed to implement the volume and silence detector 320 are discussed in the detailed description of FIG. 8 below.
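A minimal sketch of such a silence check, assuming a simple RMS-versus-threshold rule (the threshold value here is purely illustrative, not taken from the description):

```python
import math

def is_silent(samples, threshold=0.01):
    """Flag a block of audio samples as silent when its RMS level falls
    below a small threshold. Samples are normalized floats; the 0.01
    threshold is an illustrative assumption."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

quiet = [0.001, -0.002, 0.001, 0.0]
loud = [0.4, -0.3, 0.5, -0.2]
```

The same RMS value can double as the reported volume level, so one pass over the buffer serves both outputs of the detector.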
The example compression detector 324 is configured to determine whether the input audio samples 304 correspond to a compressed audio signal. Additionally or alternatively, the compression detector 324 may be configured to determine which type of compression was performed on the compressed audio signal. For example, DVD and digital television systems typically employ AC3 compression to store/transmit digital audio, while some DVRs/PVRs may employ MPEG audio compression. Thus, a decision processor (e.g., the decision processor 244 of FIG. 2) may employ knowledge of whether the audio has been compressed and, if so, the type of compression employed, to determine, for example, the A/V content source corresponding to the input audio samples 304. Example machine readable instructions 900 that may be executed to implement the compression detector 324 are discussed in the detailed description of FIG. 9 below.
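One simplified heuristic for sensing that audio passed through a lossy codec is to look for an unusually empty upper spectral band, since lossy codecs often discard high-frequency content. The sketch below is illustrative only: it does not identify AC3 or MPEG audio specifically, and the cutoff is an assumption; it merely measures the fraction of DFT energy at or above a cutoff bin:

```python
import cmath
import math

def high_band_ratio(samples, cutoff_bin):
    """Fraction of spectral energy in DFT bins at or above cutoff_bin.
    A near-zero top band can hint that audio was decoded from a
    band-limited (compressed) source. Naive O(n^2) DFT for clarity."""
    n = len(samples)
    spectrum = [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2)]
    total = sum(v * v for v in spectrum) or 1.0
    high = sum(v * v for v in spectrum[cutoff_bin:])
    return high / total

n = 64
low_tone = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]    # energy in bin 2
high_tone = [math.sin(2 * math.pi * 28 * t / n) for t in range(n)]  # energy in bin 28
band_limited = high_band_ratio(low_tone, cutoff_bin=16)
full_band = high_band_ratio(high_tone, cutoff_bin=16)
```

A production detector would instead look for codec-specific artifacts (e.g., frame-aligned quantization patterns) rather than simple band limiting.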
The example ringtone detector 328 is configured to determine whether the input audio samples 304 correspond to an audio ringtone generated by an A/V content source, for example, when a user causes the A/V content source to display a menu such as a power-on menu, a channel/program selection menu, or the like. A decision processor (e.g., the decision processor 244 of FIG. 2) may employ knowledge of whether the input audio samples 304 correspond to an audio ringtone, for example, to determine which A/V content source generated the ringtone and is, therefore, the source of the corresponding input audio samples 304. Known techniques for generating and comparing audio features (e.g., those described above with respect to the example audio feature processor 316) may be adapted to determine whether the input audio samples 304 correspond to a reference audio ringtone. Example machine readable instructions 1000 that may be executed to implement the ringtone detector 328 are discussed in the detailed description of FIG. 10 below.
The example spectral shape processor 332 is configured to determine whether the input audio samples 304 correspond to an audio signal having a particular spectral shape. For example, audio signals in analog cable television transmission systems exhibit increased energy in a frequency band at or near 15.75 kHz due to video signal leakage. Thus, a decision processor (e.g., the decision processor 244 of FIG. 2) may employ knowledge of whether the audio has a particular spectral shape, for example, to determine the A/V content source corresponding to the input audio samples 304. Example machine readable instructions 1100 that may be executed to implement the spectral shape processor 332 are discussed in the detailed description of FIG. 11 below.
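The 15.75 kHz leakage check described above can be sketched by measuring the DFT energy in a narrow band around that frequency. Assuming the 48 kHz sampling rate of the example meter (the band width, block size, and comparison factor below are illustrative assumptions):

```python
import cmath
import math

FS = 48000.0  # sampling rate used by the example audio sampler

def band_energy(samples, center_hz, width_hz=200.0):
    """Energy of the DFT bins within width_hz of center_hz, for probing
    the 15.75 kHz video-leakage band. Per-bin DFT for clarity."""
    n = len(samples)
    lo = int((center_hz - width_hz) * n / FS)
    hi = int((center_hz + width_hz) * n / FS) + 1
    energy = 0.0
    for k in range(lo, hi):
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        energy += abs(coeff) ** 2
    return energy

n = 480  # 10 ms at 48 kHz, giving 100 Hz bin spacing
leaky = [0.01 * math.sin(2 * math.pi * 15750 * t / FS) for t in range(n)]
clean = [0.01 * math.sin(2 * math.pi * 1000 * t / FS) for t in range(n)]
has_leak = band_energy(leaky, 15750.0) > 10 * band_energy(clean, 15750.0)
```

In practice the band energy would be compared against the total signal energy over many blocks before declaring that the audio exhibits the analog-cable spectral shape.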
As shown in the example of FIG. 3, the results of the individual audio engines 312 to 332 may be adjusted/prioritized by a set of respective weights 336 to 356. For example, the weights 336 to 356 may explicitly adjust the audio engine results based on the amount of information, confidence, etc., that the respective audio engine results may contribute to the processing performed by the decision processor (e.g., the decision processor 244 of FIG. 2). Additionally or alternatively, the weights 336 to 356 may be implicit and based on, for example, the stage of decision processing at which the decision processor uses a particular audio engine result, the precedence the decision processor gives to particular audio engine results, and so forth.
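By way of illustration, an explicit weighted combination might look like the following sketch, in which each engine casts a vote for a source and the weights behind each candidate are summed. The engine names, votes, and weight values are hypothetical, and the decision processor described later uses richer, multi-stage logic rather than a single weighted vote:

```python
def decide_source(engine_votes, weights):
    """Combine per-engine source votes into one decision by summing the
    weight of each engine behind each candidate source. Engines that
    produced no result this interval vote None and are skipped."""
    scores = {}
    for engine, source in engine_votes.items():
        if source is None:
            continue
        scores[source] = scores.get(source, 0.0) + weights.get(engine, 1.0)
    return max(scores, key=scores.get) if scores else None

votes = {
    "audio_code": "STB",       # embedded code decoded: strong evidence
    "compression": "DVD",      # compression type suggests DVD
    "spectral_shape": None,    # no spectral-shape result this interval
    "ringtone": "STB",
}
weights = {"audio_code": 3.0, "compression": 1.0, "ringtone": 1.5}
source = decide_source(votes, weights)
```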
An example set of video engines 400 that may be used to implement the video engine 232 of FIG. 2 is shown in FIG. 4. The video engine set 400 processes input video samples 404, such as those provided by the video buffer 224 in FIG. 2. The input video samples 404 correspond to the video signal being output by an A/V content source (e.g., one of the A/V content sources 102 in FIG. 1) and provided as input to a monitored presentation device (e.g., television 116). The video engines included in the video engine set 400 are configured to read the input video samples 404 at a rate and frequency that depend on the processing performed by the particular video engine. Thus, the video engines 400 may operate autonomously, reading the input video samples 404 and generating corresponding video engine outputs 408 in an autonomous manner.
The example video engine set 400 includes a text detector 412, a blur detector 416, a scene change and blank frame detector 420, a macroblock detector 424, and a template matcher 428. The example text detector 412 is configured to determine whether a portion/region of the video corresponding to the input video sample 404 includes text associated with, for example, a known display (e.g., a menu displayed by a particular A/V content source upon invocation of a selected operating mode). Thus, a decision processor (e.g., decision processor 224 in FIG. 2) may employ knowledge of whether the input video sample 404 corresponds to a video displaying particular text, for example, to determine the A/V content source corresponding to the input video sample 404. Methods and apparatus for implementing the text detector 412 are well known in the art. For example, in Patent Cooperation Treaty application serial No. US04/012272, the entire contents of which are incorporated herein by reference, Nelson et al. disclose methods and apparatus for detecting a television channel change event based on determining whether a selected portion of a video display includes a numerical digit corresponding to a displayed channel number. This and/or any other suitable technique may be employed to implement the text detector 412.
The example blur detector 416 is configured to determine whether a portion/region of the video corresponding to the input video sample 404 is blurry or exhibits a blur characteristic. For example, blur may be introduced into the video/image by the compression associated with a particular A/V content source. Thus, a decision processor (e.g., decision processor 224 in FIG. 2) may employ knowledge of whether the input video sample 404 corresponds to video exhibiting blur, for example, to determine the A/V content source corresponding to the input video sample 404. Methods and apparatus for implementing the blur detector 416 are well known in the art. For example, Banham and Katsaggelos, "Digital image restoration," IEEE Signal Processing Magazine, March 1997, pp. 24-41, which is incorporated herein by reference in its entirety, describes various techniques for identifying blur in an image. These and/or any other suitable techniques may be employed to implement the blur detector 416.
The example scene change and blank frame detector 420 is configured to determine whether a set of sequential frames corresponding to the input video samples 404 exhibits, for example, a scene change, a paused frame, one or more blank frames, and so forth. Such information may be used, for example, to determine whether a trick mode (e.g., pause) was invoked on the A/V content source providing the input video samples 404. In addition, the number of blank frames detected within a predetermined interval (e.g., two minutes) may be employed to determine whether the A/V content corresponds to, for example, a commercial pod (i.e., a block of consecutive commercials), thereby indicating whether the A/V content source is a broadcast source. Thus, a decision processor (e.g., decision processor 224 in FIG. 2), for example, can employ knowledge of whether the input video samples 404 correspond to a scene change, paused frame, blank frame, etc., to determine the A/V content source corresponding to the input video samples 404. Example machine-readable instructions 1200 executable to implement the scene change and blank frame detector 420 will be discussed in the detailed description of FIG. 12 below.
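The blank frame and commercial pod logic described above can be sketched as follows. The thresholds, the 2-D luminance-array input format, and the minimum blank-frame count are illustrative assumptions, not values from this disclosure:

```python
import numpy as np

def is_blank_frame(luma, mean_threshold=20.0, var_threshold=25.0):
    # A blank frame is both dark and nearly uniform in luminance.
    return luma.mean() < mean_threshold and luma.var() < var_threshold

def suggests_commercial_pod(frames, min_blank_frames=4):
    # Several blank frames within the monitored interval (e.g., two
    # minutes of samples) hint at the gaps separating consecutive
    # commercials, and thus at a broadcast source.
    return sum(is_blank_frame(f) for f in frames) >= min_blank_frames
```

A production detector would also have to cope with near-black letterbox bars and fades, which this sketch ignores.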
The example macroblock detector 424 is configured to determine whether video corresponding to the input video samples 404 exhibits macroblocking characteristic of MPEG video compression. In addition, the macroblock detector 424 may determine whether the video signal exhibits the near-perfect color blending that is representative of a video game being played through a gaming machine (e.g., gaming machine 104 in FIG. 1). A decision processor (e.g., decision processor 224 in FIG. 2) may employ knowledge of whether the input video sample 404 exhibits macroblocking or near-perfect color blending, for example, to determine the A/V content source corresponding to the input video sample 404. Example machine-readable instructions 1300 that may be executed to implement the macroblock detector 424 are discussed in the detailed description of FIG. 13 below.
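One way to quantify macroblocking, offered purely as an illustrative sketch, is to compare luminance discontinuities at 8-pixel block boundaries against discontinuities elsewhere in the frame. The boundary-ratio approach and the 2-D luminance input are assumptions; the disclosure does not prescribe a particular algorithm:

```python
import numpy as np

def blockiness_score(frame, block=8):
    """Ratio of the average luminance step at 8-pixel column boundaries
    to the average step elsewhere. Values well above 1 suggest
    MPEG-style macroblocking; a smooth frame scores near 1."""
    col_diffs = np.abs(np.diff(frame.astype(float), axis=1))
    boundary = col_diffs[:, block - 1::block]          # steps across block edges
    interior_mask = np.ones(col_diffs.shape[1], dtype=bool)
    interior_mask[block - 1::block] = False
    interior = col_diffs[:, interior_mask]             # steps inside blocks
    return boundary.mean() / (interior.mean() + 1e-9)  # avoid divide-by-zero
```

A symmetric check on row boundaries, omitted here for brevity, would normally be combined with this column score.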
The example template matcher 428 is configured to determine whether the video corresponding to the input video sample 404 matches a known/stored template, e.g., corresponding to a menu screen being output by a particular A/V content source. A decision processor (e.g., decision processor 224 in FIG. 2) may employ knowledge of whether the input video sample 404 corresponds to a known/stored template, for example, to determine the A/V content source corresponding to the input video sample 404. Known techniques for generating and comparing video features (e.g., the techniques described in U.S. patent No. 6,633,651 entitled "Method and Apparatus for Recognizing Video Sequences" and U.S. patent No. 6,577,346 entitled "Recognizing a Pattern in a Video Segment to Identify the Video Segment," both of which are incorporated herein by reference in their entirety) may be adapted to determine whether the input video sample 404 corresponds to a reference template. Example machine-readable instructions 1400 that may be executed to implement the template matcher 428 are discussed in the detailed description of FIG. 14 below.
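The template comparison might be sketched, for example, as a zero-mean normalized correlation between the captured frame and a stored menu template. The correlation approach and the 0.9 threshold are illustrative assumptions, not details of the referenced patents:

```python
import numpy as np

def matches_template(frame, template, threshold=0.9):
    # Zero-mean normalized correlation between the captured frame and a
    # stored reference (e.g., a known menu screen); 1.0 is a perfect match.
    f = frame.astype(float).ravel() - frame.mean()
    t = template.astype(float).ravel() - template.mean()
    denom = np.linalg.norm(f) * np.linalg.norm(t)
    if denom == 0.0:
        return False   # flat frame or flat template: no meaningful match
    return float(np.dot(f, t) / denom) >= threshold
```

The zero-mean normalization makes the comparison insensitive to overall brightness and contrast shifts between the capture path and the stored template.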
As shown in the example of FIG. 4, the results of the various video engines 412 through 428 may be adjusted/prioritized by a set of respective weights 432 through 448. For example, the weights 432 through 448 may explicitly scale the video engine results based on the amount of information, confidence, etc., that the respective video engine results may contribute to the processing performed by the decision processor (e.g., decision processor 224 in FIG. 2). Additionally or alternatively, in this example, the weights 432 through 448 may be implicit and based on, for example, the stage of decision processing at which the decision processor uses a particular video engine result, the order of precedence the decision processor gives to particular video engine outputs, and so forth.
An example set of metadata engines 500 that may be used to implement the metadata engine 240 of FIG. 2 is shown in FIG. 5. The metadata engine set 500 processes input metadata 504 provided, for example, by the metadata extractor 236 in FIG. 2. The input metadata 504 corresponds to the audio and/or video signals being output by an A/V content source (e.g., one of the A/V content sources 102 in FIG. 1) and provided as input to a monitored presentation device (e.g., television 116). The metadata engines included in the metadata engine set 500 are configured to read the input metadata 504 at a rate and frequency that depend on the processing performed by the particular metadata engine. Thus, the metadata engines 500 may operate autonomously, reading the input metadata 504 and generating corresponding metadata engine outputs 508 in an autonomous manner.
The example set of metadata engines 500 includes an Automated Measurement of Lineups (AMOL) processor 512, a closed caption processor 516, and a teletext processor 520. The example AMOL processor 512 is configured to determine whether the input metadata 504 corresponds to AMOL codes and, if so, to process such codes. AMOL codes may be embedded, for example, in broadcast television transmissions to enable identification of the content being transmitted, the source transmitting the content, and the like. More specifically, AMOL codes may be included in a non-viewable portion of the broadcast television signal (e.g., line 20 of the vertical blanking interval (VBI)) and/or in a viewable portion of the broadcast television signal (e.g., line 22 of the active video portion of the video signal). In addition, AMOL codes may be encrypted. Typically, AMOL codes transmitted, for example, in line 20 of the VBI are not recoverable after digital compression because the digital video signal does not use the VBI and, therefore, the compression algorithm may discard/corrupt such information. AMOL codes transmitted, for example, in line 22 are recoverable after digital compression because such codes are carried in the active video portion of the video signal.
Thus, a decision processor (e.g., decision processor 224 in FIG. 2), for example, can employ the processed AMOL codes to determine the A/V content source and additional content identification information corresponding to the input metadata 504. Methods and apparatus for implementing the AMOL processor 512 are well known in the art. For example, in U.S. patent Nos. 5,425,100 and 5,526,427, which are incorporated herein by reference in their entirety, Thomas et al. disclose universal broadcast code and multi-level encoded signal monitoring systems that may be used to process AMOL codes. The AMOL processor 512 may be implemented using these and/or any other suitable techniques.
The example closed caption processor 516 is configured to determine whether the input metadata 504 corresponds to closed caption information and, if so, to process such information. Closed caption information (e.g., text) may be included in a non-viewable portion of the broadcast television signal (e.g., line 21 of the VBI). A decision processor (e.g., decision processor 224 in FIG. 2) can employ the processed closed caption information, for example, to determine the A/V content source and additional content identification information corresponding to the input metadata 504. Methods and apparatus for implementing the closed caption processor 516 are well known in the art. For example, in U.S. patent No. 4,857,999, which is incorporated herein by reference in its entirety, Welsh describes a video monitoring system for processing closed caption information. This and/or any other suitable technique may be employed to implement the closed caption processor 516.
The example teletext processor 520 is configured to determine whether the input metadata 504 corresponds to teletext information and, if so, to process such information. As with closed caption information, teletext information may be included in a non-viewable portion of the broadcast television signal. A decision processor (e.g., decision processor 224 in FIG. 2) may employ the processed teletext information, for example, to determine the A/V content source and additional content identification information corresponding to the input metadata 504. Methods and apparatus for implementing the teletext processor 520 are well known in the art. For example, the techniques for processing closed caption information discussed above may be adapted to process teletext. These and/or any other suitable techniques may be employed to implement the teletext processor 520.
As shown in the example of FIG. 5, the results of the respective metadata engines 512 through 520 may be adjusted/prioritized by a set of respective weights 524 through 532. For example, the weights 524 through 532 may explicitly scale the metadata engine results based on the amount of information, confidence, etc., that the respective metadata engine results may contribute to the processing performed by the decision processor (e.g., decision processor 224 in FIG. 2). Additionally or alternatively, in this example, the weights 524 through 532 may be implicit and based on, for example, the stage of decision processing at which the decision processor uses a particular metadata engine result, the order of precedence the decision processor gives to particular metadata engine outputs, and so forth.
One of ordinary skill in the art will appreciate that additional or alternative metadata processors may be included in the set of metadata engines 500 depending on the type of metadata provided by the metadata input 504. Such additional or alternative metadata processors may, for example, be configured to process content identification information included in a digital bitstream carrying the monitored A/V content. The content identification information may be, for example, a version of the International Standard Audiovisual Number (V-ISAN) or any other type of identifier that may be used to identify the monitored A/V content.
A block diagram of an example decision processor 600 that may be used to implement the decision processor 244 of FIG. 2 is illustrated in FIG. 6. The example decision processor 600 receives one or more audio engine results 604 from one or more audio engines (e.g., the audio engines 300 in FIG. 3), one or more video engine results 608 from one or more video engines (e.g., the video engines 400 in FIG. 4), and one or more metadata engine results 612 from one or more metadata engines (e.g., the metadata engines 500 in FIG. 5). The audio engine results 604 are stored in respective audio metric registers 616 through 620. The video engine results 608 are stored in respective video metric registers 624 through 628. The metadata engine results 612 are stored in respective metadata metric registers 632 through 636. The audio metric registers 616-620, video metric registers 624-628, and metadata metric registers 632-636 may be implemented as hardware registers, memory locations, etc., or any combination thereof. Because the various audio engine results 604, video engine results 608, and metadata engine results 612 are generated autonomously, the audio metric registers 616-620, video metric registers 624-628, and metadata metric registers 632-636 may be updated autonomously as their respective results become available.
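The latest-value semantics of such a metric register, where each engine writes whenever its result is ready and the decision processor samples on its own schedule, can be sketched as follows. This is an illustrative software sketch only; the class name and locking scheme are assumptions, since the registers may equally be hardware registers:

```python
import threading

class MetricRegister:
    """Latest-value register: an engine overwrites the value whenever a
    new result is available; the sampler reads whatever is current."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None          # no engine result available yet

    def write(self, value):
        with self._lock:            # engine side: autonomous update
            self._value = value

    def sample(self):
        with self._lock:            # decision-processor side: periodic read
            return self._value
```

The lock models the requirement that a sample never observe a half-written result, which a hardware register provides for free.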
The example decision processor 600 includes an audio metric sampler 640, a video metric sampler 644, and a metadata metric sampler 648 to sample the audio metric registers 616-620, the video metric registers 624-628, and the metadata metric registers 632-636, respectively (e.g., to read the corresponding results from hardware registers, memory locations, etc.). The sampling operations may be performed at predetermined intervals, based on the occurrence of predetermined events, etc., or any combination thereof. The audio metric sampler 640, video metric sampler 644, and metadata metric sampler 648 provide the sampled results to the measurement engine metric evaluator 652. The measurement engine metric evaluator 652 employs the available audio engine results, video engine results, and metadata engine results to determine the A/V content source corresponding to the monitored A/V content. The measurement engine metric evaluator 652 outputs the detected A/V content source via the source ID output 656. The measurement engine metric evaluator 652 may also determine additional content identification information corresponding to the monitored A/V content. Such content identification information may be output via the content information output 660. Example machine-readable instructions 700 executable to implement the measurement engine metric evaluator 652 are discussed in the detailed description of FIGS. 7A through 7D below.
FIGS. 16A-16F illustrate example decision metrics that the example measurement engine metric evaluator 652 may utilize, for example, to determine the A/V content source corresponding to the monitored A/V content and/or to determine whether that A/V content source is in a special operating mode. FIG. 16A lists decision metrics that can be used to determine whether an A/V content source is a live analog television source (analog TV live) or an analog video-on-demand (VOD) source (analog VOD). A first decision metric indicative of a live analog television source is the presence of AMOL codes (e.g., provided by the AMOL processor 512 in FIG. 5) in line 20 of the VBI of the broadcast television signal, coupled with no detected time shift in the presentation of the A/V content. As described above, the presence of AMOL codes in line 20 of the VBI indicates that the A/V content source is an analog television source because such AMOL codes cannot survive the compression associated with a digital television source. However, if no AMOL codes are detected in line 20 of the VBI (e.g., by the AMOL processor 512), a live analog television source may also be detected using a second decision metric, which includes: detecting the presence of an audio signal corresponding to the monitored A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), detecting the presence of cable spectral shaping of the detected audio signal (e.g., by the spectral shape processor 332), and detecting the absence of a time shift. As described above, the presence of cable spectral shaping indicates that the detected audio signal has passed through an analog cable transmission system and, therefore, that the A/V content source is an analog television source.
Similarly, FIG. 16A lists two decision metrics that may be used to detect an analog VOD source. The first analog VOD decision metric detects an analog television source via the presence of AMOL codes (e.g., provided by the AMOL processor 512) in line 20 of the VBI, while the presence of a time shift indicates that the source is not live but, rather, an analog VOD source. If there are no AMOL codes in line 20 of the VBI (e.g., as determined by the AMOL processor 512), a second analog VOD decision metric can be evaluated, which includes: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), detecting the presence of cable spectral shaping indicative of an analog television source (e.g., by the spectral shape processor 332), and detecting a time shift indicative of a VOD presentation. Various techniques may be employed to detect a time shift in the presentation of the A/V content, such as: comparing a broadcast timestamp included in the AMOL information with a real-time clock included in the multi-engine meter 200, comparing a timestamp included in an audio code embedded in the detected audio signal with a real-time clock included in the multi-engine meter 200, and so on.
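Taken together, the FIG. 16A metrics amount to a small decision rule, sketched here with Boolean inputs standing in for the detector outputs named above. The function name and the string labels are illustrative assumptions:

```python
def classify_analog_source(amol_in_line_20, time_shift_detected,
                           audio_present, cable_spectral_shape):
    # First metric pair: AMOL codes in line 20 of the VBI imply an
    # analog television source; a time shift distinguishes VOD from live.
    if amol_in_line_20:
        return "analog VOD" if time_shift_detected else "analog TV live"
    # Second metric pair: with no AMOL codes, fall back on audio
    # presence plus cable spectral shaping.
    if audio_present and cable_spectral_shape:
        return "analog VOD" if time_shift_detected else "analog TV live"
    return "undetermined"
```

The same time-shift input serves both metric pairs, mirroring the timestamp-versus-real-time-clock comparison described above.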
FIG. 16B lists two decision metrics corresponding to a third possible analog source, namely, Video Cassette Recorder (VCR) playback. The first VCR playback decision metric combines: the presence of AMOL codes in line 20 of the VBI indicative of an analog television source (e.g., as provided by the AMOL processor 512), the presence of a time shift indicating that the analog television source is not live, and the absence of spectral shaping characteristic of a cable transmission system (e.g., as determined by the spectral shape processor 332), which indicates that the source is a local VCR rather than a cable transmission system. If no AMOL codes are detected in line 20 of the VBI (e.g., by the AMOL processor 512), a second VCR playback decision metric can be evaluated to detect VCR playback, which includes: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3); detecting the absence of spectral shaping indicative of a cable television transmission system (e.g., as determined by the spectral shape processor 332); and detecting the absence of any characteristics associated with a digital television transmission, such as video macroblocking (e.g., as determined by the macroblock detector 424 in FIG. 4), AC3 audio compression (e.g., as determined by the compression detector 324), or MPEG audio compression (e.g., as determined by the compression detector 324), as described above. Through this process of elimination, the second VCR playback decision metric determines that the A/V content source corresponds to a local analog source and, thus, VCR playback.
FIG. 16B also lists a decision metric that can be used to detect a digital A/V content source corresponding to Digital Versatile Disc (DVD) playback. The DVD playback decision metric combines the absence of AMOL codes in line 20 of the VBI (which would indicate an analog television source) (e.g., as determined by the AMOL processor 512) with: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), detecting video macroblocking indicative of a digital video presentation (e.g., by the macroblock detector 424), and detecting AC3 audio compression indicative of a digital audio presentation (e.g., by the compression detector 324). AC3 audio compression is employed to store audio content on a DVD, and video macroblocking is more readily apparent in DVD video presentations than in digital television presentations (as described in more detail below). Thus, the presence of AC3 audio compression and video macroblocking can be used to determine whether an A/V content source corresponds to DVD playback.
FIG. 16C lists decision metrics that may be used to detect a digital television source corresponding to a live broadcast (digital TV live) or to playback by a digital video recorder or similar device (digital TV DVR playback). These metrics combine the absence of AMOL codes in line 20 of the VBI (which would indicate an analog television source) (e.g., as determined by the AMOL processor 512) with: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), determining the absence of significant video macroblocking (e.g., by the macroblock detector 424), and detecting AC3 audio compression indicative of a digital audio presentation (e.g., by the compression detector 324). Live digital television is distinguished from digital TV DVR playback by the absence or presence, respectively, of a detected time shift. As in the case of DVD playback, digital television employs AC3 audio compression. However, compared to DVD playback, significant video macroblocking is often not apparent in digital television due to, for example, anti-macroblocking filters in the digital television transmission system, less compression of the digital television video signal relative to the DVD video signal, the transmission noise present in the digital television signal but absent during DVD playback, and so on. Thus, the presence of AC3 audio compression and the absence of significant video macroblocking can be employed to distinguish a digital television source from DVD playback.
FIG. 16D lists decision metrics that can be used to detect a DVR source that employs MPEG audio compression and provides live broadcast (MPEG DVR live) or delayed playback (MPEG DVR playback) of previously recorded A/V content. These metrics combine the absence of AMOL codes in line 20 of the VBI (which would indicate an analog television source) (e.g., as determined by the AMOL processor 512) with: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), determining the absence of significant video macroblocking artifacts (e.g., by the macroblock detector 424), and detecting MPEG audio compression indicative of an MPEG DVR audio presentation (e.g., by the compression detector 324). A live MPEG DVR presentation is distinguished from MPEG DVR playback by the absence or presence, respectively, of a detected time shift. The input to an MPEG DVR is typically a digital television broadcast and, as discussed above, the digital television video signal does not exhibit significant macroblocking; therefore, the resulting MPEG DVR video signal will also typically not exhibit significant macroblocking. Thus, the presence of MPEG audio compression and the absence of significant video macroblocking can be employed to detect an MPEG DVR source.
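The FIG. 16B through 16D digital-source metrics can likewise be summarized as a decision rule over the detector outputs. This is an illustrative sketch; the string labels, the three-valued compression input, and the premise that no AMOL codes were found are modeling choices, not part of the disclosure:

```python
def classify_digital_source(audio_present, video_macroblocking,
                            audio_compression, time_shift_detected):
    # audio_compression is "AC3", "MPEG", or None, as reported by a
    # compression detector; all branches assume no AMOL codes were found.
    if not audio_present:
        return "undetermined"
    if audio_compression == "AC3":
        if video_macroblocking:
            return "DVD playback"               # FIG. 16B
        return ("digital TV DVR playback"       # FIG. 16C
                if time_shift_detected else "digital TV live")
    if audio_compression == "MPEG" and not video_macroblocking:
        return ("MPEG DVR playback"             # FIG. 16D
                if time_shift_detected else "MPEG DVR live")
    return "undetermined"
```

Note how the macroblocking input carries most of the discrimination between DVD playback and the broadcast-derived digital sources.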
FIG. 16E lists decision metrics that may be used to detect a video game source. The video game decision metric combines the absence of AMOL codes in line 20 of the VBI (which would indicate an analog television source) (e.g., as determined by the AMOL processor 512) with: detecting the presence of an audio signal corresponding to the A/V content presentation (e.g., corresponding to detection of a "no audio mute" condition, as determined by, for example, the volume and mute detector 320 in FIG. 3), and a video macroblocking result of zero (e.g., as determined by the macroblock detector 424) indicative of perfect color blending. As described above, perfect color blending is representative of a video game presentation and, thus, can be used to detect a video game source.
FIG. 16E also lists decision metrics that may be used to detect particular operating modes of an A/V content source corresponding to a blank frame state or an audio mute state. The blank frame metric is based on detecting the presence of a blank video frame (e.g., by the scene change and blank frame detector 420 in FIG. 4). The audio mute metric is based on detecting the absence of an audio signal corresponding to the A/V content presentation (e.g., as determined by the volume and mute detector 320). The audio mute metric also checks whether closed caption or teletext data is present (e.g., as determined by the closed caption processor 516 and the teletext processor 520 of FIG. 5, respectively) to verify that the A/V content presentation corresponds only to an audio mute state and not to one of the other special operating modes described below.
FIG. 16F lists decision metrics that may be used to detect additional special operating modes corresponding to menu display and pause states. The menu display metric is based on detecting a paused video display (e.g., by the scene change and blank frame detector 420) and matching the A/V content presentation to a template, ringtone, and/or text corresponding to a menu display (e.g., as determined by the template matcher 428 of FIG. 4, the ringtone detector 328 of FIG. 3, and the text detector 412 of FIG. 4, respectively). Optionally, the menu display metric may also check for the absence of an audio signal corresponding to the A/V content presentation (e.g., as determined by the volume and mute detector 320) and/or the absence of closed caption or teletext data (e.g., as determined by the closed caption processor 516 and the teletext processor 520, respectively) to further verify that the current display does not correspond to a normal A/V content presentation. The pause metric is based on detecting the absence of an audio signal corresponding to the A/V content presentation (e.g., as determined by the volume and mute detector 320), detecting the absence of closed caption or teletext data (e.g., as determined by the closed caption processor 516 and the teletext processor 520, respectively), detecting a paused video display (e.g., by the scene change and blank frame detector 420), and detecting the absence of a template and/or text match corresponding to a menu display (e.g., as determined by the template matcher 428 and the text detector 412, respectively).
Finally, FIG. 16F also lists a metric that can be used to determine whether the A/V content source is operating in some other trick mode (e.g., a rewind state, fast-forward state, etc.). The trick mode metric is based on detecting the absence of an audio signal corresponding to the A/V content presentation (e.g., as determined by the volume and mute detector 320), the absence of closed caption or teletext data (e.g., as determined by the closed caption processor 516 and the teletext processor 520, respectively), and the absence of a paused video display or blank frames (e.g., as determined by the scene change and blank frame detector 420). The absence of an audio signal and of closed caption or teletext data indicates that the active display does not correspond to a normal A/V content presentation. However, because the video display corresponds to neither a paused state (indicative of a paused frame or menu display) nor a blank frame, the active display is deemed to correspond to some other trick mode operation of the A/V content source.
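The special-operating-mode metrics of FIGS. 16E and 16F can be summarized as an ordered rule, sketched below with Boolean detector outputs. The evaluation order and the labels are illustrative assumptions; the disclosure does not fix a precedence among the metrics:

```python
def detect_special_mode(audio_present, caption_present, video_paused,
                        blank_frame, menu_match):
    if blank_frame:
        return "blank frame"                 # FIG. 16E blank frame metric
    if video_paused and menu_match:
        return "menu display"                # paused video matching a menu
    if audio_present:
        return None                          # normal A/V presentation
    if caption_present:
        return "audio mute"                  # video still playing normally
    if video_paused:
        return "pause"                       # paused, no menu match
    return "trick mode"                      # e.g., rewind or fast-forward
```

Each Boolean stands in for the detector named in the text, e.g., `menu_match` for the combined template/ringtone/text match.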
A flowchart representative of example machine readable instructions executable to implement at least portions of the measurement engine metric evaluator 652 of FIG. 6, the audio engines 300 of FIG. 3, and the video engines 400 of FIG. 4 is shown in FIGS. 7A-7D through 14. In these examples, the machine-readable instructions represented by each flowchart may comprise one or more programs for execution by: (a) a processor, such as the processor 1512 shown in the example computer 1500 discussed below with respect to FIG. 15; (b) a controller; and/or (c) any other suitable device. The one or more programs may be embodied in software stored on a tangible medium such as flash memory, a CD-ROM, a floppy disk, a hard disk, a DVD, or memory associated with the processor 1512, but persons of ordinary skill in the art will readily appreciate that the entire program or programs and/or portions thereof may alternatively be executed by a device other than the processor 1512 and/or may be embodied in firmware or dedicated hardware in a well-known manner (e.g., implemented by an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field Programmable Logic Device (FPLD), discrete logic, etc.). For example, any or all of the measurement engine metric evaluator 652, the audio engines 300, and/or the video engines 400 (as well as the metadata engines 500 of FIG. 5) may be implemented by any combination of software, hardware, and/or firmware. Additionally, some or all of the machine-readable instructions represented by the flowcharts of FIGS. 7A-7D through 14 may be implemented manually. Furthermore, although the example machine-readable instructions are described with reference to the flowcharts illustrated in FIGS. 7A-7D through 14, persons of ordinary skill in the art will readily appreciate that many other techniques for implementing the example methods and apparatus described herein may alternatively be employed.
For example, with reference to the flowcharts illustrated in FIGS. 7A-7D through 14, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, combined, and/or subdivided into multiple blocks.
Example machine-readable instructions 700 executable to implement the measurement engine metric evaluator 652 of FIG. 6 are illustrated in FIGS. 7A-7D. Although the example machine-readable instructions 700 are based on the decision metrics shown in FIGS. 16A-16F and assume a monitored television that conforms to the NTSC standard, the machine-readable instructions can be readily modified to support any type of display/information presentation device. The example machine-readable instructions 700 may be executed at predetermined intervals, based on the occurrence of predetermined events, etc., or any combination thereof. The machine-readable instructions 700 begin execution at block 701 of FIG. 7A, at which the measurement engine metric evaluator 652 samples the available audio, video, and metadata metrics/results obtained, for example, from the audio engines 300, the video engines 400, and the metadata engines 500. Control then passes to sub-process 702 (discussed in more detail below with respect to FIG. 7B), at which the measurement engine metric evaluator 652 determines the A/V content source providing the monitored A/V content presentation. After sub-process 702 is complete, control passes to sub-process 703 (discussed in more detail below with respect to FIG. 7C), at which the measurement engine metric evaluator 652 determines content identification information (e.g., tuning data) corresponding to the monitored A/V content presentation provided by the A/V content source identified by sub-process 702. Next, after sub-process 703 is complete, control passes to sub-process 704 (discussed in more detail below with respect to FIG. 7D), at which the measurement engine metric evaluator 652 detects any special operating modes of the A/V content source identified by sub-process 702.
Finally, after sub-process 704 is complete, control proceeds to block 705, at which the measurement engine metric evaluator 652 reports the identified A/V content source, the content identification information (e.g., tuning data), and/or any special operating modes of the A/V content source to, for example, a central facility for generating ratings statistics via the outputs 656 and 660. The example process 700 then ends.
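The top-level flow of the instructions 700 can be summarized as a short sketch. The following Python is purely illustrative: the metric names and stub sub-process functions are hypothetical stand-ins for the detectors described in this disclosure, not the meter's actual implementation.

```python
def identify_source(metrics):
    # Sub-process 702 (stub): classify the A/V content source.
    if metrics["video"].get("amol_line20"):
        return "analog TV broadcast"
    return "undetermined"

def identify_content(metrics):
    # Sub-process 703 (stub): derive content identification (tuning data).
    return metrics["metadata"].get("program", "unknown")

def detect_special_modes(metrics):
    # Sub-process 704 (stub): detect blank frame, audio mute, etc.
    flags = ("blank_frame", "audio_mute")
    return [f for f in flags
            if metrics["audio"].get(f) or metrics["video"].get(f)]

def run_measurement_cycle(metrics):
    # Blocks 701-705: sample metrics, classify, and report the results.
    return {
        "source": identify_source(metrics),
        "content": identify_content(metrics),
        "modes": detect_special_modes(metrics),
    }
```

In a real meter the `metrics` dictionary would be filled autonomously by the audio engine 300, video engine 400, and metadata engine 500, and the result would be forwarded to a central facility.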
An example sub-process 702 for determining which A/V content source is providing the monitored A/V content presentation is shown in fig. 7B, and the example sub-process 702 is based on the example decision metrics listed in figs. 16A through 16F. The example process 702 begins at decision node 706, at which the measurement engine metric evaluator 652 determines whether the video metric sampled at block 701 of fig. 7A indicates that AMOL information is present in line 20 of the NTSC television signal as processed by, for example, the AMOL processor 512. If AMOL information is present in line 20 (decision node 706), control proceeds to decision node 707, at which the measurement engine metric evaluator 652 detects whether the A/V content is being provided with a time shift, for example, based on comparing a broadcast timestamp included in the AMOL information to the current processing time. The current processing time may be determined, for example, based on a real-time clock function performed by the measurement engine metric evaluator 652, the multi-engine meter 200, or the like, or a real-time clock device connected to the measurement engine metric evaluator 652, the multi-engine meter 200, or the like. If a time shift is not detected (decision node 707), then, in accordance with the first analog TV live metric in fig. 16A, control proceeds to block 708 and, based on the AMOL information present in line 20, the measurement engine metric evaluator 652 determines that the A/V content source is an analog TV broadcast (e.g., terrestrial, cable, etc.). The example process 702 then ends.
However, if a time shift is detected (decision node 707), control proceeds to decision node 710, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that the monitored audio presentation conforms to the spectral shaping associated with a broadcast analog cable television system. Such a metric may be provided, for example, by the spectral shape processor 332. If the measurement engine metric evaluator 652 determines that cable spectral shaping is present (decision node 710), then, in accordance with the first analog VOD metric of fig. 16A, control proceeds to block 712 and, based on the AMOL information present in line 20, the analog cable spectral shaping, and the detected time shift, the measurement engine metric evaluator 652 determines that the A/V content source is an analog video on demand (VOD) presentation. The example process 702 then ends. If, however, cable spectral shaping is not detected (decision node 710), then, in accordance with the first VCR playback metric of fig. 16B, control proceeds to block 714 and, based on the AMOL information present in line 20, the detected time shift, and the absence of cable spectral shaping, the measurement engine metric evaluator 652 determines that the A/V content source is VCR playback. The example process 702 then ends.
Returning to decision node 706, if, however, AMOL information is not present in line 20, control proceeds to decision node 718, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that an audio mute state has been detected, for example, by the volume and mute detector 320. If an audio mute state is not detected (decision node 718), and thus an audio signal corresponding to the monitored A/V content is present, control passes to decision node 722, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that the monitored audio presentation conforms to the spectral shaping associated with a broadcast analog cable television system. If cable spectral shaping is present (decision node 722), control proceeds to decision node 724, at which the measurement engine metric evaluator 652 detects whether the A/V content is being provided with a time shift. The measurement engine metric evaluator 652 may determine whether a time shift exists, for example, based on comparing a broadcast timestamp included with an audio code embedded in the audio signal to the current processing time. If no time shift is detected (decision node 724), then, in accordance with the second analog TV live metric in fig. 16A, control proceeds to block 726 and, based on the presence of an audio signal with cable spectral shaping, the measurement engine metric evaluator 652 determines that the A/V content source is an analog TV broadcast. The example process 702 then ends. If, however, a time shift is detected (decision node 724), then, in accordance with the second analog VOD metric in fig. 16A, control proceeds to block 728 and, based on the presence of audio codes, analog cable spectral shaping, and the detected time shift, the measurement engine metric evaluator 652 determines that the A/V content source is an analog VOD presentation. The example process 702 then ends.
Returning to decision node 722, if, however, analog cable spectral shaping is not present, control passes to decision node 730, at which the measurement engine metric evaluator 652 determines, for example, via the macroblock detector 424, whether the video metric indicates that macroblocks are detected. If macroblocks are detected (decision node 730), control proceeds to decision node 732, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that the audio signal has been compressed using AC3 compression, for example, as detected by the compression detector 324. If AC3 compression is detected (decision node 732), then, in accordance with the DVD playback metric of fig. 16B, control proceeds to block 734 and, based on the absence of analog cable spectral shaping and the presence of macroblocks and AC3 compression, the measurement engine metric evaluator 652 determines that the A/V content source is DVD playback. The example process 702 then ends.
However, if no AC3 compression is detected (decision node 732), the measurement engine metric evaluator 652 determines that there is insufficient information to determine the A/V content source directly from the audio, video, and metadata metrics sampled at block 701 of fig. 7A. Accordingly, control proceeds to block 736, at which the measurement engine metric evaluator 652 employs previously stored heuristic information to determine the A/V content source. Employing the stored heuristics to determine the A/V content source is discussed in more detail below. Upon completion of the processing at block 736, the example sub-process 702 ends.
However, if no macroblocks are detected (decision node 730), control proceeds to decision node 737, at which the measurement engine metric evaluator 652 determines whether the video metric indicates, for example, that the macroblock index output by the macroblock detector 424 is equal to zero (a value of zero indicates a perfect color match). If the macroblock index is not equal to zero, control proceeds to decision node 738, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that AC3 compression is detected. If no AC3 compression is detected (decision node 738), control proceeds to decision node 740, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates that the audio signal has undergone MPEG audio compression, for example, as detected by the compression detector 324. If MPEG audio compression is detected (decision node 740), control proceeds to decision node 742, at which the measurement engine metric evaluator 652 detects whether a time shift is present, for example, by comparing timestamp information included in the MPEG audio compressed data to the current processing time. If no time shift is detected (decision node 742), then, in accordance with the MPEG DVR live metric in fig. 16D, control proceeds to block 744 and, based on the presence of MPEG audio compression, the absence of macroblocks, and the absence of a time shift, the measurement engine metric evaluator 652 determines that the A/V content source is an MPEG-type DVR outputting a "live" broadcast program. If, however, a time shift is detected (decision node 742), then, in accordance with the MPEG DVR playback metric of fig. 16D, control proceeds to block 746 and, based on the absence of macroblocks, the presence of MPEG audio compression, and the detected time shift, the measurement engine metric evaluator 652 determines that the A/V content source is an MPEG-type DVR playing back previously recorded A/V content. However, if MPEG audio compression is not detected (decision node 740), then, in accordance with the second VCR playback metric of fig. 16B, control proceeds to block 748 and, based on the absence of macroblocks, audio compression, and AMOL information, the measurement engine metric evaluator 652 determines that the A/V content source is a VCR playing back previously recorded A/V content. After the processing at any of blocks 744, 746, or 748 is complete, the example sub-process 702 ends.
Returning to decision node 738, if, however, AC3 compression is detected, control proceeds to decision node 750, at which the measurement engine metric evaluator 652 detects whether a time shift exists, for example, by comparing timestamp information included in the AC3 audio compressed data to the current processing time. If a time shift is detected (decision node 750), then, in accordance with the digital TV playback metric of fig. 16C, control proceeds to block 752 and, based on the absence of macroblocks, the presence of AC3 audio compression, and the detected time shift, the measurement engine metric evaluator 652 determines that the A/V content source is, for example, a cable TV DVR outputting previously recorded A/V content. If, however, no time shift is detected (decision node 750), then, in accordance with the digital TV live metric in fig. 16C, control proceeds to block 754 and, based on the presence of AC3 audio compression, the absence of macroblocks, and the absence of a time shift, the measurement engine metric evaluator 652 determines that the A/V content source is a digital cable broadcast outputting "live" A/V content (possibly through an associated DVR). After the processing at block 752 or 754 is complete, the example sub-process 702 ends.
If, however, at decision node 737 the measurement engine metric evaluator 652 determines that the macroblock index output by the macroblock detector 424 is equal to zero, control passes to block 756. At block 756 and in accordance with the video game decision metric of fig. 16E, the measurement engine metric evaluator 652 determines that the A/V content source is a video game based on the perfect color match indicated by a macroblock index equal to zero. The example sub-process 702 then ends.
Returning to decision node 718, if the measurement engine metric evaluator 652 determines that an audio mute state has been detected, for example, by the volume and mute detector 320, the measurement engine metric evaluator 652 may determine that there is insufficient information to determine the A/V content source due to the lack of audio or AMOL information in the audio, video, and metadata metrics sampled at block 701 of fig. 7A. Accordingly, control proceeds to block 760, at which the measurement engine metric evaluator 652 employs previously stored heuristic information to determine the A/V content source. Employing the stored heuristics to determine the A/V content source is discussed in more detail below. After the processing at block 760 is complete, the example sub-process 702 ends.
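The branches of sub-process 702 described above can be condensed into a single decision function. The following sketch is illustrative only; the metric flag names are hypothetical stand-ins for the outputs of the detectors described above (the AMOL processor 512, compression detector 324, macroblock detector 424, spectral shape processor 332, etc.).

```python
def identify_av_source(m):
    """Condensed sketch of the decision tree of fig. 7B; `m` is a dict
    of boolean metric flags. Flag names are illustrative."""
    if m.get("amol_line20"):                       # decision node 706
        if not m.get("time_shift"):
            return "analog TV broadcast"           # block 708
        if m.get("cable_spectral_shaping"):
            return "analog VOD"                    # block 712
        return "VCR playback"                      # block 714
    if m.get("audio_mute"):                        # decision node 718
        return "heuristics"                        # block 760
    if m.get("cable_spectral_shaping"):            # decision node 722
        # blocks 728 / 726
        return "analog VOD" if m.get("time_shift") else "analog TV broadcast"
    if m.get("macroblocks"):                       # decision node 730
        if m.get("ac3"):
            return "DVD playback"                  # block 734
        return "heuristics"                        # block 736
    if m.get("macroblock_index_zero"):             # decision node 737
        return "video game"                        # block 756
    if m.get("ac3"):                               # decision node 738
        # blocks 752 / 754
        return ("digital cable DVR playback" if m.get("time_shift")
                else "digital cable live")
    if m.get("mpeg_audio"):                        # decision node 740
        # blocks 746 / 744
        return "MPEG DVR playback" if m.get("time_shift") else "MPEG DVR live"
    return "VCR playback"                          # block 748
```

Encoding the tree this way makes each leaf traceable to a decision metric in figs. 16A-16F.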
An example sub-process 703 for determining content identification information (e.g., tuning data) corresponding to the presentation of content provided by the A/V content source identified, for example, by sub-process 702 of fig. 7B is shown in fig. 7C. The content identification information may include, for example, the content/program name, broadcast time, broadcast station ID/channel number, and the like. The example process 703 begins at decision node 762, at which, for example, the measurement engine metric evaluator 652 of fig. 6 determines whether the video metric sampled at block 701 of fig. 7A indicates that AMOL information is present in line 20 of the NTSC television signal as processed by, for example, the AMOL processor 512. If AMOL information is present in line 20 (decision node 762), control proceeds to block 764, at which the measurement engine metric evaluator 652 determines content identification information from the detected line 20 AMOL information based on any suitable technique (e.g., those described above with respect to the AMOL processor 512). The example process 703 then ends.
However, if AMOL information is not present in row 20 (decision node 762), control proceeds to decision node 766 where the measurement engine metric evaluator 652 determines whether the video metric indicates that AMOL information is present in row 22. If there is AMOL information in row 22 (decision node 766), control proceeds to block 768, at block 768, the measurement engine metric evaluator 652 determines content identification information from the detected AMOL information in row 22 based on any suitable technique (e.g., those described above with respect to AMOL processor 512). The example process 703 then ends.
However, if AMOL information is not present in line 22 (decision node 766), control proceeds to decision node 770, at which the measurement engine metric evaluator 652 determines whether the audio metric indicates the presence of an audio code, for example, as processed by the audio code detector 312 of fig. 3. If audio codes are present (decision node 770), control proceeds to block 772, at which the measurement engine metric evaluator 652 determines program identification information from the available audio codes based on any suitable technique (e.g., those described above with respect to the audio code detector 312). The example process 703 then ends.
However, if no audio code is present (decision node 770), control passes to block 774, at which the measurement engine metric evaluator 652 may determine program identification information by, for example, comparing the audio features corresponding to the monitored A/V content presentation and generated by the audio feature processor 316 of fig. 3 to a set of known reference features. Additionally or alternatively, the measurement engine metric evaluator 652 may output the audio features corresponding to the monitored A/V content for comparison to a set of known reference features at, for example, a central processing facility. Any known technique for generating and comparing features (e.g., those described above with respect to the audio feature processor 316) may be employed at block 774 to determine the desired content identification information. In any case, after the processing at block 774 is complete, the example sub-process 703 ends.
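The fallback chain of sub-process 703 (AMOL line 20, then AMOL line 22, then embedded audio codes, then audio-feature matching) can be sketched as follows. The field names and return values are illustrative assumptions, not part of this disclosure.

```python
def identify_content(metrics, reference_features=None):
    """Sketch of fig. 7C: try each identification source in priority
    order. `metrics` maps illustrative field names to decoded values."""
    if metrics.get("amol_line20"):
        return ("amol_line20", metrics["amol_line20"])    # block 764
    if metrics.get("amol_line22"):
        return ("amol_line22", metrics["amol_line22"])    # block 768
    if metrics.get("audio_code"):
        return ("audio_code", metrics["audio_code"])      # block 772
    # Block 774: fall back to matching audio features against known
    # references, either locally or at a central processing facility.
    feature = metrics.get("audio_feature")
    if reference_features and feature in reference_features:
        return ("audio_feature", reference_features[feature])
    return ("deferred", feature)   # ship the feature upstream for matching
```

A `"deferred"` result models the alternative described above, in which the feature is sent to a central facility for comparison.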
An example sub-process 704 is shown in fig. 7D, the example sub-process 704 being used to detect any special operating modes of the A/V content source identified, for example, by sub-process 702 of fig. 7B, and being based on the decision metrics listed in figs. 16A through 16F. The special operating modes detected by the sub-process 704 include a blank frame mode, an audio mute mode, a pause mode, a menu display mode, a device off mode, and a catch-all trick mode indication. The catch-all trick mode indication is used to signify that the identified A/V content source may be operating in any number of special trick modes including, for example, a rewind mode, a fast forward mode, etc. The example process 704 begins at decision node 776, at which the measurement engine metric evaluator 652 determines whether the video metric sampled at block 701 of fig. 7A indicates that the monitored A/V content presentation corresponds to a blank frame, as detected by, for example, the scene change and blank frame detector 420 of fig. 4. If a blank frame is not detected (decision node 776), control passes to decision node 778, at which the measurement engine metric evaluator 652 determines whether an audio mute state has been detected, for example, by the volume and mute detector 320.
If an audio mute state is detected (decision node 778), control proceeds to decision node 780, at which the measurement engine metric evaluator 652 determines whether the metadata metric indicates the presence of closed captioning or teletext information, as processed, for example, by the closed caption processor 516 or the teletext processor 520 of fig. 5, respectively. If closed captioning or teletext information is not present (decision node 780), control proceeds to decision node 782, at which the measurement engine metric evaluator 652 determines whether the video metric indicates that a pause state has been detected, for example, by the scene change and blank frame detector 420. If a pause state is not detected (decision node 782), then, in accordance with the trick mode metric in fig. 16F, control proceeds to block 784 and, based on the absence of audio, closed caption information, and a pause state, the measurement engine metric evaluator 652 determines that the previously identified A/V content source is operating in a trick mode (because the absence of audio coupled with video that is not paused suggests a rapid change in the A/V content presentation). The example sub-process 704 then ends.
However, if a pause state is detected (decision node 782), control proceeds to decision node 786, at which the measurement engine metric evaluator 652 determines whether the video metric indicates that the paused video frame matches a known template (e.g., as determined by the template matcher 428) or contains predetermined text (e.g., as determined by the text detector 412 of fig. 4). If no template or text match is detected (decision node 786), then, in accordance with the pause metric of fig. 16F, control proceeds to block 788 and, based on the presence of a pause state and the absence of a template or text match, the measurement engine metric evaluator 652 determines that the previously identified A/V content source has entered a pause mode of operation. However, if a template or text match is detected (decision node 786), then, in accordance with the menu display decision metric of fig. 16F, control proceeds to block 790 and the measurement engine metric evaluator 652 determines that the corresponding A/V content source is displaying a menu corresponding to the matched reference template or predetermined text. After the processing at block 788 or 790 is complete, the example sub-process 704 ends.
Returning to decision node 780, if, however, closed captioning or teletext information is present, then, in accordance with the audio mute decision metric of fig. 16E, control proceeds to block 792 and, based on the presence of closed caption information and the audio mute state, the measurement engine metric evaluator 652 determines that the previously identified A/V content source has entered an audio mute mode of operation. The example sub-process 704 then ends. However, if at decision node 778 the measurement engine metric evaluator 652 determines that the audio is not muted, and thus that an audio signal corresponding to the monitored A/V content presentation is present, then, because no blank frame was detected at decision node 776, control may proceed to block 794, at which the measurement engine metric evaluator 652 may determine that the previously identified A/V content source is operating in a normal presentation mode. The example sub-process 704 then ends.
Returning to decision node 776, if a blank frame is detected, for example, by the scene change and blank frame detector 420, control passes to decision node 796, at which the measurement engine metric evaluator 652 determines whether an audio mute state has been detected, for example, by the volume and mute detector 320. If an audio mute state is not detected (decision node 796), then, in accordance with the blank frame decision metric of fig. 16E, control proceeds to block 798 and the measurement engine metric evaluator 652 determines that the previously identified A/V content source is displaying a blank frame. However, if an audio mute state is detected (decision node 796), then, based on the absence of both audio and video, control may proceed to block 799, at which the measurement engine metric evaluator 652 may determine whether a presentation transition (e.g., corresponding to a transition between a program and a commercial) has occurred. If the audio mute and blank frame states persist for a sufficiently long duration, the measurement engine metric evaluator 652 may determine at block 799 that the previously identified A/V content source has been placed in an off state. In any case, after the processing at block 798 or 799 is complete, the example sub-process 704 ends.
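The mode decisions of sub-process 704 can be summarized in one function. The flag names below are hypothetical stand-ins for the detectors described above (the scene change and blank frame detector 420, volume and mute detector 320, closed caption processor 516, template matcher 428, and text detector 412).

```python
def detect_special_mode(m):
    """Sketch of fig. 7D; `m` is a dict of boolean metric flags.
    Flag names are illustrative."""
    if m.get("blank_frame"):                       # decision node 776
        if not m.get("audio_mute"):                # decision node 796
            return "blank frame display"           # block 798
        return "transition or device off"          # block 799
    if not m.get("audio_mute"):                    # decision node 778
        return "normal presentation"               # block 794
    if m.get("closed_captioning"):                 # decision node 780
        return "audio mute mode"                   # block 792
    if not m.get("pause"):                         # decision node 782
        return "trick mode"                        # block 784 (catch-all)
    if m.get("template_or_text_match"):            # decision node 786
        return "menu display"                      # block 790
    return "pause mode"                            # block 788
```

The "transition or device off" leaf models block 799, where the final determination depends on how long the mute and blank-frame states persist.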
Additionally, although not shown in figs. 7A-7D, a multi-engine meter employing the example process 700 or any similar process may employ other detected information to confirm the A/V content source and/or the associated content identification information. For example, the multi-engine meter 200 of fig. 2 includes a remote control detector 252 for detecting and processing signals received from a remote control device. The received remote control signals may be decoded and processed to determine, for example, which of a set of possible A/V content sources the user is controlling, the operational status of such A/V content sources, and so on.
In certain instances, such as when sufficient metric information is not available, the example machine-readable instructions 700 may employ stored heuristics to determine the A/V content source, content identification information, and so on. For example, a multi-engine meter executing the machine-readable instructions 700 or a similar process may store statistics, content identification information, and the like regarding previous A/V content source selections. The information may be classified, for example, according to time of day, order of selection, etc. Then, as shown in fig. 7B, in certain instances the machine-readable instructions 700 may employ a set of heuristics to determine the A/V content source based on the stored statistical information.
Additionally, as described above, the audio metrics, video metrics, and metadata metrics may be updated autonomously, such that a particular metric or set of metrics may not be available when the machine-readable instructions 700 read the metrics at block 701. Thus, the machine-readable instructions 700 may employ one or more timeout timers to reset one or more of the audio, video, or metadata metrics to a known state. This mechanism prevents metric information from becoming stale if it is not updated within a desired/reasonable amount of time.
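One possible form of such a timeout mechanism is sketched below; the timeout value, default state, and class interface are illustrative assumptions, not the meter's actual design.

```python
import time

class MetricSlot:
    """Holds one autonomously updated metric and resets it to a known
    default state if it has not been refreshed within `timeout_s`."""

    def __init__(self, timeout_s=5.0, default=None):
        self.timeout_s = timeout_s
        self.default = default
        self.value = default
        self.updated_at = None

    def update(self, value, now=None):
        # Called by the producing engine when a fresh result arrives.
        self.value = value
        self.updated_at = time.monotonic() if now is None else now

    def read(self, now=None):
        # Called at block 701; stale values revert to the default.
        now = time.monotonic() if now is None else now
        if self.updated_at is None or now - self.updated_at > self.timeout_s:
            self.value = self.default
        return self.value
```

Passing `now` explicitly makes the staleness rule testable; in normal use the monotonic clock is consulted automatically.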
Example machine-readable instructions 800 that may be executed to implement the volume and mute detector 320 of fig. 3 are shown in fig. 8. The machine-readable instructions 800 begin execution at block 804, at which the volume and mute detector 320 reads samples from an audio buffer, such as the audio buffer 216 of fig. 2. For example, the volume and mute detector 320 may read a set of 512 audio samples from the audio buffer 216. Additionally, the machine-readable instructions 800 may be scheduled to execute each time a new set of 512 audio samples is stored in the audio buffer 216. After reading the audio samples, the volume and mute detector 320 counts the number of zero crossings that occur in the set of samples read from the audio buffer (block 808). As is known, a zero crossing occurs when the transition from one sample to the next passes through zero. In the case of an audio mute state, the audio samples typically correspond to quantization noise and therefore tend to fluctuate around zero. Accordingly, the volume and mute detector 320 determines whether the number of zero crossings exceeds a predetermined threshold indicative of the fluctuations characteristic of an audio mute state (block 812). If the number of zero crossings exceeds the threshold (block 812), the volume and mute detector 320 reports that the monitored audio signal corresponds to an audio mute state (block 816). The example process 800 then ends.
However, if the number of zero crossings does not exceed the threshold (block 812), the volume and mute detector 320 determines the energy of the audio samples (block 820). The volume and mute detector 320 then compares the audio energy to a predetermined threshold indicative of an audio mute state (block 824). If the audio energy is less than the threshold (block 824), the volume and mute detector 320 reports an audio mute state (block 816) and the example process 800 ends. However, if the audio energy is not less than the threshold (block 824), the volume and mute detector 320 reports the volume level of the audio samples (block 828), for example, based on quantizing the audio energy to correspond to one of a set of predetermined volume levels. The example process 800 then ends.
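The two-stage mute test of blocks 808-828 can be sketched numerically as follows; the zero-crossing and energy thresholds and the volume quantization are illustrative values, not those used by the meter.

```python
def classify_audio(samples, zc_threshold=200, energy_threshold=1e-4):
    """Sketch of process 800: a high zero-crossing count (quantization
    noise fluctuating around zero) or very low energy indicates mute;
    otherwise report a quantized volume level. Thresholds are illustrative."""
    # Block 808: count sign changes between consecutive samples.
    zero_crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    if zero_crossings > zc_threshold:          # block 812
        return "mute"                          # block 816
    # Block 820: mean energy of the sample block.
    energy = sum(s * s for s in samples) / len(samples)
    if energy < energy_threshold:              # block 824
        return "mute"                          # block 816
    # Block 828: crude quantization of energy into a volume level.
    return f"volume level {min(int(energy * 10), 9)}"
```

A pure tone produces few zero crossings and substantial energy, so it is reported as a volume level; low-amplitude alternating noise trips the zero-crossing test.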
Example machine-readable instructions 900 that may be executed to implement the compression detector 324 of fig. 3 are shown in fig. 9. The machine-readable instructions 900 begin execution at block 904, at which the compression detector 324 reads samples from an audio buffer, such as the audio buffer 216 of fig. 2. For example, the compression detector 324 may read a set of 256 audio samples from the audio buffer 216 generated by sampling the audio input signal 204 at a sampling rate of 48 kHz as described above. Additionally, the machine-readable instructions 900 may be scheduled to execute each time a new set of 256 audio samples is stored in the audio buffer 216. After the audio samples are read, the compression detector 324 computes a modified discrete cosine transform (MDCT) of the audio samples and may quantize the coefficients to correspond, for example, to the quantization used in AC3 audio compression (block 908). For example, the compression detector 324 may calculate a length-256 MDCT corresponding to 256 MDCT coefficients by processing 512 audio samples with a 256-sample overlap (e.g., corresponding to 256 "old" samples read in a previous execution of the process 900 and 256 "new" samples read from the audio buffer 216 in the current execution of the process 900). Then, over a one-second window of audio samples, the compression detector 324 determines the number of MDCT coefficients that are substantially zero-valued at frequencies above a predetermined threshold frequency (block 912). The predetermined threshold frequency corresponds to the audio pass band associated with AC3 audio compression. Thus, if the audio samples correspond to an audio signal that has been AC3 compressed, the MDCT coefficients corresponding to frequencies above the pass band threshold will be substantially equal to zero. In the example described herein, the predetermined threshold frequency approximately corresponds to MDCT coefficient bin 220.
Accordingly, the compression detector 324 determines whether the number of zero MDCT coefficients in the example frequency region corresponding to MDCT coefficient bins 220 through 256 is less than 4000 (block 916). If the number of zero MDCT coefficients is less than 4000 (block 916), the audio signal is not compressed and the compression detector 324 reports that the monitored A/V content corresponds to a broadcast analog transmission or VCR playback (block 920). The example process 900 then ends.
However, if the number of zero MDCT coefficients is not less than 4000 (block 916), the compression detector 324 determines whether the number of zero MDCT coefficients in the considered frequency region corresponding to MDCT coefficient bins 220 through 256 is greater than 6000 (block 924). If the number of zero MDCT coefficients exceeds 6000 (block 924), the compression detector 324 determines that the audio signal is substantially equal to zero at these frequencies and, thus, that the audio signal has been AC3 compressed (block 928). The example process 900 then ends. However, if the number of zero MDCT coefficients does not exceed 6000 (block 924), the compression detector 324 compares the MDCT coefficients to a stored template corresponding to the frequency response of the sub-band filter used in MPEG audio compression (block 932). If the MDCT coefficients match the template (block 936), the compression detector 324 reports that the audio signal has undergone MPEG audio compression (block 940). However, if the MDCT coefficients do not match the template (block 936), the compression detector 324 reports that the audio signal has been AC3 compressed (block 928). The example process 900 then ends.
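Assuming MDCT coefficient frames are already available, the counting and thresholding of blocks 912-940 can be sketched as follows. The 4000/6000 counts come from the text above; the near-zero epsilon and the template-check callback are illustrative assumptions.

```python
def classify_compression(mdct_frames, matches_mpeg_template, eps=1e-6):
    """Sketch of blocks 912-940: over a one-second window of MDCT
    frames (each a list of 256 coefficients), count the near-zero
    coefficients in the high-frequency region (bins 220-255) and
    classify. `matches_mpeg_template` stands in for the sub-band
    filter template comparison of block 932."""
    zero_count = sum(
        1 for frame in mdct_frames
        for coeff in frame[220:256]
        if abs(coeff) < eps
    )
    if zero_count < 4000:                          # block 916
        return "uncompressed (analog broadcast or VCR)"   # block 920
    if zero_count > 6000:                          # block 924
        return "AC3"                               # block 928
    # Ambiguous region: consult the MPEG sub-band filter template.
    return "MPEG audio" if matches_mpeg_template else "AC3"  # 940 / 928
```

At 48 kHz with 256-sample frames, roughly 187 frames make up one second, so the 4000/6000 thresholds amount to requiring most frames to be empty above the AC3 pass band.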
Example machine-readable instructions 1000 that may be executed to implement the ringtone detector 328 of fig. 3 are shown in fig. 10. The machine-readable instructions 1000 begin execution at block 1004, at which the ringtone detector 328 reads samples from an audio buffer, such as the audio buffer 216 of fig. 2. For example, the ringtone detector 328 may read a set of 512 audio samples from the audio buffer 216. Additionally, the machine-readable instructions 1000 may be scheduled to execute each time a new set of 512 audio samples is stored in the audio buffer 216. After reading the audio samples, the ringtone detector 328 compares the audio samples to a set of stored reference templates corresponding to known audio ringtones for the various possible A/V content sources (block 1008). As described above, this comparison may be performed using, for example, any known technique for comparing audio features. If the audio samples match a template corresponding to a gaming machine ringtone (block 1012), the ringtone detector 328 reports that the A/V content source is a gaming machine (block 1016), and the example process 1000 ends. However, if the audio samples match a template corresponding to an STB ringtone (block 1020), the ringtone detector 328 reports that the A/V content source is an STB (block 1024) and the example process 1000 ends.
However, if the audio samples match a template corresponding to a DVD player ringtone (block 1028), the ringtone detector 328 reports that the A/V content source is a DVD player (block 1032), and the example process 1000 ends. However, if the audio samples match a template corresponding to a VCR ringtone (block 1036), the ringtone detector 328 reports that the A/V content source is a VCR (block 1040) and the example process 1000 ends. However, if the audio samples match a template corresponding to a PVR/DVR ringtone (block 1044), the ringtone detector 328 reports that the A/V content source is a PVR/DVR player (block 1048) and the example process 1000 ends. However, if the audio samples do not match any stored reference template, the ringtone detector 328 reports that the A/V content source is indeterminate (block 1052) and the example process 1000 ends.
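The template search of process 1000 can be sketched as follows; the normalized-correlation similarity measure and its threshold are illustrative assumptions (the disclosure permits any known audio-feature comparison technique), and the template dictionary stands in for the stored reference templates.

```python
def match_ringtone(samples, templates, threshold=0.9):
    """Sketch of blocks 1008-1052: compare the sample block against
    each stored reference template (game console, STB, DVD, VCR,
    PVR/DVR) and report the first sufficiently close match."""
    def similarity(a, b):
        # Normalized correlation in [-1, 1]; illustrative measure only.
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    for source, template in templates.items():
        if similarity(samples, template) >= threshold:
            return source                       # blocks 1016-1048
    return "indeterminate"                      # block 1052
```

A real detector would compare derived audio features rather than raw samples, but the control flow (ordered template checks ending in an indeterminate report) is the same.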
Example machine-readable instructions 1100 that may be executed to implement the spectral shape processor 332 of fig. 3 are shown in fig. 11. The machine-readable instructions 1100 begin execution at block 1104, at which the spectral shape processor 332 reads samples from an audio buffer, such as the audio buffer 216 of fig. 2. For example, the spectral shape processor 332 may read a set of 512 audio samples from the audio buffer 216. Additionally, the machine-readable instructions 1100 may be scheduled to execute each time a new set of 512 audio samples is stored in the audio buffer 216. After the audio samples are read, the process 1100 may follow one or both of the following paths. In the first processing path, the spectral shape processor 332 applies a notch filter with a center frequency of 15.75 kHz to the audio samples (block 1108). The spectral shape processor 332 then determines whether the output of the notch filter exceeds a predetermined threshold (block 1112). The predetermined threshold corresponds to the spectral leakage expected for an analog cable television system. If the notch filter output exceeds the threshold (block 1112), the spectral shape processor 332 reports that the A/V content source is an analog cable television broadcast (block 1116). However, if the notch filter output does not exceed the threshold (block 1112), the spectral shape processor 332 reports that the A/V content source is indeterminate (block 1120). The example process 1100 then ends.
In the second processing path, the spectral shape processor 332 computes a spectrum corresponding to the audio samples (e.g., based on a fast Fourier transform or FFT) (block 1124). The spectral shape processor 332 then compares the audio spectrum to a template corresponding to the frequency response expected of an analog cable television system (block 1128). If the audio spectrum matches the template (block 1132), the spectral shape processor 332 reports that the A/V content source is an analog cable television broadcast (block 1136). If the audio spectrum does not match the template (block 1132), the spectral shape processor 332 reports that the A/V content source is uncertain (block 1140). The example process 1100 then ends.
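The second processing path can be sketched as a spectrum-versus-template comparison. The Hann window, the dB-domain comparison, and the 6 dB tolerance below are hypothetical choices; the patent does not specify the match metric:

```python
import numpy as np

def matches_cable_spectrum(samples, template_db, tolerance_db=6.0):
    """Compare an audio block's magnitude spectrum against a stored
    reference spectral template for an analog cable system.

    `template_db` is a hypothetical per-bin reference spectrum, in dB,
    of length len(samples)//2 + 1. The block matches when the mean
    absolute deviation from the template stays within `tolerance_db`.
    """
    windowed = samples * np.hanning(len(samples))     # reduce spectral leakage
    spectrum_db = 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    deviation = np.mean(np.abs(spectrum_db - template_db))
    return bool(deviation <= tolerance_db)
```

Working in dB keeps the comparison sensitive to the overall spectral shape rather than to absolute signal level.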
Example machine-readable instructions 1200 that may be executed to implement the scene change and blank frame detector 420 of fig. 4 are shown in fig. 12. The machine-readable instructions 1200 begin execution at block 1204, at which the scene change and blank frame detector 420 reads samples from a video buffer, such as the video buffer 224 of fig. 2. For example, the video buffer 224 may store video samples corresponding to an input frame rate of 30 frames/second at a resolution of 640 × 480 pixels. This results in a buffer size of 640 × 480 × 3 bytes, where the factor of 3 corresponds to storing 3 colors (e.g., red, green, and blue) per pixel, each color being represented by 1 byte (8 bits). The machine-readable instructions 1200 may be scheduled to execute each time the video buffer 224 fills, which corresponds to processing each sampled video frame. After the video samples are read, the scene change and blank frame detector 420 computes histograms of the pixel luminance values corresponding to three regions in a first video frame (block 1208). One of ordinary skill in the art will appreciate that fewer or more than three regions may be employed, depending on, for example, the size of the regions and the frequency at which the process 1200 is performed. The scene change and blank frame detector 420 then computes histograms of the pixel luminance values corresponding to the same three regions in a second video frame (block 1212). The scene change and blank frame detector 420 then computes the distance between the histograms of the first and second frames (block 1216). This distance may be computed, for example, by taking the absolute difference between corresponding histogram bins of the two frames and then summing these absolute differences.
The scene change and blank frame detector 420 then compares the histogram distance to a predetermined threshold corresponding to the luminance change expected for a scene change (block 1220). If the histogram distance exceeds the threshold (block 1220), the scene change and blank frame detector 420 reports that a scene change has occurred (block 1224). In addition, the scene change and blank frame detector 420 may determine the number of scene changes occurring per unit time (block 1228). However, if the histogram distance does not exceed the threshold (block 1220), the scene change and blank frame detector 420 determines whether the histograms are dominated by a black luminance value (or range of values) (block 1232). If black is not dominant (block 1232), the scene change and blank frame detector 420 reports that the current video frame corresponds to a pause state (block 1236). However, if black is dominant (block 1232), the scene change and blank frame detector 420 reports that a blank frame is present (block 1240). In addition, the scene change and blank frame detector 420 may determine the number of blank frames occurring per unit time (block 1244). The number of blank frames per unit time may be used, for example, to determine whether the monitored video corresponds to a transition from broadcast content to an inserted advertisement. The example process 1200 then ends.
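The histogram test of blocks 1208 through 1240 can be sketched as follows. For brevity, one whole-frame histogram stands in for the three regional histograms, and the thresholds are illustrative values rather than figures from the patent:

```python
import numpy as np

def classify_frame_pair(prev_frame, curr_frame, scene_threshold=0.5,
                        black_level=16, black_fraction=0.9):
    """Classify a pair of grayscale frames (2-D uint8 arrays) as a scene
    change, a blank frame, or a paused/static frame.
    """
    h_prev, _ = np.histogram(prev_frame, bins=256, range=(0, 256))
    h_curr, _ = np.histogram(curr_frame, bins=256, range=(0, 256))
    n = prev_frame.size
    # Sum of absolute bin differences, normalized per pixel (range [0, 2]).
    distance = np.sum(np.abs(h_curr - h_prev)) / n
    if distance > scene_threshold:
        return "scene change"
    # Low distance means static content: decide blank vs. paused by how
    # much of the current frame sits in the near-black luminance bins.
    if np.sum(h_curr[:black_level]) >= black_fraction * n:
        return "blank frame"
    return "paused"
```

Comparing histograms rather than raw pixels makes the test tolerant of small motion within an otherwise unchanged scene.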
An example process 1300 that may be used to implement the macroblock detector 424 of fig. 4 is shown in fig. 13. The process 1300 begins at block 1304, at which the macroblock detector 424 reads samples from a video buffer, such as the video buffer 224 of fig. 2. For example, the video buffer 224 may store video samples corresponding to an input frame rate of 30 frames/second at a resolution of 640 × 480 pixels. This results in a buffer size of 640 × 480 × 3 bytes, where the factor of 3 corresponds to storing 3 colors (e.g., red, green, and blue) per pixel, each color being represented by 1 byte (8 bits). The process 1300 may be scheduled to process, for example, every 10th sampled video frame.
As described above, MPEG video compression introduces macroblocks into the video image. For example, a macroblock may be 16 pixels by 16 pixels in size. Adjacent macroblocks tend to have different average (DC) luminance values, a characteristic that can be exploited to detect the presence of macroblocking in a video image. To detect macroblocking, the macroblock detector 424 computes inter-pixel differences in the horizontal and/or vertical directions of the video image (block 1308). The macroblock detector 424 then computes the Power Spectral Density (PSD) of the computed inter-pixel differences (block 1312). Next, the macroblock detector 424 median filters the PSD (block 1316), computes the differences between the original PSD and the median-filtered PSD (block 1320), and sums these differences (block 1324). Median filtering is well known and may be used to smooth transitions in an image. For example, a 3 × 3 median filter replaces a given pixel with the median of the nine pixels neighboring and including that pixel. Thus, due to the different average values of adjacent macroblocks, a video image exhibiting macroblocking will yield a large sum of PSD differences compared to a video image that does not exhibit macroblocking.
The macroblock detector 424 then compares the sum of the PSD differences to a predetermined threshold set to detect macroblocking (block 1328). If the sum of the PSD differences exceeds the threshold (block 1328), the macroblock detector 424 detects macroblocking and reports that the monitored video signal has undergone video compression (block 1332). However, if the sum of the PSD differences does not exceed the threshold (block 1328), the macroblock detector 424 determines whether the sum is substantially equal to zero (block 1336). A substantially zero sum represents the perfect color match typically associated with video game content. Thus, if the sum of the PSD differences is substantially zero (block 1336), the macroblock detector 424 reports that the A/V content source corresponds to a gaming machine (block 1340). Otherwise, the macroblock detector 424 reports that the A/V content source is uncertain (block 1344). The example process 1300 then ends.
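The PSD-based test of blocks 1308 through 1344 can be sketched as follows, using horizontal differences only; the median-filter width, the normalization, and the thresholds are illustrative choices, not values from the patent:

```python
import numpy as np

def detect_macroblocking(frame, threshold=50.0, eps=1e-6):
    """Look for 16-pixel-period structure in a grayscale frame, the
    signature of MPEG macroblock artifacts.

    Computes horizontal inter-pixel differences, averages their power
    spectral density (PSD) across rows, median-filters the PSD, and
    sums the positive excess of the raw PSD over the smoothed one:
    block edges every 16 pixels create spikes the median filter removes.
    """
    f = frame.astype(float)
    diff = np.diff(f, axis=1)                      # horizontal inter-pixel differences
    psd = np.mean(np.abs(np.fft.rfft(diff, axis=1)) ** 2, axis=0)
    psd /= diff.shape[1]                           # per-sample normalization
    # Width-5 median filter estimates the smooth PSD baseline.
    pad = np.pad(psd, 2, mode="edge")
    smooth = np.array([np.median(pad[i:i + 5]) for i in range(len(psd))])
    excess = float(np.sum(np.maximum(psd - smooth, 0.0)))
    if excess > threshold:
        return "compressed video"                  # macroblocking detected
    if excess < eps:
        return "game console"                      # perfectly flat: synthetic image
    return None                                    # source uncertain
```

The median filter passes the broadband PSD floor but not narrow spikes, so the summed excess isolates exactly the periodic block-edge energy the patent describes.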
Example machine-readable instructions 1400 that may be executed to implement the template matcher 428 of fig. 4 are shown in fig. 14. The machine-readable instructions 1400 begin execution at block 1404, at which the template matcher 428 reads samples from a video buffer, such as the video buffer 224 of fig. 2. For example, the video buffer 224 may store video samples corresponding to an input frame rate of 30 frames/second at a resolution of 640 × 480 pixels. This results in a buffer size of 640 × 480 × 3 bytes, where the factor of 3 corresponds to storing 3 colors (e.g., red, green, and blue) per pixel, each color being represented by 1 byte (8 bits). The machine-readable instructions 1400 may be configured to process, for example, every 10th sampled video frame. After the video samples are read, the template matcher 428 compares the video sample to a set of stored reference templates corresponding to known video frames (e.g., menu frames) for the various possible A/V content sources (block 1408). If the video sample matches a template corresponding to a reference gaming machine video frame (block 1412), the template matcher 428 reports that the A/V content source is a gaming machine (block 1416), and the example process 1400 ends. However, if the video sample matches a template corresponding to a reference STB video frame (block 1420), the template matcher 428 reports that the A/V content source is an STB (block 1424), and the example process 1400 ends.
However, if the video sample matches a template corresponding to a reference DVD player video frame (block 1428), the template matcher 428 reports that the A/V content source is a DVD player (block 1432), and the example process 1400 ends. However, if the video sample matches a template corresponding to a reference VCR video frame (block 1436), the template matcher 428 reports that the A/V content source is a VCR (block 1440), and the example process 1400 ends. However, if the video sample matches a template corresponding to a reference PVR/DVR video frame (block 1444), the template matcher 428 reports that the A/V content source is a PVR/DVR (block 1448), and the example process 1400 ends. Finally, if the video sample does not match any of the stored reference templates, the template matcher 428 reports that the A/V content source is uncertain (block 1452), and the example process 1400 ends.
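The frame-template comparison of blocks 1408 through 1452 can be sketched as a nearest-template search. The mean-absolute-difference metric and its tolerance below are hypothetical; in practice the stored templates would be the known menu frames of each device:

```python
import numpy as np

def match_source_template(frame, templates, max_mean_abs_diff=8.0):
    """Compare a captured grayscale frame against stored reference frames
    (e.g. the menu screens of a gaming machine, STB, DVD player, VCR, or
    PVR/DVR). Returns the label of the closest template whose mean
    absolute pixel difference is within `max_mean_abs_diff`, or None
    when no template is close enough (source uncertain).
    """
    best_label, best_score = None, float("inf")
    f = frame.astype(float)
    for label, tmpl in templates.items():
        score = float(np.mean(np.abs(f - tmpl.astype(float))))
        if score < best_score:
            best_label, best_score = label, score
    return best_label if best_score <= max_mean_abs_diff else None
```

Accepting only matches below a distance bound, rather than always returning the nearest template, is what lets the matcher report an uncertain source at block 1452.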
FIG. 15 is a block diagram of an example computer 1500 capable of implementing the apparatus and methods disclosed herein. The computer 1500 may be, for example, a server, a personal computer, a Personal Digital Assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a personal video recorder, a set-top box, or any other type of computing device.
The system 1500 of the present example includes a processor 1512, such as a general-purpose programmable processor. The processor 1512 includes a local memory 1514 and executes coded instructions 1516 present in the local memory 1514 and/or in another memory device. For example, the processor 1512 may execute the machine-readable instructions represented in FIGS. 7A-7D through FIG. 14. The processor 1512 may be any type of processing unit, such as one or more microprocessors from the Intel Centrino® family, the Intel Pentium® family, the Intel Itanium® family, and/or the Intel XScale® family of processors. Of course, other processors from other families are also appropriate.
The processor 1512 communicates with a main memory including a volatile memory 1518 and a non-volatile memory 1520 via a bus 1522. The volatile memory 1518 may be implemented by Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 1520 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1518, 1520 is typically controlled by a memory controller (not shown) in a conventional manner.
The computer 1500 also includes a conventional interface circuit 1524. The interface circuit 1524 may be implemented by any type of well-known interface standard, such as an ethernet interface, a Universal Serial Bus (USB), and/or a third generation input/output (3GIO) interface.
One or more input devices 1526 are connected to the interface circuit 1524. An input device 1526 allows a user to enter data and commands into the processor 1512. The input device may be implemented by, for example, a keyboard, mouse, touch screen, track pad, track ball, isopoint, and/or a voice recognition system.
One or more output devices 1528 are also connected to the interface circuit 1524. The output devices 1528 can be implemented, for example, by display devices (e.g., a liquid crystal display or a cathode ray tube (CRT) display), by a printer, and/or by speakers. Thus, the interface circuit 1524 typically includes a graphics driver card.
The interface circuit 1524 also includes a communication device such as a modem or network interface card to facilitate exchange of data with external computers via a network (e.g., an ethernet connection, a Digital Subscriber Line (DSL), a telephone line, coaxial cable, cellular telephone system, etc.).
The computer 1500 also includes one or more mass storage devices 1530 for storing software and data. Examples of such mass storage devices 1530 include floppy disk drives, hard disk drives, optical disk drives, and Digital Versatile Disk (DVD) drives. The mass storage device 1530 may implement the audio metric registers 616-620, the video metric registers 624-628, and/or the metadata metric registers 632-636. Alternatively, the volatile memory 1518 may implement the audio metric registers 616-620, the video metric registers 624-628, and/or the metadata metric registers 632-636.
At least some of the above-described example methods and/or apparatus are implemented by one or more software and/or firmware programs running on a computer processor. However, dedicated hardware implementations including, but not limited to, Application Specific Integrated Circuits (ASICs), Programmable Logic Arrays (PLAs), and other hardware devices may likewise be constructed to implement, in whole or in part, some or all of the example methods and/or apparatus described herein. Further, alternative software implementations including, but not limited to, distributed processing or component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the example methods and/or apparatus described herein.
It should also be noted that the example software and/or firmware implementations described herein are optionally stored on a tangible storage medium such as: magnetic media (e.g., magnetic disks or tapes); magneto-optical or optical media such as optical disks; alternatively, a solid-state medium such as a memory card or other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; or a signal containing computer instructions. A digital file or other information archive or set of archives attached to an email is considered a distribution medium equivalent to a tangible storage medium. Thus, the example software and/or firmware described herein may be stored on a tangible storage medium or distribution medium such as those described above, or on a successor storage medium.
Additionally, although this patent discloses example systems including software or firmware executed on hardware, it should be noted that such systems are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of these hardware and software components could be embodied exclusively in hardware, exclusively in software, exclusively in firmware or in some combination of hardware, firmware and/or software. Thus, while the above description describes example systems, methods, and articles of manufacture, those of ordinary skill in the art will readily appreciate that these examples are not the only way to implement such systems, methods, and articles of manufacture. Thus, although certain example methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
This patent claims priority from U.S. provisional application serial No. 60/600,007, entitled "Methods and Apparatus to Monitor Audio/Visual Content from Various Sources," filed on August 9, 2004. U.S. provisional application serial No. 60/600,007 is hereby incorporated by reference in its entirety.
Claims (30)
1. A method for monitoring media content provided by a selected media content source of a plurality of media content sources for presentation by an information presentation device, the method comprising the steps of:
determining first information based on a video signal corresponding to the monitored media content;
determining second information based on an audio signal corresponding to the monitored media content;
identifying a selected one of the plurality of media content sources based on a plurality of decision nodes, each decision node corresponding to a respective classification decision resulting from processing at least one of the first information or the second information, each subsequent decision node being selected based on a result of evaluation of a previous decision node; and
identifying the monitored media content based on identification information associated with at least one of the video signal or the audio signal.
2. The method of claim 1, wherein the plurality of media content sources comprises at least one analog media content source and at least one digital media content source.
3. The method of claim 2, wherein the at least one analog media content source comprises at least one of a live analog television broadcast, an analog video-on-demand presentation, or a video cassette recorder playback.
4. The method of claim 2, wherein the at least one source of digital media content comprises at least one of a live digital television broadcast, a time-shifted digital television presentation, a digital versatile disc playback, or a video game presentation.
5. The method of claim 1, wherein the first information comprises at least one of detected metadata, detected text, a blur measurement, a detected blank frame, a macroblock measurement, or a detected template.
6. The method of claim 5, wherein the detected metadata comprises at least one of automated programming measurement data, closed caption data, or teletext data.
7. The method of claim 1, wherein the second information comprises at least one of a detected audio code, a detected audio feature, a volume measurement, an audio compression measurement, a ringtone measurement, or an audio spectral shape measurement.
8. The method of claim 1, wherein the step of identifying the selected one of the plurality of media content sources comprises the steps of: determining whether a selected media content source of the plurality of media content sources is at least one of an analog television broadcast, an analog video-on-demand presentation, or a video cassette recorder playback based on whether the first information includes detected automated programming measurement data and whether the second information includes an audio spectral shape measurement.
9. The method of claim 8, wherein the determining whether the selected one of the plurality of media content sources is at least one of an analog television broadcast, an analog video-on-demand presentation, or a video cassette recorder playback is based on detecting a time shift associated with the monitored media content.
10. The method of claim 8, wherein the determining whether the selected one of the plurality of media content sources is at least one of an analog television broadcast, an analog video-on-demand presentation, or a video cassette recorder playback is based on detecting whether the audio spectral shape measurement represents a cable television transmission system.
11. The method of claim 1, wherein the step of identifying the selected one of the plurality of media content sources comprises the steps of: determining whether a selected media content source of the plurality of media content sources is at least one of a live digital television broadcast, a time-shifted digital television presentation, or a digital versatile disc presentation based on whether the first information includes a macroblock measurement and whether the second information includes an audio compression measurement.
12. The method of claim 11, wherein the step of determining whether the selected one of the plurality of media content sources is at least one of a live digital television broadcast, a time-shifted digital television presentation, or a digital versatile disc presentation comprises the steps of: determining that the selected one of the plurality of media content sources is the digital versatile disk presentation if the first information comprises a macro block measurement and the second information comprises an audio compression measurement, wherein the macro block measurement represents a media content presentation exhibiting macro block phenomena and the audio compression measurement represents AC3 compression.
13. The method of claim 11, wherein the step of determining whether the selected one of the plurality of media content sources is at least one of a live digital television broadcast, a time-shifted digital television presentation, or a digital versatile disc presentation comprises the steps of: determining that the selected one of the plurality of media content sources is at least one of the live digital television broadcast or the time-shifted digital television if the first information comprises a macro-block measurement and the second information comprises an audio compression measurement, wherein the macro-block measurement represents a media content presentation that does not exhibit macro-block phenomena and the audio compression measurement represents AC3 compression.
14. The method of claim 13, wherein the determining that the selected one of the plurality of media content sources is at least one of the live digital television broadcast or the time-shifted digital television is based on detecting a time shift associated with the monitored media content.
15. The method of claim 1, further comprising the step of prioritizing the first information and the second information.
16. The method of claim 15, wherein prioritizing the first information and the second information comprises processing the first information and the second information in a predetermined order.
17. The method of claim 15, wherein prioritizing the first information and the second information comprises assigning a first weight to the first information and a second weight to the second information.
18. The method of claim 1, wherein at least one of the video signal or the audio signal corresponds to an input to the information presentation device.
19. The method of claim 1, further comprising the steps of:
determining third information based on the remote control signal; and
evaluating the first information, the second information, and the third information to:
identifying a selected media content source of the plurality of media content sources; and is
The monitored media content is identified.
20. The method of claim 1, further comprising the step of determining whether a selected media content source of the plurality of media content sources is in at least one special operating mode,
wherein the at least one special operating mode comprises presenting blank frames, presenting an audio mute state, presenting a menu display, presenting a pause state, or being in a trick mode.
21. A multi-engine meter for monitoring media content provided by a selected media content source of a plurality of media content sources for presentation by an information presentation device, the multi-engine meter comprising:
at least one audio engine for processing audio samples corresponding to the monitored media content;
at least one video engine for processing video samples corresponding to the monitored media content;
at least one metadata engine for processing at least one of the audio samples or the video samples; and
a decision processor for evaluating a plurality of decision nodes to identify a selected one of the plurality of media content sources, each decision node corresponding to a decision metric evaluated using information generated by at least one of the at least one audio engine, the at least one video engine, or the at least one metadata engine, each subsequent decision node being selected based on a result of evaluation of a previous decision node.
22. The multi-engine meter of claim 21, wherein the plurality of media content sources comprises at least one analog media content source and at least one digital media content source.
23. The multi-engine meter of claim 21, wherein the at least one audio engine comprises at least one of an audio code detector, an audio feature processor, a volume and silence detector, a compression detector, a ringtone detector, or a spectral shape processor.
24. The multi-engine meter of claim 21, wherein the at least one video engine comprises at least one of a text detector, a blur detector, a scene change and blank frame detector, a macroblock detector, or a template matcher.
25. The multi-engine meter of claim 21, wherein the at least one metadata engine comprises at least one of an automated schedule measurement processor, a closed caption processor, or a teletext processor.
26. The multi-engine meter of claim 21, wherein the decision processor comprises:
a metric sampler to sample information generated by the at least one audio engine, the at least one video engine, and the at least one metadata engine; and
a measurement engine metric evaluator to evaluate the sampled information to identify a selected one of the plurality of media content sources and to identify the monitored media content.
27. The multi-engine meter of claim 26 wherein the metric sampler is configured to poll the at least one audio engine, the at least one video engine, and the at least one metadata engine at predetermined time intervals.
28. A multi-engine meter as defined in claim 26 wherein the measurement engine metric evaluator is configured to evaluate the sampled information based on a priority associated with the information.
29. The multi-engine meter of claim 28, wherein the prioritization is based on assigning weights to the information.
30. The multi-engine meter of claim 21 wherein the decision processor is further configured to determine whether a selected media content source of the plurality of media content sources is in a special operating mode,
wherein the special operating mode comprises presenting blank frames, presenting an audio mute state, presenting a menu display, presenting a pause state, or being in a trick mode.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US60000704P | 2004-08-09 | 2004-08-09 | |
| US60/600,007 | 2004-08-09 | ||
| PCT/US2005/028106 WO2006020560A2 (en) | 2004-08-09 | 2005-08-09 | Methods and apparatus to monitor audio/visual content from various sources |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1116616A1 HK1116616A1 (en) | 2008-12-24 |
| HK1116616B true HK1116616B (en) | 2014-04-11 |