US20090158309A1 - Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization - Google Patents

Info

Publication number
US20090158309A1
US20090158309A1 (application US12/001,611)
Authority
US
United States
Prior art keywords
viewership
crowd
map
site
measurement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/001,611
Inventor
Hankyu Moon
Rajeev Sharma
Namsoon Jung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VideoMining Corp
Original Assignee
VideoMining Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VideoMining Corp filed Critical VideoMining Corp
Priority to US12/001,611 priority Critical patent/US20090158309A1/en
Assigned to VIDEOMINING CORPORATION reassignment VIDEOMINING CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SHARMA, RAJEEV, JUNG, NAMSOON, MOON, HANKYU
Publication of US20090158309A1 publication Critical patent/US20090158309A1/en
Priority to US13/998,392 priority patent/US9161084B1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/29Arrangements for monitoring broadcast services or broadcast-related services
    • H04H60/33Arrangements for monitoring the users' behaviour or opinions

Definitions

  • the present invention is a system and method for designing a comprehensive media audience measurement platform starting from a site, display, and crowd characterization, to a data sampling planning and a data extrapolation method.
  • the role of digital media for advertisement in public spaces is becoming increasingly important.
  • the task of measuring the degree of media exposure is also deemed as very important both as a guide to the equipment installation (equipment kind, position, size, and orientation) and as a rating for the content programming.
  • As the number of such displays is growing, measuring the viewing behavior of the audience using human intervention can be very costly.
  • the viewing typically occurs in public spaces where a very large number of unknown people can assemble to comprise an audience. It is therefore hard to take surveys of the audience using traditional interviewing through telephone or mail methods, and on-site interviews can be both very costly and potentially highly biased.
  • viewing behavior in this context is called ‘viewership’.
  • These automatic viewership measurement systems can be maintained with little cost once installed, and can provide a continuous stream of viewership measurement data.
  • These systems typically employ electro-optical visual sensor devices, such as video cameras or infrared cameras, and provide consistent sampling of the viewing behavior based on visual observations.
  • the sensor placement planning is extremely important. While such equipment delivers consistent viewership measurements, each unit is limited to measuring the view from its individual fixed position (and, in most cases, orientation). However, relocating the equipment can affect the integrity of the data.
  • Any optical sensor has a limited field of coverage, and its area of coverage can also depend on its position and orientation.
  • a large venue can be covered by multiple sensors, and their individual lens focal lengths, positions, and orientations need to be determined.
  • the data delivered from these sensors also needs to be properly interpreted, because the viewership data that each equipment provides has been spatially sampled from the whole viewership at the site.
  • the ultimate goal of the audience measurement system is to estimate the site-wide viewership for the display; it is crucial to extrapolate the site-wide viewership data from the sampled viewership data in a mathematically sound way.
  • the present invention provides a comprehensive solution to the problem of automatic media measurement, from the problem of sensor placement for effective sampling to the method of extrapolating spatially sampled data.
  • U.S. Pat. No. 4,858,000 of Lu, et al. disclosed an image recognition method and system for identifying predetermined individual members of a viewing audience in a monitored area.
  • a pattern image signature is stored corresponding to each predetermined individual member of the viewing audience to be identified.
  • An audience scanner includes audience locating circuitry for locating individual audience members in the monitored area.
  • a video image is captured for each of the located individual audience members in the monitored area.
  • a pattern image signature is extracted from the captured image. The extracted pattern image signature is then compared with each of the stored pattern image signatures to identify a particular one of the predetermined audience members. These steps are repeated to identify all of the located individual audience members in the monitored area.
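The matching loop described above, in which an extracted pattern image signature is compared against every stored member signature, can be sketched as follows. This is an illustrative reconstruction, not Lu's actual implementation; the Euclidean distance metric and the function name are assumptions.

```python
import numpy as np

def identify_member(signature, stored_signatures):
    # Compare an extracted pattern image signature against each stored
    # member signature; the nearest one (Euclidean distance here, an
    # assumed metric) names the identified audience member.
    best_id, best_dist = None, float("inf")
    for member_id, ref in stored_signatures.items():
        dist = np.linalg.norm(np.asarray(signature) - np.asarray(ref))
        if dist < best_dist:
            best_id, best_dist = member_id, dist
    return best_id
```

Repeating this loop for every located audience member yields the full identification pass described above.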
  • Lu U.S. Pat. No. 5,771,307 of Lu, et al. disclosed a passive identification apparatus for identifying a predetermined individual member of a television viewing audience in a monitored viewing area, where a video image of a monitored viewing area is captured first.
  • a template matching score is provided for an object in the video image.
  • An Eigenface recognition score is provided for an object in the video image.
  • the Eigenface score may be provided by comparing an object in the video image to reference files.
  • the template matching score and the Eigenface recognition score are fused to form a composite identification record from which a viewer may be identified. Body shape matching, viewer tracking, viewer sensing, and/or historical data may be used to assist in viewer identification.
  • the reference files may be updated as recognition scores decline.
  • U.S. Pat. No. 6,958,710 of Zhang, et al. (hereinafter Zhang) disclosed systems, methods and devices for gathering data concerning exposure of predetermined survey participants to billboards.
  • a portable transmitter is arranged to transmit a signal containing survey participant data, and a receiver located proximately to the billboard serves to receive the signal transmitted by the transmitter.
  • U.S. Pat. No. 7,176,834 of Percy, et al. disclosed a method directed to utilizing monitoring devices for determining the effectiveness of various locations, such as media display locations, for an intended purpose.
  • the monitoring devices are distributed to a number of study respondents.
  • the monitoring devices track the movements of the respondents. While various technologies may be used to track the movements of the respondents, at least some of the location tracking of the monitoring device utilize a satellite location system, such as the global positioning system. These movements of the respondent and monitoring device at some point coincide with exposure to a number of media displays.
  • Geo data collected by the monitoring device are downloaded to a download server, for determining to which media displays the respondent was exposed. The exposure determinations are made by a post-processing server.
  • U.S. Pat. Application No. 20070006250 of Croy, et al. disclosed portable audience measurement architectures and methods for portable audience measurement.
  • the disclosed system contains a plurality of portable measurement devices configured to collect audience measurement data from media devices, a plurality of data collection servers configured to collect audience measurement data from the plurality of portable measurement devices, and a central data processing server.
  • a portable measurement device establishes a communication link with a data collection server in a peer-to-peer manner and transfers the collected audience measurement data to the data collection server. Because the portable measurement device is not dedicated to a particular local data collection server, the portable measurement device periodically or aperiodically broadcasts a message attempting to find a data collection server with which to establish a communication link.
  • U.S. patent application Ser. No. 11/818,554 of Sharma, et al. presented a method and system for automatically measuring viewership of people for displayed objects, such as in-store marketing elements, static signage, POP displays, various digital media, retail TV networks, and kiosks, by counting the number of viewers who actually viewed the displayed object vs. passers-by who may appear in the vicinity of the displayed object but do not actually view the displayed object, and the duration of viewing by the viewers, using a plurality of means for capturing images and a plurality of computer vision technologies, such as face detection, face tracking, and the 3-dimensional face pose estimation of the people, on the captured visual information of the people.
  • Lu U.S. Pat. No. 4,858,000 and Lu U.S. Pat. No. 5,771,307 introduce systems for measuring viewing behavior of broadcast media by identifying viewers from a predetermined set of viewers, based primarily on facial recognition.
  • Sharma introduces a method to measure viewership of displayed objects using computer vision algorithms.
  • the present invention also aims to measure the viewing behavior of an audience using visual information; however, it focuses on publicly displayed media for a general, unknown audience.
  • the present invention utilizes an automated method similar to Sharma to measure the audience viewership by processing data from visual sensors.
  • the present invention provides not only a method of measuring viewership, but also a solution to a much broader class of problems including the data sampling plan and the data extrapolation method based on the site, display, and crowd analysis, to design an end-to-end comprehensive audience measurement system.
  • U.S. Pat. No. 6,141,041 of Carlbom, et al. disclosed a method and apparatus for deriving an occupancy map reflecting an athlete's coverage of a playing area based on real time tracking of a sporting event.
  • the method according to the invention includes a step of obtaining a spatiotemporal trajectory corresponding to the motion of an athlete and based on real time tracking of the athlete. The trajectory is then mapped over the geometry of the playing area to determine a playing area occupancy map indicating the frequency with which the athlete occupies certain areas of the playing area, or the time spent by the athlete in certain areas of the playing area.
  • the occupancy map is preferably color coded to indicate different levels of occupancy in different areas of the playing area, and the color coded map is then overlaid onto an image (such as a video image) of the playing area.
  • the apparatus according to the invention includes a device for obtaining the trajectory of an athlete, a computational device for obtaining the occupancy map based on the obtained trajectory and the geometry of the playing area, and devices for transforming the map to the camera view, generating a color coded version of the occupancy map, and overlaying the color coded map on a video image of the playing area.
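The occupancy-map construction in Carlbom can be sketched as a simple grid accumulation over a sampled trajectory. This is a minimal illustration under assumed conventions (fixed sampling rate, square grid cells), not the patented apparatus.

```python
import numpy as np

def occupancy_map(trajectory, width, height, cell=1.0):
    # Accumulate a spatiotemporal trajectory of (x, y) floor positions,
    # sampled at a fixed rate, into a grid; each sample adds one unit of
    # dwell time to its cell, so cell values reflect occupancy frequency.
    rows, cols = int(np.ceil(height / cell)), int(np.ceil(width / cell))
    grid = np.zeros((rows, cols))
    for x, y in trajectory:
        r = min(int(y // cell), rows - 1)
        c = min(int(x // cell), cols - 1)
        grid[r, c] += 1
    return grid / max(grid.max(), 1.0)  # normalize for color coding
```

The normalized grid can then be color coded and overlaid onto a camera image of the playing area, as the patent describes.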
  • the invention makes use of other concepts, such as the visibility map and viewership relevancy map, for the purpose of characterizing an audience behavior, and ultimately for the purpose of finding an optimal camera placement plan.
  • U.S. Pat. No. 6,437,819 of Loveland, et al. disclosed an automated system for controlling multiple pan/tilt/zoom video cameras in such a way as to allow a person to be initially designated and tracked thereafter as he/she moves through the various camera fields of view. Tracking is initiated either by manual selection of the designated person on the system monitor through the usage of a pointing device, or by automated selection of the designated person using software.
  • the computation of the motion control signal is performed on a computer through software using information derived from the cameras connected to the system, and is configured in such a way as to allow the system to pass tracking control from one camera to the next, as the designated person moves from one region to another.
  • the system self-configuration is accomplished by the user's performance of a specific procedure involving the movement and tracking of a marker throughout the facility.
  • an outward facing camera system in accordance with one embodiment of the invention includes a plurality of equatorial cameras distributed evenly about an origin point in a plane.
  • the outward facing camera system also includes a first plurality of polar cameras tilted above the plane.
  • some embodiments of the invention include a second plurality of polar cameras tilted below the plane.
  • the equatorial cameras and polar cameras are configured to capture a complete coverage of an environment.
  • Coutta presented a method to place multiple cameras for monitoring a retail environment, especially the cash register area. Because the area to be monitored is highly constrained, the method does not need a sophisticated methodology to optimize the camera coverage as the present invention aims to provide.
  • Loveland presents a pan/tilt/zoom camera system for the purpose of monitoring an area and tracking people one by one, while the present invention aims to find an optimal placement of cameras so that the cameras have maximal concurrent coverage of the area and of multiple people at the same time.
  • Hashimoto employs multiple outward facing cameras to have a full coverage of the surroundings, while the present invention provides a methodology to place cameras to have maximal coverage given the constraints of the number of cameras and the constraints of the site, display, and the measurement algorithm.
  • U.S. Pat. No. 5,866,887 of Hashimoto, et al. disclosed a method to measure the number of people passing by a certain area.
  • a plurality of rows of sensors is provided, each row having a plurality of distance variation measuring sensors.
  • the distance variation measuring sensors each include a light emitter and a light receiver arranged in the orthogonal direction to the direction in which human bodies pass.
  • the number of passers is detected on the basis of the number of the distance variation measuring sensors that have detected a human body.
  • the traveling direction of human bodies is detected on the basis of the change in the distance to the human bodies measured by the distance variation measuring sensors.
  • U.S. Pat. No. 6,987,885 of Gonzalez-Banos, et al. disclosed systems, apparatuses, and methods that determine the number of people in a crowd using visual hull information.
  • an image sensor generates a conventional image of a crowd.
  • a silhouette image is then determined based on the conventional image.
  • the intersection of the silhouette image cone and a working volume is determined.
  • the projection of the intersection onto a plane is determined.
  • Planar projections from several image sensors are aggregated by intersecting them, forming a subdivision pattern.
  • Polygons that are actually empty are identified and removed.
  • Upper and lower bounds of the number of people in each polygon are determined and stored in a tree data structure. This tree is updated as time passes and new information is received from image sensors.
  • the number of people in the crowd is equal to the lower bound of the root node of the tree.
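The tree of per-polygon bounds can be illustrated with a small recursive sketch, assuming a hypothetical node layout: a parent's lower bound is tightened by the sum of its children's lower bounds, and the crowd count is read off the root.

```python
def root_lower_bound(node):
    # node: {"bounds": (lo, hi), "children": [...]} is an assumed layout
    # for the tree of per-polygon people-count bounds. A parent's lower
    # bound can be raised to the sum of its children's lower bounds, and
    # the reported crowd count is the lower bound of the root node.
    lo, hi = node["bounds"]
    if node["children"]:
        lo = max(lo, sum(root_lower_bound(c) for c in node["children"]))
    return min(lo, hi)
```

As new sensor information arrives, the per-node bounds are updated and the root's lower bound is re-read as the current count.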
  • Yakobi, et al. (hereinafter Yakobi) disclosed a method for counting people in a monitored area.
  • the method includes the steps of: initializing at least one end unit forming part of a video imaging system, the end unit having at least one camera installed that produces images, within its field of view, of at least part of the area being monitored, and the end unit including a plurality of counters; digitizing the images and storing the digitized images; detecting objects of potential persons from the digitized images; comparing the digitized images of objects detected in the area being monitored with digitized images stored in the working memory unit, to determine whether a detected object is a new figure that has entered the area being monitored or a known figure that has remained within it, and to determine that a figure which is no longer detected has left the area being monitored; and incrementing at least one of the plurality of counters accordingly.
  • U.S. Pat. No. 7,139,409 of Paragios, et al. disclosed a system and method for automated and/or semi-automated analysis of video for discerning patterns of interest in video streams.
  • the invention is directed to identify patterns of interest in indoor settings.
  • the invention deals with the change detection problem using a Markov Random Field approach, where information from different sources are naturally combined with additional constraints to provide the final detection map.
  • a slight modification is made to the regularity term within the MRF model to account for real discontinuities in the observed data.
  • the defined objective function is implemented in a multi-scale framework that decreases the computational cost and the risk of convergence to local minima.
  • the crowdedness measure used is a geometric measure of occupancy that is quasi-invariant to objects translating on the platform.
  • U.S. Pat. No. 7,203,338 of Ramaswamy, et al. disclosed methods and apparatus to count people appearing in an image.
  • One disclosed method reduces objects appearing in a series of images to one or more blobs; for each individual image in a set of the images of the series of images, represents the one or more blobs in the individual image by one or more symbols in a histogram; and analyzes the symbols appearing in the histogram to count the people in the image.
  • U.S. Pat. Application No. 20060171570 of Brendley, et al. disclosed a “Smartmat” system that monitors and identifies people, animals and other objects that pass through a control volume.
  • an exemplary system implementation can count, classify and identify objects, such as pedestrians, animals, bicycles, wheelchairs, vehicles, rollerbladers and other objects, either singly or in groups.
  • Exemplary Smartmat implementations differentiate objects based on weight, footprint and floor/wall pressure patterns, such as footfall patterns of pedestrians and other patterns.
  • the system may be applied to security monitoring, physical activity monitoring, market traffic surveys and other traffic surveys, security checkpoint/gate monitoring, traffic light activation and other device activation such as security cameras, and other monitoring applications. Smartmat may be portable or permanently installed.
  • U.S. Pat. Application No. 20070032242 of Goodman, et al. disclosed methods and apparatus for providing statistics on the number, distribution and/or flow of people or devices in a geographic region based on active wireless device counts.
  • Wireless devices may be of different types, e.g., cell phones, PDAs, etc.
  • Wireless communications centers report the number and type of active devices in the geographic region serviced by the wireless communications center and/or indicate the number of devices entering/leaving the serviced region.
  • the active wireless device information is correlated to one or more targeted geographical areas. Population counts are extrapolated from the device information for the targeted geographic areas.
  • Traffic and/or flow information is generated from changes in the device counts or population estimates over time and/or from information on the number of active devices entering/leaving a region.
  • Reports may include predictions of crowd population characteristics based on information about the types and/or number of different wireless devices being used.
  • Hashimoto and Brendley use special sensors (distance measuring and pressure mat sensors, respectively) placed in a designated space, so that they can count the number of people passing and, in the case of Brendley, classify the kind of traffic, whereas the visual sensor based technology of the present invention can measure not only the amount of traffic but also its direction, and over wider areas.
  • the crowd estimation method of the present invention can measure the crowd traffic without any requirement of the people carrying certain devices, and without introducing potential bias toward business people or bias against seniors or children.
  • Gonzalez-Banos, Yakobi, and Ramaswamy detect and count the number of people in a scene by processing video frames to detect people.
  • One of the exemplary embodiments of the present invention utilizes top-down view cameras so that person detection and tracking can be carried out effectively, where individual persons in the crowd are tracked so that both the crowd density and direction can be estimated. These prior inventions do not concern the crowd directions.
  • Paragios measures the pattern of crowd motion without explicitly detecting or tracking people.
  • One of the exemplary embodiments of the present invention also makes use of such crowd dynamics estimation, however, it is a part of the comprehensive system where the goal is to extrapolate the sampled viewership measurement based on the crowd dynamics.
  • Ozcelik U.S. Pat. No. 5,574,663 of Ozcelik, et al. (hereinafter Ozcelik) disclosed a method and apparatus for regenerating a dense motion vector field, which describes the motion between two temporally adjacent frames of a video sequence, utilizing a previous dense motion vector field.
  • a spatial DVF (dense motion vector field) and a temporal DVF are determined and summed to provide a DVF prediction.
  • This method and apparatus enables a dense motion vector field to be used in the encoding and decoding process of a video sequence. This is very important since a dense motion vector field provides a much higher quality prediction of the current frame as compared to the standard block matching motion estimation techniques.
  • the problem to date with utilizing a dense motion vector field is that the information contained in a dense motion field is too large to transmit.
  • the invention eliminates the need to transmit any motion information.
  • U.S. Pat. No. 6,400,830 of Christian, et al. disclosed a technique for tracking objects through a series of images.
  • the technique is realized by obtaining at least first and second representations of a plurality of pixels, wherein at least one grouping of substantially adjacent pixels has been identified in each of the first and second representations. Each identified grouping of substantially adjacent pixels in the first representation is then matched with an identified grouping of substantially adjacent pixels in the second representation.
  • the present invention makes use of a motion field computed in a similar manner as described in these prior inventions.
  • the motion field computation at each floor position can be a dense optical flow computation as disclosed in Ozcelik or Bober. It can also be an object tracking as disclosed in Christian, so that the dense motion field can be computed using the motion trajectories of multiple objects.
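A dense motion field built from object trajectories, as described above, might be sketched by averaging frame-to-frame displacements into floor cells. The grid layout and averaging scheme are illustrative assumptions, not the method of Ozcelik or Christian.

```python
import numpy as np

def dense_motion_field(trajectories, width, height, cell=1.0):
    # Average the frame-to-frame displacement vectors of tracked objects
    # into each floor cell, yielding a dense 2-D motion field over the
    # site from the motion trajectories of multiple objects.
    rows, cols = int(np.ceil(height / cell)), int(np.ceil(width / cell))
    field = np.zeros((rows, cols, 2))
    counts = np.zeros((rows, cols))
    for traj in trajectories:
        for (x0, y0), (x1, y1) in zip(traj[:-1], traj[1:]):
            r = min(int(y0 // cell), rows - 1)
            c = min(int(x0 // cell), cols - 1)
            field[r, c] += (x1 - x0, y1 - y0)
            counts[r, c] += 1
    mask = counts > 0
    field[mask] /= counts[mask][:, None]  # mean displacement per cell
    return field
```

Cells never visited keep a zero motion vector; a dense optical flow computation could fill them instead, as the text notes.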
  • Wildes U.S. Pat. No. 6,535,620 of Wildes, et al. (hereinafter Wildes) disclosed an invention, which is embodied in a method for representing and analyzing spatiotemporal data in order to make qualitative yet semantically meaningful distinctions among various regions of the data at an early processing stage.
  • successive frames of image data are analyzed to classify spatiotemporal regions as being stationary, exhibiting coherent motion, exhibiting incoherent motion, exhibiting scintillation and so lacking in structure as to not support further inference.
  • the exemplary method includes filtering the image data in a spatiotemporal plane to identify regions that exhibit various spatiotemporal characteristics. The output data provided by these filters is then used to classify the data.
  • U.S. Pat. No. 6,806,705 of van Muiswinkel, et al. disclosed an imaging method for imaging a subject including fibrous or anisotropic structures and includes acquiring a 3-dimensional apparent diffusion tensor map of a region with some anisotropic structures.
  • the apparent diffusion tensor at a voxel is processed to obtain Eigenvectors and Eigenvalues.
  • a 3-dimensional fiber representation is extracted using the Eigenvectors and Eigenvalues.
  • voxels are locally interpolated in at least a selected dimension in a vicinity of the fiber representation.
  • the interpolating includes weighting the voxels by a parameter indicative of a local anisotropy.
  • the interpolating results in a 3-dimensional fiber representation having a higher tracking accuracy and representation resolution than the acquired tensor map.
  • in Wildes, the image motion is estimated and represented using a plurality of spatiotemporal filter banks.
  • in van Muiswinkel, the 3-dimensional structure is represented as a diffusion tensor map.
  • the present invention makes use of a similar tensorial (but 2×2 tensor) representation of crowd motion, where the motion anisotropy is computed using the Eigenvalues of the motion tensor in the same way.
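The 2×2 tensorial representation of crowd motion and its eigenvalue-based anisotropy can be sketched as follows; the second-moment construction and the normalization are assumptions for illustration.

```python
import numpy as np

def motion_anisotropy(vectors):
    # Build the 2x2 second-moment tensor of the motion vectors observed
    # at one floor position and measure anisotropy from its eigenvalues:
    # near 1 for coherent one-directional flow, near 0 for isotropic
    # (directionless) motion.
    V = np.asarray(vectors, dtype=float)        # shape (N, 2)
    T = V.T @ V / len(V)                        # 2x2 motion tensor
    lam_min, lam_max = np.linalg.eigvalsh(T)    # ascending eigenvalues
    return (lam_max - lam_min) / (lam_max + lam_min + 1e-12)
```

The dominant eigenvector of `T` would likewise give the principal crowd direction at that position.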
  • U.S. Pat. No. 5,682,465 of Kil, et al. (hereinafter Kil) disclosed a function approximation method, which is based on nonparametric estimation by using a network of three layers, such as an input layer, an output layer and a hidden layer.
  • the input and the output layers have linear activation units, while the hidden layer has nonlinear activation units, which have the characteristics of bounds and locality.
  • the whole learning sequence is divided into two phases. The first phase estimates the number of kernel functions based on a user's requirement on the desired level of accuracy of the network, and the second phase is related to parameter estimation.
  • a linear learning rule is applied between output and hidden layers, and a non-linear (piecewise-linear) learning rule is applied between hidden and input layers. Accordingly, an efficient way of function approximation is provided from the view point of the number of kernel functions as well as increased learning speed.
  • Vapnik U.S. Pat. No. 5,950,146 of Vapnik, et al. (hereinafter Vapnik) disclosed a method for estimating a real function that describes a phenomenon occurring in a space of any dimensionality.
  • the function is estimated by taking a series of measurements of the phenomenon being described and using those measurements to construct an expansion that has a manageable number of terms.
  • a reduction in the number of terms is achieved by using an approximation that is defined as an expansion on kernel functions, the kernel functions forming an inner product in Hilbert space.
  • the present invention makes use of a statistical learning method similar to Kil or Vapnik, where the input-output relation between a large number of data can be used to learn a regression function.
  • the regression function is used to compute the viewership extrapolation mapping.
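The learned regression used for the viewership extrapolation mapping might look like the following kernel ridge sketch, in the spirit of Kil and Vapnik; the Gaussian kernel, parameter values, and function names are illustrative assumptions, not the patent's specified method.

```python
import numpy as np

def fit_extrapolator(X, y, gamma=0.5, alpha=1e-6):
    # Kernel ridge regression with a Gaussian kernel: learn the mapping
    # from sampled (sensor-covered) viewership features X to site-wide
    # viewership y from a large number of input-output pairs.
    X, y = np.asarray(X, float), np.asarray(y, float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-gamma * sq)                     # training kernel matrix
    coef = np.linalg.solve(K + alpha * np.eye(len(X)), y)
    return X, coef, gamma

def extrapolate(model, Xq):
    # Apply the learned regression to new sampled measurements Xq.
    X, coef, gamma = model
    Xq = np.asarray(Xq, float)
    sq = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq) @ coef
```

Time-dependent extrapolation, as described later, would fit such a regressor per time-of-day bin or add time as an input feature.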
  • the present invention aims to measure the media viewership using automated and unobtrusive means that employ computer vision algorithms, a significant departure from methods using devices that must be carried by a potential audience. It also provides comprehensive solutions to the sensor placement issue and the data extrapolation issue, based on the site and display analysis; these features also contrast with inventions that introduce only measurement algorithms.
  • the present invention utilizes crowd tracking or dense motion computation, similar to some of the prior inventions, but in a way that the estimated crowd motion along with its tensorial formulation represent the collective motion of the crowd, both in spatial and temporal dimensions.
  • the present invention employs statistical machine learning approaches, similarly to some of the prior inventions, to extrapolate the sampled viewership data to estimate the site-wide viewership data; the method of the present invention utilizes the learning approach to achieve time-dependent extrapolation of the viewership data based on the insights from the crowd and viewership analysis.
  • the present invention is a system and method for designing a comprehensive media audience measurement platform starting from a site, display, and crowd characterization, to a data sampling planning and a data extrapolation method.
  • the step identifies the site, display, crowd, and audience as four major system parameters that need to be considered when designing the media audience measurement solution.
  • the step also identifies subparameters from each of these elements that affect other system variables.
  • the size, location, direction, and width of the pathways in the site, the obstacles for passers-by, and the attractions for viewers are some of the relevant site parameters that affect the crowd behavior.
  • the position and the orientation of the display within the site, and the size, brightness, and content of the display are the display-related parameters that affect the viewing behavior.
  • the crowd and the audience parameters are assumed to depend on the site and the display parameters.
  • the viewership measurement algorithm is also an important element of the system. The algorithm is assumed to be already calibrated according to the site and the display; designing or calibrating the measurement algorithm is not within the scope of the present invention.
  • the step identifies the display parameters as one of the primary factors that directly affect how the crowd (potential viewers) responds to the display and the displayed media.
  • the display parameters are therefore the first factors to consider when determining the specifications and placement for the sensors; the sensors should be placed to cover the locations where the most viewing is expected to occur. For a realistic data sampling scheme, sensors should also be placed so that they can capture the largest number of potential viewers.
  • the step identifies the crowd occupancy as one of the parameters to consider when planning the sensor placement; it measures how much traffic each floor location will have.
  • the notion of occupancy includes both the number and the moving speed of people; given a unit area in the floor space, the occupancy map represents how many people stay in that area for a given time period.
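The occupancy notion above (person-time accumulated per unit floor area) can be sketched as follows; the grid size, sampling interval, and track format are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def occupancy_map(samples, site_w, site_h, cell=1.0, dt=1.0):
    """Accumulate person-time (person-seconds) per unit floor area.

    samples: (x, y) floor positions, one per person per frame, captured
    every `dt` seconds. All names and values here are illustrative.
    """
    nx, ny = int(site_w / cell), int(site_h / cell)
    occ = np.zeros((ny, nx))
    for x, y in samples:
        i, j = int(y / cell), int(x / cell)
        if 0 <= i < ny and 0 <= j < nx:
            occ[i, j] += dt  # each sample contributes dt person-seconds
    return occ

# two people standing in the same cell for 3 samples each
samples = [(0.5, 0.5)] * 6
occ = occupancy_map(samples, site_w=4.0, site_h=4.0)
print(occ[0, 0])  # prints 6.0
```

The map captures both the number of people and their dwell time: slower-moving or stationary people contribute more samples to a cell and therefore more person-time.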
  • a subset of the site parameters, the direction of the crowd dynamics and attractions (potential distractions from the media display) in the site, are also crucial in determining the sensor positions and orientations when the specific method to measure the viewing depends on the direction that the viewer is facing.
  • the step identifies the viewership relevancy measure as one of the primary factors to consider when designing a sensor placement plan.
  • the visibility measure is realized by the visibility map on top of the site map; it is computed based on the display parameters (the position, orientation, and the size of the display) and the visibility according to the characteristics of the human visual perception.
  • the occupancy measure is realized by the occupancy map on top of the site map; it can be determined based on the site parameters alone or based on the actual measured traffic density.
  • the viewership relevancy measure is realized by the viewership relevancy map; it is determined by comparing the local viewership counts from the system and the site-wide ground truth viewership counts. If a certain floor position has higher correlation between the measured viewership counts and the ground truth site-wide viewership counts, then it has higher viewership relevancy.
  • the visibility map, the occupancy map, and the viewership relevancy map are combined to determine the viewership sampling map.
  • the sensors are placed and oriented, and the lens specifications are determined so that the floor area covered by the sensors can sample the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost for specific placement, and the requirement to hide the sensors from public view, etc.
  • the set of sensors for measuring the viewership can also be used to measure the crowd dynamics.
  • a dedicated set of sensors can be used to measure the crowd dynamics.
  • the sensors can be ceiling mounted so that they look down and potentially provide more reliable estimates of the crowd dynamics.
  • the sensor placement can follow the same principle as the placement of the viewership measurement sensors; the placement only needs to consider the occupancy map so that the arrangement of the sensors can achieve a maximum coverage of the crowd motion.
  • the arrangement of the sensors generates certain sampled measurement of the viewership at the site. It is necessary to determine the parameters that affect the relation between the sampled measurement and the total viewership, so that one can design an extrapolation map based on these parameters.
  • in the simplest scheme, the extrapolation map would be a simple function that is fixed over time; the function is just a constant factor that is multiplied by the measured viewership to estimate the total site-wide viewership.
  • the viewing behavior at a given floor position can change over time.
  • the time-changing likelihood of the viewership as a function of the floor position at a given time instance is called the viewership map.
  • the step identifies the crowd dynamics as the single most decisive variable that affects the viewing behavior and consequently the viewership map. It is assumed that other parameters, such as the site and time elements, are all reflected in the crowd dynamics.
  • the present system formulates the extrapolation map as parameterized by the crowd dynamics.
  • the system measures the crowd dynamics once for each time period, and determines the extrapolation map.
  • the crowd dynamics parameter can be continuous (numerical) or discrete (categorical).
  • the step identifies the crowd velocity tensor field as the mathematical representation of the crowd dynamics for the purpose of extrapolating sampled measurements.
  • the distribution of the crowd velocity is accumulated during the time interval, and the covariance matrix is computed.
  • the directions of the Eigenvectors of the matrix represent the dominant directions of the crowd motion, and the magnitudes of the Eigenvalues represent the average velocities of the corresponding directions.
  • the step identifies the motion anisotropy and the crowd speed (average speed at the floor position) as the most relevant features from the crowd dynamics.
  • the motion anisotropy is the ratio of the two Eigenvalues of the velocity tensor, and is determined at every sampled point on the floor space. It measures the ratio of the primary speed to the secondary speed: the degree of dominance of the primary motion direction.
  • the motion anisotropy is computed using the crowd velocity tensor at each sampled floor space.
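The tensor computation described above can be sketched as follows; the velocity samples are illustrative, and numpy's eigendecomposition conventions are assumed:

```python
import numpy as np

# Velocity vectors (vx, vy) accumulated in one sampling grid cell over
# the sampling time period; the values are illustrative.
v = np.array([[1.2, 0.1], [1.0, -0.1], [0.9, 0.0], [1.1, 0.2]])

# Crowd velocity tensor: the covariance matrix of the velocity samples.
tensor = np.cov(v, rowvar=False)

# Eigen spectrum: the eigenvector directions are the dominant motion
# directions; the eigenvalue magnitudes measure the motion along them.
evals, evecs = np.linalg.eigh(tensor)  # eigenvalues in ascending order

# Motion anisotropy: the ratio of the two eigenvalues, i.e. the degree
# of dominance of the primary motion direction.
anisotropy = evals[1] / evals[0]
```

For the sample vectors above, which flow almost entirely along one axis, the anisotropy comes out well above 1, signaling a single dominant direction of crowd motion.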
  • the relationship from the set of invariant features of the crowd dynamics to the extrapolation factor is estimated using the training data.
  • the viewership data from the sampled region and the ground truth viewership measurement from the whole site are collected over some period of time.
  • a statistical learning or regression method can be used to find the relationship from the crowd velocity anisotropy histogram to the viewership extrapolation factor.
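As a sketch of one such regression (the patent leaves the learner unspecified), ordinary least squares can map a coarse anisotropy histogram to the extrapolation factor; all training values below are hypothetical, and a kernel method in the spirit of Kil or Vapnik could be substituted:

```python
import numpy as np

# Hypothetical training data: each row is a two-bin crowd velocity
# anisotropy histogram for one time period; y is the measured
# sampled-to-site-wide viewership ratio for that period.
X = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
y = np.array([0.50, 0.42, 0.30, 0.22])

# Minimal learner: least squares with a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def extrapolation_factor(hist):
    """Predict the viewership extrapolation factor for a new histogram."""
    return float(np.append(hist, 1.0) @ w)
```

In deployment the histogram would have many more bins (and would also carry the crowd speed), but the flow is the same: features of the crowd dynamics in, extrapolation factor out.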
  • FIG. 1 is an overall view of the preferred embodiment of the invention.
  • FIG. 2 shows a typical exemplary embodiment of the media measurement system deployed in a public space, and all the site elements.
  • FIG. 3 shows sub-elements within each major system parameter of the audience measurement framework.
  • FIG. 4 shows the site-viewership analysis.
  • FIG. 5 shows the sensor placement scheme derived from the site-viewership analysis.
  • FIG. 6 shows the viewership relevancy map computed on the site layout.
  • FIG. 7 shows the computed viewership sampling map and the sensor placement based on it.
  • FIG. 8 shows the crowd-viewership modeling for the purpose of viewership extrapolation.
  • FIG. 9 shows the method to compute the crowd velocity tensor field.
  • FIG. 10 shows the process of extrapolation map estimation.
  • FIG. 11 shows an exemplary embodiment of the feature to viewership map learning.
  • FIG. 12 shows the actual extrapolation scheme.
  • FIG. 13 shows an exemplary embodiment of the viewership measurement algorithm.
  • A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the four major system parameters 100 of the media audience measurement system, and their dependencies (marked by arrows) identified by the dependency analysis 200, in the process of designing the media audience measurement system. From the dependency analysis 200, the data processing methods 300 are derived.
  • the site 105 parameter influences the position and size of the display 120, in that the display 120 should be clearly visible from the targeted area on the site.
  • the site parameters such as the arrangement of the pathways and obstacles, also affect the crowd 140 behavior.
  • the display 120 specifications are determined based on the site 105 characteristics; the display and content analysis 220 procedure is not within the scope of the present invention.
  • the site-viewership analysis 220 stage determines the viewership sampling method 320 .
  • the crowd-viewership analysis 270 identifies the crowd dynamics as the major deciding factor from the crowd 140 for modeling the changing viewership map, and further determines the viewership extrapolation method 340 to estimate the site-wide viewership from the sampled viewership measurement.
  • the measurement algorithm 180 is also a system parameter that affects the site-viewership analysis, and therefore affects the viewership sampling method.
  • the viewership measurement algorithm 180 is not a variable element, and the design of the algorithm is not within the scope of the present invention; only its fixed influence on the viewership measurement is considered.
  • FIG. 2 shows a typical exemplary embodiment of the media measurement system deployed in a public space among all the site elements.
  • the site 105 is the limited space where the crowd 140 resides and the display 120 is placed so that the audience 160 views the display 120 .
  • both the viewership measurement sensor 375 and the crowd measurement sensor 372 are placed.
  • the viewership data 430 is sampled from the viewership measurement sensor coverage area 316
  • the crowd dynamics data 420 is sampled from the crowd measurement sensor coverage area 314 .
  • FIG. 3 shows sub-elements within each major system parameter 100 of the audience measurement framework.
  • the site parameters 110 include the size 111, the pathways/obstacles 112 (their location, size, and direction), and the visual attractions 115.
  • the display parameters 125 have the position/orientation/size 126 of the display.
  • the brightness/contrast/content 127 of the display are also parameters that may affect the viewing behavior; these parameters are not taken into account for the purpose of audience measurement design, because they may be a part of what the measurement system aims to evaluate.
  • the crowd parameters 145 have number/position/velocity as its sub-parameters. These elements are represented within a single quantity of crowd motion tensor field 285 , based on the site-viewership analysis 220 .
  • the audience parameters 165 have number/position/duration 166 of the viewing, which are also represented as a single entity of the viewership map 265 .
  • FIG. 4 shows the site-viewership analysis 220, where the influence of the site parameters 110 and the display parameters 125 on the audience behavior 168, and the influence of the measurement algorithm on the viewership measurement data 432, have been derived.
  • the notions of occupancy 230 , visibility 240 , and the viewership relevancy 250 have been identified; they are crucial elements extracted from the set of the system parameters that affect the audience behavior and the viewership measurement process.
  • the geometry of site 105 constrains the degree of occupancy at each floor position, so that the people in the general public 142 form the spatial distribution of the crowd 140 .
  • the display 120 determines the visibility 240 at each floor position; the visibility 240 affects how the crowd 140 converts into an audience 160 .
  • the viewership relevancy 250 captures the comprehensive effect of the site 105 (through the attractions 115) on the audience behavior 168, and of the viewership measurement algorithm 180 on the viewership measurement data 432.
  • these attractions can draw people's attention, so that people looking at them can be falsely detected by the viewership measurement algorithm 180. Therefore, both the attractive elements in the site and the limitations of the viewership measurement algorithm 180 can render the measured viewership less dependable. These floor positions should be assigned lower viewership relevancy.
  • FIG. 5 shows the viewership sampling method 320 derived from the site-viewership analysis 220 .
  • the visibility 240 is realized by the visibility map 244 on the site plan 117 ; it is computed based on the display-perception analysis 242 : the study of the influence of display parameters 125 (the position, orientation, and the size of the display) to the visibility in relation to the characteristics of the human visual perception.
  • the occupancy 230 is realized by the occupancy map 234 on the site map; it is determined based on the crowd occupancy analysis 232 (actual crowd measurement or prediction).
  • the viewership relevancy is realized by the viewership relevancy map 252 ; it is determined by comparing the initial viewership measurement 312 over time from the viewership measurement algorithm 180 and the site-wide ground truth viewership data 445 over time.
  • Both the ground truth viewership data 445 and the initial viewership measurement 312 are accumulated over a certain period of time. The period should be long enough to cover most of the variations of the crowd dynamics 155 and the audience behavior 168. Then the correlation between the ground truth viewership data 445 and the temporal changes of the initial viewership measurement 312 is computed at each floor position. If a certain floor position has a higher correlation, then it has higher viewership relevancy.
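A minimal sketch of this correlation-based relevancy computation, with hypothetical per-cell counts over a handful of time bins:

```python
import numpy as np

# Hypothetical counts over 8 time bins: local viewership measured at two
# floor cells, and the site-wide ground-truth counts for the same bins.
local = np.array([
    [3.0, 5.0, 2.0, 6.0, 4.0, 7.0, 1.0, 5.0],  # cell that tracks the total
    [2.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0],  # cell near a distraction
])
truth = np.array([30.0, 50.0, 20.0, 60.0, 40.0, 70.0, 10.0, 50.0])

# Viewership relevancy per cell: correlation of the local time series
# with the site-wide ground truth over time.
relevancy_map = [float(np.corrcoef(cell, truth)[0, 1]) for cell in local]
```

The first cell rises and falls with the site-wide counts and thus earns high relevancy; the second, whose measurements are unrelated to the true site-wide viewership, earns low relevancy.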
  • the visibility map 244 , the occupancy map 234 , and the viewership relevancy map 252 are combined to determine the viewership sampling map 260 .
  • the viewership sampling map 260 can be computed by multiplying the corresponding visibility 240 , occupancy 230 , and viewership relevancy 250 values at each floor position.
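The combination rule just described can be sketched as an elementwise product over toy floor grids; the map values are illustrative, not taken from the patent:

```python
import numpy as np

# Toy 2x3 floor grids, each normalized to [0, 1]; values are invented.
visibility = np.array([[1.0, 0.8, 0.2], [0.9, 0.7, 0.1]])
occupancy  = np.array([[0.5, 0.9, 0.6], [0.4, 0.8, 0.7]])
relevancy  = np.array([[0.9, 0.7, 0.3], [0.8, 0.6, 0.2]])

# Viewership sampling map: elementwise product of the three maps; when a
# relevancy map is too costly to estimate, the product of the first two
# maps (or the visibility map alone) can serve instead.
sampling = visibility * occupancy * relevancy

# The floor cell that sensor coverage should prioritize.
best = np.unravel_index(np.argmax(sampling), sampling.shape)
```

The sensors are then specified and aimed so that their coverage cones concentrate on the cells where this product is largest.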
  • the collection of viewership measurement data 432 and the ground-truth viewership data can be so expensive that estimating the viewership relevancy map is not feasible.
  • the viewership sampling map 260 can be determined by the visibility map 244 and the occupancy map 234 .
  • only the visibility map 244 can be used to determine the viewership sampling map 260 .
  • the sensors are placed and oriented, and the lens specifications are determined, so that the floor area covered by the sensors samples the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost of a specific placement, and the requirement to hide the sensors from public view; these are captured through the sensor placement constraints 332.
  • FIG. 6 shows the viewership relevancy map 252 computed on the site layout.
  • the viewership relevancy 250 is illustrated as different gray values in the figure; the darker regions represent the higher viewership relevancy.
  • the baseline viewership relevancy map will have the highest relevancy values around the ‘sweet spot’ near the display in terms of ease of viewing.
  • the reason the high visibility spot may have high viewership relevancy 250 is that the viewership measured at the position is likely to be more dependable; the measurement will be more likely to contain viewership from a true audience 160 than falsely detected viewership (a false-audience 161 ).
  • the relevancy measure gradually decreases away from the display and from the optimal viewing angles.
  • the viewership relevancy map has irregular bumps near an obstacle 114 (a column) and in front of a visual attraction 115 .
  • the placement of the column structure makes the display invisible from behind the column or keeps people from staying near it; therefore the region close to the column has low viewership relevancy.
  • the second bump in the viewership relevancy map is caused by another displayed object: the visual distraction (the attractions 115 in the site parameters 110) on the wall.
  • people (a false audience 161) may show viewing behavior in front of the display 120 that does not belong to the target display 120; the viewership occurring near the distraction may or may not turn out to be true viewership. Therefore, the floor position provides less reliable samples and should have less viewership relevancy.
  • FIG. 7 shows the computed viewership sampling map 260 based on the site-viewership analysis 220 of the display 120 , and the sensor placement based on it.
  • the distribution of the viewership sampling map 260 is illustrated as different gray values in the figure; the darker regions represent the higher viewership sampling measure.
  • the sensor field-of-view is typically a conic region, and the angle of coverage is determined by the focal length of the lens and the size of the imaging chip (such as CCD, CMOS). The length of the sensor coverage is determined by both the focal length of the lens and the video resolution, so that the captured video frame has large enough images of faces to be detected and processed by the viewership measurement algorithm 180 .
  • the sensor specifications (lens focal lengths, CCD sizes, etc.) should be determined, and the sensors should be placed so that the sensor coverage cones cover the viewership sampling map 260 in an optimal manner.
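The lens-and-chip geometry described above can be sketched with a simple pinhole model; the face width and minimum face-pixel count are assumed values, since the text states only the qualitative dependence:

```python
import math

def coverage(focal_mm, chip_w_mm, h_res_px, min_face_px, face_w_m=0.16):
    """Horizontal angle of view and maximum useful distance of a sensor.

    The 0.16 m face width and the pixel threshold are illustrative
    assumptions; the patent gives only the qualitative relationships.
    """
    # angle of coverage from the lens focal length and imaging-chip width
    angle = 2.0 * math.atan(chip_w_mm / (2.0 * focal_mm))
    # farthest distance at which a face still spans min_face_px pixels:
    # face_px = h_res_px * (focal * face_w) / (chip_w * distance)
    max_dist = (focal_mm * face_w_m * h_res_px) / (chip_w_mm * min_face_px)
    return math.degrees(angle), max_dist

deg, dist = coverage(focal_mm=8.0, chip_w_mm=4.8, h_res_px=1280, min_face_px=24)
```

A longer focal length narrows the cone but extends its useful length; higher video resolution extends the length without changing the angle, matching the trade-off described in the text.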
  • the area is covered by two sensors: the first viewership measurement sensor 376 , and the second viewership measurement sensor 377 .
  • the two cones represent the first viewership measurement sensor coverage area 317 and the second viewership measurement sensor coverage area 318 .
  • FIG. 8 shows the crowd-viewership analysis 270 for the purpose of deriving the viewership extrapolation method.
  • the viewership extrapolation aims to estimate the site-wide viewership data based on the sampled viewership data from the floor area covered by the viewership measurement sensors, taking into account the temporally varying nature of the audience behavior among floor positions.
  • the crowd dynamics 155 are identified as a sole independent parameter for constraining the viewership extrapolation.
  • the crowd dynamics 155 are assumed to reflect the constraints enforced by both the site parameters 110 and the temporal factor (time of the day, day of the week, seasonal, etc.), so that the viewership extrapolation map can be designed around the crowd dynamics 155 .
  • the crowd velocity tensor field 285 is identified as a mathematical representation of the crowd dynamics.
  • the viewership map 265 is considered to encode the relationship between the site-wide viewership and the local viewership, so that the site-wide viewership can be estimated from the sampled (local) viewership measurement data.
  • the viewership map 265 is conceptually the spatial distribution of the likelihood of viewership happening on the site. However, in reality it is not feasible to estimate the site-wide viewership map.
  • the viewership map 265 is realized by a time-varying constant (called the viewership extrapolation factor 352 ) that is the ratio between the sampled viewership 178 and the site-wide viewership 174 . This quantity encodes the relation between the sampled area and the total site, in terms of the likelihood of the viewership in the given time period.
  • FIG. 9 shows the method to compute crowd velocity tensor field 285 based on the crowd motion field data collected over a sampling time period 408 .
  • the sampling time period 408 should be determined so that it is long enough to collect sufficient crowd motion vectors 276 data for reliable tensor estimation, yet short enough to capture meaningful changes in crowd dynamics.
  • the sampling time period 408 also corresponds to the frequency of the extrapolation map update.
  • the crowd dynamics 155 are collected over 15 minutes, and then the viewership extrapolation map 350 is estimated and updated based on the measured crowd dynamics 155 .
  • the crowd velocity vector 275 is also collected inside a sampling grid 411 where the sampling grid 411 is predefined over the site plan 117 .
  • the crowd velocity vectors 276 accumulated over the sampling time period 408 inside a sampling grid 411 are then converted to a crowd velocity covariance matrix 280 .
  • the Eigen spectrum 282 (the two Eigenvalues and Eigenvectors) of the crowd velocity covariance matrix 280 then represent the crowd velocity tensor 287 at each sampled floor position.
  • FIG. 10 shows the process of viewership extrapolation map learning 362 .
  • the step identifies and extracts the crowd dynamics invariant feature 293 based on the analysis of how the changing crowd velocity tensor field 285 affects the viewership map 265.
  • two quantities are identified as the most relevant features to represent the relationship between the crowd dynamics 155 and the viewership map 265 : the crowd motion anisotropy 290 and the crowd average speed 277 .
  • the crowd motion anisotropy 290 is the ratio of the two Eigenvalues of the velocity tensor, and is determined for every sampling grid 411 on the floor space; it measures the ratio of the primary speed to the secondary speed: the degree of dominance of the motion direction.
  • the crowd average speed 277 is simply the average speed of the crowd motion vectors 276 at the floor position.
  • the crowd motion anisotropy 290 and the crowd average speed 277 are computed using the crowd velocity tensor 287 at each sampled floor space, and the joint distribution of both, called the crowd dynamics distribution 295 , is estimated by the histogram of the accumulated crowd motion anisotropy and speed.
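A sketch of how the crowd dynamics distribution 295 might be formed as a joint histogram; the bin ranges and per-cell feature values are assumptions for illustration:

```python
import numpy as np

# Per-grid-cell features for one sampling period: (motion anisotropy,
# average speed in m/s). The values are invented for illustration.
features = np.array([
    [4.0, 1.2], [3.5, 1.0], [1.2, 0.3], [1.1, 0.2], [3.8, 1.1],
])

# Crowd dynamics distribution: joint histogram of anisotropy and speed,
# normalized so it can be compared across sampling periods.
hist, a_edges, s_edges = np.histogram2d(
    features[:, 0], features[:, 1],
    bins=[2, 2], range=[[1.0, 5.0], [0.0, 2.0]])
hist /= hist.sum()
```

The normalized histogram (flattened) is the invariant feature vector fed to the learning machine in the subsequent step.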
  • the learned feature to viewership map 357 is used to compute the viewership map 265 , which is then used to compute the viewership extrapolation map 350 .
  • the sampled to site-wide viewership ratio 448 can be used directly as the extrapolation factor; it condenses the relation between the areas in the viewership map into a single scalar value.
  • the viewership extrapolation factor 352 should be a function of the invariant features of the crowd dynamics.
  • the process of determining the viewership extrapolation map 350 from the crowd velocity tensor field 285 is carried out on-line using the learned feature to viewership map 357 .
  • This on-line process is called viewership extrapolation map determination 360 .
  • FIG. 11 shows an exemplary embodiment of the feature to viewership map learning 358 .
  • the crowd dynamics distribution 295 is computed (via crowd velocity tensor field 285 computation).
  • the corresponding viewership map is computed based on the ground truth measurement of the viewership inside the sampled area and the site-wide area.
  • the viewership map is represented simply by the sampled to site-wide viewership ratio 448 .
  • the process is repeated many times over different time periods, and the training data 415 is accumulated.
  • the training data 415 is then fed to a learning machine 364 , so that the learning machine can estimate a sampled to site-wide viewership ratio 448 from the crowd dynamics distribution 295 .
  • the sampled to site-wide viewership ratio 448 is used as a viewership extrapolation factor 352 .
  • FIG. 12 shows the actual viewership extrapolation 342 scheme, after the viewership extrapolation map 350 has been estimated.
  • the viewership extrapolation map determination 360 step determines the correct viewership extrapolation map 350 for the crowd dynamics of the current scene.
  • the sampled viewership data 440 then extrapolates to the site-wide viewership data 435 using this viewership extrapolation map 350 .
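Since the extrapolation factor is defined as the ratio of sampled to site-wide viewership, applying it at run time amounts to a division; the numbers below are illustrative:

```python
# Illustrative run-time extrapolation step.
sampled_count = 42            # viewership measured in the covered area
extrapolation_factor = 0.30   # predicted from the current crowd dynamics

# Site-wide estimate: sampled / (sampled-to-site-wide ratio).
site_wide_estimate = sampled_count / extrapolation_factor
print(round(site_wide_estimate))  # prints 140
```

Because the factor is re-estimated every sampling period from the current crowd dynamics, the extrapolation tracks the time-changing spatial distribution of viewership.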
  • FIG. 13 shows an overview of a viewership measurement algorithm 180 in an exemplary embodiment of the present invention.
  • the viewership measurement algorithm 180 aims to automatically assess the viewership for the display 120, by processing the video input images 185 from a viewership measurement sensor 375.
  • the system takes live video input images 185 ; the face detection 190 step finds people's faces in the video.
  • the face tracking step 192 tracks individual faces while keeping their identities, estimates the 3-dimensional facial pose 194, time-stamps the appearance and disappearance, and performs the data collection 197; it effectively records the time and duration of the viewership.
  • the sensor can be placed flexibly near the display 120 , because the 3-dimensional pose estimation method can automatically correct the viewing angle offset between the viewership measurement sensor 375 and the display 120 .


Abstract

The present invention provides a comprehensive method to design an automatic media viewership measurement system, from the problem of sensor placement for an effective sampling of the viewership to the method of extrapolating spatially sampled viewership data. The system elements that affect the viewership (site, display, crowd, and audience) are identified first. The site-viewership analysis derives the crucial elements for determining an effective data sampling plan: visibility, occupancy, and viewership relevancy. The viewership sampling map is computed from the visibility map, the occupancy map, and the viewership relevancy map; the viewership measurement sensors are placed so that the sensor coverage maximizes the viewership sampling map. The crowd-viewership analysis derives a model of the viewership in relation to the system parameters so that the viewership extrapolation can effectively adapt to the time-changing spatial distribution of the viewership; this step identifies the crowd dynamics and its invariant features as the crucial elements that capture the influence of the site, the display, and the crowd on the temporal changes of viewership. The extrapolation map is formulated around these quantities, so that the site-wide viewership can be effectively estimated from the sampled viewership measurement.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention is a system and method for designing a comprehensive media audience measurement platform starting from a site, display, and crowd characterization, to a data sampling planning and a data extrapolation method.
  • 2. Background of the Invention
  • The role of digital media for advertisement in public spaces is becoming increasingly important. The task of measuring the degree of media exposure is deemed very important, both as a guide to the equipment installation (equipment kind, position, size, and orientation) and as a rating for the content programming. As the number of such displays grows, measuring the viewing behavior of the audience through human intervention can be very costly.
  • Unlike the traditional broadcast media, the viewing typically occurs in public spaces where a very large number of unknown people can assemble to comprise an audience. It is therefore hard to take surveys of the audience using traditional interviewing through telephone or mail methods, and on-site interviews can be both very costly and potentially highly biased.
  • There are technologies to perform automated measurement of viewing behavior; the viewing behavior in this context is called ‘viewership’. These automatic viewership measurement systems can be maintained at little cost once installed, and can provide a continuous stream of viewership measurement data. These systems typically employ electro-optical visual sensors, such as video cameras or infrared cameras, and provide consistent sampling of the viewing behavior based on visual observations. However, due to the high initial installation cost, sensor placement planning is extremely important. While the equipment delivers consistent viewership measurements, it is limited to measuring the view from its individual fixed position (and, in most cases, orientation). Moreover, relocating the equipment can affect the integrity of the data.
  • In a typical media display scenario, it is unrealistic, if not impossible, to detect and record all instances of viewership occurring in the site using these sensors. Any optical sensor has a limited field of coverage, and its area of coverage can also depend on its position and orientation. A large venue can be covered by multiple sensors, and their individual lens focal lengths, positions, and orientations need to be determined.
  • The data delivered from these sensors also needs to be properly interpreted, because the viewership data that each piece of equipment provides has been spatially sampled from the whole viewership at the site. The ultimate goal of the audience measurement system is to estimate the site-wide viewership for the display; it is crucial to extrapolate the site-wide viewership data from the sampled viewership data in a mathematically sound way.
  • The present invention provides a comprehensive solution to the problem of automatic media measurement, from the problem of sensor placement for effective sampling to the method of extrapolating spatially sampled data.
  • There have been prior attempts for measuring the degree of public exposure for the media, including broadcast media or publicly displayed media.
  • U.S. Pat. No. 4,858,000 of Lu, et al. (hereinafter Lu U.S. Pat. No. 4,858,000) disclosed an image recognition method and system for identifying predetermined individual members of a viewing audience in a monitored area. A pattern image signature is stored corresponding to each predetermined individual member of the viewing audience to be identified. An audience scanner includes audience locating circuitry for locating individual audience members in the monitored area. A video image is captured for each of the located individual audience members in the monitored area. A pattern image signature is extracted from the captured image. The extracted pattern image signature is then compared with each of the stored pattern image signatures to identify a particular one of the predetermined audience members. These steps are repeated to identify all of the located individual audience members in the monitored area.
  • U.S. Pat. No. 5,771,307 of Lu, et al. (hereinafter Lu U.S. Pat. No. 5,771,307) disclosed a passive identification apparatus for identifying a predetermined individual member of a television viewing audience in a monitored viewing area, where a video image of a monitored viewing area is captured first. A template matching score is provided for an object in the video image. An Eigenface recognition score is provided for an object in the video image. The Eigenface score may be provided by comparing an object in the video image to reference files. The template matching score and the Eigenface recognition score are fused to form a composite identification record from which a viewer may be identified. Body shape matching, viewer tracking, viewer sensing, and/or historical data may be used to assist in viewer identification. The reference files may be updated as recognition scores decline.
  • U.S. Pat. No. 6,958,710 of Zhang, et al. (hereinafter Zhang) disclosed systems, methods and devices for gathering data concerning exposure of predetermined survey participants to billboards. A portable transmitter is arranged to transmit a signal containing survey participant data, and a receiver located proximately to the billboard serves to receive the signal transmitted by the transmitter.
  • U.S. Pat. No. 7,176,834 of Percy, et al. (hereinafter Percy) disclosed a method directed to utilizing monitoring devices for determining the effectiveness of various locations, such as media display locations, for an intended purpose. The monitoring devices are distributed to a number of study respondents. The monitoring devices track the movements of the respondents. While various technologies may be used to track the movements of the respondents, at least some of the location tracking of the monitoring device utilizes a satellite location system, such as the global positioning system. These movements of the respondent and monitoring device at some point coincide with exposure to a number of media displays. Geo data collected by the monitoring device are downloaded to a download server, for determining to which media displays the respondent was exposed. The exposure determinations are made by a post-processing server.
  • U.S. Pat. Application No. 20070006250 of Croy, et al. (hereinafter Croy) disclosed portable audience measurement architectures and methods for portable audience measurement. The disclosed system contains a plurality of portable measurement devices configured to collect audience measurement data from media devices, a plurality of data collection servers configured to collect audience measurement data from the plurality of portable measurement devices, and a central data processing server. A portable measurement device establishes a communication link with a data collection server in a peer-to-peer manner and transfers the collected audience measurement data to the data collection server. Because the portable measurement device is not dedicated to a particular local data collection server, the portable measurement device periodically or aperiodically broadcasts a message attempting to find a data collection server with which to establish a communication link.
  • U.S. patent application Ser. No. 11/818,554 of Sharma, et al. (hereinafter Sharma) presented a method and system for automatically measuring viewership of displayed objects, such as in-store marketing elements, static signage, POP displays, various digital media, retail TV networks, and kiosks. The method counts the number of viewers who actually viewed the displayed object, as opposed to passers-by who appear in the vicinity of the displayed object but do not actually view it, and measures the duration of viewing, using a plurality of means for capturing images and a plurality of computer vision technologies, such as face detection, face tracking, and 3-dimensional face pose estimation, applied to the captured visual information of the people.
  • Lu U.S. Pat. No. 4,858,000 and Lu U.S. Pat. No. 5,771,307 introduce systems for measuring viewing behavior of broadcast media by identifying viewers from a predetermined set of viewers, based primarily on facial recognition. Sharma introduces a method to measure viewership of displayed objects using computer vision algorithms. The present invention also aims to measure the viewing behavior of an audience using visual information; however, it focuses on publicly displayed media viewed by a general, unknown audience. The present invention utilizes an automated method similar to Sharma to measure audience viewership by processing data from visual sensors. The present invention provides not only a method of measuring viewership, but also a solution to a much broader class of problems, including the data sampling plan and the data extrapolation method based on site, display, and crowd analysis, to design an end-to-end comprehensive audience measurement system.
  • All of the systems presented in Zhang, Percy, and Croy involve portable hardware and a central communication/storage device for tracking the audience and transmitting/storing the measured data. They rely on a predetermined number of survey participants to carry the devices so that their behavior can be measured based on the proximity of the devices to the displayed media. The present invention can measure the behavior of a very large audience without relying on recruited participants or carry-on devices. It can not only accurately detect the proximity of the audience to the media display, but also actually measure the viewing time and duration based on the facial images of the audience. These prior inventions can only collect limited measurement data sampled from a small number of participants; the present invention, by contrast, provides a scheme to extrapolate the measurement data sampled from the camera views, so that the whole site-wide viewership data can be estimated.
  • There have been prior attempts to estimate an occupancy map of people based on their trajectories.
  • U.S. Pat. No. 6,141,041 of Carlbom, et al. (hereinafter Carlbom) disclosed a method and apparatus for deriving an occupancy map reflecting an athlete's coverage of a playing area based on real time tracking of a sporting event. The method according to the invention includes a step of obtaining a spatiotemporal trajectory corresponding to the motion of an athlete and based on real time tracking of the athlete. The trajectory is then mapped over the geometry of the playing area to determine a playing area occupancy map indicating the frequency with which the athlete occupies certain areas of the playing area, or the time spent by the athlete in certain areas of the playing area. The occupancy map is preferably color coded to indicate different levels of occupancy in different areas of the playing area, and the color coded map is then overlaid onto an image (such as a video image) of the playing area. The apparatus according to the invention includes a device for obtaining the trajectory of an athlete, a computational device for obtaining the occupancy map based on the obtained trajectory and the geometry of the playing area, and devices for transforming the map to the camera view, generating a color coded version of the occupancy map, and overlaying the color coded map on a video image of the playing area.
  • It is one of the features of the present invention to estimate the occupancy map of the crowd with a method similar to Carlbom. The invention makes use of other concepts, such as the visibility map and the viewership relevancy map, for the purpose of characterizing audience behavior, and ultimately for the purpose of finding an optimal camera placement plan.
  • There have been prior attempts to design a camera platform or a placement method for the purpose of monitoring a designated area.
  • U.S. Pat. No. 3,935,380 of Coutta, et al. (hereinafter Coutta) disclosed a closed circuit TV surveillance system for retail and industrial establishments in which one or more cameras are movable along a rail assembly suspended from the ceiling, enabling the cameras to be selectively trained on any area of interest within the establishment. When two cameras are employed, one may be tilted and horizontally trained to observe any location within its line of sight, while the other is tilted and trained specifically to observe the amount showing on a cash register.
  • U.S. Pat. No. 6,437,819 of Loveland, et al. (hereinafter Loveland) disclosed an automated system for controlling multiple pan/tilt/zoom video cameras in such a way as to allow a person to be initially designated and tracked thereafter as he/she moves through the various camera fields of view. Tracking is initiated either by manual selection of the designated person on the system monitor through the usage of a pointing device, or by automated selection of the designated person using software. The computation of the motion control signal is performed on a computer through software using information derived from the cameras connected to the system, and is configured in such a way as to allow the system to pass tracking control from one camera to the next, as the designated person moves from one region to another. The system self-configuration is accomplished by the user's performance of a specific procedure involving the movement and tracking of a marker throughout the facility.
  • U.S. Pat. No. 6,879,338 of Hashimoto, et al. (hereinafter Hashimoto) disclosed asymmetrical camera systems, which are adapted to utilize a greater proportion of the image data from each camera as compared to symmetrical camera systems. Specifically, an outward facing camera system in accordance with one embodiment of the invention includes a plurality of equatorial cameras distributed evenly about an origin point in a plane. The outward facing camera system also includes a first plurality of polar cameras tilted above the plane. Furthermore, some embodiments of the invention include a second plurality of polar cameras tilted below the plane. The equatorial cameras and polar cameras are configured to capture a complete coverage of an environment.
  • Coutta presents a method to place multiple cameras for monitoring a retail environment, especially the cash register area. Because the area to be monitored is highly constrained, the method does not require a sophisticated methodology to optimize the camera coverage, such as the one the present invention aims to provide.
  • Loveland presents a pan/tilt/zoom camera system for the purpose of monitoring an area and tracking people one by one, while the present invention aims to find an optimal placement of cameras so that the cameras have maximal concurrent coverage of the area and of multiple people at the same time.
  • Hashimoto employs multiple outward facing cameras to have a full coverage of the surroundings, while the present invention provides a methodology to place cameras to have maximal coverage given the constraints of the number of cameras and the constraints of the site, display, and the measurement algorithm.
  • There have been prior attempts to count or monitor people in a designated area by automated means.
  • U.S. Pat. No. 5,866,887 of Hashimoto, et al. (hereinafter Hashimoto) disclosed a method to measure the number of people passing by a certain area. A plurality of rows of sensors is provided on a ceiling, each row having a plurality of distance variation measuring sensors. Each distance variation measuring sensor includes a light emitter and a light receiver arranged orthogonally to the direction in which human bodies pass. The number of passers-by is detected on the basis of the number of distance variation measuring sensors that have detected a human body. The traveling direction of human bodies is detected on the basis of the change in the distance to the human bodies measured by the distance variation measuring sensors.
  • U.S. Pat. No. 6,987,885 of Gonzalez-Banos, et al. (hereinafter Gonzalez-Banos) disclosed systems, apparatuses, and methods that determine the number of people in a crowd using visual hull information. In one embodiment, an image sensor generates a conventional image of a crowd. A silhouette image is then determined based on the conventional image. The intersection of the silhouette image cone and a working volume is determined. The projection of the intersection onto a plane is determined. Planar projections from several image sensors are aggregated by intersecting them, forming a subdivision pattern. Polygons that are actually empty are identified and removed. Upper and lower bounds of the number of people in each polygon are determined and stored in a tree data structure. This tree is updated as time passes and new information is received from image sensors. The number of people in the crowd is equal to the lower bound of the root node of the tree.
  • U.S. Pat. No. 6,697,104 of Yakobi, et al. (hereinafter Yakobi) disclosed a video based system and method for detecting and counting persons traversing an area being monitored. The method includes the steps of initialization of at least one end unit forming part of a video imaging system, the end unit having at least one camera installed, the camera producing images, within its field of view, of at least part of the area being monitored, the end unit including a plurality of counters; digitizing the images and storing the digitized images; detecting objects of potential persons from the digitized images; comparing the digitized images of objects detected in the area being monitored with digitized images stored in the working memory unit, to determine whether the detected object is a new figure that has entered the area being monitored or a known figure that has remained within it, and to determine that a figure which is no longer detected has left the area being monitored; and incrementing at least one of the plurality of counters with an indication of the number of persons that have passed through the area being monitored.
  • U.S. Pat. No. 7,139,409 of Paragios, et al. (hereinafter Paragios) disclosed a system and method for automated and/or semi-automated analysis of video for discerning patterns of interest in video streams. In a preferred embodiment, the invention is directed to identifying patterns of interest in indoor settings. In one aspect, the invention deals with the change detection problem using a Markov Random Field approach, where information from different sources is naturally combined with additional constraints to provide the final detection map. A slight modification is made to the regularity term within the MRF model to account for real discontinuities in the observed data. The defined objective function is implemented in a multi-scale framework that decreases the computational cost and the risk of convergence to local minima. The crowdedness measure used is a geometric measure of occupancy that is quasi-invariant to objects translating on the platform.
  • U.S. Pat. No. 7,203,338 of Ramaswamy, et al. (hereinafter Ramaswamy) disclosed methods and apparatus to count people appearing in an image. One disclosed method reduces objects appearing in a series of images to one or more blobs; for each individual image in a set of the images of the series of images, represents the one or more blobs in the individual image by one or more symbols in a histogram; and analyzes the symbols appearing in the histogram to count the people in the image.
  • U.S. Pat. Application No. 20060171570 of Brendley, et al. (hereinafter Brendley) disclosed a “Smartmat” system that monitors and identifies people, animals and other objects that pass through a control volume. Among other attributes, an exemplary system implementation can count, classify and identify objects, such as pedestrians, animals, bicycles, wheelchairs, vehicles, rollerbladers and other objects, either singly or in groups. Exemplary Smartmat implementations differentiate objects based on weight, footprint and floor/wall pressure patterns, such as footfall patterns of pedestrians and other patterns. The system may be applied to security monitoring, physical activity monitoring, market traffic surveys and other traffic surveys, security checkpoint/gate monitoring, traffic light activation and other device activation such as security cameras, and other monitoring applications. Smartmat may be portable or permanently installed.
  • U.S. Pat. Application No. 20070032242 of Goodman, et al. (hereinafter Goodman) disclosed methods and apparatus for providing statistics on the number, distribution and/or flow of people or devices in a geographic region based on active wireless device counts. Wireless devices may be of different types, e.g., cell phones, PDAs, etc. Wireless communications centers report the number and type of active devices in the geographic region serviced by the wireless communications center and/or indicate the number of devices entering/leaving the serviced region. The active wireless device information is correlated to one or more targeted geographical areas. Population counts are extrapolated from the device information for the targeted geographic areas. Traffic and/or flow information is generated from changes in the device counts or population estimates over time and/or from information on the number of active devices entering/leaving a region. Reports may include predictions of crowd population characteristics based on information about the types and/or number of different wireless devices being used.
  • Hashimoto and Brendley use special sensors (distance measuring sensors and pressure mat sensors, respectively) placed in a designated space, so that they can count the number of people passing and, in the case of Brendley, classify the kind of traffic. The visual sensor based technology of the present invention, by contrast, can measure not only the amount of traffic, but also its direction, and over wider areas.
  • Goodman introduces tracking of active wireless devices, such as mobile phones or PDAs, so that the people carrying these devices can be detected and tracked. The crowd estimation method of the present invention can measure the crowd traffic without requiring people to carry certain devices, and without introducing potential bias toward business people or against seniors or children.
  • Gonzalez-Banos, Yakobi, and Ramaswamy detect and count the number of people in a scene by processing video frames. One of the exemplary embodiments of the present invention utilizes top-down view cameras so that person detection and tracking can be carried out effectively, where each individual person in the crowd is tracked so that both the crowd density and direction can be estimated. These prior inventions do not concern crowd direction.
  • Paragios measures the pattern of crowd motion without explicitly detecting or tracking people. One of the exemplary embodiments of the present invention also makes use of such crowd dynamics estimation; however, it does so as part of a comprehensive system whose goal is to extrapolate the sampled viewership measurement based on the crowd dynamics.
  • There have been prior attempts to estimate a motion vector field based on video image sequences.
  • U.S. Pat. No. 5,574,663 of Ozcelik, et al. (hereinafter Ozcelik) disclosed a method and apparatus for regenerating a dense motion vector field, which describes the motion between two temporally adjacent frames of a video sequence, utilizing a previous dense motion vector field. In this method, a spatial DVF (dense motion vector field) and a temporal DVF are determined and summed to provide a DVF prediction. This method and apparatus enables a dense motion vector field to be used in the encoding and decoding process of a video sequence. This is very important since a dense motion vector field provides a much higher quality prediction of the current frame as compared to the standard block matching motion estimation techniques. The problem to date with utilizing a dense motion vector field is that the information contained in a dense motion field is too large to transmit. The invention eliminates the need to transmit any motion information.
  • U.S. Pat. No. 6,400,830 of Christian, et al. (hereinafter Christian) disclosed a technique for tracking objects through a series of images. In one embodiment, the technique is realized by obtaining at least first and second representations of a plurality of pixels, wherein at least one grouping of substantially adjacent pixels has been identified in each of the first and second representations. Each identified grouping of substantially adjacent pixels in the first representation is then matched with an identified grouping of substantially adjacent pixels in the second representation.
  • U.S. Pat. No. 6,944,227 of Bober, et al. (hereinafter Bober) disclosed a method and apparatus for representing motion in a sequence of digitized images, which derives a dense motion vector field and then vector-quantizes it.
  • The present invention makes use of a motion field computed in a manner similar to these prior inventions. The motion field computation at each floor position can be a dense optical flow computation as disclosed in Ozcelik or Bober. It can also be based on object tracking as disclosed in Christian, so that the dense motion field is computed from the motion trajectories of multiple objects.
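A minimal sketch of the trajectory-based variant (assuming per-frame (x, y) floor positions from a tracker; all names, grid sizes, and units are illustrative, not taken from the patent) averages per-frame displacements into a grid of floor cells:

```python
import numpy as np

def motion_field(tracks, grid_shape, cell_size):
    """Average crowd velocity (displacement per frame) for each floor
    cell, accumulated from object trajectories rather than dense
    optical flow."""
    vsum = np.zeros(grid_shape + (2,))
    count = np.zeros(grid_shape)
    for traj in tracks:
        pts = np.asarray(traj, dtype=float)
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            i, j = int(y0 // cell_size), int(x0 // cell_size)
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                vsum[i, j] += (x1 - x0, y1 - y0)
                count[i, j] += 1
    return vsum / np.maximum(count, 1)[..., None]  # zero where no data

# One person walking in +x at 0.5 units/frame through cell (0, 0).
field = motion_field([[(0.2, 0.5), (0.7, 0.5), (1.2, 0.5)]], (2, 2), 1.0)
```

Dense optical flow would replace the inner loop with per-pixel flow vectors projected onto the floor plane; the per-cell averaging stays the same.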
  • There have been prior attempts to represent the pattern of motion.
  • U.S. Pat. No. 6,535,620 of Wildes, et al. (hereinafter Wildes) disclosed an invention embodied in a method for representing and analyzing spatiotemporal data in order to make qualitative yet semantically meaningful distinctions among various regions of the data at an early processing stage. In one embodiment of the invention, successive frames of image data are analyzed to classify spatiotemporal regions as being stationary, exhibiting coherent motion, exhibiting incoherent motion, exhibiting scintillation, or so lacking in structure as not to support further inference. The exemplary method includes filtering the image data in a spatiotemporal plane to identify regions that exhibit various spatiotemporal characteristics. The output data provided by these filters is then used to classify the data.
  • U.S. Pat. No. 6,806,705 of van Muiswinkel, et al. (hereinafter van Muiswinkel) disclosed an imaging method for imaging a subject including fibrous or anisotropic structures and includes acquiring a 3-dimensional apparent diffusion tensor map of a region with some anisotropic structures. The apparent diffusion tensor at a voxel is processed to obtain Eigenvectors and Eigenvalues. A 3-dimensional fiber representation is extracted using the Eigenvectors and Eigenvalues. During the extracting, voxels are locally interpolated in at least a selected dimension in a vicinity of the fiber representation. The interpolating includes weighting the voxels by a parameter indicative of a local anisotropy. The interpolating results in a 3-dimensional fiber representation having a higher tracking accuracy and representation resolution than the acquired tensor map.
  • In Wildes, the image motion is estimated and represented using a plurality of spatiotemporal filter banks. In van Muiswinkel, the 3-dimensional structure is represented as a diffusion tensor map. The present invention makes use of a similar tensorial (2×2 tensor) representation of crowd motion, where the motion anisotropy is computed from the Eigenvalues of the motion tensor in the same way.
  • There have been prior attempts to learn a general mapping based on available training data.
  • U.S. Pat. No. 5,682,465 of Kil, et al. (hereinafter Kil) disclosed a function approximation method based on nonparametric estimation using a network of three layers: an input layer, an output layer, and a hidden layer. The input and output layers have linear activation units, while the hidden layer has nonlinear activation units, which have the characteristics of bounds and locality. The whole learning sequence is divided into two phases. The first phase estimates the number of kernel functions based on the user's requirement on the desired level of accuracy of the network, and the second phase is related to parameter estimation. In the second phase, a linear learning rule is applied between the output and hidden layers, and a non-linear (piecewise-linear) learning rule is applied between the hidden and input layers. Accordingly, an efficient way of function approximation is provided from the viewpoint of the number of kernel functions as well as increased learning speed.
  • U.S. Pat. No. 5,950,146 of Vapnik, et al. (hereinafter Vapnik) disclosed a method for estimating a real function that describes a phenomenon occurring in a space of any dimensionality. The function is estimated by taking a series of measurements of the phenomenon being described and using those measurements to construct an expansion that has a manageable number of terms. A reduction in the number of terms is achieved by using an approximation that is defined as an expansion on kernel functions, the kernel functions forming an inner product in Hilbert space. By finding the support vectors for the measurements, one specifies the expansion functions. The number of terms in an estimation according to the invention is generally much less than the number of observations of the real world phenomenon that is being estimated.
  • The present invention makes use of a statistical learning method similar to Kil or Vapnik, in which a large number of input-output data pairs is used to learn a regression function. In the present invention, the regression function is used to compute the viewership extrapolation mapping.
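The patent does not specify which regression method is used, so as a hedged illustration the sketch below uses Nadaraya-Watson kernel regression, a simple nonparametric regressor in the same kernel-based family as the methods of Kil and Vapnik (data and bandwidth are hypothetical):

```python
import numpy as np

def kernel_regress(X_train, y_train, X_query, bandwidth=0.5):
    """Nadaraya-Watson kernel regression: a minimal nonparametric
    regressor standing in for the learned mapping from sampled
    viewership (and crowd features) to site-wide viewership."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# Toy calibration data: site-wide viewership is about 3x the sampled count.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 6.0, 9.0, 12.0])
pred = kernel_regress(X, y, np.array([[2.5]]))
```

A support vector regressor or a small neural network could be swapped in without changing the surrounding pipeline; only the fit/predict step differs.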
  • In summary, the present invention aims to measure the media viewership using automated and unobtrusive means that employ computer vision algorithms, which is a significant departure from methods using devices that need to be carried by a potential audience. It also provides comprehensive solutions to the sensor placement and data extrapolation issues, based on the site and display analysis; these features also contrast with inventions that introduce only measurement algorithms. There have been prior inventions that address the problem of sensor placement for the purpose of monitoring people's behavior, but the present invention provides an optimal solution to the issue based on the analysis of the site and the display. The present invention utilizes crowd tracking or dense motion computation, similar to some of the prior inventions, but in a way that the estimated crowd motion, along with its tensorial formulation, represents the collective motion of the crowd in both spatial and temporal dimensions. The present invention employs statistical machine learning approaches, similarly to some of the prior inventions, to extrapolate the sampled viewership data to estimate the site-wide viewership data; the method of the present invention utilizes the learning approach to achieve time-dependent extrapolation of the viewership data based on insights from the crowd and viewership analysis.
  • SUMMARY
  • The present invention is a system and method for designing a comprehensive media audience measurement platform, from site, display, and crowd characterization through data sampling planning to data extrapolation.
  • It is one of the objectives of the first step of the processing to identify the system elements that affect the crowd and audience behaviors, so that an effective data sampling and extrapolation plan can be studied and designed around the identified parameters.
  • The step identifies the site, display, crowd, and audience as the four major system parameters that need to be considered when designing the media audience measurement solution. The step also identifies subparameters of each of these elements that affect other system variables. Relevant site parameters that affect the crowd behavior include the size, location, direction, and width of the pathways in the site; the obstacles for passers-by; and the attractions for viewers. The position and orientation of the display within the site, along with the size, brightness, and content of the display, are the display-related parameters that affect viewing behavior. The crowd and audience parameters are assumed to depend on the site and display parameters. The viewership measurement algorithm is also an important element of the system. The algorithm is assumed to be already calibrated according to the site and the display; designing or calibrating the measurement algorithm is not within the scope of the present invention.
  • It is one of the objectives of the second step of the processing to model the viewership in relation to the site and display for the purpose of deriving the data sampling plan. The step identifies the display parameters as one of the primary factors that directly affect how the crowd (potential viewers) responds to the display and the displayed media. The display parameters are therefore the first factors to consider when determining the specifications and placement for the sensors; the sensors should be placed to cover the locations where the most viewing is expected to occur. For a realistic data sampling scheme, sensors should also be placed so that they can capture the largest number of potential viewers. The step identifies the crowd occupancy as one of the parameters to consider when planning the sensor placement; it measures how much traffic each floor location will have. The notion of occupancy includes both the number and the moving speed of people; given a unit area in the floor space, the occupancy map represents how many people stay in that area for a given time period.
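The occupancy notion above can be sketched as a small computation over tracked floor positions. The sketch below (function names, grid sizes, and time units are illustrative, not taken from the patent) accumulates person-seconds of dwell time per unit floor cell:

```python
import numpy as np

def occupancy_map(tracks, grid_shape, cell_size, frame_dt):
    """Person-seconds of dwell time per floor cell, accumulated from
    per-frame (x, y) floor positions of tracked people."""
    occ = np.zeros(grid_shape)
    for traj in tracks:
        for x, y in traj:
            i, j = int(y // cell_size), int(x // cell_size)
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                occ[i, j] += frame_dt  # one frame's worth of dwell time
    return occ

# Two people sampled at 10 frames/sec: one lingers in cell (0, 0),
# one passes through three cells quickly.
tracks = [[(0.5, 0.5)] * 10,
          [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)]]
occ = occupancy_map(tracks, grid_shape=(4, 4), cell_size=1.0, frame_dt=0.1)
```

Because each frame sample adds one frame's duration, a slow-moving or lingering person contributes more dwell time to a cell than a fast passer-by, matching the definition of occupancy as combining both count and moving speed.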
  • A subset of the site parameters, namely the direction of the crowd dynamics and the attractions (potential distractions from the media display) in the site, is also crucial in determining the sensor positions and orientations when the specific method of measuring viewing depends on the direction the viewer is facing. The step identifies the viewership relevancy measure as one of the primary factors to consider when designing a sensor placement plan.
  • It is one of the objectives of the third step of the processing to compute the viewership sampling map, after computing the visibility map, the occupancy map, and the viewership relevancy map, and to place the sensors based on the viewership sampling map.
  • The visibility measure is realized by the visibility map on top of the site map; it is computed based on the display parameters (the position, orientation, and size of the display) and on the characteristics of human visual perception. The occupancy measure is realized by the occupancy map on top of the site map; it can be determined based on the site parameters alone or based on the actual measured traffic density. The viewership relevancy measure is realized by the viewership relevancy map; it is determined by comparing the local viewership counts from the system with the site-wide ground truth viewership counts. If a certain floor position shows higher correlation between the measured viewership counts and the ground truth site-wide viewership counts, then it has higher viewership relevancy.
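The relevancy computation can be illustrated as a per-cell correlation. The following sketch (array shapes and names are assumptions, not from the patent) correlates each cell's viewership time series with the site-wide ground truth:

```python
import numpy as np

def relevancy_map(local_counts, sitewide_counts):
    """Viewership relevancy per floor cell: the correlation between
    each cell's measured viewership time series (shape (T, H, W)) and
    the site-wide ground-truth series (shape (T,))."""
    T, H, W = local_counts.shape
    lc = local_counts.reshape(T, -1)
    lc = lc - lc.mean(axis=0)
    gc = sitewide_counts - sitewide_counts.mean()
    num = (lc * gc[:, None]).sum(axis=0)
    den = np.sqrt((lc ** 2).sum(axis=0) * (gc ** 2).sum())
    return (num / np.maximum(den, 1e-12)).reshape(H, W)  # 0 where no variance

# Cell (0, 0) tracks the site-wide counts; cell (0, 1) stays constant.
local = np.zeros((4, 1, 2))
local[:, 0, 0] = [1, 2, 3, 4]
local[:, 0, 1] = 2.0
rel = relevancy_map(local, np.array([10.0, 20.0, 30.0, 40.0]))
```

Cells whose counts move with the site-wide totals get relevancy near 1, while cells whose counts carry no information about the totals get relevancy near 0.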
  • After this series of analyses, the visibility map, the occupancy map, and the viewership relevancy map are combined to determine the viewership sampling map. The sensors are placed and oriented, and the lens specifications are determined, so that the floor area covered by the sensors samples the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost of a specific placement, and the requirement to hide the sensors from public view.
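The patent does not fix a formula for combining the three maps; a normalized elementwise product is one plausible choice, illustrated below with hypothetical map values:

```python
import numpy as np

def sampling_map(visibility, occupancy, relevancy):
    """One plausible combination (an assumption, not the patent's
    formula): a cell matters only if the display is visible from it,
    traffic occupies it, and its counts track site-wide viewership."""
    m = visibility * occupancy * np.clip(relevancy, 0.0, None)
    total = m.sum()
    return m / total if total > 0 else m

vis = np.array([[1.0, 0.5], [0.0, 1.0]])   # hypothetical visibility map
occ = np.array([[2.0, 2.0], [2.0, 0.0]])   # hypothetical occupancy map
rel = np.array([[1.0, 1.0], [1.0, 1.0]])   # hypothetical relevancy map
smap = sampling_map(vis, occ, rel)
```

The multiplicative form encodes that a zero in any one map (no visibility, no traffic, or no relevancy) makes a cell worthless for sampling; a weighted sum would instead let strong maps compensate for weak ones.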
  • In one of the exemplary embodiments of the present invention, the set of sensors for measuring the viewership can also be used to measure the crowd dynamics.
  • In another exemplary embodiment of the present invention, a dedicated set of sensors can be used to measure the crowd dynamics. The sensors can be ceiling mounted so that they look down and potentially provide more reliable estimates of the crowd dynamics. The sensor placement can follow the same principle as the placement of the viewership measurement sensors; the placement only needs to consider the occupancy map so that the arrangement of the sensors can achieve a maximum coverage of the crowd motion.
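Maximizing coverage under a fixed sensor budget can be approached greedily. The sketch below (candidate footprints and occupancy values are hypothetical; the patent does not prescribe this algorithm) repeatedly picks the camera placement that adds the most not-yet-covered occupancy:

```python
import numpy as np

def place_cameras(occupancy, footprints, k):
    """Greedy max-coverage sensor placement: at each step choose the
    candidate camera whose floor footprint (a boolean mask) covers the
    most occupancy not yet covered by earlier choices."""
    covered = np.zeros_like(occupancy, dtype=bool)
    chosen = []
    for _ in range(k):
        gains = [occupancy[fp & ~covered].sum() for fp in footprints]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break  # no candidate adds coverage
        chosen.append(best)
        covered |= footprints[best]
    return chosen, covered

occ = np.array([[5.0, 1.0], [0.0, 4.0]])
fps = [np.array([[True, True], [False, False]]),   # candidate 0: top row
       np.array([[False, False], [True, True]]),   # candidate 1: bottom row
       np.array([[True, False], [True, False]])]   # candidate 2: left column
chosen, covered = place_cameras(occ, fps, k=2)
```

Greedy selection is a standard heuristic for this kind of coverage objective; physical constraints or placement costs could be folded in by penalizing each candidate's gain.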
  • It is one of the objectives of the fourth step of the processing to come up with a model of the viewership in relation to the crowd behavior, to estimate the total viewership from the sampled data.
  • The arrangement of the sensors generates a certain sampled measurement of the viewership at the site. It is necessary to determine the parameters that affect the relation between the sampled measurement and the total viewership, so that one can design an extrapolation map based on these parameters. In the simplest scenario, where the viewing behavior depends only on the floor position, the extrapolation map is a simple function that is fixed over time; the function is just a constant factor by which the measured viewership is multiplied to estimate the total site-wide viewership.
  • In a more realistic scenario, the viewing behavior at a given floor position can change over time. The time-changing likelihood of the viewership as a function of the floor position at a given time instance is called the viewership map. The step identifies the crowd dynamics as the single most decisive variable that affects the viewing behavior and consequently the viewership map. It is assumed that other parameters, such as the site and time elements, are all reflected in the crowd dynamics. The present system formulates the extrapolation map as parameterized by the crowd dynamics. The system measures the crowd dynamics once for each time period, and determines the extrapolation map. The crowd dynamics parameter can be continuous (numerical) or discrete (categorical). Given that the crowd dynamics is the major factor in how the viewing behavior changes over time, it is important to have a proper mathematical representation of it. Because each floor position can have multiple directions in which the crowd frequently travels, a vector field representation is not general enough. The step identifies the crowd velocity tensor field as the mathematical representation of the crowd dynamics for the purpose of extrapolating sampled measurements. At each floor position, the distribution of the crowd velocity is accumulated during the time interval, and the covariance matrix is computed. The directions of the Eigenvectors of the matrix represent the dominant directions of the crowd motion, and the magnitudes of the Eigenvalues represent the average speeds in the corresponding directions.
  • To be able to determine the extrapolation map using the estimated tensor field, it is necessary to understand how the changing crowd velocity tensor field affects the viewership map—how the crowd dynamics change the viewing behavior. More specifically, one needs to identify a set of features from the crowd velocity tensor field that are most relevant to encoding the relation between the crowd dynamics and the viewership map. The step identifies the motion anisotropy and the crowd speed (average speed at the floor position) as the most relevant features from the crowd dynamics. The motion anisotropy is the ratio of the two Eigenvalues of the velocity tensor, and is determined at every sampled point on the floor space. It measures the ratio of the primary speed to the secondary speed: the degree of dominance of the primary motion direction. The motion anisotropy is computed using the crowd velocity tensor at each sampled floor space.
  • It is one of the objectives of the fifth step of the processing to come up with a method to compute an extrapolation map as parameterized by the crowd dynamics, which extrapolates the viewership measurement in the sampled area to the viewership estimate for the whole site.
  • In a simplified scenario where the viewership occurs uniformly across the whole site, simply multiplying the sampled viewership count by a constant (the ratio between the area of the sampled site and the area of the whole site) will effectively perform the extrapolation. In real scenarios, the distribution of viewership is not uniform and, from our assumption, it changes according to the crowd dynamics. Therefore, the multiplying factor should be a function of the invariant features of the crowd dynamics.
  • The relationship from the set of invariant features of the crowd dynamics to the extrapolation factor is estimated using the training data. The viewership data from the sampled region and the ground truth viewership measurement from the whole site are collected over some period of time. A statistical learning or regression method can be used to find the relationship from the crowd velocity anisotropy histogram to the viewership extrapolation factor.
  • DRAWINGS—FIGURES
  • FIG. 1 is an overall view of the preferred embodiment of the invention.
  • FIG. 2 shows a typical exemplary embodiment of the media measurement system deployed in a public space, and all the site elements.
  • FIG. 3 shows sub-elements within each major system parameter of the audience measurement framework.
  • FIG. 4 shows the site-viewership analysis.
  • FIG. 5 shows the sensor placement scheme derived from the site-viewership analysis.
  • FIG. 6 shows the viewership relevancy map computed on the site layout.
  • FIG. 7 shows the computed viewership sampling map and the sensor placement based on it.
  • FIG. 8 shows the crowd-viewership modeling for the purpose of viewership extrapolation.
  • FIG. 9 shows the method to compute the crowd velocity tensor field.
  • FIG. 10 shows the process of extrapolation map estimation.
  • FIG. 11 shows an exemplary embodiment of the feature to viewership map learning.
  • FIG. 12 shows the actual extrapolation scheme.
  • FIG. 13 shows an exemplary embodiment of the viewership measurement algorithm.
  • DETAILED DESCRIPTION OF THE INVENTION
  • A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the four major system parameters 100 of the media audience measurement system, and their dependencies (marked by arrows) identified by the dependency analysis 200, in the process of designing the media audience measurement system. From the dependency analysis 200, data processing methods 300 are derived. The site 105 parameter influences the position and size of the display 120 in such a way that the display 120 should be clearly visible in the targeted area on the site. The site parameters, such as the arrangement of the pathways and obstacles, also affect the crowd 140 behavior. The display 120 specifications are determined based on the site 105 characteristics; the display and content analysis 220 procedure is not within the scope of the present invention. Based on the analysis of the influence of the site 105, the display 120, and the crowd parameters on the behavior of the audience 160, the site-viewership analysis 220 stage determines the viewership sampling method 320. The crowd-viewership analysis 270 identifies the crowd dynamics as the major deciding factor from the crowd 140 for modeling the changing viewership map, and further determines the viewership extrapolation method 340 to estimate the site-wide viewership from the sampled viewership measurement. The measurement algorithm 180 is also a system parameter that affects the site-viewership analysis, and therefore affects the viewership sampling method. However, the viewership measurement algorithm 180 is not a variable element, and the design of the algorithm is not within the scope of the present invention; only its fixed influence on the viewership measurement is considered.
  • FIG. 2 shows a typical exemplary embodiment of the media measurement system deployed in a public space among all the site elements. The site 105 is the limited space where the crowd 140 resides and the display 120 is placed so that the audience 160 views the display 120. Based on the viewership sampling method 320, both the viewership measurement sensor 375 and the crowd measurement sensor 372 are placed. The viewership data 430 is sampled from the viewership measurement sensor coverage area 316, and the crowd dynamics data 420 is sampled from the crowd measurement sensor coverage area 314. The site-wide viewership data 435, including the viewership data of the non-sampled audience 442 (from the non-sampled audience 163), is extrapolated from the sampled viewership data 440 (from the sampled audience 162).
  • FIG. 3 shows sub-elements within each of the major system parameters 100 of the audience measurement framework. The site parameters 110 comprise the size 111, the pathways/obstacles 112 (their location, size, and direction), and the visual attractions 115. The display parameters 125 comprise the position/orientation/size 126 of the display. The brightness/contrast/content 127 of the display are also parameters that may affect the viewing behavior; these parameters are not taken into account for the purpose of audience measurement design, because they may be a part of what the measurement system aims to evaluate. The crowd parameters 145 have number/position/velocity as sub-parameters. These elements are represented within a single quantity, the crowd motion tensor field 285, based on the site-viewership analysis 220. The audience parameters 165 have the number/position/duration 166 of the viewing, which are also represented as a single entity, the viewership map 265.
  • FIG. 4 shows the site-viewership analysis 220, where the influence of the site parameters 110 and the display parameters 125 on the audience behavior 168, and the influence of the measurement algorithm on the viewership measurement data 432, have been derived. As products of the analysis, the notions of occupancy 230, visibility 240, and viewership relevancy 250 have been identified; they are crucial elements extracted from the set of the system parameters that affect the audience behavior and the viewership measurement process. The geometry of the site 105 constrains the degree of occupancy at each floor position, so that the people in the general public 142 form the spatial distribution of the crowd 140. The display 120 (including its position, size, and direction) determines the visibility 240 at each floor position; the visibility 240 affects how the crowd 140 converts into an audience 160. In a typical scenario, the people in the crowd 140 convert into the audience 160 more frequently when the display is more easily visible. The viewership relevancy 250 captures the comprehensive effect of the site 105 (through the attractions 115) on the audience behavior 168, and of the viewership measurement algorithm 180 on the viewership measurement data 432. At a floor position where there are other attractions (another visual display, merchandise, etc.), these attractions can draw people's attention, so that the people looking at them can be falsely detected by the viewership measurement algorithm 180. Therefore, both the attractive elements in the site and the limitations of the viewership measurement algorithm 180 can render the measured viewership less dependable. These floor positions should be assigned lower viewership relevancy.
  • FIG. 5 shows the viewership sampling method 320 derived from the site-viewership analysis 220. The visibility 240 is realized by the visibility map 244 on the site plan 117; it is computed based on the display-perception analysis 242: the study of the influence of the display parameters 125 (the position, orientation, and the size of the display) on the visibility in relation to the characteristics of the human visual perception. The occupancy 230 is realized by the occupancy map 234 on the site map; it is determined based on the crowd occupancy analysis 232 (actual crowd measurement or prediction). The viewership relevancy is realized by the viewership relevancy map 252; it is determined by comparing the initial viewership measurement 312 over time from the viewership measurement algorithm 180 with the site-wide ground truth viewership data 445 over time. Both the ground truth viewership data 445 and the initial viewership measurement 312 are accumulated over a certain period of time. The period of time should be determined to cover most of the variations of the crowd dynamics 155 and the audience behavior 168. Then the correlation between the ground truth viewership data 445 and the temporal changes of the initial viewership measurement 312 is computed at each floor position. If a certain floor position has higher correlation, then it has higher viewership relevancy.
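The per-position correlation described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the array shapes and the function name are assumptions, with the measured counts binned into a (time × floor grid) array.

```python
import numpy as np

def viewership_relevancy_map(measured, ground_truth):
    """Per-position correlation between locally measured viewership
    and the site-wide ground truth viewership over time.

    measured:     array of shape (T, H, W) -- viewership counts detected
                  at each floor position over T time periods.
    ground_truth: array of shape (T,)      -- site-wide viewership counts.
    Returns an (H, W) relevancy map in [-1, 1]; positions whose local
    counts track the site-wide counts get values near 1.
    """
    T, H, W = measured.shape
    # Normalize the ground truth to zero mean, unit (population) variance.
    gt = (ground_truth - ground_truth.mean()) / (ground_truth.std() + 1e-9)
    # Normalize each floor position's time series the same way.
    m = measured.reshape(T, -1).astype(float)
    m = (m - m.mean(axis=0)) / (m.std(axis=0) + 1e-9)
    # Pearson correlation per floor position.
    corr = (m * gt[:, None]).mean(axis=0)
    return corr.reshape(H, W)
```

Positions with constant (zero-variance) counts fall back to zero relevancy, which matches the intent that uninformative positions should not drive sensor placement.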
  • The visibility map 244, the occupancy map 234, and the viewership relevancy map 252 are combined to determine the viewership sampling map 260. In an exemplary embodiment of the present invention, the viewership sampling map 260 can be computed by multiplying the corresponding visibility 240, occupancy 230, and viewership relevancy 250 values at each floor position. In a certain scenario, the collection of viewership measurement data 432 and the ground-truth viewership data can be so expensive that estimating the viewership relevancy map is not feasible. In this case, the viewership sampling map 260 can be determined by the visibility map 244 and the occupancy map 234. In a more limited scenario where the crowd analysis or measurement is too costly, only the visibility map 244 can be used to determine the viewership sampling map 260.
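The combination of the three maps, including the fallback scenarios where the relevancy or occupancy maps are unavailable, can be sketched as below; the function name and the element-wise multiplication are an illustrative assumption consistent with the exemplary embodiment above.

```python
import numpy as np

def viewership_sampling_map(visibility, occupancy=None, relevancy=None):
    """Combine the per-floor-position maps into the viewership sampling map.
    Maps that are too costly to estimate may be omitted, mirroring the
    limited scenarios described in the text."""
    m = np.asarray(visibility, dtype=float)
    if occupancy is not None:
        m = m * np.asarray(occupancy, dtype=float)
    if relevancy is not None:
        m = m * np.asarray(relevancy, dtype=float)
    return m
```

With all three maps supplied, each floor position's sampling value is the product of its visibility, occupancy, and relevancy; with only the visibility map, the function degrades gracefully to the most limited scenario.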
  • The sensors are placed and oriented, and the lens specifications are determined, so that the floor area covered by the sensors can sample the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost of a specific placement, and the requirement to hide the sensors from public view; these are captured through the sensor placement constraints 332.
  • FIG. 6 shows the viewership relevancy map 252 computed on the site layout. The viewership relevancy 250 is illustrated as different gray values in the figure; the darker regions represent the higher viewership relevancy. The baseline viewership relevancy map will have the highest relevancy values around the ‘sweet spot’ near the display in terms of ease of viewing. The reason the high-visibility spot may have high viewership relevancy 250 is that the viewership measured at that position is likely to be more dependable; the measurement will be more likely to contain viewership from a true audience 160 than falsely detected viewership (a false audience 161). The relevancy measure gradually decreases away from the display and from the optimal viewing angles. In the figure, the viewership relevancy map has irregular bumps near an obstacle 114 (a column) and in front of a visual attraction 115. The placement of the column structure makes the display invisible from behind the column and keeps people from staying near it; therefore the region close to the column has low viewership relevancy. The second bump in the viewership relevancy map is caused by another displayed object: the visual distraction (the attractions 115 in the site parameters 110) on the wall. People (a false audience 161) may show viewing behavior in front of the display 120 that is actually directed at the distraction rather than the target display 120; the viewership occurring near the distraction may or may not turn out to be true viewership. Therefore, this floor position provides less reliable samples, and should have lower viewership relevancy.
  • FIG. 7 shows the computed viewership sampling map 260 based on the site-viewership analysis 220 of the display 120, and the sensor placement based on it. The distribution of the viewership sampling map 260 is illustrated as different gray values in the figure; the darker regions represent the higher viewership sampling measure. The sensor field-of-view is typically a conic region, and the angle of coverage is determined by the focal length of the lens and the size of the imaging chip (such as CCD, CMOS). The length of the sensor coverage is determined by both the focal length of the lens and the video resolution, so that the captured video frame has large enough images of faces to be detected and processed by the viewership measurement algorithm 180. Given the constraints of the number of sensors and of the mounting, the sensor specifications (lens focal lengths, CCD sizes, etc.) should be determined, and the sensors should be placed so that the sensor coverage cones cover the viewership sampling map 260 in an optimal manner. In the figure, the area is covered by two sensors: the first viewership measurement sensor 376, and the second viewership measurement sensor 377. The two cones represent the first viewership measurement sensor coverage area 317 and the second viewership measurement sensor coverage area 318.
  • FIG. 8 shows the crowd-viewership analysis 270 for the purpose of deriving the viewership extrapolation method. The viewership extrapolation aims to estimate the site-wide viewership data based on the sampled viewership data from the floor area covered by the viewership measurement sensors, taking into account the temporally varying nature of the audience behavior across floor positions. Based on the analysis of how the crowd behavior 150 influences the spatial and temporal viewership 172, the crowd dynamics 155 are identified as the sole independent parameter for constraining the viewership extrapolation. The crowd dynamics 155 are assumed to reflect the constraints enforced by both the site parameters 110 and the temporal factor (time of the day, day of the week, season, etc.), so that the viewership extrapolation map can be designed around the crowd dynamics 155. The crowd velocity tensor field 285 is identified as a mathematical representation of the crowd dynamics. The viewership map 265 is considered to encode the relationship between the site-wide viewership and the local viewership, so that the site-wide viewership can be estimated from the sampled (local) viewership measurement data. The viewership map 265 is conceptually the spatial distribution of the likelihood of viewership happening on the site. However, in reality it is not feasible to estimate the site-wide viewership map. In an exemplary embodiment of the present invention, the viewership map 265 is realized by a time-varying constant (called the viewership extrapolation factor 352) that is the ratio between the sampled viewership 178 and the site-wide viewership 174. This quantity encodes the relation between the sampled area and the total site, in terms of the likelihood of the viewership in the given time period.
  • FIG. 9 shows the method to compute the crowd velocity tensor field 285 based on the crowd motion field data collected over a sampling time period 408. The sampling time period 408 should be determined so that it is long enough to collect sufficient crowd motion vector 276 data for reliable tensor estimation, yet short enough to capture meaningful changes in the crowd dynamics. The sampling time period 408 also corresponds to the frequency of the extrapolation map update. In an exemplary embodiment of the present invention, the crowd dynamics 155 are collected over 15 minutes, and then the viewership extrapolation map 350 is estimated and updated based on the measured crowd dynamics 155. The crowd velocity vector 275 is collected inside a sampling grid 411, where the sampling grid 411 is predefined over the site plan 117. The crowd velocity vectors 276 accumulated over the sampling time period 408 inside a sampling grid 411 are then converted to a crowd velocity covariance matrix 280. The Eigen spectrum 282 (the two Eigenvalues and Eigenvectors) of the crowd velocity covariance matrix 280 then represents the crowd velocity tensor 287 at each sampled floor position.
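The covariance-and-Eigen-spectrum computation for one sampling-grid cell can be sketched as follows; this is an illustrative sketch (the function name is an assumption), not the exact patented procedure.

```python
import numpy as np

def crowd_velocity_tensor(velocities):
    """Compute the crowd velocity tensor for one sampling-grid cell.

    velocities: (N, 2) array of crowd motion vectors (vx, vy) accumulated
    over the sampling time period. Returns (eigenvalues, eigenvectors) of
    the velocity covariance matrix, sorted so the first eigenvalue
    corresponds to the dominant (primary) motion direction.
    """
    v = np.asarray(velocities, dtype=float)
    cov = np.cov(v.T)                   # 2x2 crowd velocity covariance matrix
    evals, evecs = np.linalg.eigh(cov)  # ascending order for symmetric matrix
    order = np.argsort(evals)[::-1]     # put the primary direction first
    return evals[order], evecs[:, order]
```

Repeating this per grid cell over the site plan yields the crowd velocity tensor field for the sampling time period.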
  • FIG. 10 shows the process of viewership extrapolation map learning 362. To be able to determine the extrapolation map using the crowd velocity tensor field 285 extracted from the crowd dynamics 155, the step identifies and extracts the crowd dynamics invariant feature 293 based on the analysis on how the changing crowd velocity tensor field 285 affects the viewership map 265. In an exemplary embodiment of the present invention, two quantities are identified as the most relevant features to represent the relationship between the crowd dynamics 155 and the viewership map 265: the crowd motion anisotropy 290 and the crowd average speed 277. The crowd motion anisotropy 290 is the ratio of the two Eigenvalues of the velocity tensor, and is determined for every sampling grid 411 on the floor space; it measures the ratio of the primary speed to the secondary speed: the degree of dominance of the motion direction. The crowd average speed 277 is simply the average speed of the crowd motion vectors 276 at the floor position. The crowd motion anisotropy 290 and the crowd average speed 277 are computed using the crowd velocity tensor 287 at each sampled floor space, and the joint distribution of both, called the crowd dynamics distribution 295, is estimated by the histogram of the accumulated crowd motion anisotropy and speed.
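The anisotropy and joint-histogram computation above can be sketched as below. Note this uses the primary-to-secondary Eigenvalue ratio as stated in this paragraph (claim 10 recites a normalized variant, dividing by the sum of the two Eigenvalues); the function name and bin count are assumptions.

```python
import numpy as np

def crowd_dynamics_features(eigvals_grid, speed_grid, bins=8):
    """Crowd motion anisotropy and average speed per sampling-grid cell,
    plus their joint histogram (the crowd dynamics distribution).

    eigvals_grid: (M, 2) primary/secondary eigenvalues of the velocity
                  tensor at each of M sampled grid cells.
    speed_grid:   (M,) average crowd speed at each cell.
    """
    lam1, lam2 = eigvals_grid[:, 0], eigvals_grid[:, 1]
    # Ratio of primary to secondary speed: dominance of the primary direction.
    anisotropy = lam1 / (lam2 + 1e-9)
    # Joint distribution of anisotropy and speed over the site.
    hist, _, _ = np.histogram2d(anisotropy, speed_grid, bins=bins)
    return anisotropy, hist / hist.sum()
```

The normalized joint histogram serves as the crowd dynamics invariant feature fed to the extrapolation-map learning.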
  • In a simplified scenario where the viewership occurs uniformly across the whole site, simply multiplying the sampled viewership count by a constant (the ratio between the sampled and the whole area) will effectively perform the extrapolation. In real scenarios, the viewership distribution is not uniform; from our assumption, it changes according to the crowd dynamics. The relationship from the set of invariant features of the crowd dynamics to the extrapolation factor is estimated using the training data. The viewership data from the sampled region and the ground truth viewership measurement from the whole site are collected over some period of time. A statistical learning or regression method can be used to find the relationship. In general, the relation between the crowd dynamics invariant feature 293 and the viewership map 265 is learned in the feature to viewership map learning 358 step. Then the learned feature to viewership map 357 is used to compute the viewership map 265, which is then used to compute the viewership extrapolation map 350. In a typical scenario, the sampled to site-wide viewership ratio 448 is ultimately the useful quantity used as an extrapolation factor; it condenses the relation between the areas in the viewership map into a single scalar value. The viewership extrapolation factor 352 should be a function of the invariant features of the crowd dynamics.
  • Once the feature to viewership map 357 has been learned off-line, the process of determining the viewership extrapolation map 350 from the crowd velocity tensor field 285 is carried out on-line using the learned feature to viewership map 357. This on-line process is called viewership extrapolation map determination 360.
  • FIG. 11 shows an exemplary embodiment of the feature to viewership map learning 358. From the crowd dynamics 155, the crowd dynamics distribution 295 is computed (via crowd velocity tensor field 285 computation). The corresponding viewership map is computed based on the ground truth measurement of the viewership inside the sampled area and the site-wide area. In the exemplary embodiment, the viewership map is represented simply by the sampled to site-wide viewership ratio 448. The process is repeated many times over different time periods, and the training data 415 is accumulated. The training data 415 is then fed to a learning machine 364, so that the learning machine can estimate a sampled to site-wide viewership ratio 448 from the crowd dynamics distribution 295. The sampled to site-wide viewership ratio 448 is used as a viewership extrapolation factor 352.
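The learning-machine step can be illustrated with a linear least-squares stand-in; the text allows any statistical learning or regression method, so the class name and the choice of plain least squares are assumptions, not the patented learner.

```python
import numpy as np

class ExtrapolationFactorModel:
    """Linear least-squares stand-in for the 'learning machine': maps a
    flattened crowd dynamics distribution (the joint histogram) to the
    sampled-to-site-wide viewership ratio. Any regression method could
    be substituted here."""

    def fit(self, distributions, ratios):
        # Design matrix with an intercept column.
        X = np.column_stack([np.ones(len(distributions)),
                             np.asarray(distributions, dtype=float)])
        self.w, *_ = np.linalg.lstsq(X, np.asarray(ratios, dtype=float),
                                     rcond=None)
        return self

    def predict(self, distribution):
        x = np.concatenate([[1.0], np.asarray(distribution, dtype=float)])
        return float(x @ self.w)
```

At run time, given a predicted ratio r (sampled viewership divided by site-wide viewership), the site-wide estimate is the sampled viewership divided by r, which is the extrapolation applied in FIG. 12.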
  • FIG. 12 shows the actual viewership extrapolation 342 scheme, after the viewership extrapolation map 350 has been estimated. Given the measured crowd velocity tensor field 285, the viewership extrapolation map determination 360 step determines the correct viewership extrapolation map 350 for the crowd dynamics of the current scene. The sampled viewership data 440 then extrapolates to the site-wide viewership data 435 using this viewership extrapolation map 350.
  • FIG. 13 shows an overview of a viewership measurement algorithm 180 in an exemplary embodiment of the present invention. The viewership measurement algorithm 180 aims to automatically assess the viewership for the display 120 by processing the video input images from a viewership measurement sensor 375. The system takes live video input images 185; the face detection 190 step finds people's faces in the video. The face tracking 192 step tracks individual faces while keeping their identities, estimates the 3-dimensional facial pose 194, time-stamps the appearance and disappearance, and performs the data collection 197; it effectively collects the time and duration of the viewership. In the exemplary embodiment, the sensor can be placed flexibly near the display 120, because the 3-dimensional pose estimation method can automatically correct the viewing angle offset between the viewership measurement sensor 375 and the display 120.
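The bookkeeping side of this pipeline (time-stamping appearance/disappearance and accumulating viewing duration from per-frame pose decisions) can be sketched as below. The detection, tracking, and pose-estimation stages themselves are not shown; the input format and function name are assumptions for illustration.

```python
from collections import defaultdict

def collect_viewership(track_frames, fps=15.0):
    """Aggregate per-frame face-track observations into viewership records.

    track_frames: iterable of (frame_index, face_id, facing_display) tuples,
    where facing_display is the decision derived from the 3-D facial pose
    estimate for that frame.
    Returns {face_id: (first_frame, last_frame, viewing_seconds)}.
    """
    records = {}
    facing_counts = defaultdict(int)
    for frame, fid, facing in track_frames:
        if fid not in records:
            records[fid] = [frame, frame]              # appearance time-stamp
        records[fid][1] = max(records[fid][1], frame)  # disappearance
        if facing:
            facing_counts[fid] += 1                    # frames spent viewing
    return {fid: (first, last, facing_counts[fid] / fps)
            for fid, (first, last) in records.items()}
```

Dividing the facing-frame count by the frame rate converts the per-track counts into viewing durations, which is the sampled viewership data fed to the extrapolation stage.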
  • While the above description contains much specificity, these should not be construed as limitations on the scope of the invention, but as exemplifications of the presently preferred embodiments thereof. Many other ramifications and variations are possible within the teachings of the invention. Thus, the scope of the invention should be determined by the appended claims and their legal equivalents, and not by the examples given.

Claims (38)

1. A method for designing an automatic media measurement system based on information from visual sensors, comprising the following steps of:
a) deriving a viewership sampling map based on visibility, occupancy, and viewership relevancy pertaining to a site and a display,
b) placing viewership measurement sensors based on the viewership sampling map,
c) measuring a sampled viewership in a first time period by processing the visual data from the viewership measurement sensors using a viewership measurement algorithm,
d) determining a time-varying viewership extrapolation map based on crowd dynamics invariant features extracted from a crowd dynamics measurement, and
e) estimating a site-wide viewership of a second time period based on the sampled viewership, using the viewership extrapolation map.
2. The method according to claim 1, wherein the method further comprises a step of deriving the viewership sampling map based on
a) a visibility map derived from a display-human perception analysis,
b) an occupancy map derived from a crowd occupancy analysis, and
c) a viewership relevancy map derived from a comparison of ground truth viewership data and an initial viewership measurement.
3. The method according to claim 2, wherein the display-human perception analysis further comprises a step of finding a relationship between display characteristics and a degree of the ease of viewing in relation to the position of a human audience relative to the display,
wherein the visibility map is computed given a position and a direction of the display.
4. The method according to claim 2, wherein the crowd occupancy analysis further comprises a step of predicting a crowd flow based on the site constraints and computing the occupancy map,
whereby the crowd occupancy analysis measures the crowd flow over time and computes the occupancy map based on time-integrated measured data.
5. The method according to claim 2, wherein the method further comprises a step of computing a statistical correlation between the ground truth viewership data and the initial viewership measurement using the viewership measurement algorithm and the viewership measurement sensors, for computing the viewership relevancy map.
6. The method according to claim 1, wherein the method further comprises a step of placing the viewership measurement sensors so that a spatial coverage of the sensors optimizes the viewership sampling map,
whereby the optimization is constrained by the sensor placement constraints imposed by the physical structure of the site, the cost of an installation, and the requirement to hide the viewership measurement sensors.
7. The method according to claim 1, wherein the crowd dynamics measurement further comprises a step of computing a crowd velocity tensor field at each sampling grid on the site floor by computing the covariance matrix of the crowd motion vectors.
8. The method according to claim 7, wherein the method further comprises a step of estimating the crowd motion vectors using dense optical field estimation.
9. The method according to claim 7, wherein the method further comprises a step of estimating the crowd motion vectors by tracking each person in the crowd and computing the trajectories.
10. The method according to claim 7, wherein the method further comprises a step of computing the crowd motion anisotropy at each sampling grid by dividing the first Eigenvalue by the sum of the two Eigenvalues.
11. The method according to claim 7, wherein the crowd dynamics measurement further comprises a step of utilizing at least a top-down view visual sensor.
12. The method according to claim 7, wherein the crowd dynamics measurement further comprises a step of utilizing the viewership measurement sensors.
13. The method according to claim 7, wherein the method further comprises a step of utilizing the histogram of the dynamic features of the crowd dynamics at each sampling grid, for the crowd dynamics invariant features.
14. The method according to claim 7, wherein the crowd dynamics invariant features extraction further comprises a step of computing the crowd dynamics distribution over the site,
whereby the crowd dynamics distribution is processed by computing the joint histogram of crowd motion anisotropy and the average speed of the crowd over the sampling grid.
15. The method according to claim 1, wherein the method further comprises a step of formulating the viewership extrapolation map as parameterized by the crowd dynamics invariant features of the crowd dynamics.
16. The method according to claim 1, wherein the method further comprises a step of determining the viewership extrapolation map by estimating the viewership map in each time period.
17. The method according to claim 16, wherein the method further comprises a step of representing the viewership map by the sampled viewership to the site-wide viewership ratio,
wherein the viewership extrapolation map is a multiplication by a time-varying constant whose value is an estimated value of the sampled viewership to the site-wide viewership ratio.
18. The method according to claim 16, wherein the method further comprises a step of learning a functional relation between the crowd dynamics and the viewership map by training a learning machine.
19. The method according to claim 1, wherein the viewership measurement algorithm further comprises the following steps of:
a) employing face detection to find faces in captured video frames,
b) employing face tracking to keep the identities of an audience in the scene, and
c) estimating a 3-dimensional facial pose to detect the viewership of an audience.
20. An apparatus for designing an automatic media measurement system based on information from visual sensors, comprising:
a) means for deriving a viewership sampling map based on visibility, occupancy, and viewership relevancy pertaining to a site and a display,
b) means for placing viewership measurement sensors based on the viewership sampling map,
c) means for measuring a sampled viewership in a first time period by processing the visual data from the viewership measurement sensors using a viewership measurement algorithm,
d) means for determining a time-varying viewership extrapolation map based on crowd dynamics invariant features extracted from a crowd dynamics measurement, and
e) means for estimating a site-wide viewership of a second time period based on the sampled viewership, using the viewership extrapolation map.
21. The apparatus according to claim 20, wherein the apparatus further comprises means for deriving the viewership sampling map based on
a) a visibility map derived from a display-human perception analysis,
b) an occupancy map derived from a crowd occupancy analysis, and
c) a viewership relevancy map derived from a comparison of a ground truth viewership data and an initial viewership measurement.
22. The apparatus according to claim 21, wherein the display-human perception analysis further comprises means for finding a relationship between display characteristics and a degree of the ease of viewing in relation to the position of a human audience relative to the display,
wherein the visibility map is computed given a position and a direction of the display.
23. The apparatus according to claim 21, wherein the crowd occupancy analysis further comprises means for predicting a crowd flow based on the site constraints and computing the occupancy map,
whereby the crowd occupancy analysis measures the crowd flow over time and computes the occupancy map based on time-integrated measured data.
24. The apparatus according to claim 21, wherein the apparatus further comprises means for computing a correlation between the ground truth viewership data and the initial viewership measurement using the viewership measurement algorithm and the viewership measurement sensors for the viewership relevancy map.
25. The apparatus according to claim 20, wherein the apparatus further comprises means for placing the viewership measurement sensors so that a spatial coverage of the sensors optimizes the viewership sampling map,
whereby the optimization is constrained by the sensor placement constraints imposed by the physical structure of the site and the cost of an installation.
26. The apparatus according to claim 20, wherein the crowd dynamics measurement further comprises means for computing a crowd velocity tensor field at each sampling grid on the site floor by computing the covariance matrix of the crowd motion vectors.
27. The apparatus according to claim 26, wherein the apparatus further comprises means for estimating the crowd motion vectors using a dense optical field estimation.
28. The apparatus according to claim 26, wherein the apparatus further comprises means for estimating the crowd motion vectors by tracking each person in the crowd and computing the trajectories.
29. The apparatus according to claim 26, wherein the apparatus further comprises means for computing the crowd motion anisotropy at each sampling grid by dividing the first Eigenvalue by the sum of the two Eigenvalues.
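The tensor-field computation of claims 26 and 29 can be sketched numerically: form the 2x2 covariance (velocity tensor) of a grid cell's motion vectors, then divide the largest eigenvalue by the sum of both eigenvalues. This is an illustrative sketch with hypothetical names, not the claimed apparatus.

```python
import numpy as np

def motion_anisotropy(motion_vectors):
    """Per-cell crowd motion anisotropy: covariance of (vx, vy) motion
    vectors, largest eigenvalue over the eigenvalue sum.
    0.5 = isotropic (undirected) motion, 1.0 = fully directed flow."""
    v = np.asarray(motion_vectors, dtype=float)   # shape (N, 2)
    cov = np.cov(v.T)                             # 2x2 velocity tensor
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]  # eigenvalues, largest first
    return eig[0] / eig.sum()

# A strongly directed flow (all motion along x) gives anisotropy near 1.0;
# motion spread evenly over four directions gives exactly 0.5.
directed = [(1.0, 0.0), (1.2, 0.01), (0.9, -0.01), (1.1, 0.02)]
print(motion_anisotropy(directed))
```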
30. The apparatus according to claim 26, wherein the crowd dynamics measurement further comprises means for utilizing at least a top-down view visual sensor.
31. The apparatus according to claim 26, wherein the crowd dynamics measurement further comprises means for utilizing the viewership measurement sensors.
32. The apparatus according to claim 26, wherein the apparatus further comprises means for utilizing the histogram of the dynamic features of the crowd dynamics at each sampling grid for the crowd dynamics invariant features.
33. The apparatus according to claim 26, wherein the crowd dynamics invariant features extraction further comprises means for computing the crowd dynamics distribution over the site,
whereby the crowd dynamics distribution is processed by computing the joint histogram of crowd motion anisotropy and the average speed of the crowd over the sampling grid.
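The joint histogram of claims 32-33 can be sketched as follows: bin the per-cell (anisotropy, average speed) pairs over the sampling grid and normalize, yielding a crowd-dynamics distribution that does not depend on the number of grid cells. Bin counts and ranges below are arbitrary illustrative choices.

```python
import numpy as np

def crowd_dynamics_histogram(anisotropy, avg_speed, bins=4, max_speed=2.0):
    """Normalized joint histogram over grid cells; anisotropy lies in
    [0.5, 1.0] by construction, speed is clipped to [0, max_speed]."""
    h, _, _ = np.histogram2d(
        anisotropy, avg_speed,
        bins=bins, range=[[0.5, 1.0], [0.0, max_speed]],
    )
    return h / h.sum()  # normalize to a distribution

# Six grid cells: two clusters (slow/undirected vs. fast/directed motion).
aniso = [0.55, 0.6, 0.95, 0.9, 0.92, 0.58]
speed = [0.2, 0.3, 1.5, 1.4, 1.6, 0.25]
hist = crowd_dynamics_histogram(aniso, speed)
print(hist.shape)  # (4, 4)
```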
34. The apparatus according to claim 20, wherein the apparatus further comprises means for formulating the viewership extrapolation map as parameterized by the crowd dynamics invariant features of the crowd dynamics.
35. The apparatus according to claim 20, wherein the apparatus further comprises means for determining the viewership extrapolation map by estimating the viewership map in each time period.
36. The apparatus according to claim 35, wherein the apparatus further comprises means for representing the viewership map by the sampled viewership to the site-wide viewership ratio,
wherein the viewership extrapolation map is a multiplication by a time-varying constant whose value is an estimated value of the sampled viewership to the site-wide viewership ratio.
37. The apparatus according to claim 35, wherein the apparatus further comprises means for learning a functional relation between the crowd dynamics and the viewership map by training a learning machine.
38. The apparatus according to claim 20, wherein the viewership measurement algorithm further comprises:
a) means for employing face detection to find faces in captured video frames,
b) means for employing face tracking to keep the identities of an audience in the scene, and
c) means for estimating a 3-dimensional facial pose to detect the viewership of an audience,
whereby the viewership measurement sensors comprise video cameras.
US12/001,611 2007-12-12 2007-12-12 Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization Abandoned US20090158309A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/001,611 US20090158309A1 (en) 2007-12-12 2007-12-12 Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization
US13/998,392 US9161084B1 (en) 2007-12-12 2013-10-29 Method and system for media audience measurement by viewership extrapolation based on site, display, and crowd characterization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/001,611 US20090158309A1 (en) 2007-12-12 2007-12-12 Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization

Publications (1)

Publication Number Publication Date
US20090158309A1 true US20090158309A1 (en) 2009-06-18

Family

ID=40755050

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/001,611 Abandoned US20090158309A1 (en) 2007-12-12 2007-12-12 Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization
US13/998,392 Active - Reinstated 2034-06-10 US9161084B1 (en) 2007-12-12 2013-10-29 Method and system for media audience measurement by viewership extrapolation based on site, display, and crowd characterization

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/998,392 Active - Reinstated 2034-06-10 US9161084B1 (en) 2007-12-12 2013-10-29 Method and system for media audience measurement by viewership extrapolation based on site, display, and crowd characterization

Country Status (1)

Country Link
US (2) US20090158309A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834641B (en) * 2014-02-11 2019-03-15 腾讯科技(北京)有限公司 The processing method and related system of network media information
WO2016153479A1 (en) * 2015-03-23 2016-09-29 Longsand Limited Scan face of video feed
US10817898B2 (en) * 2015-08-13 2020-10-27 Placed, Llc Determining exposures to content presented by physical objects
JP7009389B2 (en) 2016-05-09 2022-01-25 グラバンゴ コーポレイション Systems and methods for computer vision driven applications in the environment
US10178433B2 (en) 2016-06-24 2019-01-08 The Nielsen Company (Us), Llc Invertible metering apparatus and related methods
EP3929533A1 (en) * 2016-06-24 2021-12-29 The Nielsen Company (US), LLC Invertible metering apparatus and related methods
US9984380B2 (en) 2016-06-24 2018-05-29 The Nielsen Company (Us), Llc. Metering apparatus and related methods
US10405036B2 (en) 2016-06-24 2019-09-03 The Nielsen Company (Us), Llc Invertible metering apparatus and related methods
WO2018013438A1 (en) 2016-07-09 2018-01-18 Grabango Co. Visually automated interface integration
EP3580717A4 (en) 2017-02-10 2020-07-29 Grabango Co. DYNAMIC CUSTOMER HANDLING EXPERIENCE IN AN AUTOMATED SHOPPING ENVIRONMENT
US10778906B2 (en) 2017-05-10 2020-09-15 Grabango Co. Series-configured camera array for efficient deployment
RU2677164C2 (en) 2017-06-02 2019-01-15 Общество С Ограниченной Ответственностью "Яндекс" Method and server for creating traffic forecasts
BR112019027120A2 (en) 2017-06-21 2020-07-07 Grabango Co. method and system
US20190079591A1 (en) * 2017-09-14 2019-03-14 Grabango Co. System and method for human gesture processing from video input
US10963704B2 (en) 2017-10-16 2021-03-30 Grabango Co. Multiple-factor verification for vision-based systems
US11481805B2 (en) 2018-01-03 2022-10-25 Grabango Co. Marketing and couponing in a retail environment using computer vision
EP3847821A4 (en) * 2018-09-04 2022-05-25 Draftkings, Inc. SYSTEMS AND METHODS FOR DYNAMICALLY ADJUSTING PARAMETERS AND DISPLAY CONTENT ON A DISPLAY DEVICE
WO2020092450A1 (en) 2018-10-29 2020-05-07 Grabango Co. Commerce automation for a fueling station
US11507933B2 (en) 2019-03-01 2022-11-22 Grabango Co. Cashier interface for linking customers to virtual data
US11763706B2 (en) 2021-06-17 2023-09-19 International Business Machines Corporation Dynamic adjustment of parallel reality displays

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3935380A (en) * 1974-12-06 1976-01-27 Coutta John M Surveillance system
US4858000A (en) * 1988-09-14 1989-08-15 A. C. Nielsen Company Image recognition audience measurement system and method
US5574663A (en) * 1995-07-24 1996-11-12 Motorola, Inc. Method and apparatus for regenerating a dense motion vector field
US5682465A (en) * 1993-08-10 1997-10-28 Electronics And Telecommunications Research Institute Learning method of non-linear network for non-linear function approximation
US5771307A (en) * 1992-12-15 1998-06-23 Nielsen Media Research, Inc. Audience measurement system and method
US5866887A (en) * 1996-09-04 1999-02-02 Matsushita Electric Industrial Co., Ltd. Apparatus for detecting the number of passers
US5950146A (en) * 1996-10-04 1999-09-07 At & T Corp. Support vector method for function estimation
US6121973A (en) * 1998-08-12 2000-09-19 International Business Machines Corporation Quadrilateral mesh generation method and apparatus
US6141041A (en) * 1998-06-22 2000-10-31 Lucent Technologies Inc. Method and apparatus for determination and visualization of player field coverage in a sporting event
US6400830B1 (en) * 1998-02-06 2002-06-04 Compaq Computer Corporation Technique for tracking objects through a series of images
US6437819B1 (en) * 1999-06-25 2002-08-20 Rohan Christopher Loveland Automated video person tracking system
US6535620B2 (en) * 2000-03-10 2003-03-18 Sarnoff Corporation Method and apparatus for qualitative spatiotemporal data processing
US6697104B1 (en) * 2000-01-13 2004-02-24 Countwise, Llc Video based system and method for detecting and counting persons traversing an area being monitored
US20040056907A1 (en) * 2002-09-19 2004-03-25 The Penn State Research Foundation Prosody based audio/visual co-analysis for co-verbal gesture recognition
US6806705B2 (en) * 2002-05-15 2004-10-19 Koninklijke Philips Electronics N.V. Diffusion tensor magnetic resonance imaging including local weighted interpolation
US6879338B1 (en) * 2000-03-31 2005-04-12 Enroute, Inc. Outward facing camera system for environment capture
US6944227B1 (en) * 1999-03-16 2005-09-13 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for motion vector field encoding
US6958710B2 (en) * 2002-12-24 2005-10-25 Arbitron Inc. Universal display media exposure measurement
US6987885B2 (en) * 2003-06-12 2006-01-17 Honda Motor Co., Ltd. Systems and methods for using visual hulls to determine the number of people in a crowd
US20060171570A1 (en) * 2005-01-31 2006-08-03 Artis Llc Systems and methods for area activity monitoring and personnel identification
US7139409B2 (en) * 2000-09-06 2006-11-21 Siemens Corporate Research, Inc. Real-time crowd density estimation from video
US20070006250A1 (en) * 2004-01-14 2007-01-04 Croy David J Portable audience measurement architectures and methods for portable audience measurement
US20070032242A1 (en) * 2001-08-20 2007-02-08 Verizon Services Corp. Methods and Apparatus for Extrapolating Person and Device Counts
US7176834B2 (en) * 2001-12-31 2007-02-13 Rdp Asociates, Incorporated Satellite positioning system enabled media measurement system and method
US7203338B2 (en) * 2002-12-11 2007-04-10 Nielsen Media Research, Inc. Methods and apparatus to count people appearing in an image
US20080004953A1 (en) * 2006-06-30 2008-01-03 Microsoft Corporation Public Display Network For Online Advertising
US20100259539A1 (en) * 2005-07-21 2010-10-14 Nikolaos Papanikolopoulos Camera placement and virtual-scene construction for observability and activity recognition

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5331544A (en) * 1992-04-23 1994-07-19 A. C. Nielsen Company Market research method and system for collecting retail store and shopper market research data
US7933797B2 (en) * 2001-05-15 2011-04-26 Shopper Scientist, Llc Purchase selection behavior analysis system and method
KR100422709B1 (en) * 2001-10-05 2004-03-16 엘지전자 주식회사 Face detecting method depend on image
US7174029B2 (en) * 2001-11-02 2007-02-06 Agostinelli John A Method and apparatus for automatic selection and presentation of information
US7233684B2 (en) * 2002-11-25 2007-06-19 Eastman Kodak Company Imaging method and system using affective information
US8995715B2 (en) * 2010-10-26 2015-03-31 Fotonation Limited Face or other object detection including template matching
JP2006221329A (en) * 2005-02-09 2006-08-24 Toshiba Corp Behavior prediction device, behavior prediction method, and behavior prediction program
WO2006105655A1 (en) * 2005-04-06 2006-10-12 March Networks Corporation Method and system for counting moving objects in a digital video stream
US8645985B2 (en) * 2005-09-15 2014-02-04 Sony Computer Entertainment Inc. System and method for detecting user attention
US20080065468A1 (en) * 2006-09-07 2008-03-13 Charles John Berg Methods for Measuring Emotive Response and Selection Preference
US20090070798A1 (en) * 2007-03-02 2009-03-12 Lee Hans C System and Method for Detecting Viewer Attention to Media Delivery Devices
US20090037945A1 (en) * 2007-07-31 2009-02-05 Hewlett-Packard Development Company, L.P. Multimedia presentation apparatus, method of selecting multimedia content, and computer program product
JP4909854B2 (en) * 2007-09-27 2012-04-04 株式会社東芝 Electronic device and display processing method
US20090158309A1 (en) * 2007-12-12 2009-06-18 Hankyu Moon Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization
JP5231827B2 (en) * 2008-02-04 2013-07-10 富士フイルム株式会社 Image display device, display control method, and display control program
JP4924565B2 (en) * 2008-08-01 2012-04-25 沖電気工業株式会社 Information processing system and viewing effect measurement method
JP4623199B2 (en) * 2008-10-27 2011-02-02 ソニー株式会社 Image processing apparatus, image processing method, and program
US8111247B2 (en) * 2009-03-27 2012-02-07 Sony Ericsson Mobile Communications Ab System and method for changing touch screen functionality
JP5482206B2 (en) * 2010-01-06 2014-05-07 ソニー株式会社 Information processing apparatus, information processing method, and program
US9204836B2 (en) * 2010-06-07 2015-12-08 Affectiva, Inc. Sporadic collection of mobile affect data
EP2453386B1 (en) * 2010-11-11 2019-03-06 LG Electronics Inc. Multimedia device, multiple image sensors having different types and method for controlling the same
WO2012070010A1 (en) * 2010-11-24 2012-05-31 Stergen High-Tech Ltd. Improved method and system for creating three-dimensional viewable video from a single video stream
US10453278B2 (en) * 2012-08-27 2019-10-22 Accenture Global Services Limited Virtual access control
JP6089577B2 (en) * 2012-10-19 2017-03-08 富士通株式会社 Image processing apparatus, image processing method, and image processing program

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9161084B1 (en) * 2007-12-12 2015-10-13 Videomining Corporation Method and system for media audience measurement by viewership extrapolation based on site, display, and crowd characterization
US20090260027A1 (en) * 2008-04-07 2009-10-15 Weinblatt Lee S Monitoring TV Viewing with Programs from Cable/Satellite Providers
US9219558B2 (en) * 2008-04-07 2015-12-22 Winmore, Inc. Monitoring TV viewing with programs from cable/satellite providers
US8913781B2 (en) * 2008-08-20 2014-12-16 SET Corporation Methods and systems for audience monitoring
US20100046797A1 (en) * 2008-08-20 2010-02-25 SET Corporation Methods and systems for audience monitoring
WO2011023241A1 (en) * 2009-08-25 2011-03-03 Tele Atlas B.V. Method of creating an audience map
US8515450B2 (en) 2009-08-25 2013-08-20 Arkadiusz Wysocki Method of creating an audience map
US9491055B1 (en) * 2010-04-21 2016-11-08 Sprint Communications Company L.P. Determining user communities in communication networks
JP2011233119A (en) * 2010-04-30 2011-11-17 Nippon Telegr & Teleph Corp <Ntt> Advertisement effect measurement apparatus, advertisement effect measurement method, and program
US9041941B2 (en) 2010-05-31 2015-05-26 Universiteit Gent Optical system for occupancy sensing, and corresponding method
US8576212B2 (en) * 2010-06-30 2013-11-05 Hon Hai Precision Industry Co., Ltd. Billboard display system and method
US8990842B2 (en) * 2011-02-08 2015-03-24 Disney Enterprises, Inc. Presenting content and augmenting a broadcast
US20120204202A1 (en) * 2011-02-08 2012-08-09 Rowley Marc W Presenting content and augmenting a broadcast
US20140037147A1 (en) * 2011-03-31 2014-02-06 Panasonic Corporation Number of persons measurement device
US9390334B2 (en) * 2011-03-31 2016-07-12 Panasonic Intellectual Property Management Co., Ltd. Number of persons measurement device
WO2012162693A1 (en) * 2011-05-26 2012-11-29 WebTuner, Corporation Highly scalable audience measurement system with client event pre-processing
US20130014148A1 (en) * 2011-07-06 2013-01-10 Michael Vinson Aggregation-based methods for detection and correction of television viewership aberrations
US20130054377A1 (en) * 2011-08-30 2013-02-28 Nils Oliver Krahnstoever Person tracking and interactive advertising
US20150125037A1 (en) * 2011-09-02 2015-05-07 Audience Entertainment, Llc Heuristic motion detection methods and systems for interactive applications
US9189556B2 (en) * 2012-01-06 2015-11-17 Google Inc. System and method for displaying information local to a selected area
US20130176321A1 (en) * 2012-01-06 2013-07-11 Google Inc. System and method for displaying information local to a selected area
US10182261B2 (en) 2012-03-19 2019-01-15 Rentrak Corporation Systems and method for analyzing advertisement pods
US20160323641A1 (en) * 2012-06-22 2016-11-03 The Nielsen Company (Us), Llc Systems and methods for audience measurement analysis
CN104981844A (en) * 2013-04-25 2015-10-14 哈曼国际工业有限公司 Moving object detection
US10198623B2 (en) * 2014-04-25 2019-02-05 Beijing University Of Posts And Telecommunications Three-dimensional facial recognition method and system
US20160328601A1 (en) * 2014-04-25 2016-11-10 Tencent Technology (Shenzhen) Company Limited Three-dimensional facial recognition method and system
US9858472B2 (en) * 2014-04-25 2018-01-02 Tencent Technology (Shenzhen) Company Limited Three-dimensional facial recognition method and system
US20180165511A1 (en) * 2014-04-25 2018-06-14 Beijing University Of Posts And Telecommunications Three-dimensional facial recognition method and system
WO2016086075A1 (en) * 2014-11-24 2016-06-02 The Nielsen Company (Us), Llc Methods and apparatus to predict time-shifted exposure to media
US10867308B2 (en) 2014-11-24 2020-12-15 The Nielsen Company (Us), Llc Methods and apparatus to project ratings for future broadcasts of media
US11657413B2 (en) 2014-11-24 2023-05-23 The Nielsen Company (Us), Llc Methods and apparatus to project ratings for future broadcasts of media
US12373855B2 (en) 2014-11-24 2025-07-29 The Nielsen Company (Us), Llc Methods and apparatus to project ratings for future broadcasts of media
US10657386B2 (en) 2015-01-14 2020-05-19 Nec Corporation Movement state estimation device, movement state estimation method and program recording medium
US20220327839A1 (en) * 2015-01-14 2022-10-13 Nec Corporation Movement state estimation device, movement state estimation method and program recording medium
US12327412B2 (en) * 2015-01-14 2025-06-10 Nec Corporation Movement state estimation device, movement state estimation method and program recording medium
US10755108B2 (en) * 2015-01-14 2020-08-25 Nec Corporation Movement state estimation device, movement state estimation method and program recording medium
US11354683B1 (en) 2015-12-30 2022-06-07 Videomining Corporation Method and system for creating anonymous shopper panel using multi-modal sensor fusion
US10262331B1 (en) 2016-01-29 2019-04-16 Videomining Corporation Cross-channel in-store shopper behavior analysis
WO2017138808A3 (en) * 2016-02-12 2017-11-02 Moving Walls Sdn Bhd A system and method for providing viewership measurement of a particular location for digital-out-of-home media networks
US10587922B2 (en) 2016-02-12 2020-03-10 Moving Walls Sdn Bhd System and method for providing viewership measurement of a particular location for digital-out-of-home media networks
US10963893B1 (en) 2016-02-23 2021-03-30 Videomining Corporation Personalized decision tree based on in-store behavior analysis
JP2024075625A (en) * 2016-03-18 2024-06-04 日本電気株式会社 Video monitoring system, video monitoring method, and program
US20170295402A1 (en) * 2016-04-08 2017-10-12 Orange Content categorization using facial expression recognition, with improved detection of moments of interest
US9918128B2 (en) * 2016-04-08 2018-03-13 Orange Content categorization using facial expression recognition, with improved detection of moments of interest
US10387896B1 (en) 2016-04-27 2019-08-20 Videomining Corporation At-shelf brand strength tracking and decision analytics
US10354262B1 (en) 2016-06-02 2019-07-16 Videomining Corporation Brand-switching analysis using longitudinal tracking of at-shelf shopper behavior
EP3349142A1 (en) * 2017-01-11 2018-07-18 Kabushiki Kaisha Toshiba Information processing device and method
US10586115B2 (en) 2017-01-11 2020-03-10 Kabushiki Kaisha Toshiba Information processing device, information processing method, and computer program product
EP3732638A1 (en) * 2017-12-31 2020-11-04 Bull Sas System and method for managing mass gatherings
EP3732638B1 (en) * 2017-12-31 2025-09-24 Bull Sas System and method for managing mass gatherings
US11188743B2 (en) * 2018-06-21 2021-11-30 Canon Kabushiki Kaisha Image processing apparatus and image processing method
CN109829119A (en) * 2018-12-26 2019-05-31 成都熊谷油气科技有限公司 Information method for early warning based on LBS big data in wisdom pipe network
CN110708600A (en) * 2019-11-26 2020-01-17 三星电子(中国)研发中心 Method and apparatus for identifying valid viewers of a television
CN116453048A (en) * 2023-03-24 2023-07-18 南京邮电大学 Crowd counting method combined with learning attention mechanism
CN116482692A (en) * 2023-04-11 2023-07-25 浙江大学 Radar moving target detection method based on online tensor robust principal component analysis

Also Published As

Publication number Publication date
US9161084B1 (en) 2015-10-13

Similar Documents

Publication Publication Date Title
US20090158309A1 (en) Method and system for media audience measurement and spatial extrapolation based on site, display, crowd, and viewership characterization
US10915131B2 (en) System and method for managing energy
Wirz et al. Probing crowd density through smartphones in city-scale mass gatherings
US8098888B1 (en) Method and system for automatic analysis of the trip of people in a retail space using multiple cameras
US7930204B1 (en) Method and system for narrowcasting based on automatic analysis of customer behavior in a retail store
US8660895B1 (en) Method and system for rating of out-of-home digital media network based on automatic measurement
JP4794453B2 (en) Method and system for managing an interactive video display system
US8438175B2 (en) Systems, methods and articles for video analysis reporting
US8009863B1 (en) Method and system for analyzing shopping behavior using multiple sensor tracking
US8706544B1 (en) Method and system for automatically measuring and forecasting the demographic characterization of customers to help customize programming contents in a media network
JP2004534315A (en) Method and system for monitoring moving objects
JP2004531842A (en) Method for surveillance and monitoring systems
Nam et al. Intelligent video surveillance system: 3-tier context-aware surveillance system with metadata
US8270705B2 (en) System and method for monitoring motion object
JP2011008571A (en) Passer-by fluidity data generating device, content distribution controlling device, passer-by fluidity data generating method, and content distribution controlling method
WO2008071000A1 (en) System and method for obtaining and using advertising information
JP2011233119A (en) Advertisement effect measurement apparatus, advertisement effect measurement method, and program
CA2670021A1 (en) System and method for estimating characteristics of persons or things
CN112085534B (en) Method, system and storage medium for analysis of attention degree
JP2017524208A (en) System and method for determining demographic information
CN119359359A (en) An improved 4S store customer flow statistics algorithm based on pedestrian re-identification
CN113903066A (en) Track generation method, system and device and electronic equipment
Khan et al. Comparative study of various crowd detection and classification methods for safety control system
CN111160945A (en) Advertisement accurate delivery method and system based on video technology
JP5272214B2 (en) Advertisement effect index measuring device, advertisement effect index measuring method and program

Legal Events

Date Code Title Description
AS Assignment

Owner name: VIDEOMINING CORPORATION, PENNSYLVANIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOON, HANKYU;SHARMA, RAJEEV;JUNG, NAMSOON;REEL/FRAME:021067/0627;SIGNING DATES FROM 20080416 TO 20080425

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION