HK1252983B

HK1252983B - Method and system for determining and displaying contextual targeted content

Info

Publication number: HK1252983B
Application number: HK18112343.9A
Authority: HK
Inventors: Zeev Neumeier; Edo Liberty
Original assignee: 构造数据有限责任公司
Priority date: 2009-12-29
Filing date: 2018-09-26
Publication date: 2021-05-07

Description

Method and system for determining and displaying contextually targeted content

This application is a divisional of invention patent application No. 201080064471.9, filed November 18, 2010 (international application No. PCT/US2010/057153), entitled "Method for identifying video segments on a networked TV and method for displaying contextually targeted content."

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 61/182,334, filed May 29, 2009, and U.S. Provisional Patent Application No. 61/290,714, filed December 29, 2009.

BACKGROUND ART

The present invention relates generally to systems and methods for identifying a video segment being displayed by a television system, and to systems and methods for providing contextually targeted content to the television system based on that identification. As used herein, a "television system" includes, but is not limited to, televisions such as network televisions and connected televisions, as well as devices used with or incorporated into a television, such as set-top boxes (STBs), DVD players, and video recorders. A "television signal" includes a signal representing video data and audio data that are broadcast together (with or without metadata) to form the picture and sound components of a television program or commercial. "Metadata" refers to data relating to the audio/video data of a television signal.

Recent advances in fiber-optic and digital transmission technology have enabled the television industry to increase channel capacity and to offer some degree of interactive television service. These advances stem largely from the industry's combination of substantial computer processing power (in the form of set-top boxes) with the high information-carrying capacity of optical fiber cable. Using set-top boxes, the industry has successfully expanded channel selection and achieved a degree of interactivity.

Interactive television (ITV) technology was developed to turn the television set into a two-way information distribution mechanism. ITV offers various marketing, entertainment, and educational capabilities, for example allowing a user to order an advertised product or service or to compete against contestants in a game show. Interactive functions are typically controlled by the set-top box, which executes an interactive program written for the television broadcast. These functions are usually displayed on the TV screen as icons or menus, from which the user makes selections with a remote control or keyboard.

In one prior-art approach, interactive content is incorporated into the broadcast stream (also referred to herein as the "channel/network feed"). As used herein, "broadcast stream" means the broadcast signal (analog or digital) received by a television, regardless of how it is transmitted, e.g., by antenna, satellite, or cable. One prior-art technique for incorporating interactive content into the broadcast stream inserts triggers into the broadcast stream of a particular program. Program content into which triggers have been inserted is sometimes called enhanced program content, or an enhanced television program or video signal. A trigger alerts the set-top box that interactive content is available. The trigger contains information about the available content and the memory location of that content. A trigger may also contain user-perceptible text, displayed for example at the bottom of the screen, that prompts the user to take an action or choose among several options.
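By way of illustration only, a trigger of the kind described above can be modeled as a small record carrying a description of the available content, the location where that content is stored, and optional prompt text. The `TriggerRecord` type, its field names, and `handle_trigger` are hypothetical; no particular trigger standard's format is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerRecord:
    """Hypothetical trigger embedded in an enhanced program's broadcast stream."""
    content_info: str                   # description of the available interactive content
    content_location: str               # where the set-top box can fetch that content
    prompt_text: Optional[str] = None   # user-perceptible text, e.g. shown at the bottom of the screen


def handle_trigger(trigger: TriggerRecord) -> str:
    """Return the text a set-top box might display when it encounters a trigger."""
    if trigger.prompt_text:
        return f"{trigger.prompt_text} [{trigger.content_info}]"
    # Without prompt text, simply note that interactive content exists.
    return f"Interactive content available: {trigger.content_info}"
```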

A connected TV is a television that connects to the Internet through the viewer's home network (wired or wireless) and runs interactive, web-based applications. Several connected-TV platforms compete, the most prominent being Yahoo's (see http://connectedtv.yahoo.com/). The basic features common to these platforms are (1) an Internet connection and (2) the ability to run software on top of the TV display. Several TVs with these capabilities are already on the market (for example, models from LG, Samsung, and Vizio), more will arrive soon, and industry observers predict that within a few years all new TVs will have these features.

Connected TVs can run application platforms such as the Yahoo Widget Engine, Flash Lite (see http://www.adobe.com/products/flashlite/), Google Android, or proprietary platforms. A developer community builds widgets that run on these platforms. A widget is an element of a graphical user interface that displays an arrangement of information the user can change, such as a window or a text box. A widget engine is the operating system on which widgets run. As used herein, "widget" refers to code that runs on a widget engine. Each widget runs in its own system process, so closing one widget does not affect the others. A widget engine may include a feature called a "dock," which displays an icon for each available widget. TV widgets let viewers interact with the TV, for example by requesting more information about the subject they are watching, without switching context from the TV program to a separate application. In response to such a request, the requested information is displayed on the TV screen as part of the widget's visual representation.

Currently, almost no television (connected or otherwise) has metadata about the content the viewer is watching. Although bits and pieces of such information exist along the content delivery pipeline, by the time a program reaches the screen only the video and audio remain. In particular, the TV knows neither which channel or program the viewer is watching nor what the program is about. (The channel and program information that viewers sometimes see on screen is grafted on by the set-top box, often from incomplete information.) This barrier, which stems from the basic structure of the television content distribution industry, is a serious problem because it limits what interactive TV can do.

There is therefore a need for improved systems and methods for identifying the video segment a viewer is watching, and for improved systems and methods for providing contextually targeted content to networked television systems.

SUMMARY OF THE INVENTION

The present invention relates to systems and methods for identifying a video segment being displayed on the screen of a television system. In particular, identification data for the segment being viewed can be used to extract a viewer's reaction (such as changing the channel) to a specific video segment (such as an advertisement) and to report the extracted information as metrics.
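As a minimal sketch of how identification results could be turned into such a metric, assume a TV reports a sequence of samples, each giving the identified segment (or `None` if unidentified) and the time offset into it. The sample format, the `tune_away_count` function, and the `tolerance` parameter are hypothetical; a real metric would likely need more context than this.

```python
from typing import List, Optional, Tuple

# Each sample: (identified segment id or None, time offset into that segment in seconds).
Sample = Tuple[Optional[str], float]


def tune_away_count(samples: List[Sample], segment_id: str,
                    segment_length: float, tolerance: float = 1.0) -> int:
    """Count how often the viewer left `segment_id` before it finished.

    A switch to anything else (including an unidentified sample, None) while the
    last observed offset was more than `tolerance` seconds before the segment's
    end is treated as a channel change rather than the segment ending naturally.
    """
    count = 0
    for (prev_id, prev_off), (cur_id, _) in zip(samples, samples[1:]):
        left_segment = prev_id == segment_id and cur_id != segment_id
        if left_segment and prev_off < segment_length - tolerance:
            count += 1
    return count
```

For a 30-second advertisement, a switch observed 5 seconds in counts as a tune-away, while a switch observed 29.5 seconds in is treated as the ad having ended.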

In some embodiments, a video segment is identified by sampling a subset of the pixel data displayed on the screen (or of the associated audio data) and then searching a content database for similar pixel (or audio) data. In other embodiments, a video segment is identified by extracting audio or image data associated with the segment and then searching a content database for similar audio or image data. In alternative embodiments, a video segment is identified by processing its associated audio data with existing automatic speech recognition techniques. In still other alternative embodiments, a video segment is identified by processing metadata associated with it.

The present invention further relates to systems and methods for providing contextually targeted content to an interactive television system. Contextual targeting requires not only identifying the video segment being displayed but also determining the playback time, or time offset, of the specific portion of the segment currently on screen. The terms "playback time" and "time offset" are used interchangeably herein to mean the time elapsed from a fixed point, such as the start of a particular television program or commercial.

More specifically, the present invention includes technology that detects what is playing on a connected TV, infers the subject matter of that content, and interacts with the viewer accordingly. The technology disclosed herein overcomes the limited capabilities of interactive TV by harnessing server processing power over the Internet, which enables a variety of business models, including: (1) providing additional content (director's commentary, character biographies, etc.) that engages viewers more deeply in the program they are watching; (2) providing content-specific "buy now" functions (product placement, "buy this song," etc.); and (3) giving viewers access to web-based promotional features (games, contests, etc.).

In some embodiments, the video segment is identified and the time offset determined by sampling a subset of the pixel data (or audio data) displayed on the screen and then searching a content database for similar pixel (or audio) data. In other embodiments, they are determined by extracting audio or image data associated with the segment and searching a content database for similar audio or image data. In alternative embodiments, they are determined by processing the segment's audio data with existing automatic speech recognition techniques. In still other alternative embodiments, they are determined by processing metadata associated with the segment.
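The pixel-sampling approach can be illustrated with a deliberately simple sketch. It assumes that each frame is reduced to the RGB values at a fixed set of sample points (a "cue"), and that the content database maps a (segment id, time offset) pair to the cue stored for that moment. The sample points, the squared-distance measure, and the `max_distance` threshold are all hypothetical, and the brute-force nearest-neighbor search stands in for, and does not reproduce, the fuzzy-cue tracking algorithm of the Appendix.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Frame = List[List[Tuple[int, int, int]]]  # rows of (R, G, B) pixels
Cue = Tuple[int, ...]                      # flattened RGB values of the sampled pixels

# Hypothetical fixed sample positions (row, column) shared by the TV and the database.
SAMPLE_POINTS: Sequence[Tuple[int, int]] = [(0, 0), (1, 2), (3, 1), (2, 3)]


def sample_cue(frame: Frame) -> Cue:
    """Sample a small subset of pixels from the displayed frame."""
    return tuple(c for r, col in SAMPLE_POINTS for c in frame[r][col])


def match_cue(cue: Cue, database: Dict[Tuple[str, float], Cue],
              max_distance: int = 300) -> Optional[Tuple[str, float]]:
    """Return (segment_id, time_offset) of the closest stored cue, or None if none is close."""
    best, best_dist = None, max_distance + 1
    for position, stored in database.items():
        dist = sum((a - b) ** 2 for a, b in zip(cue, stored))  # squared Euclidean distance
        if dist < best_dist:
            best, best_dist = position, dist
    return best
```

Because the match returns a stored (segment, offset) pair, a single lookup yields both the segment identity and the time offset.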

As described below, the software that identifies the video segment being viewed on a connected TV may reside in the television system that includes the connected TV. In alternative embodiments, part of the video segment identification software resides in the television system and part resides on a server connected to the television system via the Internet.

Other aspects of the invention are set forth in the following description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a connected TV according to one embodiment of the present invention.

FIGS. 2-4 show respective exemplary widgets that can be displayed on a connected TV in response to detecting a video segment related to a given subject.

FIG. 5 shows an exemplary pop-up window that appears when the viewer clicks a related field displayed on the widget shown in FIG. 4.

FIGS. 6-10 are block diagrams of systems according to other embodiments of the present invention.

FIGS. 11-16 are graphs, described in the Appendix, that illustrate an algorithm for tracking video transmission using fuzzy cues.

Reference will now be made to the drawings, in which like reference numerals denote similar elements throughout the several views.

DETAILED DESCRIPTION

In a first embodiment of the present invention, shown in FIG. 1, a system 100 includes a connected TV 10. Connected TV 10 is typically connected to a global computer network 20 through a processor 12. Although processor 12 is shown external to connected TV 10, those skilled in the art will appreciate that processor 12 may instead be located inside the TV. As used herein, "global computer network" includes the Internet. Although FIG. 1 does not show a television signal source, it should be understood that connected TV 10 receives a television signal carrying a program stream.

A content widget running on processor 12 includes computer software that identifies, in real time, the video segment being displayed on connected TV 10. Optionally, the content widget also includes software that determines the time offset from the start of the segment. The segment and offset together are referred to as the "position." In response to identifying the segment being viewed, and optionally determining the time offset, the widget presents the viewer with a pop-up window 110 listing categories related to the topics most relevant to that segment. The viewer can select a topic in window 110, and the widget running on processor 12 then retrieves more information about the selected topic from global computer network 20, for example by submitting the topic to a search engine, an online encyclopedia, or a custom search algorithm. Alternatively, the position can be submitted to a custom algorithm that displays content predetermined for that position in the program.
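One possible form of such a position-based custom algorithm is a per-segment schedule of content items, each tagged with the span of offsets it applies to. The sketch below assumes that data layout (sorted, non-overlapping `(start, end, content)` entries); the `Schedule` type and `content_for_position` function are hypothetical illustrations, not the disclosed implementation.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical per-segment schedule: (start_offset_s, end_offset_s, content), sorted by start.
Schedule = Dict[str, List[Tuple[float, float, str]]]


def content_for_position(schedule: Schedule, segment_id: str,
                         offset: float) -> Optional[str]:
    """Return the content item tagged for `offset` seconds into `segment_id`, if any."""
    entries = schedule.get(segment_id, [])
    starts = [start for start, _, _ in entries]
    i = bisect_right(starts, offset) - 1  # last entry starting at or before the offset
    if i >= 0 and offset < entries[i][1]:
        return entries[i][2]
    return None
```

Binary search keeps the lookup fast even when a long program carries many tagged spans.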

Several content-detection methods are available. In one embodiment, the widget examines metadata supplied with the program stream that describes its subject matter; for example, the widget may examine closed caption data transmitted with the television signal. In another embodiment, the widget uses speech recognition software and maintains a table of how many times each detected word occurs over a period of time. In another embodiment, the widget uses voice signature detection or image detection software to identify what is shown in the program stream. In yet another embodiment, the widget sends video or audio cues to a server that performs the detection and contextual targeting (one embodiment of suitable video pixel cue processing software is described in detail below with reference to FIG. 10). Likewise, there are several methods for determining topic relevance.
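The word-count table used in the speech recognition embodiment can be kept over a sliding time window, as in the following sketch. It assumes the recognizer emits `(timestamp, word)` pairs; the `WordStats` class and the default 60-second window are hypothetical choices.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class WordStats:
    """Count recognized words over a sliding time window (window length is an assumption)."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, word: str) -> None:
        """Record a word emitted by the speech recognizer at `timestamp`."""
        word = word.lower()
        self.events.append((timestamp, word))
        self.counts[word] += 1
        # Drop words that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n: int = 3) -> List[Tuple[str, int]]:
        """Most frequent words currently in the window."""
        return self.counts.most_common(n)
```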

In response to identifying the video segment being viewed, and optionally determining a time offset, the TV widget retrieves additional information determined to be relevant to the subject matter of that segment. Retrieving additional information or advertisements determined to be contextually relevant to the segment being viewed is referred to hereinafter as "contextual targeting." A contextually targeted TV widget will now be described with reference to FIGS. 2-5.

The contextually targeted TV widget is software that runs on top of connected TV 10. The software extracts enough information to identify what the viewer is watching and, based on that information, determines additional information on topics likely to interest the viewer. This additional information is displayed on top of the program currently shown on the TV screen. It is retrieved over a network (typically the Internet) from aggregators such as Wikipedia or Google. Some of it is provided to the user free of charge as added value, and some consists of paid advertisements or promotional offers.

To illustrate the viewer experience provided by the system of the present invention, several scenarios are described below.

In the first scenario, shown in FIG. 2, the system provides basic integration by capturing keywords and targeting general information and keyword advertising. In this scenario, the viewer is watching the hit series Gossip Girl. At some point the characters discuss spending the summer in the Hamptons. The contextually targeted TV widget detects the keyword "Hamptons." In response, the widget dock flashes or highlights the new keyword, as indicated by the light shading in FIG. 2. If the viewer opens the dock (i.e., wants to interact with the widget), the viewer can expand the widget. If the viewer does not open the dock and prefers to look later, the keyword is saved; the viewer can always scroll back through the last N keywords, where N is an integer such as 50. When the viewer sees something of interest, clicking the highlighted keyword expands the widget into sidebar mode while the TV program continues to play in the background. The expanded widget (shown in FIG. 2) displays targeted information about the Hamptons, such as (1) an overview: "The Hamptons are a popular seaside resort area. Many parts of the Hamptons are a playground for the wealthy who own vacation homes there..."; and (2) a Google map showing the location of the Hamptons.
In this embodiment, the expanded widget also displays news about the Hamptons, such as: "Although Madonna's assistant claims that pesky paparazzi ruined her weekend at her Hamptons vacation home, police responded that..." If the viewer clicks other displayed fields, the expanded widget also shows targeted advertisements, such as Hamptons real estate, New York City tours, and vacation packages to other beach destinations that, like the Hamptons, attract the wealthy.
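The saved-keyword behavior in this scenario amounts to a bounded history. The sketch below uses a `deque` with `maxlen=N` so the oldest keywords fall off automatically. The `KeywordHistory` name is hypothetical, and moving a repeated keyword to the newest slot instead of storing it twice is an assumed design choice, not something the scenario specifies.

```python
from collections import deque
from typing import Deque, List


class KeywordHistory:
    """Keep the last N detected keywords so the viewer can scroll back through them."""

    def __init__(self, n: int = 50) -> None:
        self._items: Deque[str] = deque(maxlen=n)  # oldest entries drop off automatically

    def add(self, keyword: str) -> None:
        # Move a repeated keyword to the newest position instead of storing it twice.
        if keyword in self._items:
            self._items.remove(keyword)
        self._items.append(keyword)

    def latest(self) -> str:
        """The keyword the dock would flash or highlight."""
        return self._items[-1]

    def scroll(self) -> List[str]:
        """Keywords from newest to oldest."""
        return list(reversed(self._items))
```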

In the second scenario, shown in FIG. 3, the system provides more sophisticated integration by capturing trending keywords and leveraging relationships and promotions with one-click merchants such as Amazon. In this scenario, the viewer is watching a live program such as Entertainment Tonight or The Daily Show, in which people discuss the behavior of a celebrity, for example Britney Spears. The contextually targeted TV widget captures the keyword "Britney." In response, the widget dock flashes or highlights the new keyword, as shown in FIG. 3. If the viewer opens the dock, the widget sidebar displays targeted information such as: (1) a short biography: "...Britney Jean Spears (born December 2, 1981) is an American singer and entertainer. Britney ranks 8th among the best-selling female singers..."; (2) her albums with "Buy Now" buttons, including ...Baby One More Time (1999), Oops!... I Did It Again (2000), Britney (2001), In the Zone (2003), Blackout (2007), and Circus (2008); (3) news about Britney (taking the previous keyword "Hamptons" into account where possible); (4) links to images or YouTube search results for Britney's photos or music videos; and (5) a promotional advertisement for Britney's latest concert, with an interactive calendar for the viewer's geographic area and a "Buy Now" button. Clicking "Buy Now" opens a screen on which the viewer can complete the transaction in as few steps as possible (for example, using an Amazon ID/password combination). After the first purchase, the viewer need not re-enter personal information for future purchases.

In the third scenario, shown in FIG. 4, the system provides custom integration by capturing keywords, or video/audio segments, purchased by particular partners to enrich their media campaigns. In this scenario, the viewer is watching an advertisement, for example a car commercial. The advertisement gives the viewer the option of activating the widget dock to "see more." The contextually targeted widget captures the advertisement through a predetermined tag or phrase and makes it a priority event for a set period of time. The widget sidebar then displays a microsite that invites the user into a deeper brand experience, for example additional webisodes that extend the ad's characters and themes; additional information such as specifications or feature comparisons; and interactive features such as games, sweepstakes, or customization tools. For example, clicking the "Drive a MINI with Jason Bourne" field displays the Bourne Conspiracy MINI microsite (shown in FIG. 5).

In some embodiments, the video segment is identified and the time offset determined by sampling a subset of the pixel data (or associated audio data) displayed on the screen and then searching a content database for similar pixel (or audio) data. In other embodiments, they are determined by extracting audio or image data associated with the segment and searching a content database for similar audio or image data. In alternative embodiments, they are determined by processing the segment's associated audio data with existing automatic speech recognition techniques. In still other alternative embodiments, they are determined by processing metadata associated with the segment.

In other embodiments, no time offset needs to be determined; the system need only react to the appearance of a keyword or phrase. For example, one software package that can run on processor 12 of FIG. 1 comprises four basic modules: (1) a metadata collection module that collects metadata for whatever content is being watched; (2) a topic/keyword extraction module that analyzes the collected metadata to extract what the program is about; (3) a useful-information contextual targeting module that collects additional information based on the extracted topics/keywords and presents it to the user; and (4) an advertising contextual targeting module that collects revenue-generating information based on the extracted topics/keywords and presents it to the user (including "Buy Now" buttons, keyword advertising, and promotional campaigns).
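The four modules can be pictured as a simple chain of functions, one per module. In this sketch the metadata sources, the vocabulary-based extraction rule, and the dictionary lookups are hypothetical placeholders for real caption feeds, topic analysis, encyclopedia or search services, and ad servers.

```python
from typing import Dict, List, Set


def collect_metadata(closed_captions: str, program_info: str) -> str:
    """(1) Metadata collection: merge whatever text is available for the program."""
    return f"{program_info} {closed_captions}"


def extract_keywords(metadata: str, vocabulary: Set[str]) -> List[str]:
    """(2) Topic/keyword extraction: keep words found in a targeting vocabulary, in order."""
    keywords: List[str] = []
    for raw in metadata.split():
        word = raw.strip(".,!?\"'").lower()
        if word in vocabulary and word not in keywords:
            keywords.append(word)
    return keywords


def target_information(keywords: List[str], info: Dict[str, str]) -> List[str]:
    """(3) Useful-information targeting: look up free value-added content."""
    return [info[k] for k in keywords if k in info]


def target_ads(keywords: List[str], ads: Dict[str, str]) -> List[str]:
    """(4) Advertising targeting: look up revenue-generating offers."""
    return [ads[k] for k in keywords if k in ads]
```

Keeping the modules separate mirrors the text: the same extracted keywords feed both the free-information module and the advertising module.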

There are many sources of metadata about the content a viewer is watching, including: (1) program information provided by the network/station or by third parties (such as TV guides); (2) the closed caption feed; (3) the audio feed of the program being watched (run through speech recognition); (4) the video feed of the program being watched (run through image recognition); (5) additional channels carried on top of the audio or video feed of the program being watched; and (6) custom content manually attached to particular programs and program segments.

In one specific embodiment, processor 12 collects metadata from a combination of the audio feed of the program being watched and its closed caption information, where available. A speech recognition engine processes the audio stream to extract keywords. The dictionary language model of the speech recognition algorithm is carefully maintained so that it efficiently extracts keywords or phrases worth targeting; for example, the dictionary is weighted toward proper nouns such as "Britney" and "Yankees" and away from words such as "green" and "hot." The closed caption data (a text stream) is processed by a keyword/topic analysis engine.
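The effect of such weighting can be approximated with a post-filter on recognized words, sketched below. Note that this is a simplification: a real system would weight the recognizer's language model itself, whereas this sketch only rescores words after recognition. The `WEIGHTS` table, `DEFAULT_WEIGHT`, and `threshold` values are hypothetical.

```python
from typing import Dict, List

# Hypothetical weights: proper nouns worth targeting are boosted, generic words suppressed.
WEIGHTS: Dict[str, float] = {"britney": 5.0, "yankees": 5.0, "hamptons": 4.0,
                             "green": 0.1, "hot": 0.1}
DEFAULT_WEIGHT = 1.0


def targetable_keywords(recognized: List[str], threshold: float = 3.0) -> List[str]:
    """Score each recognized word by frequency x weight and keep the high scorers."""
    scores: Dict[str, float] = {}
    for word in recognized:
        w = word.lower()
        scores[w] = scores.get(w, 0.0) + WEIGHTS.get(w, DEFAULT_WEIGHT)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, s in ranked if s >= threshold]
```

Even a frequently repeated generic word such as "hot" stays below the threshold, while a single mention of "Britney" passes it.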

Four possible configurations of the metadata collection component are now described. In the embodiment shown in FIG. 6, the data processing required for both metadata collection and contextual targeting is performed on a central server connected to the remote TV via a wide area network such as the Internet. In the embodiment shown in FIG. 7, the processing required for metadata collection is performed on the TV, while the processing required for contextual targeting is performed on the central server (connected to the remote TV via a wide area network). In the embodiment shown in FIG. 8, the processing required for contextual targeting is performed on the central server (connected to the remote TV via a wide area network), and the processing required for metadata collection is performed on an offline server connected to the contextual targeting server, for example via a local area network. Note that in this embodiment, TV client 18 sends cues to the server's channel identification component 26 to determine which program is being watched and, therefore, which metadata applies to that TV. Note also that although FIG. 8 shows an audio stream as the input to TV client 18, a video input could be used instead. Alternatively, where possible, a hybrid approach combining the two could be used.

Referring now to FIG. 6, system 200 includes a remote TV 10 connected to a central server 220 via a wide area network (not shown). TV 10 includes a multi-pixel screen, a processor, and an interface for receiving television signals. The TV's processor is programmed with software comprising a widget platform or engine 14 and a TV client 18 that communicates with server 220. The widget engine can display any of a plurality of widgets 16 on the TV screen. Server 220 has one or more processors and software including a speech recognition module 22 and a contextual targeting module 24. In this embodiment, client 18 receives the audio stream of the television program being watched, compresses it, and sends the compressed audio stream to server 220. The information TV client 18 sends to server 220 may also include the closed caption stream (if any) or other metadata and/or channel information. Speech recognition module 22 processes the compressed audio stream to determine the channel being watched.

In the configuration shown in FIG. 6, a lightweight client must be built for the TV operating system (typically Linux). The client captures the audio stream of the TV 10, compresses the signal, and streams it over the network to a waiting server 220. A token is attached to the audio stream so that the server can associate the stream with a specific user and/or TV. The server then runs a real-time speech recognition algorithm 22 on the stream, or performs a closed-caption search, to extract keywords/phrases. Several suitable speech recognition packages are available, for example the open-source package Sphinx-4 (http://cmusphinx.sourceforge.net/sphinx4/), a speech recognizer written entirely in the Java™ programming language. The keywords/phrases can be attached to the relevant user/TV and used by the contextual targeting module 24 to deliver content (e.g., third-party content feeds) to the widget 16. The server 220 stores user information in a persistent user database 30, including the TV ID number, the programs watched on that TV, and the selections made using the widgets 16 displayed on the TV.
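The token mechanism can be sketched as a simple framing step. The layout below (a fixed 16-byte token field, a 4-byte big-endian payload length, then the compressed audio bytes) is a hypothetical wire format chosen for illustration; the text does not specify one, and `frame_audio_chunk` and `read_frame_token` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_LEN 16  /* hypothetical fixed size of the device token field */

/* Client side: prefix a compressed audio chunk with the device token and its
   length so the server can attribute the stream to a user/TV. Returns the
   number of bytes written, or 0 if `out` is too small. */
size_t frame_audio_chunk(const char *token, const uint8_t *payload, uint32_t len,
                         uint8_t *out, size_t out_cap)
{
    size_t need = TOKEN_LEN + 4 + (size_t)len;
    if (out_cap < need) return 0;

    size_t tlen = strlen(token);
    if (tlen > TOKEN_LEN) tlen = TOKEN_LEN;  /* truncate over-long tokens */
    memset(out, 0, TOKEN_LEN);               /* zero-pad short tokens */
    memcpy(out, token, tlen);

    out[TOKEN_LEN + 0] = (uint8_t)(len >> 24);  /* big-endian length */
    out[TOKEN_LEN + 1] = (uint8_t)(len >> 16);
    out[TOKEN_LEN + 2] = (uint8_t)(len >> 8);
    out[TOKEN_LEN + 3] = (uint8_t)len;
    if (len > 0) memcpy(out + TOKEN_LEN + 4, payload, len);
    return need;
}

/* Server side: recover the NUL-terminated token and the payload length.
   Returns 0 on success, -1 if the frame is truncated. */
int read_frame_token(const uint8_t *frame, size_t frame_len,
                     char token_out[TOKEN_LEN + 1], uint32_t *len_out)
{
    if (frame_len < TOKEN_LEN + 4) return -1;
    memcpy(token_out, frame, TOKEN_LEN);
    token_out[TOKEN_LEN] = '\0';
    *len_out = ((uint32_t)frame[TOKEN_LEN] << 24) |
               ((uint32_t)frame[TOKEN_LEN + 1] << 16) |
               ((uint32_t)frame[TOKEN_LEN + 2] << 8) |
               (uint32_t)frame[TOKEN_LEN + 3];
    if (frame_len < TOKEN_LEN + 4 + (size_t)*len_out) return -1;
    return 0;
}
```

With the token in hand, the server can key its lookups in the user database 30 by device rather than by network connection.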

Referring now to FIG. 7, the system 300 includes a remote TV 10 connected to a central server 320 via a wide area network (not shown). In this configuration, a heavier (though still essentially lightweight) client 18 is built for the TV operating system. The client collects metadata (including closed-caption data) or captures the TV's audio stream, runs a comparatively limited algorithm to determine relevant topics, and sends only the extracted keywords/phrases to the server 320. In one embodiment, the speech recognition client 18 periodically checks the server 320 for updates to its dictionary/language model. Several packages provide lightweight speech recognition for mobile and embedded devices, which, like TVs, lack powerful CPUs. One example is PocketSphinx (http://cmusphinx.sourceforge.net/html/compare.php), a mobile version of the aforementioned open-source Sphinx-4 package. The keywords/phrases are attached to the relevant user/TV and used by the contextual targeting module 24 to deliver content (e.g., third-party content feeds) to the widget 16. As before, the server 320 stores user information in the persistent user database 30, including the TV ID number, the programs watched on that TV, and the selections made using the widgets 16 displayed on the TV.
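The client-side filtering step (send only keywords, not audio) can be sketched as a dictionary lookup over the recognizer's text output. This assumes the embedded recognizer (e.g., PocketSphinx) has already produced a transcript, which is not shown. `spot_keywords` is a hypothetical helper, and it handles single-word keywords only, not multi-word phrases.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the word [w, w + wlen) with a dictionary entry. */
static int word_equals(const char *w, size_t wlen, const char *entry)
{
    if (strlen(entry) != wlen) return 0;
    for (size_t i = 0; i < wlen; i++)
        if (tolower((unsigned char)w[i]) != tolower((unsigned char)entry[i])) return 0;
    return 1;
}

/* Counts occurrences of each single-word dictionary keyword in `transcript`
   (a word is a maximal run of letters/digits). counts[k] receives the count
   for dict[k]; the return value is the number of distinct keywords found,
   i.e. the entries the client would report to the server. */
size_t spot_keywords(const char *transcript, const char *const *dict,
                     size_t n_dict, int *counts)
{
    for (size_t k = 0; k < n_dict; k++) counts[k] = 0;
    const char *p = transcript;
    while (*p) {
        while (*p && !isalnum((unsigned char)*p)) p++;  /* skip separators */
        const char *start = p;
        while (*p && isalnum((unsigned char)*p)) p++;   /* consume one word */
        size_t wlen = (size_t)(p - start);
        if (wlen == 0) break;                            /* reached the end */
        for (size_t k = 0; k < n_dict; k++)
            if (word_equals(start, wlen, dict[k])) counts[k]++;
    }
    size_t distinct = 0;
    for (size_t k = 0; k < n_dict; k++)
        if (counts[k] > 0) distinct++;
    return distinct;
}
```

Keeping the dictionary small is what makes this feasible on a TV-class CPU, which is why the client refreshes it from the server rather than shipping a full vocabulary.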

Referring now to FIG. 8, the system 400 includes a remote TV 10, connected to a central server 420 via a wide area network (not shown), and one or more offline servers 410 connected to the server 420 via a local area network (not shown). The software of the server 420 includes a contextual targeting module 24 and a channel identification module 26. In the configuration shown in FIG. 8, the offline server 410 continuously receives feeds corresponding to a set of TV channels and runs heavier, more powerful algorithms to assign metadata tags to each channel. A lightweight TV client 18 (part of the TV operating system) sends the server 420 only enough information to allow it to identify the channel being watched. As shown in FIG. 8, the TV client receives the audio stream of the TV program being watched and extracts audio data to be sent to the server 420, which identifies the channel by audio signature detection or other means. Alternatively, the TV client 18 can send pixel or audio cues (comprising batches of pixel or audio data samples) to the server 420, which can process the fuzzy pixel or audio cues using the techniques disclosed in the appendix to identify the channel being watched. For example, the TV client 18 may send pixel cues to the server 420, in which case the channel identification module 26 may include suitable pixel cue processing software, such as that described below with reference to FIG. 10. In another alternative embodiment, the TV client 18 may receive the video stream and extract image data to be sent to the server 420, enabling the channel identification module 26 to identify the channel being watched using image recognition software.

Based on the information received from the TV client, the server 420 can readily identify the video segment being watched and the time offset from the start of the program. The online server 420 matches the channel being watched against the channels tagged by the offline server 410 and passes the corresponding keywords/phrases previously supplied by the offline server to the contextual targeting module 24. The keywords/phrases are attached to the relevant user/TV and used by the contextual targeting module 24 to deliver content (e.g., third-party content feeds) to the widget 16. The offline server need not run in real time; it periodically (e.g., hourly or daily) loads the metadata, including the keywords/phrases, into the memory of the server 420. Furthermore, although the offline server 410 captures live network feeds, viewers may watch the same content hours or days later, so the online server 420 matches the channel and the corresponding time index to both live and previously broadcast programs. The offline server 410 and the channel identification component 26 are configured to retain program cues and metadata for a specified period (typically several days to a week).
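The retention window and the live-or-time-shifted lookup can be illustrated with a small in-memory tag table. This is a hypothetical layout: `TagEntry` and `lookup_keywords` are illustrative names, times are assumed to be seconds on a common clock, and a linear scan stands in for whatever index a real server would use.

```c
#include <stddef.h>

typedef struct {
    int channel;           /* hypothetical channel identifier */
    long start_s;          /* broadcast start of the tagged span, seconds */
    long end_s;            /* broadcast end (exclusive), seconds */
    const char *keywords;  /* tags produced by the offline server */
} TagEntry;

/* Returns the keywords for `channel` at broadcast time `viewed_at_s`, which may
   be live or hours/days in the past (DVR playback). Content older than the
   retention window is no longer available, so NULL is returned. */
const char *lookup_keywords(const TagEntry *table, size_t n, int channel,
                            long viewed_at_s, long now_s, long retention_s)
{
    if (viewed_at_s < now_s - retention_s) return NULL;  /* expired */
    for (size_t i = 0; i < n; i++) {
        if (table[i].channel == channel &&
            viewed_at_s >= table[i].start_s && viewed_at_s < table[i].end_s)
            return table[i].keywords;
    }
    return NULL;
}
```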

Still referring to FIG. 8, another configuration periodically loads batches of program transmissions into the offline server 410, in addition to (or instead of) the live feeds. Because the offline server 410 retains cues and metadata for a specified period, this configuration builds up a database of program cues and metadata, which is particularly useful for users who watch content on a DVR or DVD. Note that batch loading can also be used for programs that are not available in the network feeds.

In another aspect of the present invention, a hybrid approach can be used. A practical setup should be a hybrid that applies each of the above methods where it fits best, because TV configurations vary so widely that no single approach suits all users. For viewers whose channel/network data is available (e.g., users watching TV online or downloading content from on-demand services), or whose audio input has been identified as a known channel, the offline computation method (FIG. 8) may be preferred. Where there is no broadband connection over which to stream audio, the TV's speech recognition client (FIG. 7) can process the most popular keywords. For viewers watching DVDs or using DVRs, streaming to the server (FIG. 6) provides better/deeper analysis. Where channel detection or identification of the specific program is feasible, only cues are sent to the server for matching (FIG. 8). In each case, the system selects a method based on criteria such as the success rate of each method, customer records and value to advertisers, and the available bandwidth or computing power.
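The selection logic can be sketched as a small decision function. The ordering below follows the cases listed in the paragraph; the struct fields are hypothetical, and the success-rate, advertiser-value, and bandwidth criteria mentioned in the text are reduced here to two boolean inputs for clarity.

```c
#include <stdbool.h>

/* Enumerator values mirror the figure numbers of the three configurations. */
typedef enum {
    METHOD_STREAM_AUDIO   = 6,  /* FIG. 6: stream audio to the server */
    METHOD_CLIENT_SPOTTER = 7,  /* FIG. 7: on-TV speech recognition, send keywords */
    METHOD_CUE_MATCHING   = 8   /* FIG. 8: send cues, reuse offline metadata */
} TargetingMethod;

typedef struct {
    bool channel_known;  /* channel/network data available, or audio matched a known feed */
    bool has_broadband;  /* enough upstream bandwidth to stream audio */
} ViewingContext;

TargetingMethod choose_method(ViewingContext ctx)
{
    if (ctx.channel_known) return METHOD_CUE_MATCHING;     /* cheapest: only cues are sent */
    if (!ctx.has_broadband) return METHOD_CLIENT_SPOTTER;  /* audio cannot be streamed */
    return METHOD_STREAM_AUDIO;                             /* e.g., DVD/DVR playback */
}
```

A production selector would more likely score each method per device using the criteria the text lists, but the fall-through order captures the same preferences.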

In another embodiment of the present invention, shown in FIG. 9, the system 500 includes a server 520 that maintains a user-specific database 30, which communicates with an offline contextual targeting server 510. The offline contextual targeting server 510 receives input from the database 30 as well as channel or network feeds and content feeds, and provides information to the server 520, which then passes the processed information to the connected TV 10.

Specifically, the system 500 includes the TV 10 (with the widget platform 14 and the client 18), an offline server 510 (on which a contextual targeting engine runs), a server 520 (including an audio input channel matching module 530 and a speech recognition contextual targeting engine 540), and a viewer database 30. The system 500 is programmed to determine what the viewer is watching, which it does using the audio stream entering the TV 10. TVs can be set up in many different ways, and most setups "lose" the most valuable metadata, such as closed captions, channel information, and program descriptions. In particular, most set-top box configurations that connect to the TV via an HDMI cable pass little metadata. Audio and video inputs are the lowest common denominator, present in every setup. FIG. 9 shows the TV client 18 and the audio input channel matching module 530, which uses the audio stream to detect the channel being watched by audio signature detection or other means. Alternatively, the TV client 18 can send pixel cues (comprising batches of pixel data samples) to the server 520, and the channel matching module 530 can process the fuzzy pixel cues using the techniques disclosed in Appendix 1 to identify the channel being watched. In the specific embodiment shown in FIG. 9, the TV client module 18 is a lightweight client for the TV operating system that captures the TV's audio stream, compresses the signal, and sends it to the server 520 via a global computer network (not shown in FIG. 9). A token is attached to the audio stream so that the server 520 can associate the stream with a specific TV/viewer.

The server 520 receives the audio stream from the TV 10, associates it with the given TV/viewer, and sends it to the audio input channel matching module 530 or, if the stream cannot be handled by module 530, to the speech recognition contextual targeting engine 540 for tagging. Once the targeted content has been tagged, the server 520 sends it back to the widget 16 of the TV 10.

The server 520 includes the audio input channel matching module 530, which attempts to match the audio stream of the TV 10 against live feeds of several hundred of the most popular cable TV channels nationwide. If the viewer is watching a known channel, the stream is tagged using metadata collected by the contextual targeting engine running on the offline server 510. If not, it is processed by the speech recognition contextual targeting engine 540. Because engine 540 serves as a fallback, there is no need to monitor every channel in the country. In addition, because the process is continuous, channel changes are detected, and topics/keywords are determined for each channel the viewer tunes to, increasing the number of tags.
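The match-or-fall-back decision can be sketched as a nearest-feed search with a rejection threshold. The fixed-length "fingerprint" vectors and the L1 distance are hypothetical stand-ins for whatever audio signature module 530 actually uses. A return value of -1 means the stream would go to the speech recognition engine 540.

```c
#include <stddef.h>

#define FP_LEN 8  /* hypothetical fingerprint length */

/* `feeds` holds n_feeds fingerprints of FP_LEN values each, stored row by row.
   Returns the index of the closest live feed whose L1 distance to the client's
   fingerprint is below `threshold`, or -1 to signal fallback to speech
   recognition. */
int match_feed(const double client_fp[FP_LEN], const double *feeds,
               size_t n_feeds, double threshold)
{
    int best = -1;
    double best_d = threshold;
    for (size_t i = 0; i < n_feeds; i++) {
        double d = 0.0;
        for (size_t j = 0; j < FP_LEN; j++) {
            double diff = client_fp[j] - feeds[i * FP_LEN + j];
            d += diff < 0 ? -diff : diff;
        }
        if (d < best_d) { best_d = d; best = (int)i; }
    }
    return best;
}
```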

The contextual targeting engine is software running on the offline server 510; alternatively, multiple offline servers may be used. The offline server 510 is connected to live feeds of popular cable and network channels across the country. These feeds can be configured to expose useful metadata that is missing at the client TV. Specifically, closed-caption data, program descriptions, and channel genre drive the contextual targeting engine, which tags each channel with real-time topic/keyword information. Because each channel needs to be processed only once (rather than once per client TV), more powerful algorithms can be run in real time. The metadata dictionary is continuously refined using the actual responses of viewers who use the widgets. Viewer responses are sent from the widget 16 to the server 520, stored in the user database 30, and forwarded to the offline server 510, as indicated by the arrow labeled "User Widget Feedback" in FIG. 9. The contextual targeting engine gives priority to keywords that widget viewers interact with and demotes content that is ignored, yielding a more accurate metadata dictionary for the content currently showing on the TV 10.
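The promote/demote behaviour can be illustrated with a simple exponential update of a per-keyword weight. The update rule, the `KeywordScore` type, and the `rate` parameter are hypothetical; the text does not specify how the engine weights keywords.

```c
/* Hypothetical per-keyword weight kept in [0, 1]. */
typedef struct {
    const char *keyword;
    double weight;
} KeywordScore;

/* Move the weight toward 1 when viewers interacted with widget content for
   this keyword, and decay it toward 0 when the content was ignored.
   `rate` in (0, 1] controls how quickly feedback takes effect. */
void apply_feedback(KeywordScore *k, int engaged, double rate)
{
    if (engaged)
        k->weight += rate * (1.0 - k->weight);
    else
        k->weight -= rate * k->weight;
}
```

Because each update is a convex combination of the old weight and 0 or 1, the weight stays in [0, 1] for any rate in (0, 1].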

As noted above, the server 520 includes the speech recognition contextual targeting engine 540. For viewers tuned to an unrecognized channel, playing a DVD, or using a DVR, a real-time speech recognition solution extracts topics/keywords. Because a speech recognition system can work only with a limited dictionary, the key to making this solution practical is the concise topic/keyword dictionary maintained by the offline server 510, i.e., a dictionary of keywords that are popular in current TV programs and that widget viewers have already engaged with. The system is particularly useful for DVR viewers, because recorded content was typically broadcast recently (in many cases only hours earlier), has already been tagged in the offline process, and has had its metadata refined using widget feedback. Still referring to FIG. 9, the targeting performed by the widget 16 of the TV 10 using all of the above components is the only part of the system the viewer actually sees. Unlike a conventional Konfabulator widget, which must be updated periodically to get a fresh look and feel, the contextually targeted widget 16 can change its presentation at any time according to the targeted content.

A preferred embodiment of the present invention will now be described. Although in the disclosed system the content widget engine and the client software (which generates pixel cue points and the like) reside on the connected TV, it is within the scope of the invention for them to reside instead on a device, such as an STB, DVR, or DVD player, that supplies TV signals to the connected TV. Likewise, although the disclosed system samples and processes pixel values, the values sampled and processed could instead be audio values or metadata such as closed captions.

Referring to FIG. 10, the main components of the system according to the preferred embodiment are a TV system 52 and a first server 54 that communicate over a network such as the Internet. The system further includes a second server 56 (hereinafter the "offline server"), which communicates with the first server 54 over a network, preferably a local area network (LAN).

FIG. 10 shows the functional components of the TV system 52, the first server 54, and the offline server 56. The TV system 52 includes a TV having a multi-pixel screen (not shown in FIG. 10) and at least one other component (not shown) that provides a TV signal to the TV; for example, the other component may be an STB, a DVR, or a DVD player. The TV system 52 also includes a processor (not shown in FIG. 10), which may be incorporated in the TV or in at least one of the other components of the TV system.

Still referring to FIG. 10, the processor of the TV system is programmed with software comprising a widget engine 58 and a client 60. As noted above regarding the location of the processor, the widget engine and client software may reside on the TV or on at least one other component of the TV system. It should be understood that the widget engine and the client software may also run on different processors within the TV system 52.

In either case, the client module 60 is programmed to sample pixel data and to generate, from the sampled pixel data, HTTP requests that are sent to the server 54. Each HTTP request includes a time stamp and a plurality of strings of RGB (or hexadecimal) values, referred to as "pixel cue points." Each pixel cue point comprises a subset of the RGB (or hexadecimal) values making up a respective "frame" of the video segment being displayed on the TV screen, as described in detail below. [In practice, digital video does not have frames; the system disclosed herein samples at regular times, e.g., once per time interval T.]

Note that the server 54 includes a processor and memory (neither shown in FIG. 10). FIG. 10 does show that the server 54 includes at least the following software components: a channel identification module 62, a contextual targeting module 64, and a database 66 comprising an indexed content library. The channel identification module and the contextual targeting module run on the server processor. The library data itself must be kept in a consistent, available, and quickly searchable storage format. The simplest approach is to load the library into an in-memory data structure on the server; another is to store most of the library on disk.

The channel identification module 62 comprises a point management submodule and a user management submodule (not shown in FIG. 10). The point management submodule searches the database 66 in two ways: (1) it searches the entire library for a given set of points and returns all possible matches; and (2) given a set of points and a candidate location, it determines whether the user is in fact at the location indicated by the currently stored data. [A "user" is a unique device, such as a TV, identified by a globally unique ID.]
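The two query modes can be sketched with a brute-force linear scan. This is a deliberately simple stand-in: the actual search uses the path tracking/PPLEB technique described in the appendix. The `LibraryCue` layout, the squared-Euclidean distance, and the `radius_sq` tolerance are assumptions for illustration.

```c
#include <stddef.h>

#define POINTS_PER_CUE 13  /* RGB positions per sample, as in the example request */

typedef struct { int r, g, b; } Rgb;

typedef struct {
    long segment_id;  /* which video segment this cue came from */
    double offset_s;  /* time offset within that segment */
    Rgb points[POINTS_PER_CUE];
} LibraryCue;

/* Squared Euclidean distance between two cues over all RGB components. */
static long cue_distance_sq(const Rgb *a, const Rgb *b)
{
    long d = 0;
    for (size_t i = 0; i < POINTS_PER_CUE; i++) {
        long dr = a[i].r - b[i].r, dg = a[i].g - b[i].g, db = a[i].b - b[i].b;
        d += dr * dr + dg * dg + db * db;
    }
    return d;
}

/* Mode (1): scan the whole library and write the indices of all cues within
   radius_sq into `matches` (up to max_matches); returns how many were found. */
size_t find_candidates(const LibraryCue *lib, size_t n, const Rgb *query,
                       long radius_sq, size_t *matches, size_t max_matches)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_matches; i++)
        if (cue_distance_sq(lib[i].points, query) <= radius_sq)
            matches[found++] = i;
    return found;
}

/* Mode (2): check whether the query still matches the cue at the user's
   expected location (an index into the library). */
int confirm_location(const LibraryCue *lib, size_t n, size_t expected,
                     const Rgb *query, long radius_sq)
{
    return expected < n && cue_distance_sq(lib[expected].points, query) <= radius_sq;
}
```

Mode (2) touches a single entry, which is why it is much cheaper than mode (1) once a user's location is known.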

The user management submodule maintains user sessions and uses the results from the point management submodule to match a location (within the video segment being watched) to a specific user. It also holds the configuration and tolerances used to decide how and when a match is made, and it includes a session manager. The user management submodule matches user locations based on the HTTP requests received from the TV client module 60. If session data already exists for the user ID, the HTTP request is routed to the user management submodule instance to which the session is attached (session persistence). The user management submodule checks the user's record and decides what kind of search request, if any, to issue to the point management submodule. If a probable location for the user is known, it calls the point management submodule to perform a brute-force search around that location; if the location is unknown, it calls the point management submodule to perform a probabilistic global search. The user management submodule stores the updated location in the user session.
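The user management decision can be sketched as follows. The session fields, the `max_gap` tolerance, and the assumption that the client's `time` values advance in seconds are all hypothetical. The point is only that a known, recent location leads to a local search around the predicted offset, and anything else falls back to a global search.

```c
typedef enum { SEARCH_LOCAL, SEARCH_GLOBAL } SearchKind;

typedef struct {
    long segment_id;  /* -1 when the user's location is unknown */
    double offset_s;  /* offset within the segment at the last match */
    long last_time;   /* client "time" value of the last matched request */
} UserSession;

/* Decide which search to request from the point management submodule.
   For a local search, *expected_offset receives the offset predicted by
   assuming uninterrupted playback since the last match. */
SearchKind plan_search(const UserSession *s, long request_time, long max_gap,
                       double *expected_offset)
{
    if (s->segment_id < 0) return SEARCH_GLOBAL;
    long gap = request_time - s->last_time;
    if (gap < 0 || gap > max_gap) return SEARCH_GLOBAL;  /* out of order or stale */
    *expected_offset = s->offset_s + (double)gap;
    return SEARCH_LOCAL;
}
```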

As indicated by the arrow labeled "Cues" in FIG. 10, the client module 60 sends periodic updates containing pixel cue information to the user management submodule of the channel identification module 62. This communication takes place via the HTTP requests described above, with the pixel cue information sent as GET parameters.

An example of such an HTTP request:

http://SERVER_NAME/index?token=TV_ID&time=5799&cueData=8-1-0,7-0-0,170-158-51,134-21-16,3-0-6,210-210-212,255-253-251,3-2-0,255-255-244,13-0-0,182-30-25,106-106-40,198-110-103,|28-5-0,3-0-2,100-79-2,147-31-41,3-0-6,209-209-209,175-29-19,0-0-0,252-249-237,167-168-165,176-25-17,113-113-24,171-27-32,|38-7-0,2-2-2,99-70-0,116-21-31,6-0-9,210-210-210,179-31-22,31-31-33,162-65-64,10-10-10,184-33-25,105-108-32,169-28-28,|104-86-15,4-4-4,46-18-0,178-112-116,0-0-1,213-213-213,178-31-22,211-211-211,164-62-72,0-0-0,183-32-24,150-149-42,153-27-19,|188-192-43,2-1-6,67-49-0,156-92-95,3-1-2,215-215-215,177-28-19,226-233-53,249-247-247,207-211-21,182-31-23,136-153-47,152-25-18,|192-118-109,176-181-84,201-201-201,218-172-162,201-200-39,226-226-226,244-244-244,221-214-212,166-165-170,209-209-209,191-26-36,154-28-20,150-21-15,|0-3-0,0-0-0,156-27-22,161-28-19,192-192-26,157-26-22,174-29-23,149-23-18,190-34-25,156-27-20,176-27-18,0-0-0,184-30-25,|159-29-19,9-3-0,161-26-22,137-22-15,0-4-9,167-26-26,159-28-25,165-27-24,65-21-13,154-22-19,99-24-11,153-24-20,185-34-28,|153-26-21,0-0-0,165-25-15,141-24-13,1-1-1,165-25-17,154-27-24,182-32-26,180-31-25,149-25-17,155-21-19,36-12-4,171-29-22,|153-26-21,0-0-0,165-25-15,141-24-13,1-1-1,165-25-17,154-27-24,182-32-26,180-31-25,149-25-17,155-21-19,36-12-4,171-29-22,|
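As a sketch of how a server might read such a request, the function below parses a cueData string into a flat array of RGB triples and counts the "|"-terminated samples. It assumes the parameter has already been URL-decoded and extracted from the query string; `parse_cue_data` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { int r, g, b; } Rgb;

/* Parses "R-G-B,R-G-B,...,|R-G-B,...,|" into out[0..]. Returns the number of
   RGB triples parsed, or -1 on malformed input or if more than max_points
   triples are present. *n_samples receives the number of '|'-terminated samples. */
long parse_cue_data(const char *s, Rgb *out, size_t max_points, size_t *n_samples)
{
    size_t n = 0;
    *n_samples = 0;
    while (*s) {
        if (*s == '|') { (*n_samples)++; s++; continue; }
        int r, g, b, used = 0;
        /* %n records how many characters were consumed, including the comma. */
        if (sscanf(s, "%d-%d-%d,%n", &r, &g, &b, &used) != 3 || used == 0) return -1;
        if (n >= max_points) return -1;
        out[n].r = r;
        out[n].g = g;
        out[n].b = b;
        n++;
        s += used;
    }
    return (long)n;
}
```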

The HTTP request contains the following parameters:

The parameter "token" is a unique identifier of the TV (or other device). The manufacturer assigns each TV a globally unique ID, which is sent to the user management submodule of the channel identification module 62 (see FIG. 10).

The parameter "time" is an arbitrary time stamp used to keep the requests in order and to help compute the "most likely location" described below. It is typically supplied by the TV's internal clock.

The parameter "cueData" is a list of RGB values, i.e., samples of pixel values each consisting of RGB triples. Its format is R1-G1-B1,R2-G2-B2,...|R3-G3-B3,R4-G4-B4,...| and so on, where each RX-GX-BX denotes one RGB position. A sample is a comma-separated list of RGB positions terminated by "|", and sample 1|sample 2|sample 3|... make up the cueData of the HTTP request. [The appended claims refer to samples as "pixel cue points."] "RGB position" should be construed as broadly as possible, to include both the set of RGB values of an individual pixel identified by its X and Y coordinates and a set of RGB values that is a function of the RGB values of multiple individual pixels in an array (e.g., a square array). In the latter case, the collection of the individual RGB value sets of all pixels in the array is called "PatchData." The pixel array occupies a given area (e.g., a square area) of the TV screen.
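The client side of the same format can be sketched with `snprintf`. `format_cue_data` and its signature are hypothetical; the trailing comma after each triple and the "|" after each sample follow the example request.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { int r, g, b; } Rgb;

/* Writes n_samples samples of per_sample triples from `points` (stored sample
   by sample) as "R-G-B,...,|R-G-B,...,|". Returns the number of characters
   written (excluding the NUL), or -1 if `out` is too small. */
int format_cue_data(const Rgb *points, size_t per_sample, size_t n_samples,
                    char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (size_t s = 0; s < n_samples; s++) {
        for (size_t i = 0; i < per_sample; i++) {
            const Rgb *p = &points[s * per_sample + i];
            int w = snprintf(out + used, cap - used, "%d-%d-%d,", p->r, p->g, p->b);
            if (w < 0 || (size_t)w >= cap - used) return -1;  /* truncated */
            used += (size_t)w;
        }
        int w = snprintf(out + used, cap - used, "|");  /* end of sample */
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Note that "|" is not an unreserved URI character, so a real client would percent-encode the value before placing it in the query string.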

In the example above, the cueData parameter of the HTTP request contains 10 samples, one per video frame, and each sample contains the RGB values of 13 pixels or 13 pixel arrays, with the same pixels or pixel arrays sampled in each of the ten frames. However, the number of pixel values per frame, the locations of the sampled pixels, and the number of samples per HTTP request may vary according to the point-sampling instructions received by the TV client component.

In the embodiment shown in FIG. 10, the TV system 52 includes a system-level function that extracts pixel information. The function "wakes up" periodically, e.g., every 0.1 seconds, and extracts N patches of pixel data from each "frame," where N is a positive integer (e.g., 13). The pixel data of each patch is reduced to a single pixel sample, i.e., a single set of RGB values, that is a function of the pixel data in that patch; the function may be an average, a weighted average, or any other suitable function. Pixel samples from a series of "frames" (e.g., 10) are accumulated and sent to the server together; for example, the TV client sends a batch of pixel samples to the server periodically (e.g., once every 1.0 second).
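The accumulate-then-send cycle can be sketched as a fixed-size batch buffer. The constants mirror the example values in the text (13 patches per frame, 10 frames per batch). `CueBatch` and its functions are hypothetical names, and the timer that drives the 0.1-second wake-ups is not shown.

```c
#include <stddef.h>

#define PATCHES_PER_FRAME 13  /* N in the text */
#define FRAMES_PER_BATCH 10   /* 0.1 s wake-ups, sent about once per second */

typedef struct { int r, g, b; } Rgb;

typedef struct {
    Rgb samples[FRAMES_PER_BATCH][PATCHES_PER_FRAME];
    size_t count;  /* frames accumulated since the last send */
} CueBatch;

/* Called on each wake-up with the reduced samples for one frame.
   Returns 1 when the batch is full and should be sent, 0 otherwise.
   If the batch is already full, the frame is not stored (send and reset first). */
int batch_add_frame(CueBatch *b, const Rgb frame[PATCHES_PER_FRAME])
{
    if (b->count >= FRAMES_PER_BATCH) return 1;
    for (size_t i = 0; i < PATCHES_PER_FRAME; i++)
        b->samples[b->count][i] = frame[i];
    b->count++;
    return b->count == FRAMES_PER_BATCH;
}

/* Called after the batch has been sent to the server. */
void batch_reset(CueBatch *b) { b->count = 0; }
```

Each full batch then corresponds to one cueData parameter of 10 samples.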

The following is an exemplary API specification (written in the C programming language). The API is part of the TV client module 60, whose software runs on the chipset of the TV system; the chipset implements the specific functions defined in the API specification.

The API file includes three data structure declarations: "Patch," "PatchData," and "Pixel." The pixels of the TV screen are arranged in an X-Y plane, and each pixel is identified by its X and Y coordinates. A "Pixel" consists of three integers (e.g., RGB values). [Alternatively, the declarations could use hexadecimal values.] A "Patch" is the coordinates of a square on the TV screen, each square containing an array of pixels. "PatchData" is the collection of "Pixels" within a given square of the screen. A note on syntax: in C, "Pixel*" is a pointer to Pixel, used here to refer to a collection of "Pixels"; thus "Pixel* pixelData;" declares an arbitrarily named collection of Pixels called "pixelData". The function:

PatchData* getPatchesFromVideo(Patch* requestedPatches, int numOfPatches);

is implemented by the chipset of the TV system and returns a collection of "PatchData."
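The three declarations might look like the following. The field names and the rectangle-based `Patch` layout are assumptions, since the text names the structures but not their fields. Because the real `getPatchesFromVideo` is implemented by the chipset, the version below reads from a hypothetical in-memory frame set with `setMockFrame` so that the sketch can run on its own. Ownership of the returned array passes to the caller, who releases it with the hypothetical `freePatchData`.

```c
#include <stdlib.h>

typedef struct { int r, g, b; } Pixel;             /* three integers, e.g. RGB */
typedef struct { int x, y, width, height; } Patch; /* assumed: top-left corner + size */
typedef struct {
    Patch patch;      /* the patch these pixels were read from */
    Pixel *pixelData; /* width * height pixels, row-major */
    int numOfPixels;
} PatchData;

/* Hypothetical stand-in for the decoded frame the chipset would read. */
static const Pixel *g_frame;
static int g_frameWidth, g_frameHeight;

void setMockFrame(const Pixel *frame, int width, int height)
{
    g_frame = frame;
    g_frameWidth = width;
    g_frameHeight = height;
}

void freePatchData(PatchData *data, int numOfPatches)
{
    if (!data) return;
    for (int i = 0; i < numOfPatches; i++) free(data[i].pixelData);
    free(data);
}

/* Copies the pixels of each requested patch out of the current frame.
   Returns NULL if a patch falls outside the frame or allocation fails. */
PatchData *getPatchesFromVideo(Patch *requestedPatches, int numOfPatches)
{
    PatchData *out = calloc((size_t)numOfPatches, sizeof *out);
    if (!out || !g_frame) { free(out); return NULL; }
    for (int i = 0; i < numOfPatches; i++) {
        Patch p = requestedPatches[i];
        if (p.x < 0 || p.y < 0 || p.width <= 0 || p.height <= 0 ||
            p.x + p.width > g_frameWidth || p.y + p.height > g_frameHeight) {
            freePatchData(out, numOfPatches);  /* unused entries are NULL */
            return NULL;
        }
        out[i].patch = p;
        out[i].numOfPixels = p.width * p.height;
        out[i].pixelData = malloc((size_t)out[i].numOfPixels * sizeof(Pixel));
        if (!out[i].pixelData) { freePatchData(out, numOfPatches); return NULL; }
        for (int dy = 0; dy < p.height; dy++)
            for (int dx = 0; dx < p.width; dx++)
                out[i].pixelData[dy * p.width + dx] =
                    g_frame[(p.y + dy) * g_frameWidth + (p.x + dx)];
    }
    return out;
}
```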

Referring again to FIG. 10, in a preferred embodiment the TV client module 60 is programmed to obtain the RGB values of each pixel within each array (i.e., patch). The RGB values for each pixel array or patch are then processed to produce a single corresponding set of RGB values for that array or patch; for a 3×3 pixel array, for example, the nine sets of RGB values are reduced to one. This can be done with various mathematical functions, such as an average or a weighted average. The HTTP request sent by the TV client 60 to the channel identification module 62 of the server 54 therefore contains one set of RGB values (e.g., three integers) per patch, rather than the full patch data.
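The reduction step can be sketched with the two functions the text names, a plain mean and a weighted mean, each rounded to the nearest integer. The function names and the idea of weighting the patch centre more heavily are illustrative assumptions; pixel values are assumed to be non-negative.

```c
#include <stddef.h>

typedef struct { int r, g, b; } Pixel;

/* Arithmetic mean of a patch's pixels (e.g., 9 for a 3x3 patch), rounded. */
Pixel reducePatchMean(const Pixel *px, size_t n)
{
    Pixel out = {0, 0, 0};
    if (n == 0) return out;
    long sr = 0, sg = 0, sb = 0;
    for (size_t i = 0; i < n; i++) { sr += px[i].r; sg += px[i].g; sb += px[i].b; }
    long half = (long)n / 2;  /* adding n/2 before dividing rounds to nearest */
    out.r = (int)((sr + half) / (long)n);
    out.g = (int)((sg + half) / (long)n);
    out.b = (int)((sb + half) / (long)n);
    return out;
}

/* Weighted mean; w[i] >= 0 is the weight of pixel i (e.g., larger at the centre). */
Pixel reducePatchWeighted(const Pixel *px, const int *w, size_t n)
{
    Pixel out = {0, 0, 0};
    long sr = 0, sg = 0, sb = 0, sw = 0;
    for (size_t i = 0; i < n; i++) {
        sr += (long)w[i] * px[i].r;
        sg += (long)w[i] * px[i].g;
        sb += (long)w[i] * px[i].b;
        sw += w[i];
    }
    if (sw <= 0) return out;
    out.r = (int)((sr + sw / 2) / sw);
    out.g = (int)((sg + sw / 2) / sw);
    out.b = (int)((sb + sw / 2) / sw);
    return out;
}
```

Either way, a 3×3 patch of 27 integers shrinks to 3, which is what keeps each HTTP request small.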

The TV client module 60 is an embedded code segment "baked" into a device such as a TV chip, and it sends the collected points to the user management submodule of the channel identification module 62. The TV client module can be updated in the field through firmware updates. In one embodiment, the TV client module 60 periodically requests instructions from the server 54 that specify the number, frequency, and location of the sampled points. The TV client module does not need to send points to the server at the sampling rate. In one embodiment, the TV client module samples approximately 10 times per second, batches the resulting points, and sends them to the server at a fixed interval, for example once per second. The TV client module needs to know which user management submodule component holds its session. During initialization (and periodically thereafter), the TV client module calls the user management session manager to obtain the address of its user management component. The user management component assigned to a given device, such as a TV, maintains that user's session information. If the assigned user management component is unavailable (e.g., because it has crashed), the session manager assigns a new one. The TV client module also needs arbitrary timestamps to keep requests in order when it sends position information to the relevant components of the point management submodule.
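The sample-then-batch behavior could look roughly like the following sketch, assuming about 10 samples per second and a one-second send interval as in the embodiment above. The CuePoint layout, the buffer size, and the Batcher helpers are hypothetical; the actual transport (the HTTP request to the server) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

#define SAMPLES_PER_SECOND 10                        /* assumption: ~10 samples/s */
#define BATCH_INTERVAL_MS  1000L                     /* assumption: send once per second */
#define MAX_BATCH          (2 * SAMPLES_PER_SECOND)  /* headroom for jitter */

typedef struct {
    long timestampMs;   /* client timestamp, keeps requests in order */
    int  rgb[3];        /* one reduced patch value, for brevity */
} CuePoint;

typedef struct {
    CuePoint points[MAX_BATCH];
    size_t   count;
    long     batchStartMs;
} Batcher;

void batcherInit(Batcher *b)
{
    b->count = 0;
    b->batchStartMs = 0;
}

/* Adds one sample. If the current batch has spanned the send interval (or
   the buffer is full), the batch is first copied into out and its size is
   returned so the caller can send it; otherwise returns 0. */
size_t batcherAdd(Batcher *b, CuePoint p, CuePoint *out)
{
    size_t sent = 0;
    if (b->count > 0 &&
        (b->count == MAX_BATCH ||
         p.timestampMs - b->batchStartMs >= BATCH_INTERVAL_MS)) {
        sent = b->count;
        memcpy(out, b->points, sent * sizeof *out);
        b->count = 0;
    }
    if (b->count == 0)
        b->batchStartMs = p.timestampMs;
    b->points[b->count++] = p;
    return sent;
}
```

A partially filled batch stays buffered until the next interval, so a real client would also flush on shutdown or channel change.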

In response to an HTTP request received from the client module 60, the channel identification module 62 identifies in real time the video segment from which the cueData in the HTTP request was taken, together with the time offset relative to the start of that segment. As noted above, the segment and offset are collectively referred to as a "position". The point management submodule of the channel identification module 62 uses a path tracking algorithm to search the database 66 for the stored pixel cue points closest to the pixel cue points received in the HTTP request. How this is done is described in the appendix "The Path Tracking Problem: Tracking Video Transmission with Fuzzy Cues", which is incorporated herein by reference in its entirety. The double arrow labeled PPLEB in FIG. 10 indicates that the point management submodule of the channel identification module communicates with the database 66 while executing the path tracking algorithm, which comprises Probabilistic Point Location in Equal Balls (PPLEB) and an efficient likelihood update algorithm. The appendix describes in detail, including the mathematical equations, how the most likely "position" is identified. The search method is summarized more briefly below.

The path tracking algorithm described in the appendix uses a mathematical construct known as locality-sensitive hashing. A known prior-art approach maps each point in the data set to a word, i.e., a list of hash values. These words are placed in a sorted dictionary (much like an ordinary English dictionary). To search for a point, the algorithm first constructs the point's word and then returns the closest lexicographic match in the dictionary. This requires computing each letter of the word separately and performing a dictionary search. In the version disclosed in the appendix, a fixed-length word is constructed first (depending only on the point vector), and the point management submodule of the channel identification module then looks up only exact matches of that word in the dictionary. The advantages are that 1) the word corresponding to a point is computed in one batch, which is much faster than computing it letter by letter; and 2) performing the lookup with a conventional hash function instead of a sorted dictionary is faster and simpler.
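A minimal sketch of fixed-length words with an exact hash-table lookup follows. The source does not name the hash family, so this assumes random-hyperplane (sign-of-dot-product) hashing as one well-known locality-sensitive scheme; the dimension, word length, and table sizes are illustrative. Nearby points tend to produce the same word, which is why an exact lookup can stand in for a nearest-lexicographic-match search.

```c
#include <stdint.h>
#include <stdlib.h>

#define DIM      6     /* cue dimension, e.g. 2 pixels x RGB (illustrative) */
#define BITS     16    /* word length: one bit per hash function */
#define NBUCKETS 1024  /* hash-table buckets */
#define MAXPTS   256   /* capacity of this toy index */

static double planes[BITS][DIM];   /* random hyperplanes, fixed at startup */

/* Fill the hyperplanes with uniform values in [-1, 1]. */
void initPlanes(unsigned seed)
{
    srand(seed);
    for (int k = 0; k < BITS; k++)
        for (int j = 0; j < DIM; j++)
            planes[k][j] = 2.0 * rand() / RAND_MAX - 1.0;
}

/* Build the whole fixed-length word in one pass: bit k records on which
   side of hyperplane k the (centered) point falls. */
uint32_t lshWord(const double *x)
{
    uint32_t word = 0;
    for (int k = 0; k < BITS; k++) {
        double dot = 0.0;
        for (int j = 0; j < DIM; j++)
            dot += planes[k][j] * (x[j] - 128.0);  /* center 0..255 values */
        if (dot >= 0.0)
            word |= (uint32_t)1 << k;
    }
    return word;
}

typedef struct {
    uint32_t word;
    int      id;     /* index of the stored cue point (segment + offset) */
    int      next;   /* next entry in the same bucket, -1 terminates */
} Entry;

static int   buckets[NBUCKETS];
static Entry entries[MAXPTS];
static int   nEntries;

void indexInit(void)
{
    for (int b = 0; b < NBUCKETS; b++)
        buckets[b] = -1;
    nEntries = 0;
}

int indexInsert(const double *x, int id)
{
    if (nEntries == MAXPTS)
        return -1;
    uint32_t w = lshWord(x);
    int b = (int)(w % NBUCKETS);
    entries[nEntries].word = w;
    entries[nEntries].id = id;
    entries[nEntries].next = buckets[b];
    buckets[b] = nEntries++;
    return 0;
}

/* Exact-match lookup: writes up to max ids stored under the query's word. */
int indexLookup(const double *q, int *ids, int max)
{
    uint32_t w = lshWord(q);
    int n = 0;
    for (int e = buckets[w % NBUCKETS]; e != -1 && n < max; e = entries[e].next)
        if (entries[e].word == w)
            ids[n++] = entries[e].id;
    return n;
}
```

A production index would typically keep several independent tables so that a true neighbor collides with the query in at least one of them with high probability.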

It should be understood that a position search (i.e., for a video segment plus time offset) looks for the most likely content. The path tracking algorithm first finds the possible positions and then computes a probability distribution over them. More specifically, each possible position is assigned a probability representing the likelihood that it matches the video segment currently displayed on the TV screen. If the probability of the most probable position exceeds a predetermined threshold, it is decided that this position corresponds to the displayed video segment. Otherwise, the path tracking algorithm continues to update the list of possible positions and their probability distribution as successively received pixel cue points are processed.
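The threshold decision can be sketched as follows, assuming each possible position carries an unnormalized score. The Candidate fields and the score itself are hypothetical; the appendix defines how the scores are actually computed.

```c
#include <stddef.h>

typedef struct {
    int    segmentId;   /* hypothetical: identifies the video segment */
    long   offsetMs;    /* time offset within the segment */
    double weight;      /* unnormalized score accumulated from cues */
} Candidate;

/* Normalizes the weights into a probability distribution and returns the
   index of the most probable candidate if its probability exceeds
   threshold; returns -1 if no decision can be made yet. The winning
   probability is written to *probOut when probOut is not NULL. */
int decide(const Candidate *c, size_t n, double threshold, double *probOut)
{
    if (n == 0)
        return -1;
    double total = 0.0;
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        total += c[i].weight;
        if (c[i].weight > c[best].weight)
            best = i;
    }
    if (total <= 0.0)
        return -1;
    double p = c[best].weight / total;
    if (probOut != NULL)
        *probOut = p;
    return p > threshold ? (int)best : -1;
}
```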

The path tracking algorithm is probabilistic: it does not require every pixel cue point to match exactly, but decides that a result is true based on the overall evidence. The algorithm tracks continuously in real time and can handle intermittent pixel cue points that deviate from the other pixel data points in the same sequence. For example, even if the algorithm can match only 7 out of 10 frames of a video segment, it can still identify the most likely position. The algorithm also reacts quickly to viewer actions such as pausing or changing channels.

Upon receiving the first pixel cue point from a TV system, the server computes a probability distribution over all possible positions. Upon receiving each subsequent pixel cue point from the same TV system, the list of possible positions is updated and an updated probability distribution is computed over them. This iterative process runs continuously in real time, closely following the user's viewing. Each pixel cue point received from the TV is discarded once it has been processed. The record of possible positions and their probability distribution is kept in memory for each user session. However, if the likelihood of a particular possible position drops (for example, its probability falls below a predetermined lower threshold), that position can be ignored, i.e., deleted from the stored record.
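One plausible shape for the per-cue update and pruning step is a Bayesian multiply-and-renormalize, sketched below. The per-candidate likelihood values are assumed inputs (the appendix gives the actual statistical model), and minProb stands in for the predetermined lower threshold.

```c
#include <stddef.h>

/* Multiplies each candidate's probability by the likelihood of the newly
   received cue under that candidate, renormalizes, deletes candidates whose
   probability falls below minProb, and renormalizes the survivors.
   ids and prob are compacted in place; returns the new candidate count. */
size_t updateAndPrune(int *ids, double *prob, const double *likelihood,
                      size_t n, double minProb)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        prob[i] *= likelihood[i];
        total += prob[i];
    }
    if (total <= 0.0)
        return 0;                 /* nothing is consistent: start over */

    size_t kept = 0;
    double keptTotal = 0.0;
    for (size_t i = 0; i < n; i++) {
        double p = prob[i] / total;
        if (p < minProb)
            continue;             /* unlikely: drop from the record */
        ids[kept] = ids[i];
        prob[kept] = p;
        keptTotal += p;
        kept++;
    }
    for (size_t i = 0; i < kept; i++)
        prob[i] /= keptTotal;
    return kept;
}
```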

Note also that searching the entire pixel cue database for every TV system would be inefficient. To improve search efficiency, the pixel cue data in the database is divided into several sections, and the search for the nearest matches is confined to a single section. See the appendix for details.

After the most likely position has been identified in the database, the contextual targeting module 64 can retrieve the stored content associated with that position (see FIG. 10). In a preferred embodiment, the contextual targeting module 64 receives from the channel identification module 62 the program ID and time offset of the most likely position (i.e., the possible position with the highest probability, provided that probability exceeds a presettable success threshold), and then uses this information to retrieve relevant enhanced content from the database 66. The database contains the closed captions of the video segments whose identifiers and pixel cue points are stored in the database. The database also contains a content encyclopedia made up of triggers extracted from documents (i.e., single words or short word sequences, and proper nouns, that point to a relatively specific topic) together with the content associated with each trigger. The encyclopedia is an index of structured content data, preferably organized by category. The contextual targeting module includes a search engine that searches the stored closed captions associated with the identified position and identifies triggers in those captions; see the arrow labeled "Trigger Search" in FIG. 10.
The contextual targeting module then retrieves from the database the content that the encyclopedia associates with the identified triggers. The trigger set and search configuration are customized for the specific content (a particular TV show, commercial, movie, etc.). For example, for a basketball game, the contextual targeting module uses a trigger set that includes the names of players, coaches, and so on. In another example, news and current-affairs programs are configured with a trigger set that emphasizes politicians' names and current buzzwords (such as "health care"). In yet another example, TV series and sitcoms are configured with a trigger set that includes arbitrary word combinations, plus timestamps that trigger activities at specific points in the program regardless of the topic of the dialogue (for example, an activity tied to a song that is displayed after a specific point).
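Trigger identification in a caption line can be sketched as case-insensitive phrase matching against a program-specific trigger set. The TriggerEntry layout and the content ids are hypothetical, and plain substring matching ignores word boundaries, so a real search engine would tokenize the captions first.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *trigger;    /* word or short phrase, stored in lowercase */
    const char *contentId;  /* hypothetical key into the encyclopedia */
} TriggerEntry;

/* Returns 1 if the lowercase needle occurs in hay, ignoring hay's case. */
static int containsIgnoreCase(const char *hay, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0)
        return 0;
    for (; *hay; hay++) {
        size_t k = 0;
        while (k < m && hay[k] &&
               tolower((unsigned char)hay[k]) == needle[k])
            k++;
        if (k == m)
            return 1;
    }
    return 0;
}

/* Scans one closed-caption line against a trigger set and writes the
   content ids of matching triggers to out (at most max of them).
   Returns the number of matches. */
size_t findTriggers(const char *caption, const TriggerEntry *set, size_t nSet,
                    const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nSet && n < max; i++)
        if (containsIgnoreCase(caption, set[i].trigger))
            out[n++] = set[i].contentId;
    return n;
}
```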

The offline server 56 (see FIG. 10) receives channel/network feeds and content feeds and builds the database 66. The offline server continuously updates the database as new input arrives.

In a preferred embodiment, the offline server 56 extracts timestamps, pixel cue points, and closed captions from the channel/network feeds. The extracted information is stored as part of the database 66. More specifically, for each broadcast of a TV program, commercial, etc., or each video segment, the database contains: (a) a list of the pixel cue points of the video segment; (b) for each of those pixel cue points, an offset relative to some fixed point in time, indicating when the pixel cue point occurs; and (c) related metadata (closed captions, etc.). Preferably, the offline server samples pixel data at the same rate as the TV system client. However, when sampling the same video segment, the two devices will not necessarily sample at exactly the same rate.
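The per-segment records (a), (b), and (c) and the caption lookup used later by the contextual targeting module might be modeled as below. The struct layouts are assumptions; captionAt returns the caption line active at a given time offset by binary search over captions sorted by start time.

```c
#include <stddef.h>

#define CUE_DIM 6   /* illustrative cue length */

typedef struct {
    long offsetMs;          /* (b) offset from the segment's fixed start */
    int  values[CUE_DIM];   /* (a) pixel cue values sampled at that offset */
} CueRecord;

typedef struct {
    long        startMs;    /* caption becomes active at this offset */
    const char *text;       /* (c) metadata: one closed-caption line */
} CaptionRecord;

typedef struct {
    int                  segmentId;
    const CueRecord     *cues;
    size_t               nCues;
    const CaptionRecord *captions;   /* sorted by startMs */
    size_t               nCaptions;
} SegmentRecord;

/* Returns the caption active at offsetMs, i.e. the last caption whose
   startMs is <= offsetMs, or NULL if none has started yet. */
const char *captionAt(const SegmentRecord *s, long offsetMs)
{
    size_t lo = 0, hi = s->nCaptions;   /* find first startMs > offsetMs */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->captions[mid].startMs <= offsetMs)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : s->captions[lo - 1].text;
}
```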

The offline server 56 also extracts triggers and content from the content feeds. Once stored, this extracted information constitutes the encyclopedia described above, which is likewise part of the database 66. The offline server also creates customized indexes for specific TV programs.

The offline server 56 may include a master source module that indexes content and adds it to the library (e.g., the database 66) searched by the point management submodule. The master source module consists of a collection of emulators that are identical to the TV client components, except that they can also run in master mode. In master mode, points are sent to the point management submodule in the same way as in standard mode, but each point carries metadata and an instruction telling the point management submodule to add the point to the library instead of searching for it. The master source module has four operating modes: (1) batch; (2) live; (3) channel; and (4) UGC. In batch mode, content arrives as complete video files well ahead of the "air date". A TV client emulator plays the video file in master mode and sends the points to the point management submodule to be added to the library. In live mode, a specific live event (a basketball game, etc.) is configured to be indexed. The stream is scheduled ahead of the indexing time and attached to an emulator running in master mode. In channel mode, an emulator is set up to continuously watch and index a given channel. The content typically arrives over the public distribution network through a set-top box, so a server with a capture card is set up to capture the set-top box output and run the emulator.
To identify the program being indexed, the electronic program guide must also be accessed. In UGC mode, the TV client module of a given TV or similar device can run in master mode to add the content currently being viewed on that device to the library. The master source module also contains a simple database in which basic content metadata (title, channel, etc.) is keyed by a unique content ID. This database lists only the content being indexed.

Referring again to FIG. 10, the contextual targeting module 64 is a user-facing application that determines content from a predefined library based on the closed-caption stream of the segment currently being viewed. Because the contextual targeting module 64 depends entirely on correct content detection to retrieve the relevant closed-caption information, it relies on the operation of the user management submodule, the point management submodule, and the master source module.

More specifically, the contextual targeting module 64 sends retrieved content to a specific widget running on the widget engine 58 of the TV system 52 in response to that widget's request. A widget running on the TV widget engine (or another GUI on the TV) periodically sends requests to the server 54 for program information, metadata, and contextually targeted content. The details of each request depend on the functionality required by the particular TV application. Some examples of server responses are described below.

The first example is a server response to a request for contextually targeted content based on closed captions:

The first exemplary response to the HTTP request includes the following parameters:

The parameter "createdOn" is a timestamp of the date and time at which the user session was created; it is used to track how long the user has been watching TV.

The parameter "token" is the unique TV identifier described above. This ID links the channel identification component with the contextual targeting component.

The parameter "channel" identifies the program being watched by name and air date.

The parameter "channelTime" is the playback time, in milliseconds, within the identified content segment. "Playback time" and "time offset" have the same meaning and are used interchangeably herein.

The parameter "myContent" is a list of content items targeted to the corresponding position in the program based on closed captions. In this example it contains three content items, each with the following parameters:

"searchKey" is the unique identifier of the content item; "displayName" is the title of the content item; "founding" is the closed-caption line that matched the content item; "engineName" is the internal search engine that was used (one of several search engines, with different algorithms optimized for different programs, can be selected); and "matchedText" is the specific text in the closed-caption stream that caused the search engine to match the content item.

The following is an exemplary server response to a contextually targeted content request that uses a custom index built for a specific program:

The second exemplary response to the HTTP request includes the following parameters:

The parameter "widget" is the ID of the custom application that uses this data source.

The parameter "myContent" is a list of content items targeted to the corresponding positions in the program based on metadata such as closed captions.

The parameter "searchKey" is the unique identifier of a content item.

The parameters "startTime" and "endTime" restrict a content item to a specific portion of the program.

The parameter "engineName" is the internal search engine that was used (in this case, a CNN-specific search engine containing an index of Anderson Cooper's blog entries).

The parameters "byline", "images", "abstract", and "publishDate" are the content displayed to the user.

According to one method of providing contextually targeted content to the TV system 52 of the system shown in FIG. 10, the server 54 performs the following steps: (a) storing a respective data set for each of a plurality of video segments, each data set comprising data identifying the respective video segment, data points extracted from the television signal of that video segment, and respective time offset data indicating the timing of those data points within the television signal; (b) receiving data points from the TV system 52 while a video segment is being displayed on its screen; (c) retrieving from the database the identification data and time offset data associated with the stored data points that best match the received data points, the identification data and time offset data together identifying the portion of the video segment being displayed on the screen; (d) when a threshold likelihood of successful identification is reached or exceeded, retrieving from the database content associated with the identified portion of the displayed video segment; and (e) sending the retrieved content to the TV system 52.

In the embodiment shown in FIG. 10, the database 66 stores pixel cue points and content for a plurality of video segments, and the server 54 is programmed to perform the following steps: (a) determining which pixel cue points in the database 66 are likely to match the pixel cue points received from the TV system 52 over the network; (b) computing a probability distribution for the pixel cue points determined in step (a); (c) retrieving from the database the program identifier and playback time associated with the pixel cue point determined to be most likely to match the pixel cue point received from the TV system over the network; (d) retrieving from the database the content associated with the program identifier and playback time retrieved in step (c); and (e) sending that content to the TV system over the network.

According to another aspect of the embodiment shown in FIG. 10, the TV system 52 includes a multi-pixel screen and a processor system comprising a widget engine and a separate client. The client is programmed to generate requests containing pixel cue points, each pixel cue point comprising a set of pixel values displayed at a respective time in a predetermined set of pixels of the screen, the predetermined set of pixels being a subset of all the pixels of the screen.

According to another aspect of the embodiment shown in FIG. 10, a system includes a network, a server 54 connected to the network, and a TV system 52 connected to the network. The TV system 52 includes a multi-pixel screen and a processor system comprising a widget engine and a client, the client being programmed to send requests containing pixel cue points to the server. The server 54 includes a database 66 storing pixel cue points and content for a plurality of video segments, and a processor system programmed to perform the following steps: (a) determining which pixel cue points in the database 66 are likely to match the pixel cue points received from the TV system 52 over the network; (b) computing a probability distribution for the pixel cue points determined in step (a); (c) retrieving from the database 66 the program identifier and playback time associated with the pixel cue point determined to be most likely to match the pixel cue point received from the TV system 52 over the network; (d) retrieving from the database 66 the content associated with the program identifier and playback time retrieved in step (c); and (e) sending that content to the TV system 52 over the network.

According to another method of automatically processing the pixel values of a video segment displayed on the multi-pixel screen of the TV system 52 of the system shown in FIG. 10, the server 54 performs the following steps: (a) storing a respective data set for each of a plurality of video segments, each data set comprising data identifying the respective video segment and pixel cue points extracted from that video segment, each pixel cue point comprising a respective subset of the set of pixel values making up a respective frame of the video segment; (b) receiving pixel cue points from the TV system 52 while a video segment is displayed on the multi-pixel screen; (c) determining which pixel cue points in the database are likely to match the received pixel cue points; (d) computing a probability distribution for the pixel cue points determined in step (c); and (e) retrieving from the database 66 the identification data associated with the pixel cue point determined to be most likely to match the received pixel cue point, the identification data identifying the video segment displayed on the multi-pixel screen of the TV system 52.

To implement the method described in the preceding paragraph, the server 54 may further include a metrics software module (not shown in FIG. 10) that collects match information from the user management module and stores it in the database 66 for later report generation. The metrics module not only provides useful information about how the system is operating, but can also produce value-added reports that can be sold to companies interested in audience viewing habits. In one embodiment, the metrics data is sent to an aggregator/forwarder so that it can be written to and unloaded from the database asynchronously. The raw metrics data is first stored in the database and then processed into various reports, such as the number of users who watched a given program, the number who watched a given program time-shifted (e.g., via DVR), and the number who watched a given commercial.

While the invention has been described with reference to various embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit of the invention. In addition, many modifications may be made to adapt the invention to a particular situation without departing from its scope. Therefore, the specific embodiments disclosed herein should be regarded as the best modes for carrying out the invention and should not be construed as limiting it.

In the claims, the term "processor system" should be construed broadly to encompass one or more processors. Furthermore, labeling method steps with letters does not mean that the steps must be performed in alphabetical order.

Appendix

The Path Tracking Problem: Tracking Video Transmission with Fuzzy Cues

Abstract

An efficient video tracking method. Given a large number of video segments, the system must be able to identify in real time the segment from which a given query video input was taken, together with the time offset. The segment and offset are collectively referred to as the position. The method is called video tracking because it must efficiently detect and adapt to pauses, fast-forwarding, rewinding, abrupt switches to other segments, and switches to unknown segments. Before live video can be tracked, the database must be processed: video cues (a small number of pixel values) are taken from frames every fraction of a second and placed in a specialized data structure. Video tracking then proceeds by continuously receiving cues from the input video and updating a set of beliefs about the current position. Each cue either agrees or disagrees with these beliefs, and the beliefs are adjusted to reflect the new evidence. A video position is taken to be correct once our confidence that it is true is high enough. This can be done efficiently by tracking only a small set of possible positions.

Introduction

This appendix describes a video tracking method that is explained and analyzed using abstract mathematical constructs. The introduction gives the reader the tools needed to translate between the two domains. A video signal consists of successive frames. Each frame can be regarded as a still image, and a frame is a raster of pixels. Each pixel consists of three intensity values corresponding to its red, green, and blue (RGB) components. In this appendix, a cue is a list of the RGB values of a subset of the pixels in a frame, together with a corresponding timestamp. The number of pixels in a cue is much smaller than the number of pixels in a frame, typically 5 to 15. Being an ordered list of scalar values, a cue is in effect a vector; such a vector is also called a point.
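The mapping from a cue to a point can be made concrete with a short sketch, assuming a fixed pixel order so that each coordinate always refers to the same screen location and color channel. The squared Euclidean distance is used so that no square root (and no math library) is needed; comparing squared distances preserves the ordering of distances.

```c
#include <stddef.h>

typedef struct {
    int r, g, b;
} Pixel;

/* A cue of k sampled pixels becomes a point in R^(3k): pixel i contributes
   coordinates 3i (red), 3i+1 (green), and 3i+2 (blue). */
void cueToPoint(const Pixel *pixels, size_t k, double *point)
{
    for (size_t i = 0; i < k; i++) {
        point[3 * i]     = pixels[i].r;
        point[3 * i + 1] = pixels[i].g;
        point[3 * i + 2] = pixels[i].b;
    }
}

/* Squared Euclidean distance ||x - y||^2 between two d-dimensional points. */
double squaredDistance(const double *x, const double *y, size_t d)
{
    double s = 0.0;
    for (size_t i = 0; i < d; i++)
        s += (x[i] - y[i]) * (x[i] - y[i]);
    return s;
}
```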

Although these points have high dimension, typically between 15 and 150, it helps to picture them as points in two dimensions; indeed, the illustrations are given as two-dimensional plots. Now consider how a video and its corresponding cue points evolve. Small changes in time produce small changes in pixel values, so the point can be thought of as "moving" slightly from one frame to the next. Following these small frame-to-frame movements, the cue traces a path through space, like a bead threaded on a bent wire.

In this analogy, video tracking receives the spatial positions of the bead (the cue points) and looks for the wire (the path) that the bead is following. Two things make this difficult: first, the bead does not follow the wire exactly but stays some varying, unknown distance away from it; second, the wires are all tangled together. These issues are discussed in detail in Section 2. The algorithm described below works in two conceptual steps. When a cue is received, the algorithm finds all points on all known paths that are sufficiently close to the cue point; these are called possible points. This can be done efficiently with the PPLEB algorithm. The possible points are added to a history data structure, and the probability that each of them represents the true position is computed. This step also deletes possible positions that have become sufficiently unlikely. The history update process ensures that, on the one hand, only a small history is kept and, on the other hand, a likely position is never deleted. The general algorithm is given in Algorithm 1 and FIG. 11.

Section 1 introduces the Probabilistic Point Location in Equal Balls (PPLEB) function, which is used to execute line 5 of Algorithm 1 efficiently. The ability to find possible points quickly is what makes this method practical. Section 2 introduces the statistical model used to execute lines 6 and 7. This model is a natural choice for the setting, and we also describe how to use it efficiently.

1. Probabilistic Point Location in Equal Balls

The following describes a simple algorithm for Probabilistic Point Location in Equal Balls (PPLEB). In the traditional Point Location in Equal Balls (PLEB) problem, we are given a set of n points x_i in R^d and a radius r. The algorithm is allowed O(poly(n)) preprocessing time to build an efficient data structure. Then, given a query point x, it must return all points x_i such that ||x − x_i|| ≤ r. Geometrically, this set of points lies within the ball of radius r centered at the query x (see Figure 12). We say that such an x_i is close to x, or that x_i and x are neighbors.

The PPLEB problem and the nearest-neighbor search problem are two closely related problems that have received wide attention in academia; indeed, they were among the first problems studied in computational geometry. Many methods handle the case where the ambient dimension d is small or constant. These methods partition the space in various ways and recursively search the parts; examples include KD-trees [2] and cover trees [1]. They are very efficient in low dimensions but perform poorly when the ambient dimension is high, a phenomenon known as the "curse of dimensionality". Methods that solve the problem while overcoming the curse of dimensionality include the work of Gionis et al. [3], Lv et al. [5], and Kushilevitz et al. [4]. The algorithm we use is a simplified, faster version of the algorithm described in [3] and relies mainly on locality-sensitive hashing.

1.1 Locality-Sensitive Hashing

In a locality-sensitive hashing scheme, one designs a family of hash functions H such that:

$\Pr_{h \in H}\big(h(x) = h(y) \mid \|x - y\| \le r\big) \gg \Pr_{h \in H}\big(h(x) = h(y) \mid \|x - y\| > 2r\big)$

That is, when x and y are close to each other, the probability that h maps them to the same value is significantly larger.

For clarity, we first handle the simple case in which all input vectors have the same length; the general case is treated in Section 1.3. First, we define a random function u ∈ U that separates x and y according to the angle between them. Let v be a vector chosen uniformly at random from the unit sphere, and let u(x) = sign(⟨v, x⟩) (see Figure 13). It is easy to verify that Pr_{u∼U}(u(x) ≠ u(y)) = θ_{x,y}/π, where θ_{x,y} is the angle between x and y. In addition, for any points x, y, x′, y′ on the circle with ||x′ − y′|| ≥ 2||x − y||, we have θ_{x′,y′} ≥ 2θ_{x,y}. Let p be the largest separation probability over pairs of neighbors:

$p = \max_{\|x - y\| \le r} \Pr_{u \sim U}\big(u(x) \ne u(y)\big)$

It follows that Pr(u(x) ≠ u(y)) ≤ p whenever ||x − y|| ≤ r, and Pr(u(x) ≠ u(y)) ≥ 2p whenever ||x − y|| > 2r.
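The following is a quick Monte Carlo check of Pr(u(x) ≠ u(y)) = θ_{x,y}/π. It assumes that u(x) is the sign of the inner product of x with a random direction v, the dashed line in Figure 13. Normalized Gaussian vectors are uniform on the unit sphere, and only the sign matters, so the normalization can be skipped. The dimension, seeds and helper name are arbitrary.

```python
import numpy as np

def separation_rate(x, y, trials=200_000, seed=0):
    """Fraction of random directions v for which u(x) = sign(v . x) differs from u(y)."""
    rng = np.random.default_rng(seed)
    # Gaussian directions are uniform on the sphere once normalized; sign() ignores the norm.
    v = rng.standard_normal((trials, x.size))
    return np.mean(np.sign(v @ x) != np.sign(v @ y))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(20), rng.standard_normal(20)
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(separation_rate(x, y), theta / np.pi)   # the two values agree to about two decimal places
```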

Let the family H consist of functions obtained by concatenating t independent copies of u, that is, h(x) = [u_1(x), ..., u_t(x)]. Intuitively, if h(x) = h(y), then x and y are likely to be close to each other. We now quantify this. First, we compute the expected number of false positives, n_fp: points for which h(x) = h(y) even though ||x − y|| > 2r. We choose t so that E[n_fp] does not exceed 1, which means we are unlikely to make a mistake.
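The sketch below draws h from H by stacking t independent hyperplane bits, again assuming u(x) = sign(v · x). It compares the empirical collision rate with (1 − θ_{x,y}/π)^t, the exact value for independent bits. Near pairs collide far more often than roughly orthogonal ones. The helper names, t = 8 and the test vectors are hypothetical.

```python
import numpy as np

def make_hash(t, dim, rng):
    """Draw one h from H: h(x) = [u_1(x), ..., u_t(x)] built from t independent hyperplanes."""
    V = rng.standard_normal((t, dim))
    return lambda x: tuple((V @ x > 0).astype(int))

def collision_rate(x, y, t, draws=5_000, seed=0):
    """Empirical Pr_{h in H}(h(x) = h(y)) over `draws` independently drawn hash functions."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(draws):
        h = make_hash(t, x.size, rng)
        hits += h(x) == h(y)
    return hits / draws

def angle(x, y):
    return np.arccos(np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1.0, 1.0))

rng = np.random.default_rng(2)
x = rng.standard_normal(20)
near = x + 0.05 * rng.standard_normal(20)   # small angle to x
far = rng.standard_normal(20)               # roughly orthogonal to x
t = 8
for y in (near, far):
    # Empirical rate vs. the exact value (1 - theta/pi)^t for independent bits.
    print(collision_rate(x, y, t), (1 - angle(x, y) / np.pi) ** t)
```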

$E[n_{fp}] \le n(1-2p)^t \le 1$ (5)

$\Rightarrow\; t \ge \frac{\log(1/n)}{\log(1-2p)}$ (6)

Next, we compute the probability that h(x) = h(y), given that x and y are neighbors:

$\Pr\big(h(x) = h(y) \mid \|x - y\| \le r\big) \ge (1-p)^{\log(1/n)/\log(1-2p)}$ (7)

$= (1/n)^{\log(1-p)/\log(1-2p)}$ (8)

Note that (6) requires 2p < 1. The resulting success probability in (8) is not high; in fact, it is far smaller than 1/2. The next section describes how we raise this probability to 1/2.
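Plugging numbers into (5) through (8) shows how small the single-table success probability is. The sketch assumes the usual LSH remedy: L independent hash tables are queried together, so the success probability becomes 1 − (1 − q)^L. The values n = 10^6 and p = 0.1 are arbitrary, and the function name is hypothetical.

```python
import math

def lsh_parameters(n, p):
    """Return (t, q, L) for n stored points and per-bit separation probability p."""
    assert 0 < p < 0.5, "need 2p < 1"
    t = math.ceil(math.log(1 / n) / math.log(1 - 2 * p))   # smallest integer t satisfying (6)
    q = (1 - p) ** t                                       # near-collision lower bound, as in (7)
    L = math.ceil(math.log(0.5) / math.log(1 - q))        # tables needed so 1 - (1 - q)^L >= 1/2
    return t, q, L

t, q, L = lsh_parameters(n=1_000_000, p=0.1)
print(t, q, L)   # t = 62, q is about 0.0015, L is several hundred tables
```

L grows roughly like 1/q, which is the main cost of this approach. Multi-probe LSH [5] is one published way to reduce the number of tables.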

1.2 Point Search Algorithm

Each function h maps every point in space to a bucket. The bucket of a point x with respect to hash function h is defined as B_h(x) ≡ {x_i | h(x_i) = h(x)}. The data structure we maintain consists of several instances of the bucket function, each built from an independently drawn hash function. When someone searches for x, we return the union B(x) of its buckets across these instances. Based on the analysis in the previous section, we obtain the two desired results:

$\Pr\big(x_i \in B(x) \mid \|x_i - x\| \le r\big) \ge 1/2$ (10)

In other words, with probability at least 1/2 we find each neighbor of x, while we are unlikely to find points that are not neighbors.
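Below is a minimal index in the spirit of Section 1.2. It assumes that the stored vectors have equal length, as in Section 1.1. It also assumes that a query returns the union of its buckets over several independently drawn hash functions, followed by an exact distance check. The class name and parameters are hypothetical. This is a plain random-hyperplane LSH sketch, not the simplified algorithm derived from [3].

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Sketch: n_tables bucket functions B_h, each keyed by t random-hyperplane bits.
    A query returns the union of x's buckets, then keeps only true neighbors."""

    def __init__(self, points, t, n_tables, seed=0):
        rng = np.random.default_rng(seed)
        self.points = np.asarray(points, dtype=float)
        self.planes = rng.standard_normal((n_tables, t, self.points.shape[1]))
        self.tables = []
        for V in self.planes:
            table = defaultdict(list)   # bucket key h(x_i) -> indices i
            keys = (self.points @ V.T > 0).astype(np.int8)
            for i, key in enumerate(map(tuple, keys)):
                table[key].append(i)
            self.tables.append(table)

    def candidates(self, x):
        """B(x): union over all tables of the bucket that x hashes to."""
        out = set()
        for V, table in zip(self.planes, self.tables):
            out.update(table.get(tuple((V @ x > 0).astype(np.int8)), []))
        return out

    def query(self, x, r):
        """Probabilistic PLEB answer: candidates that really lie within distance r of x."""
        return sorted(i for i in self.candidates(x)
                      if np.linalg.norm(self.points[i] - x) <= r)

rng = np.random.default_rng(0)
pts = rng.standard_normal((2000, 32))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # equal-length vectors, as assumed in Section 1.1
index = HyperplaneLSH(pts, t=12, n_tables=8)
print(index.query(pts[0], r=0.3))   # always contains 0: a stored point collides with itself
```

A stored point always lands in its own bucket, so querying with a stored vector always returns it. Any other neighbor is found with the per-table probability analyzed above, boosted by the number of tables.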

1.3 Handling Input Vectors with Different Radii

The previous section covered only searches among vectors of the same length. We now describe how to use that structure as a building block to support searches among vectors of different lengths. As shown in Figure 14, we divide the space into rings of exponentially increasing width. Ring i, denoted R_i, contains all points x such that ||x|| ∈ [2r(1+ε)^i, 2r(1+ε)^{i+1}). This achieves two goals. First, if x_i and x_j belong to the same ring, then ||x_j||/(1+ε) ≤ ||x_i|| ≤ ||x_j||(1+ε). Second, any search needs to examine at most 1/ε rings. Moreover, if the longest vector in the data set has length r′, the total number of rings in the system is O(log(r′/r)).
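The ring assignment in Section 1.3 reduces to a logarithm. The hypothetical helper below returns the index i of the ring R_i that contains a vector of a given length. Vectors shorter than 2r get a negative index, a case the text does not address. The values r = 1 and ε = 0.5 are arbitrary.

```python
import math

def ring_index(length, r, eps):
    """Index i of the ring R_i = [2r(1+eps)^i, 2r(1+eps)^(i+1)) containing a vector of this length."""
    return math.floor(math.log(length / (2 * r)) / math.log(1 + eps))

r, eps = 1.0, 0.5
lengths = [4.6, 6.7, 7.0]   # ring 2 is [4.5, 6.75), ring 3 is [6.75, 10.125)
print([ring_index(L, r, eps) for L in lengths])   # [2, 2, 3]
```

The first two lengths share a ring, and their ratio 6.7/4.6 ≈ 1.46 stays below 1 + ε, as the first property requires.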

2. The Path Tracking Problem

In the path tracking problem, we are given a fixed path in space together with the positions of a particle as a time series. The terms particle, cue, and point are used interchangeably. The algorithm must output the particle's position on the path, but several factors make this difficult:

· The particle follows the path only approximately.

· The path itself is broken and intersects itself many times.

· Both the particle positions and the path positions are given as time series of points, sampled at different times.

Note that this formulation can also model tracking a particle over any number of paths: simply concatenate the paths into one long path and treat each reported position as a position on that single path.

More precisely, let the path P be a parametric curve whose parameter we call time. The points on the path are known at arbitrary time points t_i; that is, we are given n pairs (t_i, P(t_i)). The particle moves along the path, but its positions are known at different time points, as shown in Figure 15. That is, we are given m pairs (t′_j, x(t′_j)), where x(t′_j) is the position of the particle at time t′_j.
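The concatenation trick in the note above can be sketched as follows. Each sampled path (t_i, P(t_i)) is shifted in time so that it starts after the previous one ends. A position on the long path can then be mapped back to the original path and its local time. The helper names, the `gap` parameter and the example data are hypothetical.

```python
import numpy as np

def concatenate_paths(paths, gap=1.0):
    """Join sampled paths [(times, points), ...] into one long path.

    Each path's clock is shifted so it starts `gap` time units after the previous
    path ends. Returns the joined times and points plus per-path start times and shifts."""
    times_out, points_out, starts, shifts = [], [], [], []
    start = 0.0
    for times, points in paths:
        shift = start - times[0]
        times_out.append(times + shift)
        points_out.append(points)
        starts.append(start)
        shifts.append(shift)
        start = times[-1] + shift + gap
    return np.concatenate(times_out), np.concatenate(points_out), np.array(starts), np.array(shifts)

def locate(t_long, starts, shifts):
    """Map a time on the concatenated path back to (path index, time on that path)."""
    k = int(np.searchsorted(starts, t_long, side="right")) - 1
    return k, float(t_long - shifts[k])

# Two short paths (e.g. two reference videos), each sampled on its own clock.
a = (np.array([0.0, 0.5, 1.0]), np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]))
b = (np.array([10.0, 10.5]), np.array([[5.0, 5.0], [6.0, 5.0]]))
T, P, starts, shifts = concatenate_paths([a, b])
print(locate(2.5, starts, shifts))   # (1, 10.5): second path, local time 10.5
```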

2.1 Likelihood Estimation

Because the particle does not follow the path exactly and the path intersects itself many times, it is generally impossible to identify the particle's actual position on the path with certainty. Instead, we compute a probability distribution over all possible positions on the path. If one position is clearly the most probable, the particle's position is assumed to be known. The next section describes an efficient implementation.

If the particle follows the path, the offset between the particle's time stamps and the corresponding points on P should be relatively fixed. In other words, if x(t′) currently corresponds to offset t on the path, it should be close to P(t). Moreover, τ seconds earlier it should have been at offset t − τ, so x(t′ − τ) should be close to P(t − τ). We define the relative offset as Δ ≡ t − t′. As long as the particle follows the path, the relative offset Δ remains constant; that is, x(t′) is close to P(t′ + Δ).

The maximum-likelihood relative offset is obtained by computing:

$\Delta^{*} = \arg\max_{\Delta} \Pr\big(x(t'_m), x(t'_{m-1}), \ldots, x(t'_1) \mid P, \Delta\big)$

In words, the most likely relative offset is the one under which the particle's history is most likely. However, this equation cannot be solved without a statistical model, which must quantify:

· How closely x follows the path.

· How likely x is to "jump" between positions.

· How smooth the path curve and the particle curve are between measured points.

2.2 Time-Discounted Binning

We now introduce a simple statistical model for estimating the likelihood function. The model assumes that the particle's deviation from the path is normally distributed with standard deviation αr. It also assumes that at any given time there is a non-zero probability that the particle abruptly switches to another path, which is reflected by exponentially discounting past points over time. Besides being a reasonable model, it has the further advantage that it can be updated efficiently. For some constant time unit τ, let the likelihood function be proportional to f, defined as follows:

Here, α ≪ 1 is a scale coefficient and ζ > 0 is the probability that the particle jumps to a random position within a given time unit.

The function f can be updated efficiently using the following simple observation:

Moreover, since α ≪ 1, if ||x(t′_m) − P(t_i)|| ≥ r, then:

This is an important property of the likelihood function: the update only needs to sum over the neighbors of x(t′_m) rather than over the entire path. Let S be the set of pairs (t_i, P(t_i)) such that ||x(t′_m) − P(t_i)|| ≤ r; then:

See Algorithm 2.2 for details. Treat f as a sparse vector that also accepts negative integer indices. The set S is the set of all neighbors of x(t′_m) on the path and can be computed quickly with the PPLEB algorithm. It is easy to verify that the number of neighbors of x(t′_m) is bounded by a constant n_near, and in turn that the number of nonzero entries in f is bounded by a larger constant. The final step of the algorithm is to output the offset δ whenever f(⌊δ/τ⌋) exceeds some threshold.
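The update equations for f are not reproduced in this text, so the sketch below assumes a concrete form consistent with the model described above. Every bin decays by (1 − ζ) per elapsed time unit τ. Each neighbor in S adds a normal-density weight with standard deviation αr to the bin ⌊(t_i − t′_m)/τ⌋. The class name, the `min_weight` pruning floor and the exact weights are assumptions, not Algorithm 2.2 itself.

```python
import math
from collections import defaultdict

class OffsetTracker:
    """Sketch of the sparse, time-discounted offset vector f (assumed functional form).

    Assumptions: on each new cue every bin decays by (1 - zeta) per elapsed time unit
    tau, and each neighbor (t_i, P(t_i)) in S adds exp(-d^2 / (2 (alpha r)^2)) to bin
    floor((t_i - t'_m) / tau), where d = ||x(t'_m) - P(t_i)||."""

    def __init__(self, tau, zeta, alpha, r, threshold, min_weight=1e-6):
        self.tau, self.zeta, self.alpha, self.r = tau, zeta, alpha, r
        self.threshold, self.min_weight = threshold, min_weight
        self.f = defaultdict(float)   # sparse vector; keys may be negative integers
        self.last_time = None

    def update(self, t_cue, neighbors):
        """neighbors: (t_i, distance) pairs for path samples within r of the cue (the set S).
        Returns the offsets delta whose bin f(floor(delta / tau)) exceeds the threshold."""
        if self.last_time is not None:
            decay = (1 - self.zeta) ** ((t_cue - self.last_time) / self.tau)
            for k in list(self.f):
                self.f[k] *= decay
                if self.f[k] < self.min_weight:
                    del self.f[k]   # drop negligible bins so f stays sparse
        self.last_time = t_cue
        for t_i, d in neighbors:
            k = math.floor((t_i - t_cue) / self.tau)
            self.f[k] += math.exp(-d ** 2 / (2 * (self.alpha * self.r) ** 2))
        return [k * self.tau for k, v in self.f.items() if v > self.threshold]

tracker = OffsetTracker(tau=1.0, zeta=0.1, alpha=0.5, r=1.0, threshold=2.0)
outputs = []
for j in range(5):
    t_cue = float(j)
    neighbors = [(t_cue + 12.34, 0.0)]          # consistent with offset 12.34
    if j == 0:
        neighbors.append((t_cue - 5.0, 0.0))    # one spurious match, e.g. at a path crossing
    outputs.append(tracker.update(t_cue, neighbors))
print(outputs)   # [[], [], [12.0], [12.0], [12.0]]
```

In this example, the consistent offset gains weight with every cue, while the single spurious match decays geometrically. Only bin 12 ever crosses the threshold.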

3. Description of the Figures

Figure 11 shows three consecutive point locations and the path points around them. Note that neither the bottom point nor the middle point alone is enough to identify the correct part of the path; used together, however, they are. Adding the top point confirms that the particle is in fact on the final (left) curve of the path.

Figure 12 shows a set of n points (gray). The algorithm is given a query point (black) and returns the set of points within distance r of it (the points inside the circle). In the traditional setting, the algorithm must return all such points. In the probabilistic setting, each such point need only be returned with some constant probability.

Figure 13 shows the values of u(x_1), u(x_2), and u(x). Intuitively, u assigns different values to x_1 and x_2 if the dashed line passes between them, and the same value otherwise. Because the dashed line is drawn in a random direction, the probability that it passes between x_1 and x_2 is proportional to the angle between them.

In Figure 14, the space is divided into rings, with ring R_i lying between radii 2r(1+ε)^i and 2r(1+ε)^{i+1}. This ensures that the lengths of any two vectors in the same ring differ by at most a factor of (1+ε), and that any search examines at most 1/ε rings.

Figure 15 shows a self-intersecting path and a query point (black). It illustrates that the particle's position on the path cannot be determined without its position history.

Figure 16 shows three consecutive points and the path points around them. Note that neither x(t_1) nor x(t_2) alone is sufficient to identify the correct part of the path; only their combination allows correct identification. Adding x(t_3) confirms that the particle is in fact on the final (left) curve of the path.

References

[1] Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. In ICML, pages 97-104, 2006.

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 2nd edition, September 2001.

[3] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518-529, 1999.

[4] Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457-474, 2000.

[5] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In VLDB, pages 950-961, 2007.

Claims

1. A method for determining context-oriented content, comprising:

The server determines a point sampling instruction to be sent to a media system;

The server sends the point sampling instruction to the media system;

The server receives a message including pixel cue points, wherein the message is associated with the point sampling instruction, and wherein the pixel cue points include an average of the pixel values of a plurality of individual pixels in a two-dimensional pixel array in a frame of a video segment displayed by the media system;

The server uses the pixel cue points to determine content to be sent to the media system, wherein the content is associated with the video segment; and

The server sends the content to the media system for display together with one or more frames being displayed by the media system.

2. The method of claim 1, wherein the pixel cue point is associated with the point sampling instruction, and/or the two-dimensional pixel array is each of a plurality of different two-dimensional pixel arrays sampled from a frame of the video segment displayed by the media system.

3. The method of claim 1, wherein the content is determined by matching the pixel cue points with stored pixel cue points.

4. The method of claim 1, wherein the point sampling instruction indicates a number of pixel values to be used for the pixel cue points.

5. The method of claim 1, wherein the point sampling instruction indicates the frequency at which the media system samples pixel cue points.

6. The method of claim 1, further comprising:

The video segment being displayed by the media system is identified based on the pixel cue points.

7. A system for determining context-oriented content, comprising:

One or more processors; and

A non-transitory computer-readable medium comprising instructions that, when executed by the one or more processors, cause the one or more processors to:

Determine a point sampling instruction to be sent to a media system;

Send the point sampling instruction to the media system;

Receive a message including pixel cue points, wherein the message is associated with the point sampling instruction, and wherein the pixel cue points include an average of the pixel values of a plurality of individual pixels in a two-dimensional pixel array in a frame of a video segment displayed by the media system;

Use the pixel cue points to determine content to be sent to the media system, wherein the content is associated with the video segment; and

Send the content to the media system for display along with one or more frames being displayed by the media system.

8. The system of claim 7, wherein the pixel cue point is associated with the point sampling instruction, and/or the two-dimensional pixel array is each of a plurality of different two-dimensional pixel arrays sampled from a frame of the video segment displayed by the media system.

9. The system of claim 7, wherein the content is determined by matching the pixel cue points with stored pixel cue points.

10. The system of claim 7, wherein the point sampling instruction indicates a number of pixel values to be used for the pixel cue points.

11. The system of claim 7, wherein the point sampling instruction indicates the frequency at which the media system samples pixel cue points.

12. The system of claim 7, wherein when the one or more processors execute the instructions, the instructions further cause the one or more processors to:

13. A method for displaying context-oriented content, comprising:

A media system receives a signal containing video data, the video data including a video segment;

The media system displays frames of the video segment based on the video data;

The media system receives a point sampling instruction from a remote server;

The media system generates, based on the point sampling instruction, a request to be sent to the remote server, the request including an identification of the media system, a time stamp, and pixel cue points, wherein the pixel cue points are generated by mathematically transforming the pixel values of a plurality of individual pixels in a two-dimensional pixel array in a frame of the video segment;

The media system sends the request to the remote server;

The media system receives content data from the remote server in response to sending the request, wherein the content data represents content associated with frames of the video segment being displayed by the media system; and

The content associated with the frame is displayed while one or more other frames of the video segment are being displayed by the media system.

14. The method of claim 13, wherein the request is sent to the server based on the point sampling instruction, and/or the two-dimensional pixel array is each of a plurality of different two-dimensional pixel arrays sampled from a frame of the video segment displayed by the media system.

15. The method of claim 13, wherein the pixel cue points are generated based on the point sampling instructions.

16. The method of claim 13, wherein the number of pixel values used for the mathematical transformation is based on the point sampling instruction.

17. The method of claim 13, wherein the frequency at which the media system samples the pixel cue points is based on the point sampling instruction.