HK1217794B

HK1217794B - Systems and methods for addressing a media database using distance associative hashing

Info

Publication number: HK1217794B
Application number: HK16105782.3A
Authority: HK
Inventors: 泽夫．纽梅尔; 布莱恩．里德
Original assignee: 构造资料有限责任公司
Priority date: 2013-03-15
Filing date: 2014-03-17
Publication date: 2019-12-20

Description

System and method for addressing a media database using distance-associative hashing

优先权要求Priority claim

本申请构成2010年5月27日提交的、并且2013年11月6日作为美国专利号8,595,781发布的题为“METHODS FOR IDENTIFYING VIDEO SEGMENTS AND DISPLAYINGCONTEXTUAL TARGETED CONTENT ON A CONNECTED TELEVISION(用于标识视频片段并且在所连接的电视上显示上下文的定为目标的内容的方法)”的美国专利申请号12/788,721的部分继续申请，那项申请是要求2009年5月29日提交的题为“SYSTEM FOR PROCESSINGCONTENT INFORMATION IN A TELEVIDEO SIGNAL(用于处理电视视频信号中的内容信息的系统)”的美国临时专利申请号61/182,334的权益的非临时申请、并且是要求2009年12月29日提交的题为“CONTEXTUALTARGETING BASED ON DATA RECEIVED FROM ATELEVISIONSYSTEM(基于从电视系统接收的数据的上下文目标确定)”的美国临时专利申请号61/290,714的权益的非临时申请；本申请进一步构成2010年5月27日提交的题为“METHODS FORDISPLAYING CONTEXTUALLY TARGETED CONTENT ON ACONNECTEDTELEVISION(用于在所连接的电视上显示根据上下文定为目标的内容的方法)”的美国专利申请号12/788,748的部分继续申请；本申请进一步构成2013年11月25日提交的题为“__”的美国专利申请号14/089,003的部分继续申请；本申请进一步构成2014年3月17日提交的题为“SYSTEMS AND METHODSFOR IDENTIFYING VIDEO SEGMENTS FOR DISPLAYING CONTEXTUALLY RELEVANTCONTENT(用于标识视频片段以便显示上下文相关内容的系统和方法)”的美国专利申请号**/***,***的部分继续申请；本申请进一步构成2014年3月17日提交的题为“SYSTEMSANDMETHODSFORREAL-TIME TELEVISION AD DETECTION USING AN AUTOMATEDCONTENT RECOGNITION DATABASE(用于使用自动化内容识别数据库的实时电视广告检测的系统和方法)”的美国专利申请号**/***,***的部分继续申请；本申请进一步构成2014年3月17日提交的题为“SYSTEMSANDMETHODS FOR ON-SCREEN GRAPHICS DETECTION(用于屏幕上图形检测的系统和方法)”的美国专利申请号**/***,***的部分继续申请；本申请进一步构成2014年3月17日提交的题为“SYSTEMS AND METHODS FOR IMPROVING SERVER ANDCLIENT PERFORMANCE INFINGERPRINTACR SYSTEMS(用于改进指纹ACR系统中的服务器和客户端性能的系统和方法)”的美国专利申请号**/***,***的部分继续申请；本申请进一步构成2014年3月17日提交的题为“SYSTEMS AND METHODS FOR MULTI-BROADCASTDIFFERENTIATION(用于多重广播区分的系统和方法)”的美国专利申请号**/***,***的部分继续申请；并且本申请进一步构成2013年3月15日提交的题为“SYSTEMS AND METHODSFOR IDENTIFYING VIDEO SEGMENTS BEINGDISPLAYEDONREMOTELYLOCATEDTELEVISIONS(用于标识在远程定位的电视上显示的视频片段的系统和方法)”的美国专利申请号61/791,578的部分继续申请。前述申请或者是当前共同未决的或者是当前共同未决申请的有权享有提交日期的权益的申请。This application constitutes a continuation-in-part of U.S. Patent Application No. 12/788,721, filed May 27, 2010, and issued as U.S. Patent No. 8,595,781 on November 6, 2013, entitled “METHODS FOR IDENTIFYING VIDEO SEGMENTS AND DISPLAYING CONTEXTUAL TARGETED CONTENT ON A CONNECTED TELEVISION,” which application is a non-provisional application claiming the benefit of U.S. Provisional Patent Application No. 61/182,334, filed May 29, 2009, entitled “SYSTEM FOR PROCESSING CONTENT INFORMATION IN A TELEVIDEO SIGNAL,” and filed December 29, 2009, entitled “CONTEXTUAL TRARGETING BASED ON DATA RECEIVED FROM This application is a non-provisional application that is a continuation-in-part of U.S. Patent Application No. 61/290,714, entitled “CONTEXTUALLY TARGETED CONTENT ON A CONNECTED TELEVISION SYSTEM” filed on May 27, 2010; this application is a continuation-in-part of U.S. Patent Application No. 14/089,003, entitled “__”, filed on November 25, 2013; this application is a continuation-in-part of U.S. Patent Application No. 61/290,714, entitled “CONTEXTUALLY TARGETED CONTENT ON A CONNECTED TELEVISION SYSTEM” filed on March 17, 2014; and this application is a continuation-in-part of U.S. Patent Application No. 14/089,003, entitled “__”, filed on November 25, 2013; and this application is a continuation-in-part of U.S. Patent Application No. 14/089,003, entitled “__”, filed on March 17, 2014 RELEVANTCONTENT (System and method for identifying video clips for displaying contextually relevant content)"; this application further constitutes a continuation-in-part of U.S. patent application No. **/***,***, filed on March 17, 2014, entitled "System and method for real-time television ad detection using an automated content recognition database"; this application further constitutes a continuation-in-part of U.S. patent application No. **/***,***, filed on March 17, 2014, entitled "System and method for real-time television advertisement detection using an automated content recognition database"; this application further constitutes a continuation-in-part of U.S. patent application No. **/***,***, filed on March 17, 2014, entitled "System and method for on-screen graphics detection"; this application further constitutes a continuation-in-part of U.S. patent application No. **/***,***, filed on March 17, 2014, entitled "System and method for on-screen graphics detection"; this application further constitutes a continuation-in-part of U.S. patent application No. **/***,***, filed on March 17, 2014, entitled "Systems and methods for improving server and client performance in real-time television advertisement detection" and Methods for Identifying Video Segments Displayed on Remotely Located Televisions. The foregoing applications are either currently co-pending or are applications that claim the benefit of the filing date of currently co-pending applications.

发明领域Field of the Invention

本发明总体上涉及对未知媒体数据(如，视频或音频片段)与参考媒体文件的海量数据库进行的匹配。The present invention generally relates to matching unknown media data (eg, video or audio clips) with massive databases of reference media files.

背景background

用于对音频或视频媒体的自动化内容识别(ACR)的系统是本领域技术人员众所周知的。然而，这种ACR系统带来了许多技术挑战，包括管理经编码的音频或视频信息的潜在地非常大的数据库以及管理对在所述数据库中的信息进行定址所需的大索引。Systems for automatic content recognition (ACR) of audio or video media are well known to those skilled in the art. However, such ACR systems present many technical challenges, including managing potentially very large databases of encoded audio or video information and managing the large indexes required to address the information in the databases.

同样是本领域技术人员众所周知的是，大数据库索引(如可以用于本发明中的)可以使用某些散列函数来生成。另一种对数据库进行定址的方法可以是通过应用也被称为b树的二进制树结构。两种方法都常用于数据管理系统中。It is also well known to those skilled in the art that large database indexes (such as those used in the present invention) can be generated using certain hash functions. Another method of addressing a database can be through the use of a binary tree structure also known as a b-tree. Both methods are commonly used in data management systems.

无论采用何种方法来对大数据库进行标引，所述索引经常太大以至于不能以其整体驻留在如在典型的ACR系统中所使用的计算机服务器的主存储器中。当所述数据库无法完全地装配在计算机系统的存储器中时，其通常被存储于磁盘存储器上，并且然后所述数据库的一些部分被按块对应于提供地址的索引值读入到存储器中。所述再调用部分数据库信息的手段也被本领域技术人员称为“分页(paging)”，这是许多不同的计算机软件系统所共有的过程。Regardless of the method used to index a large database, the index is often too large to reside in its entirety in the main memory of a computer server, such as that used in a typical ACR system. When the database cannot be fully stored in the computer system's memory, it is typically stored on disk storage, and portions of the database are then read into memory in blocks corresponding to the index values that provide the addresses. This method of recalling partial database information is also known to those skilled in the art as "paging," a process common to many different computer software systems.

本发明是对以上所引用的发明的扩展并且是一种用于使用信号处理手段来对未知的数字媒体(如电视节目)与已知媒体的数据库进行匹配的系统和方法，该信号处理手段采用经修改的路径追踪算法(如在第一个发明中所描述的)。The present invention is an extension of the invention referenced above and is a system and method for matching unknown digital media (such as television programs) with a database of known media using signal processing means that employ a modified path tracing algorithm (as described in the first invention).

如在此所披露的系统和方法的另一个新颖方面是距离关联性散列标引手段，该距离关联性散列标引手段可以被细分为多个可独立定址的片段，其中，所述片段中的每个片段都可以对数据库的一部分进行定址，所述片段中的每个片段都可以以其整体驻留于服务器装置的主存储器中。标引手段的所产生的服务器群集各自托管对可搜索的音频或视频信息的较大数据库的相关联数据进行定址的索引的扇区。本发明的这种标引手段导致了ACR系统的速度和准确性的显著改进，该ACR系统这样被使能以便即使当电视显视器正在放映用户正在对来自数字视频记录器的视频进行改变频道、快倒、快进或甚至暂停的内容时对未知媒体进行标识。Another novel aspect of the systems and methods disclosed herein is a distance-associative hash indexing means that can be subdivided into a plurality of independently addressable segments, wherein each of the segments can address a portion of a database, and each of the segments can reside in its entirety in the main memory of a server device. The resulting server cluster of indexing means each hosts a sector of an index that addresses associated data of a larger database of searchable audio or video information. This indexing means of the present invention results in a significant improvement in the speed and accuracy of ACR systems that are enabled to identify unknown media even when a television monitor is showing content while a user is changing channels, rewinding, fast forwarding, or even pausing video from a digital video recorder.

概述Overview

在一些实施例中，一种涉及使用距离关联性散列法来对媒体数据库进行定址的示例性方法可以包括接收对视频片段的样本的一个或多个指示；针对包括至少一个分片的至少一个或多个像素的视频片段的该样本的该至少一个分片来确定每个分片的该一个或多个像素的以算法方式导出的值；从针对每个分片的平均值中减去针对每个分片所建立的中间点值；使用预先导出的函数来变换由于该减法而产生的这些值以均匀地分布这些值；根据这些经变换的值构建散列值；引用所构建的该散列值的多个最高有效位以确定数据库扇区；以及至少将该散列值存储在所确定的该数据库扇区上。In some embodiments, an exemplary method involving using distance-associative hashing to address a media database may include receiving one or more indications of a sample of a video fragment; determining, for at least one fragment of the sample of the video fragment including at least one or more pixels of at least one fragment, an algorithmically derived value for the one or more pixels of each fragment; subtracting a midpoint value established for each fragment from an average value for each fragment; transforming the values resulting from the subtraction using a pre-derived function to evenly distribute the values; constructing a hash value based on the transformed values; referencing multiple most significant bits of the constructed hash value to determine a database sector; and storing at least the hash value on the determined database sector.

在一些实施例中，使用一个或多个处理装置来至少部分地实现前述示例性方法的接收、确定、减法、变换、构建、引用、或存储中的至少一项。在前述示例性方法的一些实施例中，接收对视频片段的样本的一个或多个指示可以包括接收对帧或静止图像中的至少一项的一个或多个指示。在前述示例性方法的一些实施例中，接收对视频片段的样本的一个或多个指示可以包括接收对视频片段的样本的一个或多个指示，对视频片段的样本的该一个或多个指示与对频道的至少一个指示、对视频片段的至少一个指示、以及对与该视频片段的开始的时间代码偏移的至少一个指示相关联。In some embodiments, one or more processing devices are used to at least partially implement at least one of the receiving, determining, subtracting, transforming, constructing, referencing, or storing of the aforementioned exemplary method. In some embodiments of the aforementioned exemplary method, receiving one or more indications of samples of the video segment may include receiving one or more indications of at least one of a frame or a still image. In some embodiments of the aforementioned exemplary method, receiving one or more indications of samples of the video segment may include receiving one or more indications of samples of the video segment, the one or more indications of samples of the video segment being associated with at least one indication of a channel, at least one indication of the video segment, and at least one indication of a time code offset from a start of the video segment.

在前述示例性方法的一些实施例中，针对包括至少一个分片的至少一个或多个像素的视频片段的该样本的该至少一个分片来确定每个分片的该一个或多个像素的以算法方式导出的值包括针对包括至少一个分片的至少一个或多个像素的视频片段的该样本的该至少一个分片来确定每个分片的该一个或多个像素的平均值。在前述示例性方法的一些实施例中，从针对每个分片的平均值中减去针对每个分片所建立的中间点值可以包括从针对每个分片的平均值中减去针对每个分片所建立的中间点值，之前已经针对多个频道在至少一个时间段上使用来自每个分片的数据确定了针对每个分片所建立的该中间点值。In some embodiments of the aforementioned exemplary methods, determining, for the at least one slice of the sample of the video segment including the at least one or more pixels of the at least one slice, the algorithmically derived value for the one or more pixels of each slice comprises determining, for the at least one slice of the sample of the video segment including the at least one or more pixels of the at least one slice, an average value for the one or more pixels of each slice. In some embodiments of the aforementioned exemplary methods, subtracting a midpoint value established for each slice from the average value for each slice may comprise subtracting a midpoint value established for each slice from the average value for each slice, the midpoint value established for each slice having been previously determined for a plurality of channels over at least one time period using data from each slice.

在前述示例性方法的一些实施例中，使用预先导出的函数来变换由于该减法而产生的这些值以均匀地分布这些值可以包括形成至少包括由于该减法而产生的这些值的可变矩阵；获得在与该可变矩阵相交叉时将更均匀地分布这些经变换的值的静态矩阵；以及计算该可变矩阵与该静态矩阵的点积，该点积至少包括这些被更均匀地分布的经变换的值。在前述示例性方法的一些实施例中，获得在与该可变矩阵相交叉时将更均匀地分布这些经变换的值的静态矩阵可以包括使用位置敏感散列法至少部分地基于一个或多个先前所获得的散列值来确定在与可变矩阵相交叉时将更均匀地分布该可变矩阵的这些经变换的值的静态矩阵。In some embodiments of the aforementioned exemplary method, transforming the values resulting from the subtraction using a pre-derived function to evenly distribute the values may include forming a variable matrix including at least the values resulting from the subtraction; obtaining a static matrix that, when intersected with the variable matrix, more evenly distributes the transformed values; and computing a dot product of the variable matrix with the static matrix, the dot product including at least the more evenly distributed transformed values. In some embodiments of the aforementioned exemplary method, obtaining the static matrix that, when intersected with the variable matrix, more evenly distributes the transformed values may include using position-sensitive hashing to determine the static matrix that, when intersected with the variable matrix, more evenly distributes the transformed values of the variable matrix based at least in part on one or more previously obtained hash values.

在前述示例性方法的一些实施例中，根据这些经变换的值构建散列值可以包括根据这些经变换的值构建散列值，包括至少通过将每个经变换的值减小至二进制表示来减小这些经变换的值的保真度。在前述示例性方法的一些实施例中，通过将每个经变换的值减小至二进制表示来减小这些经变换的值的保真度可以包括针对每个经变换的值确定该经变换的值是否是正数，并且如果该经变换的值是正数，将一赋值给该散列值，并且否则将零赋值给该散列值。In some embodiments of the aforementioned exemplary method, constructing a hash value from the transformed values may include constructing a hash value from the transformed values including reducing the fidelity of the transformed values by at least reducing each transformed value to a binary representation. In some embodiments of the aforementioned exemplary method, reducing the fidelity of the transformed values by reducing each transformed value to a binary representation may include determining, for each transformed value, whether the transformed value is a positive number, and assigning one to the hash value if the transformed value is positive, and otherwise assigning zero to the hash value.

在前述示例性方法的一些实施例中，引用所构建的该散列值的多个最高有效位以确定数据库扇区可以包括引用所构建的该散列值的多个最高有效位以确定数据库服务器，其中，该多个最高有效位被预先确定为定址多个数据库服务器，其中，与该多个最高有效位相关联的多个数据库服务器被建立为使得与数据库扇区相关联的至少一个索引能够完全地驻留在相应的数据库服务器的存储器中。在前述示例性方法的一些实施例中，至少将该散列值存储在所确定的该数据库扇区上可以包括至少将该散列值存储在所确定的该数据库扇区上，包括至少部分地基于该散列值至少将对频道的至少一个指示、对视频片段的至少一个指示、以及对与该视频片段的开始的时间代码偏移的至少一个指示存储在数据库位置处。In some embodiments of the aforementioned exemplary method, referencing the constructed plurality of most significant bits of the hash value to determine the database sector may include referencing the constructed plurality of most significant bits of the hash value to determine a database server, wherein the plurality of most significant bits are predetermined to address a plurality of database servers, wherein the plurality of database servers associated with the plurality of most significant bits are established such that at least one index associated with the database sector can reside entirely in memory of the corresponding database server. In some embodiments of the aforementioned exemplary method, storing at least the hash value on the determined database sector may include storing at least the hash value on the determined database sector, including storing at least one indication of a channel, at least one indication of a video segment, and at least one indication of a time code offset from a start of the video segment at a database location based at least in part on the hash value.

在前述示例性方法的一个或多个替代性实施例中，多个相关系统包括但不限于用于实现在此所引用的方法实施例的电路和/或编程；取决于系统设计者的设计选择，该电路和/或编程实际上可以是被配置成用于实现在此所引用的方法方面的硬件、软件、和/或固件的任何组合。In one or more alternative embodiments of the aforementioned exemplary methods, a plurality of related systems include, but are not limited to, circuitry and/or programming for implementing the method embodiments referenced herein; depending on the design choices of the system designer, such circuitry and/or programming may actually be any combination of hardware, software, and/or firmware configured to implement aspects of the methods referenced herein.

在不同的实施例中，一种涉及使用距离关联性散列法来对媒体数据库进行定址的示例性方法可以包括接收提示，该提示通过与介质存储操作相关联的一个或多个操作来构建；引用所接收的该提示的多个最高有效位以确定数据库扇区；以及至少部分地基于所接收的该提示返回对来自该数据库扇区的至少一个候选项的至少一个指示。In various embodiments, an exemplary method involving addressing a media database using distance-associative hashing may include receiving a hint constructed by one or more operations associated with media storage operations; referencing a plurality of most significant bits of the received hint to determine a database sector; and returning at least one indication of at least one candidate from the database sector based at least in part on the received hint.

在前述示例性方法的一些实施例中，接收提示(该提示通过与介质存储操作相关联的一个或多个操作来构建)可以包括接收与客户端系统的视频缓冲器的样本相关联的提示，包括至少接收与时刻相关的一个或多个指示，该时刻与该客户端系统的该视频缓冲器的该样本相关联。在前述示例性方法的一些实施例中，接收提示(该提示通过与介质存储操作相关联的一个或多个操作来构建)可以包括接收提示，该提示与客户端系统的视频缓冲器的样本相关联，该提示至少部分地通过对与该视频缓冲器相关联的至少一些值进行散列来确定。In some embodiments of the foregoing exemplary method, receiving a hint (the hint constructed by one or more operations associated with media storage operations) may include receiving a hint associated with a sample of a video buffer of a client system, including at least receiving one or more indications related to a time instant associated with the sample of the video buffer of the client system. In some embodiments of the foregoing exemplary method, receiving a hint (the hint constructed by one or more operations associated with media storage operations) may include receiving a hint associated with a sample of a video buffer of a client system, the hint being determined at least in part by hashing at least some values associated with the video buffer.

在前述示例性方法的一些实施例中，接收提示(该提示与客户端系统的视频缓冲器的样本相关联，该提示至少部分地通过对与该视频缓冲器相关联的至少一些值进行散列来确定)可以包括接收提示，该提示与客户端系统的视频缓冲器的样本相关联，该提示至少部分地通过对与该视频缓冲器相关联的至少一些值进行散列来确定，该散列至少部分地基于同样用于相关联的介质存储操作中的至少一个操作数或至少一个算法中的一个或多个。在前述示例性方法的一些实施例中，接收提示(该提示通过与介质存储操作相关联的一个或多个操作来构建)可以包括接收提示，该提示通过至少包括以下各项的一个或多个操作来确定：接收对客户端系统的视频缓冲器的至少一项内容的一个或多个指示；针对包括至少一个分片的至少一个或多个像素的该视频缓冲器的该至少一项内容的该至少一个分片来确定每个分片的该一个或多个像素的以算法方式导出的值；从针对每个分片的平均值中减去中间点值；对由于该减法所产生的这些值进行变换；根据这些经变换的值构建散列值；以及将该提示至少部分地与所构建的该散列值相关联，其中，该确定操作、减法操作、变换操作、或构建操作中的至少一个利用同样用于相关联的介质存储操作中的至少一个操作数或至少一个算法中的一个或多个。In some embodiments of the foregoing exemplary method, receiving a hint associated with a sample of a video buffer of the client system, the hint determined at least in part by hashing at least some values associated with the video buffer, may include receiving a hint associated with a sample of a video buffer of the client system, the hint determined at least in part by hashing at least some values associated with the video buffer, the hash based at least in part on one or more of at least one operand or at least one algorithm also used in an associated media storage operation. In some embodiments of the aforementioned exemplary method, receiving a hint (the hint being constructed by one or more operations associated with a media storage operation) may include receiving a hint determined by one or more operations comprising at least the following: receiving one or more indications of at least one content of a video buffer of a client system; determining, for at least one slice of the at least one content of the video buffer comprising at least one pixel of at least one slice, an algorithmically derived value for the one or more pixels of the at least one slice; subtracting a midpoint value from an average value for each slice; transforming the values resulting from the subtraction; constructing a hash value based on the transformed values; and associating the hint at least in part with the constructed hash value, wherein at least one of the determining operation, the subtracting operation, the transforming operation, or the constructing operation utilizes one or more of at least one operand or at least one algorithm that is also used in the associated media storage operation.

在前述示例性方法的一些实施例中，至少部分地基于所接收的该提示返回对来自该数据库扇区的至少一个候选项的至少一个指示可以包括至少部分地基于作为所接收的该提示的函数的平等球中的概率点位置(“PPLEB”)算法来返回对来自该数据库扇区的至少一个候选项的至少一个指示。在前述示例性方法的一些实施例中，至少部分地基于所接收的该提示返回对来自该数据库扇区的至少一个候选项的至少一个指示可以包括至少部分地基于所接收的该提示来返回对来自该数据库扇区的至少一个候选项的至少一个指示，该至少一个候选项在所接收的该提示的预先确定的逆百分比分布半径内。In some embodiments of the aforementioned exemplary methods, returning at least one indication of at least one candidate from the database sector based at least in part on the received hint may include returning at least one indication of at least one candidate from the database sector based at least in part on a Probability Point Location in an Equal Sphere ("PPLEB") algorithm as a function of the received hint. In some embodiments of the aforementioned exemplary methods, returning at least one indication of at least one candidate from the database sector based at least in part on the received hint may include returning at least one indication of at least one candidate from the database sector that is within a predetermined inverse percent distribution radius of the received hint based at least in part on the received hint.

在不同的实施例中，一种涉及使用距离关联性散列法来对媒体数据库进行定址的示例性方法可以包括接收对至少一个候选项的至少一个指示和对至少一个提示的至少一个指示；将令牌添加到与至少一个所接收的候选项相关联的箱；以及确定在箱内的令牌的数量是否超过与客户端系统正在显示与至少一个提示相关联的具体视频片段的概率相关联的值，并且如果在箱内的令牌的该数量超过了与客户端系统正在显示与至少一个提示相关联的具体视频片段的概率相关联的值，至少部分地基于该箱返回与该具体视频片段相关联的至少一些数据。In various embodiments, an exemplary method involving addressing a media database using distance-associative hashing may include receiving at least one indication of at least one candidate and at least one indication of at least one prompt; adding tokens to a bin associated with at least one received candidate; and determining whether the number of tokens within the bin exceeds a value associated with a probability that the client system is displaying a specific video clip associated with the at least one prompt, and if the number of tokens within the bin exceeds the value associated with the probability that the client system is displaying the specific video clip associated with the at least one prompt, returning at least some data associated with the specific video clip based at least in part on the bin.

在前述示例性方法的一些实施例中，将令牌添加到与至少一个所接收的候选项相关联的箱可以包括将令牌添加到与至少一个所接收的候选项相关联的时间箱。在前述示例性方法的一些实施例中，将令牌添加到与至少一个所接收的候选项相关联的箱可以包括确定相对时间，包括至少从与该至少一个提示相关联的任意时间减去与该至少一个候选项相关联的候选项时间；以及至少部分地基于所确定的该相对时间将令牌添加到与该候选项相关联的时间箱。在前述示例性方法的一些实施例中，该方法可以包括至少部分地基于所经过的时间段从时间箱移除一个或多个令牌。In some embodiments of the aforementioned exemplary method, adding the token to a bin associated with at least one received candidate may include adding the token to a time bin associated with the at least one received candidate. In some embodiments of the aforementioned exemplary method, adding the token to a bin associated with the at least one received candidate may include determining a relative time, including subtracting a candidate time associated with the at least one candidate from an arbitrary time associated with the at least one prompt; and adding the token to the time bin associated with the candidate based at least in part on the determined relative time. In some embodiments of the aforementioned exemplary method, the method may include removing one or more tokens from a time bin based at least in part on an elapsed time period.

在不同的实施例中，一种涉及使用距离关联性散列法对媒体数据库进行定址的示例性系统可以包括但不限于：一个或多个计算装置；以及一条或多条指令，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收至少一个栅格化视频流；创建与至少一个所接收的栅格化视频流的至少一个样本相关联的至少一个散列值；确定用于存储所创建的至少一个散列值的至少一个数据库扇区；以及将所创建的至少一个散列值存储在所确定的至少一个数据库扇区上。In various embodiments, an exemplary system involving addressing a media database using distance-associative hashing may include, but is not limited to: one or more computing devices; and one or more instructions that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to at least perform the following operations: receive at least one rasterized video stream; create at least one hash value associated with at least one sample of at least one received rasterized video stream; determine at least one database sector for storing the at least one created hash value; and store the at least one created hash value on the at least one determined database sector.

在不同的实施例中，一种涉及使用距离关联性散列法对媒体数据库进行定址的示例性系统可以包括但不限于：一个或多个计算装置；以及一条或多条指令，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收与至少一个客户端系统的至少一个视频缓冲器相关联的一个或多个指令；至少部分地基于该至少一个视频缓冲器以及与该至少一个视频缓冲器相关联的至少一个时刻来确定提示，其中，与确定该提示相关联的至少一个操作数或至少一个函数中的一个或多个还被用于相关联的介质存储操作中；引用所确定的提示的多个最高有效位以确定数据库扇区；以及至少部分地基于所确定的提示返回对来自所确定的数据库扇区的至少一个候选项的至少一个指示。In various embodiments, an exemplary system involving addressing a media database using distance-associative hashing may include, but is not limited to: one or more computing devices; and one or more instructions that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to at least perform the following operations: receive one or more instructions associated with at least one video buffer of at least one client system; determine a hint based at least in part on the at least one video buffer and at least one time associated with the at least one video buffer, wherein one or more of at least one operand or at least one function associated with determining the hint is also used in an associated media storage operation; reference multiple most significant bits of the determined hint to determine a database sector; and return at least one indication of at least one candidate from the determined database sector based at least in part on the determined hint.

在不同的实施例中，一种涉及使用距离关联性散列法对媒体数据库进行定址的示例性系统可以包括但不限于：一个或多个计算装置；以及一条或多条指令，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收对至少一个候选项的至少一个指示和对至少一个提示的至少一个指示；将令牌添加到与至少一个所接收的候选项相关联的箱；以及确定在箱内的令牌的数量是否超过与客户端系统正在接收与所接收到的至少一个提示相关联的具体视频片段的概率相关联的值，并且如果在箱内的令牌的该数量超过了与客户端系统正在接收与所接收到的至少一个提示相关联的具体视频片段的概率相关联的值，至少部分地基于该箱返回与该具体视频片段相关联的至少一些数据。In various embodiments, an exemplary system involving addressing a media database using distance-associative hashing may include, but is not limited to, one or more computing devices; and one or more instructions that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to at least perform the following operations: receive at least one indication of at least one candidate and at least one indication of at least one prompt; add tokens to a box associated with at least one received candidate; and determine whether the number of tokens within the box exceeds a value associated with a probability that the client system is receiving a specific video clip associated with at least one received prompt, and if the number of tokens within the box exceeds a value associated with the probability that the client system is receiving a specific video clip associated with at least one received prompt, return at least some data associated with the specific video clip based at least in part on the box.

除了前述内容之外，在如本披露的正文(例如，权利要求书、附图和/或详细说明)和/或附图的传授中阐述和描述了不同的其他方法、系统和/或程序产品实施例。In addition to the foregoing, various other method, system, and/or program product embodiments are set forth and described in the text of this disclosure (eg, claims, drawings, and/or detailed description) and/or as taught in the accompanying drawings.

上述是概述并且因此(必然地)包含了细节的简化、概括及省略；因此本领域的技术人员将认识到该概述仅是说明性的而并非旨在是以任何方式来限制。在此描述的装置和/或过程和/或其他主题的其他方面、实施例、特征以及优点将于在此阐述的传授中变得明显。The foregoing is a summary and therefore (necessarily) contains simplifications, generalizations, and omissions of detail; therefore, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be limiting in any way. Other aspects, embodiments, features, and advantages of the devices and/or processes and/or other subject matter described herein will become apparent from the teachings set forth herein.

附图简要说明BRIEF DESCRIPTION OF THE DRAWINGS

参照以下附图，以下更详细地描述了本发明的某些实施例。Certain embodiments of the present invention are described in more detail below with reference to the following drawings.

图1展示了如本发明所教导的扇区化视频匹配数据库的构建，该扇区化视频匹配数据库以随后被持续更新的初始视频摄取或捕获过程开始。示出了电视显视系统101及其相应的电视显示存储器缓冲器103，用于该系统的潜在实施例。针对每个像素分片来进行使用本领域技术人员已知的某种算法手段的像素分片102的分配以及对值105的计算，并且创建所产生的数据结构并且然后将其打上时间戳以产生“提示”106，该提示还可以具有与其相关联的附加元数据。FIG1 illustrates the construction of a sectorized video matching database, as taught by the present invention, beginning with an initial video ingestion or capture process that is subsequently continuously updated. A television display system 101 and its corresponding television display memory buffer 103 are shown as potential embodiments of the system. The allocation of pixel tiles 102 and the calculation of values 105 using some algorithmic approach known to those skilled in the art are performed for each pixel tile, and the resulting data structure is created and then timestamped to produce a "hint" 106, which may also have additional metadata associated with it.

图2展示了使用距离关联性散列过程来处理提示数据201并且生成散列索引202，进一步展示了扇区化定址方案203以将数据存储在相关的组(存储桶(bucket))206中。FIG. 2 illustrates the use of a distance-associative hashing process to process hint data 201 and generate a hash index 202 , and further illustrates a sectorized addressing scheme 203 to store data in related groups (buckets) 206 .

图3：展示了对未知电视内容的实时捕获，用于从所连接的电视监视器等301进行识别。像素分片通常被定义为视频缓冲器303的具有大约十个像素乘以十行像素304的尺寸的正方形像素区，然而，可以使用任何合理的形状和尺寸。像素分片位置的数量可以是在所述视频缓冲器内的十个与五十个位置之间的任何数量并且被处理305为用于将提示数据306发送至中央服务器装置。Figure 3: shows real-time capture of unknown television content for identification from a connected television monitor or the like 301. A pixel tile is typically defined as a square pixel region of a video buffer 303 having dimensions of approximately ten pixels by ten rows of pixels 304, however, any reasonable shape and size may be used. The number of pixel tile locations may be anywhere between ten and fifty locations within the video buffer and are processed 305 for sending hint data 306 to a central server device.

图4展示了如以上所引用的第一个发明中所教导的从参考(匹配)数据库存储桶404中提取多个候选提示值401并且将所述提示值403应用到路径追踪内容匹配过程402。FIG. 4 illustrates extracting a plurality of candidate cue values 401 from a reference (matching) database bucket 404 and applying the cue values 403 to a path tracing content matching process 402 as taught in the first invention referenced above.

图5展示了多个箱的数据结构，这些箱保持用于对来自该匹配数据库的多个候选值进行评分的多个令牌。随着搜索过程贯穿时间而进行，所述箱是“泄漏的”并且令牌随着时间推移而过期。Figure 5 shows a data structure of bins that hold tokens used to score candidate values from the matching database. As the search proceeds through time, the bins are "leaky" and tokens expire over time.

图6：展示了如现有技术所教导的一种用于访问大数据库的典型存储器分页方案。FIG6 shows a typical memory paging scheme for accessing a large database as taught in the prior art.

图7展示了涉及若干步骤的对散列值的创建，这些步骤以计算构成来自视频帧的那些样本的大量点中的每一个点的中值开始。FIG7 illustrates the creation of a hash value which involves several steps starting with computing the median value for each of a large number of points that make up the samples from the video frame.

图8展示了如何计算散列值。Figure 8 shows how the hash value is calculated.

图9展示了作为计算散列值的过程的一部分而使用将像素位置的中值的有益结果。FIG. 9 illustrates the beneficial results of using the median of pixel positions as part of the process of calculating a hash value.

图9a展示了在对多维数据集进行分区时不使用中值的问题。Figure 9a illustrates the problem of not using the median when partitioning a multidimensional dataset.

图9b展示了找到数据集的中值的益处。Figure 9b demonstrates the benefit of finding the median of a dataset.

图10展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的操作流程。FIG10 illustrates an operational flow representing several example operations for addressing a media database using distance-associative hashing.

图11展示了图10的操作流程的替代性实施例。FIG11 shows an alternative embodiment of the operational flow of FIG10 .

图12展示了图10的操作流程的替代性实施例。FIG12 shows an alternative embodiment of the operational flow of FIG10.

图13展示了图10的操作流程的替代性实施例。FIG13 shows an alternative embodiment of the operational flow of FIG10.

图14展示了图10的操作流程的替代性实施例。FIG14 shows an alternative embodiment of the operational flow of FIG10.

图15展示了图10的操作流程的替代性实施例。FIG. 15 shows an alternative embodiment of the operational flow of FIG. 10 .

图16展示了图10的操作流程的替代性实施例。FIG16 shows an alternative embodiment of the operational flow of FIG10.

图17展示了图10的操作流程的替代性实施例。FIG17 shows an alternative embodiment of the operational flow of FIG10.

图18展示了图10的操作流程的替代性实施例。FIG18 shows an alternative embodiment of the operational flow of FIG10.

图19展示了图10的操作流程的替代性实施例。FIG19 shows an alternative embodiment of the operational flow of FIG10.

图20展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的不同操作流程。FIG. 20 illustrates various operational flows representing several example operations for addressing a media database using distance-associative hashing.

图21展示了图20的操作流程的替代性实施例。FIG. 21 illustrates an alternative embodiment of the operational flow of FIG. 20 .

图22展示了图20的操作流程的替代性实施例。FIG. 22 illustrates an alternative embodiment of the operational flow of FIG. 20 .

图23展示了图20的操作流程的替代性实施例。FIG. 23 illustrates an alternative embodiment of the operational flow of FIG. 20 .

图24展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的另一个操作流程。24 illustrates another operational flow representing several example operations for addressing a media database using distance-associative hashing.

图25展示了图24的操作流程的替代性实施例。FIG. 25 illustrates an alternative embodiment of the operational flow of FIG. 24 .

图26展示了图24的操作流程的替代性实施例。FIG26 shows an alternative embodiment of the operational flow of FIG24.

图27展示了关于使用距离关联性散列法来对媒体数据库进行定址的系统。FIG. 27 illustrates a system for addressing a media database using distance-associative hashing.

图28展示了关于使用距离关联性散列法来对媒体数据库进行定址的另一个系统。FIG. 28 illustrates another system for addressing a media database using distance-associative hashing.

图29展示了关于使用距离关联性散列法来对媒体数据库进行定址的又另一个系统。FIG. 29 illustrates yet another system for addressing a media database using distance-associative hashing.

详细说明Detailed description

如在前述公开中所描述的，涉及本发明的第一发明是一种使用新颖的信号处理手段以及其他手段来对未知的视频与已知视频的数据库进行匹配的系统和方法，该信号处理手段采用经修改的路径追踪算法。As described in the aforementioned disclosure, the first invention related to the present invention is a system and method for matching an unknown video with a database of known videos using novel signal processing techniques, such as a modified path tracing algorithm, among other techniques.

新发明的新颖性手段是其距离关联性散列法(DistanceAssociatedHashing)，其伴随有提供使用扇区化索引的数据库访问。所述标引手段提供了用于对未知媒体片段与已知媒体(如，音频或视频内容)的参考数据库进行匹配的计算上高效的手段。The novel approach of the new invention is its distance-associated hashing, which, along with providing database access using sectorized indices, provides a computationally efficient means for matching unknown media segments with a reference database of known media (e.g., audio or video content).

本发明的这种标引手段导致了ACR系统的速度和准确性的显著改进，该ACR系统这样被使能以便即使当电视显视器正在放映用户正在对来自数字视频记录器的视频进行改变频道、快倒、快进或甚至暂停的内容时对媒体的标识进行跟踪。This indexing approach of the present invention results in significant improvements in the speed and accuracy of ACR systems, which are enabled to track the identity of media even when a television monitor is showing content while a user is changing channels, rewinding, fast forwarding, or even pausing video from a digital video recorder.

对媒体匹配数据库的建立、更新两者以及后续的访问将描述一种系统，该系统能够生成并定址扇区化数据库从而使得这些数据库扇区可以各自驻留在对应的大量服务器装置的主存储器中而不需要借助于在对应的服务器装置中的每一个服务器装置内的分页装置。这种通过位置敏感散列法对扇区化数据库进行定址的共同手段提供了对操作效率的显著改进。A system will be described for both the creation and updating of a media matching database, as well as subsequent access, that generates and addresses a sectorized database so that each of these database sectors can reside in the main memory of a corresponding plurality of server devices without resorting to paging within each of the corresponding server devices. This common approach to addressing the sectorized database through locality-sensitive hashing provides significant improvements in operational efficiency.

对扇区化视频匹配数据库的构建以图1中所展示的过程开始。电视系统101对电视信号进行解码并且将每个视频帧的内容放入视频帧缓冲器内，为对该视频帧的像素信息的显示或进一步处理做准备。所述电视系统可以是任何电视解码系统，该电视解码系统可以或者从基带或者从调制电视源中解码电视信号并且以由视频信号所指定的对应帧大小上的经解码的RGB值来填充视频帧缓冲器。这种系统是本领域技术人员众所周知的。The construction of the sectorized video matching database begins with the process shown in FIG1 . The television system 101 decodes the television signal and places the contents of each video frame into a video frame buffer in preparation for display or further processing of the pixel information of the video frame. The television system can be any television decoding system that can decode a television signal, either from baseband or from a modulated television source, and fill the video frame buffer with decoded RGB values corresponding to the frame size specified by the video signal. Such systems are well known to those skilled in the art.

本发明的系统首先建立在原申请中被描述为提示或提示值的电视节目指纹的参考数据库并且然后持续地对其进行更新。出于建立所述视频提示的参考数据库的目的，本发明进行对从视频帧缓冲器103中所读取的一个或多个视频分片102的采集。所述视频分片可以是任意形状或图案的，但是出于本示例的目的，应当是10个水平方向上的像素乘以10个竖直方向上的像素。同样为了本示例的目的，假设在视频帧缓冲器中有25个像素分片位置均匀地分布在所述缓冲器的边界内，虽然它们不一定是均匀分布的。每个像素应当由红色值、绿色值和蓝色值组成104，这些值通常由用于每种颜色八位二进制值总计24位或每个分片位置三个字节来表示。The system of the present invention first establishes a reference database of television program fingerprints described as prompts or prompt values in the original application and then continuously updates it. For the purpose of establishing the reference database of video prompts, the present invention collects one or more video slices 102 read from the video frame buffer 103. The video slices can be of any shape or pattern, but for the purpose of this example, they should be 10 pixels in the horizontal direction multiplied by 10 pixels in the vertical direction. Also for the purpose of this example, it is assumed that there are 25 pixel slice positions evenly distributed within the boundaries of the buffer in the video frame buffer, although they are not necessarily evenly distributed. Each pixel should be composed of a red value, a green value and a blue value 104, which are usually represented by an eight-bit binary value for each color, totaling 24 bits or three bytes per slice position.

此复合数据结构由来自视频缓冲器的多个像素分片位置的平均像素值来填充。像素分片被定义为视频缓冲器的具有大约十个像素乘以十行像素304的尺寸的通常为正方形的像素区。像素分片位置的数量可能通常位于视频缓冲器内的十个与五十个位置之间。This composite data structure is populated with the average pixel values of a plurality of pixel slice locations from the video buffer. A pixel slice is defined as a generally square pixel region of the video buffer having a size of approximately ten pixels by ten rows of pixels 304. The number of pixel slice locations may typically be between ten and fifty locations within the video buffer.

将这些平均像素值305与引用来自该电视系统的处理器装置的“时刻”的时间代码306组合在一起。时刻被定义为一秒的具有自1970年1月1日半夜起过去的多个小部分中的时间，这是计算系统(尤其是基于Unix(或Linux)的系统)中的公认的惯例。These average pixel values 305 are combined with a time code 306 that references a "time of day" from the television system's processor device. The time of day is defined as the time in fractions of a second that have elapsed since midnight on January 1, 1970, which is a well-established convention in computing systems, particularly Unix (or Linux) based systems.

此外，如在原专利申请中所教导的，元数据可以被包括在内并且与数据结构106一起被定义，称为标记指纹、“提示”、或“点”。此类元数据属性可能是从来自当前正在显示的视频节目的隐藏式字幕数据中导出的，或者其可以是通过在电视系统的处理器装置内运行的将来自对应的电视节目的视频转换为文本信息的语音识别系统提取的关键词。然后，所述文本信息可以针对相关关键词被检索或者以其整体作为提示数据结构的一部分被发送至中央服务器装置以进行进一步的处理。Additionally, as taught in the original patent application, metadata may be included and defined with data structure 106, referred to as tag fingerprints, "cues," or "points." Such metadata attributes may be derived from closed caption data from the currently displayed video program, or they may be keywords extracted by a speech recognition system operating within a processor device of the television system that converts video from the corresponding television program into textual information. The textual information may then be retrieved for relevant keywords or sent in its entirety as part of the cue data structure to a central server device for further processing.

那些提示记录201在图2中被传递至散列函数202，该散列函数使用基于平等球中的概率点位置算法(“PPLEB”)的位置敏感散列算法来生成散列值203。此散列值是根据来自提示记录(指纹)207的平均像素值计算出来的并且该过程将具有相似值的数据关联206至被称为存储桶的组中。Those hint records 201 are passed to a hash function 202 in Figure 2, which uses a position-sensitive hashing algorithm based on the Probabilistic Point Location in Equal Sphere ("PPLEB") algorithm to generate a hash value 203. This hash value is calculated based on the average pixel value from the hint record (fingerprint) 207, and the process associates 206 data with similar values into groups called buckets.

在此具体示例中所示出的十乘十像素分片302将具有一百个像素并且在数学上取平均分别针对红色值、绿色值和蓝色值产生平均像素值305。替代性地，可以使用任何取平均函数来替代简单的平均值。The ten by ten pixel patch 302 shown in this particular example would have one hundred pixels and mathematically averaged separately for red, green and blue values to produce an average pixel value 305. Alternatively, any averaging function could be used instead of a simple average.

从视频帧中提取出多个这种像素分片。举例来讲，如果从视频帧中提取出25个这种像素分片，那么结果将是表示在75维空间内的位置的点。技术人员将知晓这种大搜索空间会要求大量的计算资源来结合表示一个视频帧的其他74个值来定位(甚至近似地)所述值。A plurality of such pixel slices are extracted from the video frame. For example, if 25 such pixel slices are extracted from the video frame, the result will be a point representing a position in a 75-dimensional space. The skilled person will appreciate that such a large search space will require a significant amount of computing resources to locate (even approximately) the value in conjunction with the other 74 values representing a video frame.

在此所描述的距离关联性散列法的系统和方法的优点是减小计算负荷并且提高对未知的视频帧与已知的视频帧数据库进行匹配的准确性。Advantages of the distance-associative hashing system and method described herein are reduced computational load and increased accuracy in matching an unknown video frame to a database of known video frames.

创建散列值涉及若干步骤，这些步骤以计算每个点的以算法方式导出的值开始，如图7中所示出的701至775。发现了通过对在大约24小时时段上由匹配数据库维护的每个节目流或视频频道的每个帧的每个点进行求和来以算法方式导出所述中值的一种有用手段。每个点的中值都从该求和过程中找到。导出最终的散列值中的下一个步骤是从每个对应的点的点值减去平均值，行801减去行802等于行803。结果是预先导出的散列函数被应用于其上的正值或负值。通常，点值减去对应点的平均值的结果安排在矩阵中，使用构成预先导出的散列值(或散列键)的类似矩阵来计算对该矩阵的点积。然后，两个矩阵的点积的结果基于乘积矩阵元素的符号被进一步变换为一或零。通常，技术人员会将正值设为一并且将负值设为零。Creating a hash value involves several steps, which begin with calculating the algorithmically derived value for each point, as shown in 701 to 775 in Figure 7. A useful means of algorithmically deriving the median value has been found by summing each point of each frame of each program stream or video channel maintained by a matching database over a period of approximately 24 hours. The median value of each point is found from this summing process. The next step in deriving the final hash value is to subtract the average value from the point value of each corresponding point, where row 801 minus row 802 equals row 803. The result is a positive or negative value to which the pre-derived hash function is applied. Typically, the result of the point value minus the average value of the corresponding point is arranged in a matrix, and a similar matrix that constitutes the pre-derived hash value (or hash key) is used to calculate the dot product of the matrix. Then, the result of the dot product of the two matrices is further transformed to one or zero based on the sign of the product matrix elements. Typically, a technician will set a positive value to one and a negative value to zero.

所产生的散列值指向更多或更少跨数据存储区域均匀分布的值。散列值203可以被进一步划分(图2)为使得‘n’个最高有效位205对数据库的2ⁿ(2^n)个扇区之一进行定址。剩余的位206对数据库的所定址的扇区的各个‘存储桶’进行定址，这将在下文更详细的进行描述。The resulting hash value points to values that are more or less evenly distributed across the data storage area. The hash value 203 can be further divided ( FIG. 2 ) such that the 'n' most significant bits 205 address one of the 2 ⁿ (2^n) sectors of the database. The remaining bits 206 address individual 'buckets' of the addressed sector of the database, as will be described in more detail below.

对各个扇区地址空间进行定义的散列值的划分点被计算为使得数据库扇区的数据的索引装配于所述存储器扇区的那些处理器系统的存储器界限内。否则，所述数据库将经受分页，该分页将降低此过程的有效性。The partition points of the hash values defining the address spaces of the various sectors are calculated so that the index of the data of a database sector fits within the memory boundaries of those processor systems of the memory sector. Otherwise, the database will be subject to paging which will reduce the effectiveness of this process.

将本发明所教导的系统和方法与相关领域中的技术人员所熟知的系统和方法进行比较，图6展示了典型的分页方式。在图6中，假设示例系统正在试图对未知数据与已知数据的服务器进行匹配。使用索引602来仅对可以装配于CPU606的主存储器中的数据605的一部分进行定址。对此数据进行搜索，并且如果结果是负的，那么将另一个数据片段取入主存储器603并且搜索继续。Comparing the systems and methods taught by the present invention with those known to those skilled in the relevant art, FIG6 illustrates a typical paging approach. In FIG6 , assume that the example system is attempting to match unknown data with a server of known data. An index 602 is used to address only a portion of the data 605 that can fit in the main memory of the CPU 606. This data is searched, and if the result is negative, another data fragment is fetched into main memory 603 and the search continues.

使用分页的这种访问手段是常用的但是显著降低了计算机系统的效率。实际上，这种方式不能与搜索大媒体数据库的ACR系统一起使用，因为硬盘驱动器的读/写速度不足以跟上该任务。多年来已经开发出许多不同的算法方式来解决将搜索分割为多个较小的部分并且将较小的搜索分配到多个计算机服务器系统的这种问题。This access method using paging is common but significantly reduces the efficiency of the computer system. In practice, this approach cannot be used with ACR systems that search large media databases because the read/write speed of the hard drive is not fast enough to keep up with the task. Many different algorithmic approaches have been developed over the years to solve this problem of splitting the search into multiple smaller parts and distributing the smaller searches to multiple computer server systems.

知名的示例可能是相当大的谷歌(Google)搜索引擎。技术人员知晓此系统是迄今为止所建立的最大的计算机系统之一。谷歌搜索过程的速度和准确性是出色的。然而，谷歌搜索手段是明显不同的，并且完全不适用于对未知媒体与已知媒体的数据库进行匹配，即使该两个数据库具有相同的非常大的大小。这是因为谷歌搜索手段采用映射-归约(map-reduce)算法，该算法被设计为用于搜索基本上不关联的数据的大数据库。虽然较分页系统先进，映射-归约是计算密集型过程，该过程也要求参与的计算机系统之间的相当大的数据通信带宽。相反，本发明在处理资源和通信资源的使用上是高效的。A well-known example is probably the rather large Google search engine. Those skilled in the art will recognize that this system is one of the largest computer systems ever built. The speed and accuracy of the Google search process are remarkable. However, the Google search approach is significantly different and is completely unsuitable for matching unknown media with a database of known media, even if the two databases are of the same very large size. This is because the Google search approach employs a map-reduce algorithm, which is designed for searching large databases of essentially unrelated data. While more advanced than paging systems, map-reduce is a computationally intensive process that also requires considerable data communication bandwidth between the participating computer systems. In contrast, the present invention is efficient in its use of processing and communication resources.

在本发明中，距离关联性散列函数提供了按扇区来对数据库进行定址的手段，从而使得所述定址手段的数据装配于服务器组的单独的服务器装置的主存储器中。所述分组通过使用距离关联性散列步骤作为实现所述分组的手段来将多维阵列中的通过距离而相关的数据分组为相同的扇区来完成。用于对数据元素进行定址的扇区标识是根据散列索引计算出的，通过提取所述散列函数的总位的子集并且使用所述子集对所期望的扇区进行定址以在其中将数据存储在参考数据库中来从所述过程中来生成该散列索引。In the present invention, a distance-associative hash function provides a means for addressing a database by sector, allowing the data for the addressing means to fit within the main memory of individual server devices in a server group. Grouping is accomplished by grouping distance-related data in a multidimensional array into the same sector using a distance-associative hashing step as a means for achieving the grouping. The sector identifier used to address the data element is calculated from a hash index, which is generated from the process by extracting a subset of the total bits of the hash function and using the subset to address the desired sector in which to store the data in the reference database.

以此方式，散列索引子集是包含与多个散列值相关联的距离的扇区的地址(在第一发明中被称为存储桶)。然后，使用该散列地址的剩余部分来对扇区的用于存储新数据的存储桶进行定址。替代性地，可以通过对第一散列值再次进行散列来找到扇区地址。In this way, the hash index subset is the address of the sector (called a bucket in the first invention) containing the distance associated with multiple hash values. The remaining portion of the hash address is then used to address the bucket of the sector for storing new data. Alternatively, the sector address can be found by hashing the first hash value again.

这种通过多个散列标引步骤的数据库定址系统和方法产生了高效的数据库访问方案，该方案具有显著的性能益处以及较以上所描述的传统数据库访问方法提高的效率。This system and method of database addressing through multiple hash indexing steps results in an efficient database access scheme that has significant performance benefits and increased efficiency over the traditional database access methods described above.

距离关联性散列法提供了通过找到不是完全匹配而是在所寻求的值的预先确定的半径(距离关联性的)之内的数据来快速地对非常复杂(多维)的数据库进行定址的手段。重要的是，有时候这种定址手段将导致根本没有匹配。在面向商业的数据库无法容忍不准确性的情况下，媒体匹配系统可以轻易地容忍丢失的匹配并且一旦在第一专利中所接收并教导的下一个数据到达时将简单地继续匹配过程。当然，来自有待通过ACR系统确定的未知源的数据到达是周期性的，但是可以受本发明的系统命令以基于对准确性的要求或通过由系统的状态(如，系统可能正在接近过载时)所施加的要求以不同的间隔到达，并且然后这些发送客户端受到命令以发送较低采样率。例如，典型的数据接收率可能是1/10秒的间隔。Distance-associative hashing provides a means of quickly addressing very complex (multi-dimensional) databases by finding data that is not an exact match but is within a predetermined radius (distance-associative) of the value being sought. Importantly, sometimes this means of addressing will result in no matches at all. Where commercially oriented databases cannot tolerate inaccuracies, the media matching system can easily tolerate lost matches and will simply continue the matching process once the next data received and taught in the first patent arrives. Of course, data from unknown sources to be determined by the ACR system arrives periodically, but can be commanded by the system of the present invention to arrive at different intervals based on requirements for accuracy or by requirements imposed by the state of the system (e.g., when the system may be approaching overload), and these sending clients are then commanded to send a lower sampling rate. For example, a typical data reception rate might be an interval of 1/10 of a second.

对于参考媒体数据库，像素值组是从来自有待成为参考数据库的一部分的每个视频源的每个视频帧中导出的。然后，将视频节目的广播时间以及某个元数据附到像素值组上，该元数据是节目的信息，如内容标识(ID)、节目标题、演员姓名、播送时间、简短大纲等。所述元数据通常是从商用电子节目指南源获取的。For a reference media database, a set of pixel values is derived from each video frame from each video source to be part of the reference database. The broadcast time of the video program is then attached to the set of pixel values, along with certain metadata, which is information about the program, such as a content identifier (ID), program title, actor names, airtime, a brief synopsis, etc. The metadata is typically obtained from a commercial electronic program guide source.

然后，将添加有时间代码加上元数据的所述经处理的像素值的阵列存储于参考数据库中，并且然后将所述存储的数据的地址添加到在对应的散列值和扇区ID值处的散列索引上。此外，通过将来自元数据的内容ID用作另一种用于对参考数据库进行定址的手段来建立并维护第二数据库索引。The array of processed pixel values with the time code plus metadata added is then stored in a reference database, and the address of the stored data is then added to the hash index at the corresponding hash value and sector ID value. In addition, a second database index is established and maintained by using the content ID from the metadata as another means for addressing the reference database.

建立并持续更新数据库的过程是连续的，并且由数据库维护的数据的天数是基于用户的需要的但是例如可以在从一天到一个月的范围内。The process of building and continually updating the database is continuous, and the number of days of data maintained by the database is based on the needs of the user but may range from one day to one month, for example.

对来自接收自大量客户端装置的数据的未知视频片段进行标识的过程以类似于以上用于建立参考数据库的程序的程序开始。在图3中，此程序涉及电视监视器301，如通常属于被称为智能TV的类型的流行平面屏幕HDTV，其中，该TV包含具有能够执行类似于当今常用的智能电话上所找到的类型的应用程序的存储器的处理装置。本发明的系统在通常大量的位置内对视频帧缓冲器301的区域302进行采样。所述样本具有与在建立参考数据库的过程中所使用的那些像素分片完全相同的大小、形状和位置。然后，以算法方式对所收集的这些像素分片中的每个像素分片都进行处理以便以与用于创建参考数据库的方法完全相同的方式针对每个分片的红色值、绿色值和蓝色值产生计算出的值。The process of identifying unknown video segments from data received from a large number of client devices begins with a procedure similar to the procedure used above to establish a reference database. In Figure 3, this procedure involves a television monitor 301, such as a popular flat-screen HDTV of the type commonly referred to as a smart TV, wherein the TV includes a processing device with memory capable of executing applications similar to the type found on today's popular smartphones. The system of the present invention samples a region 302 of the video frame buffer 301 in a generally large number of locations. The samples have the same size, shape, and position as the pixel slices used in the process of establishing the reference database. Each of these collected pixel slices is then processed algorithmically to produce calculated values for the red value, green value, and blue value of each slice in a manner identical to the method used to create the reference database.

然后，本发明的所述系统计算所收集的平均值的与以上所描述的内容摄取函数完全相同的距离关联性散列索引。同样与以上所描述的摄取系统完全相同的，所产生的扇区标识(ID)值被提取为散列索引的总位的子集。该散列索引的剩余部分用于对所期望的扇区进行定址以在其中搜索属于与未知的数据点相同的存储桶的所有候选(潜在)匹配。The system of the present invention then calculates a distance-associative hash index of the collected averages, identical to the content ingestion function described above. Also identical to the ingestion system described above, the resulting sector identification (ID) value is extracted as a subset of the total bits of the hash index. The remainder of the hash index is used to address the desired sector, searching it for all candidate (potential) matches belonging to the same bucket as the unknown data point.

可选地，如果从以上过程可获得对匹配的良好猜测(成功的匹配)，本发明的系统还将使用内容ID索引从数据库中收集候选项以对围绕(成功匹配的候选项的)时间戳的时间半径r’的多个参考提示进行定址，该数据库负责属于潜在的内容ID的如以上所描述的在摄取过程中所创建的所述扇区。接下来，如在第一专利中所教导的，移除重复的候选项以及与未知点相距太远(半径r)的候选项。Optionally, if a good guess for a match (a successful match) is obtained from the above process, the system of the present invention will also use the content ID index to collect candidates from the database for the sectors created during the ingestion process as described above, belonging to potential content IDs, to address multiple reference hints within a time radius r' around the timestamp (of the successfully matched candidate). Next, as taught in the first patent, duplicate candidates and candidates that are too far away (radius r) from the unknown point are removed.

为了针对将未知视频片段与已知视频数据的参考数据库的匹配进行测试，假设来自上一个步骤的候选项列表，其中，每个候选项(即，每个可能的匹配)已经将其与以下数据项相关联：内容ID、媒体时间、作为与当前未知点(与未知视频流)的距离计算的逆百分比分布半径，其中，100％表示未知点的确切值并且0％是超出从未知点的半径r(分布)之外的值。In order to test the matching of an unknown video clip against a reference database of known video data, assume a list of candidates from the previous step, where each candidate (i.e., each possible match) has associated with it the following data items: content ID, media time, inverse percent distribution radius calculated as the distance from the current unknown point (to the unknown video stream), where 100% represents the exact value of the unknown point and 0% is a value outside the radius r (distribution) from the unknown point.

在本发明的匹配系统的存储器中为每个匹配候选项501分配数据结构502。除其他项外，该数据结构由通过某个任意量来分组的任意的时间箱(例如，大约一秒)组成。出于示例的目的，假设所述数据结构由一百个表示十秒的视频提示点的箱组成。这些箱在时间上通常不是相等间隔的。In the memory of the matching system of the present invention, a data structure 502 is allocated for each matching candidate 501. This data structure consists of, among other things, arbitrary time bins grouped by some arbitrary amount (e.g., approximately one second). For the purposes of this example, assume that the data structure consists of one hundred bins representing ten-second video cue points. These bins are typically not equally spaced in time.

对于在参考(匹配)数据库中所找到的每个候选项：首先，通过从未知视频的任意时间中减去候选项时间来计算相对时间。候选项时间是在参考节目播送期间与候选项相关联的每个视频提示的播放时间。For each candidate found in the reference (matching) database: First, the relative time is calculated by subtracting the candidate time from the arbitrary time of the unknown video. The candidate time is the play time of each video cue associated with the candidate during the reference program broadcast.

未知视频的任意时间来自于电视监视器的从在所述电视的存储器或在附接至所述电视的机顶盒中运行的本发明的应用内部生成的时刻，并且与所采样的视频提示点一起由所述应用发送至本发明的中央服务器装置。时刻是技术人员所众所周知的并且通常用于计算机系统中。所述时间作为自1970年1月1日起的时间单元的当前数量。The arbitrary time of the unknown video is generated from the television monitor by an application of the present invention running in the television's memory or in a set-top box attached to the television. The application sends this time along with the sampled video cue points to the central server device of the present invention. Time is well known to those skilled in the art and commonly used in computer systems. It is expressed as the current number of time units since January 1, 1970.

例如，如果来自(在家庭中的)电视的任意时间与真实媒体时间之间的时间差是100秒，那么那些实际上匹配的候选项的相对时间应该接近于那个值。同样，不是良好匹配的候选项不可能具有接近于本示例的100秒的相对时间。For example, if the time difference between any time from the TV (in the home) and the real media time is 100 seconds, then the relative time of those candidates that are actually a match should be close to that value. Likewise, candidates that are not good matches are unlikely to have relative times close to 100 seconds in this example.

在候选项数据结构中，当未知视频的提示点与参考提示点相匹配时，本发明的系统将令牌添加至候选项数据结构的对应箱内。然后，所述系统如在上一段中所描述的那样针对下一个候选项重复该过程。In the candidate data structure, when a cue point of the unknown video matches a reference cue point, the system of the present invention adds a token to the corresponding box of the candidate data structure. The system then repeats the process for the next candidate as described in the previous paragraph.

用于对结果进行评分的另一个并且重要的步骤是向所有的箱施加时间折扣。这是相对简单的过程，该过程针对每个时间周期将所有箱内的值递减较小的量。技术人员将认识到，这是“漏桶(leaky bucket)”评分方法。通过定义，经过所述过程的多个周期，不再通过匹配提示点来填充的箱将最终递减至零。同样，慢慢地通过系统中的随机噪声而填充的箱将同样被递减。所以，时间折扣最终清空由假正匹配和随机噪声填充的箱。技术人员还将清楚地看到，没有所述时间折扣分箱(time discount binning)，所有的箱都将最终填充至容量并且无法从过程中获得任何结果。Another and important step for scoring the results is to apply a time discount to all bins. This is a relatively simple process that decrements the values in all bins by a small amount for each time period. The skilled person will recognize that this is a "leaky bucket" scoring method. By definition, over multiple cycles of the process, bins that are no longer filled by matching cue points will eventually be decremented to zero. Similarly, bins that are slowly being filled by random noise in the system will also be decremented. Therefore, the time discount eventually clears the bins that are filled by false positive matches and random noise. The skilled person will also clearly see that without the time discount binning, all bins will eventually fill to capacity and no results can be obtained from the process.

在通过以下各项中的任何一项来以任何方式改变来自客户端电视监视器的视频流时，所述时间折扣还将具有在匹配阈值504的水平之上的任何箱(如503)递减至零：改变频道、快退、快进、暂停视频等。The time discount will also decrement to zero any bin (such as 503) that is above a level matching threshold 504 when the video stream from the client television monitor is changed in any way by any of the following: changing the channel, rewinding, fast forwarding, pausing the video, etc.

如果候选项数据结构的任何箱(如箱503)在某个阈值504之上，那么内容被声明匹配。限定匹配的进一步的手段可能包括在大于确定秒数(例如，三秒)的时间内针对候选项片段的连贯匹配进行测试。The content is declared a match if any bin of the candidate data structure, such as bin 503, is above a certain threshold 504. Further means of qualifying a match might include testing for consecutive matches of candidate segments for greater than a certain number of seconds (e.g., three seconds).

图8展示了如何计算散列值。首先，通过对所述位置的在所述位置处来自表示有待由本发明标识的典型的电视节目的多个电视频道中的采集值的许多天的时间上的那些值进行求和来找到有助于视频指纹的每个像素位置的中值。一旦确定了中值，其可以无限期地用作常数而不需要进一步的计算或调整。首先通过减去所述像素位置的中值来处理从客户端发送至服务器匹配系统的像素值。将所产生的结果与视频帧的其他像素位置一起存储在矩阵中，并且将适当的散列函数应用于所述矩阵。然后，从所产生的点积中导出多个散列值。FIG8 illustrates how hash values are calculated. First, the median value for each pixel position contributing to the video fingerprint is found by summing the values at that position over many days of collected values from multiple television channels representing typical television programs to be identified by the present invention. Once the median value is determined, it can be used as a constant indefinitely without the need for further calculations or adjustments. The pixel value sent from the client to the server matching system is first processed by subtracting the median value of that pixel position. The resulting result is stored in a matrix along with the other pixel positions of the video frame, and an appropriate hash function is applied to the matrix. Multiple hash values are then derived from the resulting dot products.

图9展示了作为计算散列值的过程的一部分而使用将像素位置的中值的有益结果。图表901示出了典型的未经优化的散列函数的输出的所产生的曲线，该散列函数具有在曲线的左边上占用相对窄范围的相对小数量的散列值。所产生的中值902较低。图表903示出了由于计算参与匹配过程的每个像素位置的中值并且将所述中值作为散列函数的一部分而应用所引起的对散列值的有利再分布。散列值的分布更加散开，伴随有所有的散列键904的中值中的上升。FIG9 illustrates the beneficial results of using the median of pixel positions as part of the process of calculating hash values. Graph 901 shows the resulting curve of the output of a typical, unoptimized hash function, which has a relatively small number of hash values occupying a relatively narrow range on the left side of the curve. The resulting median 902 is relatively low. Graph 903 illustrates the beneficial redistribution of hash values resulting from calculating the median for each pixel position participating in the matching process and applying the median as part of the hash function. The distribution of hash values is more spread out, accompanied by an increase in the median of all hash keys 904.

图9a展示了当在对数据集进行分区之前未找到中值时所述数据集发生了什么。如果系统对每个视频帧的十六个像素位置进行采样并且如果每个像素位置具有红色像素值、绿色像素值和蓝色像素值，将有64维(或轴线)至该图。出于展示的目的，在本示例中，数据集仅包括单个视频帧的两个样本点906和908。进一步地，该示例假设在每个像素点处仅获得单个亮度值。通过在对角线方向907上将数据集分割为顺时针扇区907c和逆时针扇区907cc并且竖直轴线908和水平轴线906在零值905处相交叉，在该两个所述像素位置之间仅有八个扇区中的两个扇区910和911。9 a shows what happens to the data set when no median is found before the data set is partitioned. If the system samples sixteen pixel positions of each video frame and if each pixel position has a red pixel value, a green pixel value, and a blue pixel value, there will be 64 dimensions (or axes) to the figure. For the purpose of illustration, in this example, the data set only includes two sample points 906 and 908 of a single video frame. Further, this example assumes that only a single brightness value is obtained at each pixel point. By dividing the data set into a clockwise sector 907 c and a counterclockwise sector 907 cc on the diagonal direction 907 and the vertical axis 908 and the horizontal axis 906 intersecting at the zero value 905, there are only two sectors 910 and 911 of the eight sectors between the two pixel positions.

图9b展示了找到每个像素位置的中值的益处。本示例继续使用像素值是从零至255的单个亮度值的假设，虽然绝对值对本方法并不重要。本示例展示了对于两个像素位置而言中值均是128的简化假设。现在，通过将分区点移位至905’，竖直轴线和水平轴线分别移位至908’和906’。对角切片909移至909’。从示图中清楚地看到现在所有八个扇区都包含数据。FIG9 b illustrates the benefit of finding the median value for each pixel position. This example continues with the assumption that the pixel values are single brightness values ranging from zero to 255, although the absolute values are not important to the method. This example illustrates the simplifying assumption that the median value is 128 for both pixel positions. Now, by shifting the partition point to 905′, the vertical and horizontal axes are shifted to 908′ and 906′, respectively. The diagonal slice 909 is shifted to 909′. It is clear from the diagram that all eight sectors now contain data.

在以此方式对数据集进行分区时，所计算的中值既不一定也不需要在数据集的中间。所期望的结果是将数据分散开从而使得当所述数据被分区并且指派至各个服务器时，系统更统一地访问所述服务器。相反，图9的未经优化的数据如果如所示的那样在八个服务器之间进行分区则将仅看到八个被访问的服务器中的两个服务器。在图9b所展示的方法中，通过在每个像素位置处的那些颜色值并且通过16个像素位置的示例，实际计算导致如48维图那样计算的48个中值的施加。进一步地，根据需要可以围绕48维图的每个中间点进行多于一次的数据拼接，从而使得可以使由所述切片产生的所述数据集装配于系统的单独计算机服务器的操作约束之内。在任何情况下，在每个分区切片的顺时针侧和逆时针侧上的大多数时间都将找到数据。When partitioning a data set in this way, the calculated median value is neither necessarily nor required to be in the middle of the data set. The desired result is to disperse the data so that when the data is partitioned and assigned to each server, the system accesses the servers more uniformly. On the contrary, if the unoptimized data of Figure 9 is partitioned between eight servers as shown, only two of the eight accessed servers will be seen. In the method shown in Figure 9b, by those color values at each pixel position and by the example of 16 pixel positions, the actual calculation results in the application of 48 medians calculated as a 48-dimensional graph. Further, as needed, more than one data splicing can be performed around each middle point of the 48-dimensional graph, so that the data set generated by the slice can be assembled within the operational constraints of the system's separate computer servers. In any case, data will be found most of the time on the clockwise and counterclockwise sides of each partition slice.

图10展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的操作流程1000。在图10中以及在包括操作流程的各个示例的下图中，将关于图1到图9的上述示例和/或关于其他示例和上下文来提供讨论和阐释。然而，应理解，可以在许多其他环境和上下文中和/或在图1到图9的经修改版本中执行这些操作流程。而且，虽然以所说明的序列来呈现各个操作流，但应理解，可以用与所说明的顺序不同的其他顺序来执行各个操作，或者可以同时执行这些操作。FIG10 illustrates an operational flow 1000 representing several example operations for addressing a media database using distance-associative hashing. In FIG10 and in the following figures, which include various examples of operational flows, discussion and illustration are provided with respect to the aforementioned examples of FIG1 through FIG9 and/or with respect to other examples and contexts. However, it should be understood that these operational flows can be performed in many other environments and contexts and/or in modified versions of FIG1 through FIG9. Furthermore, while the various operational flows are presented in the illustrated sequence, it should be understood that the various operations can be performed in an order other than that illustrated, or can be performed simultaneously.

在开始操作之后，操作流程1000移动到操作1002。操作1002描绘了接收对视频片段的样本的一个或多个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，这些指示可以与来自摄取系统的一个或多个像素分片相关联。After a start operation, operational flow 1000 moves to operation 1002. Operation 1002 depicts receiving one or more indications of samples of a video segment. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9, these indications may be associated with one or more pixel slices from an ingest system.

然后，操作1004描绘了针对包括至少一个分片的至少一个或多个像素的视频片段的该样本的该至少一个分片来确定每个分片的该一个或多个像素的以算法方式导出的值。例如，如在图1至图9中所示出的和/或关于其所描述的，可以计算每个分片中的那些红色像素、每个分片中的那些绿色像素、以及每个分片中的那些蓝色像素的平均值。Then, operation 1004 depicts determining, for the at least one tile of the sample of the video segment including at least one or more pixels of the at least one tile, an algorithmically derived value for the one or more pixels of each tile. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , an average value of the red pixels in each tile, the green pixels in each tile, and the blue pixels in each tile may be calculated.

然后，操作1006描绘了从针对每个分片的平均值中减去针对每个分片所建立的中间点值。例如，如在图1至图9中所示出的和/或关于其所描述的，可以通过对所述位置的在所述位置处来自多个电视频道中的采集值的许多天的时间上的那些值进行求和来找到有助于视频指纹的每个像素位置的中值。Operation 1006 then depicts subtracting the midpoint value established for each slice from the mean for each slice. For example, as shown in and/or described with respect to Figures 1-9, the median value for each pixel location contributing to the video fingerprint can be found by summing those values for that location over a number of days of collected values from multiple television channels at that location.

然后，操作1008描绘了使用预先导出的函数来变换由于该减法而产生的这些值以均匀地分布这些值。例如，如在图1至图9中所示出的和/或关于其所描述的，由减法所产生的这些值填充矩阵。可以计算那个矩阵与预先导出的静态矩阵的点积。该预先导出的静态矩阵可以在对操作流程1000进行实例化之前被确定并且可以基于过去所摄取的数据来以算法方式被优化，从而使得与其相交叉的多个矩阵将产生比直接来自减法操作的结果分布更均匀的结果。Then, operation 1008 has described using the function that is derived in advance to transform these values that produce due to this subtraction to distribute these values evenly.For example, as shown in Figure 1 to Figure 9 and/or about it, these values that are produced by subtraction fill a matrix.Can calculate the dot product of that matrix and the static matrix that is derived in advance.This static matrix that is derived in advance can be determined before operational flow 1000 is instantiated and can be optimized in an algorithmic manner based on the data that was taken in in the past, thereby make multiple matrices that intersect with it will produce the result that is more evenly distributed than the result directly from the subtraction operation.

然后，操作1010描绘了根据这些经变换的值构建散列值。例如，如在图1至图9中所示出的和/或关于其所描述的，能够保持RGB值的值被减小至位形式，从而使得散列值可以是位串。Then, operation 1010 depicts constructing a hash value from these transformed values. For example, as shown in and/or described with respect to Figures 1-9, the values capable of maintaining RGB values are reduced to bit form so that the hash value can be a bit string.

然后，操作1012描绘了引用所构建的该散列值的多个最高有效位以确定数据库扇区。例如，如在图1至图9中所示出的和/或关于其所描述的，可以预先确定多个位，从而使得散列值的所预先确定的该多个位用于对一个或多个数据库扇区进行定址。Operation 1012 then depicts referencing the most significant bits of the constructed hash value to determine a database sector. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , a plurality of bits may be predetermined such that the predetermined plurality of bits of the hash value are used to address one or more database sectors.

然后，操作1014描绘了至少将该散列值存储在所确定的该数据库扇区上。例如，如在图1至图9中所示出的和/或关于其所描述的，可以将该散列值存储在存储桶中，该存储桶包括在数学上接近的其他散列值，其中，这些散列值至少与具体的视频片段和偏移相关联。Operation 1014 then depicts storing at least the hash value on the determined database sector. For example, as shown in and/or described with respect to FIG. 1-9 , the hash value may be stored in a bucket that includes other mathematically close hash values, where the hash values are associated with at least a specific video segment and offset.

图11展示了图10的示例操作流程1000的替代性实施例。图11展示了操作流程1000可以包括至少一个附加操作的示例实施例。附加操作可以包括操作1102。FIG11 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG11 illustrates an example embodiment in which the operational flow 1000 may include at least one additional operation. The additional operation may include operation 1102.

操作1002展示了使用一个或多个处理装置来至少部分地实现接收操作1002、确定操作1004、减法操作1006、变换操作1008、构建操作1010、引用操作1012、或存储操作1014中的至少一个操作。在一些实例中，可以使用一个或多个计算机处理器来至少部分地实现前述操作之一。其他处理装置可以包括专用集成电路(ASIC)、现场可编程门阵列(FPGA)、数字信号处理器(DSP)、或被配置为用于实现前述操作中的至少一个操作的结果的任何其他电路。Operation 1002 illustrates using one or more processing devices to at least partially implement at least one of the receiving operation 1002, determining operation 1004, subtracting operation 1006, transforming operation 1008, constructing operation 1010, referencing operation 1012, or storing operation 1014. In some examples, one or more computer processors may be used to at least partially implement one of the aforementioned operations. Other processing devices may include application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or any other circuitry configured to implement the results of at least one of the aforementioned operations.

图12展示了图10的示例操作流程1000的替代性实施例。图12展示了示例实施例，其中操作1002可以包括至少一项附加操作。附加操作可以包括操作1202和/或操作1204。FIG12 illustrates an alternative embodiment of the example operational flow 1000 of FIG10 . FIG12 illustrates an example embodiment in which operation 1002 may include at least one additional operation. The additional operation may include operation 1202 and/or operation 1204 .

操作1202展示了接收对帧或静止图像中的至少一项的一个或多个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，视频片段的样本可以包括视频流的单独帧。这样的帧可以是一个30fps视频帧。在不同的实施例中，视频片段的样本可以是静止图像或者可以以除了30次每秒之外的速率成像的视频片段的一部分。Operation 1202 illustrates receiving one or more indications of at least one of a frame or a still image. For example, as shown in and/or described with respect to Figures 1 to 9, a sample of a video segment may comprise a single frame of a video stream. Such a frame may be a 30 fps video frame. In various embodiments, a sample of a video segment may be a still image or a portion of a video segment that may be imaged at a rate other than 30 times per second.

进一步，操作1204展示了接收对视频片段的样本的一个或多个指示，对视频片段的样本的该一个或多个指示与对频道的至少一个指示、对视频片段的至少一个指示、以及对与该视频片段的开始的时间代码偏移的至少一个指示相关联。例如，如在图1至图9中所示出的和/或关于其所描述的，可以从例如与正在由摄取系统监视的频道相关联的频道指南中接收与视频片段相关联的数据(其可以是节目标题和/或与视频片段相关联的其他元数据)、从其中摄取节目的频道、以及从节目开始的时间偏移。Further, operation 1204 illustrates receiving one or more indications of samples of the video segment, the one or more indications of the samples of the video segment being associated with at least one indication of a channel, at least one indication of the video segment, and at least one indication of a time code offset from the start of the video segment. For example, as shown in and/or described with respect to FIGURES 1-9, data associated with the video segment (which may be a program title and/or other metadata associated with the video segment), the channel from which the program was ingested, and the time offset from the start of the program may be received from, for example, a channel guide associated with a channel being monitored by the ingestion system.

图13展示了图10的示例操作流程1000的替代性实施例。图13展示了示例实施例，其中操作1004可以包括至少一项附加操作1302。FIG13 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG13 illustrates an example embodiment in which operation 1004 may include at least one additional operation 1302.

操作1302展示了针对包括至少一个分片的至少一个或多个像素的视频片段的该样本的该至少一个分片来确定每个分片的该一个或多个像素的平均值。例如，如在图1至图9中所示出的和/或关于其所描述的，用于将分片中的该一个或多个像素减少至单个值的算法操作可以是例如算数平均。Operation 1302 illustrates determining, for each slice of the sample of the video segment including at least one or more pixels of the at least one slice, an average of the one or more pixels of the at least one slice. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , the algorithmic operation for reducing the one or more pixels in the slice to a single value may be, for example, an arithmetic average.

图14展示了图10的示例操作流程1000的替代性实施例。图14展示了示例实施例，其中操作1006可以包括至少一项附加操作1402。FIG14 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG14 illustrates an example embodiment in which operation 1006 may include at least one additional operation 1402.

操作1402展示了从针对每个分片的平均值中减去针对每个分片所建立的中间点值，之前已经针对多个频道在至少一个时间段上使用来自每个分片的数据确定了针对每个分片所建立的该中间点值。例如，如在图1至图9中所示出的和/或关于其所描述的，可以确定中值，该中值是针对每个分片所确定的，其中，针对与在确定客户端系统上的片段的操作中相同的摄取时的分片建立中值，该中值被建立为从在长时间(一个月、一年等)内跨许多频道的相同分片进行监视来导出的常数值。Operation 1402 illustrates subtracting a midpoint value established for each shard from the mean value for each shard, the midpoint value established for each shard having been previously determined for a plurality of channels over at least one time period using data from each shard. For example, as shown in and/or described with respect to FIGURES 1-9, a median value may be determined, the median value being determined for each shard, wherein the median value is established for the shard at the same time of ingestion as in the operation of determining the segments on the client system, the median value being established as a constant value derived from monitoring the same shard across many channels over an extended period of time (a month, a year, etc.).

图15展示了图10的示例操作流程1000的替代性实施例。图15展示了示例实施例，其中操作1008可以包括至少一项附加操作。附加操作可以包括操作1502、操作1504和/或操作1506。FIG15 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG15 illustrates an example embodiment in which operation 1008 may include at least one additional operation. The additional operation may include operation 1502, operation 1504, and/or operation 1506.

操作1502展示了形成至少包括由于该减法而产生的这些值的可变矩阵。例如，如在图1至图9中所示出的和/或关于其所描述的，值被安排在矩阵中，这些值由减法操作产生，其中，该减法操作从正在被摄取的即时帧的平均值中减去随着时间推移针对每个分片所建立的中值。Operation 1502 illustrates forming a variable matrix including at least the values resulting from the subtraction. For example, as shown in and/or described with respect to FIGs. 1-9, values are arranged in a matrix, the values resulting from a subtraction operation that subtracts a median value established over time for each slice from an average value for the instantaneous frame being ingested.

操作1504展示了获得在与该可变矩阵相交叉时将更均匀地分布这些经变换的值的静态矩阵。例如，如在图1至图9中所示出的和/或关于其所描述的，可以基于对先前所获得的关于散列值的数据集进行的数学分析来确定矩阵。可以在数学上对该矩阵进行优化，从而使得当用作与连续的可变矩阵的点积操作中的操作数时，相应的连续结果矩阵将包括多个值，这些值沿着分布曲线比在点积操作之前的可变矩阵更均匀地分布。Operation 1504 illustrates obtaining a static matrix that, when intersected with the variable matrix, will more evenly distribute the transformed values. For example, as shown in and/or described with respect to Figures 1 through 9, the matrix can be determined based on a mathematical analysis of a previously obtained data set of hash values. The matrix can be mathematically optimized so that, when used as an operand in a dot product operation with a continuous variable matrix, the corresponding continuous result matrix will include a plurality of values that are more evenly distributed along the distribution curve than the variable matrix prior to the dot product operation.

操作1506展示了计算该可变矩阵与该静态矩阵的点积，该点积至少包括这些被更均匀地分布的经变换的值。例如，如在图1至图9中所示出的和/或关于其所描述的，可以将包含由于减法操作所产生的多个值的可变矩阵与静态矩阵相交叉，该静态矩阵已经被预先确定为用于更均匀地分布由可变矩阵所表示的数据，从而使得所产生的矩阵更加分散开而不是围绕分布的具体部分聚拢。Operation 1506 illustrates calculating a dot product of the variable matrix with the static matrix, the dot product including at least the more evenly distributed transformed values. For example, as shown in and/or described with respect to Figures 1-9, the variable matrix including the multiple values resulting from the subtraction operation may be intersected with a static matrix that has been predetermined to more evenly distribute the data represented by the variable matrix, thereby causing the resulting matrix to be more spread out rather than clustered around a particular portion of the distribution.

图16展示了图10的示例操作流程1000的替代性实施例。图16展示了示例实施例，其中操作1504可以包括至少一项附加操作1602。Figure 16 illustrates an alternative embodiment of the example operational flow 1000 of Figure 10. Figure 16 illustrates an example embodiment in which operation 1504 may include at least one additional operation 1602.

操作1602展示了使用位置敏感散列法至少部分地基于一个或多个先前所获得的散列值来确定在与可变矩阵相交叉时将更均匀地分布该可变矩阵的这些经变换的值的静态矩阵。例如，如在图1至图9中所示出的和/或关于其所描述的，可以使用位置敏感散列技术来分析先前所摄取的视频样本，产生矩阵，从而使得当用作与连续的可变矩阵的点积操作中的操作数时，相应的连续结果矩阵将包括多个值，这些值沿着分布曲线比在点积操作之前的可变矩阵更均匀地分布。Operation 1602 illustrates determining, using locality-sensitive hashing, a static matrix that, when intersected with a variable matrix, more evenly distributes the transformed values of the variable matrix based at least in part on one or more previously obtained hash values. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , locality-sensitive hashing techniques may be used to analyze previously captured video samples to produce matrices such that, when used as operands in a dot product operation with a successive variable matrix, the corresponding successive result matrices include values that are more evenly distributed along a distribution curve than the variable matrix prior to the dot product operation.

图17展示了图10的示例操作流程1000的替代性实施例。图17展示了示例实施例，其中操作1010可以包括至少一项附加操作。附加操作可以包括操作1702和/或操作1704。FIG17 illustrates an alternative embodiment of the example operational flow 1000 of FIG10 . FIG17 illustrates an example embodiment in which operation 1010 may include at least one additional operation. The additional operation may include operation 1702 and/or operation 1704 .

操作1702展示了根据这些经变换的值构建散列值，包括至少通过将每个经变换的值减小至二进制表示来减小这些经变换的值的保真度。例如，如在图1至图9中所示出的和/或关于其所描述的，来自点积操作的结果矩阵的每个值都可以从例如从0至255的8位值(或从-127至128)减少至单个位，或者为一或者为零。Operation 1702 illustrates constructing a hash value from the transformed values, including reducing the fidelity of the transformed values by at least reducing each transformed value to a binary representation. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , each value of the result matrix from the dot product operation may be reduced from, for example, an 8-bit value from 0 to 255 (or from -127 to 128) to a single bit, either one or zero.

操作1702可以包括操作1704。操作1704展示了针对每个经变换的值确定该经变换的值是否是正数，并且如果该经变换的值是正数，将一赋值给该散列值，并且否则将零赋值给该散列值。例如，如在图1至图9中所示出的和/或关于其所描述的，来自点积操作的结果矩阵的在1与128之间的每个值都可以被减少至为1的位值，并且来自该点积操作的结果矩阵的在-127与0之间的每个值都可以被减少至为0的位值。Operation 1702 may include operation 1704. Operation 1704 illustrates determining, for each transformed value, whether the transformed value is a positive number, and if the transformed value is a positive number, assigning a one to the hash value, and otherwise assigning a zero to the hash value. For example, as shown in and/or described with respect to Figures 1 to 9, each value between 1 and 128 of the result matrix from the dot product operation may be reduced to a bit value of 1, and each value between -127 and 0 of the result matrix from the dot product operation may be reduced to a bit value of 0.

图18展示了图10的示例操作流程1000的替代性实施例。图18展示了示例实施例，其中操作1012可以包括至少一项附加操作1802。FIG18 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG18 illustrates an example embodiment in which operation 1012 may include at least one additional operation 1802.

操作1802展示了引用所构建的该散列值的多个最高有效位以确定数据库服务器，其中，该多个最高有效位被预先确定为定址多个数据库服务器，其中，与该多个最高有效位相关联的多个数据库服务器被建立为使得与数据库扇区相关联的至少一个索引能够完全地驻留在相应的数据库服务器的存储器中。例如，如在图1至图9中所示出的和/或关于其所描述的，可以选择2位的多个最高有效位，由此，该2位可以提供四个不同的值(00、01、10和11)，这些值中的每个值都可以被指派给不同的数据库扇区。散列值的该多个最高有效位可以被建立为用于提供足够多的服务器，从而使得与多个散列值相关联的内容可以完全装配于具体数据库扇区的存储器中，其可以是数据库服务器、群集伙伴、虚拟机、和/或另一种类型的数据库节点。位数不一定但是可以确切地表示在任何给定时间的数据库扇区的最大数量(即，在可以选择6位来提供对高达64个数据库扇区的定址时，系统可以与较少的服务器(例如，60个扇区)或者与最大64个扇区一起操作)。Operation 1802 illustrates referencing a plurality of most significant bits of the constructed hash value to determine a database server, wherein the plurality of most significant bits are predetermined to address a plurality of database servers, wherein the plurality of database servers associated with the plurality of most significant bits are established such that at least one index associated with a database sector can reside entirely within the memory of the corresponding database servers. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , a plurality of most significant bits of 2 bits can be selected such that the 2 bits can provide four different values (00, 01, 10, and 11), each of which can be assigned to a different database sector. The plurality of most significant bits of the hash value can be established to provide a sufficient number of servers such that the content associated with the plurality of hash values can completely fit within the memory of a particular database sector, which can be a database server, a cluster partner, a virtual machine, and/or another type of database node. The number of bits does not necessarily but may exactly represent the maximum number of database sectors at any given time (i.e., where 6 bits may be selected to provide addressing of up to 64 database sectors, the system may operate with fewer servers (e.g., 60 sectors) or with a maximum of 64 sectors).

图19展示了图10的示例操作流程1000的替代性实施例。图19展示了示例实施例，其中操作1014可以包括至少一项附加操作1902。FIG19 illustrates an alternative embodiment of the example operational flow 1000 of FIG10. FIG19 illustrates an example embodiment in which operation 1014 may include at least one additional operation 1902.

操作1902展示了至少将该散列值存储在所确定的该数据库扇区上，包括至少部分地基于该散列值至少将对频道的至少一个指示、对视频片段的至少一个指示、以及对与该视频片段的开始的时间代码偏移的至少一个指示存储在数据库位置处。例如，如在图1至图9中所示出的和/或关于其所描述的，可以将与视频片段相关联的数据(其可以是节目标题和/或与视频片段相关联的其他元数据)、从其中摄取节目的频道、以及从节目开始的时间偏移或者与散列值一起存储或者存储于与散列值相关联的和/或可由散列值引用的位置中，该存储可以在与散列值相同或不同的扇区、服务器或数据库中。Operation 1902 illustrates storing at least the hash value on the determined database sector, including storing at least one indication of a channel, at least one indication of a video clip, and at least one indication of a time code offset from the start of the video clip at a database location based at least in part on the hash value. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , data associated with the video clip (which may be a program title and/or other metadata associated with the video clip), the channel from which the program was ingested, and the time offset from the start of the program may be stored either with the hash value or in a location associated with and/or referenced by the hash value, which storage may be in the same or a different sector, server, or database as the hash value.

图20展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的操作流程2000。在图20中以及在包括操作流程的各个示例的下图中，将关于图1到图9的上述示例和/或关于其他示例和上下文来提供讨论和阐释。然而，应理解，可以在许多其他环境和上下文中和/或在图1到图9的经修改版本中执行这些操作流程。而且，虽然以所说明的序列来呈现各个操作流，但应理解，可以用与所说明的顺序不同的其他顺序来执行各个操作，或者可以同时执行这些操作。FIG20 illustrates an operational flow 2000 representing several example operations for addressing a media database using distance-associative hashing. In FIG20 and in the following figures, which include various examples of operational flows, discussion and illustration are provided with respect to the aforementioned examples of FIG1 through FIG9 and/or with respect to other examples and contexts. However, it should be understood that these operational flows can be performed in many other environments and contexts and/or in modified versions of FIG1 through FIG9. Furthermore, while the various operational flows are presented in the illustrated sequence, it should be understood that the various operations can be performed in an order other than the illustrated order or can be performed simultaneously.

在开始操作之后，操作流程2000移动到操作2002。操作2002描绘了：接收提示，该提示通过与介质存储操作相关联的一个或多个操作来构建。例如，如在图1至图9中所示出的和/或关于其所描述的，接收与从取自具体客户端系统的视频数据的样本相关联的至少一些数据。该数据可以与恰好与由摄取操作所定义的相同的客户端系统的分片相关联。可以使用与摄取操作相同的操作以算法方式处理数据以得到散列值。相应地，如果对与具体频道上的具体节目的具体时间偏移相关联的具体帧进行摄取和散列，导致与那个具体帧相关联的散列值，如果那个具体帧在被显示在客户端系统上的同时也被采样，与施加到所摄取的帧上相同的散列操作将引起与由对所摄取的帧进行的散列操作所产生的相同的散列值。但是与在摄取期间所准备的散列值相反，操作2002的提示表示与来自具体客户端系统的视频数据的样本相关联的数据。可以通过例如HTTP请求来接收提示。After a start operation, operational flow 2000 moves to operation 2002. Operation 2002 depicts receiving a hint, constructed by one or more operations associated with a media storage operation. For example, as shown in and/or described with respect to Figures 1-9, at least some data associated with a sample of video data taken from a specific client system is received. This data may be associated with a slice corresponding to the same client system as defined by the ingestion operation. The data may be algorithmically processed using the same operations as the ingestion operation to obtain a hash value. Accordingly, if a specific frame associated with a specific time offset of a specific program on a specific channel is ingested and hashed, resulting in a hash value associated with that specific frame, then the same hash operation applied to the ingested frame will result in the same hash value as would be generated by the hash operation performed on the ingested frame if that specific frame were sampled while being displayed on the client system. However, in contrast to a hash value prepared during ingestion, the hint of operation 2002 represents data associated with the sample of video data from the specific client system. The hint may be received, for example, via an HTTP request.

然后，操作2004描绘了引用所接收的该提示的多个最高有效位以确定数据库扇区。例如，如在图1至图9中所示出的和/或关于其所描述的，对于由在摄取期间用于引用数据库扇区的该多个最高有效位所定义的相同的提示的位进行检查。例如，如果摄取时的散列值的前两位用于在具体数据库扇区处存储散列值，与来自客户端系统的视频数据的样本相关联的提示的相同的前两位用于对具体数据库扇区进行定址。Operation 2004 then depicts referencing the most significant bits of the received hint to determine the database sector. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , the bits of the same hint defined by the most significant bits used to reference the database sector during ingestion are checked. For example, if the first two bits of the hash value at ingestion were used to store the hash value at a specific database sector, the same first two bits of the hint associated with the sample of video data from the client system are used to address the specific database sector.

然后，操作2006描绘了至少部分地基于所接收的该提示返回对来自该数据库扇区的至少一个候选项的至少一个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，与提示确切地匹配或在该提示附近的多个散列值作为多个怀疑项或候选项中的一个或多个而被返回。可以在具体百分比半径内返回候选项。可以根据最近邻算法或经修改的最近邻算法来返回候选项。Operation 2006 then depicts returning at least one indication of at least one candidate from the database sector based at least in part on the received hint. For example, as shown in and/or described with respect to Figures 1-9, multiple hash values that exactly match the hint or are near the hint are returned as one or more of multiple suspects or candidates. Candidates may be returned within a specified percentage radius. Candidates may be returned based on a nearest neighbor algorithm or a modified nearest neighbor algorithm.

图21展示了图20的示例操作流程2000的替代性实施例。图21展示了示例实施例，其中操作2002可以包括至少一项附加操作。附加操作可以包括操作2102、操作2104和/或操作2106。Figure 21 illustrates an alternative embodiment of the example operational flow 2000 of Figure 20. Figure 21 illustrates an example embodiment in which operation 2002 may include at least one additional operation. The additional operation may include operation 2102, operation 2104, and/or operation 2106.

操作2102展示了接收与客户端系统的视频缓冲器的样本相关联的提示，包括至少接收与时刻相关的一个或多个指示，该时刻与该客户端系统的该视频缓冲器的该样本相关联。例如，如在图1至图9中所示出的和/或关于其所描述的，提示可以包括与任意时间的时间偏移或与其相关联。例如，可以从1970年1月1日开始计算该时间偏移。Operation 2102 illustrates receiving a hint associated with a sample of a video buffer of a client system, including receiving at least one or more indications associated with a time instant associated with the sample of the video buffer of the client system. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , the hint may include or be associated with a time offset associated with an arbitrary time. For example, the time offset may be calculated from January 1, 1970.

操作2104展示了：接收提示，该提示与客户端系统的视频缓冲器的样本相关联，该提示至少部分地通过对与该视频缓冲器相关联的至少一些值进行散列来确定。例如，如在图1至图9中所示出的和/或关于其所描述的，可以将一个或多个操作数用作常数通过一个或多个数学运算或算法将与视频缓冲器相关联的多个分片减少至位串，这些常数是通过例如本文中其他地方关于散列所描述的操作来预先导出的。Operation 2104 illustrates receiving a hint associated with a sample of a video buffer of a client system, the hint being determined at least in part by hashing at least some values associated with the video buffer. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , one or more operands may be used as constants to reduce a plurality of slices associated with the video buffer to a bit string via one or more mathematical operations or algorithms, the constants being pre-derived by operations such as those described elsewhere herein with respect to hashing.

操作2016展示了：接收提示，该提示与客户端系统的视频缓冲器的样本相关联，该提示至少部分地通过对与该视频缓冲器相关联的至少一些值进行散列来确定，该散列至少部分地基于同样用于相关联的介质存储操作中的至少一个操作数或至少一个算法中的一个或多个。例如，如在图1至图9中所示出的和/或关于其所描述的，通过摄取操作所利用的和/或与摄取过程所共有的数据位置相结合利用的和/或涉及用于摄取过程所利用的操作数的常数值的操作来对与表示在具体时间量子时电视屏幕显示什么的视频缓冲器的样本相关联的至少一些数据进行处理。例如，在摄取时所分析的分片的数量还可以用于提供与具体客户端系统相关联的提示。在摄取时所分析的像素分片的大小还可以用于提供与具体客户端系统相关联的提示。在摄取时用于更均匀地分布散列值的相同的预先导出的静态矩阵还可以用于对于具体客户端系统相关联的数据进行散列期间。Operation 2016 illustrates receiving a hint associated with a sample of a video buffer of a client system, the hint being determined at least in part by hashing at least some values associated with the video buffer, the hash being based at least in part on one or more of at least one operand or at least one algorithm also used in an associated media storage operation. For example, as shown in and/or described with respect to FIG1-9, at least some data associated with the sample of the video buffer representing what the television screen displayed at a particular time quantum is processed by an operation utilized by an ingestion operation and/or utilized in conjunction with a data location common to the ingestion process and/or involving constant values for operands utilized by the ingestion process. For example, the number of tiles analyzed during ingestion can also be used to provide a hint associated with a particular client system. The size of the pixel tiles analyzed during ingestion can also be used to provide a hint associated with a particular client system. The same pre-derived static matrix used to more evenly distribute hash values during ingestion can also be used during hashing of data associated with a particular client system.

图22展示了图20的示例操作流程2000的替代性实施例。图22展示了示例实施例，其中操作2002可以包括至少一项附加操作。附加操作可以包括操作2202、操作2204、操作2206、操作2208、操作2210、操作2212和/或操作2214。Figure 22 illustrates an alternative embodiment of the example operational flow 2000 of Figure 20. Figure 22 illustrates an example embodiment in which operation 2002 may include at least one additional operation. The additional operation may include operation 2202, operation 2204, operation 2206, operation 2208, operation 2210, operation 2212, and/or operation 2214.

操作2202展示了接收对客户端系统的视频缓冲器的至少一项内容的一个或多个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，可以针对每个帧、或针对每三个帧、或针对每十个帧、或针对每秒、或以某种其他的间隔来读取用于客户端系统的视频缓冲器的每个预定义的分片处的每个像素位置处的红色像素、绿色像素和蓝色像素的像素值。可以由电视上的控件、由电视上的控制逻辑、由与媒体服务器耦接的系统、或其他地方来接收这些指示(像素值或其他数据)。Operation 2202 illustrates receiving one or more indications of at least one content of a video buffer of a client system. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , pixel values for red pixels, green pixels, and blue pixels at each pixel location at each predefined slice of the video buffer of the client system may be read for each frame, or for every three frames, or for every ten frames, or for every second, or at some other interval. These indications (pixel values or other data) may be received by a control on the television, by control logic on the television, by a system coupled to a media server, or elsewhere.

操作2204展示了针对包括至少一个分片的至少一个或多个像素的该视频缓冲器的该至少一项内容的该至少一个分片来确定每个分片的该一个或多个像素的以算法方式导出的值。例如，如在图1至图9中所示出的和/或关于其所描述的，可以对用于客户端系统的视频缓冲器的每个预定义的分片处的每个像素位置处的红色像素、绿色像素和蓝色像素的像素值取平均。Operation 2204 illustrates determining, for the at least one tile of the at least one item of content of the video buffer including at least one or more pixels of the at least one tile, an algorithmically derived value for the one or more pixels of each tile. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , pixel values for red pixels, green pixels, and blue pixels at each pixel location at each predefined tile of the video buffer for the client system may be averaged.

操作2206展示了从针对每个分片的平均值中减去中间点值。例如，如在图1至图9中所示出的和/或关于其所描述的，确定了通过对所摄取的内容进行的分析来建立的在每个分片处的中间点值。一旦由与媒体数据库和摄取系统相关联的系统确定，用于每个分片的中间点值可以例如被提供至客户端系统。可以不时地(每小时、每天、每月、每年)更新这些中间点值。提供用于对与客户端系统的视频缓冲器相关联的数据进行散列的这些中间点值可以是在摄取时用于对传入的内容进行散列的相同中间点值。Operation 2206 illustrates subtracting a midpoint value from the average for each shard. For example, as shown in and/or described with respect to Figures 1 to 9, a midpoint value at each shard established by analysis of the ingested content is determined. Once determined by a system associated with the media database and the ingestion system, the midpoint value for each shard can be provided, for example, to the client system. These midpoint values can be updated from time to time (hourly, daily, monthly, yearly). These midpoint values provided for hashing data associated with the client system's video buffer can be the same midpoint values used to hash the incoming content at the time of ingestion.

操作2208展示了对由于该减法所产生的这些值进行变换。例如，如在图1至图9中所示出的和/或关于其所描述的，由减法所产生的这些值被填充到矩阵中并且与预定义的静态矩阵相交叉。可以在将与视频缓冲器内的帧相关联的像素分片数据转换为提示的过程期间在客户端系统处执行将该两个矩阵相交叉的点积操作，从而使得在HTTP请求中而不是实际像素分片数据中发送提示，导致紧凑的HTTP消息。预定义的静态矩阵可以在变换之前被提供给客户端系统、并且可以是与被产生以在摄取时更均匀地分布多个经散列的值的相同的矩阵。可以不时地在客户端系统处更新预定义的静态矩阵。替代性地，可以将分片数据(与或不与其他元数据一起)从客户端系统(例如，电视)发送至不同的系统用于进行处理和/或散列。Operation 2208 shows that these values produced due to this subtraction are transformed.For example, as shown in Figure 1 to Figure 9 and/or described about it, these values produced by subtraction are filled in the matrix and intersect with the predefined static matrix.Can be during the process of converting the pixel slice data associated with the frame in the video buffer into prompts at the client system, perform the dot product operation of these two matrices intersecting, so that prompts are sent in the HTTP request rather than in the actual pixel slice data, resulting in a compact HTTP message.Predefined static matrix can be provided to the client system before transformation and can be the same matrix that is generated to more evenly distribute multiple hashed values when ingested.Predefined static matrix can be updated from time to time at the client system.Alternatively, the slice data (with or without other metadata) can be sent to different systems for processing and/or hashing from the client system (for example, TV).

操作2210展示了根据这些经变换的值构建散列值。例如，如在图1至图9中所示出的和/或关于其所描述的，由于将具有与视频缓冲器相关联的值的矩阵与预先导出的静态矩阵相交叉而产生的矩阵中的那些值可以被减少至多个位，其中，单个位替代矩阵中的每个8位值。在其他实施例中，所构建的散列值可以包括用于矩阵中的每个值的不同位数。在不同的实施例中，所构建的散列值可以具有与矩阵中的那些值相同的位数，或者可以是对矩阵中的那些值的直接表示。Operation 2210 illustrates constructing a hash value based on these transformed values. For example, as shown in and/or described with respect to Figures 1 to 9, the values in the matrix resulting from intersecting a matrix having values associated with the video buffer with a pre-derived static matrix can be reduced to a plurality of bits, wherein a single bit replaces each 8-bit value in the matrix. In other embodiments, the constructed hash value can include a different number of bits for each value in the matrix. In various embodiments, the constructed hash value can have the same number of bits as the values in the matrix, or can be a direct representation of the values in the matrix.

操作2212展示了将该提示至少部分地与所构建的该散列值相关联。例如，如在图1至图9中所示出的和/或关于其所描述的，从经变换的矩阵中所构建出来的位串可以是提示、或者可以将所构建的位串与时间(例如，时刻)相关联以形成提示、或者可以关联其他数据(如，IP地址或与客户端电视或客户端电视的控件相关联的其他标识符)以形成提示。替代性地，提示可以包括与客户端系统处的视听内容相关联的任何其他元数据或者以其他方式与其相关联。Operation 2212 illustrates associating the hint at least in part with the constructed hash value. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , a bit string constructed from the transformed matrix may be the hint, or the constructed bit string may be associated with a time (e.g., a moment in time) to form a hint, or other data (e.g., an IP address or other identifier associated with the client television or a control on the client television) may be associated to form a hint. Alternatively, the hint may include or be otherwise associated with any other metadata associated with the audiovisual content at the client system.

操作2214展示了确定操作2204、减法操作2206、变换操作2208、或构建操作2210中的至少一个利用同样用于相关联的介质存储操作中的至少一个操作数或至少一个算法中的一个或多个。例如，如在图1至图9中所示出的和/或关于其所描述的，可以将包括对像素分片的数量的定义、对像素分片的大小的定义、与像素分片相关联的预定义的中值、或预定义的静态矩阵中的一项或多项的一个或多个参数提供给客户端TV，该一个或多个参数还被摄取过程所利用从而使得应用至来自视频缓冲器的样本的多个操作将导致与当对那个帧(例如，相同的视频片段和时间偏移)进行摄取和散列时所产生的散列值相同的散列值。Operation 2214 illustrates that at least one of the determining operation 2204, the subtracting operation 2206, the transforming operation 2208, or the constructing operation 2210 utilizes one or more of the at least one operand or the at least one algorithm that is also used in the associated media storage operation. For example, as shown in and/or described with respect to FIG. 1 through FIG. 9 , one or more parameters including one or more of a definition of the number of pixel tiles, a definition of the size of the pixel tiles, a predefined median value associated with the pixel tiles, or a predefined static matrix may be provided to the client TV, the one or more parameters also being utilized by the ingestion process such that multiple operations applied to samples from the video buffer will result in the same hash value as would have been produced if that frame (e.g., the same video segment and time offset) had been ingested and hashed.

图23展示了图20的示例操作流程2000的替代性实施例。图23展示了示例实施例，其中操作2006可以包括至少一项附加操作。附加操作可以包括操作2302和/或操作2304。FIG23 illustrates an alternative embodiment of the example operational flow 2000 of FIG20 . FIG23 illustrates an example embodiment in which operation 2006 may include at least one additional operation. The additional operation may include operation 2302 and/or operation 2304.

操作2302展示了至少部分地基于作为所接收的该提示的函数的平等球中的概率点位置(“PPLEB”)算法来返回对来自该数据库扇区的至少一个候选项的至少一个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，从通过摄取过程所构建和/或修改的媒体数据库中返回表示与提示接近的路径点(例如，邻项、最近邻项、在半径内、从在相同的储物桶内、属于相同的环等)的候选项或怀疑项中的至少一个。Operation 2302 illustrates returning at least one indication of at least one candidate from the database sector based at least in part on a Probability Point Location in Equal Sphere ("PPLEB") algorithm as a function of the received cue, such as returning at least one candidate or suspect from a media database constructed and/or modified by an ingestion process representing a waypoint that is close to the cue (e.g., a neighbor, a nearest neighbor, within a radius, from within the same bucket, belonging to the same ring, etc.) as shown in and/or described with respect to FIGs. 1-9.

操作2304展示了：至少部分地基于所接收的该提示来返回对来自该数据库扇区的至少一个候选项的至少一个指示，该至少一个候选项在所接收的该提示的预先确定的逆百分比分布半径内。例如，如在图1至图9中所示出的和/或关于其所描述的，返回与关于提示与散列值中的至少一项的位置敏感散列法相关联的至少一个候选项或怀疑项。Operation 2304 illustrates returning, based at least in part on the received hint, at least one indication of at least one candidate from the database sector that is within a predetermined inverse percentile distribution radius of the received hint, such as, for example, returning at least one candidate or suspect associated with locality-sensitive hashing of at least one of the hint and the hash value as shown in and/or described with respect to FIG. 1-9 .

图24展示了表示关于使用距离关联性散列法来对媒体数据库进行定址的多个示例操作的操作流程2400。在图24中以及在包括操作流程的各个示例的下图中，将关于图1到图9的上述示例和/或关于其他示例和上下文来提供讨论和阐释。然而，应理解，可以在许多其他环境和上下文中和/或在图1到图9的经修改版本中执行这些操作流程。而且，虽然以所说明的序列来呈现各个操作流，但应理解，可以用与所说明的顺序不同的其他顺序来执行各个操作，或者可以同时执行这些操作。FIG24 illustrates an operational flow 2400 representing a number of example operations for addressing a media database using distance-associative hashing. In FIG24 and in the following figures, which include various examples of operational flows, discussion and illustration are provided with respect to the aforementioned examples of FIG1 through FIG9 and/or with respect to other examples and contexts. However, it should be understood that these operational flows can be performed in many other environments and contexts and/or in modified versions of FIG1 through FIG9. Furthermore, while the various operational flows are presented in the illustrated sequence, it should be understood that the various operations can be performed in an order other than the illustrated order or can be performed simultaneously.

在开始操作之后，操作流程2400移动到操作2402。操作2402描绘了接收对至少一个候选项的至少一个指示和对至少一个提示的至少一个指示。例如，如在图1至图9中所示出的和/或关于其所描述的，与客户端系统的视频缓冲器有关的散列值、连同一个或多个相关联的候选项或怀疑项一起被确定。After a start operation, operational flow 2400 moves to operation 2402. Operation 2402 depicts receiving at least one indication of at least one candidate and at least one indication of at least one hint. For example, as shown in and/or described with respect to FIGURES 1-9, a hash value associated with a video buffer of the client system is determined, along with one or more associated candidates or suspects.

然后，操作2404描绘了将令牌添加到与至少一个所接收的候选项相关联的箱。例如，如在图1至图9中所示出的和/或关于其所描述的，对候选项进行评分是通过添加至与候选项/怀疑项相对应的箱上的令牌来进行的，该令牌是例如每次令牌加一时递增的值。Then, operation 2404 depicts adding a token to a bin associated with at least one received candidate. For example, as shown in and/or described with respect to FIGURES 1-9, scoring the candidate is performed by adding a token to a bin corresponding to the candidate/suspect, the token being a value that increments, for example, each time the token is incremented by one.

然后，操作2406描绘了：确定在箱内的令牌的数量是否超过与客户端系统正在显示与至少一个提示相关联的具体视频片段的概率相关联的值，并且如果在箱内的令牌的该数量超过了与客户端系统正在显示与至少一个提示相关联的具体视频片段的概率相关联的值，至少部分地基于该箱返回与该具体视频片段相关联的至少一些数据。例如，如在图1至图9中所示出的和/或关于其所描述的，对具体视频片段和该视频片段的具体偏移的确定通过与这些箱相关联的该评分来在概率上确定。Operation 2406 then depicts determining whether the number of tokens within the bin exceeds a value associated with a probability that the client system is displaying a specific video segment associated with the at least one prompt, and if the number of tokens within the bin exceeds the value associated with the probability that the client system is displaying the specific video segment associated with the at least one prompt, returning at least some data associated with the specific video segment based at least in part on the bin. For example, as shown in and/or described with respect to Figures 1-9, the determination of the specific video segment and the specific offset of the video segment is probabilistically determined by the scores associated with the bins.

图25展示了图24的示例操作流程2400的替代性实施例。图25展示了示例实施例，其中操作2404可以包括至少一项附加操作2502。Figure 25 illustrates an alternative embodiment of the example operational flow 2400 of Figure 24. Figure 25 illustrates an example embodiment in which operation 2404 may include at least one additional operation 2502.

操作2502展示了将令牌添加到与至少一个所接收的候选项相关联的时间箱。例如，如在图1至图9中所示出的和/或关于其所描述的，与候选项/怀疑项相关联的数据结构可以包括由任意时间所分组的任意时间箱。Operation 2502 illustrates adding a token to a time bin associated with at least one received candidate. For example, as shown in and/or described with respect to Figures 1-9, the data structure associated with the candidate/suspect may include arbitrary time bins grouped by arbitrary time.

图26展示了图20的示例操作流程2400的替代性实施例。图26展示了示例实施例，其中操作2404可以包括至少一项附加操作。附加操作可以包括操作2602和/或操作2604。进一步地，操作流程2400可以包括至少一项附加操作2606。FIG26 illustrates an alternative embodiment of the example operational flow 2400 of FIG20 . FIG26 illustrates an example embodiment in which operation 2404 may include at least one additional operation. The additional operation may include operation 2602 and/or operation 2604. Furthermore, operational flow 2400 may include at least one additional operation 2606.

操作2602展示了：确定相对时间，包括至少从与该至少一个提示相关联的任意时间减去与该至少一个候选项相关联的候选项时间。例如，如在图1至图9中所示出的和/或关于其所描述的，从与有关接收自客户端系统(电视、机顶盒、或物品、机器、或显示和/或提供和/或接收视频内容的组合物)的提示的时刻相关联的任意时间中减去与候选项相关联的视频片段的时间偏移。Operation 2602 illustrates determining a relative time, including subtracting a candidate time associated with the at least one candidate from an arbitrary time associated with the at least one prompt, such as, for example, subtracting a time offset of a video segment associated with the candidate from an arbitrary time associated with a moment of receiving a prompt from a client system (a television, a set-top box, or an article, a machine, or a combination thereof that displays and/or provides and/or receives video content), as shown in and/or described with respect to FIG. 1 through FIG. 9 .

操作2604展示了至少部分地基于所确定的该相对时间将令牌添加到与该候选项相关联的时间箱。例如，如在图1至图9中所示出的和/或关于其所描述的，当与客户端系统相关联的提示点匹配或几乎匹配与媒体数据库相关联的参考提示点时，可以向箱添加令牌，这可以包括将与箱相关联的值或其他跟踪箱操作的手段加一。Operation 2604 illustrates adding a token to a time bin associated with the candidate based at least in part on the determined relative time. For example, as shown in and/or described with respect to FIGURES 1-9, when a cue point associated with the client system matches or nearly matches a reference cue point associated with the media database, a token may be added to the bin, which may include incrementing a value associated with the bin or other means of tracking bin operations.

操作2606展示了至少部分地基于所经过的时间段从时间箱移除一个或多个令牌。例如，如在图1至图9中所示出的和/或关于其所描述的，箱可以是泄露的从而使得与旧怀疑项/候选项相关联的数据和/或令牌可以从箱中释放出来，这可以包括将与箱相关联的值或其他跟踪箱操作的手段减一。Operation 2606 illustrates removing one or more tokens from a time bin based at least in part on the elapsed time period. For example, as shown in and/or described with respect to FIGURES 1-9, a bin may be leaked such that data and/or tokens associated with old suspects/candidates may be released from the bin, which may include decrementing a value associated with the bin or other means of tracking bin operations.

在不同的实施例中，像素位置可以涉及一种或许多种颜色和/或颜色空间/模型(例如，红色、蓝色、绿色；红色、蓝色、绿色和黄色；青色、品红色、黄色、和黑色；单个像素值唯一地标识一种颜色，例如，与像素位置相关联的24位值；色调、饱和度、亮度；等)。可以使用在分片中不同的像素数量，并且分片不一定是正方形分片。进一步地，客户端系统的视频缓冲器的分辨率可以变化。在客户端系统和摄取系统处的分辨率和/或色密度可以变化。系统可以以不同的光栅分辨率(包括但不限于1920乘1080、3840乘2160、1440×1080、1366×768或其他分辨率)来操作。预期的是，在接下来的二十年内，将发生常用节目、电视和/或客户端系统的像素分辨率的增加；虽然像素分片数量、大小、采样率或其他方面可能变化，可以利用相同的基本操作。进一步地，上变频、下变频、或与分辨率和/或色密度相关联的其他变换操作可以发生和/或插入到在此所描述的其他操作之间。In various embodiments, a pixel position may relate to one or more colors and/or color spaces/models (e.g., red, blue, green; red, blue, green, and yellow; cyan, magenta, yellow, and black; a single pixel value uniquely identifying a color, e.g., a 24-bit value associated with a pixel position; hue, saturation, brightness; etc.). Different numbers of pixels in a tile may be used, and the tiles need not be square tiles. Furthermore, the resolution of the client system's video buffer may vary. The resolution and/or color density at the client system and the ingest system may vary. The system may operate at different raster resolutions (including, but not limited to, 1920 by 1080, 3840 by 2160, 1440×1080, 1366×768, or other resolutions). It is expected that over the next twenty years, an increase in the pixel resolution of common programs, televisions, and/or client systems will occur; while the number of pixel tiles, their size, sampling rate, or other aspects may vary, the same basic operations may be utilized. Further, upconversion, downconversion, or other conversion operations associated with resolution and/or color density may occur and/or be inserted between other operations described herein.

图27展示了可以在其中实现实施例的示例系统2700。系统2700包括一个或多个计算装置2702。系统2700还展示了用于促进在一个或多个计算装置与一个或多个客户端装置2706之间的通信的构架2704。系统2700包括还展示了一个或多个客户端装置2706。在一些实施例中，该一个或多个客户端装置可以在该一个或多个计算装置之间。系统2700还展示了至少一个非瞬态计算机可读介质2708。在一些实施例中，2708可以包括一条或多条指令2710，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收至少一个栅格化视频流；创建与至少一个所接收的栅格化视频流的至少一个样本相关联的至少一个散列值；确定用于存储所创建的至少一个散列值的至少一个数据库扇区；以及将所创建的至少一个散列值存储在所确定的至少一个数据库扇区上。在不同的实施例中，该一条或多条指令可以执行于单个计算装置上。在其他实施例中，该一条或多条指令中的一些部分可以通过该一个或多个计算装置中的第一多个计算装置来执行，而该一条或多条指令中的其他部分可以通过该一个或多个计算装置中的第二多个计算装置来执行。FIG27 illustrates an example system 2700 in which embodiments may be implemented. System 2700 includes one or more computing devices 2702. System 2700 also illustrates a framework 2704 for facilitating communication between the one or more computing devices and one or more client devices 2706. System 2700 also illustrates one or more client devices 2706. In some embodiments, the one or more client devices may be between the one or more computing devices. System 2700 also illustrates at least one non-transitory computer-readable medium 2708. In some embodiments, 2708 may include one or more instructions 2710 that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to perform at least the following operations: receive at least one rasterized video stream; create at least one hash value associated with at least one sample of at least one received rasterized video stream; determine at least one database sector for storing the at least one created hash value; and store the at least one created hash value on the at least one determined database sector. In various embodiments, the one or more instructions may be executed on a single computing device. In other embodiments, some portions of the one or more instructions may be executed by a first plurality of the one or more computing devices, while other portions of the one or more instructions may be executed by a second plurality of the one or more computing devices.

图28展示了可以在其中实现实施例的示例系统2800。系统2800包括一个或多个计算装置2802。系统2800还展示了用于促进在一个或多个计算装置与一个或多个客户端装置2806之间的通信的构架2804。系统2800包括还展示了一个或多个客户端装置2806。在一些实施例中，该一个或多个客户端装置可以在该一个或多个计算装置之间。系统2800还展示了至少一个非瞬态计算机可读介质2808。在一些实施例中，2808可以包括一条或多条指令2810，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收与至少一个客户端系统的至少一个视频缓冲器相关联的一个或多个指令；至少部分地基于该至少一个视频缓冲器以及与该至少一个视频缓冲器相关联的至少一个时刻来确定提示，其中，与确定该提示相关联的至少一个操作数或至少一个函数中的一个或多个还被用于相关联的介质存储操作中；引用所确定的提示的多个最高有效位以确定数据库扇区；以及至少部分地基于所确定的提示返回对来自所确定的数据库扇区的至少一个候选项的至少一个指示。在不同的实施例中，该一条或多条指令可以执行于单个计算装置上。在其他实施例中，该一条或多条指令中的一些部分可以通过该一个或多个计算装置中的第一多个计算装置来执行，而该一条或多条指令中的其他部分可以通过该一个或多个计算装置中的第二多个计算装置来执行。FIG28 illustrates an example system 2800 in which embodiments may be implemented. System 2800 includes one or more computing devices 2802. System 2800 also illustrates a framework 2804 for facilitating communication between the one or more computing devices and one or more client devices 2806. System 2800 also illustrates one or more client devices 2806. In some embodiments, the one or more client devices may be between the one or more computing devices. System 2800 also illustrates at least one non-transitory computer-readable medium 2808. In some embodiments, 2808 may include one or more instructions 2810 that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to perform at least the following operations: receive one or more instructions associated with at least one video buffer of at least one client system; determine a hint based at least in part on the at least one video buffer and at least one time associated with the at least one video buffer, wherein one or more of at least one operand or at least one function associated with determining the hint is also used in an associated media storage operation; reference a plurality of most significant bits of the determined hint to determine a database sector; and return at least one indication of at least one candidate from the determined database sector based at least in part on the determined hint. In various embodiments, the one or more instructions may be executed on a single computing device. In other embodiments, some portions of the one or more instructions may be executed by a first plurality of computing devices among the one or more computing devices, while other portions of the one or more instructions may be executed by a second plurality of computing devices among the one or more computing devices.

图29展示了可以在其中实现实施例的示例系统2900。系统2900包括一个或多个计算装置2902。系统2900还展示了用于促进在一个或多个计算装置与一个或多个客户端装置2906之间的通信的构架2904。系统2900包括还展示了一个或多个客户端装置2906。在一些实施例中，该一个或多个客户端装置可以在该一个或多个计算装置之间。系统2900还展示了至少一个非瞬态计算机可读介质2908。在一些实施例中，2908可以包括一条或多条指令2910，当该一条或多条指令被执行于该一个或多个计算装置中的至少一些上时使该一个或多个计算装置中的至少一些至少执行以下操作：接收对至少一个候选项的至少一个指示和对至少一个提示的至少一个指示；将令牌添加到与至少一个所接收的候选项相关联的箱；以及确定在箱内的令牌的数量是否超过与客户端系统正在接收与所接收到的至少一个提示相关联的具体视频片段的概率相关联的值，并且如果在箱内的令牌的该数量超过了与客户端系统正在接收与所接收到的至少一个提示相关联的具体视频片段的概率相关联的值，至少部分地基于该箱返回与该具体视频片段相关联的至少一些数据。在不同的实施例中，该一条或多条指令可以执行于单个计算装置上。在其他实施例中，该一条或多条指令中的一些部分可以通过该一个或多个计算装置中的第一多个计算装置来执行，而该一条或多条指令中的其他部分可以通过该一个或多个计算装置中的第二多个计算装置来执行。FIG29 illustrates an example system 2900 in which embodiments may be implemented. System 2900 includes one or more computing devices 2902. System 2900 also illustrates a framework 2904 for facilitating communication between the one or more computing devices and one or more client devices 2906. System 2900 also illustrates one or more client devices 2906. In some embodiments, the one or more client devices may be between the one or more computing devices. System 2900 also illustrates at least one non-transitory computer-readable medium 2908. In some embodiments, 2908 may include one or more instructions 2910 that, when executed on at least some of the one or more computing devices, cause at least some of the one or more computing devices to perform at least the following operations: receive at least one indication of at least one candidate and at least one indication of at least one hint; add tokens to a bin associated with at least one received candidate; and determine whether the number of tokens in the bin exceeds a value associated with a probability that the client system is receiving a specific video segment associated with the received at least one hint, and if the number of tokens in the bin exceeds the value associated with the probability that the client system is receiving the specific video segment associated with the received at least one hint, return at least some data associated with the specific video segment based at least in part on the bin. In various embodiments, the one or more instructions may be executed on a single computing device. In other embodiments, some portions of the one or more instructions may be executed by a first plurality of computing devices among the one or more computing devices, while other portions of the one or more instructions may be executed by a second plurality of computing devices among the one or more computing devices.

本发明的某些方面包括在此以算法形式描述的处理步骤和指令。应当注意，本发明的这些处理步骤和指令可以体现在软件、固件或硬件中，并且当体现在软件中时，可以被下载以驻留在由实时网络操作系统使用的不同的平台上并且从这些不同的平台来操作。Certain aspects of the present invention include processing steps and instructions described herein in algorithmic form. It should be noted that these processing steps and instructions of the present invention can be embodied in software, firmware, or hardware, and when embodied in software, can be downloaded to reside on and operate from different platforms used by real-time network operating systems.

本发明还涉及一种用于执行在此的操作的设备。该设备可被专门构造用于所需目的，或者其可以包括通过存储于计算机中的计算机程序选择性地激活或重新布置的通用计算机。此类计算机程序可存储于计算机可读存储介质，诸如但不限于任何类型的盘，包括软盘、光盘、CD-ROM、磁光盘、只读存储器(ROM)、随机存取存储器(RAM)、EPROM、EEPROM、磁卡或光卡、专用集成电路(ASIC)、或适用于存储电子指令的任何类型的介质，并且每一者均耦接至计算机系统总线。The present invention also relates to an apparatus for performing the operations herein. The apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer selectively activated or rearranged by a computer program stored in the computer. Such a computer program may be stored on a computer-readable storage medium, such as, but not limited to, any type of disk, including a floppy disk, an optical disk, a CD-ROM, a magneto-optical disk, a read-only memory (ROM), a random access memory (RAM), an EPROM, an EEPROM, a magnetic or optical card, an application-specific integrated circuit (ASIC), or any type of medium suitable for storing electronic instructions, and each coupled to a computer system bus.

另外，在本说明书中所涉及的计算机或计算装置可以包括单个处理器或者可以采用用于增加计算能力的多处理器设计。Additionally, the computers or computing devices referred to in this specification may include a single processor or may employ a multi-processor design for increased computing capability.

在此所提出的算法和显示并非固有地与任何特定计算机或其他装置相关。依据本文的教导，各种通用系统也可以与程序一起使用，或构建更专业的装置以执行所需的方法步骤可证明是方便的。用于多种这些系统的所需结构将从上文描述中显现。另外，没有参照任何特定的程序语言或操作系统来描述本发明。应当认识到，各种编程语言和操作系统可以用于实现如在此所描述的本发明的教导。The algorithm and display proposed herein are not inherently related to any particular computer or other device. According to the teachings of this paper, various general-purpose systems can also be used together with the program, or it can be convenient to build a more specialized device to perform the required method steps. The required structure for a variety of these systems will appear from the above description. In addition, the present invention is not described with reference to any specific programming language or operating system. It should be appreciated that various programming languages and operating systems can be used to implement the teachings of the present invention as described herein.

本说明书中所描述的系统和方法、流程图以及结构框图可以在包括程序代码的计算机处理系统中实现，该程序代码包括可由计算机处理系统执行的程序指令。还可以使用其他实现方式。此外，本文中所描述的流程图和结构框图描述了支持多个步骤和相应功能(这些步骤和相应功能支持所披露的结构手段)的具体方法和/或相应动作，并且还可以用于实现相应的软件结构和算法、及其等效物。The systems and methods, flow charts, and structured block diagrams described in this specification can be implemented in a computer processing system including program code comprising program instructions executable by the computer processing system. Other implementations can also be used. In addition, the flow charts and structured block diagrams described herein describe specific methods and/or corresponding actions that support multiple steps and corresponding functions (these steps and corresponding functions support the disclosed structural means), and can also be used to implement corresponding software structures and algorithms, and their equivalents.

本说明书中所披露的主题的实施例可以被实现为一个或多个计算机程序产品，即，编码在有形程序载体上的计算机程序指令的一个或多个模块，以便由数据处理装置执行或控制其操作。计算机储存介质可以是机器可读存储设备、机器可读储存基板、存储器设备、或它们中的一个或多个的组合。Embodiments of the subject matter disclosed in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible program carrier for execution by or to control the operation of a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, or a combination of one or more of these.

计算机程序(又称为程序、软件、软件应用、脚本或代码)可以用任何形式的编程语言编写，包括编译型语言或解释型语言、或声明型语言或程序语言，并且其可以用任何形式部署，包括作为独立式程序或作为模块、组件、子例程、或适合于在计算环境中使用的其他单元。计算机程序不一定对应于文件系统中的文件。程序可以存储在文件的保持其他程序或数据的部分中(例如，存储在标记语言文档中的一个或多个脚本)、存储在专用于所讨论的程序的单个文件中、或者存储在多个协调文件(例如，存储一个或多个模块、子程序、或代码的多个部分的文件)中。计算机程序可以被部署成在一个计算机上或者在位于一个站点或分布在多个站点并且通过合适的通信网络互连的多个计算机上被执行。A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing multiple portions of one or more modules, subroutines, or code). A computer program can be deployed to be executed on one computer or on multiple computers located at one site or distributed across multiple sites and interconnected by a suitable communication network.

本说明书中所描述的过程或逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程处理器执行以便执行对输入数据进行操作并且生成输出的功能。这些过程或逻辑流程还可以由装置执行，并且装置还可以被实现为专用逻辑电路，例如，FPGA(现场可编程门阵列)或ASIC(专用集成电路)。The processes or logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions of operating on input data and generating output. These processes or logic flows can also be performed by an apparatus, and the apparatus can also be implemented as a special-purpose logic circuit, such as an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

计算机的必不可少的元件是用于执行指令的处理器和一个或多个用于存储指令和数据的存储器设备。通常，计算机还将包括一个或多个用于存储数据的存储设备(例如，磁盘、磁光盘、或光盘)或者操作性地联接成用于从其中接收数据或向其传输数据或者既接收又传输数据。然而，计算机不需要具有这种设备。适合于执行计算机程序的处理器仅举例来讲而非限制性地包括通用微处理器和专用微处理器两者、以及任何种类的数字计算机的任何一个或多个处理器。通常，处理器将从只读存储器或随机存取存储器或两者接收指令和数据。The essential element of a computer is a processor for executing instructions and one or more memory devices for storing instructions and data. Usually, a computer will also include one or more storage devices (such as, magnetic disks, magneto-optical disks, or optical disks) for storing data or be operatively connected to receive data therefrom or transmit data thereto or both receive and transmit data. However, a computer does not need to have such a device. The processor suitable for executing a computer program only includes, by way of example and not limitation, any one or more processors of a general-purpose microprocessor and a special-purpose microprocessor and a digital computer of any kind. Usually, a processor will receive instructions and data from a read-only memory or a random access memory or both.

为了提供与本文中描述的系统的用户或管理者交互，本说明书中所描述的主题的实施例可以实现在具有用于向用户显示信息的显示设备(例如，CRT(阴极射线管)或LCD(液晶显示器)监控器)以及用户可以通过其向计算机提供输入的键盘(例如，鼠标或追踪球)以及定点设备的计算机上。其他种类的设备也可以用于提供与用户交互。例如向用户提供的反馈可以是任何形式的感觉反馈，例如视觉反馈、听觉反馈或者触觉反馈；并且可以用任何形式(包括声学、语音或者触觉输入)接收来自用户的输入。To provide for user or administrator interaction with the systems described herein, embodiments of the subject matter described in this specification can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard (e.g., a mouse or trackball) and a pointing device through which the user can provide input to the computer. Other types of devices can also be used to provide for user interaction. For example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, voice, or tactile input.

本说明书中所描述的主题的实施例可以实现在计算系统中，该计算系统包括包括一个或多个数据服务器的后端组件、或包括具有一个或多个中间件组件(如应用服务器)、或包括具有图形用户界面或Web浏览器(用户或管理员可以通过其与本说明书中所描述的主题的某些实现方式交互)的前端组件(如客户端计算机)、或一个或多个这种后端组件、中间件、或前端组件的任意组合。系统的组件可以通过任何数字数据通信形式或介质(如通信网络)互连。计算系统可以包括客户端和服务器。客户端和服务器通常远离彼此并且通常通过通信网络交互。客户端与服务器的关系借助于在各自的计算机上运行的并且彼此具有客户端服务器关系的计算机程序产生。The embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component including one or more data servers, or includes a front-end component (such as a client computer) having one or more middleware components (such as an application server), or includes a front-end component (such as a client computer) having a graphical user interface or a web browser (through which a user or administrator can interact with certain implementations of the subject matter described in this specification), or any combination of one or more such back-end components, middleware, or front-end components. The components of the system can be interconnected through any digital data communication form or medium (such as a communication network). The computing system can include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship between the client and the server is generated by means of computer programs running on respective computers and having a client-server relationship with each other.

虽然本说明书包含许多特定实现方式细节，但这些不应被解释为对任何发明或可能要求的事物的范围的限制，而是被解释为对可能特定于具体发明的具体实施例的特征的描述。在单独的实施例的背景下在本说明书中所描述的某些特征还可以按组合形式实现在单一实施例中。While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment.

相反，在单一实施例的背景下描述的不同特征也可以被单独的或以任何合适的子组合的方式实现在多个实施例中。而且，尽管特征以上可以被描述为以某些组合起作用并且甚至如此最初被要求，但来自所要求的组合的一个或多个特征在某些情况下可以与组合离体，并且所要求的组合可以针对子组合或子组合的变化。Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as functioning in certain combinations and even initially claimed as such, one or more features from the claimed combination may in some cases be separated from the combination, and the claimed combination may be directed to subcombinations or variations of the subcombinations.

类似地，虽然附图中以具体顺序描绘了操作，但这不应被理解成要求这种操作以所示的具体顺序或以有序顺序执行，或者所有展示的操作可以被执行，以实现令人希望的结果。在某些情况下，多重任务处理和并行处理可能是有利的。而且，上述实施例中的不同系统组件的分离不应被理解成在所有实施例中都要求这种分离，并且应理解的是，所描述的程序部件和系统通常可以一起整合在单个软件产品中或封装进多个软件产品中。Similarly, although operations are depicted in a specific order in the accompanying drawings, this should not be understood as requiring that such operations be performed in the specific order shown or in an ordered order, or that all of the operations shown can be performed to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of different system components in the above-described embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

所编写的本说明书阐述了本发明的最佳模式并且提供了多个示例来描述本发明并且能够使本领域的普通技术人员制作和使用本发明。所编写的本说明书不将本发明局限于所阐述的精确术语。因此，虽然参照以上阐述的示例详细描述了本发明，但本领域的普通技术人员可以对这些示例实施变化、修改和改变而不脱离本发明的范围。This specification has been written to set forth the best mode of the present invention and to provide a number of examples to describe the invention and to enable one skilled in the art to make and use the invention. This specification has been written not to limit the invention to the precise terms set forth. Thus, while the invention has been described in detail with reference to the examples set forth above, variations, modifications, and alterations to these examples may be made by one skilled in the art without departing from the scope of the invention.

Claims

1. A method for addressing a media database using distance-associative hashing, the method comprising:

receiving one or more indications of samples of a video segment;

determining, for a patch of one or more pixels of the sample, an algorithmically derived value of the one or more pixels;

deriving a new value by subtracting a midpoint value established for the slice from the algorithmically derived value;

transforming the new value, wherein the transforming comprises uniformly distributing the value using a pre-derived function;

generating a hash value, wherein said generating comprises using at least said transformed new value;

determining a database sector by referencing a plurality of most significant bits of the hash value, wherein the plurality of most significant bits are predetermined to be an address of a corresponding database sector, and wherein one or more remaining bits of the hash value address respective buckets of the database sector; and

The hash value is stored on the database sector.

2. The method of claim 1, wherein at least one of receiving, determining, deriving, transforming, generating, referencing, or storing is at least partially implemented using one or more processing devices.

3. The method of claim 1 , wherein the receiving comprises:

One or more indications of frames or still images are received.

4. The method of claim 1, wherein the one or more indications are associated with an indication of a channel, an indication of a video segment, or an indication of a time code offset from the start of the video segment.

5. The method of claim 1, wherein the midpoint value has been previously determined for a plurality of channels over a time period using data from the slices.

6. The method of claim 1 , wherein the transforming comprises:

forming a variable matrix including at least the new value;

obtaining a static matrix that more evenly distributes values when intersected with the variable matrix; and

Calculate the dot product of the variable matrix and the static matrix.

7. The method of claim 6, wherein the obtaining comprises:

A static matrix that more evenly distributes values of the variable matrix when intersected with the variable matrix is determined based on one or more previously obtained hash values using locality-sensitive hashing.

8. The method of claim 1 , wherein said generating comprises:

A hash value is generated from at least the transformed new value by reducing the fidelity of the transformed new value by reducing the transformed new value to a binary representation.

9. The method of claim 8, wherein the reducing comprises:

It is determined whether the transformed new value is a positive number, and if the transformed new value is a positive number, one is assigned to the hash value, otherwise zero is assigned to the hash value.

10. The method of claim 1, wherein the storing comprises:

An indication of a channel, an indication of a video segment, or an indication of a time code offset from the start of the video segment is stored at a database location based at least in part on the hash value.

11. A method for addressing a media database using distance-associative hashing, the method comprising:

Receiving a prompt, the prompt generated by one or more operations, wherein the one or more operations include:

receiving one or more indications of contents in a video buffer of a client system;

determining, for a tile of one or more pixels of the content, an algorithmically derived value of the one or more pixels;

deriving a new value by subtracting the midpoint value from the algorithmically derived value;

generating a hash value, wherein said generating comprises using at least said transformed new value; and

associating a hint at least in part with the hash value;

An indication of a candidate from the database sector is returned based at least in part on the received hint.

12. The method of claim 11, wherein receiving the prompt comprises:

A hint is received associated with a sample in a video buffer of a client system, wherein the hint comprises one or more indications related to a time instant associated with the sample in the video buffer of the client system.

13. The method of claim 11, wherein the hint is associated with samples in a video buffer of the client system, and wherein the hint is determined at least in part by hashing at least some values associated with the video buffer.

14. The method of claim 13, wherein the hash value is generated based at least in part on one or more of at least one operand or at least one algorithm also used in an associated media storage operation.

15. The method of claim 11, wherein at least one of the determining, deriving, transforming, or generating operations utilizes one or more of at least one operand or at least one algorithm also used in an associated media storage operation.

16. The method of claim 11, wherein the returning comprises:

At least one indication of at least one candidate from the database sector is returned based at least in part on a probabilistic point location in an equal sphere algorithm as a function of the received hint.

17. The method of claim 11, wherein the candidate item is within a predetermined inverse percentage distribution radius of the received prompt.

18. A system for addressing a media database using distance-associative hashing, the system comprising:

one or more processors; and

One or more instructions, when executed by the one or more processors, cause the one or more processors to perform the following operations:

receiving one or more indications of samples of a video segment;

The database sector is determined by referencing a plurality of most significant bits of the hash value, wherein the plurality of most significant bits are predetermined to be the address of the corresponding database sector, and wherein one or more of the hash values

A plurality of remaining bits addresses respective buckets of the database sector; and

The hash value is stored on the database sector.

19. A machine for addressing a media database using distance-associative hashing, the machine comprising:

circuitry configured to receive one or more indications of samples of a video segment;

circuitry configured to determine an algorithmically derived value for a tile of one or more pixels of the sample;

circuitry configured to derive a new value by subtracting a midpoint value of the slice from the algorithmically derived value;

circuitry configured to transform the new value, wherein the transforming comprises uniformly distributing the value using a pre-derived function;

circuitry configured to generate a hash value, wherein said generating comprises using at least said transformed new value;

circuitry configured to determine a database sector by referencing a plurality of most significant bits of the hash value, wherein the plurality of most significant bits are predetermined to be an address of a corresponding database sector, and wherein one or more remaining bits of the hash value address respective buckets of the database sector; and

Circuitry is configured to store the hash value on the database sector.