TW200951832A - Universal lookup of video-related data - Google Patents
- Publication number
- TW200951832A TW98112575A
- Authority
- TW
- Taiwan
- Prior art keywords
- video
- identifier
- video content
- content
- sequence
- Prior art date
Landscapes
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Description
200951832 VI. Description of the Invention

[Technical Field of the Invention]

The present invention relates to a system and method for looking up video-related data, and more particularly to an apparatus and method for determining and correlating a plurality of video content identifiers corresponding to specific video content, and to an apparatus and method for aggregating metadata from one or more metadata sources, the metadata being associated with the specific video content.

[Prior Art]
在視頻儲存處之特定視頻内容可能具有指向視頻且與視頻唯 -相關之Μ識職。此類的識職通常_為全域,識別符 (Globally Unique identifier,GUID)。本發明所提之視頻儲存處的 例子為包含視頻主機的分散式網站(如NetFlix、Y〇uTube或Huh 等)、DVD媒體的收藏處、多媒體檔案的收藏處、或是點對點 (peer-to-peer)網路。 要特別一提的是,每個視頻儲存處中的GUID都是獨有的。 例如,在線上核_ Y〇uTube中的視頻具有—個_的網路位 址(uniform resource locator,URL) ’該網路位址在線上視頻資源 中是獨-無二的。同樣的’槽案在;BitTorre_點對點網路具有雜、 湊值(hash value )’該雜湊值係由檔案的内容計算出來,且用來作 為該檔案的識別符。DVD光碟能夠依據雜湊值來進行唯一的識 別,該雜湊值係由被錄製在光碟上之多媒體檔案產生,通常被^ 為 DVDid。 4 此外,視頻内容的儲存處存在多個視頻相關資訊(也稱 視頻相關元資料)的儲存處。一些例子如:包含電影情節與角色 之詳細描述的Wikipedia、包含電影中之演員列表的網路電影資料 200951832 庫(如International Movie Database, IMDB)、包含不同語言之字幕 的字幕網站(如OpenSubtitles)、以及包含DVD相關資訊的網站 資料庫(如 DVDXML database)等。 許多的這些元資料儲存處隨處可見且使用不同類型的識別符 作為索引。舉例來說,DVD相關資訊(如:封面、章節列表、標 題等)能夠使用DVDid來擷取、在字幕網站資料庫中的字幕能依 據電影雜凑值(moviehash)進行擷取、而識別符被使用在mtT〇rrent 點對點網财。其他資絲源也使贿電影或其他視觸目相關 的雜凑值來索引視頻相關資訊。某些線上資訊月艮務則使用^作 為視頻相關資訊的索引。 雖然有許多方法可以用來確認不同類型的視頻内容,但這些 識別符是不可替代的。元資料儲存處為典型的使用一些類型的識 別符來存取赋表性之内容來源的㈣(如,字幕與電影雜凑值 的關,以及DVD光碟資訊與DVDid等)。然而,一個資訊來源的 識別符將無法有效的確認在其他資訊來源中的視頻内容。 【發明内容】 本發明揭露一種裝置及其方法,其中: 本發贿揭露之方法’其轉至少包括:由請求設備接收與 寺定視頻内容對應之請求;自視_容來源確認與特定視頻内容 對應之第-視頻内容識別符;依據第—視頻内容識別 ^見頻内容對應之第—元資料;轉換第—視_容朗符為第: 内容朗符;依據第二視_容識職麻與特定視頻 元資料;提供第一元資料及第二元資料予請求設 200951832 本發明所揭露之方法,其步驟至少包括:確認與視頻節目相 關之第-視頻序列;確認與視頻節目相關之第二視頻序列;計算 第-視頻序列及第二視頻序顺之對應_ ;判斷第―視頻序= -及第二視頻序列之校準線;儲存第—視頻相及第二視頻序列間 :之對應關係;儲存第-視頻序列及第二視頻序列間之校準線之資 訊。 本發明所揭露之方法,其步驟至少包括:接收與視頻節目相 關之元資料之-請求,請求包含對應視頻節目之第一視頻内容識 符;確認視頻節目中之視頻序列;基於視頻序列確認對應視頻 節目之第—視頻内容識別符;基於第二視頻内容識別符由元資料 來源擷取元資料;由對應視頻節目之時間軸及與對應元資料之時 間軸中確認對應關係。 本發明所揭露之方法,其步驟至少包括:輸人第—視頻識別 符,第一視頻識別符包含内容識別符及在第一視頻内容中之一組 第一座標;確認對應第二視頻内容之一組第二視頻識別符,其中 φ 第二視頻内容與第一視頻内容相似;對該組第二視頻識別符中之 各視頻識別符進行下列步驟:確認第二視頻内容中之一組座標; •輸出第二視頻内容中之該組座標給對應第一視頻内容之請求元資 ,料。 本發明所揭露之裝置,至少包含:通訊模組,用以自請求設 備接收對應特定視頻内容之請求;處理器,與通訊模組連接,用 以確認對應特定視頻内容之第一視頻内容識別符,並依據第一視 頻内容識別符擷取對應特定視頻内容之第一元資料,及轉換第一 視頻内容識別符為第二視頻内容識別符’並依據第二視頻内容識 5 200951832 別符操取第二元資料,及更用以提供第—元資料及第二元資 請求設備。 本發明所揭露之裝置’至少包含:通訊模組,用以接收第一 視頻識別符’第_視頻識別符包含内容識別符及在第-視頻内容 中組第-座標,制以傳送第二視頻識聰及與第二視賴 別符對應之一組第二座標,該組第二座標係於第二視頻内容中; 處理益,與通訊模組連接,用以由表格中擷取第二視頻識別符, 以確認第二視頻識別符,表格包含存放視頻識別符之第一攔位, 
及存放與在第-棚財之测翻符侧之視細容相似之視頻 内谷相關之視頻識別符之第二攔位。 本發明所揭露之裝置與方法如上’與先前技術之間的差異在 於本發明透過接收由請求設備傳送之對應特定視頻内容之請求 後,確認對應特定視_容之第—視頻内容識別符,並依據第一 視頻内容識別符擷取對應特定視頻内容之第一元資料,接著,轉 換第-視頻内容識別冑為與特定視頻内容對應 < 第二視頻識別 符,並依據第二視頻内容識別符擷取第二元資料後,提供第一元 身料與第二元資料請求設備,藉崎決先前技術所存在的問 題,並可以達成減少負载的技術功效。 【實施方式】 以下將配合圖式及實施例來詳細說明本發明之特徵點與實施 方式,内容足以使任何熟習相關技藝者能夠輕易地充分理解本發 明解決技術問騎細峨射段並據以實施,藉此實現本發明 可達成的功效》 本發明所提之裝置與方法確定及關聯多個與特定視頻内容對 200951832 應之視頻内容識別符。此外,本發明所提之裝置與方法由一個或 更多元資料來源中聚集與特定視頻内容對應之元資料。舉例^ 忒,此系統與方法確定特定視頻節目的多個版本,不論轉碼格式、 '長寬比、商業廣告、修改過的節目長度等等。視頻内容的相1 1係是以空間及或時間座標來表現。 「第1圖」所示為一個可以應用本發明所提之裝置與方法之 環境100的例子’環境100包含與網路108連接的多個視頻來源 ❹102、104、以及106。視頻來源102、1〇4以及1〇6能夠以任何形 式的視頻來源儲存處、點對點網路、電腦系統或是其他具有儲存、 操取、傳送視頻内容之能力或以其他方式提供視頻内容的系統。 視頻來源_子包含提供下載或φ流㈣綱存處、提供視頻内 容交換的點對點網路、包含視頻内容的個人電腦系統、數位多功 能光碟(Digital Versatile Disc,DVD)播放機、藍光(Blu_rayDisc, BD)播放機、數位視頻錄影機、遊戲操縱台、Y〇u加^、NetFlix 以及BitTorrent等等。視頻内容的例子包含電影、電視節目、家用 ❹錄影帶、電腦產生之影片以及完整的電影或電視節目中的一部分。 網路108為一個使用任何通訊協定以及任何類型之通訊媒介 .的資料通訊網路。在一個實施例中,網路1〇8為網際網路 、(Intemet)。在其他實施例中’網路108為兩個或更多網路的相互 组合。網路108可能透過有線或無線通訊網路進行存取。 %境100也包含與網路108連接的多個元資料來源11()、112 以及114。元資料來源110、112以及114能夠以任何形式的元資 料儲存處、電腦系統或是其他具有儲存、擷取資料之能力或以其 他方式提供與視頻内容相關之元資料的系統。視頻相關之元資料 7 200951832 包含的資料有:DVD猶的封面、字幕:纽、視_容的標題、 視頻節目的演員、視頻内容的故事大綱、視頻内容的作者/製片者 以及與視_容關切覽者意見等等。元㈣來源的例子包含 網路服務的dvdxml.com資料庫、0pensubtitie org資料庫、ιμ〇β 以及YouTube的使用者的意見等等。dvdxml c〇m的資料庫包含電 影之DVD版本的資訊,例如標題以及關鍵演貞。〇pensubtitle, 的資料庫也包含各種電影之多種不同語言的字幕檔案。 此外,環境100包含分別與網路1〇8連接的通用視頻查找系 統116以及視頻設備118。如上所述之通用視頻查找系統116確認 並關聯與特定視頻内容相關之多個視頻内容識別符。此外,通用 視頻查找系統116由一個或多個元資料來源11〇、112以及114中 聚集與特定視頻内容相關之元資料。與通用視頻查找系統116的 元件以及操作相關的細節將與本文中描述。視頻設備118能夠接 收並處理視頻内容,例如,在一個或多個顯示設備(圖中未示) 上顯示。視頻設備118的具體例子包含電腦、機上盒(seU〇pb〇x, STB)、衛星接受器、DVD播放機、藍光播放機、數位視頻錄影機 以及遊戲操縱台等等。 雖然「第1圖」表現出三個視頻來源1〇2、1〇4以及106以及 三個元資料來源110、112以及114 ’不過實際上,特定的環境1〇〇 T以包含任思數置的視頻來源以及任何數量的元資料來源。此 外,特定的環境100也可以包含任意數量的視頻設備118、通用視 頻查找系統116以及透過網路108相互連接的其他設備或系統(圖 中未示)。 雖然在「第1圖」中所表示的各種元件以及系統與網路1〇8 200951832 連接,但通用視頻查找系統116能夠透過其他的網路或通訊線路 等方式連接到一個或多個元件或系統。 「第2圖」表示實施本發明所提之通用視頻查找系統116的 Λ 例子。通用視頻查找系統116包含通訊模組202、處理器204、視 
頻分析模組206以及儲存裝置208。通訊模組202負責在通用視頻 查找系統116以及如視頻來源、元資料來源、視頻設備等其他設 備之間傳遞資料與其他資訊。處理器2〇4執行通用視頻查找系統 ❹ U6運作過程中所必要的各種指令,例如,處理器204能夠執行一 些方法以及程序來處理本發明所提之視頻内容識別符以及與元資 料相關之視頻内容。視頻分析模組206執行各種視頻程序以及視 頻分析指令,例如,視頻分析模組206能夠識別被顯示之視頻資 料所包含的内容。儲存裝置208儲存資料以及通用視頻查找系統 116運作過程所使用的其他資訊。儲存裝置2〇8可以包含一個或多 個依電性(volatile)及/或非依電性的記憶體。在特殊的實施例中, 儲存裝置208包含結合依電性以及非依電性之記憶體的硬碟(乜^^ ❿ disk drive, HDD ) °The particular video content at the video store may have a point of view that is directed to the video and is only relevant to the video. This type of job is usually _ a Globally Unique identifier (GUID). Examples of the video storage provided by the present invention are distributed websites including video hosts (such as NetFlix, Y〇uTube, or Huh, etc.), collections of DVD media, collections of multimedia files, or peer-to-peer (peer-to- Peer) network. In particular, the GUID in each video store is unique. For example, the video in the online core _Y〇uTube has a _uniform resource locator (URL) ’. The network address is unique in the video resource on the line. The same 'slot case; BitTorre_ peer-to-peer network has a hash value' that is calculated from the contents of the file and used as the identifier of the file. A DVD disc can be uniquely identified based on a hash value generated by a multimedia file recorded on a disc, usually by a DVDid. 4 In addition, there is a storage location for multiple video related information (also called video related metadata) in the storage of video content. Some examples include Wikipedia, which contains a detailed description of the movie plot and characters, a network movie material 200951832 library containing a list of actors in the movie (such as International Movie Database, IMDB), and subtitle websites (such as OpenSubtitles) that contain subtitles in different languages. And a website database (such as DVDXML database) containing information about DVDs. Many of these metadata stores are everywhere and use different types of identifiers as indexes. 
For example, DVD-related information (such as cover art, chapter lists, and titles) can be retrieved using a DVDid, subtitles in a subtitle-site database can be retrieved by a movie hash value ("moviehash"), and content-derived identifiers are used in the BitTorrent peer-to-peer network. Other sources likewise index video-related information by movie or other content-based hash values, and some online information services use URLs as the index.

Although there are many ways to identify different types of video content, these identifiers are not interchangeable. A metadata repository typically uses one particular type of identifier to access its content (for example, subtitles are keyed by movie hash values, and DVD information is keyed by DVDid). An identifier from one information source therefore cannot be used to identify the corresponding video content in other information sources.

SUMMARY OF THE INVENTION

The present invention discloses apparatuses and methods, as follows.

A first disclosed method comprises at least the steps of: receiving, from a requesting device, a request corresponding to specific video content; identifying, from a video content source, a first video content identifier corresponding to the specific video content; retrieving first metadata corresponding to the specific video content according to the first video content identifier; converting the first video content identifier into a second video content identifier; retrieving second metadata corresponding to the specific video content according to the second video content identifier; and providing the first metadata and the second metadata to the requesting device.

A second disclosed method comprises at least the steps of: identifying a first video sequence associated with a video program; identifying a second video sequence associated with the video program; computing a correspondence between the first video sequence and the second video sequence; determining an alignment of the first video sequence and the second video sequence; storing the correspondence between the first video sequence and the second video sequence; and storing information about the alignment of the first video sequence and the second video sequence.

A third disclosed method comprises at least the steps of: receiving a request for metadata associated with a video program, the request containing a first video content identifier corresponding to the video program; identifying a video sequence in the video program; identifying, based on the video sequence, a second video content identifier corresponding to the video program; retrieving metadata from a metadata source based on the second video content identifier; and identifying a correspondence between the timeline of the video program and the timeline of the retrieved metadata.

A fourth disclosed method comprises at least the steps of: receiving a first video identifier, the first video identifier containing a content identifier and a set of first coordinates in first video content; identifying a set of second video identifiers corresponding to second video content, wherein the second video content is similar to the first video content; and, for each video identifier in the set of second video identifiers, identifying a set of coordinates in the second video content and outputting that set of coordinates in response to the request for metadata corresponding to the first video content.
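As a toy illustration of the correspondence computation in the sequence-alignment method above: under the (strong) simplifying assumption that two versions of a program differ only by a constant frame offset, a temporal alignment can be found by sliding per-frame signatures past each other and keeping the best-scoring shift. The signature representation, scoring rule, and frame rate below are placeholders for illustration, not taken from this patent, which leaves the correspondence computation open.

```python
def best_offset(sig_a, sig_b, max_shift=None):
    """Temporal-alignment sketch: find the shift of sig_b relative to
    sig_a that maximizes the number of matching per-frame signatures."""
    max_shift = max_shift if max_shift is not None else len(sig_a)
    best = (0, -1)  # (offset in frames, match score)
    for shift in range(-max_shift, max_shift + 1):
        score = sum(
            1
            for i, s in enumerate(sig_a)
            if 0 <= i + shift < len(sig_b) and sig_b[i + shift] == s
        )
        if score > best[1]:
            best = (shift, score)
    return best

def retime_subtitle(t_seconds, shift_frames, fps=25.0):
    """Map a timestamp (e.g. a subtitle cue) from version A onto version B
    using the frame offset found above (assumes a constant offset, no edits)."""
    return t_seconds + shift_frames / fps
```

A stored offset like this is exactly the kind of precomputed correspondence that lets metadata with a timeline, such as subtitles, be re-synchronized to a different version of the same program.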
A first disclosed apparatus comprises at least: a communication module for receiving, from a requesting device, a request corresponding to specific video content; and a processor, coupled to the communication module, for identifying a first video content identifier corresponding to the specific video content, retrieving first metadata corresponding to the specific video content according to the first video content identifier, converting the first video content identifier into a second video content identifier, retrieving second metadata according to the second video content identifier, and providing the first metadata and the second metadata to the requesting device.

A second disclosed apparatus comprises at least: a communication module for receiving a first video identifier, the first video identifier containing a content identifier and a set of first coordinates in first video content, and for transmitting a second video identifier and a corresponding set of second coordinates lying in second video content; and a processor, coupled to the communication module, for retrieving the second video identifier from a table, the table containing a first field that stores video identifiers and a second field that stores identifiers of video content similar to the video content identified by the identifiers in the first field.
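The first method summarized above — resolve a local identifier, translate it through a precomputed equivalence table, then fetch metadata keyed by each equivalent identifier and merge the results — can be sketched in a few lines. The table contents, source names, and identifier formats below are invented for illustration; the patent specifies only the two-field table structure and the aggregation steps.

```python
# Precomputed conversion table: the key (first field) is a video
# identifier; the value (second field) lists identifiers of similar or
# identical content in other repositories. All entries are hypothetical.
ID_TABLE = {
    "youtube:abc123": ["dvdid:9f8e7d", "moviehash:18379ac9af77"],
}

# Hypothetical metadata sources, each indexed by its own identifier type.
SOURCES = {
    "dvdid": {"dvdid:9f8e7d": {"title": "Example Movie", "chapters": 12}},
    "moviehash": {"moviehash:18379ac9af77": {"subtitles": ["en", "es", "ru"]}},
    "youtube": {"youtube:abc123": {"comments": 42}},
}

def lookup(first_id):
    """Aggregate metadata across sources: fetch metadata for the given
    identifier, convert it to equivalent identifiers via the table, and
    fetch metadata for each of those as well."""
    ids = [first_id] + ID_TABLE.get(first_id, [])
    merged = {}
    for vid in ids:
        source = SOURCES.get(vid.split(":", 1)[0], {})
        merged.update(source.get(vid, {}))
    return merged
```

For instance, `lookup("youtube:abc123")` pulls the comments keyed by the original identifier together with the DVD title/chapters and the subtitle languages keyed by the equivalent identifiers — metadata no single repository could return on its own.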
The apparatus and method disclosed herein differ from the prior art as follows: after receiving from a requesting device a request corresponding to specific video content, the invention identifies a first video content identifier corresponding to the specific video content and retrieves first metadata according to that identifier; it then converts the first video content identifier into a second video content identifier corresponding to the same video content, retrieves second metadata according to the second identifier, and provides the first metadata and the second metadata to the requesting device. This resolves the problems of the prior art and achieves the technical effect of reducing load.

[Embodiments]

The features and embodiments of the present invention are described in detail below with reference to the drawings, in sufficient detail that anyone skilled in the relevant art can readily understand the technical means by which the invention solves the stated problems, implement them accordingly, and thereby realize the effects achievable by the invention.

The apparatus and method described herein determine and correlate multiple video content identifiers corresponding to specific video content. In addition, they aggregate metadata corresponding to the specific video content from one or more metadata sources. For example, the system and method can identify multiple versions of a particular video program, regardless of transcoding format, aspect ratio, inserted commercials, modified program length, and so on. Relationships between items of video content are expressed in spatial and/or temporal coordinates.

Fig. 1 shows an example of an environment 100 in which the disclosed apparatus and method can be applied.
Environment 100 includes multiple video sources 102, 104, and 106 coupled to a network 108. Video sources 102, 104, and 106 can be any form of video repository, peer-to-peer network, computer system, or other system capable of storing, retrieving, transmitting, or otherwise providing video content. Examples of video sources include repositories offering downloads or streaming, peer-to-peer networks for exchanging video content, personal computer systems containing video content, Digital Versatile Disc (DVD) players, Blu-ray Disc (BD) players, digital video recorders, game consoles, YouTube, NetFlix, BitTorrent, and the like. Examples of video content include movies, television programs, home videos, computer-generated films, and portions of complete movies or television programs.

Network 108 is a data communication network using any communication protocol and any type of communication medium. In one embodiment, network 108 is the Internet. In other embodiments, network 108 is a combination of two or more networks. Network 108 may be accessed through wired or wireless communication links.

Environment 100 also includes multiple metadata sources 110, 112, and 114 coupled to network 108. Metadata sources 110, 112, and 114 can be any form of metadata repository, computer system, or other system capable of storing, retrieving, or otherwise providing metadata related to video content. Video-related metadata includes DVD cover art, subtitles, titles of video content, the cast of a video program, plot synopses, the author or producer of the video content, viewer opinions about the content, and so on. Examples of metadata sources include the dvdxml.com database web service, the opensubtitles.org database, IMDB, and YouTube user comments. The dvdxml.com database contains information about the DVD versions of movies, such as titles and key cast members.
The opensubtitles.org database contains subtitle files in many different languages for a wide range of movies.

In addition, environment 100 includes a universal video lookup system 116 and a video device 118, each coupled to network 108. As noted above, the universal video lookup system 116 identifies and correlates multiple video content identifiers associated with specific video content, and aggregates metadata associated with that content from one or more of the metadata sources 110, 112, and 114. Details of the components and operation of the universal video lookup system 116 are described herein. The video device 118 can receive and process video content, for example for display on one or more display devices (not shown). Specific examples of the video device 118 include computers, set-top boxes (STBs), satellite receivers, DVD players, Blu-ray players, digital video recorders, game consoles, and the like.

Although Fig. 1 shows three video sources 102, 104, and 106 and three metadata sources 110, 112, and 114, a particular environment 100 can in practice contain any number of video sources and any number of metadata sources. A particular environment 100 can also contain any number of video devices 118, universal video lookup systems 116, and other devices or systems (not shown) interconnected through network 108.

Although the various components and systems shown in Fig. 1 are coupled to network 108, the universal video lookup system 116 can also be coupled to one or more components or systems through other networks, communication links, and the like.

Fig. 2 shows an example implementation of the universal video lookup system 116. The universal video lookup system 116 includes a communication module 202, a processor 204, a video analysis module 206, and a storage device 208.
The communication module 202 communicates data and other information between the universal video lookup system 116 and other devices such as video sources, metadata sources, and video devices. The processor 204 executes the various instructions necessary for operation of the universal video lookup system 116; for example, the processor 204 can execute methods and procedures that process the video content identifiers and video-related metadata discussed herein. The video analysis module 206 executes various video-processing and video-analysis instructions; for example, the video analysis module 206 can identify the content contained in displayed video data. The storage device 208 stores data and other information used during operation of the universal video lookup system 116. The storage device 208 may include one or more volatile and/or non-volatile memories. In a particular embodiment, the storage device 208 includes a hard disk drive (HDD) combined with volatile and non-volatile memory.
通用視頻查找系統116也包含視頻DNA模組21〇、對應模組 • 212、雜湊(hash)計算模組214以及廣告管理模組21卜視頻DNA ,模組210識別視頻内容中的物件,藉以找出不同視頻序列的對應 關係。對應模組212分析多個視頻序列,藉以由兩個或多個視頻 序列中找出空間及/或時間的對應關係。雜湊計算模組214計算與 視頻内容料的#段4目__函式(_如此⑻。' 雜湊函式為一個將大量的資料轉換為較小之「雜凑值(_★)」 的演算法。雜湊值通常被用來作為資料表(臟)或其他大量資 9 200951832 料的索引(index)。廣告管理模組216執行各種廣告相關的函式, 例如,廣告管理模組216能夠在各種因素的基礎上選擇多個廣告 並插入視頻内容中。 雖然沒有表示在「第2圖」中,但組成通用視頻查找系統116 的元件透過一個或多個通訊線路,如匯流排(bus),與其他在通 用視頻查找系統116之内的元件進行通訊。在特殊的實施例中,「第 2圖」所表示之各模組(視頻分析模組2〇6、視頻DNA模組21〇、 對應模組212雜湊計算模組214以及廣告管理模組216)可以表現 出電腦可讀取的指令被執行,例如由處理器2〇4執行電腦可讀取 的指令。 「第3圖」為擷取視頻内容識別符以及與特定視頻内容對應 之元資料的實施例之程序(procedure) 300的流程圖。首先,程序 300接收到與特定視頻内容對應之訊息的請求(步驟3〇2)。該請 求由如「第1圖」所示之視頻設備118所接收。被請求的訊息可 以包含與視頻内容對應之元資料或視頻内容的其他版本。接著, 私序300確認與特定視頻内容對應之視頻内容識別符(步驟 304)。該請求可以包含與特定視頻内容對應之視頻内容識別符、 視頻内容的連結(link)或是該特定視頻内容自身的一部分。若該 請求沒有包含視頻内容識別符,程序300將如後續描述,依據特 定視頻内容確認相對應之視頻内容識別符。 之後’程序300依據視頻内容識別符擷取與特定視頻内容對 應之元資料(步驟306)。被擷取出之元資料能與全部之視頻内容 對應、與視頻内容中之特定時間區間對應、或與視頻内容中之時 空物件對應。該元資料可以由任何數量的元資料來源中擷取出 200951832 來,例如,第一元資料來源提供字幕資訊,第二元資料來源包含 浹員資訊,第三元資料來源包含與視頻内容對應之評論以及使用 者評分。而後,程序300將與特定視頻内容對應之視頻内容識別 符轉換為與先前確定之視頻内容識別符相對應之其他視頻識別符 (步驟308)。該視頻内容識別符至相對應之視頻識別符的轉換係 可事先被執行與儲存於資料庫、資料表或其他資料結構留待後續 擷取。各種辦能卿可被應絲處理上述賴視頻内容識別符 轉換為相對應之視頻識別符的程序。例如,該視頻内容識別符能 夠被轉換為與全料視細容對狀另—她納容識別符,或 是被轉換為與狀時間區間内之視頻内容相關的第二視頻内容識 別符。在另-個實做中,該視_容識職被轉換為與視頻内容 中之時空物件相_第二視_容識別符。例如,儲存預先計算 之視頻内容識職㈣料表,該資料表包含第—攔仙及第二棚 位,其中,該視頻内容識別符儲存於第—欄位中,與第一搁位中The universal video search system 116 also includes a video DNA module 21, a corresponding module 212, a hash calculation module 214, and an advertisement management module 21 for video DNA. The module 210 identifies objects in the video content, thereby finding Correspondence between different video sequences. The corresponding module 212 analyzes a plurality of video sequences to find a spatial and/or temporal correspondence between the two or more video sequences. The hash calculation module 214 calculates the #段四目__ function of the video content material (_ such (8). 
The hash function is a calculation that converts a large amount of data into a smaller "heavy value (_★)". The hash value is typically used as an index for the data sheet (dirty) or other large amount of resources. The advertisement management module 216 performs various advertisement related functions, for example, the advertisement management module 216 can be used in various Based on the factors, multiple advertisements are selected and inserted into the video content. Although not shown in "Picture 2", the components constituting the universal video search system 116 are transmitted through one or more communication lines, such as a bus, and Other components within the general video search system 116 communicate. In a particular embodiment, the modules represented by "Fig. 2" (video analysis module 2〇6, video DNA module 21〇, corresponding mode) The group 212 hash calculation module 214 and the advertisement management module 216) can represent that a computer readable command is executed, for example, the processor 2 〇 4 executes a computer readable command. "Fig. 3" is a captured video. Content identifier and A flowchart of an embodiment of a meta-data corresponding to a specific video content. First, the program 300 receives a request for a message corresponding to a specific video content (step 3〇2). The request is as shown in FIG. The illustrated video device 118 receives the requested message. The requested message may include metadata or other versions of the video content corresponding to the video content. Next, the private sequence 300 confirms the video content identifier corresponding to the particular video content (step 304). The request may include a video content identifier corresponding to the particular video content, a link to the video content, or a portion of the particular video content itself. If the request does not include a video content identifier, the program 300 will be described as follows. 
The procedure 300 then retrieves metadata corresponding to the particular video content based on the video content identifier (step 306). The retrieved metadata may correspond to the entire video content, to a specific time interval within the video content, or to a spatio-temporal object within the video content. The metadata may be retrieved from any number of metadata sources; for example, a first metadata source provides subtitle information, a second metadata source contains cast information, and a third metadata source contains comments and user ratings associated with the video content. Next, the procedure 300 converts the video content identifier corresponding to the particular video content into other video identifiers corresponding to previously determined video content (step 308). The conversion from a video content identifier to the corresponding video identifiers can be performed in advance and stored in a database, data table, or other data structure for later retrieval. Various systems and methods can be applied to perform this conversion. For example, the video content identifier can be converted into another video content identifier associated with the entire video content, or into a second video content identifier associated with the video content within a particular time interval. In another implementation, the video content identifier is converted into a second identifier associated with a spatio-temporal object in the video content. For example, a precomputed video content identifier table is stored that includes a first field and a second field, where a video content identifier is stored in the first field, and video content identifiers corresponding to video content that is similar (or identical) to the video content identified in the first field are contained in the second field.
Next, the procedure 300 retrieves the metadata associated with each of the other video content identifiers (step 310). In this way, the various versions of the same video content corresponding to the metadata are retrieved from any number of sources. Finally, the procedure 300 communicates the metadata and other information related to the video content to the requesting device (step 312). This information allows the requesting device to display some or all of the metadata for the user of the device to view. In addition, the requesting device can present all available versions of the video content to the user, who can then select a preferred version for display.
Fig. 4 is a flowchart of an embodiment of a procedure 400 for determining correspondences between multiple video sequences associated with the same video program. These video sequences are typically different versions of the same video program. Different versions of the same program may have different aspect ratios, different transcoding formats, or different broadcast versions (for example, a full-length movie version without commercials and a version edited for television broadcast that includes commercials). First, the procedure 400 identifies a first video sequence associated with the video program (step 402). The first video sequence may represent all or part of the program. Next, the procedure 400 identifies a second video sequence associated with the same video program (step 404). As noted above, the first and second video sequences are different versions of the same program. The procedure 400 then computes a correspondence between the first video sequence and the second video sequence (step 406). This correspondence may include temporal coordinates and/or spatial coordinates. Many systems and methods can be used to compute the correspondence between the two sequences. Computing the temporal correspondence is particularly important when the two sequences need to be synchronized in time. For example, if subtitles contained in the first video sequence are to be displayed along with the video content of the second video sequence, the subtitles should appear at the correct times in the second sequence; computing the temporal correspondence provides the appropriate synchronization.
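The patent does not prescribe a particular algorithm for step 406. One simple way to estimate a purely temporal correspondence — a single time offset between two versions of the same program — is to compare coarse per-frame signatures (for example, one scalar such as average brightness per frame). The sketch below is illustrative only and assumes such scalar signatures are already available.

```python
def best_offset(sig_a, sig_b, max_offset):
    """Slide sequence B over A and return the frame offset with the smallest
    mean squared difference on the overlapping region."""
    best, best_cost = 0, float("inf")
    for off in range(-max_offset, max_offset + 1):
        pairs = [(sig_a[i], sig_b[i - off])
                 for i in range(len(sig_a))
                 if 0 <= i - off < len(sig_b)]
        if len(pairs) < 2:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = off, cost
    return best

# Deterministic pseudo-random stand-in for per-frame brightness values.
x = 1
signature = []
for _ in range(60):
    x = (x * 48271) % 2147483647
    signature.append(float(x % 100))
original = signature[:50]
shifted = original[3:]  # same program with the first 3 frames trimmed
```

A real system would use richer signatures and allow non-uniform (gapped) correspondences, as discussed later in the document.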
Computing the spatial correspondence between two video sequences is particularly helpful when identifying geometric information in the sequences, for example when identifying multiple objects within a shot. Different versions of a video program can have different resolutions, aspect ratios, and so on, so exchanging information between two versions requires a spatial correspondence. For example, a spatial correspondence makes it possible to exchange different types of content, and video-related metadata, between different versions of the video. The procedure 400 of Fig. 4 then determines a (temporal and/or spatial) alignment between the first video sequence and the second video sequence (step 408). Alternative embodiments align the first and second video sequences by aligning audio-based data segments. Next, the procedure 400 stores the computed correspondence between the first and second video sequences (step 410). The stored correspondence information can be used whenever the two sequences are later used together, without recomputing the correspondence. Finally, the procedure 400 stores information about the alignment of the first and second video sequences (step 412). The stored alignment information can be used to align the two video sequences in the future without re-determining the alignment. The procedure 400 of Fig. 4 determines the correspondence between two particular video sequences associated with the same video program. By repeating these steps for each pair of video sequences, correspondence information can be precomputed and a database (or data structure) containing many such correspondences can be built.
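The precomputed correspondence database described above can be sketched as a simple pairwise cache: a correspondence is computed at most once per pair of sequence identifiers and reused afterwards. This is an illustrative structure only; the patent leaves both the storage layout and the alignment computation unspecified, and the identifiers below are invented.

```python
class CorrespondenceStore:
    """Cache of pairwise correspondences so each pair of video sequences
    is analyzed only once."""
    def __init__(self, compute_fn):
        self._compute = compute_fn  # called only on a cache miss
        self._table = {}
        self.computations = 0

    def correspondence(self, vci_a, vci_b):
        # Normalize the key so (a, b) and (b, a) share one entry.
        key = (min(vci_a, vci_b), max(vci_a, vci_b))
        if key not in self._table:
            self._table[key] = self._compute(*key)
            self.computations += 1
        return self._table[key]

def expensive_alignment(vci_a, vci_b):
    # Stand-in for the real temporal/spatial alignment computation.
    return {"pair": (vci_a, vci_b), "time_offset": 0.0}

store = CorrespondenceStore(expensive_alignment)
```

In a deployed system the `_table` would live in a shared database so that a data service can serve the precomputed results to many devices.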
A database of correspondence information can be used to speed up the operation of the universal video lookup system 116 by avoiding the recomputation of correspondences already contained in the database. The correspondence database may be contained within the universal video lookup system 116, or accessed by the universal video lookup system 116 (or another system) through the network 108 or another data communication line. In particular embodiments, the precomputed correspondence information is provided by a data service to multiple systems or devices, such as the video device 118. Fig. 5 is a flowchart of an embodiment of a procedure 500 for identifying, retrieving, and aligning subtitle information for a movie played from a DVD. In the example of Fig. 5, a request is received for particular subtitle information associated with a particular movie (step 502). For example, a user may wish to watch a movie on a DVD, but wants the subtitle information in a particular language, such as Russian. The DVD may contain English and Spanish subtitle information but no Russian subtitles. Using the universal video lookup system 116 and the procedure 500 described here, the user can watch the movie on the DVD with Russian subtitles. Next, the procedure 500 identifies a first DVD identifier associated with the DVD (step 504). For example, the first DVD identifier may be a disc identification code, or a hash value computed by applying a hash function to a portion of the video content on the DVD. In particular embodiments, an identifier associated with video content (such as the video content stored on a DVD) can be determined based on a file name associated with the video content, a file-based hash value, or a content-based hash value. An example of a file-name identifier is the URL of a fixed video repository, used with video content that is stored permanently and does not change.
A file-based hash value can identify the same file in a peer-to-peer network even when the file name varies independently among users; "Moviehash" is an example of a file-based hash. A content-based hash value identifies the video by analyzing the video content itself — for example, the video DNA described previously. A content-based hash value is therefore invariant to the file name, to the encoding procedure, to processing of the video content, and to editing of the video content. After identifying the first DVD identifier, the procedure 500 identifies a video sequence within the movie stored on the DVD (step 506), for example the first 30 seconds of the movie or any other video sequence within it. The procedure 500 then identifies, based on the identified video sequence, a second DVD identifier associated with the DVD (step 508). The second DVD identifier is selected according to the supplemental data (such as video metadata) that is needed. In this example, the second DVD identifier is selected according to the index type of the information used to identify the subtitle source. The second DVD identifier is then used to locate and retrieve the Russian subtitle information from the subtitle source associated with the particular DVD selected by the user (step 510). Next, the procedure 500 determines a correspondence between the time axis of the DVD movie and the time axis of the movie associated with the subtitle source (step 512). The subtitle information is then mapped onto the DVD movie according to the aligned time axes (step 514). This correspondence and alignment provide the time synchronization needed to display the subtitles with the appropriate movie content. In particular embodiments of the procedure 500, the correspondence between the time axis of the DVD movie and the time axis of the movie associated with the subtitle source is precomputed and stored in a database or data structure.
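Steps 512 and 514 map subtitle timing from the subtitle source's timeline onto the DVD's timeline. The following is a minimal sketch of such a mapping, under the assumption (not fixed by the patent) that the time correspondence is represented as a few aligned time anchors with linear interpolation between them; the anchor values and cue text are invented.

```python
def map_time(t, anchors):
    """Map a time on the subtitle source's timeline onto the DVD timeline.
    `anchors` is a list of (t_source, t_dvd) pairs sorted by t_source.
    Linear interpolation between anchors; clamped at the ends."""
    if t <= anchors[0][0]:
        return anchors[0][1]
    for (s0, d0), (s1, d1) in zip(anchors, anchors[1:]):
        if s0 <= t <= s1:
            frac = (t - s0) / (s1 - s0)
            return d0 + frac * (d1 - d0)
    return anchors[-1][1]

# Hypothetical alignment: the DVD cut runs 30 seconds longer after t=600.
anchors = [(0.0, 0.0), (600.0, 600.0), (1200.0, 1230.0)]

def retime_subtitles(cues, anchors):
    """Shift each (start, end, text) cue onto the DVD timeline."""
    return [(map_time(start, anchors), map_time(end, anchors), text)
            for start, end, text in cues]
```

With a one-to-one gapless correspondence a single offset would suffice; the anchor list generalizes to the gapped correspondences discussed later.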
In other embodiments, the correspondence identified in step 512 is computed when it is needed, and then optionally stored to provide a future reference. The apparatus and methods described here include metadata elements and correspondence elements. A metadata element is associated with a video content identifier — for example an encrypted DVD identifier or a hash value — and with spatio-temporal coordinates, such as the coordinates (x, y, t) within the video sequence corresponding to the video content identifier. In the spatio-temporal coordinates (x, y, t), x and y correspond to the spatial dimensions and t corresponds to the temporal dimension. A particular metadata element is denoted m(VCI; x, y, t), where "VCI" denotes the video content identifier. VCIs of different types may be assigned by different metadata sources. In some embodiments the metadata is sequence-level data (for example, a movie title), so the metadata element reduces to m(VCI): spatial and temporal coordinates are not needed for a movie title. Metadata associated with a spatio-temporal object, by contrast, specifies both temporal and spatial coordinates. A correspondence element between two different video content identifiers is expressed by the function (x2, y2, t2) = C(VCI2, VCI1; x1, y1, t1), where (x2, y2, t2) and (x1, y1, t1) are spatio-temporal coordinates in the video sequences corresponding to VCI2 and VCI1, respectively. Correspondences can be established between video content from different sources with different kinds of VCIs (for example, a YouTube video clip and a DVD), and between video content sharing related VCIs, such as different versions of the same movie on DVD. The universal lookup of video-related data, given (VCI0; x0, y0, t0), proceeds in two stages. First, the system finds the VCIs that have correspondences with VCI0, denoted C(VCI0, VCI1; x0, y0, t0), C(VCI0, VCI2; x0, y0, t0), ..., C(VCI0, VCIN; x0, y0, t0).
These correspondences transform the coordinates (x0, y0, t0) into (x1, y1, t1), (x2, y2, t2), ..., (xN, yN, tN). The system then retrieves the metadata m(VCI1; x1, y1, t1), m(VCI2; x2, y2, t2), ..., m(VCIN; xN, yN, tN). For a particular metadata element, the system may retrieve all of the metadata or only part of it. For example, if the metadata contains all of the information about a movie, the system may want to display only the movie's title and a summary, in which case the system retrieves only that part of the metadata. As discussed here, the universal lookup of video-related data computes the correspondence between two video sequences. In some situations a full spatio-temporal correspondence is computed; in other situations a temporal correspondence alone, or a spatial correspondence alone, is sufficient. When a spatio-temporal correspondence is required, embodiments of the universal video lookup system 116 first compute the temporal correspondence between the two video sequences, and then compute the spatial correspondence between the temporally corresponding portions of the two sequences.
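The two-stage lookup described above — first transform the query coordinates through every known correspondence C, then fetch metadata m at the transformed coordinates — can be sketched as follows. All identifiers, coordinate mappings, and metadata entries here are invented for illustration; the patent does not specify data structures.

```python
# Stage-1 data: (vci_from, vci_to) -> coordinate-mapping function C.
correspondences = {
    ("dvd", "youtube"): lambda x, y, t: (x * 0.5, y * 0.5, t),  # downscaled copy
    ("dvd", "tv_cut"):  lambda x, y, t: (x, y, t - 120.0),      # 2-minute trim
}

# Stage-2 data: metadata sources m keyed by VCI, evaluated at coordinates.
metadata = {
    "youtube": lambda x, y, t: {"comments": 42},
    "tv_cut":  lambda x, y, t: {"subtitle_available": t >= 0},
}

def universal_lookup(vci0, x0, y0, t0):
    """Stage 1: find every VCI with a correspondence from vci0 and transform
    the coordinates. Stage 2: fetch metadata at the transformed coordinates."""
    results = {}
    for (src, dst), mapping in correspondences.items():
        if src != vci0:
            continue
        x, y, t = mapping(x0, y0, t0)
        if dst in metadata:
            results[dst] = metadata[dst](x, y, t)
    return results
```

The caller receives metadata from every related version of the content without knowing in advance which versions exist.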
The temporal correspondence between two video sequences can be computed in many ways. Because of insertions and/or deletions in one or both video sequences, the correspondence may contain gaps, so the temporal correspondence between two sequences need not be one-to-one. A general temporal correspondence can be written as a set of pairs (t1, t2), each denoting that time t1 in the first video sequence corresponds to time t2 in the second video sequence.
A dynamic programming algorithm, such as the Smith-Waterman algorithm, can be used to find the best alignment between the video sequences (for example, a correspondence containing one or more gaps). The data used for the alignment can be audio-based descriptors or visually based descriptors, such as the video DNA descriptors discussed here. As described above, the temporal correspondence can be computed for two specified video content identifiers. A typical example is video content that has an "original" version — the most complete and most authoritative version of the content. All other versions of the video content are then synchronized to the original version (that is, correspondences are computed against the original version). In a particular example, the DVD release of a movie is treated as the original version, and its derivatives (a version edited for television and a BitTorrent version) are aligned to the time axis of the DVD. Various systems and methods are described for identifying, associating, tracking, matching, and aligning video frames and video sequences. Particular embodiments of these functions are discussed below. Video data is spatio-temporal data comprising two spatial dimensions and one temporal dimension (that is, two-dimensional video images and a temporal sequence of different video frames). We distinguish the temporal correspondence from the spatial correspondence between different video frames. The temporal correspondence is expressed at the temporal granularity of the frames: the video sequence is regarded as an ordered one-dimensional sequence of frames, and matching produces a correspondence between the frames of the two sequences. The spatial correspondence is expressed at sub-frame granularity, finding corresponding pixels or regions between two frames of the sequences.
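The Smith-Waterman-style alignment mentioned above can be illustrated with a scoring-only implementation of local alignment over sequences of nucleotides. The similarity function and the toy sequences below are illustrative assumptions; in practice each nucleotide would be a bag of feature descriptors and the similarity function would compare histograms rather than strings.

```python
def smith_waterman(seq_a, seq_b, similarity, gap=-1.0):
    """Local alignment (Smith-Waterman style) of two nucleotide sequences.
    `similarity(a, b)` scores a pair of nucleotides; each gap costs `gap`.
    Returns the best local-alignment score."""
    n, m = len(seq_a), len(seq_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,  # allow the local alignment to restart
                H[i - 1][j - 1] + similarity(seq_a[i - 1], seq_b[j - 1]),
                H[i - 1][j] + gap,   # gap in seq_b
                H[i][j - 1] + gap,   # gap in seq_a
            )
            best = max(best, H[i][j])
    return best

def sim(a, b):
    return 2.0 if a == b else -1.0

a = ["n1", "n2", "n3", "n4", "n5"]
b = ["x", "n2", "n3", "n4", "y"]  # shares the run n2-n3-n4 with `a`
```

The zero floor in the recurrence is what makes the alignment local, so matching subsequences are found even when the sequences have been heavily edited around them.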
The problems of correspondence and similarity are closely related, and computing one usually allows the other to be inferred. For example, we can define a large corresponding region as similar; conversely, given a criterion for the similarity between different portions of video sequences, we can define the correspondence as the maximal extent of the similar portions. Here we distinguish two types of similarity: semantic and visual. "Visual" similarity of objects means that they "look alike" — for example, their pixel representations are similar. "Semantic" similarity means that the concepts the two objects represent are the same. Semantic similarity is defined more broadly than visual similarity; for example, a truck and a Ferrari are visually different but semantically similar (both represent the concept of a car). Visual similarity is therefore easier to quantify and evaluate, while semantic similarity is more subjective and problem-dependent. Because of differences in viewpoint, lighting conditions, editing, resolution, and so on, video signals almost always contain noise and distortion, so an ideal similarity criterion should be invariant to these and other variations. In academic terms, a similarity criterion is invariant to illumination if it judges the descriptions of two objects to be similar regardless of how they are illuminated; we then say that the similarity does not change with lighting conditions. The systems and methods described above allow video sequences to be matched without being affected by editing and distortion. More precisely, these systems and methods provide a framework for spatio-temporal correspondence based on visual similarity that is invariant to temporal distortions (such as frame-rate changes), temporal editing (removal or insertion of frames), spatial distortions (changes in pixel values), and spatial editing (removal of, or additions to, the content within a frame).
Mathematically, the spatio-temporal matching problem can be formulated as follows: given two video sequences, find the correspondence between the spatio-temporal coordinates (x, y, t) in the first sequence and the spatio-temporal coordinates (x', y', t') in the second sequence. Thinking of the video data as a three-dimensional array of pixels, the spatio-temporal matching problem can be viewed as finding correspondences between two such three-dimensional structures.
In general, this matching problem is computationally very complex (it is NP-complete), and direct computation is impractical: without further simplification, a computing system would attempt to match all possible subsets of pixels between the first and second sequences, which would be a very large number of operations. However, if the matching problem is divided into two separate procedures — temporal matching and spatial matching — it can be simplified dramatically. Of the two, spatial matching is the more complex problem, because video frames are two-dimensional and would require a large number of two-dimensional comparisons. By contrast, although one-dimensional temporal matching is still nontrivial, the video DNA or video genomics dynamic programming methods discussed here allow one-dimensional (temporal) signals to be matched efficiently. Fig. 6A shows an example of the spatial correspondence and the temporal correspondence of video data. In the first stage 600 of Fig. 6A, the temporal matching is performed (this step is discussed in more detail below).
The temporal matching produces a correspondence between the time coordinate t in a subset of the first video sequence and the time coordinate t' in a subset of the second video sequence. By performing the temporal matching first, we avoid attempting a two-dimensional spatial matching between all possible subsets of pixels of the video sequences (in essence, the three-dimensional matching problem). The problem is further reduced in size because the spatial matching then needs to be performed only over the small subsets of the sequences that correspond in time. In other words, for the spatial correspondence, the large 3D matching problem is transformed into comparatively small 2D matching problems over a relatively small number of 2D video frames. By analogy, it is like matching the string "apple" against the upper video sequence and then looking for the thing corresponding to "apple" in the lower sequence: only the small number of most relevant frames in sequence A and sequence B need to be examined. Since the typical temporally corresponding portions of a video sequence are short, the spatial matching problem is reduced substantially. In the second stage 602 of Fig. 6A, the spatial matching of the temporally corresponding video data is performed. The spatial matching produces a correspondence between the spatial coordinates (x, y) and (x', y') in the temporally matching portions (such as frames) of the first and second video sequences. In the described systems and methods, the matching can be made robust, remaining invariant to distortions and to editing of the video content. In particular, the temporal matching is stable under editing that inserts or removes short video segments.
The spatial matching is stable under video distortions and spatial editing (for example, the different croppings of the apple, the different lighting, and the different fruit backgrounds shown in Fig. 6A). It is to be understood that the methods described here normally run on a computer system containing at least one processor (often two or more) and memory (typically megabytes or gigabytes of memory). Existing processors suitable for executing these methods include general-purpose processors such as x86, Power, ARM, and similar processors, as well as processors adapted for video, such as graphics processors and digital signal processors. The methods described here can be implemented in higher-level languages such as "C", "C++", or "Java", can operate on lower-level components, or can be written directly at a lower level. The results of the analysis can be stored on memory storage media such as memory or flash drives, hard disks, CDs, DVDs, Blu-ray discs, and similar storage media. Video information (video images) can be represented by a small set of feature points at distinctive locations. Such video feature points are points in the image that can be detected in a stable, repeatable way under modifications of the image, such as rotation, subsequent editing, and presentation under different light sources. A feature point is typically described by a vector of information about a spatio-temporal subset of the video — for example, a point can be described by the three-dimensional direction of a spatio-temporal edge, the direction of a motion region, a color distribution, and so on. Essentially, a subset of the feature points provides a description of an object, while the full set of feature points constitutes the context (or background).
For example, an apple object in a computer advertisement and an apple object among many fruits share the same local feature-point descriptions of the object, but the overall context is different. Examples of suitable feature points include: corner detectors such as the Harris corner detector and its variants, as described by C. Harris and M. Stephens in "A combined corner and edge detector", Fourth Alvey Vision Conference; the SIFT features described by D. G. Lowe in "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 2004; motion vectors obtained by decoding the video stream; orientations of spatio-temporal edges; color distributions; intensity descriptions; coefficient decompositions of the pixels over known dictionaries, such as wavelets or curvelets; and particular objects known in the prior art. Extending this idea to video data, a video can be abstracted into a three-dimensional structure (two dimensions formed by the 2D pictures, and one dimension formed by the sequence of video frames). This 3D structure can be used as the basic building block for representing video sequences. As discussed earlier, it is helpful to think about the video analysis problem in biological terms and to draw inspiration from bioinformatics. In this analogy, it is useful to think of the feature points as atoms; the feature points condensed from a series of video frames are like nucleotides, and the video itself is like an ordered sequence of nucleotides, such as a large DNA or RNA molecule. The spatial and temporal dimensions of a video sequence have different interpretations. The temporal dimension provides an ordering of the video data: we can say that one feature point appears after another. If we divide the video sequence into short intervals, we can regard it as an ordered sequence of video elements, each containing a set of feature points.
As mentioned earlier, here we consider the video material as a sequence of small nucleotide-like subsets of feature points, so that a video signal is likewise represented by an ordered sequence of visual nucleotides.
This ordered sequence of visual nucleotides representing a video is called the video's DNA.

Returning to the sequence analysis, the apparatus and methods described here can treat the video signal as a three-dimensional, two-dimensional or one-dimensional signal. Considering all the feature points, we obtain a three-dimensional (spatio-temporal) structure. Considering the time intervals, we obtain a one-dimensional representation. Considering a single frame of the sequence, we obtain a two-dimensional representation. The same representations are used in the temporal and spatial matching stages. A two-stage matching method follows.

In the first stage, a temporal representation of each video signal is built. Each video signal is divided into time intervals. A time interval here is usually not a single video frame, but a segment composed of several video frames (for example, 3 to 30 frames). Time intervals are discussed in more detail below.

For each time interval, the actual video images are abstracted into a representation (also referred to as a visual nucleotide) containing the key feature points of the interval. The sequence of feature points is abstracted and compressed by discarding the spatio-temporal coordinates of the feature points; in other words, we simply count the feature points of each type. Figuratively speaking, this amounts to recording the different feature descriptors, and the number of descriptors of each feature-point type, without recording where or when they occur.

Each time-divided portion of the video signal (which we will call a "nucleotide", by analogy with a biological nucleotide) is thus represented as an unordered collection, or "bag", of feature points (or a bag of feature descriptors). Thus, if each feature point is thought of as a "visual atom", the "bag of features" corresponding to a particular time segment of video can be called a "nucleotide". The representations of the time segments of a video are then arranged into an ordered "sequence", or map (the video DNA). In this disclosure we generally use the term "nucleotide" rather than "bag of features", because it helps guide thinking toward the useful bioinformatics-like methods of video analysis described here.

The video DNAs of different video signals can be aligned in much the same way that biological DNA sequences are aligned and compared. In DNA sequence analysis, an important problem is to find an alignment yielding the maximum similarity and the minimum difference between two DNA sequences. Algorithms similar to those used in bioinformatics for aligning DNA sequences can be used to align two different video signals.

After the first stage has matched corresponding portions of the two video media in time, additional stages can be carried out. Once the temporally corresponding segments are found, the correspondences between their contents can also be established: "things" (groups of pixels) seen in the first video can be matched in the second video, giving a correspondence between the spatial content of temporally corresponding video frames.

In the second stage, we do not discard the spatio-temporal coordinates of the feature points. In this stage, each frame is represented as a two-dimensional structure of feature points, and the coordinates of the feature points are retained. For the purposes of the second stage (spatial matching of the visual content of corresponding frames), more standard feature-point algorithms, of the kind used in the prior computer-vision literature, can be employed.

For object recognition and other applications, object-based analysis is necessary, and the "video genomics" approach offers meaningful advantages over the prior art. In particular, the systems and methods described here provide higher discriminative power than individual object descriptors alone, because the discriminative power comes not only from the object descriptors themselves but also from the temporal support, that is, the temporal sequences in which these descriptors occur. Although some existing methods hold that the best discriminative capability is obtained from a very large number of precisely optimized feature points, we have found that this is not the case. Unexpectedly, when the systems and methods described here are compared with the prior art, temporal support (such as the temporal order in which groups of feature points appear) turns out to be a more important source of discriminative power than the use of a very large number of different descriptors. For example, it is usually straightforward to increase the precision of an object description by "brute force", allowing more feature points and more feature descriptors; but when each feature point and descriptor is produced by intensive computation, the brute-force approach quickly reaches a point of diminishing returns because of its high computational overhead.

Instead, we have found a way of increasing the accuracy of the object description that avoids the prior-art route of enlarging the visual vocabulary (which increases the computational load steeply with the vocabulary size), so that the apparatus and methods described here can perform the comparison simply, using lower-intensity computation. To improve accuracy, the apparatus and methods described here avoid increasing the number of feature descriptors, and instead gain accuracy through analysis at a finer temporal resolution. This is accomplished simply by adding a few more "nucleotides" (that is, by using a smaller time division in the video analysis) to the "video DNA" sequence comparison. By avoiding an inordinate growth in the number of feature points, the apparatus and methods described here achieve high accuracy while being more efficient from a computational standpoint.

Prior-art methods, such as "Video Google: a text retrieval approach to object matching in videos" by J. Sivic and A. Zisserman, treat a video as a collection of images in which high discriminative power must be obtained from the image content alone.
Such methods must therefore use a very large feature-point "vocabulary" (over a million elements). By contrast, using temporal support, the approach described here can use a much smaller feature-point vocabulary and obtain the same or better results, while increasing computational efficiency.

The first advantage arises in content-based retrieval applications: the present systems and methods permit retrieval both of an object of interest and of the context in which the object appears. The temporal sequence can be regarded as additional information describing the object, on top of the description of the object itself.

Fig. 6B shows the same object (an apple 610) appearing in two different contexts: fruit 612 and computers 614. In the first case the apple object appears in a sequence together with other fruit, and this context gives the object the meaning of a fruit. In the second case the "apple" object appears in a sequence with pens, computers and iPhone-type phones, which gives the object the meaning of a computer brand. The systems and methods here are sophisticated enough to distinguish these contexts; consequently, the video map/video DNA representation differs between the two cases, despite the fact that the object itself is the same.

By contrast, prior-art methods such as that of Sivic and Zisserman do not take the context of the video content into account, and therefore cannot distinguish the two different "apple" objects in the example above.

The second advantage is that the "video genomics" approach allows partial comparison and matching of video sequences in many different ways. Just as the methods of bioinformatics allow different DNA sequences to be compared, two different video DNA sequences can be matched despite some differing video frames (nucleotides), such as insertions or gaps. This is particularly important when invariance to video alterations (such as temporal edits) is required; for example, the video DNA of a movie needs to be matched correctly against a version of the movie with inserted advertisements.

Fig. 7 shows a conceptual scheme of the creation of the video map/video DNA representation of a video sequence. The process includes the following stages. In the first stage 702, a local feature detector is used to detect points of interest in the video sequence. Suitable feature detectors include the Harris corner detector, described in C. Harris and M. Stephens, "A combined corner and edge detector", Alvey Vision Conference, 1988, or feature detectors based on the scale-invariant feature transform (SIFT) scale space, described in D. G. Lowe, "Distinctive image features from scale-invariant keypoints", published in the International Journal of Computer Vision in 2004.

The points of interest can be tracked over several video frames in order to prune points that are meaningless or temporally inconsistent (for example, points that appear for too short a time). This part is described in more detail later. The remaining points are then described using local feature descriptors, for example SIFT, which is based on the local distribution of gradient directions, or the speeded-up robust features (SURF) algorithm described by H. Bay, T. Tuytelaars and L. Van Gool in "SURF: Speeded Up Robust Features", 2006.

The feature detection and description algorithms should be designed to be robust or invariant to spatial distortions of the video sequence (for example, changes of resolution, compression noise, and the like). The locations of the feature points and the corresponding feature descriptors constitute the most basic representation level of the video sequence.

In the second stage 704, the video sequence is divided into temporal intervals 706, which often span multiple frames (typically 3 to 30 frames). Such a division can, for example, be based on the feature points tracked in the previous stage. It is worth noting that the division into intervals should be invariant to modifications of the video, such as the temporal edits referred to above.

The spatio-temporal locations (feature-point coordinates) are not used at this stage. Instead, the information within a temporal interval is described using a "bag of features" approach. Here, similarly to the method proposed by Sivic and Zisserman, all the feature descriptors are represented in terms of a visual vocabulary (obtained from a clustering of representative descriptors, for example by vector quantization). Each feature descriptor is replaced by the nearest element of the visual vocabulary.
As noted above, a feature point represented in this way is also referred to as a visual atom. Pursuing the analogy, the visual vocabulary can be regarded as a "periodic table" of visual elements.

However, unlike the prior-art method of Sivic and Zisserman, here we discard the spatial coordinates of the feature points, and represent what is called a "representation", a "visual nucleotide", a "nucleotide", or occasionally a "bag of features" 710 as a histogram (or vector) of the frequencies of occurrence of the different visual atoms within the time interval. The "visual nucleotide" 712 representing a certain number of video frames is essentially a bag of features created by discarding the spatial coordinates and simply counting frequencies of occurrence (this process is referred to as a "bagging function" or "grouping function"). If a standard set of visual elements is used to describe the content of each bag, a visual nucleotide can be represented mathematically as a histogram or a sparse vector. For example, if a bag of features describing several video images contains three instances of the first feature point, two instances of the second feature point and no instances of the third, then the visual nucleotide, or bag of features, describing those video images can be represented by the histogram or vector (3, 2, 0). As another example, a visual nucleotide containing four instances of the fourth feature type and five instances of the tenth would be represented by the histogram or vector (0, 0, 0, 4, 0, 0, 0, 0, 0, 5, 0).

The bag-of-features representation allows invariance to spatial editing: if the video sequence is modified, for example by overlaying pixels onto the original frames, the new sequence will contain a mixture of feature points (some of the old feature points belonging to the original video, and new feature points corresponding to the overlay). If the overlay is not very large (in other words, most of the information in the frame belongs to the original video), the two visual nucleotides can still be matched correctly, because the respective bags of features (e.g., sparse vectors) share a sufficient proportion of their feature-point elements.

Finally, all the visual nucleotides (or bags of features) are aggregated into an ordered sequence, the video map or video DNA 714. Each image (or visual nucleotide, "bag", histogram or sparse vector) can be regarded as a generalized letter over a potentially infinite alphabet, so the video DNA is a generalized text sequence.

The temporal matching of two video sequences can be performed by matching the corresponding video DNAs using a variety of different algorithms. These range from very simple "match/no-match" algorithms, through the "dot matrix" algorithms used in bioinformatics, to sophisticated algorithms of the kind used in bioinformatics for matching biological DNA sequences. Examples of the more complex bioinformatics algorithms include the Needleman-Wunsch algorithm, described in S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins", 1970; the Smith-Waterman algorithm, described in T. F. Smith and M. S. Waterman, "Identification of common molecular subsequences", 1981; and heuristics such as the Basic Local Alignment Search Tool (BLAST), described in S. F. Altschul et al., "Basic Local Alignment Search Tool", 1990.

Typically, a suitable sequence-matching algorithm operates by defining a matching score (or distance) that quantifies the quality of the match between two video sequences. The matching score comprises two main ingredients: the similarity (or distance) between the nucleotides, and a gap penalty, a standard algorithmic device for introducing gaps rather than splitting the sequences.

To do this, the distance between a nucleotide in the first video and the corresponding nucleotide in the second video must be quantified mathematically. How similar is a "bag of features" from one sequence of video frames to a "bag of features" from a second sequence of video frames? The similarity value can be expressed by some metric measuring how similar or dissimilar the two nucleotides are. In a simple case, the similarity can be the Euclidean distance between, or the correlation of, the vectors (bags of features) representing each nucleotide. If one wishes to allow partial similarity (which frequently occurs, particularly when the visual nucleotides may contain different feature points because of spatial edits), a more complicated metric with weighting or rejection of outliers should be used. More complicated distances may also take into account the probability of one nucleotide mutating into another: two different nucleotides are more likely to be similar if they are plausible mutations of each other. As an example, consider a first video image of a first video image sequence, and a second video image consisting of the same first video image sequence with a video overlay. Clearly, many of the video feature points in the bag describing the first video will be similar to the video feature points in the bag describing the second video, and the "mutation" here consists of those video feature points that differ because of the video overlay.
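The "bagging function" described above (discard the spatio-temporal coordinates, count occurrences of each visual-atom type) can be sketched directly. The vocabulary size K = 5 and the labelled points below are invented for illustration; they are not values from this document:

```python
# Hedged sketch of the "bagging function": a visual nucleotide is the
# K-bin count histogram of the visual atoms in one time interval, with
# the (x, y, t) coordinates deliberately discarded.

def visual_nucleotide(atoms, K):
    """atoms: list of (x, y, t, label) tuples, label being the index of the
    nearest visual-vocabulary element; returns a K-bin count histogram."""
    hist = [0] * K
    for _x, _y, _t, label in atoms:   # coordinates are deliberately ignored
        hist[label] += 1
    return hist

# Feature points detected over one interval: (x, y, t, vocabulary index).
interval = [(12, 40, 0, 1), (88, 15, 0, 3), (13, 41, 1, 1), (70, 22, 2, 3),
            (30, 30, 2, 0)]
nucleotide = visual_nucleotide(interval, K=5)
# nucleotide == [1, 2, 0, 2, 0]
```

Because only the counts survive, two intervals that differ by a modest overlay still produce histograms sharing most of their mass, which is the spatial-editing invariance discussed above.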
The gap penalty is a function scoring the gaps introduced between the nucleotides of the sequences. If a linear penalty is used, it is simply the length of the gap multiplied by some preset constant. More complicated gap penalties may take into account the probability of a gap appearing, for example according to the statistical distribution of advertisement positions and durations in the content.

The following discussion identifies cases of similarity and dissimilarity between biological DNA and video DNA. Because the systems and methods discussed here essentially transform the problem of matching corresponding portions of different video media into a problem that resembles the problem of aligning biological DNA sequences, some deeper insight can be gained by examining this analogy more carefully. Because DNA sequence alignment technology is in a comparatively advanced state of development relative to video matching technology, the systems and methods produce the unexpected result of showing how a number of advanced DNA bioinformatics techniques can be applied, quite unexpectedly, to the very different domain of matching video signals.

As previously discussed, at a conceptual level there is a strong similarity between the structure of biological DNA and the video DNA described here. A biological DNA is a sequence composed of nucleotides, and in the same way the video DNA is composed of visual nucleotides (bags of features over multiple video frames). A nucleotide in biology is a molecule composed of atoms from the periodic table; in the same way, a visual nucleotide is a bag of features composed of visual atoms (that is, feature points) drawn from the visual vocabulary (a standardized collection of different feature points).

Figure 8 shows this analogy graphically, presenting an abstracted video signal 800 alongside a biological DNA molecule and its constituents (nucleotides and atoms), and clarifies the sense in which the name "video DNA" is apt. Despite the conceptual similarity, the actual details differ significantly. First, the "periodic table" of atoms appearing in biological nucleotides is small, usually including only a few elements (for example carbon, hydrogen, oxygen, phosphorus, nitrogen, and so on). By contrast, the visual vocabulary used for video DNA is typically large, containing at least several thousands to several millions of visual elements.

Second, the number of atoms in a typical nucleotide molecule is also relatively small (a few tens to a few hundreds), whereas the number of "visual atoms" (feature points) in a visual nucleotide (bag of features) is typically in the hundreds to thousands. Also, while the spatial relationships between the atoms of a biological nucleotide are very important, in a visual nucleotide this relationship between feature points (that is, their arrangement) is usually de-emphasized or ignored.

Third, the number of different nucleotides in biological sequences is small: usually four ("A", "T", "G", "C") in DNA sequences and twenty in protein sequences. By contrast, in video DNA each video nucleotide is a "bag of features" usually containing counts for at least hundreds or thousands of different feature types, represented as a histogram or vector. Thus, if a set, or vocabulary, of, say, 500 or 1000 standard feature types is chosen for the video analysis, each "bag of features" is a histogram or vector composed of the counts with which each of the 500 or 1000 standard feature types appears in the series of video frames described by the "nucleotide" or "bag of features". The number of possible arrangements of these counts, each potentially representing a different video nucleotide, is therefore very large.

These practical differences mean that the matching of video DNA is only conceptually similar to the matching of biological sequences. In some respects the video matching problem is harder, and in other respects it is simpler. More specifically, the matching algorithms differ in the following ways.

First, in biological sequences, because the number of distinct nucleotides is small, the score of matching two nucleotides can be expressed as a simple "match"/"no match" result. That is, a biological nucleotide is an "A", "T", "G" or "C", and what must be determined is whether an "A" is paired with an "A", or not. By contrast, each visual nucleotide of a video DNA is a histogram, vector or "bag of features" usually comprising hundreds or thousands of different counts, so the matching operation is more complex. For video DNA, therefore, we need a more general notion of a "score" or "distance" between nucleotides. Such a score can be regarded as some kind of distance function between histograms or vectors: in other words, how far apart are any two different "bags of features"?
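The graded nucleotide score and the linear gap penalty discussed above can be combined in a Needleman-Wunsch-style dynamic program. The following is a minimal sketch under stated assumptions: the cosine similarity between histograms and the gap value of -0.5 are illustrative choices, not parameters given in this document:

```python
# Hedged sketch of aligning two video-DNA sequences with a
# Needleman-Wunsch-style dynamic program. Unlike biological DNA, each
# "nucleotide" is a count histogram, so the match score is graded
# (cosine similarity) rather than a binary match/no-match.

def sigma(h1, h2):
    """Graded nucleotide similarity: cosine of the two histograms."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = sum(a * a for a in h1) ** 0.5
    n2 = sum(b * b for b in h2) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

def align_score(q, s, gap=-0.5):
    """Best global alignment score between nucleotide sequences q and s,
    using a linear gap penalty (one 'gap' unit per skipped nucleotide)."""
    n, m = len(q), len(s)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + sigma(q[i - 1], s[j - 1]),
                          D[i - 1][j] + gap,      # gap in s
                          D[i][j - 1] + gap)      # gap in q
    return D[n][m]

a = [[3, 0, 1], [0, 2, 0], [1, 1, 1]]
b = [[3, 0, 1], [9, 9, 9], [0, 2, 0], [1, 1, 1]]  # same video + one insertion
```

Aligning a sequence with a version of itself containing one inserted nucleotide costs exactly one gap penalty, mirroring the advertisement-insertion example above.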
On the other hand, many other concepts, such as homology scores, insertions, deletions, point mutations and the like, have significant analogues between these two different domains.

Figure 9 presents, in one embodiment, the computation of the video DNA of an input video sequence. The video DNA computation receives video data 900 and includes the following stages: feature detection 1000, feature description 2000, feature pruning 3000, feature representation 4000, segmentation into temporal intervals 5000, and visual atom aggregation 6000. The output of the process is a video DNA 6010. Some stages may be performed differently in different embodiments, or not performed at all. The following description details different embodiments of the above video DNA computation stages.

As shown in Figure 10, the video sequence is divided into a set of temporal intervals. Figure 10 indicates that in one embodiment, the video temporal intervals 1020 are of fixed duration (for example, 1 second) and do not overlap. In another embodiment, the temporal intervals 1022 have some overlap. Here, each interval may consist of as many video frames as occur in one second (or another time unit); depending on the frame rate, this may be 10, 16, 24, 30, 60 frames, or some other number.

In another embodiment, the intervals are set according to the locations of shots (scenes), or of abrupt transitions ("cuts") between two consecutive frames (see 1024). Cuts may be detected by tracking feature points across frames: for each frame, the number of newly created tracks, which replace the original track numbers, is computed. If the number of disappearing tracks exceeds some threshold, and the number of newly created tracks replacing them also exceeds a threshold, the frame is declared a cut. If shot boundaries are used in this way, a single visual nucleotide may be composed of hundreds or thousands of video frames when the shot is long. In another embodiment, the intervals are of fixed duration and are synchronized with the shots (see 1026).

Feature detection 1000 (see Figure 9). A feature detector operates on the video data 900 and produces a set of N invariant feature points {(xi, yi, ti)} (see Figure 9, 1010), where xi, yi and ti are the spatio-temporal coordinates of the feature point. The feature detection step 1000 is shown in more detail in Figure 11, together with an illustrative example. In one embodiment, feature detection is performed on a frame basis: for a frame at time t, a set of Nt feature points is located. Typical feature points correspond to two-dimensional edges or corners. Standard algorithms for invariant feature-point detection from the computer-vision literature can be used, for example the Harris corner detector, the Kanade-Lucas-Tomasi tracker (KLT), and the like.
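The cut-detection rule sketched above (a frame is declared a cut when a large fraction of feature-point tracks terminates and a large fraction of new tracks replaces them) can be illustrated as follows. The fractional thresholds of 0.5 and the toy track sets are assumptions for illustration, not values from this document:

```python
# Hedged sketch of track-based cut detection: compare the set of active
# track ids between consecutive frames; a cut is declared when both the
# fraction of tracks lost and the fraction of tracks newly created exceed
# their thresholds.

def detect_cuts(tracks_per_frame, died_thresh=0.5, born_thresh=0.5):
    """tracks_per_frame: list of sets of active track ids, one per frame.
    Returns the frame indices at which a cut is declared."""
    cuts = []
    for f in range(1, len(tracks_per_frame)):
        prev, cur = tracks_per_frame[f - 1], tracks_per_frame[f]
        if not prev:
            continue
        died = len(prev - cur) / len(prev)         # fraction of tracks lost
        born = len(cur - prev) / max(len(cur), 1)  # fraction newly created
        if died > died_thresh and born > born_thresh:
            cuts.append(f)
    return cuts

frames = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 9}]
# Between the third and fourth frames every track is replaced: a cut at index 3.
cuts = detect_cuts(frames)
```

Small track turnover (one feature lost to occlusion, say) stays below the thresholds, so ordinary motion within a shot does not trigger a cut.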
Representative values of Nt range from 10 to 1000.
In particular embodiments, the value of Nt is 100, 200, ..., or 1000. In another embodiment, the value of Nt is not preset, but is a result of applying the feature detection algorithm. In another embodiment, the feature detection is performed on data extending in both time and space, producing spatio-temporal feature points; standard spatio-temporal feature detection algorithms can be used for this purpose.

Feature description 2000 (see Figure 9). For each feature point detected in the feature detection step, a feature descriptor is computed, producing a set of feature descriptors (see Figure 9, 2010) corresponding to the feature points. A feature descriptor is a representation of the local video information in the neighborhood of the feature point. Many standard feature descriptors from the computer-vision literature, such as the SIFT or SURF descriptors, can be used. A feature descriptor can be represented as an F-dimensional vector; for example, F = 128 for the SIFT descriptor and F = 64 for the SURF descriptor. In a particular embodiment, the feature descriptors are computed on a frame basis, meaning that they represent the pixels in a spatial neighborhood within a single frame; standard descriptors such as SIFT or SURF can be used in this case. In another embodiment, the feature descriptors are spatio-temporal.
科間,、郎上的,也就纽它們代表著那些存在於時間與空間 上附近的像素。辟特徵描述瓶秘是—個 以運用在it烟子上。 J 正3000 (凊參考「第9圖」)。這邊在所有的特徵點 找到-個-致特徵點的子集3_。在不—樣的實施例裡,一 致性可能暗轉如的—雜(特徵點沒有突⑽雜加上他 ❹相鄰時間上的位置是相似的),暫時一致(特徵點沒有出現或 疋突然地消失),或是時空的一致(上面所說的結合)。 ,在個實施例禮,在「第12圖」中,為了找到前後一致的特 、徵點:執行追縱。在特徵點追縱演算法3100裡,它嘗試著找出幾 、且二地被呈現在-個擁有充分寬大並且連接排序畫面裡的特徵 j以樣就可以移轉在—個翔晝面輯_·假特徵點。 k二假特徵點會干擾縣所偵_的圖像,例如:從鏡子的反射 裡如果把那二叙特徵點移除掉將會改善準確度以及晝面裡的視覺 内容。 35 200951832 的追:嘗:5::裡’運用-個畫面基礎的追蹤方式。這種方式 :者去找$彳在晝面纟和之兩個特徵 工 對應處,其中,通常師㈣的加; 。=r:裡,多個畫面是在同-個時間内被追縱 追跟态3100輸出執跡τ 2表時_始與時間結束時,而H剛好是所經朗時間。有可 點的她度來決定這些關健⑽—錄跡裡的特 在這裡用上 變)^相:)、移動(特徵點的位置不隨著軌跡而有明顯的改 變),或柏者皆是錢輸可㈣的舰點追賴轉算式可以 、檢查所制獅之-雜並錄行軌歸整删。一個實施 例裡’在-些特定的門釘,這些執跡將會有某部分因為不符合 而,移除掉。在另—個實施例裡,顯示著空間相似處的高度變異 (突然的移動)的執跡將被刪除。在另一個實施例裡,那些特徵 ,的特徵描述符號如果他們的執跡也顯示著高機率的變異性,那 k二也像會被移除。移除的結果是_個是軌跡τ,的子集m。 其中-個實施例裡,特徵集合仏,讀^以及對應處描述符此 都會在-個魏t _的時雜計算,而且追織會被初始化到 ’ Kalman過濾器(fllter)將預測在下一個晝面广,中 之特徵點的位置⑽摘。這個集合^心光㈣應處描述符 ί/;1=Ι在畫面什汾下被計算。每一個特徵點〜^,乂剛好都相對照特 36 200951832 徵點夂,A,/,,的子集而且是在一個圓裡有著一定的半徑、圓心在 為(ο,Λ(ο,還會相符合於所選擇最接近的描述符。當晝面鄰近排序 •沒有出現一個適當的配對時,軌跡被終結(失效)。只有那些符合 充分短暫時間的軌跡點會被保存下來。 在一個實施例中,Kalman過濾器與恆定速率模型(vel〇dty) 一起使用,且以估計之特徵點位置協方差決定下一個畫面的搜 半徑。 0 一種基於追蹤(「第12圖」步驟3200)之特徵點修整的實施 例在「第13圖」中有更詳細描述。輸入特徵點座位置1010、對應 之特徵描述符號2010以及特徵點的執跡3丨丨〇,在每一個軌跡中之 執跡,度為「d」、移動異為「mv」、以及描述符號變異「办」會 破計算出來。這些值通過—㈣的咖及決策規職,會去除長 度太丨、或變異性太大的軌跡。結果便是屬於經過修整後剩下來之 執跡之特徵點的子集3〇1〇。 一個使丟棄軌跡之可能的決策規則如下所示: ❹(d〉th_d) AND (mv < th_mv) AND (dv < th—dv) 其中th_d」為持續時間的門植,「th—mv」為移動變異的門植, • 「th_dv」為描述符號變異的門檻。 ' 特徵表示4000 :回至「第9圖」,步驟侧表示以視覺字彙 方式表達修整過之軌跡上的特徵點。這個階段的結果式產生一系 列的視覺原子侧。視覺字彙是麵為κ之概觀符號的聚集 (視覺元素)’在這裡以表示。視覺字彙能夠被預先計算的, ^列來說’由表讀頻相的集合中收集大量的特徵點並在描述 符號上表現向里化。在不同的實施例中,〖值可以用1麵、2〇〇〇、 37 200951832 3000、...、loooooo 代入。 >依據兀素的數字/由最接近特徵點/之描述符號的視覺詞囊取 代每個特徵點Ζ·。在—個實施财,以最接近演算法(nearest ^eighb^r algorithm)來尋找特徵點z•的表示式, * 其中,為《·«識別符號空間之基準。在另一個實施例中,近似鄰近 演算法(approximatenearestneighborh〇〇d)被使用。由於特徵點 / 被表示為,被稱為視覺原子。 —在-個實施例中,在表示視覺字彙中之特徵點之前,會找$ ❹ 每個軌跡的特徵點表示式。它可以沿著軌跡由特徵點的描述符號 中獲取特徵描述符號的平均值、中位數或_。在一個實 轭例中’沒有辨識力的特徵點會被修整掉。沒有辨識力的特徵點 疋那些與多數視覺原子幾乎同距離的特徵點。可以從與離它第一 近與第二近的鄰居之距離的比值來判斷這些特徵點。 視覺原子聚集6000 :為了每一個在步驟5〇〇所計算出來的時 
間區間,視覺原子被聚集為視覺核苷酸。視覺核苷酸序列(視頻 ❹ DNA6010)被輸出。視覺核苷酸j被建立為〖個容器的直方圖(κ 為視覺詞彙的大小),第η個容器為在時間區間内出現之η類型之 · 視覺緣子的數目。 , 在一個實施例中,在區間[Wj内的直方圖依據下列的公式將 視覺原子在區間内的時間位置加上權重: Κ = i:lj =n 其中’ w(〇是一個加權函式,〜是直方圖中第n個容器的值。在一 個實施例中,權重在區間中心被設定為的最大值,並向區間的邊 38 200951832 緣減少’例如可以依據高斯公式 w(0 = exp f2 1 •在另一個實施例中’偵測出區間[yj内剪輯鏡頭,且由鏡頭的邊 界至區間中心糾的吨)被設為零。 在特定的實施例中,直方圖中的容器更為了減少不可靠的容 器的影響而加權。例如,第η個容器的權重與類型η的視覺原子 的典型頻率成反比。這個類型的權重就像是反轉在文字搜尋引擎 0 中之文件頻率的權重。 在另一個實施例中,第η個容器的權重與依據典型突變的代 表所计算之第η個容器的變異成反比,以及與相同内容之第η個 谷器的變異成正比。 一旦為了兩個視頻序列計算視頻DNA,這些不同的視頻序列 能夠依照下面的描述以時間進行匹配(校正)。在一個實施例中, 查詢視頻DNA所代表的視覺核苷酸的序列化此與在資料庫中之 視頻DNA所代表的視覺核苷酸的序列{ίγ=ι之間的時間對應處以 〇 以下的方式被計算。 兩個序列之間的匹配,核苷酸④係對應核苷酸〜,或對應核苷 酸\以及〜之間的空隙,同樣的,核苷酸〜對應核苷酸A,或對應 核皆酸仏以及A+1之間的空隙。{ςτ,.Α以及之間的匹配能夠以擁 有K個對應處的序列fe,人)丨【=1、擁有G個空隙的序列也‘以匕以 及擁有G’個空隙的序列{(4,以)來表示,其中,(^,九人)表示在核 苷酸气以及L+1之間的空隙長度/»,子序列表示相似 處’仏,人乂)表示在核苷酸A以及氕+1之間的空隙長度/„,子序列 表示相似處。匹配會依據下列公式配置一個分數 39 200951832 G G* S = Σσ(^Λ ) + + 'ZsiLJnJ») 众=1 w=*l n 篇1 其中,〇(%,&)是計算核苷酸a與核苷酸〜相似程度的評分方程 式,是空隙罰分。 根據上面的描述,許多可選擇的演算法被用來做配對的計 , 算,包括簡單到非常複雜的。在本發明的一個實施例中, · Needleman-Wunsch演算法被用來依據最大總分$來搜尋配對,在 另一個實施例中則使用Smith-Waterman演算法,而在另一個實施 例中,使用BLAST演算法。 在自由選擇的實施例中,配對最大總分S是利用下列方法而 ® 完成的。在第一個階段,在資料庫中的查詢和排序之間找尋有的 小固定長度w的良好配對。這些良好配對被稱為種子(seed)。在 第'一卩又’匕S又法從種子的兩個方向延伸配對。無差距對齊程序 (un-gapped alignment process)嘗試增進對齊分數(alignment score)使用在每個方向延伸長度w的初始種子配對。插入和刪除 則不在這階段考慮。假如找到高分的無差距對齊,資料庫序列則 直接跳到第三階段。在第三階段,查詢序舰#料料狀_ ❹ 有差距對齊(gapped aiignment)則可以用Smith_Waterman演算法 得到。 * 在一個發明的實施例中,空隙罰分是線性的,可以用 · 以= <來表示,其中α為參數。在其他的實施例,空隙罰分 疋相近的’可以用机,九人) = 0+4])來表示,其中々為另一個參 數。 在實施例中,評分方程式吻,Λ)描述代表核苦酸仏的直方圖 *和代表核皆酸〜的直方狀之_相似性。在另-個實施例中, 200951832 相似性可用〈^〉的内積計算。在自由選擇的實施例中,相似性可 藉由從學習(training)資料得到的一個向量權重(weight)加權而 最大化評分方程式的識別力。或者,評分方程式#% ^ )是核苷酸% 的直方圖A和代表核苷酸^的直方圖&,之間距離的反比。在其他 • 實施例中,距離可用Lp基準(Norm)計算 聃 - 。 在特定的實施例中,距離是直方圖之間的Kullback_Leiber分 〇散性。在其他實施例中,距離是直方圖之間的EarthMover距離。 在特定的實作絲,評分方程式σ“)與跟核賊〜變異成 另個核普g“的空間或時間失真應用於基本(⑽触細)視頻 序列的機率成正關係。也就是說可朗絲㈣酸⑽直方圖办 變異成另一個代表核苷酸气的直方圖/Γ的機率表示。 在這個例子,機率可用 P{hW) = Y\P{hn\hn) η 來估計’當MW是直方圖㈣第Ν個容器(bin)的值改變為 的機率。機率紙丨W可從實際經驗上的每個容器獨立的訓練資料 來得到。 
率在另-個例子,Bayes理論用來表示評分方程式吻心為機 p{h'\h)=^)pm m 其中,p_是用先前解釋過的方法計算。户 ρ^)-ψΛΚ) ⑴表不為 41 200951832 其中’ w為測量直方圖A中_個 以及獨立的為每個容器從實際經驗二 ^為乂時的機率, 這個通常是有_,不僅要—二丨練讀估計。 不同的視頻之間,而且要找到第—個二體晝面或時間序列從兩個 -個視射的第群像素),在第 視頻中的第二個空間對齊之間的不容’」,在第二個 用的用在比較不同來源或解析度到°或者’這個有時是有 ❹ 拍攝電視螢幕也許想要準•定:二= 的疋什麼。在兩個例子之中,它 p目次電汾播 空間對齊和時間(畫面編號)對^有用的在決定兩個不同視頻的 在本發明的實施例中,代表 覺核聽,,和代表資料庫序物時序=中_區⑽的視 謦枋:《:缺 的時間區間匕,纥]的最佳配對之視 見核皆酸,7之間的空間關聯可用下列方法計算:對之視 ❹ 挑述符從。另-個晝—被 可由兩個集合二間r到徵二士广}二之對應描述符。對應處 的的配對則被拒絕。相關分數(c_spondingp〇int)可表示為 ^y〇,{x'Jk,y.j 〇 ‘旦此可藉由最小值 找到Branches, Lang, and New, they represent pixels that exist in time and space. The characteristics of the bottle are described as a bottle to be used on it. J is 3000 (refer to "Figure 9"). Here, at all feature points, find a subset of the feature points 3_. In the case of the non-like embodiment, the consistency may be as dark as possible—the miscellaneous features (the feature points are not sudden (10) and the positions at the adjacent time are similar), and are temporarily consistent (the feature points are not present or suddenly) The ground disappears, or the consistency of time and space (the combination mentioned above). In the example of the ceremony, in the "12th picture", in order to find consistent features, the point: execution of the memorial. In the feature point tracking algorithm 3100, it tries to find out that several and two places are presented in a feature that has a sufficiently large and connected sorting picture, so that it can be transferred in a picture. · False feature points. The k-fake feature points interfere with the image of the county. For example, removing the feature points from the mirror will improve the accuracy and visual content in the face. 35 200951832's chase: Taste: 5:: 里' Use - a screen based tracking method. In this way: the person goes to find the corresponding feature of the two features in the face, and usually the addition of the teacher (four); In =r:, multiple screens are tracked in the same time. 
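The track-pruning decision rule above can be exercised with a minimal sketch. The threshold values and the exact definitions of the motion and descriptor variances are illustrative assumptions; the text only fixes the form (d > th_d) AND (mv < th_mv) AND (dv < th_dv):

```python
def variance(values):
    """Population variance of a list of scalars."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def prune_tracks(tracks, th_d=3, th_mv=4.0, th_dv=0.5):
    """Keep tracks satisfying (d > th_d) AND (mv < th_mv) AND (dv < th_dv).

    Each track is a list of (x, y, t, descriptor) tuples. Here mv is taken
    as the summed variance of x and y along the track, and dv as the mean
    per-dimension variance of the descriptors (illustrative definitions).
    """
    kept = []
    for track in tracks:
        d = track[-1][2] - track[0][2]            # duration in frames
        xs = [p[0] for p in track]
        ys = [p[1] for p in track]
        mv = variance(xs) + variance(ys)          # motion variance
        descs = [p[3] for p in track]
        dims = len(descs[0])
        dv = sum(variance([f[i] for f in descs]) for i in range(dims)) / dims
        if d > th_d and mv < th_mv and dv < th_dv:
            kept.append(track)
    return kept
```

A slowly drifting track survives, while a short track and a track with abrupt motion are both discarded.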
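The nucleotide construction — a K-bin histogram over one time interval with a Gaussian temporal weight w(t) centered on the interval midpoint — can be sketched as follows; the default choice of sigma is an assumption:

```python
import math

def visual_nucleotide(atoms, t_start, t_end, K, sigma=None):
    """Aggregate visual atoms (t, label) falling in [t_start, t_end] into a
    K-bin histogram, weighting each atom by a Gaussian centered on the
    interval midpoint, as in h_n = sum_i w(t_i) [l_i = n]."""
    mid = 0.5 * (t_start + t_end)
    if sigma is None:
        sigma = (t_end - t_start) / 4.0 or 1.0   # assumed default width
    h = [0.0] * K
    for t, label in atoms:
        if t_start <= t <= t_end:
            h[label] += math.exp(-((t - mid) ** 2) / (2.0 * sigma ** 2))
    return h
```

An atom at the interval center contributes weight 1, atoms near the boundary contribute less, and atoms outside the interval contribute nothing.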
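The sequence matching can be sketched as a small Smith-Waterman-style dynamic program over visual-nucleotide histograms. The substitution score (inner product minus a bias) and the gap value are illustrative assumptions, not the patent's trained scoring function:

```python
def smith_waterman(query, db, score_fn, gap=5.0):
    """Local alignment of two visual-nucleotide sequences (lists of
    histograms): dynamic programming with substitution score
    score_fn(q, s) and a linear gap penalty g(n) = gap * n.
    Returns the best local score and the (query, db) end positions."""
    n, m = len(query), len(db)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + score_fn(query[i - 1], db[j - 1]),
                H[i - 1][j] - gap,   # gap in the database sequence
                H[i][j - 1] - gap,   # gap in the query sequence
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos

def dot_minus_bias(q, s, bias=0.5):
    """Toy substitution score: inner product of the histograms, shifted so
    that dissimilar nucleotides score negatively (bias is an assumption)."""
    return sum(a * b for a, b in zip(q, s)) - bias
```

With one-hot toy nucleotides, a three-nucleotide query is localized inside a longer database sequence at the position where all three match.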
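Under the per-bin independence assumption, the Bayes score log(P(h'|h)/P(h')) becomes a sum of per-bin log-ratios. The following is a sketch with hypothetical tables standing in for the empirically trained probabilities:

```python
import math

def log_odds_score(h_query, h_db, p_cond, p_marg):
    """Score s(h, h') = log P(h'|h) - log P(h'), with both probabilities
    factored over histogram bins. p_cond maps
    (observed_value, original_value) -> probability, and p_marg maps
    observed_value -> probability; both would be trained empirically."""
    score = 0.0
    for hp, h in zip(h_query, h_db):
        score += math.log(p_cond[(hp, h)]) - math.log(p_marg[hp])
    return score
```

With binary-valued bins, an exact copy of a nucleotide scores positively and a fully mutated one scores negatively, as intended for an alignment substitution score.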
In one embodiment, the minimization is performed using the random sample consensus (RANSAC) algorithm. In another embodiment, the minimization is performed using an iteratively reweighted least squares fitting algorithm. It is often useful to restrict T to a rotation, a similarity, or a more general distortion model.

In one of the embodiments, the transformation T has the form
T = [ cos θ  sin θ  u ; -sin θ  cos θ  v ; 0  0  1 ].
In another embodiment, the transformation T has the form
T = [ cos θ  sin θ  u ; -α sin θ  α cos θ  v ; 0  0  1 ].
In another embodiment, the transformation T has the form
T = [ a  b  u ; c  d  v ; 0  0  1 ].
In another embodiment, the transformation T is a projective transformation.

Finding the spatio-temporal correspondence between two sequences is illustrated in Fig. 14. The procedure comprises the following stages.
1. Video DNA computation: The two sets of video data 900 and 901 are input to a video DNA computation stage 1410. Stage 1410 is shown in detail in steps 1000, 2000, 3000, and 4000 of Fig. 9. This stage can be performed on-line, or pre-computed and stored.
2. Temporal alignment: The resulting video DNAs 6010 and 6011 are input to a temporal alignment stage 1420, which computes the temporal correspondence 1425. The temporal correspondence is essentially a transformation from the temporal coordinate system of video data 900 to that of video data 901.
3. Spatial alignment: The temporal correspondence 1425 is used at stage 1430 to select temporally corresponding subsets of video 900 and video 901. The selected subsets 1435 and 1436 of video 900 and video 901, respectively, are input to the spatial alignment stage 1440, which computes the spatial correspondence 1445. The spatial correspondence is essentially a transformation from the spatial coordinate system of video data 900 to that of video data 901.

One embodiment is discussed below, in which the video DNA of an input video sequence is computed as shown in Fig. 9. The video DNA computation receives the video data 900 as input and comprises the following stages: feature detection 1000, feature description 2000, feature pruning 3000, feature representation 4000, segmentation into temporal intervals 5000, and visual atom aggregation 6000. The procedure outputs a video DNA 6010.

Feature detection 1000: A SURF feature detector (described in "Speeded-Up Robust Features," Proceedings of the 9th European Conference on Computer Vision, May 2006) operates independently on each frame of the video sequence 900, producing for each frame t the Nt = 150 strongest invariant feature point locations 1010 (see Fig. 9).

Feature description 2000: For each feature point detected in the feature detection stage, a 64-dimensional SURF feature descriptor is computed, as described in "Speeded-Up Robust Features," Proceedings of the 9th European Conference on Computer Vision, May 2006.

Feature pruning 3000: This is an optional step, and it is not performed in this embodiment.

Feature representation 4000: The feature points are represented in a visual vocabulary consisting of K = 1000 entries. The representative elements are found using the approximate nearest neighbor algorithm described in S. Arya and D. M. Mount, "Approximate Nearest Neighbor Searching," Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1993, pp. 271-280. Only feature points whose distance to the closest vocabulary element is below 90% of the distance to the second closest vocabulary element are retained. The result of this stage is a set of visual atoms 4010.

The visual vocabulary used in the feature representation stage is precomputed from a set of approximately 750,000 feature descriptors, obtained by applying the previously described stages to a collection of assorted video content serving as training data. The K-means algorithm is used to quantize the training set into 1000 clusters. To reduce the computational burden, the nearest-neighbor search inside the K-means algorithm is replaced by its approximate variant, described in the same reference (S. Arya and D. M. Mount, "Approximate Nearest Neighbor Searching," Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1993, pp. 271-280).

Segmentation into temporal intervals 5000: The video sequence is divided into a set of fixed-length time intervals 1020 of one second each (see Fig. 10).

Visual atom aggregation 6000: For each time interval computed in stage 5000, the visual atoms within it are aggregated into a visual nucleotide. The resulting sequence of visual nucleotides (the video DNA 6010) is the output of the process. A visual nucleotide is created as a histogram with K = 1000 bins, the n-th bin counting the number of times a visual atom of type n appears within the time interval.

After the video DNA of two different videos has been produced, the correspondence and matching between them can be examined in the following way.

Temporal matching 1420 (see Fig. 14) can be performed using the SWAT algorithm with gap penalty parameters α = 5 and β = 3, together with a weighted scoring function of the form
σ(h, h') = Σ_n w_n h_n h'_n.
The weights w_n can be computed empirically. For this purpose, a set of training video sequences is subjected to a battery of random spatial and temporal distortions, including blurring and changes of resolution, aspect ratio, and frame rate, and their video DNA is computed. For each bin, the variance of the visual nucleotides under the distortions is estimated, as well as the variance of the corresponding bin across nucleotides of different content. For each bin n, the weight w_n is set to the difference between the latter two.
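One plausible reading of the weight-training procedure ("the weight is set to the difference between the latter two") is: for each bin, the variance across different contents minus the average variance across distorted versions of the same content. The sketch below implements that reading; it is an interpretation of the garbled passage, not a verbatim specification:

```python
def variance(vals):
    """Population variance of a list of scalars."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def train_bin_weights(variant_groups):
    """variant_groups[c] holds the distorted versions (histograms) of
    content c. For each bin n:
    weight_n = variance across contents (of the per-group means)
               minus the mean variance within a group.
    Bins that are stable under distortion yet discriminative between
    contents receive large weights."""
    K = len(variant_groups[0][0])
    weights = []
    for n in range(K):
        means = [sum(h[n] for h in g) / len(g) for g in variant_groups]
        across = variance(means)
        within = sum(variance([h[n] for h in g]) for g in variant_groups) \
            / len(variant_groups)
        weights.append(across - within)
    return weights
```

In a toy example with two contents, a bin that separates the contents and is untouched by distortion gets a large positive weight, while a noisy, non-discriminative bin gets a negative one.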
Spatial matching 1440 (see Fig. 14): Spatial alignment can be performed on the feature points of two corresponding one-second intervals representing the two sets of video data 900 and 901, whose correspondence was established in the preceding temporal alignment stage. Each feature point in one interval is matched to the feature point in the other interval that minimizes the Euclidean distance between their descriptors. The output of this process is two sets of corresponding feature points {(x_k, y_k)} and {(x'_k, y'_k)}.
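The spatial matching stage can be sketched as closest-descriptor assignment followed by a least-squares fit. For brevity, the sketch fits a pure translation (u, v), the simplest member of the transform family T; the embodiment itself fits an affine T with RANSAC:

```python
def match_points(pts_a, pts_b):
    """Match feature points across two intervals by Euclidean distance
    between descriptors. Each point is ((x, y), descriptor). Returns
    pairs of matched coordinates (closest-descriptor assignment)."""
    def d2(f, g):
        return sum((a - b) ** 2 for a, b in zip(f, g))
    pairs = []
    for xy_a, f_a in pts_a:
        xy_best = min(pts_b, key=lambda p: d2(f_a, p[1]))[0]
        pairs.append((xy_a, xy_best))
    return pairs

def estimate_translation(pairs):
    """Least-squares fit of a pure-translation transform (u, v) to the
    matched pairs -- the mean displacement between the point sets."""
    u = sum(xb - xa for (xa, _), (xb, _) in pairs) / len(pairs)
    v = sum(yb - ya for (_, ya), (_, yb) in pairs) / len(pairs)
    return u, v
```

Shifting one point set by a fixed offset and matching back recovers exactly that offset.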
Once the correspondences are found, the RANSAC algorithm can be used to fit to the corresponding sets a transformation of the form
T = [ a  b  u ; -b  c  v ; 0  0  1 ].

In another aspect, the invention is a method for spatio-temporal matching of digital video data comprising a plurality of temporally matching video frames. In this embodiment, the method includes the steps of: performing temporal matching on the digital video data comprising the plurality of temporally matching video frames to obtain a similarity matrix, in which each video frame is represented using a representation comprising a matching score component and a gap penalty component and operated on by a local alignment algorithm (such as a genome-matching algorithm, or another suitable algorithm); and performing spatial matching on the digital video data comprising the plurality of temporally matching video frames using the obtained similarity matrix. The spatial matching step is essentially independent of the temporal matching step.

The above method may also employ the Needleman-Wunsch algorithm, SWAT, or other similar types of algorithms. The method may be implemented with genome-matching algorithms; for example, a Basic Local Alignment Search Tool of the kind used to compare biological sequences, such as protein or nucleotide DNA sequences, may be used.

The above method may further comprise performing local feature detection on the digital video data comprising the plurality of temporally matching video frames to detect points of interest, and segmenting the digital video data into a plurality of time intervals using the points of interest, with the spatial matching and temporal matching operating over the plurality of time intervals.

In another aspect, the method determines the spatio-temporal correspondence of video data and comprises the steps of: inputting the video data; representing the video data as ordered sequences of visual nucleotides; determining temporally corresponding subsets of the video data by aligning the sequences of visual nucleotides; computing the spatial correspondence between the temporally corresponding subsets of the video data; and outputting the spatio-temporal correspondence between the subsets of the video data.

Types of input data: In this aspect, the video data may be a collection of video sequences, a query together with a corpus of video data, a subset of a single video sequence, or a modified subset of a video sequence from the corpus. Further, the spatio-temporal correspondence is established between at least one subset of a video sequence in the query video data and at least one subset of a video sequence in the video corpus. In particular embodiments, the spatio-temporal correspondence can be established between a subset of a video sequence of the query video data and a subset of a video sequence of the corpus.

Regarding the above query video data, the video data may contain a modified subset of the video corpus, the modification consisting of one or more of the following:
• changing the frame rate;
• changing the spatial resolution;
• non-uniform spatial scaling;
• histogram modification;
• overlaying new video content;
• temporal insertion of new video content.

Nucleotide segmentation: In another variation, the systems and methods described here comprise video data segmented into time intervals, and a visual nucleotide can be computed for each interval.

Interval duration: In another variation, the systems and methods can segment the video data into time intervals of fixed duration or of variable duration. The start and end times of the intervals can be computed from shot transitions in the video data. The time intervals may or may not partially overlap.

Visual nucleotide computation: In another variation, the visual nucleotide (which, as described above, describes the visual data of a time interval of the visual content) can be computed by the following steps:
• representing the visual data of the time interval as a set of visual atoms;
• constructing the nucleotide as a function of at least one visual atom.
For this computation, the function may be a histogram of the frequencies of occurrence of the feature points (visual atoms) over the time interval, or a weighted histogram of the occurrences of the visual atoms over the time interval. In the latter case, the weight attributed to a visual atom may be a combination of the following:
• the temporal location of the visual atom within the time interval;
• the spatial location of the visual atom within the frame;
• the saliency of the visual atom.
In one implementation, the weight is constant over the interval (all feature points are treated equally). In an alternative weighting scheme, the weight may be a Gaussian function with its maximum at the center of the interval. The weight can also be set to a large value for visual content belonging to the same shot as the center of the interval and a small value for visual content belonging to a different shot; or set to a large value for visual atoms located near the center of the frame and a small value for visual atoms located near the frame boundary.

Visual atom methods: As described above, a visual atom describes the visual data in the neighborhood of a feature point. In one implementation, representing the visual data of a time interval as a set of visual atoms includes the following steps:
• detecting a set of invariant feature points in the time interval;
• computing a descriptor of the region around each invariant feature point;
• removing a subset of the invariant feature points and their descriptors;
• constructing a visual atom as a function of the locations and descriptors of the remaining invariant feature points.

Feature detection methods: In addition to the feature detection methods described above, the set of invariant feature points in a time interval may be computed using a corner detector, an affine-invariant corner detector, a spatio-temporal corner detector, or the MSER algorithm. If the MSER algorithm is used, it may be applied to individual frames of the video data or to spatio-temporal subsets of the video data. The descriptors of the invariant feature points may be SIFT descriptors, spatio-temporal SIFT descriptors, or SURF descriptors.

Tracking methods: In some embodiments, computing the descriptors comprises tracking corresponding invariant feature points over the time interval and, for each track:
• computing a single descriptor as a function of the descriptors of the feature points along the track;
• assigning that descriptor to all feature points belonging to the track.
The function may be the average or the median of the descriptors of the invariant feature points along the track.

Feature pruning methods: In some embodiments, removing a subset of the invariant feature points as above comprises:
• tracking the trajectories of corresponding invariant feature points over the time interval;
• assigning each track a quality measure;
• removing the invariant feature points on tracks whose quality measure is below a predetermined threshold.
In some embodiments, the quality measure assigned to a track is a function of one or more of the following:
• the descriptor values of the invariant feature points along the track;
• the locations of the invariant feature points along the track;
• the variability of the descriptor values and locations along the track.

Visual atom construction: In some embodiments, constructing a visual atom as above comprises:
• receiving the descriptor of an invariant feature point as input;
• finding, in an ordered collection of representative descriptors (the visual vocabulary), the representative descriptor most similar to the input descriptor;
• outputting the index of the found representative descriptor.
The similarity between descriptors may be computed as a distance or correlation between the descriptor vectors.

Visual vocabulary methods: The collection of representative descriptors (the visual vocabulary) may be computed from training data and fixed in advance, or it may be updated on-line from the input data. In some cases, a single visual vocabulary may be built that operates universally on all video, or at least on a large corpus of video data.

Visual atom pruning methods: In some embodiments, constructing the set of visual atoms is followed by pruning a subset of the visual atoms, which may comprise:
• assigning each visual atom in the set a quality measure;
• removing the visual atoms whose quality measure is below a predetermined threshold.
The threshold may be fixed, or adjusted to maintain a minimum number of visual atoms in the set, or adjusted to limit the maximum number of visual atoms in the set. Further, assigning the quality measure may comprise:
• receiving a visual atom as input;
• computing a vector of similarities between the visual atom and the collection of representative descriptors;
• outputting the quality measure as a function of the similarity vector.
This function may be proportional to the ratio between the largest and the second-largest values of the similarity vector, or between the largest and the F-th largest values of the similarity vector.

Sequence alignment methods: In some embodiments, computing the alignment of the visual nucleotide sequences described above may comprise:
• receiving two sequences of visual nucleotides {q_i} and {s_j} as input;
• receiving a scoring function σ(q_i, s_j) and a gap penalty function g(i, j, n) as parameters;
• finding the partial correspondence C = {(i_k, j_k)} and the gap set G maximizing the functional
S = Σ_k σ(q_{i_k}, s_{j_k}) + Σ_m g(i_m, j_m, n_m);
• outputting the found partial correspondence C and the maximum value of the functional.
As discussed previously, the maximization can be performed using the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, BLAST, or a hierarchical method.

Scoring methods: The scoring function may be a bilinear form σ(q_i, s_j) = q_iᵀ A s_j, where A may be an identity matrix, a diagonal matrix, or a general matrix. The score may be proportional to the probability that nucleotide q_i mutates into nucleotide s_j; the mutation probability may be estimated empirically from training data. The score may also be proportional to a ratio of such probabilities, likewise estimated empirically from training data.

Distance-based scoring: Alternatively, the scoring function may be inversely proportional to a distance d(q_i, s_j); possible choices of the distance function include at least the following:
• the L1 distance;
• the Mahalanobis distance;
• the Kullback-Leibler divergence;
• the Earth Mover's distance.

Weighting schemes: In addition to the weighting schemes described previously, the diagonal elements a_i of the matrix A may be proportional to
a_i ∝ log(1/p_i),
where p_i is the frequency with which visual atom i appears in visual nucleotides; p_i may be estimated from training video data or from the input video data. The diagonal elements of the matrix may also be proportional to a ratio of variances, where v_i is the variance of visual atom i across mutated versions of the same visual nucleotide and v'_i is the variance of visual atom i across all visual nucleotides; v_i may likewise be estimated from training video data.

Gap penalty methods: In some embodiments, the gap penalty may be a parametric function g(i, j, n; θ), where i and j are the starting positions of the gap in the two sequences, n is the gap length, and θ are parameters. The parameters may be estimated from training data, which may include examples of inserted and deleted content. Alternatively, the gap penalty may have the form g(n) = a + b·n, where n is the gap length and a and b are parameters. The gap penalty may further be a convex function of n, or inversely proportional to the probability of finding a gap of length n starting at positions i and j in the two sequences.

Spatial correspondence methods: Computing the spatial correspondence may include:
• inputting temporally corresponding subsets of the video data;
• providing feature points in the subsets of the video data;
• finding correspondence between the feature points;
• finding correspondence between the spatial coordinates.
The temporally corresponding subsets of the video data may be at least one pair of temporally corresponding frames. Further, finding the correspondence between feature points may include:
• inputting two sets of feature points;
• providing descriptors of the feature points;
• matching the descriptors.
The feature points may be the same feature points used in the visual nucleotide computation, and the descriptors may be the same descriptors used in the visual nucleotide computation. In addition, finding the correspondence between feature points may use a RANSAC-type algorithm, or may fit a parametric transformation T(·; θ) between the two sets of feature points by solving the optimization problem
min_θ Σ_k || (x'_k, y'_k) - T(x_k, y_k; θ) ||,
where {(x_k, y_k)} and {(x'_k, y'_k)} are the two sets of feature points and T transforms one set into the other according to the parameters θ.
The correspondence between spatial coordinates may be expressed as a map between the coordinates (x, y) in the spatial system of one subset of the video data and the coordinates (x', y') in the spatial system of another subset.

Output methods: The output spatio-temporal correspondence between the subsets of the video data may be represented as a map between the coordinates (x, y, t) in the spatio-temporal system of one subset and the coordinates (x', y', t') in the spatio-temporal system of the other subset.

An example of the video DNA generation process is shown in Fig. 15. Here, a local feature detector is applied in a frame-wise manner to the image frames 1500 of the various video sequences. This feature detector finds points of interest 1502, also referred to as "feature points," on the video sequence. As previously discussed, many different types of feature detectors may be used, including the Harris corner detector (C. Harris and M.
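The output described in this section — a map between the spatio-temporal coordinates (x, y, t) of one subset and (x', y', t') of the other — can be sketched by composing the temporal and spatial correspondences, both assumed to have been estimated already:

```python
def spatiotemporal_map(t_map, T):
    """Compose a temporal correspondence t' = t_map(t) with a spatial
    homogeneous 3x3 transform T into a single map
    (x, y, t) -> (x', y', t')."""
    def f(x, y, t):
        w = T[2][0] * x + T[2][1] * y + T[2][2]
        xp = (T[0][0] * x + T[0][1] * y + T[0][2]) / w
        yp = (T[1][0] * x + T[1][1] * y + T[1][2]) / w
        return xp, yp, t_map(t)
    return f
```

For example, composing a translation of (5, -2) pixels with a temporal offset of 100 frames maps the point (1, 2) at frame 3 to (6, 0) at frame 103.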
In another aspect, a method determines the temporal and spatial correspondence of video data and includes the steps of: inputting the video data; representing the video data as an ordered sequence of visual nucleotides; determining, using the visual nucleotides, temporally corresponding subsets of the video data; computing the spatio-temporal correspondence between the subsets of the video data; and outputting the spatio-temporal correspondence between the subsets.

Types of input data: In this regard, the video data may be a collection of video sequences, the query may itself be video data, and a subset may be a subset of a single video sequence in the collection, or a subset of a modified video sequence in the collection. Furthermore, the spatio-temporal correspondence is established between at least a subset of the video sequences in the query video data and at least a subset of the video sequences in the video data set. In a particular embodiment, the spatio-temporal correspondence can be established between a subset of the video sequences of the query video data and a subset of a modified video sequence in the video data set, where the modification consists of one or more of the following:

• changing the frame rate;
• changing the spatial resolution;
• applying a non-uniform spatial transformation;
• modifying the histogram;
• overlaying new video content;
• temporally inserting new video content.

Nucleotide segmentation: In another variation, the method includes segmenting the video data into temporal intervals, and a visual nucleotide can be computed for each interval.

Interval period: In a further variation, the system and method are capable of segmenting the video data into temporal intervals of a fixed duration or of a variable duration.
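The interval segmentation just described can be sketched as follows; the frame count, interval length, and step size are illustrative assumptions, and a step smaller than the length yields the partially overlapping intervals mentioned in the text.

```python
def segment_intervals(n_frames, length, step=None):
    """Split frame indices 0..n_frames-1 into temporal intervals of a
    fixed length; step < length yields partially overlapping intervals."""
    step = step or length          # default: non-overlapping
    out = []
    start = 0
    while start < n_frames:
        out.append(range(start, min(start + length, n_frames)))
        start += step
    return out

# 100 frames, 10-frame nucleotide intervals
fixed = segment_intervals(100, 10)
overlapping = segment_intervals(100, 10, step=5)
```

Variable-duration intervals (for example, boundaries placed at shot transitions) would replace the constant `step` with per-interval boundaries.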
The start and end times of the intervals can be computed based on shot transitions detected in the video data. The time intervals may be partially overlapping or non-overlapping.

Visual nucleotide computation: In another variation, the visual nucleotide (which, as mentioned above, describes the visual content of a time interval) can be computed by the following steps:

• representing the visual data of the time interval as a collection of visual atoms;
• constructing the nucleotide as a function of at least one of the visual atoms.

In this computation, the function may be a histogram of the frequencies with which the visual atoms (feature points) appear in the time interval, or a weighted histogram of those frequencies. The weight assigned to a visual atom may depend on one or more of the following:

• the temporal location of the visual atom within the interval;
• the spatial location of the visual atom within the frame;
• the significance of the visual atom.

In one implementation the weight is constant, so that all feature points are treated equally. In an alternative weighting scheme, the weight is a function attaining its maximum at the center of the interval. The weighting can also be set to a large value when the visual content belongs to the same shot as the center of the interval and a small value when it belongs to a different shot, or to a large value for visual atoms located near the center of the frame and a small value for visual atoms located near the frame boundary.

Visual atoms: As mentioned above, a visual atom describes the visual content within a time interval.
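The weighted bag-of-features histogram described above can be sketched as follows. The vocabulary size, shot labels, and the reduced 0.25 weight for atoms outside the interval's central shot are illustrative assumptions.

```python
def visual_nucleotide(atoms, vocab_size, center_shot=None):
    """Build a (weighted) histogram over a vocabulary of visual atoms.
    `atoms` is a list of (atom_index, shot_id) pairs observed in one
    time interval; atoms from the interval's central shot get full
    weight, atoms from other shots a reduced weight."""
    hist = [0.0] * vocab_size
    for idx, shot in atoms:
        w = 1.0 if (center_shot is None or shot == center_shot) else 0.25
        hist[idx] += w
    total = sum(hist)
    return [h / total for h in hist] if total else hist

atoms = [(0, 'a'), (0, 'a'), (2, 'a'), (2, 'b')]   # two shots in one interval
nucleotide = visual_nucleotide(atoms, vocab_size=4, center_shot='a')
```

Replacing the shot-based rule with a function of the atom's temporal or spatial location gives the other weighting schemes listed in the text.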
In one embodiment, representing the visual data of a time interval as a set of visual atoms comprises the following steps:

• detecting a set of invariant feature points in the time interval;
• computing, for each invariant feature point, a descriptor of the local region surrounding the point;
• eliminating a subset of the invariant feature points and their descriptors;
• creating a visual atom as a function of each remaining invariant feature point and its descriptor.

Method of feature detection: In addition to the feature detection methods described above, the computation of the set of invariant feature points in the time interval of the visual data may use a corner detector, such as the Harris corner detector or one of its variants, or may use the MSER algorithm. If the MSER algorithm is used, it can be applied to individual frames of the video data, or to a spatio-temporal subset of the video data. The feature point descriptor may be a SIFT descriptor, a spatio-temporal SIFT descriptor, or a SURF descriptor.

Tracking methods: In some embodiments, the computation of the descriptors described above includes tracking the invariant feature points over the time interval to form tracks, and then:

• computing a single descriptor as a function of the descriptors of the feature points along the track;
• assigning that descriptor to all feature points belonging to the track.

This function may be the average of the descriptors of the invariant feature points, or the median of those descriptors.
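The track-aggregation step — collapsing a track's descriptors into one descriptor by a component-wise mean or median — can be sketched as follows; the descriptor values are synthetic.

```python
def track_descriptor(track, how="median"):
    """Collapse the descriptors observed along one feature-point track
    into a single descriptor, component by component."""
    dim = len(track[0])
    cols = [[d[i] for d in track] for i in range(dim)]
    if how == "mean":
        return [sum(c) / len(c) for c in cols]
    return [sorted(c)[len(c) // 2] for c in cols]   # component-wise median

track = [[1.0, 10.0], [1.2, 10.4], [0.9, 30.0]]     # last observation is noisy
desc = track_descriptor(track)
```

The median variant is the more robust choice here: the outlying value 30.0 in the last frame is ignored, while the mean would be pulled toward it.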
Method of pruning feature points: In some embodiments, eliminating a subset of the invariant feature points as described above includes:

• computing the tracks of the invariant feature points over the time interval;
• assigning a quality measure to each track;
• eliminating the invariant feature points on tracks whose quality measure is below a predetermined threshold.

In some embodiments, the quality measure assigned to a track is a function of the descriptor values of the invariant feature points on the track and of the locations of those feature points, and may depend on the variation of the descriptors along the track.

Method of constructing visual atoms: In some embodiments, creating a visual atom as described above includes:

• receiving the descriptor of an invariant feature point as input;
• finding the representative descriptor most similar to the received invariant feature point descriptor;
• outputting the index of the representative descriptor found.

Visual vocabulary methods: The set of representative descriptors (the visual vocabulary) may be learned from training data, or may be updated online from the input data. In some cases it is used to build a universal visual vocabulary that applies to all videos, or at least to a large collection of them, which facilitates operating over large arrays of sources in a large video database.

Method of pruning visual atoms: In some embodiments, building the set of visual atoms described above may be followed by pruning to a subset of the visual atoms, where the pruning may include:

• assigning a quality measure to each visual atom in the set;
• eliminating the visual atoms whose quality measure is below a predetermined threshold.

The threshold can be fixed, or adjusted so as to maintain a minimum number of visual atoms in the set.
Or it can be adjusted so as to limit the maximum number of visual atoms in the set. In addition, assigning the quality measure may include:

• receiving a visual atom as input;
• computing a vector of similarities between the visual atom and the representative visual atoms of the set;
• outputting the quality measure as a function of the similarity vector. This function may be proportional to the ratio between the largest value of the similarity vector and its second-largest value, or to the ratio between the largest value and the k-th largest value.

Method of sequence alignment: In some embodiments, the alignment of visual nucleotide sequences described above may include:

• receiving two visual nucleotide sequences q and s as input;
• receiving a scoring function σ(q_i, s_j) and a gap penalty function γ as parameters;
• finding the partial correspondence C = {(i_k, j_k)} and the collection of gaps G maximizing the functional

F(C, G) = Σ_k σ(q_{i_k}, s_{j_k}) − Σ_{g∈G} γ(g)

• outputting the partial correspondence found and the maximum value of the functional.

Other computation methods: As previously discussed, the maximization can be carried out using the Smith–Waterman algorithm, the Needleman–Wunsch algorithm, BLAST, or a similar algorithm, or may be performed in a hierarchical manner.

Scoring methods: The scoring function described above may be composed of one or more functions, for example a bilinear form σ(q_i, s_j) = q_i^T A s_j, where A may be an identity matrix or a diagonal matrix. The score may be proportional to the probability that a nucleotide q_i mutates into a nucleotide s_j, where the mutation probability is estimated empirically from training data.
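The maximization step above — scoring matched nucleotide pairs while penalizing gaps — can be sketched with a plain Smith–Waterman recurrence. The integer "atoms," the ±2/−1 scoring function, and the linear gap cost are illustrative assumptions rather than the patent's actual parameters.

```python
def smith_waterman(q, s, sigma, gap=1.0):
    """Local alignment of sequences q and s: dynamic program over
    H[i][j] = best score of an alignment ending at q[i-1], s[j-1]."""
    n, m = len(q), len(s)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + sigma(q[i - 1], s[j - 1]),
                          H[i - 1][j] - gap,     # gap in s
                          H[i][j - 1] - gap)     # gap in q
            best = max(best, H[i][j])
    return best

# toy score: +2 for identical nucleotides, -1 otherwise
sigma = lambda a, b: 2.0 if a == b else -1.0
score = smith_waterman([7, 1, 2, 3, 9], [1, 2, 8, 3], sigma)
```

The best local alignment here matches the shared subsequence 1, 2, 3 while paying one gap penalty to skip the inserted atom 8 — exactly the insertion/deletion behavior the gap set G is meant to absorb.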
Distance-based scoring: In addition, the score may be inversely proportional to a distance function d(q_i, s_j), and the possible choices of the distance function include at least the following:

• the L1 distance;
• the Mahalanobis distance;
• the Kullback–Leibler divergence;
• the Earth Mover's distance.

Weighting schemes: In addition to the previously described weighting schemes, the elements on the diagonal of the matrix A can be proportional to log(1/f_i), where f_i denotes how many times the visual atom i appears in the visual nucleotides — an inverse-document-frequency weighting. The value of f_i can be estimated from training video data or from the input video data. The elements on the diagonal of the matrix can also be proportional to the ratio of v_i to V_i, where v_i is the frequency with which visual atom i appears in mutated versions of the same visual nucleotide and V_i is the frequency with which visual atom i appears in any visual nucleotide; these values may likewise be estimated from training video data.

Gap penalty methods: In some embodiments, the gap penalty can be a parametric function γ(t_q, t_s, n; θ), where t_q and t_s are the starting positions of the gap in the two sequences, n is the gap length, and θ are parameters. The parameters can be estimated from training data, and the training data may include video content with insertions and deletions. The gap penalty may also take the form γ(n) = a·n + b, where n represents the gap length and a and b are parameters. Further, the gap penalty may be a convex function of the gap length, or may be inversely related to the probability of finding a gap of length n starting at positions t_q and t_s of the two sequences.

Spatial correspondence methods: Computing the spatial correspondence includes:

• inputting temporally corresponding subsets of the video data;
• finding feature points in the subsets of the video data;
• finding the correspondence between the feature points;
• finding the correspondence between the spatial coordinates.

The temporally corresponding subsets of the video data may be at least one pair of video frames.
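Two of the distance choices above (L1 and Kullback–Leibler), together with a score inversely related to the weighted distance, can be sketched as follows. The Mahalanobis and Earth Mover's distances are omitted for brevity, and the example histograms, weights, and smoothing constant are synthetic.

```python
import math

def l1(p, q, w=None):
    """(Optionally weighted) L1 distance between two histograms."""
    w = w or [1.0] * len(p)
    return sum(wi * abs(a - b) for wi, a, b in zip(w, p, q))

def kl(p, q, eps=1e-9):
    """Kullback-Leibler divergence between two normalized histograms,
    with a small epsilon to avoid division by zero on empty bins."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

def score(p, q, w=None):
    """A match score inversely related to the weighted L1 distance."""
    return 1.0 / (1.0 + l1(p, q, w))

p, q = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
```

Passing inverse-document-frequency values as `w` reproduces the diagonal-matrix weighting described in the text.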
In addition, finding the correspondence between the feature points may include:

• inputting two sets of feature points;
• providing descriptors for the feature points;
• matching the descriptors.

The feature points may be the same feature points computed for the visual nucleotides, and the descriptors may be the same descriptors used in the visual nucleotide computation.

Further, finding the correspondence between the feature points can use the RANSAC algorithm, or a parametric transformation T_θ describing the mapping between the two sets of feature points can be found by solving the minimization problem

min_θ Σ_i ||x_i − T_θ(x'_i)||,

where {x_i} and {x'_i} are the two sets of feature points and T_θ is a transformation depending on the parameters θ.

The correspondence between the spatial coordinates can be expressed as a mapping between coordinates (x, y) in the spatial system of one subset of the video data and coordinates (x', y') in the spatial system of another subset of the video data. Output methods: the output spatio-temporal correspondence between the subsets of the video data can be represented as a mapping between coordinates (x, y, t) in the spatio-temporal system of one subset and coordinates (x', y', t') in the spatio-temporal system of the other subset.

An example of the video DNA generation process is shown in Figure 15. Here, a local feature detector is applied frame-wise to the image frames 1500 of various video sequences. The feature detector finds points of interest 1502, also referred to as "feature points," in the video sequence. As previously discussed, many different types of feature detectors may be used, including the Harris corner detector (C. Harris and M.
Stephens, "A combined corner and edge detector," Alvey Vision Conference, 1988), the Kanade–Lucas algorithm (B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," 1981), and the SIFT scale-space feature detector (D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 2004), among others. In general, such feature detection algorithms are designed so that the feature points they describe are robust or invariant to spatial distortions of the video sequence (for example, changes of resolution, compression noise, and the like). To reduce transient noise and to focus on the most useful feature points, feature points are typically required to leave a track 1504 across multiple frames, and feature points 1506 that appear for too short a time are deleted or pruned.

The next stage of the video DNA generation process is shown in Figure 16, which shows a detail of a video image frame in which the feature points 1502 have been detected. The feature points 1600 that survive pruning are used to compute a local feature descriptor. The feature descriptor is a vector of a second type, one representing the local neighborhood 1602 of feature point 1600 in the video frame. As previously discussed, many different algorithms can be used to describe the visual content of the video image in the neighborhood of a feature point.
These algorithms include local histograms of edge directions, the Scale-Invariant Feature Transform (SIFT), and the Speeded-Up Robust Features (SURF) algorithm (H. Bay, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features," 2006), among others.

Mathematically, a feature descriptor can be represented as a vector of the second type: the feature point descriptor describes the local region of the video image surrounding the feature point. If two feature points have similar local neighborhoods, the values of their second-type vectors will be close, while feature points whose neighborhoods differ — for example, in the brightness or texture of the corresponding image region — will have differing vectors. Thus the video DNA "nucleotide," the identifier describing a short time sequence of frames, contains two types of information: the first type of vector tells how many descriptors of each different type occur in segment t, and the second type of vector describes the individual invariant feature point descriptors.

Standardizing the descriptors makes many different videos easy to compare. Rather than using descriptors unique to each video segment, a standardized descriptor vocabulary suitable for many different videos is usually created, so that the best-fitting "mapping" or "container" for a descriptor coming from any particular video can be found in this standardized database. In Figure 17, as previously discussed, for each pruned feature point 1600, the actual feature descriptor 1700 computed from the point's surroundings is assigned to a "container" in a database or "visual vocabulary" — a predefined set of descriptor types. This visual vocabulary can be viewed as a standardized database in which "ideal" feature descriptors stand in for the computed "real" feature descriptors: each "real" feature descriptor is represented by the "ideal" feature descriptor closest to it.
Therefore, each "real" feature descriptor is effectively put into (or replaced by) its nearest "ideal" neighbor in the visual vocabulary, and only the index of the closest "ideal" descriptor — that is, the address of its container in the standardized database — is stored, rather than the real descriptor itself.

From a naming standpoint, feature points represented in this way are sometimes referred to in this specification as "visual atoms." As a rough analogy, the visual vocabulary can be viewed as a "periodic table" of visual atoms or elements.

Figure 18 provides additional detail showing how the original video is segmented in time (temporal segmentation). At this stage, the video sequence is divided into different temporal intervals or short segments. The intervals can be of fixed size (for example, every ten frames represents one interval) or of variable size, and they can be overlapping or non-overlapping. The interval boundaries often track the feature-point behavior of the video, typically corresponding to cuts or changes between scenes; the segmentation can be done, for example, based on the feature point tracking from the earlier stage. It should be noted that the segmentation may be accomplished by a predetermined algorithm.

Next, the visual feature descriptors (visual atoms) assigned to the visual vocabulary are combined (aggregated) over each temporal interval into sets 1806. Here the time and space coordinates 1808 of the feature points themselves are not used; rather, over a series of video frames (a time interval), the counts of the different types of feature descriptors are accumulated. This process essentially amounts to building, for each series of video frames, a histogram, vector, or "bag of features (descriptors)" 1810. The frequencies of occurrence of the various vocabulary feature descriptors (visual atoms) can be represented as a histogram or vector, and this histogram or vector is sometimes referred to as a visual nucleotide.
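The assignment of a "real" descriptor to its nearest "ideal" vocabulary entry can be sketched as a nearest-neighbor search; the three-entry vocabulary and the two-dimensional descriptors are toy assumptions (real descriptors such as SIFT's are 128-dimensional).

```python
def quantize(descriptor, vocabulary):
    """Return the index of the 'ideal' vocabulary descriptor closest
    (in squared Euclidean distance) to the observed 'real' descriptor."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: d2(descriptor, vocabulary[i]))

vocab = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
atom = quantize([0.9, 1.2], vocab)        # index of the nearest ideal descriptor
```

Only the returned index — the visual atom — is kept; the real descriptor itself can be discarded, which is what makes the representation compact and comparable across videos.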
The "bag of features" approach has many advantages for extracting or indexing video. One advantage is that the method is robust and can detect the relationship between two videos even when one or both of them have been altered by overlaying pixels on the original frames, spatially edited (for example, cropped), converted to a different resolution or frame rate, and so on. For example, if a video sequence is modified (say, by overlaying an image on the original frames), the new video sequence will contain a mixture of feature points: one set belonging to the original video and another set belonging to the overlay. If the overlay is not too large (that is, most of the information in the frame still belongs to the original video), the two videos can still be correctly matched by a relaxed matching criterion that declares two nucleotides (histograms or vectors of feature points) matching when there is less than 100% correspondence between the two.

Figure 19 shows an example of a specific medium for generated video DNA. Here, the video DNA consists of an ordered array of the different temporal segments (short snippets) taken from the video — a "sequence" of the different "histograms," "vectors of feature descriptors," or "nucleotides." The video may be an original reference video held in a metadata database on a server, or a client-side video copied from the original reference video; either can be fingerprinted and indexed by the video DNA process. In general, the video DNA built from the reference video will be sufficiently similar to the video DNA built from the client video that the video DNA of one can serve as an index into, or match for, the DNA of the other.

The reference video DNA thus creates an index, and the established index allows other devices, such as a client playing a client-side copy of the reference video, to locate the portion being played within the reference video or within a server-side video DNA database. For example, a client playing the client video 1914 can compute the video DNA 1916 of the client video by the same video DNA process.
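The relaxed, less-than-100% matching of bags of features under an overlay can be sketched with a histogram-intersection score; the atom counts and the 0.75 acceptance threshold are illustrative assumptions.

```python
def match_fraction(h1, h2):
    """Fraction of shared mass between two bag-of-features histograms
    (histogram intersection relative to the first histogram's mass)."""
    inter = sum(min(a, b) for a, b in zip(h1, h2))
    return inter / max(sum(h1), 1e-9)

original = [30, 20, 10, 0, 0]          # atom counts for one interval
overlaid = [24, 16, 8, 6, 6]           # ~20% of points replaced by an overlay
ok = match_fraction(original, overlaid) >= 0.75   # relaxed, <100% criterion
```

Even though the overlay shifts a fifth of the feature points into new bins, 80% of the mass is still shared, so the relaxed criterion correctly accepts the pair.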
The client then transmits the client video's DNA identifier to a server or other holder of the reference video DNA, and the nature and position of the playing sequence can be determined by looking up the client video DNA in the index of the reference video DNA database. The index can in turn be used to retrieve, from the server's database, the metadata corresponding to the portion of the video being played.

As previously discussed, a relatively large array (that is, hundreds or thousands) of different descriptor types can be used to analyze a video image, and not every image feature point fits neatly into one of the different feature descriptor bins: some image feature descriptors do not suit a particular descriptor type, or fit it only ambiguously. To improve the accuracy of the overall video DNA process, it is often useful to apply a nearest-neighbor algorithm to obtain the closest possible fit; in the nearest-neighbor approach, each actually observed feature point (feature descriptor) is assigned to the descriptor bin of the vocabulary that is closest to the observation.

Temporal matching of the client and reference video DNA can use a variety of different algorithms. These range from very simple "match/no match" algorithms, through bioinformatics-like "dot matrix" algorithms, to very complex algorithms similar to those used in bioinformatics to match biological DNA sequences. Examples of these more complex bioinformatics algorithms include the Needleman–Wunsch algorithm proposed by S. B. Needleman and C. D. Wunsch in 1970 and the algorithm proposed by T. F. Smith and M. S.
Waterman in 1981 — the Smith–Waterman algorithm, described in "Identification of Common Molecular Subsequences" (1981) — as well as the Basic Local Alignment Search Tool (BLAST) proposed by S. F. Altschul et al. in 1990.

Typically, a suitable sequence matching algorithm defines a matching score (or distance) representing the quality of the match between two video sequences. The matching score comprises two main components: the similarity (or distance) between the nucleotides, and a gap-penalty rule by which the algorithm decides how to introduce gaps so as to avoid "tearing" the sequences.

To do this, the distance between a nucleotide in the first video and the corresponding nucleotide in the second video is determined by a mathematical process: how similar is the "bag of features" from one series of frames of the first video to the "bag of features" from a corresponding series of frames of the second video? This similarity can be expressed as a measure of how similar or dissimilar the two nucleotides are; for example, it can be a distance or correlation between the vectors (bags of features) representing each nucleotide.
If partial similarity is to be allowed (which often happens, particularly when the visual nucleotides come from modified content and contain spurious feature points), a more complex weighting, or the exclusion of outlier data, can be used. More complex distances may also take into account the mutation probability of the nucleotides: two different nucleotides are regarded as much more similar if one is a likely mutation of the other. For example, consider a first video sequence, and a second video sequence that carries the same video images but with an overlay. The nucleotides (vectors or bags of features) of the second video will be similar to those of the first in the many feature points that come from the shared underlying frames, and will differ only in the feature points that belong to the overlay.

Insertions or deletions of content in one of the two videos cause feature points unique to that video to appear, and the alignment compensates by introducing gaps between the matched portions, which incur a gap penalty. If a linear gap penalty is used, the penalty is determined by multiplying the gap length by some preset constant.
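The linear penalty just mentioned, and the affine form γ(n) = a·n + b discussed earlier, can be sketched as simple functions of the gap length; the constants a and b are illustrative assumptions.

```python
def linear_gap(n, a=0.5):
    """Penalty proportional to the gap length n."""
    return a * n

def affine_gap(n, a=0.5, b=2.0):
    """gamma(n) = a*n + b: a per-position cost plus a gap-opening cost."""
    return a * n + b if n > 0 else 0.0

# under the affine form, one long gap is cheaper than several short ones
one_long = affine_gap(6)
three_short = 3 * affine_gap(2)
```

The gap-opening constant b is what encodes the preference for a single contiguous insertion (one overlay or ad break) over many scattered ones.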
為複雜的空隙罰分,可能會者膚罚_數I相乘。更 .y 會考慮間隙出現的機率,例如.钿祕A 口位置的統計分姊内容中的__。 ·根據i 雖然「視頻職」—獅描述麵記號法概述的好方法, 200951832 明顯的,匹配各種視頻核普酸應該比匹配生物核苦酸複雜得夕 生物的核苷酸通常是一個簡單的「A」、「T」、「G ,戎「(- 1、 J -人 ,而視 頻的DNA核苷酸是一個更為複雜的「特徵包」(特徵描述符的 ' 袋)。因此’往往一個特定的視頻核苷酸將永遠不會找到一個相卷 • 完美的匹配。相反的,標準的「匹配」通常是將是一個非常接近, 但是還不完美的匹配。通常’匹配將被距離函數決定,如距離, L1距離、Mahalanobis距離、Kullback-Leibler發散距離、地球移 ❺動距離或者其他函數。這是一個例子,不論何時的視頻核苷酸匹 配「距離」<=門檻值。 較小的匹配規則被認為是一種更嚴格的匹配(即少數的視頻 DNA核苷酸或識別標記將彼此匹配),而較大的匹配規則被認為 是一種不那麼嚴格的匹配(即更多視頻的DNA核苷酸或識別標記 將彼此匹配)。 關於「第20圖」至「第24圖」,一系列的圖示說明一個根據 系統和所述的方法之過程配置。「第2〇圖」顯示視頻記號特徵债 ©測過程的一個例子。在這個例子中,輸入視頻(A)是由含有視頻 影像2004的視頻畫面序列和依據被用來輸入多比率特徵偵測器 2006之時間期間上之X以及γ所定義的區域所組成。視頻信號$上, .s2, s3是受到不同空間深度(B)之過濾器的旋轉後,所產生之一 系列具有不同解析度之特徵比率的影像。這些不同比率的空間影 像在分析後(例如,角落偵測)會落在不同比率1,2,3 (C)中。 圖片就可以經由—系列的多比率頂點(D)確認在畫面中的 特徵點fl、β。 厂第21圖」表示視頻記號特徵點追縱和修整過程的例子。這 200951832 ,一個選擇性的階段,但如果被使用,特徵點可跨過多個畫面被 畫面是足夠(例如,滿足預設標準)—被保 =棄岭暫的特徵_為沒有足·的咖以滿足預設標準而 「第22圖」表示視頻記號特徵描述符的例子。「第 ◎ =_先侧_特徵點之後如何描述…般情況下^ -人輸人視頻22GG的程序,和這個時間分析在視頻中每個先前 了^^特徵點⑹四周的鄰近區域(x,y,r)。此特徵 =2=7⑽補㈣,—轉徵點周_ 近之办像的SIFT梯度疋被計算的(H),依據這個梯度產生 ΪΓ=7Γ度方向的細(1)°此細之後解析成元 素的一個向量(J),稱為特徵描述符。 「第23圖」顯示向量量化的過程的一個例子,將映射的一影 像輸入一量化特徵描述符的序列。在這個例子中,視頻影像,先 ❹ =之為具有-個任意的特徵描述符詞彙的特徵描述符向量 ⑻,其映射到—個標準化的d維雜描述符料⑴。 —1〇標準化計劃(SCh—⑽‘ 準汁旦此夠不_來源的確認唯一視頻。 「第24圖」所示的是視頻DNA的創 分析槪,健財畫_畫_麵上細 =」特徵包。廷方面的一個例子如「第8圖」所示。 ==?視頻!料和對於特別晝面之特徵包是匯總“: Ν)ι些從視頻晝面鄰近之特徵包(例如,畫 62 200951832 面卜晝面2、畫面3 )被平均(P),以產生—多晝面視頻時間區 間的代表,常常提及為「視頻核苷酸」。 「第25圖」顯示處理視頻資料的系統25〇〇。視頻資料源 儲存及/或生成視頻資料。視頻分割器25〇4從視頻資料源25〇2接 '收視頻資料並分割為視頻資料為時間區間。視頻處理器25〇6從視 頻資料源2502接收視頻資料,並對視頻資料進行各種操作。在這 個例子中,視頻處理器25〇6在視頻資料中偵測特徵位置,生成與 ❹特徵位置有關的特徵插述符,以及修整偵測特徵位置以生成特徵 位置的子集。視頻聚集器251G與視頻分割器2504和視頻處理器 2506相連接。視頻聚集器251〇生成一個有關視頻資料的視頻 DNA。由於這裡討論的,視頻DNA可以包括視頻資料該視頻資 料為已排序之視覺核苷酸序列。 儲存裝置25G8接上影像分割器25G4,視頻處理器屬和視 頻聚集器2MG可以儲存多種不同由這些元件使用的資料^儲存的 資料包括:錄影資料、畫面龍、觀龍、特徵描述符、視覺 ❿原子、視頻DNA、演算法、設定、門襤值等。「第乃圖」所描述 的元件可以直接或經由另一個中間裝置、系統、元件、網路、通 • 訊連結等相連接。 . 
纽描述的系統和方法使與特定視頻内容關聯之多視頻内容 識別符的確認以及關聯變的容易。除此之外,某些實施例能和一 種以上的傳統視頻處理及/或視頻顯示系統和方法—起使用。例 如,一個實施例能被用來改進現有的視頻處理系統。 雖然這裡娜容的元相及模岐轉殊_式組合而成, 但這些元件以及模組的組合型式可以經由調整藉以在不同的方 63 200951832 法中實現多視頻内容識別符的確認以及關聯。在其他的實施例 中’-個或多個附加的元件或模組可以添加到先前描述的系統 中。可選擇的實施例也可將兩種或兩種以上之上述的元件或模組 組合為一個元件或模組。 ' ’ 、 雖…:本發明所揭露之實施方式如上,惟所述之内容並非用以 直接限疋本發明之專利保護範圍。任何本發明所屬技術領域中具 有通常知識者’在不脫離本發明所揭露之精神和範圍的前提下/,、 對本發明之實施的形式上及細節上作些許之更動潤飾,均屬於本 發明之專娜護細。本發明之專娜絲H,侧輯附之中❹ 請專利範圍所界定者為準。 【圖式簡單說明】 第1圖為本發明所提之環境示意圖。 第2圖為本發明所提之通用視頻查找系統可執行之程序示意 圖。 第3圖為本發明所提之摘取視頻内容識別符以及特定視頻内 容相關之元資料之流程圖。 第4圖為本發明所提之判斷多視頻序列間之對應關係之流程 圖。 第5圖為本發明所提之確認、擷取、校準字幕資訊之流程圖。 > 第6A圖為本發明所提之視頻資料之空間校準以及時間校準 之示意圖。 第6B圖為本發明所提之以視頻基因演算法表示上下文之示 意圖。 第7圖為本發明所提之視頻DNA之示意圖。 64 200951832 第8圖為本發明所提之生物的DNA與視頻DNA之比對示意 圖。 第9圖為本發明所提之建立視頻DNA之流程圖。 第10圖為本發明所提之將視頻序列分割為時間區間之示意 ' 圖。 第11圖為本發明所提之偵測以晝面為基礎之特徵點之流程 圖。 Φ 第12圖為本發明所提之搜尋不變的特徵點之流程圖。 第13圖為本發明所提之修整特徵點軌跡之流程圖。 第14圖為本發明所提之搜尋兩視頻DNA序列間之時空對應 關係之流程圖。 第15圖為本發明所提之視頻DNA產生程序之示意圖。 第16圖為本發明所提之在試頻DNA產生期間之視頻特徵點 如何被處理之示意圖。 第17圖為本發明所提之視頻特徵描述符號如何被放入特徵描 ❹ 述符號之儲存處之示意圖。 第18圖為本發明所提之視頻在視頻DNA被建立過程中如何 ' 被分段為大量短暫間格的畫面之示意圖。 • 第19圖為本發明所提之依據相關之視頻DNA索引以及描述For complex gap penalties, the number of skins may be multiplied by the number I. More .y will take into account the probability of a gap, for example, __ in the statistical distribution of the location of the port A. · According to i, although the "video job"-lion description method is a good method outlined in 200951832, it is obvious that matching various video nucleotides should be a simpler than matching the nucleotides of the biological nucleic acid complex. A", "T", "G, 戎" (- 1, J - human, and the DNA nucleotide of the video is a more complex "feature package" (the 'description of the feature descriptor'). So 'often one A particular video nucleotide will never find a phase roll • A perfect match. Instead, the standard "match" will usually be a very close, but not perfect match. Usually the 'match will be determined by the distance function. 
Such as distance, L1 distance, Mahalanobis distance, Kullback-Leibler divergence distance, Earth moving turbulence distance or other functions. This is an example, whenever the video nucleotide matches the "distance" <= threshold value. Matching rules are considered to be a more rigorous match (ie a small number of video DNA nucleotides or identification tags will match each other), while larger matching rules are considered a less stringent match (That is, the DNA nucleotides or identification marks of more videos will match each other.) With regard to "20th to 24th", a series of illustrations illustrate a process configuration according to the system and the method described. Figure 2 shows an example of a video signature feature. In this example, the input video (A) is used to input a multi-ratio feature detector 2006 from a sequence and basis of video images containing video images 2004. The X and γ defined regions are formed during the time period. The video signals $, , .s2, and s3 are the rotations of the filters of different spatial depths (B), resulting in a series of different resolution characteristics. The ratio of images. These different ratios of spatial images will fall in different ratios 1, 2, 3 (C) after analysis (for example, corner detection). Pictures can be confirmed by the multi-ratio vertices (D) of the series. The feature points fl and β in the picture. The 21st picture of the factory represents an example of the tracking and trimming process of the feature points of the video symbol. This 200951832 is an optional stage, but if used, the feature points can be spanned across multiple pictures. The picture is sufficient (for example, the preset criteria are met) - the insured = the feature of the abandonment of the ridge is _ the number of coffees that do not have enough to meet the preset criteria, and the "22nd picture" shows an example of the video symbol feature descriptor. =_First side_How to describe after the feature point... 
In one approach, an input video 2200 is processed, and the neighborhood around each previously detected feature point (x, y, t) is analyzed (G). The SIFT gradients of the nearby image region are computed (H), and from these gradients the dominant orientations are determined (I). The result is summarized as a vector of elements (J) called a feature descriptor.

Figure 23 shows an example of the vector quantization process, which maps a sequence of feature descriptors onto a sequence of quantized feature descriptors. In this example, each feature descriptor vector (K) is mapped, using a standardized feature descriptor vocabulary, onto a standardized d-dimensional quantized descriptor (L). This standardization scheme allows video from different sources to be identified and compared uniformly.

Figure 24 shows an example of video DNA creation, in which the features of each video frame are summarized as a feature bag (N). Feature bags from several video frames (for example, frames 1, 2, and 3) are averaged (P) to produce a representation of the multi-frame video time interval, often referred to as a "video nucleotide".

Figure 25 shows a system 2500 for processing video data. Video data source 2502 stores and/or generates video data. Video splitter 2504 receives the video data from video data source 2502 and divides it into time intervals. Video processor 2506 receives the video data from video data source 2502 and performs various operations on it.
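The quantization and averaging steps described for Figures 23 and 24 can be sketched together as follows. This is a hypothetical illustration: the two-word vocabulary, the list-based descriptors, and all function names are assumptions for the example, not the patented implementation.

```python
import math

def quantize(descriptor, vocabulary):
    # Vector quantization (Figure 23): map a raw feature descriptor
    # to the index of its nearest vocabulary word.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(vocabulary)), key=lambda i: dist(descriptor, vocabulary[i]))

def frame_feature_bag(descriptors, vocabulary):
    # Summarize one frame as a histogram ("feature bag") over the vocabulary.
    bag = [0.0] * len(vocabulary)
    for d in descriptors:
        bag[quantize(d, vocabulary)] += 1.0
    return bag

def visual_nucleotide(frames, vocabulary):
    # Average the feature bags of several frames (Figure 24) into one
    # representation of the multi-frame time interval: the "video nucleotide".
    bags = [frame_feature_bag(f, vocabulary) for f in frames]
    n = len(bags)
    return [sum(col) / n for col in zip(*bags)]
```

A usage example: with `vocabulary = [[0.0, 0.0], [1.0, 1.0]]` and two frames of descriptors, `visual_nucleotide` returns one averaged histogram for the whole interval.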
In this example, video processor 2506 detects feature locations in the video data, generates feature descriptors associated with those feature locations, and trims the detected feature locations to generate a subset of feature locations. Video aggregator 2510 is coupled to video splitter 2504 and video processor 2506 and generates the video DNA for the video data. As discussed herein, the video DNA can represent the video data as an ordered sequence of visual nucleotides. Storage device 2508 is coupled to video splitter 2504, video processor 2506, and video aggregator 2510, and stores the various data used by those components. The stored data includes video data, frame data, feature descriptors, visual atoms, video DNA, algorithms, settings, thresholds, and the like. The components shown in Figure 25 can be connected directly or via another intermediate device, system, component, network, communication link, and so on.

The systems and methods described herein make it easy to identify and associate multiple video content identifiers associated with particular video content. In addition, certain embodiments can be used together with one or more conventional video processing and/or video display systems and methods. For example, an embodiment can be used to improve an existing video processing system.

Although the components and modules described herein are shown in particular combinations, those combinations can be adjusted so that the identification and association of multiple video content identifiers is accomplished in different ways. In other embodiments, one or more additional components or modules may be added to the previously described systems. Alternative embodiments may also combine two or more of the above components or modules into a single component or module.

Although embodiments of the invention are disclosed above, they are not intended to limit the scope of patent protection of the invention.
Any person having ordinary skill in the art to which the invention pertains may make minor changes and refinements to the form and details of its implementation without departing from the spirit and scope of the invention as disclosed, and all such changes fall within the scope of patent protection of the invention. The scope of patent protection of the invention is defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of an environment according to the invention.
Figure 2 is a schematic diagram of a procedure executable by the universal video lookup system of the invention.
Figure 3 is a flow chart of retrieving a video content identifier and metadata related to specific video content.
Figure 4 is a flow chart of determining correspondences between multiple video sequences.
Figure 5 is a flow chart of identifying, retrieving, and aligning subtitle information.
Figure 6A is a schematic diagram of spatial and temporal alignment of video data.
Figure 6B is a schematic diagram of representing context with the video gene algorithm.
Figure 7 is a schematic diagram of video DNA.
Figure 8 is a schematic diagram comparing biological DNA with video DNA.
Figure 9 is a flow chart of creating video DNA.
Figure 10 is a schematic diagram of dividing a video sequence into time intervals.
Figure 11 is a flow chart of detecting frame-based feature points.
Figure 12 is a flow chart of searching for invariant feature points.
Figure 13 is a flow chart of trimming feature point trajectories.
Figure 14 is a flow chart of finding the spatio-temporal correspondence between two video DNA sequences.
Figure 15 is a schematic diagram of the video DNA generation procedure.
Figure 16 is a schematic diagram of how video feature points are processed during video DNA generation.
Figure 17 is a schematic diagram of how video feature descriptors are placed into the feature descriptor storage.
Figure 18 is a schematic diagram of how a video is segmented into a large number of short time intervals of frames while the video DNA is being created.
Figure 19 is a schematic diagram of indexing and describing video according to its associated video DNA.
Figure 20 is a schematic diagram of the process of detecting video identification feature points.
Figure 21 is a schematic diagram of tracking and trimming video identification feature points.
Figure 22 is a schematic diagram of describing video identification feature points.
Figure 23 is a schematic diagram of vector quantization.
Figure 24 is a schematic diagram of constructing video DNA.
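As background for Figure 14 and the gap-penalty discussion above, finding the correspondence between two video DNA sequences can follow the classic Needleman-Wunsch global alignment recurrence. The sketch below is a minimal illustration with a linear gap penalty; the similarity function and penalty values are assumptions for the example, not the patented scoring.

```python
def align_score(seq_a, seq_b, match_fn, gap_penalty=-1.0):
    # Needleman-Wunsch global alignment score between two sequences of
    # video nucleotides. match_fn(a, b) returns a similarity score.
    # Gaps (inserted or deleted intervals) cost a fixed linear penalty
    # here; as noted in the text, more complex position-dependent gap
    # penalties are also possible.
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + match_fn(seq_a[i - 1], seq_b[j - 1]),
                score[i - 1][j] + gap_penalty,   # gap in seq_b
                score[i][j - 1] + gap_penalty,   # gap in seq_a
            )
    return score[-1][-1]
```

With a threshold-style `match_fn` (for example, +1 when two nucleotides match and -1 otherwise), the aligned score rewards matched intervals while penalizing inserted or deleted ones.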
DESCRIPTION OF MAIN COMPONENT SYMBOLS

102 first video source
104 second video source
106 third video source
108 network
110 first metadata source
112 second metadata source
114 third metadata source
116 universal video lookup system
118 video device
202 communication module
204 processor
206 video analysis module
208 storage device
210 video DNA module
212 similarity module
214 hash computation module
216 advertisement management module
600 first stage
602 second stage
610 apple
612 fruit
614 computer
702 first stage
704 second stage
706 time interval
708 visual element (atom)
710 visual range
712 visual nucleotide
714 video DNA
800 video signal
802 nucleotides and atoms
900 video data
901 video data
1000 feature detection
1010 feature locations
1020 time interval
1022 time interval
1024 time interval
1026 time interval
1425 temporal correspondence
1435 subset
1445 spatial correspondence
1500 video frame
1502 feature points
1504 trajectory
1506 deleted meaningless trajectory
1600 trimmed feature points
1602 local neighborhood
1604 feature descriptor
1700 actual feature descriptors
1702 representative feature descriptor
1800 time interval
1802 time interval
1804 time interval
1806 feature set
1808 temporal and spatial coordinates
1810 feature bag
1914 client video
1916 client video DNA generation procedure
2000 feature description
2004 video images
2006 multi-scale feature detector
2010 feature descriptor
2200 video
2500 subsystem
2502 video data source
2504 video splitter
2506 video processor
2508 storage device
2510 video aggregator
3000 feature trimming
3010 feature subset
3100 feature tracking
3110 trajectories
3200 trajectory trimming
4000 feature representation
5000 time interval segmentation
6000 visual atom aggregation
6010 video DNA
6011 video DNA
302 receive a request related to specific video content
304 identify a video content identifier related to the specific video content
306 retrieve metadata related to the specific video content based on the video content identifier
308 convert the video content identifier related to the specific video content into other video content identifiers corresponding to the previous video content identifier
310 retrieve metadata related to each of the other video content identifiers
312 provide the metadata and information corresponding to the video content to the requesting device
402 identify a first video sequence related to a video program
404 identify a second video sequence related to the same video program
406 compute correspondences between the first video sequence and the second video sequence
408 determine an alignment between the first video sequence and the second video sequence
410 store the computed correspondences
412 store information about the alignment
502 receive specific subtitle information related to a specific movie on a DVD disc
504 identify a first DVD identifier related to the DVD disc
506 identify the video sequence of the movie stored on the DVD disc
508 determine a second DVD identifier related to the DVD disc based on the video sequence
510 use the first DVD identifier to retrieve the movie subtitles from a subtitle source
512 identify the correspondence between the DVD movie timeline and the movie timeline related to the subtitle source
514 align the subtitle information to the DVD according to the two timelines
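Tying the reference numerals together, the Figure 25 pipeline (source 2502, splitter 2504, processor 2506, aggregator 2510) reduces to a few steps. The sketch below is a toy illustration in which each frame is already a list of quantized visual-atom indices; that representation, the vocabulary size, and the function names are assumptions made for brevity, not the patented system.

```python
from collections import Counter

VOCAB_SIZE = 4  # illustrative visual-vocabulary size

def split_into_intervals(frames, interval_len):
    # Video splitter 2504: divide the frame sequence into fixed-length
    # time intervals.
    return [frames[i:i + interval_len] for i in range(0, len(frames), interval_len)]

def interval_nucleotide(interval):
    # Video processor 2506 and aggregator 2510 in miniature: count the
    # visual atoms appearing in the interval's frames to form one
    # visual nucleotide for that interval.
    counts = Counter(atom for frame in interval for atom in frame)
    return tuple(counts.get(k, 0) for k in range(VOCAB_SIZE))

def video_dna(frames, interval_len=2):
    # The video DNA is the ordered sequence of visual nucleotides.
    return [interval_nucleotide(iv) for iv in split_into_intervals(frames, interval_len)]
```

For example, four frames split into two intervals yield a video DNA of two nucleotides, one atom-count tuple per interval.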
Claims (1)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US4527808P | 2008-04-15 | 2008-04-15 | |
| US12/349,473 US8719288B2 (en) | 2008-04-15 | 2009-01-06 | Universal lookup of video-related data |
| US12/349,469 US8358840B2 (en) | 2007-07-16 | 2009-01-06 | Methods and systems for representation and matching of video content |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| TW200951832A true TW200951832A (en) | 2009-12-16 |
Family
ID=44871868
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW98112575A TW200951832A (en) | 2008-04-15 | 2009-04-15 | Universal lookup of video-related data |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TW200951832A (en) |
- 2009-04-15 TW TW98112575A patent/TW200951832A/en unknown
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI418210B (en) * | 2010-04-23 | 2013-12-01 | Alpha Imaging Technology Corp | Image capture module and image capture method for avoiding shutter delay |
| TWI415427B (en) * | 2010-11-04 | 2013-11-11 | Ind Tech Res Inst | System and method for peer-to-peer live streaming |
| US8726327B2 (en) | 2010-11-04 | 2014-05-13 | Industrial Technology Research Institute | System and method for peer-to-peer live streaming |
| US9208392B2 (en) | 2011-09-20 | 2015-12-08 | Qualcomm Incorporated | Methods and apparatus for progressive pattern matching in a mobile environment |
| US8719442B2 (en) | 2011-12-28 | 2014-05-06 | Industrial Technology Research Institute | System and method for providing and transmitting condensed streaming content |
| TWI483130B (en) * | 2012-09-11 | 2015-05-01 | Univ Yuan Ze | Multimedia data evaluation method |
| TWI640193B (en) * | 2017-01-13 | 2018-11-01 | 晨星半導體股份有限公司 | Image processing apparatus and image processing method |
| TWI867371B (en) * | 2021-12-08 | 2024-12-21 | 瑞典商安訊士有限公司 | System and method for retrieving media recordings |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US8719288B2 (en) | Universal lookup of video-related data | |
| US9176987B1 (en) | Automatic face annotation method and system | |
| CN101281540B (en) | Apparatus, method and computer program for processing information | |
| Chou et al. | Pattern-based near-duplicate video retrieval and localization on web-scale videos | |
| Wu et al. | A novel clustering method for static video summarization | |
| Liu et al. | Near-duplicate video retrieval: Current research and future trends | |
| Zhou et al. | Movie genre classification via scene categorization | |
| Kuanar et al. | Video key frame extraction through dynamic Delaunay clustering with a structural constraint | |
| TW200951833A (en) | Methods and systems for representation and matching of video content | |
| US8676030B2 (en) | Methods and systems for interacting with viewers of video content | |
| JP3568117B2 (en) | Method and system for video image segmentation, classification, and summarization | |
| US9098807B1 (en) | Video content claiming classifier | |
| Rani et al. | Social media video summarization using multi-Visual features and Kohnen's Self Organizing Map | |
| Gharbi et al. | Key frame extraction for video summarization using local description and repeatability graph clustering | |
| US20100104184A1 (en) | Methods and systems for representation and matching of video content | |
| Choi et al. | A spatio-temporal pyramid matching for video retrieval | |
| TW200951832A (en) | Universal lookup of video-related data | |
| Peronikolis et al. | Personalized video summarization: A comprehensive survey of methods and datasets | |
| Küçüktunç et al. | Video copy detection using multiple visual cues and MPEG-7 descriptors | |
| Banerjee et al. | Particle swarm optimized deep spatio-temporal features for efficient video retrieval | |
| Reyes et al. | Where is my phone? Personal object retrieval from egocentric images | |
| Chou et al. | Multimodal video-to-near-scene annotation | |
| Allouche et al. | Video fingerprinting: Past, present, and future | |
| Jiang et al. | Video searching and fingerprint detection by using the image query and PlaceNet-based shot boundary detection method | |
| Retrieval | Ehsan Younessian |