
TW201224799A - Video search method, system, and method for establishing a database therefor - Google Patents


Info

Publication number
TW201224799A
TW201224799A TW99141786A
Authority
TW
Taiwan
Prior art keywords
video
clip
semantic
segments
candidate
Prior art date
Application number
TW99141786A
Other languages
Chinese (zh)
Other versions
TWI443535B (en)
Inventor
Jih Sheng Tu
Jung-Yang Kao
Original Assignee
Ind Tech Res Inst
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ind Tech Res Inst filed Critical Ind Tech Res Inst
Priority to TW99141786A priority Critical patent/TWI443535B/en
Priority to CN2011100326411A priority patent/CN102486800A/en
Priority to US13/077,984 priority patent/US8515933B2/en
Publication of TW201224799A publication Critical patent/TW201224799A/en
Application granted granted Critical
Publication of TWI443535B publication Critical patent/TWI443535B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A video search method, a system, and a method for establishing a database therefor are introduced. In the method, a query clip with meta-data is provided. The meta-data includes an index tag and a semantic pattern. One or more candidate video clips are obtained from a video database according to the index tag of the query clip. The semantic pattern of each candidate video clip is sequentially compared with the semantic pattern of the query clip, and each candidate video clip is marked as a returnable video clip or a non-returnable video clip. The candidate video clips marked as returnable can be sent back as search results in response to the query clip. A system using the video search method and a method for establishing a database therefor are also provided.
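The abstract pairs each clip with two pieces of meta-data: an index tag used for coarse filtering and a semantic pattern used for fine comparison. A minimal sketch of that data model and of the index-tag filtering stage follows; all names (`ClipMetadata`, `VideoDatabase`, `candidates`, the sample tags and clip ids) are illustrative assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ClipMetadata:
    index_tag: str          # e.g. "a7b5c1": MV octant + shot length + hue class
    semantic_pattern: list  # e.g. [1, 0, -1, 0], symbols drawn from {1, 0, -1}

@dataclass
class VideoDatabase:
    clips: dict = field(default_factory=dict)  # clip id -> ClipMetadata

    def candidates(self, query: ClipMetadata):
        # Stage 1 of the search: keep only the clips whose index tag
        # matches the query clip's tag; stage 2 (semantic-pattern
        # comparison) then runs on this much smaller candidate set.
        return [cid for cid, m in self.clips.items()
                if m.index_tag == query.index_tag]

db = VideoDatabase({"ipman_s1": ClipMetadata("a7b5c1", [1, 0, -1, 0]),
                    "other_s7": ClipMetadata("a1b3c2", [0, 0, 1])})
query = ClipMetadata("a7b5c1", [1, 0, -1, 0])
print(db.candidates(query))  # ['ipman_s1']
```

The tag comparison here is exact-match for simplicity; the patent also allows "similar" tags, which would replace the equality test with a tag-distance test.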

Description

201224799 P52990083TW 35538twf.doc/t

VI. Description of the Invention:

[Technical Field]

The present disclosure relates to video search, and in particular to a search method and system that use video content itself as the search input condition, and to a method for establishing a video database for such a video search method.

[Prior Art]

Current search technologies on the Internet are all text-based; search engines such as Google, Yahoo, YouTube, or Taiwan's Wretch all rely primarily on text search. Although every search engine hopes to break through the limitations of text — for example, by letting the same keyword match content in Traditional Chinese, Simplified Chinese, or even other languages — they remain constrained by textual keywords. For example, when searching for related multimedia material, including audio files or video files, there is often not enough textual content to serve as a basis for the search, or different countries use different translated titles, so that even with the right keyword the correct or more relevant material cannot be found.

In April 2009 the search engine Google launched a photo-by-photo search service, which uses the content of a photo to find material with related content. For example, referring to FIG. 1A, when the keyword "apple" 110 is entered in the input box 120, material related to "apple" appears, including images of apple-shaped objects as well as the "iPhone", a product of the trademark "Apple®". Further selections can then exclude much of the unsuitable material. As in FIG. 1B, after the user selects an image related to the apple shape, the search engine further displays images related to the fruit (apples). As in FIG. 1C, after the user selects an image of the "iPhone" product associated with the trademark "Apple®", other images related to this product are displayed, locating more precisely the photos the user wants. This technique clearly uses image content to search for images, but it is limited to searching related photo files; for multimedia files there is no way to search.

To break through this limitation, the technical standard MPEG-7, established by the Moving Picture Experts Group (hereinafter MPEG), proposes a standard for attaching supplementary information to content, particularly multimedia digital content. Under MPEG-7, multimedia can be given a corresponding Multimedia Content Description, independently of the other MPEG standards, and the digital content description can even be attached to analog movie files.

Every piece of audio-visual content can be given a corresponding content description, which mainly lists the relevant feature values of that audio-visual content. The file is arranged, for example, as:

AV+Descript+AV+Descript+AV+Descript+...

where AV represents the audio-visual content and Descript represents the corresponding content description.

Such an architecture, however, is overly complex: all multimedia files must be re-encoded, which is unsuitable for existing files and architectures. Moreover, although related multimedia files can be found through keyword-like searches over the feature values, this still cannot escape the barrier that text search creates between different languages.

In addition, as the combination of the Internet and television becomes increasingly popular, video search on a TV will inevitably run into the problem of keyword input. A viewer watching TV holds only a remote control, whose size and functions cannot replace a keyboard as a text-input device; how to use a remote control to drive video search on a network TV is therefore a key problem for such applications.

SUMMARY OF THE INVENTION

In one embodiment, a video search method is provided, which includes receiving the meta-data of a query clip, wherein the meta-data includes a first index tag and a first semantic pattern. One or more candidate video clips are obtained from at least one video database according to the first index tag. The first semantic pattern is compared, one by one, with the semantic pattern of each candidate video clip, and according to the comparison result each candidate video clip is marked as a returnable video clip or a non-returnable video clip. The candidate video clips marked as returnable are the query results matching the query clip.

In one embodiment, a method is provided for establishing a video database that can be queried by a query clip. The video database stores a plurality of video bitstreams and the meta-data of those bitstreams. The meta-data of each bitstream is established by segmenting the video bitstream with a segmentation-detection process to produce a plurality of shots, indexing the shots so that each shot is given an index tag according to its content, and establishing a semantic pattern for each shot according to its video features, wherein the meta-data includes at least the index tag and the semantic pattern corresponding to each shot.

In one embodiment, a video search system is provided, which includes a search engine and at least one video database. The search engine receives the meta-data of a query clip, wherein the meta-data includes a first index tag and a first semantic pattern. The video database includes a plurality of video clips. The search engine obtains one or more candidate video clips from the video database according to the first index tag, compares the first semantic pattern one by one with the semantic pattern of each candidate video clip, and marks each candidate video clip as a returnable video clip or a non-returnable video clip according to the comparison result, wherein the candidate video clips marked as returnable are the query results matching the query clip.

To make the above features and advantages of the present disclosure more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

DESCRIPTION OF EMBODIMENTS

The present disclosure proposes a new video search technique that breaks through today's text-based search techniques and establishes a technique that uses video content as the search condition, achieving video-by-video search results.

In one of the disclosed embodiments, a segment of a video file is selected as the query message. The selection may be made by the user choosing a video segment of any length, or by the user interface automatically selecting a fixed or specific period of the video; any such method can be used in this example.

In one embodiment, the above user-selection mechanism can be built into a remote-control device that controls movie playback, such as for a television or a DVD player, or into a user interface on a touch display or screen that lets the user conveniently and simply stop at a segment of the movie to use as the query; all of these are applications of the present invention.

A video file selected as the query condition may differ from other copies in name, format, or even frame size and quality, but if they are the same movie the plot is identical. Therefore, as long as a search index is built for the selected video segment, identical or similar movies can be found. For example, the video file serving as the query condition and all other video files to be searched can first be converted, via format conversion, into video files of the same format.

This embodiment also adds the feature of the time domain, so that, after all the other video files to be searched have been converted into the same format, the corresponding video files in the same time-axis region can be found among them. In one embodiment, all the video files to be searched may reside in a remote database, such as a host on a local area network, the host database of a search engine, or a cloud database; and the format-conversion computation may be performed on a personal host, on a system server of a local area network, on the search-engine host, or on a cloud-computing system.

A method of implementing the new video search in the above embodiments is illustrated in FIG. 2A and FIG. 2B. FIG. 2A illustrates selecting a segment of a video file in a video as the query message, to perform a so-called content search.

This selection may be made by the user choosing a video segment of any length, or by the user interface automatically selecting a fixed or specific period of the video; either way can be used in this example.

Taking FIG. 2A as an example, when the user is watching the movie file titled "Ip Man", the features of this movie 210 are a resolution of 352x288 (pixels), a frame rate of 15 frames per second, and low picture quality. A segment of this movie is selected as the query condition, and this query message is sent back to the search engine 220 to perform a so-called content search. After content retrieval over the search index established according to this embodiment, a found movie that does not satisfy the condition, such as the movie 230 in the figure, is not returned. The movies 232 and 234, however, satisfy the content-search condition, so the movies 232 and 234 are returned to the user.

The retrieved movies, such as the movies 230, 232, and 234 in the figure, may have different movie features. For example, the resolution of the movie 230 is 1024x768, with 30 frames per second and low picture quality; the resolution of the movie 232 is 1920x1080, with 15 frames per second and high picture quality; and the resolution of the movie 234 is 1920x1080, with 30 frames per second and high picture quality. In other words, the retrieval results need not have the same features as the original movie.

The flow can proceed as follows. In the first step, the user initially completes a text-based search using a keyword. In the second step, the search engine 220 returns a video bitstream together with its meta-data (hereinafter Meta-Data) to the user. In the third step, the user becomes interested in a certain scene in the movie and wishes to watch a version with a different resolution and/or different picture quality; the user selects a segment of the video file as the query condition. In the fourth step, the Meta-Data of the selected video clip is sent back to the search engine 220. In the fifth step, the search engine 220 searches all available video databases, such as the illustrated video databases 240, 242, and 244, according to the selected video clip, to find similar video files. In the sixth step, the similar video files found are returned to the user for selection.

In one embodiment, the video databases 240, 242, and 244 may reside in remote hosts, such as a host on a local area network, the host database of the search engine, or a cloud database. The computation of the search engine 220, or of the format conversion, may be performed on a personal host, on a system server of a local area network, on the host of the search engine, or on a cloud-computing system.

The search engine 220 searches all available video databases according to the selected video clip to find similar video files. According to the disclosed embodiments, before a video database can be searched with a video clip, a Meta-Data establishment process must first be performed on all stored video files; only then can similar video files be searched for and retrieved.

That is, in one of the disclosed embodiments, the similar-movie search method includes two major steps: first, the establishment of the video database, and second, the retrieving of similar video clips.

The establishment of the video database includes at least (1) segmentation and indexing of the video files: a video file is divided into a plurality of video clips, and each video clip is then given an index tag; and (2) establishment of the semantic patterns: the semantic pattern of each video clip is established from its video features.

The retrieving of similar video clips includes at least (1) obtaining candidate clips: according to the query clip, i.e., the selected segment of a video file serving as the query condition, clips having the same or similar index tags are found as candidate clips; and (2) comparison of semantic patterns: the semantic distance between the query clip and every candidate clip is computed and compared with a threshold to judge whether the candidate is a similar video clip.

In one embodiment, the establishment of the video database is shown in FIG. 3. In the video-database establishment flow 300, for the original video bitstreams 310 of all movies, besides storing the video files in the storage system 350, a video-bitstream parsing step 320 and a Meta-data establishment step 330 are performed on the original video bitstreams 310, and the resulting corresponding Meta-data are stored in the storage system 350.

For the parsing step of the video bitstream, reference is made to U.S. application Ser. No. 12/804,477, filed on July 21, 2010, entitled "VIDEO SEARCH METHOD USING MOTION VECTORS AND APPARATUS THEREOF" (corresponding to Taiwan patent application No. 099113963, filed on April 30, 2010, entitled "Video Search Method Using Motion Vectors and Apparatus Thereof", and to China patent application No. 201010220461.1, filed on June 29, 2010, with the same title), the relevant contents of which are incorporated herein by reference.

If the parsing step of the video bitstream adopts the motion-vector method, the bitstream of every video file, which is usually already-compressed data, is parsed, and the motion-vector (MV) values of the corresponding frames can be selectively sampled at a certain ratio (for example 1:2, 1:4, or 1:N, where N is the number of frames), the purpose being to flexibly adjust the sampling rate along the time axis. The intent of converting all video files (including the video file serving as the query condition) into the same format in this embodiment lies in the motion vectors of the frames in the video files: the motion vectors are extracted from all compressed video files, and the search index is built from them.

Motion vectors of different resolutions can, in this embodiment, be changed by up-sampling or down-sampling. A typical video file consists of many frames arranged consecutively along the time axis, and each frame is encoded from many macroblocks (MB), each macroblock being, for example, a 16x16 unit. Each macroblock may carry one motion vector, or it may carry up to 16 (an MB can be further cut into sub-blocks), so a single MB in movies of different formats may contain anywhere from 1 to 16 MV values, which makes the subsequent MV difference computation impossible to align. Therefore, to unify the resolution, the number of motion vectors carried by each macroblock must be adjusted to be consistent. In one embodiment, to reduce n motion vectors to one, an averaging method can be used, computing the mean of the values of the n motion vectors. Conversely, to turn a single motion vector into n motion vectors, the one motion vector can be converted into n motion vectors of the same value.

As for deciding whether a macroblock's motion vectors should be converted to one vector or to n vectors, a statistical method can be used. The MPEG video-coding formats usually define a Group of Pictures (GOP) in their architecture. For example, to obtain a better compression effect for motion pictures, the MPEG-4 standard defines a GOP, which enables random access within the image data; in MPEG-4 a GOP includes nine pictures (one I-picture, two forward-predicted P-pictures, and six bi-directionally predicted B-pictures). Therefore, to decide which number of motion vectors is more suitable, in one example the GOP can be taken as the basic unit, and the ratio of the block sizes to which the motion vectors of its macroblocks belong can be compared with, for example, a threshold, to decide how many motion vectors to adopt when building the search index.

In one embodiment, the parsing step of the video bitstream may also adopt the HSV-histogram method; one implementation example of the HSV-histogram method is described below.

FIG. 4 illustrates a method of establishing Meta-data for a video bitstream in one embodiment of the present disclosure. A video bitstream 410 is segmented into a segmented video file 420, which, as shown, is divided into five different shots, for example according to scene-change points. These five shots are then indexed, becoming video clips 430 with different index tags; as shown, the indexes of the five shots are, for example, a7b5c1, a1b3c2, a2b5c1, a4b7c2, and a3b3c2. The semantic pattern of each video clip is then established from its video features; as shown, the semantic patterns 440 of the five shots indexed a7b5c1, a1b3c2, a2b5c1, a4b7c2, and a3b3c2 are, after conversion, "0 0 0 0", "1 0", "1 1 -1 1", "1 0 0 0 1 -1 0 1", and "0 0 1", respectively.
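The macroblock motion-vector normalization described above (average n vectors down to one, or replicate one vector up to n) can be sketched as follows. This is a minimal illustration under the stated assumptions; the function name and the tuple representation of a motion vector are not from the patent.

```python
def normalize_macroblock_mvs(mvs, target_count):
    """Unify the motion-vector count of one 16x16 macroblock.

    `mvs` is a list of (dx, dy) motion vectors; depending on the coding
    format a macroblock may carry anywhere from 1 to 16 of them.  As the
    embodiment describes, n vectors can be averaged down to one, or a
    single vector replicated into n identical copies.
    """
    if target_count == 1 and len(mvs) > 1:
        # averaging method: collapse n motion vectors into their mean
        dx = sum(v[0] for v in mvs) / len(mvs)
        dy = sum(v[1] for v in mvs) / len(mvs)
        return [(dx, dy)]
    if len(mvs) == 1 and target_count > 1:
        # replication: one vector becomes n motion vectors of the same value
        return mvs * target_count
    return list(mvs)  # already at the desired count

# four sub-block vectors averaged into one macroblock vector
print(normalize_macroblock_mvs([(2, 0), (4, 0), (2, 2), (4, 2)], 1))  # [(3.0, 1.0)]
```

Whether a given macroblock is normalized to one vector or to n would be decided per GOP by the statistical block-size test described in the text.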

The Meta-Data establishment steps of the video database — (1) segmentation, (2) indexing, and (3) semantic-pattern generation — are explained separately below.

Segmentation

The segmentation of video files must adopt a consistent approach. In one embodiment, the places where a so-called scene change occurs are taken as the points at which the video file is cut, so that each cut-out segment has a high internal similarity. Each small segment of the video file cut out at the scene-change points is, in this embodiment, called a shot.

For the selection of each shot, in one of the embodiments the HSV-histogram method is adopted: the gap between the HSV histograms of successive frames is taken as the basis for judging whether the current frame should be treated as a shot boundary.

The HSI color space starts from the human visual system, describing color by hue, saturation (or chroma), and value (intensity or brightness). The HSV color space can be described by a cone-shaped space model; referring to FIG. 5A, H represents hue, S represents saturation, and V represents value. The cone model describing this color space is rather complex, but it expresses the variations of hue, brightness, and saturation very clearly. Hue and saturation together are commonly called chromaticity, indicating the category and depth of a color. Because human vision is more sensitive to brightness than to shades of color, the human visual system often adopts the HSV color space for color processing and recognition; it matches human visual characteristics better than the RGB color space.

As for the HSV-histogram conversion, referring to FIG. 5B, each image file is converted, after the histogram conversion, into three distribution diagrams: the hue distribution diagram 520, the saturation distribution diagram 522, and the value distribution diagram 524. FIG. 5C is a schematic diagram of the HSV-histogram conversion. The obtained HSV histograms are taken as features in a shot-detection algorithm that judges whether the current frame should be treated as a shot boundary, and this feature serves as the basis for deciding whether these frames are shot boundaries.

For example, as shown in FIG. 5C, for the sampling frames, an HSV conversion is performed on each frame: the frame f_i is converted to obtain the HSV distribution diagram on the left, and the frame f_{i+1} is converted to obtain the HSV distribution diagram on the right; then the distance D(f_i, f_{i+1}) between the adjacent frames is computed.

The above HSV-histogram conversion and segmentation may refer, for example, to the paper "SEGMENTATION AND HISTOGRAM GENERATION USING THE HSV COLOR SPACE FOR IMAGE RETRIEVAL" by Shamik Sural, Gang Qian, and Sakti Pramanik, published at IEEE ICIP 2002, or to the distance-measurement method proposed by Te-Wei Chiang, Tienwei Tsai, and Mann-Jung Hsiao in "Performance Analysis of Color Components in Histogram-Based Image Retrieval".

Indexing
After the video file is cut into several small segments, these segments are given index tags. There are many different embodiments of the method of indexing each shot; in some embodiments, the index may be edited with reference to one of, or any combination of, the motion vector (MV) of each shot, the length of the shot, the distribution in color space (for example, the angle of the hue), or other features.
In one embodiment, referring to Figure 6, the corresponding index value is obtained from (a) the direction distribution of the motion vectors, (b) the length of the shot, and (c) the angular distribution of the hue in color space. As shown in Figure 6(a), the MV direction distribution can be divided into eight sectors (a1, a2, a3, a4, a5, a6, a7, a8); Figure 6(b) assigns different index values (b1, b2, b3, b4, ...) according to the length of the shot; and Figure 6(c) divides the angular distribution of the hue into three sectors (c1, c2, c3).
Generation of the Semantic Pattern
In order to compare the similarity of video streams quickly, a clip containing a large amount of data must be converted into a string of meaningful symbols, reducing the information that needs to be compared; this string of symbols is called a semantic pattern.
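The three-part index tag of Figure 6 might be sketched as follows; the octant/sector boundaries and the length bucket size are illustrative assumptions, since the disclosure does not fix concrete numeric ranges.

```python
import math

def mv_direction_index(mvs):
    """Index the dominant motion-vector direction into one of eight sectors a1..a8.

    `mvs` is a list of (dx, dy) motion vectors; the mean vector's angle is used."""
    dx = sum(v[0] for v in mvs) / len(mvs)
    dy = sum(v[1] for v in mvs) / len(mvs)
    angle = math.atan2(dy, dx) % (2 * math.pi)           # 0 .. 2*pi
    return "a%d" % (int(angle / (math.pi / 4)) % 8 + 1)  # eight 45-degree sectors

def length_index(num_frames, bucket=30):
    """Index the shot length into buckets b1, b2, ... (bucket size is an assumption)."""
    return "b%d" % (num_frames // bucket + 1)

def hue_index(mean_hue_deg):
    """Index the mean hue angle into three 120-degree sectors c1..c3."""
    return "c%d" % (int(mean_hue_deg % 360 // 120) + 1)

def index_tag(mvs, num_frames, mean_hue_deg):
    """Combine the three component indexes into a tag such as 'a1b1c1'."""
    return mv_direction_index(mvs) + length_index(num_frames) + hue_index(mean_hue_deg)
```

A short rightward-panning shot with a reddish hue would then receive a tag of the form "a1b1c1", matching the style of the tags shown in Figure 7D.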
In one of the various embodiments, the method of establishing the semantic pattern first subtracts the motion vectors (MV) between successive frames, takes the lengths of the resulting difference vectors, and sums these lengths as the association between the two frames. After conversion through a predetermined expression, if a sum is larger than the current sum by more than a Delta value, the symbol "1" is obtained; if it is smaller by more than the Delta value, "-1" is obtained; otherwise "0" is obtained, as shown in Figures 7B and 7C. Through the above steps, a clip can be converted into a string of semantic symbols composed of 1, 0 and -1. A video stream is thus divided into a plurality of shots by the above steps, and a specific index tag and semantic pattern are then generated for each shot. As shown in Figure 7D, after the video stream 710 is divided into the shots shot0, shot1, shot2, ..., shotn 720, each shot has its own index tag and semantic pattern 730. For example, shot0 has the index tag a1b1c1 and the semantic pattern (1 0 1 1 -1 -1 0 0), as indicated by reference numeral 732, while shot1 has the index tag a1b3c2 and the semantic pattern (-1 1 0 0 0 0 0 0), as indicated by reference numeral 734. The index tags are, for example, the index values shown in Figures 6(a), (b) and (c).
Retrieving Similar Video Clips
The steps of retrieving similar video clips include at least: (1) obtaining candidate clips: according to the query clip, a segment of video file selected as the search condition, finding clips with the same or similar index tags as candidate clips; and (2) semantic pattern comparison: calculating the semantic distance between the query clip and every candidate clip and comparing it with a threshold to judge whether the candidate is a similar video clip. A flow chart of the above steps is shown in Figure 8.
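The 1 / 0 / -1 quantization described above can be sketched roughly as follows; the Delta value and the use of the raw magnitude sum (rather than the unspecified "predetermined expression") are illustrative assumptions.

```python
def frame_activity(mvs_prev, mvs_cur):
    """Sum of magnitudes of the frame-to-frame motion-vector differences."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(mvs_prev, mvs_cur))

def semantic_pattern(mv_per_frame, delta=1.0):
    """Quantize the activity changes of a shot into a list of 1 / 0 / -1 symbols."""
    activity = [frame_activity(a, b)
                for a, b in zip(mv_per_frame, mv_per_frame[1:])]
    pattern = []
    for prev, cur in zip(activity, activity[1:]):
        if cur > prev + delta:
            pattern.append(1)     # activity rose by more than delta
        elif cur < prev - delta:
            pattern.append(-1)    # activity fell by more than delta
        else:
            pattern.append(0)     # roughly unchanged
    return pattern
```

A shot that is still, then moves, then stops again would thus yield a pattern like (1 -1), in the spirit of the strings shown for shot0 and shot1 in Figure 7D.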
First, in step S810, the search for similar video clips starts. In step S820, the meta-data of the query clip is received. Then, in step S830, candidate video clips (assume there are X of them) are obtained from the video database by means of the index tag; that is, the candidate clips have the same index tag as the query clip. In step S840, the semantic distance between the i-th candidate clip and the query clip is calculated from their semantic patterns, and in step S850 it is judged whether this semantic distance D is smaller than a threshold; if not, the next candidate clip i+1 is compared. If the value of D is smaller than the threshold, the i-th candidate clip is a similar video clip that can be returned to the user. Step S870 judges whether the comparison of the X candidate clips is finished: if i < X, the flow returns to step S840 to compare the next candidate clip i+1; if i == X, the flow stops at step S880.
In one embodiment, the semantic distance may be computed with the following expression:
D = Σ_{K=1..L} |C_K - Q_K|,
where C_K is the K-th semantic symbol of the i-th candidate clip, whose value may be 1, -1 or 0, Q_K is the K-th semantic symbol of the query clip, and L is the length of the query clip.
Candidate clips have the same or similar index tags as the query clip
In the foregoing disclosure, the video file is segmented using scene changes as the cutting points, so that each cut segment has high internal similarity, and each small segment of video file cut at the scene-change points is called a shot. The selection of each shot boundary is based on the difference between the HSV histograms of successive frames, and the obtained HSV histograms are used as the features for judging whether the current frame is to be taken as a shot boundary.
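The distance computation and threshold test of steps S840 and S850 can be sketched as follows; the clip names and the threshold value in the usage note are hypothetical, and truncating the candidate to the query length L follows the formula above.

```python
def semantic_distance(candidate, query):
    """D = sum over K of |C_K - Q_K|, with K running over the query length L."""
    L = len(query)
    return sum(abs(c - q) for c, q in zip(candidate[:L], query))

def filter_candidates(candidates, query, threshold):
    """Mark each candidate clip returnable (True) when its distance is below the threshold.

    `candidates` maps clip ids to semantic patterns, mirroring the loop of
    steps S840 through S870 over the X candidate clips."""
    return {clip_id: semantic_distance(pattern, query) < threshold
            for clip_id, pattern in candidates.items()}
```

For example, with a query pattern (1 0 -1) and a threshold of 2, a candidate (1 1 -1) is at distance 1 and would be marked returnable, while (-1 1 0) is at distance 4 and would not.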

This is the shot detection algorithm that judges whether the current frame is to be taken as a shot boundary. The shot detection result may differ with the resolution or quality of the video file: as shown in Figure 9, a video encoded with quality QP = 20 and the same video encoded with QP = 40 differ on the time axis. To avoid this kind of error, when comparing the query clip with a candidate clip, the semantic pattern of the query clip may be compared against the candidate clip together with the video clips adjacent to it on both sides.
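The sliding comparison over the candidate and its neighbors can be sketched as follows; taking the minimum distance over all alignments is an assumption consistent with the "shortest distance" figures quoted for Figure 10.

```python
def sliding_min_distance(query, neighborhood):
    """Slide `query` across `neighborhood` and return the smallest semantic distance.

    `neighborhood` is the candidate clip's pattern concatenated with the patterns
    of the clips adjacent on both sides, so that small shot-boundary shifts
    (e.g. caused by a different QP) do not penalize an otherwise matching clip."""
    L = len(query)
    return min(
        sum(abs(q - c) for q, c in zip(query, neighborhood[i:i + L]))
        for i in range(len(neighborhood) - L + 1)
    )
```

If the query pattern appears intact but offset inside the neighborhood, the minimum distance is 0 even though a fixed alignment would report a large distance.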
That is, on the time axis, the semantic pattern of the query clip is compared in a sliding manner, starting from the adjacent clip on the left of the candidate, across the candidate itself, to the adjacent clip on the right, to avoid such errors. As shown in Figure 10, the query clip has QP = 26 and semantic pattern 010-11-1-11-11-10011-1-10000000, while one candidate clip has QP = 32 and semantic pattern ...000-110-11-11-10011-10000000-1..., with a threshold of (3 x the extracted length)/10. After calculation, the shortest distance is 2, smaller than the threshold, so the candidate is a similar video file and is returned to the user. Another candidate clip with QP = 32 and semantic pattern ...10-1010000000001101-1-111... yields a shortest distance of 19 after calculation, larger than the threshold, so it is not a similar video file and is not returned to the user.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the art may make some modifications and refinements without departing from the spirit and scope of the invention; the protection scope of the invention is therefore defined by the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Figs. 1A-1C are schematic diagrams of a conventional retrieval method that finds video by name.
Figs. 2A and 2B are schematic flow diagrams of selecting a segment of video for searching in an embodiment of the disclosure.
Fig. 3 is a flow chart of the steps of establishing a video database in an embodiment of the disclosure.
Fig. 4 is a schematic diagram of a method of establishing meta-data for a video bit stream in an embodiment of the disclosure.
Fig. 5A is a schematic diagram of the HSV color space.
Fig. 5B is a distribution diagram of the HSV color space converted by HSV histogram analysis.
Fig. 5C is a schematic flow diagram of converting a video file by HSV histogram analysis to produce histograms.
Fig. 6 is a schematic diagram of different choices of index tags assigned to each shot after a video file is cut into several small segments, in an embodiment of the disclosure.
Figs. 7A-7D are schematic diagrams of generating a semantic pattern in an embodiment of the disclosure.
Fig. 8 is a schematic flow chart of the steps of retrieving similar video clips.
Fig. 9 is a schematic diagram of an embodiment of a method of comparing a query clip with candidate clips in an embodiment of the disclosure.
Fig. 10 is a schematic diagram of selecting, from the candidate clips, the video clips that can be returned to the user in an embodiment of the disclosure.
[Main component symbol description]
110: keyword
120: input box
210, 230, 232, 234: video
220: search engine
240, 242, 244: video database
310: video bit stream
350: storage system

Claims (1)

VII. Patent application scope:
1. A video search method, comprising:
receiving meta-data of a query clip, wherein the meta-data comprises a first index tag and a first semantic pattern;
obtaining one or more candidate video clips from at least one video database according to the first index tag; and
comparing the first semantic pattern with the semantic pattern of each of the candidate video clips one by one, and marking each candidate video clip as a returnable video clip or a non-returnable video clip according to a comparison result, wherein the candidate video clip or clips marked as returnable are the query result matching the query clip.
2. The video search method of claim 1, wherein the manner of comparing the first semantic pattern with the semantic pattern of each candidate video clip comprises:
calculating a semantic distance between the first semantic pattern and the semantic pattern of each candidate video clip; and
comparing the semantic distance with a threshold, wherein if the semantic distance is smaller than the threshold the candidate video clip is marked as a returnable video clip, and if the semantic distance is larger than the threshold the candidate video clip is marked as a non-returnable video clip.
3. The video search method of claim 1, wherein the query clip is a video file of a time length selected by a user.
4. The video search method of claim 1, wherein the query clip is a video file of a time length selected via a user interface link.
5. The video search method of claim 1, wherein the received meta-data of the query clip is obtained by, after a user performs a text query and a responding video file and its meta-data are obtained, taking a portion of the video file of a time length selected by the user as the meta-data of the query clip.
6. The video search method of claim 1, further comprising comparing the first semantic pattern with each candidate video clip and with the video clips adjacent to both sides of the candidate video clip to obtain the comparison result.
7. A video search system, comprising:
a search engine for receiving meta-data of a query clip, wherein the meta-data comprises a first index tag and a first semantic pattern; and
at least one video database comprising a plurality of video clips, wherein the search engine obtains one or more candidate video clips from the at least one video database according to the first index tag, compares the first semantic pattern with the semantic pattern of each candidate video clip one by one, and marks each candidate video clip as a returnable video clip or a non-returnable video clip according to a comparison result, wherein the candidate video clip or clips marked as returnable are the query result matching the query clip.
8. The video search system of claim 7, wherein the video database stores a plurality of video bit streams and the meta-data of the video bit streams, and the establishment of each meta-data comprises: segmenting the video bit stream with a segmentation detection process to produce a plurality of segments; indexing the segments, assigning a corresponding index tag according to the content of each segment; and establishing the semantic pattern of each segment according to the video features of the segment, wherein the meta-data comprises at least the index tags and the semantic patterns corresponding to the segments.
9. The video search system of claim 8, wherein the process of segmenting the video bit stream with the segmentation detection process comprises cutting the video bit stream at scene change points.
10. The video search system of claim 9, wherein a scene change point is selected by judging, from the result of HSV histogram conversion, whether a frame is a scene change point.
11. The video search method of claim 10, wherein the conversion result is the result obtained by comparing the distance between the HSV histograms of two adjacent frames.
12. The video search method of claim 8, wherein the indexing of the segments assigns the corresponding index tag using the motion vector (MV) direction of each segment.
13. The video search method of claim 8, wherein the indexing of the segments assigns the corresponding index tag using the length of each segment.
14. The video search method of claim 8, wherein the indexing of the segments assigns the corresponding index tag using the angle of the hue in the HSV color space of each segment.
15. The video search method of claim 8, wherein the indexing of the segments assigns the corresponding index tag using any combination of the motion vector (MV) direction of each segment, the length of the segment, and the angle of the hue in the HSV color space of the segment.
16. The video search system of claim 7, further comprising a client, wherein the query clip is a video file of a time length selected by the client.
17. The video search system of claim 7, further comprising a client, wherein the query clip is a video file of a time length selected via a user interface link of the client.
18. The video search system of claim 7, wherein the manner of comparing the first semantic pattern with the semantic pattern of each candidate video clip comprises:
calculating a semantic distance between the first semantic pattern and the semantic pattern of each candidate video clip; and
comparing the semantic distance with a threshold, wherein if the semantic distance is smaller than the threshold the candidate video clip is marked as a returnable video clip, and if the semantic distance is larger than the threshold the candidate video clip is marked as a non-returnable video clip.
19. The video search system of claim 7, wherein the received meta-data of the query clip is obtained by, after a user performs a text query and a responding video file and its meta-data are obtained, taking a portion of the video file of a time length selected by the user as the meta-data of the query clip.
20. The video search system of claim 7, wherein the video database is located at a remote host, and the search engine can establish a connection with the remote host for accessing the video database.
21. The video search system of claim 7, further comprising comparing the first semantic pattern with each candidate video clip and with the video clips adjacent to both sides of the candidate video clip to obtain the comparison result.
22. A method for establishing a video database, comprising:
storing a plurality of video bit streams in a database; and
establishing meta-data for each video bit stream, wherein the method of establishing each meta-data comprises:
segmenting the video bit stream with a segmentation detection process to produce a plurality of segments;
indexing the segments, assigning a corresponding index tag according to the content of each segment; and
establishing the semantic pattern of each segment according to the video features of the segment, wherein the meta-data comprises at least the index tags and the semantic patterns corresponding to the segments.
23. The method for establishing a video database of claim 22, wherein the process of segmenting the video bit stream with the segmentation detection process comprises cutting the video bit stream at scene change points.
24. The method for establishing a video database of claim 23, wherein a scene change point is selected by judging, from the result of HSV histogram conversion, whether a frame is a scene change point.
25. The method for establishing a video database of claim 24, wherein the conversion result is the result obtained by comparing the distance between the HSV histograms of two adjacent frames.
26. The method for establishing a video database of claim 22, wherein the indexing of the segments assigns the corresponding index tag using the motion vector (MV) direction of each segment.
27. The method for establishing a video database of claim 22, wherein the indexing of the segments assigns the corresponding index tag using the length of each segment.
28. The method for establishing a video database of claim 22, wherein the indexing of the segments assigns the corresponding index tag using the angle of the hue in the HSV color space of each segment.
29. The method for establishing a video database of claim 22, wherein the indexing of the segments assigns the corresponding index tag using the motion vector (MV) direction of each segment.
30. The method for establishing a video database of claim 22, wherein the indexing of the segments assigns the corresponding index tag using any combination of the motion vector (MV) direction of each segment, the length of the segment, and the angle of the hue in the HSV color space of each segment.
TW99141786A 2009-08-18 2010-12-01 Video search method, system, and method for establishing a database therefor TWI443535B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
TW99141786A TWI443535B (en) 2010-12-01 2010-12-01 Video search method, system, and method for establishing a database therefor
CN2011100326411A CN102486800A (en) 2010-12-01 2011-01-27 Video searching method, system and method for establishing video database
US13/077,984 US8515933B2 (en) 2009-08-18 2011-04-01 Video search method, video search system, and method thereof for establishing video database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW99141786A TWI443535B (en) 2010-12-01 2010-12-01 Video search method, system, and method for establishing a database therefor

Publications (2)

Publication Number Publication Date
TW201224799A true TW201224799A (en) 2012-06-16
TWI443535B TWI443535B (en) 2014-07-01

Family

ID=46152293

Family Applications (1)

Application Number Title Priority Date Filing Date
TW99141786A TWI443535B (en) 2009-08-18 2010-12-01 Video search method, system, and method for establishing a database therefor

Country Status (2)

Country Link
CN (1) CN102486800A (en)
TW (1) TWI443535B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI559772B (en) * 2014-07-11 2016-11-21 聯詠科技股份有限公司 File searching method and image processing device thereof
US9641911B2 (en) 2013-12-13 2017-05-02 Industrial Technology Research Institute Method and system of searching and collating video files, establishing semantic group, and program storage medium therefor
US10152491B2 (en) 2014-07-11 2018-12-11 Novatek Microelectronics Corp. File searching method and image processing device thereof

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324685B (en) * 2013-06-03 2016-08-10 大连理工大学 The approach for video retrieval by video clip of Japanese Online Video language material
CN103345496B (en) * 2013-06-28 2016-12-28 新浪网技术(中国)有限公司 multimedia information retrieval method and system
CN103533089A (en) * 2013-11-04 2014-01-22 北京视像元素技术有限公司 Information discovering and sharing method and system for based on space-time labels
CN105530474B (en) * 2015-12-17 2019-05-21 浙江省公众信息产业有限公司 The method and system shown for controlling multi-channel video content
CN106096050A (en) * 2016-06-29 2016-11-09 乐视控股(北京)有限公司 A kind of method and apparatus of video contents search
CN106484774B (en) * 2016-09-12 2020-10-20 北京歌华有线电视网络股份有限公司 Correlation method and system for multi-source video metadata
CN108268644B (en) * 2018-01-22 2023-08-18 上海哔哩哔哩科技有限公司 Video searching method, server and video searching system
CN110121107A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video material collection method and device
CN110418206A (en) * 2019-07-16 2019-11-05 盐城师范学院 A digital content playback system
CN111506771B (en) * 2020-04-22 2021-04-02 上海极链网络科技有限公司 Video retrieval method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5404038B2 (en) * 2005-07-01 2014-01-29 ソニック ソリューションズ リミテッド ライアビリティー カンパニー Method, apparatus and system used for multimedia signal encoding
WO2007102511A1 (en) * 2006-03-09 2007-09-13 Pioneer Corporation Video image processing device, video image processing method, and video image processing program
US8112418B2 (en) * 2007-03-21 2012-02-07 The Regents Of The University Of California Generating audio annotations for search and retrieval


Also Published As

Publication number Publication date
CN102486800A (en) 2012-06-06
TWI443535B (en) 2014-07-01

Similar Documents

Publication Publication Date Title
TW201224799A (en) Video search method, system, and method for establishing a database therefor
US8515933B2 (en) Video search method, video search system, and method thereof for establishing video database
Basavarajaiah et al. Survey of compressed domain video summarization techniques
US10410679B2 (en) Producing video bits for space time video summary
US9372926B2 (en) Intelligent video summaries in information access
US8879788B2 (en) Video processing apparatus, method and system
US9600483B2 (en) Categorization of digital media based on media characteristics
US20140023341A1 (en) Annotating General Objects in Video
US20160014482A1 (en) Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
US20120128058A1 (en) Method and system of encoding and decoding media content
JP2001155169A (en) Method and system for dividing, classifying and summarizing video image
US20130133000A1 (en) Video Interaction System
US20090079840A1 (en) Method for intelligently creating, consuming, and sharing video content on mobile devices
CN108197265A (en) A kind of method and system based on short video search complete video
CN103984778B (en) A kind of video retrieval method and system
CN101216833A (en) A method, server and system for searching and providing video files
US20180077362A1 (en) Method, System, and Apparatus for Operating a Kinetic Typography Service
JP4770875B2 (en) Image feature data generation device, image feature determination device, and image search system
RU2413990C2 (en) Method and apparatus for detecting content item boundaries
CN120123970B (en) Omnimedia fusion method and system based on complementary fusion
Lie et al. News video summarization based on spatial and motion feature analysis
Chenot et al. A large-scale audio and video fingerprints-generated database of tv repeated contents
WO2014103374A1 (en) Information management device, server and control method
US20230177083A1 (en) Method and apparatus for simultaneous video retrieval and alignment
Yang et al. A repeated video clip identification system