
TWI881909B - Video similarity judgment method and video recommendation method based on video key frames - Google Patents

Video similarity judgment method and video recommendation method based on video key frames

Info

Publication number
TWI881909B
Authority
TW
Taiwan
Prior art keywords
video
scene
key
key frame
time
Prior art date
Application number
TW113131962A
Other languages
Chinese (zh)
Inventor
吳淑琴
張美惠
方俞喬
林彥佐
林峻譚
Original Assignee
台灣大哥大股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 台灣大哥大股份有限公司
Priority to TW113131962A
Application granted
Publication of TWI881909B

Landscapes

  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a video similarity determination method and a video recommendation method. First, a first video and a second video are received. A first key scene contained in the first video and a second key scene contained in the second video are identified; the first key scene has a first time proportion and the second key scene has a second time proportion. A first key frame is extracted from the first key scene of the first video, and a second key frame is extracted from the second key scene of the second video. A similarity between the first key frame and the second key frame is determined by image recognition means. A recommendation value for the first video or the second video is then calculated based on the product of the first time proportion, the second time proportion, and the similarity.

Description

Video similarity judgment method and video recommendation method based on video key frames

The present invention relates to a video similarity determination method and a video recommendation method, and in particular to such methods performed on the basis of video key frame analysis. More specifically, the method is directed to video recommendation: one or more key frames are captured from a video and analyzed as images, so that other videos related to a target video can be recommended.

Common video recommendation mechanisms generate recommendations mainly by analyzing predefined content categories and tags together with users' viewing or purchasing habits. These recommendations have a blind spot: they cannot detect videos whose content is actually similar but whose categories or tags are entirely different.

The present invention proposes a recommendation mechanism that uses image analysis to find videos with related audiovisual content and recommends them accordingly. The invention uses video key frames (keyframes) to train an image analysis model and uses the analysis results to recommend videos.

One object of the present invention is to provide a video similarity determination method based on video key frames. The method is executed by a processor and comprises: receiving a first video and a second video; identifying a first key scene contained in the first video and a second key scene contained in the second video, wherein the first key scene of the first video has a first time proportion and the second key scene of the second video has a second time proportion; extracting a first key frame from the first key scene of the first video and extracting a second key frame from the second key scene of the second video; determining a similarity between the first key frame and the second key frame by image recognition means; and determining a similarity between the first scene of the first video and the second scene of the second video based on the product of the first time proportion, the second time proportion, and the similarity.

Another object of the present invention is to provide a video recommendation method based on video key frames. The method is executed by a processor and comprises: receiving a first video and a second video, wherein the first video is a video selected for viewing by a user on a streaming video platform and the second video is one of a set of candidate videos to be recommended; identifying a first key scene contained in the first video and a second key scene contained in the second video, wherein the first key scene of the first video has a first time proportion and the second key scene of the second video has a second time proportion; extracting a first key frame from the first key scene of the first video and extracting a second key frame from the second key scene of the second video; determining a similarity between the first key frame and the second key frame by image recognition means; determining a recommendation value for the second scene of the second video based on the product of the first time proportion, the second time proportion, and the similarity; and, based on the recommendation value, presenting the second video on the streaming video platform on which the user is watching the first video.

In a specific embodiment, the first key frame and the second key frame are complete image files.

In a specific embodiment, the first time proportion is the play length of the first scene divided by the total play length of the first video, and the second time proportion is the play length of the second scene divided by the total play length of the second video.

In a specific embodiment, the data description of the first scene and of the second scene consists of at least a scene name, a key frame name, a time code description, and a time proportion.
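The four-part scene description can be pictured as a small record type. The following is a minimal sketch only: the class name `KeySceneRecord` and its field names are assumptions made here for illustration, and the sample values are illustrative rather than prescribed by the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySceneRecord:
    """One key scene entry. All names here are hypothetical, not from the patent."""

    scene_name: str         # label of the scene, e.g. "Key scene 1"
    key_frame_name: str     # file name of the key frame image, e.g. "001.jpg"
    time_code: str          # start and end of the scene as "mm:ss-mm:ss"
    time_proportion: float  # scene play length divided by total play length


# Illustrative record: a 3-second scene in a 10-second video.
record = KeySceneRecord(
    scene_name="Key scene 1",
    key_frame_name="001.jpg",
    time_code="00:00-00:03",
    time_proportion=0.3,
)
```

Making the record immutable (`frozen=True`) is a design choice of this sketch: scene data is computed once per video and then only read during recommendation.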

The method of the present invention is directed to video recommendation. By capturing one or more key frames from a video and analyzing them as images, other videos related to the target video are recommended, which improves the user experience of the video platform and user stickiness.

The present invention is described more fully below with reference to the drawings, which show specific exemplary embodiments by way of illustration. The claimed subject matter may, however, be embodied in many different forms, so the construction of the covered or claimed subject matter is not limited to any exemplary embodiment disclosed in this specification; the exemplary embodiments are merely illustrative. Likewise, the present invention is intended to provide a reasonably broad scope for the claimed or covered subject matter. In addition, the drawings and illustrations are generally not drawn to scale and are not intended to correspond to actual relative dimensions.

For consistency and ease of understanding, the same features are marked with reference numerals in the exemplary drawings (although in some examples they are not so marked). Features in different embodiments may differ in other respects, however, and should not be narrowly limited to the features shown in the drawings. The terms "first", "second", and the like in the specification and the drawings are used to distinguish different objects, not to describe a specific order.

The "video" referred to in the present invention means, in particular, a video provided by a streaming service platform and its technology, such as a movie trailer, but is not limited thereto. FIG. 1 illustrates the composition of a video, which contains multiple scenes, such as scene 1 to scene 10, with different play lengths. Each scene is composed of multiple frames. A "frame" in the present invention represents one image. For example, scene 4 is composed of frame 4-1 to frame 4-7. The sum of the play lengths of all scenes constitutes the total play length of the video.

In one embodiment of the present invention, a video is divided into one or more key scenes and one or more non-key scenes. A "key scene" in the present invention is a scene containing a key frame (keyframe), or another custom-defined scene. A key scene is composed of multiple frames. To compress a video, it is known to designate one of the frames in a key scene as the key frame and to compress the other frames relative to it, so that the other frames retain only image change information and omit repeated information, thereby achieving image compression. The detailed technical means are well known to those skilled in the art and are not repeated here. The play length of each key scene can be described by a time code indicating from which minute and second to which minute and second the key scene plays within the video; for example, the time code may be written as "00:00-00:03". In one embodiment, each key frame is represented by a named image file, such as "001.jpg". In another embodiment, the key frame is not limited to the first frame of the key scene. In other embodiments, a key scene may contain more than one key frame, in which case one of them should be selected as the key frame of that key scene.

Generally speaking, a trailer provided by a film distributor can be regarded as a video that contains a large number of key scenes (compared with a complete film). The method of the present invention extracts the portions of a video that are key scenes and analyzes the relationship between their key frames and the video. Key scene data can thereby be established for each trailer, forming a key scene database covering multiple trailers.

FIG. 2 illustrates the composition of a video whose total play length is 10 seconds. The video is composed of key scenes 1 to 3 and one non-key scene. Key scene 2 is composed of five frames: the first frame is set as the key frame, and the rest are compressed frames 1 to 4.

Because a key frame is a complete image, it can be compared with other images to obtain a similarity. The time code of a key scene can be used to determine the proportion of the complete video that the key scene occupies, which the present invention calls the "time proportion".

FIG. 3 illustrates key scenes 1 to 3 of video A together with their key frame names, time codes, and time proportions. In this example, the total play length of video A is 10 seconds. According to the time codes, key scene 1 plays from second 0 to second 3, key scene 2 plays from second 4 to second 6, and key scene 3 plays from second 6 to second 10. The interval from second 3 to second 4 is a non-key scene. Key scene 1 contains a key frame named "001.jpg"; likewise, key scene 2 and key scene 3 contain the key frames "002.jpg" and "003.jpg" respectively. The time proportion of key scene 1 is the play length of the scene divided by the total play length of video A, that is, 3 seconds divided by 10 seconds, which equals 0.3. Likewise, the time proportion of key scene 2 is 0.2 and that of key scene 3 is 0.4. The key scene data of video A (key scenes, key frame images, time codes, and time proportions) is thereby established.
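The time proportions of the video A example can be reproduced from the time codes alone. The sketch below assumes time codes of the form "mm:ss-mm:ss"; the function names `parse_mmss` and `time_proportion` are hypothetical helpers introduced here, not part of the described system.

```python
def parse_mmss(stamp: str) -> int:
    """Convert an "mm:ss" stamp to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def time_proportion(time_code: str, total_seconds: int) -> float:
    """Play length of a scene divided by the total play length of the video."""
    start, end = time_code.split("-")
    length = parse_mmss(end) - parse_mmss(start)
    if total_seconds <= 0 or length < 0:
        raise ValueError("invalid time code or total play length")
    return length / total_seconds


# Time codes of video A as given in the example (total play length: 10 s).
video_a = {
    "Key scene 1": "00:00-00:03",
    "Key scene 2": "00:04-00:06",
    "Key scene 3": "00:06-00:10",
}
proportions = {name: time_proportion(code, 10) for name, code in video_a.items()}
```

The proportions sum to 0.9 rather than 1.0 because the non-key scene between second 3 and second 4 is not part of any key scene.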

FIG. 4 illustrates the video recommendation system (2) of the present invention, which includes a scene recognition module (20), a scene proportion analysis module (22), a similarity analysis module (24), and a recommendation module (26). Although not shown, the video recommendation system (2) can be connected to a streaming video system and can establish key scene data for all videos provided by that system. The video recommendation system (2) is implemented mainly by one or more processors, memory, a power supply, a network interface, and the like. The video recommendation system (2) can access one or more videos in a video database (3), analyze them, and generate the corresponding key scene data. The video database (3) can be part of an existing streaming video system or can be dedicated to the video recommendation system (2).

When a user selects video A for viewing in the streaming video system, the video recommendation system (2) reads video A, or the key scene data of video A, from the video database (3). The video recommendation system (2) also reads other candidate videos to be recommended from the video database (3), such as video B to video H; these candidates are not currently selected for viewing by the user in the streaming video system. The video recommendation system (2) analyzes video B to video H to form their respective key scene data.

The scene recognition module (20) is configured to analyze and identify the one or more scenes contained in a video, together with their attributes. In one case, the description of the video file includes custom-defined scenes, and the scenes in the video can be identified by their defined names. In another case, the description of the video file does not include custom-defined scenes, and deep learning and image recognition techniques can be used to define the scenes in the video. After analyzing a video, the scene recognition module (20) forms the key scene data of that video, such as the scene number or name, the key frame file name, and the scene time code shown in FIG. 3.

The scene proportion analysis module (22) is configured to calculate, from the time code of a scene, the proportion of the entire video that the scene occupies, producing a time proportion for each scene.

The similarity analysis module (24) is configured to perform image recognition on a frame from one scene (for example, taken from video A) and a frame from another scene (for example, taken from video B) to determine how similar the two frames are. This can be achieved with known image recognition techniques, which are not the improvement of the present invention and are not described in detail here. The degree of similarity can be expressed within a numerical range, for example 0.0 to 1.0, where 1.0 indicates high similarity and 0.0 indicates low similarity. In one embodiment, the image recognition can be implemented with a VGG (Visual Geometry Group) model. In another embodiment, the similarity can be determined from the content of the two frames; for example, if the same brand or similar objects can be identified in both frames, the two are judged to be highly similar.
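The text does not say how the 0.0 to 1.0 score is derived from the recognition model. One common option, shown here purely as an assumption, is to take a feature vector for each key frame (for instance from a late layer of a CNN such as VGG, not reproduced here) and use their cosine similarity, clamped to the stated range. The function name `cosine_similarity` and the toy vectors are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors, clamped to [0.0, 1.0].

    Assumption: the vectors were produced elsewhere by an image model;
    this function only compares them.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no information to compare
    return max(0.0, min(1.0, dot / (norm_a * norm_b)))


# Toy vectors: identical directions score high, orthogonal ones score low.
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
different = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Clamping matters only if the feature vectors can contain negative components; for non-negative features the cosine already lies in the 0.0 to 1.0 range.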

The recommendation module (26) is configured to determine a recommendation value for a video (such as videos B to H) and, based on the recommendation value, to rank the video and recommend it to a user of a streaming video platform. More specifically, the recommendation module (26) determines a recommendation value for a scene in the video. In one embodiment of the present invention, the "recommendation value" is the time proportion of a scene in a candidate video, multiplied by the time proportion of a scene in the video selected for viewing, multiplied by a similarity. The "similarity" is the similarity between the key frame of the scene in the candidate video and the key frame of the scene in the selected video, expressed as a numerical value. The recommendation value thus represents how similar a specific scene in a candidate video is to a specific scene in the selected video. In other words, the recommendation value of a scene is determined by the relationship between that scene and another scene.
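The definition above can be written compactly. The symbols are notation introduced here for illustration and do not appear in the patent: $w_c$ and $w_s$ are the time proportions of the candidate scene and of the selected video's scene, $k_c$ and $k_s$ are their key frames, $d$ is a scene's play length, and $D$ is the total play length of its video.

```latex
\[
R = w_{c} \cdot w_{s} \cdot \mathrm{sim}(k_{c}, k_{s}),
\qquad
w = \frac{d}{D},
\qquad
0 \le \mathrm{sim}(k_{c}, k_{s}) \le 1
\]
```

Because all three factors lie between 0 and 1, $R$ also lies between 0 and 1 and can never exceed the smaller of the two time proportions, so short scenes contribute little even when their key frames match exactly.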

FIG. 5 illustrates the correlation between key scene 1 of video A and scenes of the candidate videos. As in the FIG. 3 example, video A has a key scene 1 whose key frame is named "001.jpg", whose time code is "00:00-00:03", and whose time proportion is 0.3. Key scene 2 of video B has a time proportion of 0.1, so the recommendation value of key scene 2 of video B = 0.1 (time proportion of key scene 2 of video B) * 0.3 (time proportion of key scene 1 of video A) * 1 (similarity between the key frame of key scene 2 of video B and the key frame "001.jpg" of key scene 1 of video A) = 0.03. Key scene 2 of video C has a time proportion of 0.1, so the recommendation value of key scene 2 of video C = 0.1 (time proportion of key scene 2 of video C) * 0.3 (time proportion of key scene 1 of video A) * 0.9 (similarity between the key frame of key scene 2 of video C and the key frame "001.jpg" of key scene 1 of video A) = 0.027. Key scene 2 of video F has a time proportion of 0.1, so the recommendation value of key scene 2 of video F = 0.1 (time proportion of key scene 2 of video F) * 0.3 (time proportion of key scene 1 of video A) * 0.6 (similarity between the key frame of key scene 2 of video F and the key frame "001.jpg" of key scene 1 of video A) = 0.018. Therefore, with respect to key scene 1 of video A, key scene 2 of video B has the highest recommendation value, followed by key scene 2 of video C and then key scene 2 of video F. To simplify the explanation, this example lists only the top three.
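The three products in this example can be checked mechanically. The sketch below uses only the time proportions and similarities stated for key scene 1 of video A; the function name `recommendation_value` is a hypothetical helper, and results are rounded to three decimals to avoid floating-point noise.

```python
def recommendation_value(candidate_proportion: float,
                         watched_proportion: float,
                         similarity: float) -> float:
    """Product of the two time proportions and the key frame similarity."""
    return candidate_proportion * watched_proportion * similarity


WATCHED_PROPORTION = 0.3  # key scene 1 of video A

# candidate video -> (time proportion of its key scene 2, similarity to "001.jpg")
candidates = {"B": (0.1, 1.0), "C": (0.1, 0.9), "F": (0.1, 0.6)}

values = {
    video: round(recommendation_value(p, WATCHED_PROPORTION, s), 3)
    for video, (p, s) in candidates.items()
}
ranking = sorted(values, key=values.get, reverse=True)
```

Running the arithmetic gives 0.03 for video B, 0.027 for video C, and 0.018 for video F, so the order B, C, F follows directly from the similarities, since the time proportions are identical across the three candidates.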

FIG. 6 illustrates the correlation between key scene 2 of video A and scenes of the candidate videos. As in the FIG. 3 example, video A has a key scene 2 whose key frame is named "002.jpg", whose time code is "00:04-00:06", and whose time proportion is 0.2. Key scene 5 of video B has a time proportion of 0.2, so the recommendation value of key scene 5 of video B = 0.2 (time proportion of key scene 5 of video B) * 0.2 (time proportion of key scene 2 of video A) * 1 (similarity between the key frame of key scene 5 of video B and the key frame "002.jpg" of key scene 2 of video A) = 0.04. Key scene 5 of video D has a time proportion of 0.2, so the recommendation value of key scene 5 of video D = 0.2 (time proportion of key scene 5 of video D) * 0.2 (time proportion of key scene 2 of video A) * 0.8 (similarity between the key frame of key scene 5 of video D and the key frame "002.jpg" of key scene 2 of video A) = 0.032. Key scene 5 of video G has a time proportion of 0.2, so the recommendation value of key scene 5 of video G = 0.2 (time proportion of key scene 5 of video G) * 0.2 (time proportion of key scene 2 of video A) * 0.6 (similarity between the key frame of key scene 5 of video G and the key frame "002.jpg" of key scene 2 of video A) = 0.024. Therefore, with respect to key scene 2 of video A, key scene 5 of video B has the highest recommendation value, followed by key scene 5 of video D and then key scene 5 of video G. To simplify the explanation, this example lists only the top three.

FIG. 7 illustrates the correlation between key scene 3 of video A and scenes of the candidate videos. As in the FIG. 3 example, video A has a key scene 3 whose key frame is named "003.jpg", whose time code is "00:06-00:10", and whose time proportion is 0.4. Key scene 8 of video C has a time proportion of 0.3, so the recommendation value of key scene 8 of video C = 0.3 (time proportion of key scene 8 of video C) * 0.4 (time proportion of key scene 3 of video A) * 1 (similarity between the key frame of key scene 8 of video C and the key frame "003.jpg" of key scene 3 of video A) = 0.12. Key scene 8 of video E has a time proportion of 0.3, so the recommendation value of key scene 8 of video E = 0.3 (time proportion of key scene 8 of video E) * 0.4 (time proportion of key scene 3 of video A) * 0.8 (similarity between the key frame of key scene 8 of video E and the key frame "003.jpg" of key scene 3 of video A) = 0.096. Key scene 8 of video H has a time proportion of 0.3, so the recommendation value of key scene 8 of video H = 0.3 (time proportion of key scene 8 of video H) * 0.4 (time proportion of key scene 3 of video A) * 0.7 (similarity between the key frame of key scene 8 of video H and the key frame "003.jpg" of key scene 3 of video A) = 0.084. Therefore, with respect to key scene 3 of video A, key scene 8 of video C has the highest recommendation value, followed by key scene 8 of video E and then key scene 8 of video H. To simplify the explanation, this example lists only the top three.

FIG. 8 lists the total recommendation values of the other videos. According to the recommendation values analyzed in FIGS. 5, 6, and 7, the top three candidates for key scene 1 of video A are key scene 2 of video B, key scene 2 of video C, and key scene 2 of video F; the top three candidates for key scene 2 of video A are key scene 5 of video B, key scene 5 of video D, and key scene 5 of video G; and the top three candidates for key scene 3 of video A are key scene 8 of video C, key scene 8 of video E, and key scene 8 of video H. If only the top three recommendations are considered for each scene, the candidates recommended for video A are ranked by the sum of each candidate's recommendation values. In the above example, video C is the first recommendation (recommendation value = 0.12 + 0.027 = 0.147, the highest), video E is the second recommendation (recommendation value = 0.096, the second highest), and video H is the third recommendation (recommendation value = 0.084, the third highest). The recommendation module (26) determines a recommendation ranking of the candidate videos based on the sums of the recommendation values.
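The totals can be reproduced by summing, per candidate video, the nine scene-level values from the examples for FIGS. 5 to 7. This is a minimal sketch: the tuple layout and variable names are assumptions of the sketch, while the numbers are the time proportions and similarities stated in those examples.

```python
from collections import defaultdict

# (candidate video, candidate scene proportion, video A scene proportion, similarity)
top_matches = [
    ("B", 0.1, 0.3, 1.0), ("C", 0.1, 0.3, 0.9), ("F", 0.1, 0.3, 0.6),  # vs key scene 1
    ("B", 0.2, 0.2, 1.0), ("D", 0.2, 0.2, 0.8), ("G", 0.2, 0.2, 0.6),  # vs key scene 2
    ("C", 0.3, 0.4, 1.0), ("E", 0.3, 0.4, 0.8), ("H", 0.3, 0.4, 0.7),  # vs key scene 3
]

# Sum the scene-level recommendation values for each candidate video.
totals = defaultdict(float)
for video, candidate_w, watched_w, similarity in top_matches:
    totals[video] += candidate_w * watched_w * similarity

# Highest total first; this is the recommendation order.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
top_three = [video for video, _ in ranked[:3]]
```

With these inputs the top three are videos C, E, and H. Video B, which appears twice (0.03 + 0.04 = 0.07), lands in fourth place, a figure derived from the examples rather than stated in the text.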

FIG. 9 shows the video recommendation method of the present invention.

In step S900, a video is received; its key scenes and one key frame in each key scene are analyzed and extracted; and a description of the play length of each key scene is obtained and converted into a time proportion representing that scene. Scene data specific to each video is formed and stored in the video database for use when recommending videos.

In step S902, key frames of different videos are used to determine similar scenes. For example, image recognition is performed on a key frame of key scene 1 of video A and a key frame of key scene 2 of video B, and the similarity between the two is calculated. If both key frames are images containing a car, the two scenes are relatively similar; conversely, if one key frame contains a car and the other contains a boat, the two scenes are relatively dissimilar. The similarity can be described with a numerical range, such as from 0.0 for low correlation to 1.0 for high correlation.

In step S904, a recommendation value of each candidate video relative to the video selected for viewing (that is, the total recommendation value shown in FIG. 8) is calculated based on the time proportion of each key scene of each video and the similarity between key scenes of different videos.

In step S906, a video recommendation ranking is generated based on the recommendation values of all candidate videos. The ranking indicates a subset of the candidate videos to be presented to a user with priority, so that after choosing to watch a video the user is offered the recommended candidates as viewing options.
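Steps S902 to S906 can be combined into one routine once the scene data of step S900 exists. The following is a rough sketch under stated assumptions, not the patented implementation: the function `recommend`, its parameters `per_scene_top` and `top_k`, the `Scene` tuple layout, and the toy data are all hypothetical, and the similarity measure is passed in as a callable because the text leaves it to known image recognition techniques.

```python
from typing import Callable, Dict, List, Tuple

Scene = Tuple[str, float]  # (key frame name, time proportion)


def recommend(watched: List[Scene],
              candidates: Dict[str, List[Scene]],
              similarity: Callable[[str, str], float],
              per_scene_top: int = 3,
              top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate videos against the key scenes of the watched video."""
    totals: Dict[str, float] = {}
    for watched_frame, watched_w in watched:
        # S902 + S904: score every candidate scene against this watched scene.
        scored = [
            (w * watched_w * similarity(frame, watched_frame), video)
            for video, scenes in candidates.items()
            for frame, w in scenes
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Keep only the best matches per watched scene, then sum per video.
        for value, video in scored[:per_scene_top]:
            totals[video] = totals.get(video, 0.0) + value
    # S906: order candidates by total recommendation value.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy data: the similarity table stands in for a real image recognition model.
table = {("x1.jpg", "a1.jpg"): 1.0, ("y1.jpg", "a1.jpg"): 0.5}
demo = recommend(
    watched=[("a1.jpg", 0.5)],
    candidates={"X": [("x1.jpg", 0.4)], "Y": [("y1.jpg", 0.4)]},
    similarity=lambda frame, other: table.get((frame, other), 0.0),
)
```

Passing the similarity function as an argument keeps the ranking logic independent of whichever image model is used, which mirrors the separation between the similarity analysis module and the recommendation module.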

It should be understood, however, that the specific embodiments of the present invention are for illustration only. Various changes can be made without departing from the scope and spirit of the claims of the present invention, and all such changes fall within the patent scope of the present invention. The specific embodiments described in this specification are therefore not intended to limit the present invention; the true scope and spirit of the present invention are disclosed in the following claims.

2: Recommendation system
20: Scene recognition module
22: Scene proportion analysis module
24: Similarity analysis module
26: Recommendation module
S900 to S906: Steps

The present invention can be further understood with reference to the following drawings and descriptions. Non-limiting and non-exhaustive examples are described with reference to the following drawings. The components in the drawings are not necessarily drawn to actual size; the emphasis is on illustrating structure and principle.

FIG. 1 schematically shows the composition of a video.

FIG. 2 schematically shows the composition of a video containing key scenes and non-key scenes.

FIG. 3 illustrates key scenes 1 to 3 in video A.

FIG. 4 schematically shows the video recommendation system of the present invention.

FIG. 5 illustrates the correlation between key scene 1 of video A and the scenes of other candidate videos.

FIG. 6 illustrates the correlation between key scene 2 of video A and the scenes of other candidate videos.

FIG. 7 illustrates the correlation between key scene 3 of video A and the scenes of other candidate videos.

FIG. 8 lists the total recommendation values of the other videos.

FIG. 9 shows the video recommendation method of the present invention.

S900 to S906: Steps

Claims (8)

1. A method for determining video similarity based on video key frames, the method being executed by a processor and comprising:
receiving a first video and a second video;
identifying a first key scene contained in the first video, and identifying a second key scene contained in the second video, the first key scene of the first video having a first time proportion, and the second key scene of the second video having a second time proportion;
extracting a first key frame from the first key scene of the first video, and extracting a second key frame from the second key scene of the second video;
determining, using image recognition means, a similarity between the first key frame and the second key frame; and
determining a similarity between the first scene of the first video and the second scene of the second video based on the product of the first time proportion, the second time proportion, and the similarity.

2. The method of claim 1, wherein the first key frame and the second key frame are complete image files.

3. The method of claim 1, wherein the first time proportion is a playback length of the first scene divided by the total playback length of the first video, and the second time proportion is a playback length of the second scene divided by the total playback length of the second video.

4. The method of claim 1, wherein the data description of the first scene and of the second scene comprises at least a scene name, a key frame name, a time code description, and a time proportion.

5. A video recommendation method based on video key frames, the method being executed by a processor and comprising:
receiving a first video and a second video, wherein the first video is a video selected for viewing by a user on a streaming video platform, and the second video is one of a set of candidate videos to be recommended;
identifying a first key scene contained in the first video, and identifying a second key scene contained in the second video, the first key scene of the first video having a first time proportion, and the second key scene of the second video having a second time proportion;
extracting a first key frame from the first key scene of the first video, and extracting a second key frame from the second key scene of the second video;
determining, using image recognition means, a similarity between the first key frame and the second key frame;
determining a recommendation value of the second scene of the second video based on the product of the first time proportion, the second time proportion, and the similarity; and
presenting the second video on the streaming video platform based on the recommendation value.

6. The method of claim 5, wherein the first key frame and the second key frame are complete image files.

7. The method of claim 5, wherein the first time proportion is a playback length of the first scene divided by the total playback length of the first video, and the second time proportion is a playback length of the second scene divided by the total playback length of the second video.

8. The method of claim 5, wherein the data description of the first scene and of the second scene comprises at least a scene name, a key frame name, a time code description, and a time proportion.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW113131962A TWI881909B (en) 2024-08-26 2024-08-26 Video similarity judgment method and video recommendation method based on video key frames

Publications (1)

Publication Number Publication Date
TWI881909B true TWI881909B (en) 2025-04-21

Family

ID=96142059

Country Status (1)

Country Link
TW (1) TWI881909B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102955687A (en) * 2011-08-19 2013-03-06 联想(北京)有限公司 Method and equipment for picture presentation
US10853813B2 (en) * 2012-11-14 2020-12-01 The 41St Parameter, Inc. Systems and methods of global identification
TW202228048A (en) * 2021-01-12 2022-07-16 威聯通科技股份有限公司 Content recommendation system and content recommendation method
CN115146065A (en) * 2022-09-02 2022-10-04 安徽商信政通信息技术股份有限公司 Intelligent information reporting similar content merging method and system
