WO2017149447A1 - A system and method for providing real time media recommendations based on audio-visual analytics

Info

Publication number: WO2017149447A1
Application number: PCT/IB2017/051160
Authority: WO
Inventors: Abhinav GURU
Original assignee: Guru Abhinav
Priority date: 2016-02-29
Filing date: 2017-02-28
Publication date: 2017-09-08

Abstract

The present invention provides a system (100) for generating real time media recommendations for a video feed indicating a currently broadcasted or streamed video content to the user. For this purpose, the system (100) generates metadata based on the currently viewed video frame using audio-visual analytics techniques. The system (100) uses the generated metadata to identify media that is related to the current video frame viewed by the user and provides real time dynamic media recommendations to the user. The system (100) is also capable of providing real time contextual media recommendations embedded within the audio video stream based on the continuous analysis of the live or recorded video stream, as the video stream is being played to the user/group of users.

Description

TITLE OF THE INVENTION

A system and method for providing real time media recommendations based on audio-visual analytics

[0001] DESCRIPTION OF THE INVENTION:

[0002] Technical field of the invention

[0003] The present invention relates to a system and method for providing real time media recommendations and more specifically relates to a system for providing real time media recommendations using audio-visual analytics conducted at frame level optionally enhanced based on user profile/user viewing history and/or other metrics.

[0004] Background of the invention

[0005] Today, most of the current video sharing systems such as YouTube® and Vimeo® provide media recommendations based on the playback location of the current video frame watched by the user. However, these video sharing systems provide recommendations only after analyzing the entire video content. Further, the recommendations provided by the current video sharing systems remain the same throughout the video playback.

[0006] To overcome the above mentioned drawbacks, a system and method for providing dynamic video segment recommendation based on the current playback location of the video, as disclosed in the US patent document US20100199295A1 (referred to herein as '295), was introduced. The term dynamic as disclosed in '295 means that the media recommendation changes as the scene changes. However, the system as disclosed in '295 generates recommendations only after analyzing the video content completely. Further, the recommendation generation is a one-time process and may not necessarily be repeated for each and every user. Further, the recommendations appear either as an advertisement in session or as a separate list of interactive options.

[0007] Hence, the system as disclosed in '295 does not provide effective media recommendations for the video content which has no prior recorded information.

[0008] Therefore, there exists a need for a system and method to provide real time media recommendations for recorded as well as live video streams. Further, there also exists a need to provide real time media recommendations that can be inserted at frame level.

[0009] Summary of the invention:

[0010] The present invention relates to a system for providing real time media recommendations to the user, at frame level, for a video that is being watched by the user. For this purpose, the system of the present invention comprises a real time recommendation engine to generate metadata based on the currently broadcasted or streamed video content to the user. The real time recommendation engine uses the generated metadata to identify media content that is relevant to the current video frame watched by the user so as to provide real time dynamic media recommendations to a video consumption device. Here, the system saves the generated metadata in a data store for future reference.

[0011] In accordance to one embodiment of the present invention, the real time recommendation engine generates metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module and a visual analytics module. Further, the real time recommendation engine uses the generated metadata to search for similar media content so as to provide real time contextual media recommendations to the user.

[0012] In accordance to one embodiment of the present invention, the real time recommendation engine checks whether the video feed has been previously broadcasted before generating metadata and, if so, uses the saved metadata to make recommendations for the previously broadcasted video feed. Further, the recommendations may also appear as a part of the currently viewed audio/video feed.

[0013] In accordance to one embodiment of the present invention, the real time recommendation engine provides real time contextual media recommendations based on the user's past viewing history, user preferences and/or metadata generated using the audio analytics module and visual analytics module.

[0014] The method for providing real time media recommendations to the user or a group of users for a video at frame level comprises the steps of receiving a video feed indicating a currently broadcasted or streamed video content to the user or the group of users and generating metadata based on the video frame currently viewed by the user, using audio-visual analytics techniques. The method then uses the generated metadata to identify relevant media that is similar to the current video frame viewed by the user so as to provide real time dynamic media recommendations to the user/group of users.

[0015] Thus, the system and method of the present invention overcome the drawbacks of the prior art by continuously providing dynamic real time media recommendations for both recorded and live video feeds to the user/group of users. Further, the system may also provide recommendations as a part of the currently viewed audio/video feed.

[0016] Brief description of the drawings:

[0017] The foregoing and other features of embodiments will become more apparent from the following detailed description of embodiments when read in conjunction with the accompanying drawings. In the drawings, like reference numerals refer to like elements.

[0018] FIG 1 illustrates a system for providing real time media recommendations to the user, in accordance to one or more embodiment of the present invention.

[0019] FIG 2 illustrates a method for providing real time media recommendations to the user, in accordance to one or more embodiment of the present invention.

[0020] Detailed description of the invention:

[0021] Reference will now be made in detail to the description of the present subject matter, one or more examples of which are shown in figures. Each example is provided to explain the subject matter and not a limitation. Various changes and modifications obvious to one skilled in the art to which the invention pertains are deemed to be within the spirit, scope and contemplation of the invention.

[0022] The present invention provides a system and method for generating real time media recommendations for a video feed indicating a currently broadcasted or streamed video content to the user. For this purpose, the system generates metadata based on the currently viewed video frame using audio-visual analytics techniques. The system uses the generated metadata to identify relevant media for the current video frame viewed by the user and provides real time dynamic media recommendations to the user/group of users.

[0023] FIG 1 illustrates a system for providing real time media recommendations to the user, in accordance to one or more embodiment of the present invention. The system (100) comprises a real time recommendation engine (101) to generate metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module (101a) and a visual analytics module (101b). The real time recommendation engine (101) uses the generated metadata to identify media content that is relevant to the current video frame watched by the user, and then provides real time dynamic media recommendations (104) to a video consumption device (106).

[0024] In accordance to one embodiment of the present invention, the video consumption device (106) may either pull or poll real time media recommendations for the current video frame watched by the user. Alternatively, the recommendations might be embedded into the audio/video stream either at the video consumption device (106) or in the broadcasted feed.
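
By way of a non-limiting illustration, the pull/poll alternative may be sketched as follows. The sketch assumes an in-memory store keyed by playback position; the names RecommendationStore, publish and poll are hypothetical and are not part of the disclosure.

```python
from bisect import bisect_right


class RecommendationStore:
    """Hypothetical in-memory store of recommendations keyed by playback time."""

    def __init__(self):
        self._times = []   # sorted playback positions, in seconds
        self._items = []   # recommendation lists, parallel to _times

    def publish(self, position_s, recommendations):
        # Engine side: record recommendations for a playback position.
        idx = bisect_right(self._times, position_s)
        self._times.insert(idx, position_s)
        self._items.insert(idx, list(recommendations))

    def poll(self, position_s):
        # Device side: fetch the latest entry at or before the given position.
        idx = bisect_right(self._times, position_s)
        return self._items[idx - 1] if idx else []
```

In such a sketch the video consumption device would call poll with its current playback position at whatever frequency it chooses, while the engine calls publish as new recommendations become available.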

[0025] In accordance to one embodiment of the present invention, the real time recommendation engine (101) uses the audio analytics module (101a) to generate metadata based upon the received audio stream for the current video frame watched by the user. The audio analytics module (101a) is configured to separate the audio stream from the received video feed and convert the received audio stream into audio text data. The audio analytics module (101a) uses the converted audio text data to identify metadata associated with the audio stream. The audio analytics module (101a) further analyzes the audio text data to identify domains associated with the metadata. The system (100) then stores the identified audio metadata along with their associated domains in a database, for example an analytics database (102).

[0026] In accordance to one or more embodiment of the present invention, possible metadata could be names or topics such as celebrity names, brand names, landmarks, weather forecast, news, art, entertainment, real estate, technologies etc. The audio analytics module (101a), upon identification of the metadata from the audio text, looks for the possible domains that may be associated with the identified metadata. For instance, if the metadata is a celebrity name, then the audio analytics module (101a) further analyses the audio text to find the domain, such as the film, sports, music or dance industry, associated with that celebrity name. The audio analytics module (101a) is also capable of providing a list of similar celebrity names in the same domain and storing it in the analytics database (102).
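
By way of a non-limiting illustration, the identification of metadata and associated domains from the audio text data may be sketched as follows. The sketch assumes that speech-to-text conversion has already been performed and substitutes a simple lookup table for the domain analysis; LEXICON, extract_audio_metadata and store_metadata are hypothetical names, and a plain dictionary stands in for the analytics database (102).

```python
import re

# Hypothetical lexicon mapping known entity names to (type, domain).
LEXICON = {
    "sachin": ("celebrity", "cricket"),
    "eiffel tower": ("landmark", "travel"),
}


def extract_audio_metadata(audio_text, lexicon=LEXICON):
    """Return (name, type, domain) tuples for lexicon entries found in the text."""
    text = audio_text.lower()
    found = []
    for name, (kind, domain) in lexicon.items():
        # Whole-word match so that a name is not matched inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind, domain))
    return found


def store_metadata(analytics_db, feed_id, metadata):
    # analytics_db is a plain dict standing in for the analytics database (102).
    analytics_db.setdefault(feed_id, []).extend(metadata)
```

A practical implementation would likely replace the lookup table with named-entity recognition and topic classification; the table is used here only to keep the sketch self-contained.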

[0027] In accordance to another embodiment of the present invention, the visual analytics module (101b) of the real time recommendation engine (101) is configured to recognize faces, landmarks and objects present in the current video frame, using face detection and object detection techniques. This visual recognition process is accelerated by identifying possible matches of people, landmarks and objects based on the metadata obtained from the audio analytics module (101a). For example, if the metadata identified from the audio analytics module (101a) is a celebrity name such as 'Sachin' and the associated domain is cricket, then the visual analytics module (101b) would narrow the set of possible matches of the faces to the entities related to 'Sachin', thereby speeding up the visual recognition process. Thus, the real time recommendation engine (101) provides real time recommendations using audio-visual analytics techniques.
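
By way of a non-limiting illustration, narrowing the set of visual candidates using audio-derived domains may be sketched as follows. The sketch assumes that faces are already represented as numeric feature vectors and compares them by cosine similarity; the gallery schema, the threshold value and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognise_face(face_vec, gallery, audio_domains, threshold=0.8):
    """gallery: list of dicts with 'name', 'domain', 'vec' (hypothetical schema)."""
    # Restrict the search to entries whose domain was mentioned in the audio;
    # fall back to the full gallery when the audio gives no usable hint.
    shortlist = [g for g in gallery if g["domain"] in audio_domains] or gallery
    best = max(shortlist, key=lambda g: cosine(face_vec, g["vec"]), default=None)
    if best is None or cosine(face_vec, best["vec"]) < threshold:
        return None, len(shortlist)
    return best["name"], len(shortlist)
```

The second return value is the number of gallery entries actually compared, which makes the reduction in search effort visible when a domain hint is available.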

[0028] In accordance to one embodiment of the present invention, the real time recommendation engine (101) searches for similar media content using the metadata obtained from the audio analytics module (101a) and visual analytics module (101b) for providing real time dynamic media recommendations (104) with or without using a recommendation box (105) or embedding recommendations as a part of the audio/video feed. Further, the recommendation box (105) may provide a list of recommendations to the user in a collapsed form and the list may be expanded by a simple action such as by a mouse click or touch or voice command or remote key press etc over the recommendation box (105) or in case of embedded recommendations the user or user group would view or hear the recommendation as part of the viewing process.
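
By way of a non-limiting illustration, the search for similar media content using the generated metadata may be sketched as follows. The sketch assumes that both the current frame and each catalogue item are described by a set of tags and ranks items by Jaccard overlap; this similarity measure and the catalogue schema are assumptions and are not specified in the disclosure.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(frame_tags, catalog, top_k=3):
    """catalog: dict of media id -> iterable of tags (hypothetical schema)."""
    scored = [(jaccard(frame_tags, tags), media_id) for media_id, tags in catalog.items()]
    # Highest overlap first; ties broken by media id for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Items sharing no tag with the frame are not recommended.
    return [media_id for score, media_id in scored[:top_k] if score > 0]
```

The resulting list could populate the recommendation box (105) or be passed to whatever component embeds recommendations in the feed.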

[0029] In accordance to an alternate embodiment of the present invention, the recommendations present in the recommendation list may be based on the currently broadcasted or streamed video content to the user. Alternatively, the recommendations may also be based on the content that has been streamed/broadcasted overall. Thus, the real time recommendation engine (101) is capable of providing recommendations based on the overall content or the current video content streamed to the user/group of user. Further, the recommendations provided using audio visual analytics techniques may be further personalized based on the user profile, group profile, user past viewing history, group past viewing history, user preferences and/or group preferences.
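
By way of a non-limiting illustration, personalizing a list of recommendations using user or group preferences may be sketched as follows. The sketch assumes that each candidate carries a relevance score from the audio-visual analytics and that preferences are expressed as per-tag affinities between 0 and 1; the linear blend and the weight parameter are assumptions.

```python
def personalise(candidates, preferences, weight=0.5):
    """Re-rank candidates by blending content relevance with user affinity.

    candidates: list of (media_id, relevance, tags) tuples.
    preferences: dict of tag -> affinity in [0, 1] (hypothetical profile format).
    """
    def score(item):
        media_id, relevance, tags = item
        # Use the strongest affinity among the item's tags, or 0 if none match.
        affinity = max((preferences.get(t, 0.0) for t in tags), default=0.0)
        return (1 - weight) * relevance + weight * affinity

    return [c[0] for c in sorted(candidates, key=score, reverse=True)]
```

With weight set to 0 the ordering depends only on the content relevance, which corresponds to recommendations that are not personalized.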

[0030] In accordance to another embodiment of the present invention, the real time recommendation engine (101) also provides real time contextual media recommendations based on the user/group past viewing history, user/group preferences and/or metadata generated using the audio analytics module (101a) and visual analytics module (101b). The real time recommendation engine (101) is also capable of providing real time contextual media recommendations at a predetermined time interval, say every one second, and/or based on the user/group settings for the frequency/granularity of recommendations.

[0031] The system (100) may also use a user identification system to provide more specific personalized real time dynamic media recommendations to the user. For this purpose, the system (100) may use one or more cameras, a radio-frequency identification (RFID) reader, a fingerprint scanner, a deoxyribonucleic acid (DNA) sequence based identifier, a voice recognition system or a retina based identification system, known in the art or future-developed, for electronically identifying/recognizing the user of the system (100). Thus, the system (100), upon identifying the user/group using the above user/group identification system, provides personalized media recommendations to the user based on the user profile, user past viewing history and/or user preferences.
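
By way of a non-limiting illustration, refreshing recommendations at a predetermined time interval (for example, every one second) may be sketched as follows. The function name and its parameters are hypothetical; the interval would in practice come from the user/group settings.

```python
def recommendation_times(duration_s, interval_s=1.0):
    """Yield playback times (seconds) at which recommendations are refreshed."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    n = 0
    # Multiply rather than accumulate to avoid floating-point drift.
    while n * interval_s <= duration_s:
        yield n * interval_s
        n += 1
```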

[0032] In accordance to one embodiment of the present invention, the system (100) is also capable of providing real time contextual media recommendations for the currently broadcasted or streamed television content such as news, serials, sports or movies. For instance, the system (100) may provide real time contextual based media recommendations such as interactive media timeline, statistics, and profiles for events such as a news story related to a terrorist attack, ongoing war, murder mystery, election constituency, actor, sports/sportsman, presenter, politicians, animals, birds, location etc.

[0033] In accordance to one embodiment of the present invention, the real time user/group tailored recommendations may appear as a part of currently broadcasted or streamed video content itself in either visual or audio form so that the consumption of the recommendations is seamless for the user/group of users as a part of natural viewing process.

[0034] The system (100) is also capable of providing real time medical health recommendations to the user based on the currently broadcasted or streamed medical health related video content to the user. Here, the possible medical health recommendations may be related to medical health conditions, symptoms or diagnosis related to the video feed currently viewed by the user.

[0035] In accordance to one embodiment of the present invention, the system (100) is capable of providing real time medical health recommendations taking into consideration the currently broadcasted or streamed video content to the user, user health history, geographic location, and/or socio-economic status of the user.

[0036] The system (100) of the present invention is also capable of providing real time media recommendations tailored to the education field by providing its users individualized education content specific to the aptitude of the student/learner and relevant to the video frame/segment/topic that is being consumed, so as to enhance the rate of learning and enable more just-in-time learning for people of all age groups and fields.

[0037] In accordance to one or more embodiment of the present invention, the video consumption device (106) used herein includes a smart phone, a cellular phone, a personal digital assistant (PDA), a personal computer, a set top box, a streaming media player, a smart TV, a laptop or any similar computing or video consumption device that can be used by the user.

[0038] In accordance to one embodiment of the present invention, the real time recommendation engine (101) saves the metadata generated using the audio analytics module (101a) and the visual analytics module (101b).

[0039] In accordance to one embodiment of the present invention, the real time recommendation engine (101) checks whether the received video feed has been previously broadcasted before generating metadata using audio-visual analytics techniques. Further, upon identifying that the video feed has been previously broadcasted, the real time recommendation engine (101) uses the saved metadata to make recommendations, if found optimal.
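
By way of a non-limiting illustration, reusing saved metadata for a previously broadcasted feed may be sketched as follows. The sketch assumes that a feed can be identified by hashing a sample of its bytes, which only detects byte-identical rebroadcasts; the function names and the saved dictionary are hypothetical, and the check of whether the saved metadata is optimal is not modelled.

```python
import hashlib


def feed_fingerprint(sample_bytes):
    # Assumption: a hash of a leading sample of the feed identifies a rebroadcast.
    return hashlib.sha256(sample_bytes).hexdigest()


def metadata_for_feed(sample_bytes, saved, analyse):
    """Return (metadata, reused), using saved metadata when the feed was seen before."""
    key = feed_fingerprint(sample_bytes)
    if key in saved:
        return saved[key], True
    metadata = analyse(sample_bytes)   # stand-in for the audio-visual analytics
    saved[key] = metadata
    return metadata, False
```

A feed that has been re-encoded or trimmed would produce a different hash, so a content-based (perceptual) fingerprint would likely be needed in practice.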

[0040] FIG 2 illustrates a method for providing real time media recommendations to the user, in accordance to one or more embodiment of the present invention. The method for providing real time dynamic media recommendations to the user comprises the steps of receiving a video feed indicating a currently broadcasted or streamed video content to the user/group of users at step 201. At step 202, the method generates metadata based on the video frame currently viewed by the user/group of users, using audio-visual analytics techniques. The method further uses the generated metadata to identify video frames or relevant media that is similar to the current video frame viewed by the user/group of users at step 203, and provides dynamic real time media recommendations based on the identified relevant media that is similar to the current video frame watched by the user/group of users at step 204.

[0041] In accordance to one or more embodiment of the present invention, the method may continuously push recommendations to the video consumption device (106) commonly used today such as a television, laptop, streaming device, or a mobile phone. Alternatively, the video consumption device (106) may pull or poll real time media recommendations for the current video frame watched by the user/group of users. Further, the recommendations may alternatively be embedded into the audio/video stream during transmission/streaming of video or dynamically inserted in real time into the stream at the video consumption device (106), so that the recommendations are part of the video and their consumption would be seamless for the user/group of users.

[0042] Thus, the present invention overcomes the drawbacks of the prior art by providing a system (100) and method for generating real time media recommendations for a video at frame level that is being watched by the user. The system (100) of the present invention is capable of providing real time media recommendations for both recorded as well as live video feed continuously.

[0043] While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration in any way.
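
By way of a non-limiting illustration, the method of FIG 2 (steps 201 to 204) may be sketched as a per-frame loop. The three callables generate_metadata, find_related and deliver are hypothetical hooks standing in for the audio-visual analytics, the media search and the delivery to the video consumption device (106).

```python
def recommend_for_feed(frames, generate_metadata, find_related, deliver):
    """Run steps 201 to 204 for each frame of a feed."""
    delivered = []
    for frame in frames:                        # step 201: receive the feed
        metadata = generate_metadata(frame)     # step 202: audio-visual analytics
        related = find_related(metadata)        # step 203: identify relevant media
        deliver(frame, related)                 # step 204: provide recommendations
        delivered.append(related)
    return delivered
```

Because frames may be any iterable, the same loop applies to a recorded file and to a live feed that yields frames as they arrive.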

Claims

[0046] Claims

[0047] I Claim,

1. A system (100) for providing real time media recommendations based on audio-visual analytics, the system (100) comprising:

a) a real time recommendation engine (101) to generate metadata based on currently broadcasted or streamed video content to the user using an audio analytics module (101a) and a visual analytics module (101b); and

b) a video consumption device (106), wherein the real time recommendation engine (101) uses the generated metadata to identify media content that is relevant to the current video frame watched by the user, and then provides real time dynamic media recommendations (104) on the video consumption device (106).

2. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is a part of a video consumption device (106) or remotely linked to the video consumption device (106).

3. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) uses the audio analytics module (101a) to generate metadata based upon the received audio stream for the current video frame watched by the user/group of users.

4. The system (100) as claimed in claim 3, wherein the audio analytics module (101a) is configured to:

a) separate the audio stream from the received video feed;

b) convert the received audio stream into audio text data;

c) identify metadata associated with the converted audio text data;

d) analyze the audio text data to identify domains associated with metadata;

e) identify domains associated with metadata; and

f) store the identified audio metadata along with the associated domain in an analytics database (102).

5. The system (100) as claimed in claim 1, wherein the visual analytics module (101b) is configured to recognize faces, landmarks and objects present in the current video frame.

6. The system (100) as claimed in claim 5, wherein the visual recognition process is accelerated by identifying possible matches of people, landmarks and objects based on the metadata obtained from the audio analytics module (101a).

7. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is configured to:

a) search for similar media content using the metadata obtained from the audio analytics module (101a) and visual analytics module (101b); and

b) provide real time contextual media recommendations.

8. The system (100) as claimed in claim 7, wherein the real time recommendation engine (101) provides real time contextual based media recommendations based on the user/group past viewing history, user /group preferences and/or metadata generated using the audio analytics module (101a) and visual analytics module (101b).

9. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) provides recommendations based on the content that has been streamed/broadcasted overall.

10. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is also capable of providing real time contextual media recommendations.

11. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) saves the metadata generated using the audio analytics module (101a) and visual analytics module (101b).

12. The system (100) as claimed in claim 11, wherein the real time recommendation engine (101) checks whether the video feed has been previously broadcasted before generating metadata.

13. The system (100) as claimed in claim 12, wherein the system (100) uses the saved metadata to make recommendations for the previously broadcasted video feed.

14. The system (100) as claimed in claim 1, wherein the system (100) uses a user identification system to identify the user/group so as to provide personalized media recommendations to the user/group based on the user/group profile, user/group past viewing history and/or user/group preferences.

15. The system (100) as claimed in claim 1, wherein the system (100) provides real time contextual based media recommendations for the currently broadcasted or streamed television/educational content to the user.

16. The system (100) as claimed in claim 1, wherein the real time dynamic media recommendations also appear as a part of the currently broadcasted or streamed video content.

17. The system (100) as claimed in claim 1, wherein the system (100) provides real time medical health recommendations to the user based on the current broadcasted or streamed medical health related video content to the user.

18. The system (100) as claimed in claim 17, wherein the system (100) provides real time medical health recommendations taking into consideration the currently broadcasted or streamed video content to the user, user health history, geographic location, and/or socio-economic status of the user.

19. A method for providing real time media recommendations to the user/group of users comprises the steps of:

a) receiving a video feed indicating a currently broadcasted or streamed video content to the user/group of users;

b) generating metadata based on the currently streamed video frame by the user/group of users, using audio-visual analytics techniques;

c) identifying related media that is similar to the current video frame viewed by the user/group of users using the generated metadata; and

d) providing dynamic real time media recommendations based on the identified relevant media for the current video frame watched by the user/group of users, user/group profile, user/group past viewing history and/or user/group preferences.