WO2017149447A1 - A system and method for providing real time media recommendations based on audio-visual analytics - Google Patents

Info

Publication number
WO2017149447A1
WO2017149447A1 PCT/IB2017/051160 IB2017051160W WO2017149447A1 WO 2017149447 A1 WO2017149447 A1 WO 2017149447A1 IB 2017051160 W IB2017051160 W IB 2017051160W WO 2017149447 A1 WO2017149447 A1 WO 2017149447A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
real time
audio
metadata
group
Prior art date
Application number
PCT/IB2017/051160
Other languages
French (fr)
Inventor
Abhinav GURU
Original Assignee
Guru Abhinav
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guru Abhinav
Publication of WO2017149447A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/61 Arrangements for services using the result of monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/66 Arrangements for services using the result of monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 for using the result on distributors' side
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668 Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4825 End-user interface for program selection using a list of items to be played back in a given order, e.g. playlists
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/46 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for recognising users' preferences

Definitions

  • the present invention relates to a system and method for providing real time media recommendations and more specifically relates to a system for providing real time media recommendations using audio-visual analytics conducted at frame level optionally enhanced based on user profile/user viewing history and/or other metrics.
  • a system and method for providing dynamic video segment recommendation based on the current playback location of the video was introduced.
  • the term dynamic as disclosed in '295 means that the media recommendation changes as the scene changes.
  • the system as disclosed in '295 generates recommendations only after analyzing the video content completely. Further, recommendation generation is a one-time process and may not necessarily be repeated for each and every user. Further, the recommendations appear either as an advertisement in session or as a separate list of interactive options.
  • the system as disclosed in '295 does not provide effective media recommendations for the video content which has no prior recorded information.
  • the present invention relates to a system for providing real time media recommendations to a user, at frame level, for a video that is being watched by the user.
  • the system of the present invention comprises a real time recommendation engine to generate metadata based on the currently broadcasted or streamed video content to the user.
  • the real time recommendation engine uses the generated metadata to identify media content that is relevant to the current video frame watched by the user so as to provide real time dynamic media recommendations to a video consumption device.
  • the system saves the metadata generated in a data store for future references.
  • the real time recommendation engine generates metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module and a visual analytics module. Further, the real time recommendation engine uses the generated metadata to search for similar media content so as to provide real time contextual media recommendations to the user. In accordance with one embodiment of the present invention, the real time recommendation engine checks whether the video feed has previously been broadcasted before generating metadata and, if so, uses the saved metadata to make recommendations for the previously broadcasted video feed. Further, the recommendations may also appear as a part of the currently viewed audio/video feed.
  • the real time recommendation engine provides real time contextual media recommendations based on the user's past viewing history, user preferences and/or metadata generated using the audio analytics module and visual analytics module.
  • the method for providing real time media recommendations to the user or a group of users for a video at frame level comprises the steps of receiving a video feed indicating a currently broadcasted or streamed video content to the user or the group of users and generating metadata based on the video frame currently viewed by the user, using audio-visual analytics techniques. The method then uses the generated metadata to identify relevant media that is similar to the current video frame viewed by the user so as to provide real time dynamic media recommendations to the user/group of users.
  • the system and method of the present invention overcome the drawbacks of the prior art by continuously providing dynamic real time media recommendations for both recorded and live video feeds for the user/group of users. Further, the system may also provide recommendations as a part of the currently viewed audio/video feed.
  • FIG 1 illustrates a system for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention.
  • FIG 2 illustrates a method for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention.
  • the present invention provides a system and method for generating real time media recommendations for a video feed indicating a currently broadcasted or streamed video content to the user.
  • the system generates metadata based on the currently viewed video frame using audio-visual analytics techniques.
  • the system uses the generated metadata to identify relevant media for the current video frame viewed by the user and provides real time dynamic media recommendations to the user/group of users.
  • the system (100) comprises a real time recommendation engine (101) to generate metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module (101a) and a visual analytics module (101b).
  • the real time recommendation engine (101) uses the generated metadata to identify media content that is relevant to the current video frame watched by the user, and then provides real time dynamic media recommendations (104) to a video consumption device (106).
  • the video consumption device (106) may either pull or poll real time media recommendations for the current video frame watched by the user.
  • the recommendations might be embedded into the audio/video stream either at the video consumption device (106) or in the broadcasted feed.
  • the real time recommendation engine (101) uses the audio analytics module (101a) to generate metadata based upon the received audio stream for the current video frame watched by the user.
  • the audio analytics module (101a) is configured to separate the audio stream from the received video feed and convert the received audio stream into audio text data.
  • the audio analytics module (101a) uses the converted audio text data to identify metadata associated with the audio stream.
  • the audio analytics module (101a) further analyzes the audio text data to identify domains associated with metadata.
  • the system (100) then stores the identified audio metadata along with their associated domains in a database, for example an analytics database (102).
  • possible metadata could be names such as celebrity names, brand names, landmarks, weather forecast, news, art, entertainment, real estate, technologies etc.
  • the audio analytics module (101a), upon identification of the metadata from the audio text, looks for the possible domains that may be associated with the identified metadata. For instance, if the metadata is a celebrity name, then the audio analytics module (101a) further analyses the audio text to find the domain, such as the film, sports, music or dance industry, associated with that celebrity name.
  • the audio analytics module (101a) is also capable of providing a list of similar celebrity names in the same domain and storing it in the analytics database (102).
  • the visual analytics module (101b) of the real time recommendation engine (101) is configured to recognize faces, landmarks and objects present in the current video frame. The visual recognition process, which uses face detection and object detection techniques, is accelerated by identifying possible matches of people, landmarks and objects based on the metadata obtained from the audio analytics module (101a). For example, if the metadata identified from the audio analytics module (101a) is a celebrity name such as 'Sachin' and the associated domain is cricket, then the visual analytics module (101b) would narrow the set of possible face matches to the entities related to 'Sachin', thereby speeding up the visual recognition process. Thus, the real time recommendation engine (101) provides real time recommendations using audio-visual analytics techniques.
  • the real time recommendation engine (101) searches for similar media content using the metadata obtained from the audio analytics module (101a) and visual analytics module (101b) for providing real time dynamic media recommendations (104) with or without using a recommendation box (105) or embedding recommendations as a part of the audio/video feed.
  • the recommendation box (105) may provide a list of recommendations to the user in a collapsed form, and the list may be expanded by a simple action on the recommendation box (105), such as a mouse click, touch, voice command or remote key press. In the case of embedded recommendations, the user or user group would view or hear the recommendation as part of the viewing process.
  • the recommendations present in the recommendation list may be based on the currently broadcasted or streamed video content to the user.
  • the recommendations may also be based on the content that has been streamed/broadcasted overall.
  • the real time recommendation engine (101) is capable of providing recommendations based on the overall content or the current video content streamed to the user/group of users.
  • the recommendations provided using audio-visual analytics techniques may be further personalized based on the user profile, group profile, user past viewing history, group past viewing history, user preferences and/or group preferences.
  • the real time recommendation engine (101) also provides real time context-based media recommendations based on the user/group past viewing history, user/group preferences and/or metadata generated using the audio analytics module (101a) and visual analytics module (101b).
  • the real time recommendation engine (101) is also capable of providing real time context-based media recommendations at a predetermined time interval, say every second, and/or based on the user/group settings for the frequency/granularity of recommendations.
  • the system (100) may also use a user identification system to provide more specific personalized real time dynamic media recommendations to the user.
  • the system (100) may use one or more cameras, a radio-frequency identification (RFID) reader, a fingerprint scanner, a deoxyribonucleic acid (DNA) sequence based identifier, a voice recognition system or a retina based identification system, whether known in the art or developed in future, for electronically identifying/recognizing the user of the system (100).
  • the system (100), upon identifying the user/group using the above user/group identification system, provides personalized media recommendations to the user based on the user profile, user past viewing history and/or user preferences.
  • the system (100) is also capable of providing real time contextual media recommendations for the currently broadcasted or streamed television content such as news, serials, sports or movies.
  • the system (100) may provide real time contextual based media recommendations such as interactive media timeline, statistics, and profiles for events such as a news story related to a terrorist attack, ongoing war, murder mystery, election constituency, actor, sports/sportsman, presenter, politicians, animals, birds, location etc.
  • the real time user/group tailored recommendations may appear as a part of the currently broadcasted or streamed video content itself, in either visual or audio form, so that the consumption of the recommendations is seamless for the user/group of users as a part of the natural viewing process.
  • the system (100) is also capable of providing real time medical health recommendations to the user based on the currently broadcasted or streamed medical health related video content to the user.
  • the possible medical health recommendations may be related to medical health conditions, symptoms or diagnosis related to the video feed currently viewed by the user.
  • the system (100) is capable of providing real time medical health recommendations taking into consideration the currently broadcasted or streamed video content to the user, user health history, geographic location, and/or socio-economic status of the user.
  • the system (100) of the present invention is also capable of providing real time media recommendations tailored to the education field by providing its users with individualized education content specific to the aptitude of the student/learner and relevant to the video frame/segment/topic that is being consumed, so as to enhance the rate of learning and enable more just-in-time learning for people of all age groups and fields.
  • the video consumption device (106) used herein includes a smart phone, a cellular phone, a personal digital assistant (PDA), a personal computer, a set top box, a streaming media player, a smart TV, a laptop or any similar computing or video consumption device that can be used by the user.
  • the real time recommendation engine (101) saves the metadata generated using the audio analytics module (101a) and the visual analytics module (101b).
  • the real time recommendation engine (101) checks whether the received video feed has previously been broadcasted before generating metadata using audio-visual analytics techniques. Further, upon identifying that the video feed has previously been broadcasted, the real time recommendation engine (101) uses the saved metadata to make recommendations, if found optimal.
  • the method for providing real time dynamic media recommendations to the user comprises the steps of receiving a video feed indicating a currently broadcasted or streamed video content to the user/group of users at step 201.
  • the method generates metadata based on the currently viewed video frame by the user/group of users, using audio-visual analytics techniques.
  • the method further uses the generated metadata to identify video frames or relevant media that is similar to the current video frame viewed by the user/group of users at step 203, and provides dynamic real time media recommendations based on the identified relevant media that is similar to the current video frame watched by the user/group of users at step 204.
  • the method may continuously push recommendations to the video consumption device (106) commonly used today such as a television, laptop, streaming device, or a mobile phone.
  • the video consumption device (106) may pull or poll real time media recommendations for the current video frame watched by the user/group of users.
  • the recommendations may alternatively be embedded into the audio/video stream during transmission/streaming of the video, or dynamically inserted in real time into the stream at the video consumption device (106), so that the recommendations are part of the video and their consumption would be seamless for the user/group of users.
  • the present invention overcomes the drawbacks of the prior art by providing a system (100) and method for generating real time media recommendations, at frame level, for a video that is being watched by the user.
  • the system (100) of the present invention is capable of continuously providing real time media recommendations for both recorded and live video feeds.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention provides a system (100) for generating real time media recommendations for a video feed indicating a currently broadcasted or streamed video content to the user. For this purpose, the system (100) generates metadata based on the currently viewed video frame using audio-visual analytics techniques. The system (100) uses the generated metadata to identify media that is related to the current video frame viewed by the user and provides real time dynamic media recommendations to the user. The system (100) is also capable of providing real time contextual media recommendations embedded within the audio/video stream, based on the continuous analysis of the live or recorded video stream as it is being played to the user/group of users.

Description

TITLE OF THE INVENTION
A system and method for providing real time media recommendations based on audio-visual analytics
[0001] DESCRIPTION OF THE INVENTION:
[0002] Technical field of the invention
[0003] The present invention relates to a system and method for providing real time media recommendations and more specifically relates to a system for providing real time media recommendations using audio-visual analytics conducted at frame level optionally enhanced based on user profile/user viewing history and/or other metrics.
[0004] Background of the invention
[0005] Today, most current video sharing systems, such as YouTube® and Vimeo®, provide media recommendations based on the playback location of the current video frame watched by the user. However, these video sharing systems provide recommendations only after analyzing the entire video content. Further, the recommendations provided by the current video sharing systems remain the same throughout the video playback.
[0006] To overcome the above-mentioned drawbacks, a system and method for providing dynamic video segment recommendation based on the current playback location of the video was introduced, as disclosed in the US patent document US20100199295A1 (referred to herein as '295). The term dynamic as disclosed in '295 means that the media recommendation changes as the scene changes. However, the system as disclosed in '295 generates recommendations only after analyzing the video content completely. Further, recommendation generation is a one-time process and may not necessarily be repeated for each and every user. Further, the recommendations appear either as an advertisement in session or as a separate list of interactive options.
[0007] Hence, the system as disclosed in '295 does not provide effective media recommendations for video content which has no prior recorded information.
[0008] Therefore, there exists a need for a system and method to provide real time media recommendations for recorded as well as live video streams. Further, there also exists a need to provide real time media recommendations that can be inserted at frame level.
[0009] Summary of the invention:
[0010] The present invention relates to a system for providing real time media recommendations to a user, at frame level, for a video that is being watched by the user. For this purpose, the system of the present invention comprises a real time recommendation engine to generate metadata based on the currently broadcasted or streamed video content to the user. The real time recommendation engine uses the generated metadata to identify media content that is relevant to the current video frame watched by the user so as to provide real time dynamic media recommendations to a video consumption device. Here, the system saves the generated metadata in a data store for future reference.
[0011] In accordance with one embodiment of the present invention, the real time recommendation engine generates metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module and a visual analytics module. Further, the real time recommendation engine uses the generated metadata to search for similar media content so as to provide real time contextual media recommendations to the user.
[0012] In accordance with one embodiment of the present invention, the real time recommendation engine checks whether the video feed has previously been broadcasted before generating metadata and, if so, uses the saved metadata to make recommendations for the previously broadcasted video feed. Further, the recommendations may also appear as a part of the currently viewed audio/video feed.
[0013] In accordance with one embodiment of the present invention, the real time recommendation engine provides real time contextual media recommendations based on the user's past viewing history, user preferences and/or metadata generated using the audio analytics module and visual analytics module.
[0014] The method for providing real time media recommendations to the user or a group of users for a video at frame level comprises the steps of receiving a video feed indicating a currently broadcasted or streamed video content to the user or the group of users and generating metadata based on the video frame currently viewed by the user, using audio-visual analytics techniques. The method then uses the generated metadata to identify relevant media that is similar to the current video frame viewed by the user so as to provide real time dynamic media recommendations to the user/group of users.
[0015] Thus, the system and method of the present invention overcome the drawbacks of the prior art by continuously providing dynamic real time media recommendations for both recorded and live video feeds for the user/group of users. Further, the system may also provide recommendations as a part of the currently viewed audio/video feed.
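By way of non-limiting illustration only, the embodiment in which saved metadata is reused for a previously broadcasted feed might be sketched as follows. The specification does not state how a previously broadcasted feed is recognised; hashing the raw bytes of a segment is an assumption made here for brevity, and the names `MetadataStore`, `metadata_for` and `analyse` are hypothetical.

```python
import hashlib


class MetadataStore:
    """Stands in for the data store in which generated metadata is saved."""

    def __init__(self):
        self._saved = {}

    def metadata_for(self, segment, generate):
        """Return (metadata, reused) for a segment of the video feed.

        Assumption: an exact byte hash identifies a previously seen segment.
        """
        key = hashlib.sha256(segment).hexdigest()
        if key in self._saved:
            # Previously broadcasted: reuse the saved metadata.
            return self._saved[key], True
        # Not seen before: generate metadata and save it for future reference.
        metadata = generate(segment)
        self._saved[key] = metadata
        return metadata, False


calls = []


def analyse(segment):
    """Hypothetical placeholder for the audio-visual analytics step."""
    calls.append(segment)
    return {"names": ["Sachin"], "domains": ["cricket"]}


store = MetadataStore()
first, reused_first = store.metadata_for(b"segment-bytes", analyse)
second, reused_second = store.metadata_for(b"segment-bytes", analyse)
```

In this sketch the analytics step runs only once for the repeated segment; a practical system would likely need a fingerprint that tolerates re-encoding, which is outside what the specification describes.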
[0016] Brief description of the drawings:
[0017] The foregoing and other features of embodiments will become more apparent from the following detailed description of embodiments when read in conjunction with the accompanying drawings. In the drawings, like reference numerals refer to like elements.
[0018] FIG 1 illustrates a system for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention.
[0019] FIG 2 illustrates a method for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention.
[0020] Detailed description of the invention:
[0021] Reference will now be made in detail to the description of the present subject matter, one or more examples of which are shown in the figures. Each example is provided to explain the subject matter and is not a limitation. Various changes and modifications obvious to one skilled in the art to which the invention pertains are deemed to be within the spirit, scope and contemplation of the invention.
[0022] The present invention provides a system and method for generating real time media recommendations for a video feed indicating a currently broadcasted or streamed video content to the user. For this purpose, the system generates metadata based on the currently viewed video frame using audio-visual analytics techniques. The system uses the generated metadata to identify relevant media for the current video frame viewed by the user and provides real time dynamic media recommendations to the user/group of users.
[0023] FIG 1 illustrates a system for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention. The system (100) comprises a real time recommendation engine (101) to generate metadata based on the currently broadcasted or streamed video content to the user, using an audio analytics module (101a) and a visual analytics module (101b). The real time recommendation engine (101) uses the generated metadata to identify media content that is relevant to the current video frame watched by the user, and then provides real time dynamic media recommendations (104) to a video consumption device (106).
[0024] In accordance with one embodiment of the present invention, the video consumption device (106) may either pull or poll real time media recommendations for the current video frame watched by the user. Alternatively, the recommendations may be embedded into the audio/video stream either at the video consumption device (106) or in the broadcasted feed.
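By way of non-limiting illustration only, polling by the video consumption device (106) might be sketched as below. The in-memory engine, the class and method names, and the use of a playback position in seconds as the lookup key are all assumptions of this sketch and are not taken from the specification.

```python
class RecommendationEngine:
    """Hypothetical in-memory stand-in for the recommendation engine (101)."""

    def __init__(self):
        self._store = {}  # (feed_id, position in seconds) -> recommendations

    def publish(self, feed_id, position, recommendations):
        self._store[(feed_id, position)] = list(recommendations)

    def fetch(self, feed_id, position):
        """Return recommendations for the latest analysed position
        at or before the requested playback position."""
        times = [t for (f, t) in self._store if f == feed_id and t <= position]
        return list(self._store[(feed_id, max(times))]) if times else []


class ConsumptionDevice:
    """Hypothetical stand-in for the video consumption device (106)."""

    def __init__(self, engine, feed_id):
        self.engine = engine
        self.feed_id = feed_id
        self.shown = []

    def poll(self, position):
        """Ask the engine for recommendations; report whether they changed."""
        latest = self.engine.fetch(self.feed_id, position)
        changed = latest != self.shown
        if changed:
            self.shown = latest
        return changed


engine = RecommendationEngine()
engine.publish("feed-1", 0, ["A"])
engine.publish("feed-1", 5, ["B"])
device = ConsumptionDevice(engine, "feed-1")
```

The device redraws only when `poll` reports a change, so repeated polls within the same scene do not disturb what is already displayed.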
[0025] In accordance with one embodiment of the present invention, the real time recommendation engine (101) uses the audio analytics module (101a) to generate metadata based upon the received audio stream for the current video frame watched by the user. The audio analytics module (101a) is configured to separate the audio stream from the received video feed and convert the received audio stream into audio text data. The audio analytics module (101a) uses the converted audio text data to identify metadata associated with the audio stream. The audio analytics module (101a) further analyzes the audio text data to identify domains associated with the metadata. The system (100) then stores the identified audio metadata along with their associated domains in a database, for example an analytics database (102).
[0026] In accordance with one or more embodiments of the present invention, possible metadata could be names such as celebrity names, brand names, landmarks, weather forecast, news, art, entertainment, real estate, technologies etc. The audio analytics module (101a), upon identification of the metadata from the audio text, looks for the possible domains that may be associated with the identified metadata. For instance, if the metadata is a celebrity name, then the audio analytics module (101a) further analyses the audio text to find the domain, such as the film, sports, music or dance industry, associated with that celebrity name. The audio analytics module (101a) is also capable of providing a list of similar celebrity names in the same domain and storing it in the analytics database (102).
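By way of non-limiting illustration only, the identification of metadata and associated domains from audio text might be sketched as follows. The sketch assumes the speech-to-text conversion has already produced a transcript; the small lookup tables `KNOWN_ENTITIES` and `DOMAIN_CUES`, and all function names, are hypothetical and merely stand in for whatever recognition technique an implementation would actually use.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: known names and the kind of metadata each represents.
KNOWN_ENTITIES = {
    "sachin": "celebrity",
    "eiffel tower": "landmark",
}

# Hypothetical cue words that suggest a domain when found in the audio text.
DOMAIN_CUES = {
    "cricket": {"cricket", "innings", "wicket", "batsman"},
    "film": {"film", "movie", "actor", "director"},
    "music": {"song", "album", "concert"},
}


@dataclass(frozen=True)
class AudioMetadata:
    name: str
    kind: str
    domains: tuple


def extract_audio_metadata(audio_text):
    """Find known names in the audio text and attach the domains it mentions."""
    text = audio_text.lower()
    words = set(re.findall(r"[a-z]+", text))
    domains = tuple(sorted(d for d, cues in DOMAIN_CUES.items() if words & cues))
    found = []
    for name, kind in KNOWN_ENTITIES.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append(AudioMetadata(name, kind, domains))
    return found


analytics_database = {}  # stands in for the analytics database (102)


def store_metadata(feed_id, items):
    """Save identified metadata together with its associated domains."""
    analytics_database.setdefault(feed_id, []).extend(items)


items = extract_audio_metadata(
    "Sachin walks out to bat in the second innings of the cricket match"
)
store_metadata("feed-1", items)
```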
[0027] In accordance with another embodiment of the present invention, the visual analytics module (101b) of the real time recommendation engine (101) is configured to recognize faces, landmarks and objects present in the current video frame. This visual recognition process, which uses face detection and object detection techniques, is accelerated by identifying possible matches of people, landmarks and objects based on the metadata obtained from the audio analytics module (101a). For example, if the metadata identified by the audio analytics module (101a) is a celebrity name such as 'Sachin' and the associated domain is cricket, then the visual analytics module (101b) narrows the set of possible face matches to the entities related to 'Sachin', thereby speeding up the visual recognition process. Thus, the real time recommendation engine (101) provides real time recommendations using audio-visual analytics techniques.
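The narrowing step in the 'Sachin' example can be illustrated with a toy gallery. Everything here is an assumption made for the illustration: GALLERY, its two-dimensional "embeddings" and the cosine-similarity matcher are hypothetical, and a real embodiment would obtain face embeddings from a recognition model. The point shown is only that a domain hint from the audio metadata reduces the number of comparisons.

```python
import math

# Hypothetical gallery: name -> (domain, face embedding).
GALLERY = {
    "Sachin": ("cricket", (0.9, 0.1)),
    "Dravid": ("cricket", (0.7, 0.3)),
    "Actor A": ("film", (0.88, 0.12)),
    "Singer B": ("music", (0.1, 0.9)),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def recognize(face, domain_hint=None):
    """Return (best matching name, number of gallery comparisons performed)."""
    candidates = {
        name: emb for name, (domain, emb) in GALLERY.items()
        if domain_hint is None or domain == domain_hint
    }
    if not candidates:  # the hint matched nothing, so fall back to the full gallery
        candidates = {name: emb for name, (_, emb) in GALLERY.items()}
    best = max(candidates, key=lambda name: cosine(face, candidates[name]))
    return best, len(candidates)
```

With the hint "cricket" the matcher compares against two gallery entries instead of four while returning the same best match for this input.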
[0028] In accordance with one embodiment of the present invention, the real time recommendation engine (101) searches for similar media content using the metadata obtained from the audio analytics module (101a) and the visual analytics module (101b), and provides real time dynamic media recommendations (104) with or without using a recommendation box (105), or by embedding the recommendations as a part of the audio/video feed. Further, the recommendation box (105) may present the list of recommendations to the user in a collapsed form, and the list may be expanded by a simple action over the recommendation box (105), such as a mouse click, touch, voice command or remote key press. In the case of embedded recommendations, the user or user group views or hears the recommendations as part of the viewing process.
[0029] In accordance with an alternate embodiment of the present invention, the recommendations in the recommendation list may be based on the video content currently broadcasted or streamed to the user. Alternatively, the recommendations may be based on the content that has been streamed/broadcasted overall. Thus, the real time recommendation engine (101) is capable of providing recommendations based on either the overall content or the current video content streamed to the user/group of users. Further, the recommendations provided using audio-visual analytics techniques may be further personalized based on the user profile, group profile, user past viewing history, group past viewing history, user preferences and/or group preferences.
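One possible reading of the personalization step is a re-ranking pass over the recommendations already produced. The sketch below is an assumption, not the disclosed method: personalize is a hypothetical function, preferences are modelled as per-tag score boosts, and past viewing history is used only to drop titles the user has already watched.

```python
def personalize(recommendations, preferences, history, limit=5):
    """Re-rank (title, tags, relevance) tuples using user/group preferences and history.

    preferences maps a tag to a score boost; history is a set of already-watched titles.
    """
    scored = []
    for title, tags, relevance in recommendations:
        if title in history:  # assumption: do not re-recommend watched titles
            continue
        boost = sum(preferences.get(tag, 0.0) for tag in tags)
        scored.append((relevance + boost, title))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:limit]]
```

The same function can serve a group by passing the group's merged preferences and history.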
[0030] In accordance with another embodiment of the present invention, the real time recommendation engine (101) also provides real time context-based media recommendations based on the user/group past viewing history, user/group preferences and/or metadata generated using the audio analytics module (101a) and the visual analytics module (101b). The real time recommendation engine (101) is also capable of providing these recommendations at a predetermined time interval, say every one second, and/or according to the user/group settings for the frequency/granularity of recommendations.
[0031] The system (100) may also use a user identification system to provide more specific, personalized real time dynamic media recommendations to the user. For this purpose, the system (100) may use one or more cameras, a radio-frequency identification (RFID) reader, a fingerprint scanner, a deoxyribonucleic acid (DNA) sequence based identifier, a voice recognition system or a retina based identification system, known in the art or future-developed, for electronically identifying/recognizing the user of the system (100). Thus, upon identifying the user/group using the above user/group identification system, the system (100) provides personalized media recommendations to the user based on the user profile, user past viewing history and/or user preferences.
[0032] In accordance with one embodiment of the present invention, the system (100) is also capable of providing real time contextual media recommendations for currently broadcasted or streamed television content such as news, serials, sports or movies. For instance, the system (100) may provide real time context-based media recommendations such as interactive media timelines, statistics and profiles for subjects such as a news story related to a terrorist attack, an ongoing war, a murder mystery, an election constituency, an actor, a sport/sportsman, a presenter, politicians, animals, birds, a location etc.
[0033] In accordance with one embodiment of the present invention, the real time recommendations tailored to the user/group may appear as a part of the currently broadcasted or streamed video content itself, in either visual or audio form, so that consuming the recommendations is seamless for the user/group of users as a part of the natural viewing process.
[0034] The system (100) is also capable of providing real time medical health recommendations to the user based on the currently broadcasted or streamed medical health related video content to the user. Here, the possible medical health recommendations may be related to medical health conditions, symptoms or diagnosis related to the video feed currently viewed by the user.
[0035] In accordance with one embodiment of the present invention, the system (100) is capable of providing real time medical health recommendations taking into consideration the video content currently broadcasted or streamed to the user, user health history, geographic location, and/or socio-economic status of the user.
[0036] The system (100) of the present invention is also capable of providing real time media recommendations tailored to the education field, by providing its users with individualized education content that is specific to the aptitude of the student/learner and relevant to the video frame/segment/topic being consumed, so as to enhance the rate of learning and enable more just-in-time learning for people of all age groups and fields.
[0037] In accordance with one or more embodiments of the present invention, the video consumption device (106) used herein includes a smart phone, a cellular phone, a personal digital assistant (PDA), a personal computer, a set top box, a streaming media player, a smart TV, a laptop or any similar computing or video consumption device that can be used by the user.
[0038] In accordance with one embodiment of the present invention, the real time recommendation engine (101) saves the metadata generated using the audio analytics module (101a) and the visual analytics module (101b).
[0039] In accordance with one embodiment of the present invention, before generating metadata using audio-visual analytics techniques, the real time recommendation engine (101) checks whether the received video feed has previously been broadcasted. Upon identifying that the video feed has previously been broadcasted, the real time recommendation engine (101) uses the stored metadata to make recommendations, if found optimal.
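The check for previously broadcasted content can be pictured as a metadata cache keyed by a fingerprint of the feed. The following is a simplified sketch with assumptions stated plainly: MetadataCache is a hypothetical class, a SHA-256 hash of the raw segment bytes stands in for the fingerprint (so only byte-identical rebroadcasts are detected, whereas a practical system would likely need a perceptual fingerprint), and the "if found optimal" condition is not modelled.

```python
import hashlib


class MetadataCache:
    """Reuse stored metadata when the same video segment is seen again."""

    def __init__(self):
        self._store = {}
        self.analysis_runs = 0  # counts how often the expensive analysis actually ran

    @staticmethod
    def fingerprint(segment_bytes):
        """Exact-content fingerprint; an assumption standing in for a perceptual one."""
        return hashlib.sha256(segment_bytes).hexdigest()

    def get_or_generate(self, segment_bytes, analyze):
        """Return cached metadata, or run analyze(segment_bytes) once and store the result."""
        key = self.fingerprint(segment_bytes)
        if key not in self._store:
            self._store[key] = analyze(segment_bytes)
            self.analysis_runs += 1
        return self._store[key]
```

Here analyze is whatever audio-visual analysis the engine would otherwise perform; it is skipped on a cache hit.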
[0040] FIG. 2 illustrates a method for providing real time media recommendations to the user, in accordance with one or more embodiments of the present invention. The method for providing real time dynamic media recommendations to the user comprises receiving, at step 201, a video feed indicating the video content currently broadcasted or streamed to the user/group of users. At step 202, the method generates metadata based on the video frame currently viewed by the user/group of users, using audio-visual analytics techniques. At step 203, the method uses the generated metadata to identify video frames or relevant media similar to the video frame currently viewed by the user/group of users, and at step 204 it provides dynamic real time media recommendations based on the identified relevant media.
[0041] In accordance with one or more embodiments of the present invention, the method may continuously push recommendations to a commonly used video consumption device (106) such as a television, laptop, streaming device or mobile phone. Alternatively, the video consumption device (106) may pull or poll real time media recommendations for the video frame currently watched by the user/group of users. As a further alternative, the recommendations may be embedded into the audio/video stream during transmission/streaming of the video, or dynamically inserted in real time into the stream at the video consumption device (106), so that the recommendations are part of the video and their consumption is seamless for the user/group of users.
[0042] Thus, the present invention overcomes the drawbacks of the prior art by providing a system (100) and method for generating real time media recommendations, at frame level, for a video that is being watched by the user. The system (100) of the present invention is capable of continuously providing real time media recommendations for both recorded and live video feeds.
[0043] While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration in any way.

Claims

[0047] I Claim,
1. A system (100) for providing real time media recommendations based on audio-visual analytics, the system (100) comprising:
a) a real time recommendation engine (101) to generate metadata based on currently broadcasted or streamed video content to the user using an audio analytics module (101a) and a visual analytics module (101b); and
b) a video consumption device (106), wherein the real time recommendation engine (101) uses the generated metadata to identify media content that is relevant to the current video frame watched by the user, and then provides real time dynamic media recommendations (104) on the video consumption device (106).
2. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is a part of a video consumption device (106) or remotely linked to the video consumption device (106).
3. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) uses the audio analytics module (101a) to generate metadata based upon the received audio stream for the current video frame watched by the user/group of users.
4. The system (100) as claimed in claim 3, wherein the audio analytics module (101a) is configured to:
a) separate the audio stream from the received video feed;
b) convert the received audio stream into audio text data;
c) identify metadata associated with the converted audio text data;
d) analyze the audio text data to identify domains associated with metadata;
e) identify domains associated with metadata; and
f) store the identified audio metadata along with the associated domain in an analytics database (102).
5. The system (100) as claimed in claim 1, wherein the visual analytics module (101b) is configured to recognize faces, landmarks and objects present in the current video frame.
6. The system (100) as claimed in claim 5, wherein the visual recognition process is accelerated by identifying possible matches of people, landmarks and objects based on the metadata obtained from the audio analytics module (101a).
7. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is configured to:
a) search for similar media content using the metadata obtained from the audio analytics module (101a) and visual analytics module (101b); and
b) provide real time contextual media recommendations.
8. The system (100) as claimed in claim 7, wherein the real time recommendation engine (101) provides real time contextual based media recommendations based on the user/group past viewing history, user /group preferences and/or metadata generated using the audio analytics module (101a) and visual analytics module (101b).
9. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) provides recommendations based on the content that has been streamed/broadcasted overall.
10. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) is also capable of providing real time contextual media recommendations.
11. The system (100) as claimed in claim 1, wherein the real time recommendation engine (101) saves the metadata generated using the audio analytics module (101a) and visual analytics module (101b).
12. The system (100) as claimed in claim 11, wherein the real time recommendation engine (101) checks, before generating metadata, whether the video feed has previously been broadcasted.
13. The system (100) as claimed in claim 12, wherein the system (100) uses the stored metadata to make recommendations for the previously broadcasted video feed.
14. The system (100) as claimed in claim 1, wherein the system (100) uses a user identification system to identify the user/group so as to provide personalized media recommendations to the user/group based on the user/group profile, user/group past viewing history and/or user/group preferences.
15. The system (100) as claimed in claim 1, wherein the system (100) provides real time contextual based media recommendations for the currently broadcasted or streamed television/educational content to the user.
16. The system (100) as claimed in claim 1, wherein the real time dynamic media recommendations also appear as a part of the currently broadcasted or streamed video content.
17. The system (100) as claimed in claim 1, wherein the system (100) provides real time medical health recommendations to the user based on the currently broadcasted or streamed medical health related video content to the user.
18. The system (100) as claimed in claim 17, wherein the system (100) provides real time medical health recommendations taking into consideration the currently broadcasted or streamed video content to the user, user health history, geographic location, and/or socio-economic status of the user.
19. A method for providing real time media recommendations to the user/group of users, the method comprising the steps of:
a) receiving a video feed indicating a currently broadcasted or streamed video content to the user/group of users;
b) generating metadata based on the video frame currently streamed to the user/group of users, using audio-visual analytics techniques;
c) identifying related media that is similar to the current video frame viewed by the user/group of users using the generated metadata; and
d) providing dynamic real time media recommendations based on the identified relevant media for the current video frame watched by the user/group of users, user/group profile, user/group past viewing history and/or user/group preferences.
PCT/IB2017/051160 2016-02-29 2017-02-28 A system and method for providing real time media recommendations based on audio-visual analytics WO2017149447A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201641006867 2016-02-29
IN201641006867 2016-02-29

Publications (1)

Publication Number Publication Date
WO2017149447A1 (en) 2017-09-08

Family

ID=59743594

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2017/051160 WO2017149447A1 (en) 2016-02-29 2017-02-28 A system and method for providing real time media recommendations based on audio-visual analytics

Country Status (1)

Country Link
WO (1) WO2017149447A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652592B2 (en) 2017-07-02 2020-05-12 Comigo Ltd. Named entity disambiguation for providing TV content enrichment
US10939146B2 (en) 2018-01-17 2021-03-02 Comigo Ltd. Devices, systems and methods for dynamically selecting or generating textual titles for enrichment data of video content items

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100199295A1 (en) * 2009-02-02 2010-08-05 Napo Enterprises Dynamic video segment recommendation based on video playback location
US20100195975A1 (en) * 2009-02-02 2010-08-05 Porto Technology, Llc System and method for semantic trick play
US20150156530A1 (en) * 2013-11-29 2015-06-04 International Business Machines Corporation Media selection based on content of broadcast information
US20150186510A1 (en) * 2007-12-21 2015-07-02 Lemi Technology, Llc System For Generating Media Recommendations In A Distributed Environment Based On Seed Information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150186510A1 (en) * 2007-12-21 2015-07-02 Lemi Technology, Llc System For Generating Media Recommendations In A Distributed Environment Based On Seed Information
US20100199295A1 (en) * 2009-02-02 2010-08-05 Napo Enterprises Dynamic video segment recommendation based on video playback location
US20100195975A1 (en) * 2009-02-02 2010-08-05 Porto Technology, Llc System and method for semantic trick play
US20150156530A1 (en) * 2013-11-29 2015-06-04 International Business Machines Corporation Media selection based on content of broadcast information

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17759349

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17759349

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.02.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17759349

Country of ref document: EP

Kind code of ref document: A1