
WO2011039773A2 - Tv news analysis system for multilingual broadcast channels - Google Patents

TV news analysis system for multilingual broadcast channels

Info

Publication number
WO2011039773A2
Authority
WO
WIPO (PCT)
Prior art keywords
news
news programs
programs
identified
repeat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2010/000617
Other languages
French (fr)
Other versions
WO2011039773A3 (en
Inventor
Ghosh Hiranmay
Kopparapu Sunilkumar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tata Consultancy Services Ltd
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Ltd
Publication of WO2011039773A2
Publication of WO2011039773A3

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/16Analogue secrecy systems; Analogue subscription systems
    • H04N7/162Authorising the user terminal, e.g. by paying; Registering the use of a subscription channel, e.g. billing
    • H04N7/163Authorising the user terminal, e.g. by paying; Registering the use of a subscription channel, e.g. billing by receiver means only
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4331Caching operations, e.g. of an advertisement for later insertion during playback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454Content or additional data filtering, e.g. blocking advertisements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data

Definitions

  • the present invention relates to the field of computer vision and audio processing techniques.
  • the present invention relates to analysis of television (TV) news channels.
  • this invention relates to a TV news analysis system for multilingual broadcast channels.
  • US2004189873 discloses a VIDEO DETECTION AND INSERTION SYSTEM.
  • US2004189873 system includes means which detects defined segments in a video stream.
  • the defined segments may be advertisements, as mentioned.
  • US6614987 discloses a TELEVISION PROGRAM RECORDING WITH USER PREFERENCE DETERMINATION.
  • the system includes a module which is responsive to attribute information in accordance with categorization (classification) parameters or viewing preferences of the user.
  • US2002162118 discloses an EFFICIENT INTERACTIVE TV.
  • This system includes content identifier means to identify content or a subset of content. This identification not only helps to identify news stories and advertisements, but can also identify repetitive news stories.
  • US6608930 discloses a METHOD AND SYSTEM FOR ANALYZING VIDEO CONTENT USING DETECTED TEXT IN VIDEO FRAMES. This system detects video streams based on user-selected image text attributes. A selected attribute may be news stories or advertisements or both. So, both may be individually identified for segregation purposes. Further, recognition of persons featuring in the detected video is also disclosed. However, all these features are enabled due to the (image) text that is available in each video stream.
  • ARTIFICIAL TV NEWS PROGRAMS. It includes means to translate the language of a newscaster into a language of the user's choice, and combines automatic speech recognition (speech-to-text processing), automatic machine translation, and audio-visual text-to-speech (TTS) synthesis techniques to automatically personalize TV news programs.
  • this patent application does not provide a solution for identifying news programs and classification of said programs, or even identifying programs based on metadata.
  • An object of the invention is to provide an integrated and complete solution for news video analysis.
  • Another object of the invention is to provide a system wherein TV newscasts in different languages can be processed.
  • Yet another object of the invention is to automatically identify news programs in a broadcast stream and separate it out from other programs.
  • Still another object of the invention is to provide a system wherein advertisements in TV newscasts are automatically identified and removed from the news program.
  • Still an additional object of the invention is to provide a system for news analysis wherein similar stories (pertaining to the same event) on different channels are identified and clustered.
  • Another additional object of the invention is to provide a system for news analysis wherein each news story is indexed with keywords identified in the speech and visual text as well as other metadata, such as a recognized face.
  • Another object of the invention is to provide a system where news stories in languages for which speech and OCR technologies are not mature are indexed based on their similarity to stories in other languages for which speech and OCR technologies are mature.
  • Yet another additional object of the invention is to provide a system for news analysis wherein the stories are classified and can be retrieved.
  • a system for identification, classification, storage, and analysis of news programs containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said system comprising:
  • - recording module adapted to record said captured streams on a physical storage
  • - news program identification module adapted to identify news programs in said stored broadcast streams
  • - news program clipping module adapted to separate said identified news programs from other programs
  • - advertisement clipping module adapted for removal of said identified advertisements
  • - seam detection module adapted to detect and identify seams of said news programs in order to demarcate individual stories in a news program
  • - keyword generation module adapted to generate a list of keywords
  • - text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs
  • - speech-keyword identification module adapted to identify the created keywords from the speech of said identified news programs
  • - repeat-identification module adapted to identify similar/repeat news programs from said plurality of TV broadcast streams
  • - clustering module adapted to cluster said repeat news programs into one news program, in order to avoid duplication or multiplication
  • - removal module adapted to remove said repeat-identified news programs
  • - logical interconnection module adapted to logically interconnect each of said modules for determining the sequence of steps for a multilingual news video analysis system.
  • said keyword generation module is a multilingual keyword generation module adapted to generate keywords in multiple languages.
  • said text-keyword identification module is a multilingual text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
  • said speech-keyword identification module is a multilingual speech-keyword identification module adapted to identify said created keywords from the speech in different languages, in the audio channel of the news program.
  • said system includes a multilingual lexicon database for generating multilingual synonymous keywords for said created keywords.
  • an acquisition module adapted to capture a TV broadcast stream and further includes a recording module adapted to record said captured stream on a physical storage, typically on disk, in chunks of manageable size.
  • a news program identification module adapted to identify news programs in the broadcast stream and to separate them from other programs.
  • an advertisement identification module for identification of advertisements from said news programs, and further including an advertisement clipping module adapted for removal of said identified advertisement breaks.
  • a keyword generation module to create a list of desired keywords of contemporary interest in different languages.
  • a text-keyword identification module adapted to identify the desired created keywords from the visual text of said news stories, in different languages, typically appearing in form of ticker text on the screen.
  • a speech-keyword identification module adapted to identify the desired keywords from the speech, in different languages, in the audio channel of the news.
  • a seam detection module adapted to detect and identify seams i.e. story boundaries and demarcate the individual stories in a news program.
  • a repeat-identification module adapted to identify similar/repeat stories from multiple channels and further including a clustering module adapted to cluster said repeat stories into one story, to avoid duplication or multiplication.
  • a removal module adapted to identify duplicate stories (repeat telecasts) and remove them from the selected stories.
  • a repository adapted for storing the news contents, content description of the news videos, various indexes and links as discovered in the previously described modules.
  • said system includes a retrieval module adapted for retrieving a news program from said repository.
  • said system includes a navigation means adapted for navigation in said repository for retrieving a news program.
  • a logical interconnection module adapted to logically interconnect all the said modules for determining the sequence of steps for a multilingual news video analysis system.
  • repeat-identification module includes visual matching means adapted to use visual cues in order to identify repeat news programs, said visual matching means comprises:
  • - key frame identification means adapted to identify at least a key frame in a plurality of news programs
  • - key frame visual feature extracting means adapted to extract visual features relating to pre-defined parameters of said identified key frames
  • - processing means adapted to process said extracted features based on pre-defined tests in order to obtain a processed similarity score in relation to a plurality of news programs
  • - identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score
  • - deletion means adapted to delete said identified repeat news programs.
  • repeat-identification module includes audio matching means adapted to use audio cues in order to identify repeat news programs, said audio matching means comprises:
  • - window determination means adapted to determine a window of frames in a plurality of news programs
  • - fingerprint detection means adapted to detect audio fingerprint based on pre-defined processing criteria on said determined window of frames
  • - processing means adapted to process said detected audio fingerprint based on pre-defined tests in order to obtain a processed similarity score in relation to a plurality of news programs
  • - identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score
  • - deletion means adapted to delete said identified repeat news programs.
  • a method for identification, classification, storage, and analysis of news programs containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said method comprises the steps of:
  • said step of identifying said created keywords from visual text of identified said news programs includes the step of generating keywords in multiple languages.
  • said step of identifying created text keywords includes the step of identifying created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
  • said step of identifying speech-keywords includes the step of identifying said created keywords from the speech in different languages, in the audio channel of the news program.
  • said method includes a step of retrieving a news program from said repository.
  • said method includes a step of navigating in said repository for retrieving a news program.
  • said step of removing repeat-identified news programs includes a method of using visual cues in order to identify repeat news programs, said method comprises the steps of:
  • said step of removing repeat-identified news programs includes a method of using audio cues in order to identify repeat news programs, said method comprises the steps of:
  • Figure 1 illustrates a schematic block diagram of the multilingual news video analysis system in accordance with the present invention.
  • the Telecast Acquisition Module (10) captures telecasts from several possible sources, e.g. a DTH dish, cable TV, etc., tunes to a particular channel, decodes the TV signals and converts the transmission into a standard digital video format, e.g. MPEG-4 or the like. This module is replicated for every channel to be monitored.
  • the video streams captured by the Telecast Acquisition modules (10) are stored in a Recording Module (20) in chunks of manageable size with unique file names.
  • Video Description Module (40).
  • Advertisement breaks within a news program are now detected using absence of specific ticker-text bands and marked in the video in Advertisement Identification Module (50).
  • the video is decomposed into constituent shots and several visual and audio parameters are extracted. The additional information accumulates in Video Description Module (40).
  • a set of keywords of contemporary interest is selected by a Keyword Generation Module (60) through analysis of RSS feeds.
  • the video segments representing news programs are now processed to detect these keywords.
  • a Keyword Recognition Module (70) analyzes the visual text and speech to spot the identified keywords.
  • the visual keywords are classified into 'global' and 'local' categories, depending on the ticker-text band where they appear. While the 'local' keywords pertain to the current story being telecast, the 'global' keywords are not tied to the current story and may pertain to a story appearing anywhere in the news program.
  • Speaker Identification Module (80) identifies the speaker using face recognition and speaker identification (speech) technologies in scenes containing one dominant speaker, for example in speeches made by important personalities. The additional information further augments the description in Video Description Module (40).
  • the keyword generation module is a multilingual keyword generation module.
  • the system provides the ability to process multilingual news programs, according to this invention.
  • a keyword list in multiple languages is created in order to enable keyword spotting in multilingual TV news broadcast channels, in both spoken and visual forms.
  • the multilingual keyword list helps to automatically map the keywords spotted in different languages to their equivalents in a primary language (say English) for uniform indexing across multiple channels. Restricting the keyword list to a small number of entries helps to improve the accuracy of the system, especially for keyword spotting in speech.
  • a sample multilingual keyword list is shown below:
  • the method for creating a multilingual keyword list is driven by RSS feeds maintained by some websites.
  • RSS feeds capture contemporary news in a semi-structured XML format and contain hyperlinks to the full-text news stories, usually in English.
  • the system of this invention identifies the common nouns (using statistical language processing) and proper nouns (using named entity detection) in the RSS feed text and the associated stories as the keywords.
  • the keywords in the language of the RSS feed (usually English) form a set of concepts, which need to be identified in the audio-visual broadcasts of telecasts in different languages.
  • the equivalent keywords in other languages can be derived from the English keywords using a word-level dictionary from English to the target language (for common-noun keywords) and a pronunciation lexicon for transliterating proper names in a semi-automatic manner. A lexicon is an association of words and their phonetic transcriptions; it is a special kind of dictionary that maps a word to all of its possible phonemic representations.
  • the multilingual keywords are held in a dynamic keyword list structure in XML format. This becomes the active keyword list for the news video channels and is used for keyword spotting in both the audio and visual channels of the news telecast.
  • One of the novelties of this invention is the use of keyword spotting, instead of a full transcription of the news telecast, to annotate multilingual news broadcasts.
  • This serves three purposes: (a) one need not determine the language of the telecast a priori; (b) one need not have language-specific speech recognition engines; and (c) it is easier to spot keywords than to attempt a full-text transcription, because the search space of the speech-to-text (speech recognition) engine is constrained.
  • it is sufficient to annotate the news telecast with keywords because a news broadcast is all about places and people (proper nouns) and a set of common nouns; additionally, keyword annotation of the news broadcast occupies much less space than an erroneous full-text transcription.
  • the Seam Detection Module (90) uses the video descriptors available in Video Description Module (40) to identify story boundaries.
  • Repeat-Identification Module (100) identifies similar and duplicate stories from multiple channels.
  • OCR and speech technologies for many Indian languages are not mature enough for reliable keyword extraction. Similar-shot detection helps in the classification of news stories in these languages.
  • the additional information further augments the description in Video Description Module (40) and is used to create a Repository Knowledge Base (110).
  • the knowledge base enables semantic search for news clusters by semantic analysis of the various metadata associated with the news videos in the earlier stages of processing.
  • both audio and visual cues are identified and used, from a plurality of news programs.
  • the recorded news videos are segregated into news programs for further processing.
  • shot detection technique is used where the news stories are logically segmented into distinct shots wherein each shot is represented by a key frame or representative frame.
  • the similar story detection module computes a similarity score between two news stories, using visual matching techniques, in the range [0, 1], where '0' means no match and '1' means complete match.
  • the duplicate story detection module finds whether two news stories are duplicates of each other.
  • the shots are detected on the basis of difference in visual features of the successive frames in a video.
  • a key frame or representative frame and its corresponding visual features such as colour, texture, edges, etc are extracted for each shot.
  • the shots are clustered based on the visual similarity of their representative frames. This is calculated by distance measures such as Absolute Image Difference, Histogram Intersection, Hausdorff Distance, Color Moments, SIFT, and the like.
  • Each cluster in a story is now compared with every cluster in the other story by comparing the central representative frames in the clusters using a visual comparator.
  • Let $\{c_{11}, c_{12}, \ldots, c_{1m}\}$ be the clusters in story $s_1$
  • and $\{c_{21}, c_{22}, \ldots, c_{2n}\}$ be the clusters in story $s_2$.
  • Let $k_{ij}$ be the number of shots in the $j$-th cluster of story $i$. The process is repeated with every pair of candidate similar stories and clusters of similar stories are discovered.
  • If the similarity score $\mathrm{SIM}_{12}$ between stories $s_1$ and $s_2$ is greater than a certain threshold, the news programs are designated to be similar.
  • Duplicate stories are a subset of similar stories. Two stories are said to be duplicates of each other only if their audio-visual patterns are same.
  • Let $T_{s_1}$ and $T_{s_2}$ be the total durations of the two stories, let $m$ and $n$ be the total numbers of shots in the two stories respectively, and let the durations of the individual shots in stories $s_1$ and $s_2$ be given. Then the criteria for (visually) duplicate videos are:
  • the two stories are not visual duplicates of each other if any of the above conditions fails.
  • the audio patterns, or audio fingerprints, that are used are based on perceptual features of audio that are invariant, at least to a certain degree, to signal degradations. Thus even severely degraded audio still leads to very similar audio fingerprints.
  • These fingerprints are matched for each frame block, which is a group of frames, from the two streams. The two streams are duplicates if the fingerprints of all the frame blocks match. As the streams can be from different channels, they may not match exactly at the desired points. The match may occur a few samples before or after the desired point. Thus the audio frames are matched within a window of some predetermined size.
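
The windowed fingerprint matching described above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: each frame block is represented by a hypothetical integer fingerprint, two blocks "match" when their bit-wise Hamming distance is within `max_bit_errors`, and the window is expressed as a number of frame blocks rather than samples.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def blocks_match(fp_a: Sequence[int], fp_b: Sequence[int],
                 offset: int, max_bit_errors: int) -> bool:
    """True if every block of fp_a matches the block of fp_b shifted by offset."""
    for i, block in enumerate(fp_a):
        j = i + offset
        if j < 0 or j >= len(fp_b):
            return False  # shifted position falls outside the other stream
        if hamming(block, fp_b[j]) > max_bit_errors:
            return False  # this frame block differs too much
    return True


def is_audio_duplicate(fp_a: Sequence[int], fp_b: Sequence[int],
                       window: int = 2, max_bit_errors: int = 2) -> bool:
    """Try every alignment within +/- window blocks (hypothetical parameters)."""
    if not fp_a:
        return False  # nothing to compare
    return any(blocks_match(fp_a, fp_b, off, max_bit_errors)
               for off in range(-window, window + 1))


# Toy data: the second stream carries the same audio one block later.
story = [0b1010, 0b1100, 0b1111]
stream = [0b0001, 0b1010, 0b1100, 0b1111, 0b0000]
print(is_audio_duplicate(story, stream, window=1, max_bit_errors=0))
print(is_audio_duplicate(story, stream, window=0, max_bit_errors=0))
```

With a window of one block the shifted copy is found; with a window of zero it is not, which is the situation the predetermined window is meant to handle. Allowing a few bit errors per block stands in for the tolerance to signal degradation.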

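Among the distance measures named above for comparing representative frames, histogram intersection is simple enough to sketch. The following Python example is only an illustration under stated assumptions: each key frame is reduced to a hypothetical list of colour-bin counts, both histograms are normalised to sum to 1 so that the score lies in [0, 1], and the threshold value is arbitrary. It does not reproduce the patent's story-level score.

```python
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram of non-negative bin counts so the bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]: 0 means no overlap, 1 means identical distributions."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(h1), normalize(h2)
    return sum(min(a, b) for a, b in zip(p, q))


def frames_similar(h1: Sequence[float], h2: Sequence[float],
                   threshold: float = 0.8) -> bool:
    """Declare two key frames similar when the score exceeds the threshold."""
    return histogram_intersection(h1, h2) > threshold


# Toy colour histograms for three hypothetical key frames.
frame_a = [4, 4, 8]
frame_b = [8, 8, 16]   # same distribution, different frame size
frame_c = [16, 0, 0]   # very different colour content
print(histogram_intersection(frame_a, frame_b))
print(frames_similar(frame_a, frame_c))
```

Normalising first makes the score independent of frame resolution, which matters when the same story is captured from channels with different encodings.
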
Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A system for identification, classification, storage, and analysis of news programs, containing an audio channel, video channel, and metadata relating to it, broadcast/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said system comprising: - acquisition module adapted to capture said TV broadcast streams; - recording module adapted to record said captured streams on a physical storage; - news program identification module adapted to identify news programs in said stored broadcast streams; - news program clipping module adapted to separate said identified news programs from other programs; - advertisement identification module for identification of advertisements from said identified news programs; - advertisement clipping module adapted for removal of said identified advertisements; - seam detection module adapted to detect and identify seams of said news programs in order to demarcate individual stories in a news program; - keyword generation module adapted to generate a list of keywords; - text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs; - speech-keyword identification module adapted to identify the created keywords from the speech of said identified news programs; - repeat-identification module adapted to identify similar/repeat news programs from said plurality of TV broadcast streams; - clustering module adapted to cluster said repeat news programs into one news program, in order to avoid duplication or multiplication; - removal module adapted to remove said repeat-identified news programs; - repository adapted to store said news programs and metadata embedded in said news programs; and - logical interconnection module adapted to logically interconnect each of said modules for determining the sequence of steps for a multilingual news video analysis system.

Description

TV NEWS ANALYSIS SYSTEM FOR MULTILINGUAL BROADCAST CHANNELS
Field of the invention:
The present invention relates to the field of computer vision and audio processing techniques.
Particularly, the present invention relates to analysis of television (TV) news channels.
Still particularly, this invention relates to a TV news analysis system for multilingual broadcast channels.
Background of the Invention:
Round-the-clock monitoring of several news channels in different languages, which is of paramount importance to several agencies, requires unaffordable manpower with language skills and is error-prone because of possible distractions. Thus, it is necessary to have an automated system for identifying and indexing individual stories from news broadcast streams. It is also necessary to filter out repeat transmissions and cluster similar stories broadcast on different channels, possibly in different languages.
While there has been significant research in multimodal analysis of news video for automated indexing and classification, the commercial applications are yet to mature. Commercial products like the BBN Broadcast monitoring system and the Nexedia rich media solution offer speech-only solutions for TV news broadcast indexing and retrieval for English and a handful of other languages. None of these solutions can differentiate between TV news and other TV programs, and they cannot filter out commercials. They index the audio stream as a whole and cannot demarcate news story boundaries. None of the available solutions can handle telecasts in multiple languages using audio-visual features.
PRIOR ART
The paper, "Transcribing broadcast news for audio and video indexing"; Communications of the ACM (CACM), 43(2), Feb 2000, pp 64-70; by Jean-Luc Gauvain, Lori Lamel and Gilles Adda, and another paper, "Speech and Language Technologies for Audio Indexing and Retrieval"; Proceedings of the IEEE, 88(8), August 2000; by John Makhoul, Francis Kubala, Timothy Leek, Daben Liu, Long Nguyen, Richard Schwartz and Amit Srivastava, propose audio-based approaches, where the speech, in multiple languages, is transcribed and the constituent words and phrases are used to index the contents of a broadcast stream. This approach has been followed in commercial systems such as BBN and Nexedia.
Another paper, "TRECVID 2004 search and feature extraction task by NUS PRIS"; NIST TRECVID-2004, Nov 2004; by Tat-Seng Chua, S.Y. Neo, K. Li, G.H. Wang, R. Shi. M Zhao, H. Xu S. Gao and T.L proposes use of additional sources of information, e.g. video OCR information, face recognizer and speaker identification, for indexing news videos in English.
The paper, "Detection of acoustic patterns in broadcast news using neural networks"; Acustica 2004; by H. Meinedo and J. Neto proposes a method for using jingles to mark the boundaries of different programs on a TV channel.
The papers, "Automatic TV advertisement detection from MPEG bitstream"; Pattern Recognition, 35(12), December 2002, pp 2719-2726; by David A. Sadlier, Sean Marlow, Noel O'Connor and Noel Murphy; "Time-constraint boost for TV commercial detection"; International Conference on Image Processing (ICIP '04), 24-27 Oct. 2004, Vol 3, pp 1617-1620; by Tie-Yan Liu, Tao Qin and Hong-Jiang Zhang; "Robust learning-based TV commercial detection"; Proc. 14 ACM International Conference on Multimedia and Expo (ICME), Amsterdam, 6 Jul, 2005; by Xian-Sheng Hua, Lie Lu and Hong-Jiang Zhang; "Segmentation Categorization and identification of commercials from TV streams using multimodal analysis"; International Multimedia Conference (MM'06), 23-27 October, 2006; by Ling-Yu Duan, Jinqiao Wang, Yantao Zheng, Hanqing Lu and Jesse S. Jin; "TV ad video categorization with probabilistic latent concept learning"; Multimedia Information Retrieval (MIR'07), Augsburg, Sept 2007, pp 217-226; by Jinqiao Wang, Ling-Yu Duan, Lei Xu, Yantao Zheng, Jesse S. Jin, Hanqing Lu and Changsheng Xu propose different methods for detecting advertisement breaks and advertisement classification.
Another paper, "Story boundary detection in large broadcast news video archives: techniques experience and trends"; 12th ACM International Conference on Multimedia (MM'04), pp. 656-659, 2004; by Tat-Seng Chua, Shih-Fu Chang, Lekha Chaisorn and Winston Hsu surveys several methods for story-boundary identification in a news program. Different methods are further described in "Story segmentation of broadcast news in English, Mandarin and Arabic"; Proc. Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 4-9 June 2006; by Andrew Rosenberg and Julia Hirschberg; "Story segmentation of broadcast news in Arabic, Chinese and English using multi-window features"; Proc. 30th annual international ACM SIGIR conference on research and development in information retrieval (poster), pp 703-704, 2007; by Martin Franz and Jian-Ming Xu; "Texttiling: segmenting text into multi-paragraph subtopic passages"; Computational Linguistics, 23(1), pp 33-64, 1997; by M.A. Hearst; "Unsupervised video-shot segmentation and model-free anchor-person detection for news video parsing"; IEEE Trans. Circuits and Systems for Video Technology, 12(9), pp. 765-776, 2002; by X. Gao and X. Tang; "Combining text and audio-visual features in video indexing"; Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), pp. 1005-1008, 2005; by Shih-Fu Chang, R. Manmatha and Tat-Seng Chua; "A multi-modal approach to story segmentation for news video"; World Wide Web: Internet and Web Information Systems, Vol 6, pp 187-208, 2003; by Lekha Chaisorn, Tat-Seng Chua and Chin-Hui Lee; "Video story segmentation with multi-modal features: experiments on TRECvid 2003"; Multimedia Information Retrieval (MIR'04), October 15-16, 2004; by Laurent Besacier, George Quenot, Stephane Ayache and Daniel Moraru.
The papers, "A probabilistic framework for TV-news story detection and classification"; IEEE International Conference on Multimedia and Expo (ICME'05), pp 1350-1355, July 2005; by Francesco Colace, Pasquale Foggia and Gennaro Percannella, and "Using Multimedia Ontology for generating conceptual annotations and hyperlinks in video collections"; International conference on Web Intelligence, Hong Kong, December 2006; by Gaurav Harit, Santanu Chaudhury and Hiranmay Ghosh, concentrate on sports news classification.
US2004189873 discloses a VIDEO DETECTION AND INSERTION SYSTEM. US2004189873 system includes means which detects defined segments in a video stream. The defined segments may be advertisements, as mentioned.
US6614987 discloses a TELEVISION PROGRAM RECORDING WITH USER PREFERENCE DETERMINATION. The system includes a module which is responsive to attribute information in accordance with categorization (classification) parameters or viewing preferences of the user. Thus, news stories may be identified according to this document. US2002162118 discloses an EFFICIENT INTERACTIVE TV. This system includes content identifier means to identify content or a subset of content. This identification not only helps in identifying news stories and advertisements, but may also be used to identify repetitive news stories.
US6608930 discloses a METHOD AND SYSTEM FOR ANALYZING VIDEO CONTENT USING DETECTED TEXT IN VIDEO FRAMES. This system detects video streams based on user-selected image text attributes. A selected attribute may be news stories or advertisements or both. So, both may be individually identified for segregation purposes. Further, recognition of persons featuring in the detected video is also disclosed. However, all these features are enabled due to the (image) text that is available in each video stream.
However, none of the above patents / patent applications provide a solution for handling multilingual applications such as multilingual television channels including multilingual news programs and removal of advertisements, thereof.
US2006136226 discloses a SYSTEM AND METHOD FOR CREATING ARTIFICIAL TV NEWS PROGRAMS. It includes means to translate the language of a newscaster into a language of the user's choice, and combines automatic speech recognition (Speech-To-Text processing), automatic machine translation, and audio-visual Text-To-Speech (TTS) synthesis techniques for automatically personalizing TV news programs. However, this patent application does not provide a solution for identifying news programs and classifying said programs, or even for identifying programs based on metadata.
While there are several methods for different aspects of news video analysis, there is a need for a process to combine these tools for creating a news video analysis solution that obviates the limitations of the prior art.
OBJECTS OF THIS INVENTION
An object of the invention is to provide an integrated and complete solution for news video analysis.
Another object of the invention is to provide a system wherein TV newscasts in different languages can be processed.
Yet another object of the invention is to automatically identify news programs in a broadcast stream and separate it out from other programs.
Still another object of the invention is to provide a system wherein advertisements in TV newscasts are automatically identified and removed from the news program.
An additional object of the invention is to provide a system for news analysis wherein the story boundaries are automatically identified and the news stories are segregated.
Yet an additional object of the invention is to provide a system wherein repeat telecasts are identified and filtered out.
Still an additional object of the invention is to provide a system for news analysis wherein similar stories (pertaining to the same event) on different channels are identified and clustered.
Another additional object of the invention is to provide a system for news analysis wherein each news story is indexed with keywords identified in the speech and visual text as well as other metadata, such as a recognized face.
Another object of the invention is to provide a system where news stories in languages for which speech and OCR technologies are not mature are indexed based on their similarity with stories in other languages for which speech and OCR technologies are mature.
Yet another additional object of the invention is to provide a system for news analysis wherein the stories are classified and can be retrieved.
SUMMARY OF THE INVENTION
According to this invention, there is provided a system for analysis of news channels/stories broadcasted/relayed on a television (TV). According to this invention, there is provided a system for identification, classification, storage, and analysis of news programs, containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said system comprising:
- acquisition module adapted to capture said TV broadcast streams;
- recording module adapted to record said captured streams on a physical storage;
- news program identification module adapted to identify news programs in said stored broadcast streams;
- news program clipping module adapted to separate said identified news programs from other programs;
- advertisement identification module for identification of advertisements from said identified news programs;
- advertisement clipping module adapted for removal of said identified advertisements;
- seam detection module adapted to detect and identify seams of said news programs in order to demarcate individual stories in a news program;
- keyword generation module adapted to generate a list of keywords;
- text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs;
- speech-keyword identification module adapted to identify the created keywords from the speech of said identified news programs;
- repeat-identification module adapted to identify similar/repeat news programs from said plurality of TV broadcast streams;
- clustering module adapted to cluster said repeat news programs into one news programs, in order to avoid duplication or multiplication;
- removal module adapted to remove said repeat-identified news programs;
- repository adapted to store said news programs and metadata embedded in said news programs; and
- logical interconnection module adapted to logically interconnecting each of said modules for determining the sequence of steps for a multilingual news video analysis system.
Typically, said keyword generation module is a multilingual keyword generation module adapted to generate keywords in multiple languages.
Typically, said text-keyword identification module is a multilingual text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
Typically, said speech-keyword identification module is a multilingual speech-keyword identification module adapted to identify said created keywords from the speech in different languages, in the audio channel of the news program.
Typically, said system includes a multilingual lexicon database for generating multilingual synonymous keywords for said created keywords.
In accordance with an embodiment of this invention, there is provided an acquisition module adapted to capture a TV broadcast stream and further includes a recording module adapted to record said captured stream on a physical storage, typically on disk, in chunks of manageable size.
In accordance with another embodiment of this invention, there is provided a news program identification module adapted to identify news programs in the broadcast stream and to separate them from other programs.
In accordance with yet another embodiment of this invention, there is provided an advertisement identification module for identification of advertisements from said news programs, and further including an advertisement clipping module adapted for removal of said identified advertisement breaks.
In accordance with still another embodiment of this invention, there is provided a keyword generation module to create a list of desired keywords of contemporary interest in different languages.
In accordance with an additional embodiment of this invention, there is provided a text-keyword identification module adapted to identify the desired created keywords from the visual text of said news stories, in different languages, typically appearing in form of ticker text on the screen.
In accordance with yet an additional embodiment of this invention, there is provided a speech-keyword identification module adapted to identify the desired keywords from the speech, in different languages, in the audio channel of the news.
In accordance with still an additional embodiment of this invention, there is provided a seam detection module adapted to detect and identify seams, i.e. story boundaries, and demarcate the individual stories in a news program.
In accordance with another additional embodiment of this invention, there is provided a repeat-identification module adapted to identify similar/repeat stories from multiple channels and further including a clustering module adapted to cluster said repeat stories into one story, to avoid duplication or multiplication.
In accordance with yet another additional embodiment of this invention, there is provided a removal module adapted to identify duplicate stories (repeat telecasts) and remove them from the selected stories.
In accordance with still another additional embodiment of this invention, there is provided a repository adapted for storing the news contents, content description of the news videos, various indexes and links as discovered in the previously described modules.
Typically, said system includes a retrieval module adapted for retrieving a news program from said repository.
Typically, said system includes a navigation means adapted for navigation in said repository for retrieving a news program.
In accordance with yet another embodiment of this invention, there is provided a logical interconnection module adapted to logically interconnect all the said modules for determining the sequence of steps for a multilingual news video analysis system.
Typically, said repeat-identification module includes visual matching means adapted to use visual cues in order to identify repeat news programs, said visual matching means comprises:
- key frame identification means adapted to identify at least a key frame in a plurality of news program;
- key frame visual feature extruding means adapted to extrude visual features relating to pre-defined parameters of said identified key frames;
- processing means adapted to process said extruded features based on pre-defined tests in order to obtain a processed similarity score in relation to plurality of news program;
- identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score; and
- deletion means adapted to delete said identified repeat news programs.
Typically, said repeat-identification module includes audio matching means adapted to use audio cues in order to identify repeat news programs, said audio matching means comprises:
- window determination means adapted to determine a window of frames in a plurality of news programs;
- fingerprint detection means adapted to detect audio fingerprint based on pre-defined processing criteria on said determined window of frames;
- processing means adapted to process said detected audio fingerprint based on pre-defined tests in order to obtain a processed similarity score in relation to plurality of news program;
- identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score; and
- deletion means adapted to delete said identified repeat news programs.
According to this invention, there is provided a method for identification, classification, storage, and analysis of news programs, containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said method comprises the steps of:
- capturing said TV broadcast streams;
- recording said captured streams on a physical storage;
- identifying news programs in said stored broadcast streams;
- separating said identified news programs from other programs;
- identifying advertisements from said identified news programs;
- clipping said identified advertisements;
- detecting and identifying seams of said news programs in order to demarcate individual stories in a news program;
- generating a list of keywords;
- identifying said created keywords from visual text of identified said news programs;
- identifying the created keywords from the speech of said identified news programs;
- identifying similar/repeat news programs from said plurality of TV broadcast streams;
- clustering said repeat news programs into one news programs, in order to avoid duplication or multiplication;
- removing said repeat-identified news programs;
- storing said news programs and metadata embedded in said news programs; and
- logically interconnecting each of said modules for determining the sequence of steps for a multilingual news video analysis system.
Typically, said step of identifying said created keywords from visual text of identified said news programs includes the step of generating keywords in multiple languages.
Typically, said step of identifying created text keywords includes the step of identifying created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
Typically, said step of identifying speech-keywords includes the step of identifying said created keywords from the speech in different languages, in the audio channel of the news program.
Typically, said method includes a step of retrieving a news program from said repository.
Typically, said method includes a step of navigating in said repository for retrieving a news program.
Typically, said step of removing repeat-identified news programs includes a method of using visual cues in order to identify repeat news programs, said method comprises the steps of:
- identifying at least a key frame in a plurality of news program;
- extruding visual features relating to pre-defined parameters of said identified key frames;
- processing said extruded features based on pre-defined tests in order to obtain a processed similarity score in relation to plurality of news program;
- identifying repeat news programs based on pre-defined criteria of said similarity score; and
- deleting said identified repeat news programs.
Typically, said step of removing repeat-identified news programs includes a method of using audio cues in order to identify repeat news programs, said method comprises the steps of:
- determining a window of frames in a plurality of news programs;
- detecting audio fingerprint based on pre-defined processing criteria on said determined window of frames;
- processing said detected audio fingerprint based on pre-defined tests in order to obtain a processed similarity score in relation to plurality of news program;
- identifying repeat news programs based on pre-defined criteria of said similarity score; and
- deleting said identified repeat news programs.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWING
The invention will now be described in relation to the accompanying drawings, in which:
Figure 1 illustrates a schematic block diagram of the multilingual news video analysis system.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 illustrates a schematic block diagram of the multilingual news video analysis system in accordance with the present invention.
The Telecast Acquisition Module (10) captures the telecast from one of several possible sources, e.g. a DTH dish, cable TV, etc., tunes to a particular channel, decodes the TV signals and converts the transmission into a standard digital video format, e.g. MPEG-4 or the like. This module is replicated for every channel to be monitored.
The video streams captured by the Telecast Acquisition modules (10) are stored in a Recording Module (20) in chunks of manageable size with unique file names.
These video chunks are then fed to a News Program Identification Module (30), which has pre-fed knowledge about schedule of news programs and jingles that precede and follow the news programs in the various channels, and the like. The module marks the beginning and end of the news programs. This information is stored, typically in standard MPEG-7 compliant format, in Video Description Module (40).
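The schedule-based part of this step can be sketched as follows. This is a minimal illustration, assuming the pre-fed schedule is available as a list of program entries with start and end times in seconds since midnight. The names `ScheduledProgram` and `news_intervals_in_chunk` are hypothetical, and jingle detection, which the module also relies on, is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledProgram:
    """One entry of the pre-fed schedule (hypothetical structure)."""
    channel: str
    start_s: int   # start time, seconds since midnight
    end_s: int     # end time, seconds since midnight
    is_news: bool


def news_intervals_in_chunk(schedule, channel, chunk_start_s, chunk_end_s):
    """Return (start, end) offsets, in seconds from the start of the chunk,
    of the scheduled news programs that overlap the recorded chunk."""
    intervals = []
    for prog in schedule:
        if prog.channel != channel or not prog.is_news:
            continue
        # Clip the scheduled program to the boundaries of the chunk.
        start = max(prog.start_s, chunk_start_s)
        end = min(prog.end_s, chunk_end_s)
        if start < end:
            intervals.append((start - chunk_start_s, end - chunk_start_s))
    return sorted(intervals)
```

In practice the schedule only gives approximate boundaries; the jingles that precede and follow a news program would be used to refine the marked beginning and end.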
Advertisement breaks within a news program are then detected by the absence of specific ticker-text bands and marked in the video by the Advertisement Identification Module (50). At this stage, the video is decomposed into constituent shots and several visual and audio parameters are extracted. The additional information accumulates in Video Description Module (40).
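As an illustration of the ticker-absence cue, the following sketch marks candidate advertisement breaks. It assumes that an upstream detector has already produced one boolean per frame (or per shot) indicating whether the ticker-text band is present. The function name `find_ad_breaks` and the `min_gap` parameter are hypothetical and not part of the described system.

```python
def find_ad_breaks(ticker_present, min_gap=3):
    """Return half-open index ranges (start, end) where the ticker-text band
    is absent for at least `min_gap` consecutive samples.

    Short gaps are ignored, on the assumption that they are detector noise
    rather than advertisement breaks.
    """
    breaks = []
    run_start = None
    for i, present in enumerate(ticker_present):
        if not present and run_start is None:
            run_start = i                      # a ticker-free run begins
        elif present and run_start is not None:
            if i - run_start >= min_gap:
                breaks.append((run_start, i))  # run is long enough to keep
            run_start = None
    # Handle a ticker-free run that lasts until the end of the input.
    if run_start is not None and len(ticker_present) - run_start >= min_gap:
        breaks.append((run_start, len(ticker_present)))
    return breaks
```

The returned ranges could then be written to the video description so that the advertisement clipping step can remove them.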
A set of keywords of contemporary interest is selected by analysis of RSS feeds by a Keyword Generation Module (60). The video segments representing news programs are then processed to detect these keywords. A Keyword Recognition Module (70) analyzes the visual text and speech to spot the identified keywords. The visual keywords are classified into 'global' and 'local' categories, depending on the ticker-text band where they appear. While the 'local' keywords pertain to the current story being telecast, the 'global' keywords are not tied to the current story and may pertain to a story that appears anywhere in the news program. A Speaker Identification Module (80) identifies the speaker using face recognition and speaker identification (speech) technologies in scenes containing one dominant speaker, for example in speeches made by important personalities. The additional information further augments the description in Video Description Module (40).
Typically, the keyword generation module is a multilingual keyword generation module.
The system provides the ability to process multilingual news programs, according to this invention. A multilingual keyword list, in multiple languages, is created in order to enable keyword spotting in multilingual TV news broadcast channels, in both spoken and visual forms. The multilingual keyword list helps to automatically map the spotted keywords in different languages to their equivalents in a primary language (say English) for uniform indexing across multiple channels. Restricting the keyword list to a small number of keywords helps in improving the accuracy of the system, especially for keyword spotting in speech. A sample multilingual keyword list is shown below:
Figure imgf000022_0001
The multilingual keyword list is created from RSS feeds maintained by some websites. Typically, an RSS feed captures contemporary news in a semi-structured XML format and contains hyperlinks to the full-text news stories, usually in English. The system of this invention identifies the common nouns (using statistical language processing) and proper nouns (using named entity detection) in the RSS feed text and the associated stories as the keywords. The keywords in the language of the RSS feed (usually English) form a set of concepts, which need to be identified in the audio-visual broadcast in telecasts of different languages. The equivalent keywords in another language can be derived from the English keywords using a word-level dictionary from English to that language (for common noun keywords) and a pronunciation lexicon for transliterating proper names in a semi-automatic manner. (A lexicon is an association of words and their phonetic transcriptions; it is a special kind of dictionary that maps a word to all the possible phonemic representations of the word.) Finally, the keywords in multilingual form are stored as a dynamic keyword list structure in XML format. This becomes the active keyword list for the news video channels and is used for keyword spotting in both the audio and visual channels of the news telecast.
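A multilingual keyword list of this kind, and the mapping of spotted keywords back to the primary language, can be sketched as below. The XML element and attribute names (`keywordList`, `keyword`, `form`, `primary`, `lang`) are assumptions made for illustration, since the actual schema is not specified here. The two Hindi and Bengali entries are illustrative sample data, not the sample list of the invention.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data: English concept -> surface forms per language.
KEYWORDS = {
    "election": {"hi": "चुनाव", "bn": "নির্বাচন"},
    "parliament": {"hi": "संसद", "bn": "সংসদ"},
}


def build_keyword_xml(keywords):
    """Build an XML keyword list (hypothetical schema) from a concept dictionary."""
    root = ET.Element("keywordList")
    for concept, forms in keywords.items():
        kw = ET.SubElement(root, "keyword", {"primary": concept})
        for lang, surface in forms.items():
            form = ET.SubElement(kw, "form", {"lang": lang})
            form.text = surface
    return root


def build_reverse_index(root):
    """Map every surface form, in any language, to its primary-language keyword."""
    index = {}
    for kw in root.findall("keyword"):
        primary = kw.get("primary")
        index[primary] = primary
        for form in kw.findall("form"):
            index[form.text] = primary
    return index
```

With such an index, a keyword spotted in a Hindi ticker or a Bengali audio track is stored under the same English index term, which is what allows uniform indexing across channels.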
One of the novelties of this invention is the use of keyword spotting, instead of a full transcription of the news telecast, to annotate multilingual news broadcasts. This serves three purposes: (a) one need not determine the language of the telecast a priori; (b) one need not have language-specific speech recognition engines; and (c) it is easier to spot keywords than to attempt a full text transcription, because the search space of the speech-to-text (speech recognition) engine is constrained. Additionally, it is sufficient to annotate the news telecast by keywords, because a news broadcast is largely about places and people (proper nouns) and a set of common nouns; moreover, keyword annotation of the news broadcast occupies much less space than an error-prone full text transcription.
The Seam Detection Module (90) uses the video descriptors available in Video Description Module (40) to identify story boundaries. Repeat-Identification Module (100) identifies similar and duplicate stories from multiple channels. The OCR and speech technologies for many Indian languages are not mature enough for reliable keyword extraction. Similar shot detection helps in classification of news stories in these languages. The additional information further augments the description in Video Description Module (40) and is used to create a Repository Knowledge Base (110). The knowledge base enables semantic search for news clusters by semantic analysis of the various metadata associated with the news videos in the earlier stages of processing.
For identifying similar and duplicate news programs, both audio and visual cues from a plurality of news programs are identified and used. Firstly, the recorded news videos are segregated into news programs for further processing. As a first step, a shot detection technique is used, whereby the news stories are logically segmented into distinct shots and each shot is represented by a key frame or representative frame. Further, a similar story detection module finds a similarity score between two news stories, using visual matching techniques, in the range [0, 1], where '0' means no match and '1' means complete match. After this, a duplicate story detection module finds whether two news stories are duplicates of each other.
The shots are detected on the basis of difference in visual features of the successive frames in a video. A key frame or representative frame and its corresponding visual features such as colour, texture, edges, etc are extracted for each shot. The shots are clustered based on the visual similarity of their representative frames. This is calculated by distance measures such as Absolute Image Difference, Histogram Intersection, Hausdorff Distance, Color Moments, SIFT, and the like.
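Shot detection from differences between successive frames can be sketched as follows, using histogram intersection, one of the measures named above. This is a minimal sketch under simplifying assumptions: each frame is a flat list of 8-bit grayscale pixel values, and the bin count and threshold are arbitrary illustrative choices. A real system would operate on decoded video frames and richer colour, texture and edge features.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalised intensity histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalised histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new shot, i.e. frames whose
    histogram intersection with the previous frame falls below `threshold`."""
    if not frames:
        return []
    boundaries = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if histogram_intersection(prev, cur) < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

The same `histogram_intersection` function can serve as the visual comparator between the representative frames of two shots when clustering them.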
Each cluster in a story is now compared with every cluster in the other story by comparing the central representative frames in the clusters using a visual comparator. Let $\{c_{11}, c_{12}, \ldots, c_{1m}\}$ be the clusters in story $s_1$, and $\{c_{21}, c_{22}, \ldots, c_{2n}\}$ be the clusters in story $s_2$. Let $k_{ij}$ be the number of shots in the $j$-th cluster of story $i$. The process is repeated with every pair of candidate similar stories and clusters of similar stories are discovered. Let
Figure imgf000025_0001
represent the match between cluster pair $c_{1i}$ and $c_{2j}$. Similarity is defined as:
Figure imgf000026_0001
If $SIM_{12}$ is greater than a certain threshold, the news programs are designated to be similar.
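The similarity formula itself is given only as a figure and is not reproduced in this text. Purely as an illustration of how cluster matches and shot counts could be combined into a score in [0, 1], the sketch below uses an assumed aggregate: each cluster of the first story takes its best match in the second story, weighted by its number of shots. This weighting is an assumption, not the formula of the invention, and the names `story_similarity` and `match` are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Example visual comparator on normalised histograms, result in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def story_similarity(clusters_1, clusters_2, match=histogram_intersection):
    """Assumed aggregate similarity between two stories.

    Each story is a list of (representative_feature, shot_count) pairs, one
    per cluster. The score is the shot-count-weighted mean, over the clusters
    of story 1, of the best match found among the clusters of story 2.
    """
    total_shots = sum(k for _, k in clusters_1)
    if total_shots == 0 or not clusters_2:
        return 0.0
    weighted = 0.0
    for feature_1, k in clusters_1:
        best = max(match(feature_1, feature_2) for feature_2, _ in clusters_2)
        weighted += k * best
    return weighted / total_shots


def are_similar(clusters_1, clusters_2, threshold=0.6):
    """Designate two stories as similar when the score exceeds a threshold."""
    return story_similarity(clusters_1, clusters_2) > threshold
```

Note that this assumed score is not symmetric in the two stories; a symmetric variant could average the score computed in both directions.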
Duplicate stories are a subset of similar stories. Two stories are said to be duplicates of each other only if their audio-visual patterns are same.
Let $T_{s1}$ and $T_{s2}$ be the total durations of the two stories, let $m$ and $n$ be the total number of shots in the two stories respectively, and let
Figure imgf000026_0002
and
Figure imgf000026_0003
be the durations of the shots in stories $s_1$ and $s_2$ respectively. Then the criteria for (visually) duplicate videos are:
1. $T_{s1} = T_{s2}$
2. $m = n$
3. $\exists\, i, j$ such that
Figure imgf000026_0004
4. $t_{1i} = t_{2j}$
The two stories are not visual duplicates of each other if any of the above conditions fails.
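The duration-based criteria can be illustrated with the following sketch. It covers criteria 1, 2 and 4 only; the matching condition of criterion 3 is given as a figure and is not reproduced, so the sketch assumes that shots correspond in order of appearance. The tolerance `tol` (in seconds) is also an assumption, added because measured durations rarely agree exactly.

```python
def visually_duplicate(shots_1, shots_2, tol=0.04):
    """Check the duration criteria for visually duplicate stories.

    `shots_1` and `shots_2` are lists of shot durations in seconds, in order
    of appearance. Assumes in-order shot correspondence (see lead-in).
    """
    # Criterion 2: the two stories have the same number of shots (m = n).
    if len(shots_1) != len(shots_2):
        return False
    # Criterion 1: the total durations agree, within accumulated tolerance.
    if abs(sum(shots_1) - sum(shots_2)) > tol * max(len(shots_1), 1):
        return False
    # Criterion 4: corresponding shots have equal durations.
    return all(abs(a - b) <= tol for a, b in zip(shots_1, shots_2))
```

For example, `visually_duplicate([4.0, 2.5, 6.0], [2.5, 4.0, 6.0])` returns `False`: the totals and shot counts agree, but the individual shot durations do not line up.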
The audio patterns or audio fingerprints that are used (generated using audio features such as LPC, MFCC, shifted delta cepstral features, etc.) are based on perceptual features of audio that are invariant, at least to a certain degree, with respect to signal degradations. Thus even severely degraded audio still leads to very similar audio fingerprints. These fingerprints are matched for each frame block, which is a group of frames, from two streams. The two streams are duplicates if the fingerprints of all the frame blocks are matched. As the streams can be from different channels, they may not match exactly at the desired points. The match may occur a few samples before or after the desired point. Thus the audio frames are matched in a window of some predetermined size.
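Windowed fingerprint matching can be sketched as follows. The sketch assumes that each frame block has already been reduced to an integer fingerprint (the feature extraction from LPC or MFCC features is not shown) and that two block fingerprints match when they differ in at most `max_bit_errors` bits. The function names and both parameters are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def find_duplicate_offset(fp_a, fp_b, window=2, max_bit_errors=2):
    """Return the block offset (within +/- `window`) at which every
    overlapping frame-block fingerprint of the two streams matches,
    or None if no such offset exists.

    Offsets are tried in order of increasing magnitude, so an exact
    alignment is preferred over a shifted one.
    """
    min_overlap = min(len(fp_a), len(fp_b)) - window
    if min_overlap <= 0:
        return None
    for offset in sorted(range(-window, window + 1), key=abs):
        pairs = [(fp_a[i], fp_b[i + offset])
                 for i in range(len(fp_a))
                 if 0 <= i + offset < len(fp_b)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to call the streams duplicates
        if all(hamming(a, b) <= max_bit_errors for a, b in pairs):
            return offset
    return None
```

Allowing a small number of bit errors per block reflects the point that degraded audio yields very similar, but not necessarily identical, fingerprints.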
While considerable emphasis has been placed herein on the particular features of this invention, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other modifications in the nature of the invention or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Claims

1. A system for identification, classification, storage, and analysis of news programs, containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said system comprising:
- acquisition module adapted to capture said TV broadcast streams;
- recording module adapted to record said captured streams on a physical storage;
- news program identification module adapted to identify news programs in said stored broadcast streams;
- news program clipping module adapted to separate said identified news programs from other programs;
- advertisement identification module for identification of advertisements from said identified news programs;
- advertisement clipping module adapted for removal of said identified advertisements;
- seam detection module adapted to detect and identify seams of said news programs in order to demark individual stories in a news program;
- keyword generation module adapted to generate a list of keywords;
- text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs;
- speech-keyword identification module adapted to identify the created keywords from the speech of said identified news programs;
- repeat-identification module adapted to identify similar/repeat news programs from said plurality of TV broadcast streams;
- clustering module adapted to cluster said repeat news programs into one news programs, in order to avoid duplication or multiplication;
- removal module adapted to remove said repeat-identified news programs;
- repository adapted to store said news programs and metadata embedded in said news programs; and
- logical interconnection module adapted to logically interconnecting each of said modules for determining the sequence of steps for a multilingual news video analysis system.
2. A system as claimed in claim 1 wherein, said keyword generation module is a multilingual keyword generation module adapted to generate keywords in multiple languages by analyzing RSS feeds and making use of word-level dictionaries.
3. A system as claimed in claim 1 wherein, said text-keyword identification module is a multilingual text-keyword identification module adapted to identify said created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
4. A system as claimed in claim 1 wherein, said speech-keyword identification module is a multilingual speech-keyword identification module adapted to identify said created keywords from the speech in different languages, in the audio channel of the news program.
5. A system as claimed in the above claims wherein, said system includes a multilingual lexicon database for generating multilingual synonymous keywords for said created keywords.
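Claims 2 to 5 describe expanding a keyword list through a multilingual lexicon and then spotting those keywords in text. A minimal sketch of that idea follows, assuming a small in-memory lexicon: the `LEXICON` contents (transliterated Hindi and French entries), the language codes, and the simple substring match are illustrative assumptions, not the lexicon database or matching method of the claims.

```python
from typing import Dict, List, Set

# Hypothetical lexicon: each base keyword maps to synonyms per language code.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "election": {"hi": ["chunav"], "fr": ["scrutin"]},
    "flood": {"hi": ["baadh"], "fr": ["inondation"]},
}

def expand_keywords(keywords: List[str],
                    lexicon: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    # Return the keywords plus every synonym the lexicon lists for them.
    expanded: Set[str] = set()
    for keyword in keywords:
        key = keyword.lower()
        expanded.add(key)
        for synonyms in lexicon.get(key, {}).values():
            expanded.update(s.lower() for s in synonyms)
    return expanded

def find_keywords(text: str, keywords: Set[str]) -> Set[str]:
    # Case-insensitive substring match; a real system would tokenize per language.
    lowered = text.lower()
    return {kw for kw in keywords if kw in lowered}

expanded = expand_keywords(["Election"], LEXICON)
hits = find_keywords("Resultats du scrutin national", expanded)
```

The same expanded set could be matched against text recognized from the visual channel or against a speech transcript, which is why the expansion is kept separate from the matching.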
6. A system as claimed in claim 1 wherein, said system includes a retrieval module adapted for retrieving a news program from said repository.
7. A system as claimed in claim 1 wherein, said system includes a navigation means adapted for navigation in said repository for retrieving a news program.
8. A repeat-identification module includes visual matching means adapted to use visual cues in order to identify repeat news programs, said visual matching means comprising:
- key frame identification means adapted to identify at least a key frame in a plurality of news programs;
- key frame visual feature extruding means adapted to extrude visual features relating to pre-defined parameters of said identified key frames;
- processing means adapted to process said extruded features based on pre-defined tests in order to obtain a processed similarity score in relation to the plurality of news programs;
- identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score; and
- deletion means adapted to delete said identified repeat news programs.
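The visual matching of claim 8 can be illustrated with one simple feature and one simple test. The sketch below assumes grayscale key frames given as nested lists, a coarse intensity histogram as the visual feature, histogram intersection as the similarity score, and a threshold of 0.9 as the repeat criterion; the claim does not specify any of these, so they are assumptions for illustration only.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, values 0..255

def histogram(frame: Frame, bins: int = 4) -> List[float]:
    # Normalized intensity histogram of one key frame.
    counts = [0] * bins
    total = 0
    for row in frame:
        for px in row:
            counts[min(px * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame has no pixels")
    return [c / total for c in counts]

def similarity(a: Frame, b: Frame) -> float:
    # Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    return sum(min(x, y) for x, y in zip(histogram(a), histogram(b)))

def is_repeat(a: Frame, b: Frame, threshold: float = 0.9) -> bool:
    # Assumed criterion: a score at or above the threshold marks a repeat.
    return similarity(a, b) >= threshold

dark = [[10, 20], [30, 40]]
dark_again = [[12, 22], [28, 45]]
bright = [[200, 210], [220, 250]]
```

A histogram ignores where pixels sit in the frame, so it tolerates small shifts such as a moved channel logo, at the cost of confusing frames that merely share overall brightness.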
9. A repeat-identification module includes audio matching means adapted to use audio cues in order to identify repeat news programs, said audio matching means comprising:
- window determination means adapted to determine a window of frames in a plurality of news programs;
- fingerprint detection means adapted to detect audio fingerprint based on pre-defined processing criteria on said determined window of frames;
- processing means adapted to process said detected audio fingerprint based on pre-defined tests in order to obtain a processed similarity score in relation to the plurality of news programs;
- identification means adapted to identify repeat news programs based on pre-defined criteria of said similarity score; and
- deletion means adapted to delete said identified repeat news programs.
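The audio matching of claim 9 works on windows of frames and a fingerprint derived from them. As a rough sketch, assume the fingerprint is one bit per pair of adjacent windows (1 when the energy rises, 0 otherwise) and the similarity score is the fraction of matching bits; this scheme and the names `fingerprint` and `similarity` are illustrative assumptions rather than the fingerprint defined in the specification.

```python
from typing import List

def fingerprint(samples: List[float], window: int) -> List[int]:
    # Energy of each non-overlapping window of samples.
    energies = [
        sum(s * s for s in samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]
    # One bit per adjacent pair of windows: 1 if the energy rises.
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]

def similarity(fp_a: List[int], fp_b: List[int]) -> float:
    # Fraction of matching bits over the shared length (0.0 if nothing to compare).
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(fp_a, fp_b) if x == y) / n

clip = [0.1, 0.1, 0.5, 0.5, 0.2, 0.2, 0.9, 0.9, 0.3, 0.3]
louder = [2 * s for s in clip]  # same clip at a higher volume
other = [0.9, 0.9, 0.5, 0.5, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]
```

Because the bits encode only whether energy rises between windows, the same clip rebroadcast at a different volume yields the same fingerprint.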
10. A method for identification, classification, storage, and analysis of news programs, containing an audio channel, video channel, and metadata relating to it, broadcasted/relayed on a television (TV) channel by means of a plurality of TV broadcast streams, said method comprising the steps of
- capturing said TV broadcast streams;
- recording said captured streams on a physical storage;
- identifying news programs in said stored broadcast streams;
- separating said identified news programs from other programs;
- identifying advertisements from said identified news programs;
- clipping said identified advertisements;
- detecting and identifying seams of said news programs in order to demark individual stories in a news program;
- generating a list of keywords;
- identifying said created keywords from visual text of identified said news programs;
- identifying the created keywords from the speech of said identified news programs;
- identifying similar/repeat news programs from said plurality of TV broadcast streams;
- clustering said repeat news programs into one news program, in order to avoid duplication or multiplication;
- removing said repeat-identified news programs;
- storing said news programs and metadata embedded in said news programs; and
- logically interconnecting each of said modules for determining the sequence of steps for a multilingual news video analysis system.
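The clustering and removal steps can be sketched as grouping programs whose pairwise similarity score passes a threshold and keeping one program per group. The example below uses a union-find structure for the grouping; the score dictionary, the threshold value, and the choice of the lowest index as the kept program are assumptions made for illustration, not details taken from the claims.

```python
from typing import Dict, List, Tuple

def cluster_repeats(n: int,
                    scores: Dict[Tuple[int, int], float],
                    threshold: float) -> List[List[int]]:
    # Group program indices 0..n-1 whose pairwise score is at or above threshold.
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the lower index as root so it becomes the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def representatives(clusters: List[List[int]]) -> List[int]:
    # One program is kept per cluster; the rest are treated as repeats.
    return [cluster[0] for cluster in clusters]

scores = {(0, 1): 0.95, (1, 2): 0.92, (2, 3): 0.10, (3, 4): 0.99}
clusters = cluster_repeats(5, scores, threshold=0.9)
kept = representatives(clusters)
```

Grouping is transitive here: programs 0 and 2 end up in the same cluster through program 1 even though they were never compared directly.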
11. A method as claimed in claim 10 wherein, said step of identifying said created keywords from visual text of identified said news programs includes the step of generating keywords in multiple languages.
12. A method as claimed in claim 10 wherein, said step of identifying created text keywords includes the step of identifying created keywords from visual text of identified said news programs, in different languages, in the visual channel of the news program.
13. A method as claimed in claim 10 wherein, said step of identifying speech-keywords includes the step of identifying said created keywords from the speech in different languages, in the audio channel of the news program.
14. A method as claimed in claim 10 wherein, said method includes a step of retrieving a news program from said repository.
15. A method as claimed in claim 10 wherein, said method includes a step of navigating in said repository for retrieving a news program.
16. A method as claimed in claim 10 wherein, said step of removing repeat-identified news programs includes a method of using visual cues in order to identify repeat news programs, said method comprising the steps of:
- identifying at least a key frame in a plurality of news programs;
- extruding visual features relating to pre-defined parameters of said identified key frames;
- processing said extruded features based on pre-defined tests in order to obtain a processed similarity score in relation to the plurality of news programs;
- identifying repeat news programs based on pre-defined criteria of said similarity score; and
- deleting said identified repeat news programs.
17. A method as claimed in claim 10 wherein, said step of removing repeat-identified news programs includes a method of using audio cues in order to identify repeat news programs, said method comprising the steps of:
- determining a window of frames in a plurality of news programs;
- detecting audio fingerprint based on pre-defined processing criteria on said determined window of frames;
- processing said detected audio fingerprint based on pre-defined tests in order to obtain a processed similarity score in relation to the plurality of news programs;
- identifying repeat news programs based on pre-defined criteria of said similarity score; and
- deleting said identified repeat news programs.
PCT/IN2010/000617 2009-09-14 2010-09-14 Tv news analysis system for multilingual broadcast channels Ceased WO2011039773A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2092MU2009 2009-09-14
IN2092/MUM/2009 2009-09-14

Publications (2)

Publication Number Publication Date
WO2011039773A2 (en) 2011-04-07
WO2011039773A3 (en) 2011-06-16

Family

ID=43826738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2010/000617 Ceased WO2011039773A2 (en) 2009-09-14 2010-09-14 Tv news analysis system for multilingual broadcast channels

Country Status (1)

Country Link
WO (1) WO2011039773A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1116649C (en) * 1998-12-23 2003-07-30 皇家菲利浦电子有限公司 Personalized video classification and acquisition system
US6999932B1 (en) * 2000-10-10 2006-02-14 Intel Corporation Language independent voice-based search system
US20120114167A1 (en) * 2005-11-07 2012-05-10 Nanyang Technological University Repeat clip identification in video data
CN101315631B (en) * 2008-06-25 2010-06-02 中国人民解放军国防科学技术大学 A news video story unit association method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9749699B2 (en) 2014-01-02 2017-08-29 Samsung Electronics Co., Ltd. Display device, server device, voice input system and methods thereof
WO2015102245A1 (en) * 2014-01-02 2015-07-09 Samsung Electronics Co., Ltd. Display device, server device, voice input system and methods thereof
US11523182B2 (en) 2015-07-31 2022-12-06 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
EP3448049A1 (en) * 2015-07-31 2019-02-27 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
US10375443B2 (en) 2015-07-31 2019-08-06 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
US11032611B2 (en) 2015-07-31 2021-06-08 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
EP3926966A1 (en) * 2015-07-31 2021-12-22 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
WO2017023719A1 (en) * 2015-07-31 2017-02-09 Rovi Guides, Inc. Method for enhancing a user viewing experience when consuming a sequence of media
US11849182B2 (en) 2015-07-31 2023-12-19 Rovi Guides, Inc. Method for providing identifying portions for playback at user-selected playback rate
EP4598034A1 (en) * 2015-07-31 2025-08-06 Adeia Guides Inc. Method for enhancing a user viewing experience when consuming a sequence of media
KR102005112B1 (en) * 2018-10-16 2019-07-29 (주) 씨이랩 Method for providing advertising service on contents streaming media
CN112565820A (en) * 2020-12-24 2021-03-26 新奥特(北京)视频技术有限公司 Video news splitting method and device
CN112565820B (en) * 2020-12-24 2023-03-28 新奥特(北京)视频技术有限公司 Video news splitting method and device

Also Published As

Publication number Publication date
WO2011039773A3 (en) 2011-06-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10820017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10820017

Country of ref document: EP

Kind code of ref document: A2