Li et al., 2014 - Google Patents

What's making that sound?

Document ID: 12922587109361144763
Author: Li K; Ye J; Hua K
Publication year: 2014
Publication venue: Proceedings of the 22nd ACM international conference on Multimedia

Snippet

In this paper, we investigate techniques to localize the sound source in video recorded with a single microphone. The visual object whose motion generates the sound is located and segmented based on synchronization analysis of object motion and audio energy. We …
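
The snippet does not say how the synchronization analysis is computed. As a rough illustration only, and not the authors' method, the sketch below scores each candidate region by the Pearson correlation between its per-frame motion magnitude and the per-frame audio energy, then picks the best-scoring region. The function names, the region labels, and the sample values are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_synchronized_region(
    audio_energy: List[float],
    region_motion: Dict[str, List[float]],
) -> Tuple[str, float]:
    """Return the region whose motion series correlates best with audio energy.

    Assumption: both inputs are already sampled at the video frame rate,
    one value per frame.
    """
    scores = {
        name: pearson(motion, audio_energy)
        for name, motion in region_motion.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical per-frame values: the "hand" region moves in step with the
# audio, while the "curtain" region moves almost independently of it.
audio = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]
regions = {
    "hand": [0.2, 1.0, 0.3, 0.9, 0.2, 1.0],
    "curtain": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
}
best_region, best_score = most_synchronized_region(audio, regions)
```

Plain correlation is chosen here only because it is the simplest synchrony measure; a real system would also have to handle audio-video lag and segment the regions first.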

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis

Similar Documents

Publication	Publication Date	Title
Serrano et al.	2018	Fight recognition in video using hough forests and 2D convolutional neural network
Izadinia et al.	2012	Multimodal analysis for identification and segmentation of moving-sounding objects
Wang et al.	2000	Multimedia content analysis-using both audio and visual clues
CN108307229B (en)	2023-12-22	Video and audio data processing method and device
US8135221B2 (en)	2012-03-13	Video concept classification using audio-visual atoms
Li et al.	2014	What's making that sound?
Abidin et al.	2018	Spectrotemporal analysis using local binary pattern variants for acoustic scene classification
El Khoury et al.	2014	Audiovisual diarization of people in video content
Pfeiffer et al.	2001	Scene determination based on video and audio features
Coutrot et al.	2014	An audiovisual attention model for natural conversation scenes
Le et al.	2016	Learning multimodal temporal representation for dubbing detection in broadcast media
Mademlis et al.	2016	Multimodal stereoscopic movie summarization conforming to narrative characteristics
Hao et al.	2022	Deepfake detection using multiple data modalities
Pan et al.	2002	Videocube: A novel tool for video mining and classification
Kapsouras et al.	2017	Multimodal speaker clustering in full length movies
Sharma et al.	2022	Cross modal video representations for weakly supervised active speaker localization
Shahabaz et al.	2024	Increasing importance of joint analysis of audio and video in computer vision: a survey
Gade et al.	2015	Audio-visual classification of sports types
Soler et al.	2016	Suggesting sounds for images from video collections
Castro et al.	2015	Empirical study of audio-visual features fusion for gait recognition
Vrochidis et al.	2024	A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
El Khoury	2010	Unsupervised video indexing based on audiovisual characterization of persons
Malathi et al.	2022	Generic object detection using deep learning
Jiang et al.	2010	Audio-visual atoms for generic video concept classification
Butt et al.	2021	Audiovisual saliency prediction in uncategorized video sequences based on audio-video correlation