What's making that sound?
Li et al., 2014
- Document ID
- 12922587109361144763
- Author
- Li K
- Ye J
- Hua K
- Publication year
- 2014
- Publication venue
- Proceedings of the 22nd ACM international conference on Multimedia
Snippet
In this paper, we investigate techniques to localize the sound source in video made using one microphone. The visual object whose motion generates the sound is located and segmented based on the synchronization analysis of object motion and audio energy. We …
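The snippet does not say how the synchronization analysis is computed, so the following is only an illustrative sketch of the general idea: score each candidate visual region by how well its per-frame motion magnitude tracks the per-frame audio energy, and pick the best match. Pearson correlation is a stand-in chosen here for simplicity, not the authors' stated measure. The function names, region labels, and toy values are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_synchronized_region(
    motion: Dict[str, List[float]], audio_energy: List[float]
) -> Tuple[str, float]:
    """Return the region whose motion series correlates best with the audio energy."""
    scores = {name: pearson(series, audio_energy) for name, series in motion.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


# Toy data (assumed): one value per video frame.
audio_energy = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]
motion = {
    "hand": [0.0, 1.0, 0.1, 0.9, 0.0, 1.0],        # moves in step with the sound
    "background": [0.5, 0.5, 0.6, 0.5, 0.5, 0.6],  # nearly static
}

best, score = most_synchronized_region(motion, audio_energy)
print(best, round(score, 3))
```

In a real pipeline the motion series would come from something like optical flow aggregated over segmented regions, and the audio energy from short-time frames aligned to the video frame rate. Both steps are outside this sketch.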
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Serrano et al. | | Fight recognition in video using hough forests and 2D convolutional neural network |
Izadinia et al. | | Multimodal analysis for identification and segmentation of moving-sounding objects |
Wang et al. | | Multimedia content analysis-using both audio and visual clues |
CN108307229B (en) | | Video and audio data processing method and device |
US8135221B2 (en) | | Video concept classification using audio-visual atoms |
Li et al. | | What's making that sound? |
Abidin et al. | | Spectrotemporal analysis using local binary pattern variants for acoustic scene classification |
El Khoury et al. | | Audiovisual diarization of people in video content |
Pfeiffer et al. | | Scene determination based on video and audio features |
Coutrot et al. | | An audiovisual attention model for natural conversation scenes |
Le et al. | | Learning multimodal temporal representation for dubbing detection in broadcast media |
Mademlis et al. | | Multimodal stereoscopic movie summarization conforming to narrative characteristics |
Hao et al. | | Deepfake detection using multiple data modalities |
Pan et al. | | Videocube: A novel tool for video mining and classification |
Kapsouras et al. | | Multimodal speaker clustering in full length movies |
Sharma et al. | | Cross modal video representations for weakly supervised active speaker localization |
Shahabaz et al. | | Increasing importance of joint analysis of audio and video in computer vision: a survey |
Gade et al. | | Audio-visual classification of sports types |
Soler et al. | | Suggesting sounds for images from video collections |
Castro et al. | | Empirical study of audio-visual features fusion for gait recognition |
Vrochidis et al. | | A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events |
El Khoury | | Unsupervised video indexing based on audiovisual characterization of persons |
Malathi et al. | | Generic object detection using deep learning |
Jiang et al. | | Audio-visual atoms for generic video concept classification |
Butt et al. | | Audiovisual saliency prediction in uncategorized video sequences based on audio-video correlation |