[go: up one dir, main page]

Li et al., 2014 - Google Patents

What's making that sound?

Li et al., 2014

View PDF
Document ID
12922587109361144763
Author
Li K
Ye J
Hua K
Publication year
Publication venue
Proceedings of the 22nd ACM international conference on Multimedia

External Links

Snippet

In this paper, we investigate techniques to localize the sound source in video made using one microphone. The visual object whose motion generates the sound is located and segmented based on the synchronization analysis of object motion and audio energy. We …
Continue reading at people.computing.clemson.edu (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G06F17/30799Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00288Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30017Multimedia data retrieval; Retrieval of more than one type of audiovisual media
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00335Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Similar Documents

Publication Publication Date Title
Serrano et al. Fight recognition in video using hough forests and 2D convolutional neural network
Izadinia et al. Multimodal analysis for identification and segmentation of moving-sounding objects
Wang et al. Multimedia content analysis-using both audio and visual clues
CN108307229B (en) Video and audio data processing method and device
US8135221B2 (en) Video concept classification using audio-visual atoms
Li et al. What's making that sound?
Abidin et al. Spectrotemporal analysis using local binary pattern variants for acoustic scene classification
El Khoury et al. Audiovisual diarization of people in video content
Pfeiffer et al. Scene determination based on video and audio features
Coutrot et al. An audiovisual attention model for natural conversation scenes
Le et al. Learning multimodal temporal representation for dubbing detection in broadcast media
Mademlis et al. Multimodal stereoscopic movie summarization conforming to narrative characteristics
Hao et al. Deepfake detection using multiple data modalities
Pan et al. Videocube: A novel tool for video mining and classification
Kapsouras et al. Multimodal speaker clustering in full length movies
Sharma et al. Cross modal video representations for weakly supervised active speaker localization
Shahabaz et al. Increasing importance of joint analysis of audio and video in computer vision: a survey
Gade et al. Audio-visual classification of sports types
Soler et al. Suggesting sounds for images from video collections
Castro et al. Empirical study of audio-visual features fusion for gait recognition
Vrochidis et al. A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
El Khoury Unsupervised video indexing based on audiovisual characterization of persons
Malathi et al. Generic object detection using deep learning
Jiang et al. Audio-visual atoms for generic video concept classification
Butt et al. Audiovisual saliency prediction in uncategorized video sequences based on audio-video correlation