Joint audio-video object localization and tracking
Strobel et al., 2002
- Document ID
- 6487219924026487942
- Author
- Strobel N
- Spors S
- Rabenstein R
- Publication year
- 2002
- Publication venue
- IEEE Signal Processing Magazine
Snippet
There has been a tremendous amount of research on object localization involving either microphone arrays or video cameras. Considerably less attention has been paid thus far, however, to object localization and tracking based on joint audio-video processing. This may …
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic or multiview television systems; Details thereof
- H04N13/02—Picture signal generators
- H04N13/0203—Picture signal generators using a stereoscopic image camera
- H04N13/0239—Picture signal generators using a stereoscopic image camera having two 2D image pickup sensors representing the interocular distance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Strobel et al. | | Joint audio-video object localization and tracking |
Stillman et al. | | A system for tracking and recognizing multiple people with multiple cameras |
Qian et al. | | Multi-speaker tracking from an audio–visual sensing device |
Nickel et al. | | A joint particle filter for audio-visual speaker tracking |
Fuchs et al. | | Virtual space teleconferencing using a sea of cameras |
CN107820037B (en) | | Audio signal, image processing method, device and system |
Schauerte et al. | | Multimodal saliency-based attention for object-based scene analysis |
Kapralos et al. | | Audiovisual localization of multiple speakers in a video teleconferencing setting |
JPH10320588A (en) | | Picture processor and picture processing method |
Li et al. | | Multi-modal perception attention network with self-supervised learning for audio-visual speaker tracking |
Ng et al. | | Monitoring dynamically changing environments by ubiquitous vision system |
Liu et al. | | Multiple speaker tracking in spatial audio via PHD filtering and depth-audio fusion |
Checka et al. | | A probabilistic framework for multi-modal multi-person tracking |
Russakoff et al. | | Head tracking using stereo |
Strobel et al. | | Joint audio-video signal processing for object localization and tracking |
Qian et al. | | GLMB 3D speaker tracking with video-assisted multi-channel audio optimization functions |
Izhar et al. | | Tracking sound sources for object-based spatial audio in 3D audio-visual production |
Arabi et al. | | Integrated vision and sound localization |
Checka et al. | | Person tracking using audio-video sensor fusion |
Stillman et al. | | Tracking multiple people with multiple cameras |
Ishiguro et al. | | Integrating a perceptual information infrastructure with robotic avatars: a framework for tele-existence |
Bernardin et al. | | Multi-and single view multiperson tracking for smart room environments |
Frigola et al. | | Visual human machine interface by gestures |
Spors et al. | | Joint audio-video object tracking |
Sun et al. | | Recording the region of interest from FLYCAM panoramic video |