Aarabi et al., 2004 - Google Patents
Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art
- Document ID: 10807500814170173820
- Authors: Aarabi P; Dasarathy B
- Publication year: 2004
- Publication venue: Information Fusion
Snippet
This article offers an overview of the state of the art in robust speech processing and delineates the role of information fusion in furthering its objectives. In addition, it also serves the function of the traditional guest editorial of a special issue in terms of presenting a brief …
- Machine-extracted concept: fusion (abstract, description; 47 mentions)
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
        - G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
Similar Documents
| Publication | Title |
|---|---|
| Chen et al. | Real-time speaker tracking using particle filter sensor fusion |
| JP4986433B2 (en) | Apparatus and method for recognizing and tracking objects |
| Rodomagoulakis et al. | Multimodal human action recognition in assistive human-robot interaction |
| EP4310838B1 (en) | Speech wakeup method and apparatus, and storage medium and system |
| Mumolo et al. | Algorithms for acoustic localization based on microphone array in service robotics |
| CN112088315A (en) | Multi-mode speech positioning |
| Nakamura et al. | Intelligent sound source localization and its application to multimodal human tracking |
| EP1643769B1 (en) | Apparatus and method performing audio-video sensor fusion for object localization, tracking and separation |
| CN112861726B (en) | D-S evidence theory multi-mode fusion human-computer interaction method based on rule intention voter |
| CN117995187A (en) | Customer service robot and dialogue processing system and method based on deep learning |
| Asano et al. | Detection and separation of speech event using audio and video information fusion and its application to robust speech interface |
| Cabañas-Molero et al. | Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis |
| Berghi et al. | Leveraging visual supervision for array-based active speaker detection and localization |
| Aarabi et al. | Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art |
| Gebru et al. | Audio-visual speech-turn detection and tracking |
| Murase et al. | Multiple moving speaker tracking by microphone array on mobile robot |
| RU2737231C1 (en) | Method of multimodal contactless control of mobile information robot |
| KR20190059381A (en) | Method for Device Control and Media Editing Based on Automatic Speech/Gesture Recognition |
| Robi et al. | Active speaker detection using audio, visual and depth modalities: A survey |
| Goodridge et al. | Multimedia sensor fusion for intelligent camera control |
| Nakamura et al. | Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array |
| Griol et al. | Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems |
| Tse et al. | No need to scream: Robust sound-based speaker localisation in challenging scenarios |
| Nguyen et al. | A two-step system for sound event localization and detection |
| Nguyen et al. | Audio-visual integration for human-robot interaction in multi-person scenarios |