Aarabi et al., 2004 - Google Patents
Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art
- Document ID: 10807500814170173820
- Authors: Aarabi P; Dasarathy B
- Publication year: 2004
- Publication venue: Information Fusion
Snippet
This article offers an overview of the state of the art in robust speech processing and delineates the role of information fusion in furthering its objectives. In addition, it also serves the function of the traditional guest editorial of a special issue in terms of presenting a brief …
- Machine-extracted concept: fusion (abstract, description; 47 mentions)
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
        - G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
Similar Documents
| Publication | Title |
|---|---|
| Chen et al. | Real-time speaker tracking using particle filter sensor fusion |
| JP4986433B2 (en) | Apparatus and method for recognizing and tracking objects |
| Rodomagoulakis et al. | Multimodal human action recognition in assistive human-robot interaction |
| EP4310838B1 (en) | Speech wakeup method and apparatus, and storage medium and system |
| Mumolo et al. | Algorithms for acoustic localization based on microphone array in service robotics |
| CN112088315A (en) | Multi-mode speech positioning |
| Nakamura et al. | Intelligent sound source localization and its application to multimodal human tracking |
| EP1643769B1 (en) | Apparatus and method performing audio-video sensor fusion for object localization, tracking and separation |
| CN112861726B (en) | D-S evidence theory multi-mode fusion human-computer interaction method based on rule intention voter |
| CN117995187A (en) | Customer service robot and dialogue processing system and method based on deep learning |
| Asano et al. | Detection and separation of speech event using audio and video information fusion and its application to robust speech interface |
| Cabañas-Molero et al. | Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis |
| Berghi et al. | Leveraging visual supervision for array-based active speaker detection and localization |
| Aarabi et al. | Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art |
| Gebru et al. | Audio-visual speech-turn detection and tracking |
| Murase et al. | Multiple moving speaker tracking by microphone array on mobile robot |
| RU2737231C1 (en) | Method of multimodal contactless control of mobile information robot |
| KR20190059381A (en) | Method for Device Control and Media Editing Based on Automatic Speech/Gesture Recognition |
| Robi et al. | Active speaker detection using audio, visual and depth modalities: A survey |
| Goodridge et al. | Multimedia sensor fusion for intelligent camera control |
| Nakamura et al. | Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array |
| Griol et al. | Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems |
| Tse et al. | No need to scream: Robust sound-based speaker localisation in challenging scenarios |
| Nguyen et al. | A two-step system for sound event localization and detection |
| Nguyen et al. | Audio-visual integration for human-robot interaction in multi-person scenarios |