Aarabi et al., 2004 - Google Patents

Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art

Document ID
10807500814170173820
Author
Aarabi P
Dasarathy B
Publication year
2004
Publication venue
Information Fusion

Snippet

This article offers an overview of the state of the art in robust speech processing and delineates the role of information fusion in furthering its objectives. It also serves as the traditional guest editorial of a special issue, presenting a brief …

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Similar Documents

Publication    Title
Chen et al. Real-time speaker tracking using particle filter sensor fusion
JP4986433B2 (en) Apparatus and method for recognizing and tracking objects
Rodomagoulakis et al. Multimodal human action recognition in assistive human-robot interaction
EP4310838B1 (en) Speech wakeup method and apparatus, and storage medium and system
Mumolo et al. Algorithms for acoustic localization based on microphone array in service robotics
CN112088315A (en) Multi-mode speech positioning
Nakamura et al. Intelligent sound source localization and its application to multimodal human tracking
EP1643769B1 (en) Apparatus and method performing audio-video sensor fusion for object localization, tracking and separation
CN112861726B (en) D-S evidence theory multi-mode fusion human-computer interaction method based on rule intention voter
CN117995187A (en) Customer service robot and dialogue processing system and method based on deep learning
Asano et al. Detection and separation of speech event using audio and video information fusion and its application to robust speech interface
Cabañas-Molero et al. Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
Berghi et al. Leveraging visual supervision for array-based active speaker detection and localization
Aarabi et al. Robust speech processing using multi-sensor multi-source information fusion––an overview of the state of the art
Gebru et al. Audio-visual speech-turn detection and tracking
Murase et al. Multiple moving speaker tracking by microphone array on mobile robot
RU2737231C1 (en) Method of multimodal contactless control of mobile information robot
KR20190059381A (en) Method for Device Control and Media Editing Based on Automatic Speech/Gesture Recognition
Robi et al. Active speaker detection using audio, visual and depth modalities: A survey
Goodridge et al. Multimedia sensor fusion for intelligent camera control
Nakamura et al. Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array
Griol et al. Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems
Tse et al. No need to scream: Robust sound-based speaker localisation in challenging scenarios
Nguyen et al. A two-step system for sound event localization and detection
Nguyen et al. Audio-visual integration for human-robot interaction in multi-person scenarios