Weintraub et al., 1994 - Google Patents

Constructing telephone acoustic models from a high-quality speech corpus

Document ID: 3174152726899592960
Author: Weintraub M; Neumeyer L
Publication year: 1994
Publication venue: Proceedings of ICASSP'94. IEEE International Conference on Acoustics, Speech and Signal Processing

Snippet

In this paper we explore the effectiveness of constructing telephone acoustic models using a high-quality speech corpus. Results are presented for several front-end signal processing and feature mapping techniques. The algorithms were tested using SRI's DECIPHER …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services, time announcement
- H04M3/493—Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition

Similar Documents

Publication	Publication Date	Title
Neumeyer et al.	1994	Probabilistic optimum filtering for robust speech recognition
Murthy et al.	2002	Robust text-independent speaker identification over telephone channels
Padmanabhan et al.	2002	Speaker clustering and transformation for speaker adaptation in speech recognition systems
EP0913809A2 (en)	1999-05-06	Source normalization training for modeling of speech
Fukuda et al.	2004	Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition
Heracleous et al.	2003	Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation
Hain et al.	2005	The development of the AMI system for the transcription of speech in meetings
Maganti et al.	2007	Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
Weintraub et al.	1994	Constructing telephone acoustic models from a high-quality speech corpus
Singh et al.	2011	MFCC VQ based speaker recognition and its accuracy affecting factors
Hansen et al.	2001	Robust speech recognition in noise: an evaluation using the SPINE corpus
Li et al.	2001	An auditory system-based feature for robust speech recognition
Neumeyer et al.	1994	Training issues and channel equalization techniques for the construction of telephone acoustic models using a high-quality speech corpus
Giuliani et al.	1995	Hands free continuous speech recognition in noisy environment using a four microphone array
Heracleous et al.	2004	Non-audible murmur (NAM) speech recognition using a stethoscopic NAM microphone
Matassoni et al.	2000	Hands-free speech recognition using a filtered clean corpus and incremental HMM adaptation
Alkhaldi et al.	2002	Multi-band based recognition of spoken Arabic numerals using wavelet transform
Isobe et al.	1999	Text-independent speaker verification using virtual speaker based cohort normalization
Chang	1995	Speech recognition system robustness to microphone variations
Fukuda et al.	2003	Noise-robust ASR by using distinctive phonetic features approximated with logarithmic normal distribution of HMM
Heracleous et al.	2004	Audible (normal) speech and inaudible murmur recognition using NAM microphone
Wang et al.	2004	A GMM-based telephone channel classification for Mandarin speech recognition
Zhang et al.	2005	Noisy speech recognition based on robust end-point detection and model adaptation
Alkhaldi et al.	2002	Automatic speech/speaker recognition in noisy environments using wavelet transform
Tarcisio et al.	1999	Use of simulated data for robust telephone speech recognition