Weintraub et al., 1994 - Google Patents
Constructing telephone acoustic models from a high-quality speech corpus
Weintraub et al., 1994
- Document ID
- 3174152726899592960
- Author
- Weintraub M
- Neumeyer L
- Publication year
- 1994
- Publication venue
- Proceedings of ICASSP'94. IEEE International Conference on Acoustics, Speech and Signal Processing
Snippet
In this paper we explore the effectiveness of constructing telephone acoustic models using a high-quality speech corpus. Results are presented for several front-end signal processing and feature mapping techniques. The algorithms were tested using SRI's DECIPHER …
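The snippet does not name the specific front-end or feature-mapping techniques the paper evaluated. As an illustration only, the sketch below shows cepstral mean normalization (CMN), a widely used channel-equalization step for telephone speech. It is not claimed to be the paper's method. The idea is that a fixed linear channel (for example, a telephone handset and line) multiplies the speech spectrum, which becomes an approximately additive offset in the log/cepstral domain. Subtracting each coefficient's per-utterance mean therefore removes that offset. The function name `cepstral_mean_normalize` and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def cepstral_mean_normalize(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` is a hypothetical utterance: one list of cepstral
    coefficients per analysis frame, all of the same dimension.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of coefficient k over all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    # Removing the mean cancels any constant (channel-like) offset.
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
    # Hypothetical fixed channel offset in the cepstral domain.
    channel = [0.5, -1.5]
    telephone = [[c + h for c, h in zip(frame, channel)] for frame in clean]
    a = cepstral_mean_normalize(clean)
    b = cepstral_mean_normalize(telephone)
    # After CMN, the "clean" and "channel-distorted" features coincide.
    print(all(abs(x - y) < 1e-9 for fa, fb in zip(a, b) for x, y in zip(fa, fb)))
```

In this toy case the channel is exactly a constant cepstral offset, so CMN removes it completely. With real telephone speech, additive noise and nonlinear distortions mean the compensation is only approximate.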
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services, time announcement
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
Similar Documents
| Publication | Title |
|---|---|
| Neumeyer et al. | Probabilistic optimum filtering for robust speech recognition |
| Murthy et al. | Robust text-independent speaker identification over telephone channels |
| Padmanabhan et al. | Speaker clustering and transformation for speaker adaptation in speech recognition systems |
| EP0913809A2 (en) | Source normalization training for modeling of speech |
| Fukuda et al. | Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition |
| Heracleous et al. | Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation |
| Hain et al. | The development of the AMI system for the transcription of speech in meetings |
| Maganti et al. | Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms |
| Weintraub et al. | Constructing telephone acoustic models from a high-quality speech corpus |
| Singh et al. | MFCC VQ based speaker recognition and its accuracy affecting factors |
| Hansen et al. | Robust speech recognition in noise: an evaluation using the SPINE corpus |
| Li et al. | An auditory system-based feature for robust speech recognition |
| Neumeyer et al. | Training issues and channel equalization techniques for the construction of telephone acoustic models using a high-quality speech corpus |
| Giuliani et al. | Hands free continuous speech recognition in noisy environment using a four microphone array |
| Heracleous et al. | Non-audible murmur (NAM) speech recognition using a stethoscopic NAM microphone |
| Matassoni et al. | Hands-free speech recognition using a filtered clean corpus and incremental HMM adaptation |
| Alkhaldi et al. | Multi-band based recognition of spoken arabic numerals using wavelet transform |
| Isobe et al. | Text-independent speaker verification using virtual speaker based cohort normalization |
| Chang | Speech recognition system robustness to microphone variations |
| Fukuda et al. | Noise-robust ASR by using distinctive phonetic features approximated with logarithmic normal distribution of HMM |
| Heracleous et al. | Audible (normal) speech and inaudible murmur recognition using NAM microphone |
| Wang et al. | A GMM-based telephone channel classification for Mandarin speech recognition |
| Zhang et al. | Noisy speech recognition based on robust end-point detection and model adaptation |
| Alkhaldi et al. | Automatic speech/speaker recognition in noisy environments using wavelet transform |
| Tarcisio et al. | Use of simulated data for robust telephone speech recognition |