Schwarz et al., 2004

Towards lower error rates in phoneme recognition

Document ID: 5398849570732998832
Author: Schwarz P; Matějka P; Černocký J
Publication year: 2004
Publication venue: International Conference on Text, Speech and Dialogue

Snippet

We investigate techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database. The baseline phoneme recognizer is based on TempoRAl Patterns (TRAP). This recognizer is simplified to shorten processing …

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
          - G10L2015/088—Word spotting
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
          - G10L15/063—Training
            - G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
              - G10L2015/0636—Threshold criteria for the updating
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/28—Constructional details of speech recognition systems
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/06—Decision making techniques; Pattern matching strategies
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
        - G10L25/78—Detection of presence or absence of voice signals
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication	Publication Date	Title
CN108447490B (en)	2020-08-18	Method and device for voiceprint recognition based on memory bottleneck feature
US6985858B2 (en)	2006-01-10	Method and apparatus for removing noise from feature vectors
McLaren et al.	2015	Advances in deep neural network approaches to speaker recognition
Chen et al.	2006	MVA processing of speech features
US7054810B2 (en)	2006-05-30	Feature vector-based apparatus and method for robust pattern recognition
Pinto et al.	2010	Analysis of MLP-based hierarchical phoneme posterior probability estimator
Chengalvarayan	1999	Robust energy normalization using speech/nonspeech discriminator for German connected digit recognition.
US20140025379A1 (en)	2014-01-23	Method and System for Real-Time Keyword Spotting for Speech Analytics
Szöke et al.	2005	Phoneme based acoustics keyword spotting in informal continuous speech
US7254538B1 (en)	2007-08-07	Nonlinear mapping for feature extraction in automatic speech recognition
EP1465154B1 (en)	2009-10-14	Method of speech recognition using variational inference with switching state space models
US20110257976A1 (en)	2011-10-20	Robust Speech Recognition
JP2005165272A (en)	2005-06-23	Speech recognition utilizing multitude of speech features
US7617104B2 (en)	2009-11-10	Method of speech recognition using hidden trajectory Hidden Markov Models
EP1385147A2 (en)	2004-01-28	Method of speech recognition using time-dependent interpolation and hidden dynamic value classes
US20150255075A1 (en)	2015-09-10	System and Method to Correct for Packet Loss in ASR Systems
KR101120765B1 (en)	2012-03-23	Method of speech recognition using multimodal variational inference with switching state space models
US7409346B2 (en)	2008-08-05	Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction
US8639510B1 (en)	2014-01-28	Acoustic scoring unit implemented on a single FPGA or ASIC
Dimitriadis et al.	2015	Use of micro-modulation features in large vocabulary continuous speech recognition tasks
Fernando et al.	2016	Eigenfeatures: An alternative to Shifted Delta Coefficients for Language Identification
Sai et al.	2018	Enhancing pitch robustness of speech recognition system through spectral smoothing
Matějka et al.	2003	Phoneme recognition using temporal patterns
Matějka et al.	2004	Automatic language identification using phoneme and automatically derived unit strings