von Zeddelmann, 2012 - Google Patents

A feature-based approach to noise robust speech detection

Document ID: 157544072714433128
Author: von Zeddelmann D
Publication year: 2012
Publication venue: Speech Communication; 10. ITG Symposium

Snippet

We propose a robust and easy to realize method for unsupervised Speech Detection (SD) in the context of audio monitoring applications. SD is posed as a binary classification task with the goal of localizing speech in an acoustic monitoring recording. In realistic monitoring …

Continue reading at ieeexplore.ieee.org (other versions)

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
          - G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
          - G10L25/09—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
        - G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
        - G10L25/90—Pitch determination of speech signals
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication	Publication Date	Title
Aneeja et al.	2015	Single frequency filtering approach for discriminating speech and nonspeech
KR101101384B1 (en)	2012-01-02	Parameterized Time Characterization
KR101269296B1 (en)	2013-05-29	Neural network classifier for separating audio sources from a monophonic audio signal
Kim et al.	2010	Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring
CN103489446A (en)	2014-01-01	Twitter identification method based on self-adaption energy detection under complex environment
US9792898B2 (en)	2017-10-17	Concurrent segmentation of multiple similar vocalizations
Jaafar et al.	2013	Automatic syllables segmentation for frog identification system
Chu et al.	2007	A noise-robust FFT-based auditory spectrum with application in audio classification
Couvreur et al.	2004	Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models
US9026440B1 (en)	2015-05-05	Method for identifying speech and music components of a sound signal
CN112233693A (en)	2021-01-15	Sound quality evaluation method, device and equipment
KR100308028B1 (en)	2001-10-20	method and apparatus for adaptive speech detection and computer-readable medium using the method
US9196249B1 (en)	2015-11-24	Method for identifying speech and music components of an analyzed audio signal
JP4871191B2 (en)	2012-02-08	Target signal section estimation device, target signal section estimation method, target signal section estimation program, and recording medium
Ravindran et al.	2006	Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing
Uhle et al.	2008	Speech enhancement of movie sound
von Zeddelmann	2012	A feature-based approach to noise robust speech detection
CN103201793B (en)	2015-03-25	Method and system based on voice communication for eliminating interference noise
Khonglah et al.	2016	Low frequency region of vocal tract information for speech/music classification
JPH01255000A (en)	1989-10-11	Apparatus and method for selectively adding noise to template to be used in voice recognition system
Sanam et al.	2012	A combination of semisoft and μ-law thresholding functions for enhancing noisy speech in wavelet packet domain
Harvilla et al.	2012	Histogram-based subband powerwarping and spectral averaging for robust speech recognition under matched and multistyle training
Goodarzi et al.	2009	Speech enhancement using spectral subtraction based on a modified noise minimum statistics estimation
Kato et al.	2014	A wind-noise suppressor based on wind-onset detection and spectral gain modification
Upadhyay et al.	2012	An auditory perception based improved multi-band spectral subtraction algorithm for enhancement of speech degraded by non-stationary noises