Unsupervised Multi-channel Speech Dereverberation via Diffusion
Wu et al., 2025 - Google Patents
- Document ID
- 16894143092750423247
- Author
- Wu Y
- Xu Z
- Chen J
- Wang Z
- Choudhury R
- Publication year
- 2025
- Publication venue
- arXiv preprint arXiv:2508.02071
Snippet
We consider the problem of multi-channel single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior …
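The snippet points to diffusion posterior sampling as the core machinery. As a rough, illustrative sketch only (not the paper's USD-DPS algorithm, whose details are not given here), a DPS-style guided reverse-diffusion step for multi-channel dereverberation typically combines a pretrained unconditional speech diffusion prior with the gradient of a convolutive data-consistency term against the observed channels. All function names, tensor shapes, the RIR estimate, and the update rule below are assumptions for illustration.

```python
# Illustrative sketch of a diffusion-posterior-sampling (DPS-style) guidance step
# for multi-channel dereverberation. This is NOT the USD-DPS implementation from
# the paper; score_model, rir_filters, shapes, and the update rule are placeholders.
import torch


def dps_dereverb_step(x_t, t, y_multi, score_model, rir_filters, sigma_t, zeta=0.5):
    """
    One guided reverse-diffusion step (assumed VE parameterization).

    x_t          : current anechoic-speech estimate, shape (B, T)      -- assumption
    y_multi      : observed reverberant mixture, shape (B, C, T)       -- problem setup
    score_model  : pretrained unconditional score model s_theta(x, t)  -- placeholder
    rir_filters  : current per-channel RIR estimates, shape (B, C, L)  -- placeholder
    sigma_t      : noise level at step t (scalar)                      -- assumption
    zeta         : guidance step size                                  -- assumption
    """
    x_t = x_t.detach().requires_grad_(True)

    # Unconditional score from the pretrained speech prior.
    score = score_model(x_t, t)

    # Tweedie-style estimate of the clean signal from x_t.
    x0_hat = x_t + (sigma_t ** 2) * score

    # Forward model: convolve the anechoic estimate with each channel's RIR
    # (circular convolution via FFT, for brevity).
    T = x0_hat.shape[-1]
    X = torch.fft.rfft(x0_hat.unsqueeze(1), n=T)   # (B, 1, F)
    H = torch.fft.rfft(rir_filters, n=T)           # (B, C, F)
    y_hat = torch.fft.irfft(X * H, n=T)            # (B, C, T)

    # Data-consistency term: squared error against the observed channels.
    residual = ((y_multi - y_hat) ** 2).sum()

    # Gradient of the likelihood term w.r.t. x_t (posterior guidance).
    grad = torch.autograd.grad(residual, x_t)[0]

    # Simplified "denoise, then guide" update; real samplers re-inject noise
    # according to a schedule.
    x_next = x0_hat - zeta * grad
    return x_next.detach()
```

In practice such samplers iterate a step like this over a decreasing noise schedule, and unsupervised multi-channel variants additionally alternate it with updates of the room-impulse-response (or multi-channel filter) estimates.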
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
Similar Documents
| Publication | Title |
|---|---|
| Drude et al. | SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition |
| Erdogan et al. | Improved MVDR beamforming using single-channel mask prediction networks. |
| US9666183B2 (en) | Deep neural net based filter prediction for audio event classification and extraction |
| Krueger et al. | Model-based feature enhancement for reverberant speech recognition |
| EP3685378B1 (en) | Signal processor and method for providing a processed audio signal reducing noise and reverberation |
| US8218780B2 (en) | Methods and systems for blind dereverberation |
| CN114041185B (en) | Method and apparatus for determining a depth filter |
| Tammen et al. | Deep multi-frame MVDR filtering for single-microphone speech enhancement |
| Zhou et al. | Speech dereverberation with a reverberation time shortening target |
| Gonzalez et al. | Investigating the design space of diffusion models for speech enhancement |
| CN110998723A | Signal processing apparatus using neural network, signal processing method using neural network, and signal processing program |
| Mack et al. | Declipping speech using deep filtering |
| Habets et al. | Dereverberation |
| Hsu et al. | Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features |
| Al-Karawi et al. | The effects of distance and reverberation time on speaker recognition performance |
| Nie et al. | Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. |
| Li et al. | Multichannel identification and nonnegative equalization for dereverberation and noise reduction based on convolutive transfer function |
| Wu et al. | Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| Zhang et al. | Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays. |
| Li et al. | Robust speech dereverberation based on WPE and deep learning |
| Wang et al. | VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification |
| Krueger et al. | A model-based approach to joint compensation of noise and reverberation for speech recognition |
| Li et al. | Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement |
| Nakatani et al. | Simultaneous denoising, dereverberation, and source separation using a unified convolutional beamformer |
| Hsu et al. | Multi-channel target speech enhancement based on ERB-scaled spatial coherence features |