Wu et al., 2025 - Google Patents

Unsupervised Multi-channel Speech Dereverberation via Diffusion

Document ID: 16894143092750423247
Author: Wu Y; Xu Z; Chen J; Wang Z; Choudhury R
Publication year: 2025
Publication venue: arXiv preprint arXiv:2508.02071

Snippet

We consider the problem of multi-channel single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters

Similar Documents

Publication	Publication Date	Title
Drude et al.	2019	SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition
Erdogan et al.	2016	Improved MVDR beamforming using single-channel mask prediction networks
US9666183B2 (en)	2017-05-30	Deep neural net based filter prediction for audio event classification and extraction
Krueger et al.	2010	Model-based feature enhancement for reverberant speech recognition
EP3685378B1 (en)	2021-10-13	Signal processor and method for providing a processed audio signal reducing noise and reverberation
US8218780B2 (en)	2012-07-10	Methods and systems for blind dereverberation
CN114041185B (en)	2025-09-23	Method and apparatus for determining a depth filter
Tammen et al.	2021	Deep multi-frame MVDR filtering for single-microphone speech enhancement
Zhou et al.	2023	Speech dereverberation with a reverberation time shortening target
Gonzalez et al.	2024	Investigating the design space of diffusion models for speech enhancement
CN110998723A (en)	2020-04-10	Signal processing apparatus using neural network, signal processing method using neural network, and signal processing program
Mack et al.	2019	Declipping speech using deep filtering
Habets et al.	2018	Dereverberation
Hsu et al.	2022	Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features
Al-Karawi et al.	2024	The effects of distance and reverberation time on speaker recognition performance
Nie et al.	2018	Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
Li et al.	2018	Multichannel identification and nonnegative equalization for dereverberation and noise reduction based on convolutive transfer function
Zhang et al.	2017	Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays
Li et al.	2020	Robust speech dereverberation based on WPE and deep learning
Wang et al.	2025	VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification
Krueger et al.	2011	A model-based approach to joint compensation of noise and reverberation for speech recognition
Li et al.	2023	Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement
Nakatani et al.	2019	Simultaneous denoising, dereverberation, and source separation using a unified convolutional beamformer
Hsu et al.	2022	Multi-channel target speech enhancement based on ERB-scaled spatial coherence features