Correlation-Based and Model-Based Blind Single-Channel Late-Reverberation Suppression in Noisy Time-Varying Acoustical Environments

Cited by: 32
Authors
Erkelens, Jan S. [1]
Heusdens, Richard [1]
Affiliations
[1] Delft Univ Technol, Dept Mediamat, NL-2628 CD Delft, Netherlands
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 07
Keywords
Discrete Fourier transform (DFT)-based speech enhancement; SPEECH ENHANCEMENT; STATISTICAL-MODEL; DEREVERBERATION; DOMAIN;
DOI
10.1109/TASL.2010.2051271
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper considers suppression of late reverberation and additive noise in single-channel speech recordings. The reverberation introduces long-term correlation in the observed signal. In the first part of this work, we show how this correlation can be used to estimate the late reverberant spectral variance (LRSV) without assuming a specific model for the room impulse responses (RIRs) and without explicit estimates of RIR model parameters. This makes the correlation-based approach more robust against RIR modeling errors. However, the correlation-based method can follow only slow time variations in the RIRs. Existing model-based methods use statistical models for the RIRs that depend on one or more parameters that have to be estimated blindly. The common statistical models lead to simple expressions for the LRSV that depend on past values of the spectral variance of the reverberant, noise-free signal. All existing model-based LRSV estimators in the literature are derived assuming the RIRs to be time-invariant realizations of a stochastic process. In the second part of this paper, we go one step further and analyze time-varying RIRs. We show that in this case the reverberance tends to become decorrelated. We discuss the relations between different RIR models and their corresponding LRSV estimators. We show theoretically that simple estimators similar to those of the time-invariant case exist, provided that the reverberation time T60 and the direct-to-reverberation ratio (DRR) of the RIRs remain nearly constant during an interval on the order of a few frames. We show that the reverberation time can be taken to be frequency-bin independent in DFT-based enhancement algorithms. Experiments with time-varying RIRs validate the analysis. Experiments with additive nonstationary noise and time-invariant RIRs show the influence of blind estimation of the reverberation time and the DRR.
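The abstract's remark that common statistical RIR models give an LRSV depending on past values of the reverberant spectral variance can be illustrated with a minimal sketch. It assumes a plain exponential-decay RIR model, in which energy falls by 60 dB over T60, and it is not necessarily the estimator derived in the paper. The function name, its parameters (`hop`, `delay_frames`), and the example values are all hypothetical. A single scalar `t60` is used for every frequency bin, in line with the abstract's statement that the reverberation time can be taken to be bin independent.

```python
import numpy as np


def late_reverb_spectral_variance(spec_var, t60, fs, hop, delay_frames):
    """Illustrative LRSV estimate under an assumed exponential-decay RIR model.

    spec_var     : array (frames, bins), spectral variance of the reverberant,
                   noise-free signal (assumed to be already available)
    t60          : reverberation time in seconds (same for all bins)
    fs           : sampling rate in Hz
    hop          : frame shift in samples
    delay_frames : number of frames after which reverberation counts as "late"
    """
    spec_var = np.asarray(spec_var, dtype=float)
    n_frames = spec_var.shape[0]

    # Amplitude decay rate in 1/s: energy drops by 60 dB after t60 seconds.
    decay_rate = 3.0 * np.log(10.0) / t60
    delay_seconds = delay_frames * hop / fs
    # Energy attenuation accumulated over the delay interval.
    attenuation = np.exp(-2.0 * decay_rate * delay_seconds)

    # LRSV at frame l is the attenuated spectral variance at frame l - delay.
    lrsv = np.zeros_like(spec_var)
    if delay_frames < n_frames:
        lrsv[delay_frames:] = attenuation * spec_var[: n_frames - delay_frames]
    return lrsv


# Hypothetical usage: 10 frames, 4 bins, constant unit spectral variance.
example = late_reverb_spectral_variance(
    np.ones((10, 4)), t60=0.5, fs=16000, hop=128, delay_frames=6
)
```

In practice the spectral variance, T60, and the DRR would all have to be estimated blindly from the noisy recording, which is the problem the paper addresses; this sketch takes them as given.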
Pages: 1746-1765
Page count: 20