Single-channel speech enhancement with correlated spectral components: Limits-potential

Cited by: 3
Authors
Mowlaee, Pejman [2]
Stahl, Johannes K. W. [1]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Graz, Austria
[2] WS Audiol, Lynge, Denmark
Funding
Austrian Science Fund;
Keywords
Speech enhancement; Noise reduction; Speech intelligibility; Multidimensional Wiener filter; Inter-frequency dependency; NOISE; COEFFICIENTS; ESTIMATORS; FREQUENCY;
DOI
10.1016/j.specom.2020.05.002
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we investigate single-channel speech enhancement algorithms that operate in the short-time Fourier transform domain and take into account dependencies with respect to frequency. As a result of allowing for inter-frequency dependencies, the minimum mean square error optimal estimates of the short-time Fourier transform expansion coefficients are, in general, functions of complex-valued covariance matrices. The covariance matrices are not known a priori and have to be estimated from the observed data. This work is dedicated to analyzing how this affects the respective single-channel speech enhancement algorithms. We propose a statistical model that circumvents the need to estimate complex-valued second-order statistics and derive a linear multidimensional short-time spectral amplitude estimator that is motivated by these assumptions. Further, we provide empirical evidence for the assumptions that form the basis of this model. We evaluate the potential of taking into account inter-frequency dependencies for single-channel speech enhancement and subsequently compare the estimator resulting from the proposed statistical model to relevant benchmark methods. The results indicate that estimators that consider inter-frequency dependencies are capable of pushing the limits of standard approaches in terms of joint speech quality and intelligibility improvement when the second-order statistics are estimated from isolated speech data. The proposed linear multidimensional short-time spectral amplitude estimator preserves this trend in fully blind scenarios.
Pages: 58-69
Page count: 12