On DCT-based MMSE estimation of short time spectral amplitude for single-channel speech enhancement

Cited by: 12
Authors
Shi, Sisi [1]
Paliwal, Kuldip [1]
Busch, Andrew [2]
Affiliations
[1] Griffith Univ, Griffith Sch Engn, Signal Proc Lab, Nathan, QLD 4111, Australia
[2] Griffith Univ, Sch Engn, Nathan, Qld 4111, Australia
Keywords
Discrete Cosine transform (DCT); Minimum mean-square error (MMSE) estimator; Speech enhancement; Super-Gaussian speech modelling; Speech presence uncertainty (SPU); SQUARE ERROR ESTIMATION; NOISE SUPPRESSION; PHASE ESTIMATION; COEFFICIENTS; COSINE; MODEL; GAMMA
DOI
10.1016/j.apacoust.2022.109134
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes Discrete Cosine Transform (DCT) based speech enhancement algorithms. These algorithms use minimum mean square error (MMSE) estimators of the clean short-time spectral amplitude, with Gaussian, Laplace and Gamma probability density functions (PDFs), respectively, as speech priors. The noise process is assumed to be additive and Gaussian. The proposed estimators are closed-form solutions, whereas the conventional Discrete Fourier Transform (DFT) based estimators derived under super-Gaussian speech priors have no closed-form solutions. We also examine the estimators combined with Speech Presence Uncertainty (SPU), which handles the speech-or-silence decision probabilistically. Compared to alternative approaches, such as the Ephraim and Malah or the Erkelens et al. MMSE-STSA estimators, the proposed methods demonstrate superior performance in terms of Segmental SNR (SegSNR), Perceptual Evaluation of Speech Quality (PESQ), short-time objective intelligibility measure (STOI), and mean subjective preference score, while exhibiting equal or lower complexity. © 2022 Elsevier Ltd. All rights reserved.
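The abstract does not reproduce the estimator formulas, so the following is only an illustrative sketch of why a real-valued transform such as the DCT can admit a closed-form amplitude estimate under a Gaussian prior. It assumes a single real coefficient Y = X + N with X ~ N(0, xi * noise_var) and N ~ N(0, noise_var); the posterior of X is then Gaussian, and E[|X| | Y] is the mean of a folded normal distribution. The function name and the parameters `xi` (a priori SNR) and `noise_var` are hypothetical choices made for this example, and the result is a textbook identity rather than the paper's own derivation.

```python
import math


def gaussian_prior_amplitude_estimate(y, xi, noise_var):
    """Return E[|X| | Y = y] for Y = X + N with real Gaussian X and N.

    Assumptions (illustrative, not taken from the paper):
      X ~ N(0, xi * noise_var), N ~ N(0, noise_var), X and N independent.
    """
    if xi <= 0.0 or noise_var <= 0.0:
        # Degenerate case: no speech energy or no noise model to work with.
        return 0.0
    wiener = xi / (1.0 + xi)               # Wiener gain xi / (1 + xi)
    mu = wiener * y                        # posterior mean of X given y
    sigma = math.sqrt(wiener * noise_var)  # posterior standard deviation
    # Mean of the folded normal distribution |N(mu, sigma^2)|.
    return (sigma * math.sqrt(2.0 / math.pi)
            * math.exp(-mu * mu / (2.0 * sigma * sigma))
            + mu * math.erf(mu / (math.sqrt(2.0) * sigma)))
```

The estimate depends only on the magnitude of the noisy coefficient, so in a full enhancement system it would be recombined with the sign of the noisy DCT coefficient before the inverse transform. That recombination step is likewise an assumption of this sketch.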
Pages: 23