Using STFT real and imaginary parts of modulation signals for MMSE-based speech enhancement

被引:23
作者
Schwerin, Belinda [1 ]
Paliwal, Kuldip [1 ]
机构
[1] Griffith Univ, Griffith Sch Engn, Signal Proc Lab, Nathan, Qld 4111, Australia
关键词
Modulation domain; Analysis-modification-synthesis (AMS); Speech enhancement; MMSE short-time spectral magnitude estimator; Modulation spectrum; Modulation magnitude spectrum; Modulation signal; Modulation spectral subtraction; SPECTRAL AMPLITUDE ESTIMATOR; SUBTRACTION; NOISE;
D O I
10.1016/j.specom.2013.11.001
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper we investigate an alternate, RI-modulation (R = real, I = imaginary) AMS framework for speech enhancement, in which the real and imaginary parts of the modulation signal are processed in secondary AMS procedures. This framework offers theoretical advantages over the previously proposed modulation AMS frameworks in that noise is additive in the modulation signal and noisy acoustic phase is not used to reconstruct speech. Using the MMSE magnitude estimation to modify modulation magnitude spectra, initial experiments presented in this work evaluate if these advantages translate into improvements in processed speech quality. The effect of speech presence uncertainty and log-domain processing on MMSE magnitude estimation in the RI-modulation framework is also investigated. Finally, a comparison of different enhancement approaches applied in the RI-modulation framework is presented. Using subjective and objective experiments as well as spectrogram analysis, we show that RI-modulation MMSE magnitude estimation with speech presence uncertainty produces stimuli which has a higher preference by listeners than the other RI-modulation types. In comparisons to similar approaches in the modulation AMS framework, results showed that the theoretical advantages of the RI-modulation framework did not translate to an improvement in overall quality, with both frameworks yielding very similar sounding stimuli, but a clear improvement (compared to the corresponding modulation AMS based approach) in speech intelligibility was found. (C) 2013 Elsevier B.V. All rights reserved.
引用
收藏
页码:49 / 68
页数:20
相关论文
共 23 条
[1]  
[Anonymous], 1988, Objective measures of speech quality
[2]  
[Anonymous], 2001, Discrete-Time Speech Signal Processing:Principles and Practice
[3]  
[Anonymous], 2007, Speech Enhancement: Theory and Practice
[4]  
[Anonymous], P IEEE INT C AC SPEE
[5]   SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J].
BOLL, SF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :113-120
[6]   Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor [J].
Cappe, Olivier .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02) :345-349
[7]   EFFECT OF REDUCING SLOW TEMPORAL MODULATIONS ON SPEECH RECEPTION [J].
DRULLMAN, R ;
FESTEN, JM ;
PLOMP, R .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (05) :2670-2680
[8]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR LOG-SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02) :443-445
[9]   A SIGNAL SUBSPACE APPROACH FOR SPEECH ENHANCEMENT [J].
EPHRAIM, Y ;
VANTREES, HL .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (04) :251-266
[10]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (06) :1109-1121