Bayesian estimation for speech enhancement given a priori knowledge of clean speech phase

Cited by: 1
Authors
Sunnydayal V. [1]
Kumar T.K. [1]
Affiliations
[1] Electronics and Communication Engineering Department, National Institute of Technology Warangal, Warangal, 506 004, Telangana
Keywords
Exponential density function; Gamma density function; Laplace density function; MMSE estimator; Nakagami distribution; PESQ; von Mises distribution
DOI
10.1007/s10772-015-9306-4
Abstract
In this paper, STFT-based speech enhancement algorithms that estimate short-time spectral amplitudes are proposed. These algorithms use maximum likelihood, maximum a posteriori and minimum mean square error (MMSE) estimators, which respectively use Laplace, Gamma and Exponential probability density functions as noise spectral amplitude priors and the Nakagami distribution as the speech spectral amplitude prior. The phase of noisy speech carries significant information that can be retrieved and utilized. However, the undesired artifacts that result from this process create many challenges. In this paper, the reconstructed phase is treated as uncertain prior knowledge, and a joint MMSE estimate of the (C)omplex speech coefficients given (U)ncertain (P)hase information is derived. The proposed phase reconstruction algorithm assists in generating a clean speech phase. The proposed estimator reduces undesired artifacts and gives a satisfactory trade-off between the noisy phase and the estimate of the prior phase, and hence yields superior performance in instrumental measures, informal listening and speech quality. © 2015, Springer Science+Business Media New York.
Pages: 593-607
Page count: 14