Stochastic-Deterministic MMSE STFT Speech Enhancement With General A Priori Information

Times cited: 28

Authors:

McCallum, Matthew [1]

Guillemin, Bernard [1]

Affiliation:

[1] Univ Auckland, Dept Elect & Comp Engn, Auckland 1142, New Zealand

Source:

IEEE Transactions on Audio, Speech, and Language Processing | 2013 / Vol. 21 / No. 7

Keywords:

Amplitude estimation; Gaussian processes; minimum mean-square error; phase estimation; speech enhancement; stochastic deterministic model; SQUARE ERROR ESTIMATION; NOISE; INTELLIGIBILITY; SUPPRESSION; ESTIMATORS;

DOI:

10.1109/TASL.2013.2253100

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

A wide range of Bayesian short-time spectral amplitude (STSA) speech enhancement algorithms exist, varying in both the statistical model used for speech and the cost functions considered. Current algorithms of this class consistently assume that clean speech short-time Fourier transform (STFT) samples are either randomly distributed with zero mean or deterministic. No single distribution function has been considered that captures both deterministic and random signal components. In this paper, a Bayesian STSA algorithm is proposed under a stochastic-deterministic (SD) speech model that makes provision for the inclusion of a priori information by considering a non-zero mean. Analytical expressions are derived for the speech STFT magnitude in the minimum mean-square error (MMSE) sense and for the phase in the maximum-likelihood sense. Furthermore, a practical method of estimating the a priori SD speech model parameters is described, based on explicit consideration of harmonically related sinusoidal components in each STFT frame and of variations in both the magnitude and phase of these components between successive STFT frames. Objective tests using the PESQ measure indicate that the proposed algorithm results in superior speech quality when compared to several other speech enhancement algorithms. In particular, the proposed algorithm shows an improved capability to retain low-amplitude voiced speech components in low-SNR conditions.
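To illustrate why a non-zero prior mean matters, the following Python sketch computes an MMSE magnitude estimate for a single STFT bin under an assumed model: clean coefficient S ~ CN(mu, lam_s) and additive noise N ~ CN(0, lam_n), observed as Y = S + N. Under these assumptions the posterior of S given Y is complex Gaussian, so E[|S| | Y] is the mean of a Rician distribution. This is a textbook-style illustration, not the authors' derivation. The function names, the phase choice (the phase of the posterior mean, which is the mode of the posterior phase under this assumed model and not necessarily the paper's maximum-likelihood phase estimator), and the hypothetical `predict_harmonic_mean` helper (a phase-advance predictor for one harmonic, assuming each frame's DFT is referenced to the frame start) are all assumptions. NumPy and SciPy are required.

```python
import numpy as np
from scipy.special import i0e, i1e


def posterior_params(y, mu, lam_s, lam_n):
    """Posterior mean and variance of a clean STFT bin S given Y = S + N.

    Assumed (illustrative) model, not taken from the paper:
      S ~ CN(mu, lam_s)   non-zero mean mu carries a priori information
      N ~ CN(0, lam_n)    zero-mean complex Gaussian noise
    """
    m = (lam_s * y + lam_n * mu) / (lam_s + lam_n)  # precision-weighted mean
    v = lam_s * lam_n / (lam_s + lam_n)             # posterior variance
    return m, v


def mmse_amplitude(y, mu, lam_s, lam_n):
    """E[|S| | Y]: mean of a Rician law with nu = |m| and sigma^2 = v / 2."""
    m, v = posterior_params(y, mu, lam_s, lam_n)
    sigma = np.sqrt(v / 2.0)
    t = np.abs(m) ** 2 / (2.0 * v)  # nu^2 / (4 sigma^2)
    # i0e/i1e are exp(-t)-scaled Bessel functions; using them avoids
    # overflow when the posterior is sharply peaked (large t).
    # The bracket equals the Laguerre function L_{1/2}(-nu^2 / (2 sigma^2)).
    laguerre_half = (1.0 + 2.0 * t) * i0e(t) + 2.0 * t * i1e(t)
    return sigma * np.sqrt(np.pi / 2.0) * laguerre_half


def enhance_bin(y, mu, lam_s, lam_n):
    """MMSE magnitude combined with the phase of the posterior mean
    (the posterior phase mode under the assumed model)."""
    m, _ = posterior_params(y, mu, lam_s, lam_n)
    return mmse_amplitude(y, mu, lam_s, lam_n) * np.exp(1j * np.angle(m))


def predict_harmonic_mean(prev_amp, prev_phase, freq_hz, hop, fs):
    """Hypothetical prior mean for one harmonic: keep the previous frame's
    amplitude and advance its phase by 2*pi*freq_hz*hop/fs between frames
    (frame-start time reference assumed)."""
    return prev_amp * np.exp(1j * (prev_phase + 2.0 * np.pi * freq_hz * hop / fs))


if __name__ == "__main__":
    # Prior mean for a 200 Hz harmonic, hop of 256 samples at 16 kHz.
    mu = predict_harmonic_mean(1.0, 0.0, freq_hz=200.0, hop=256, fs=16000)
    s_hat = enhance_bin(0.8 + 0.3j, mu, lam_s=0.5, lam_n=0.2)
    print(abs(s_hat), np.angle(s_hat))
```

With mu = 0 the posterior mean is lam_s / (lam_s + lam_n) times Y, and the estimate reduces to the classical zero-mean Gaussian MMSE STSA estimate of Ephraim and Malah, with the noisy phase retained. A non-zero mu pulls both the magnitude and the phase toward the a priori prediction, weighted by lam_n / (lam_s + lam_n), so the prior has the most influence in low-SNR bins.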

Pages: 1445–1457

Page count: 13

References: 39
