On-line Gaussian mixture modeling in the log-power domain for signal-to-noise ratio estimation and speech enhancement

Cited by: 11

Authors:

Dat, Tran Huy

Takeda, Kazuya

Itakura, Fumitada

Affiliations:

[1] Inst Infocomm Res, Singapore 119613, Singapore

[2] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Nagoya, Aichi 4648603, Japan

[3] Meijo Univ, Grad Sch Informat Engn, Tempaku Ku, Nagoya, Aichi 4688502, Japan

Source:

SPEECH COMMUNICATION | 2006 / Vol. 48 / No. 11

Keywords:

Gaussian mixture modeling; segmental SNR; log-normal distributions; cumulative distribution function equalization; speech enhancement;

DOI:

10.1016/j.specom.2006.06.009

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We present on-line Gaussian mixture modeling (GMM) in the log-power domain of actual noisy speech, together with its applications to segmental signal-to-noise ratio (SNR) estimation and speech enhancement. The basic idea is to fit a conventional two-component GMM in the log-power domain to estimate the distributions of the noise and noisy-speech subspaces in each speech segment of 0.5-2 s. Given the subspace distributions, statistical estimation is applied in both applications. For segmental SNR estimation, the average speech level is estimated from the noisy speech using a nonlinear moment of the modeled distributions. This method suits real conditions, in which neither reference signals nor speech activity information is available, and is shown to be more robust and accurate than conventional methods, particularly at low SNR. For noise estimation, the proposed GMM is extended to multiband log-power domains. Long-term information, obtained by fitting the GMM in each 0.5 s segment, is used to update the local distributions of noise and noisy-speech power at each time-frequency index. Cumulative distribution function equalization (CDFE) is then used to estimate the noise, which is subtracted from the noisy speech power. The advantage of CDFE for noise estimation is that the estimate is obtained in the logarithmic domain without any approximation. The proposed speech enhancement is tested on the AURORA-2J database and compared with conventional minimum-statistics and quantile-based noise estimation. The proposed method achieves a higher speech recognition rate than the conventional methods in most noise environments and provides a very good compromise between speech enhancement and speech recognition performance. © 2006 Elsevier B.V. All rights reserved.
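
The two-component idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_two_gaussians` and `estimate_segmental_snr_db` are hypothetical, the log-power values are assumed to be natural logarithms of frame power, and because the abstract does not specify its "nonlinear moment", the log-normal mean exp(mu + var/2) is substituted here as a stand-in. The sketch fits a 1-D two-component Gaussian mixture by EM to the frame log-powers of one segment, treats the lower-mean component as noise and the higher-mean component as noisy speech, and derives an SNR from the two implied mean powers.

```python
import math
import random


def fit_two_gaussians(x, iters=100):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    xs = sorted(x)
    n = len(xs)
    overall_mean = sum(xs) / n
    var0 = max(sum((v - overall_mean) ** 2 for v in xs) / n, 1e-6)
    # Assumption: initialise the means at the lower and upper quartiles.
    means = [xs[n // 4], xs[(3 * n) // 4]]
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 1 for every sample.
        resp = []
        for v in x:
            p = [
                weights[k]
                * math.exp(-0.5 * (v - means[k]) ** 2 / variances[k])
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = p[0] + p[1]
            resp.append(p[1] / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            r = [rv if k == 1 else 1.0 - rv for rv in resp]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / n
            means[k] = sum(ri * v for ri, v in zip(r, x)) / nk
            variances[k] = max(
                sum(ri * (v - means[k]) ** 2 for ri, v in zip(r, x)) / nk, 1e-6
            )
    return weights, means, variances


def estimate_segmental_snr_db(log_power):
    """Estimate the SNR (dB) of one segment from its frame log-powers."""
    _, means, variances = fit_two_gaussians(log_power)
    lo, hi = (0, 1) if means[0] <= means[1] else (1, 0)
    # Stand-in moment: E[exp(X)] = exp(mu + var / 2) for X ~ N(mu, var).
    noise_power = math.exp(means[lo] + 0.5 * variances[lo])
    noisy_power = math.exp(means[hi] + 0.5 * variances[hi])
    # Assumes speech and noise powers add in the noisy-speech frames.
    speech_power = max(noisy_power - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    # Synthetic segment: 300 noise-only frames and 300 noisy-speech frames.
    random.seed(0)
    frames = [random.gauss(0.0, 0.3) for _ in range(300)]
    frames += [random.gauss(3.0, 0.5) for _ in range(300)]
    print(f"estimated segmental SNR: {estimate_segmental_snr_db(frames):.1f} dB")
```

A pure-Python EM loop keeps the sketch dependency-free; in practice a library mixture-model fitter would typically replace `fit_two_gaussians`. The sketch covers only the full-band SNR case and does not attempt the multiband extension or the CDFE step.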

Pages: 1515-1527

Number of pages: 13
