Model-Based Speech Enhancement With Improved Spectral Envelope Estimation via Dynamics Tracking

Cited by: 19

Authors:

Chen, Ruofei [1]

Chan, Cheung-Fat [1]

So, Hing Cheung [1]

Affiliations:

[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 4

Keywords:

Codebook mapping; harmonic noise model (HNM); Kalman filter; speech analysis; speech synthesis; vector quantization (VQ); NOISE MODEL; QUANTIZATION; ALGORITHM;

DOI:

10.1109/TASL.2011.2177821

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this work, we present a model-based approach to enhancing noisy speech using an analysis-synthesis framework. The target speech is reconstructed from model parameters estimated from noisy observations. In particular, the spectral envelope is estimated by tracking its temporal trajectories in order to improve the noise-distorted short-time spectral amplitude. We first propose an analysis-synthesis framework for speech enhancement based on the harmonic noise model (HNM). Acoustic parameters such as pitch, spectral envelope, and spectral gain are extracted by HNM analysis. Spectral envelope estimation is improved by tracking its line spectrum frequency trajectories through Kalman filtering. System identification of the Kalman filter is achieved by combining a codebook mapping scheme with a maximum-likelihood estimator trained on parallel data. The complete system design and experimental validation are given in detail. Performance evaluation based on spectrograms, objective measures, and a subjective listening test demonstrates that the proposed approach achieves significant improvement over conventional methods in various conditions. A distinct advantage of the proposed method is that it successfully tackles the "musical tones" problem.
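As a rough illustration of what tracking a parameter trajectory with a Kalman filter means, the sketch below filters one noisy scalar trajectory over frames. It is not the authors' system: the abstract says the Kalman filter parameters are identified via codebook mapping and a maximum-likelihood estimator, whereas this sketch assumes a scalar random-walk state model with hand-picked variances. The function name `kalman_track`, the synthetic trajectory, and all numeric values are hypothetical.

```python
import numpy as np


def kalman_track(observations, process_var, obs_var):
    """Filter a noisy scalar trajectory with a random-walk Kalman filter.

    Assumed model (illustrative only, not taken from the paper):
        x[t] = x[t-1] + w[t],  w ~ N(0, process_var)
        y[t] = x[t]   + v[t],  v ~ N(0, obs_var)
    """
    obs = np.asarray(observations, dtype=float)
    est = np.empty_like(obs)
    x, p = obs[0], obs_var          # initialise from the first observation
    est[0] = x
    for t in range(1, len(obs)):
        p = p + process_var         # predict: state unchanged, uncertainty grows
        k = p / (p + obs_var)       # Kalman gain
        x = x + k * (obs[t] - x)    # update with the innovation
        p = (1.0 - k) * p
        est[t] = x
    return est


# Synthetic stand-in for one slowly varying parameter trajectory (hypothetical).
rng = np.random.default_rng(0)
frames = np.arange(200)
clean = 0.8 + 0.1 * np.sin(2 * np.pi * frames / 200)
noisy = clean + rng.normal(0.0, 0.05, size=frames.size)

tracked = kalman_track(noisy, process_var=1e-4, obs_var=0.05 ** 2)
err_noisy = float(np.mean((noisy - clean) ** 2))
err_tracked = float(np.mean((tracked - clean) ** 2))
print(f"MSE noisy: {err_noisy:.2e}, MSE tracked: {err_tracked:.2e}")
```

In this toy setting the filter exploits frame-to-frame continuity to reduce the error of the noisy observations; a practical system would track a vector of line spectrum frequencies and estimate the model parameters from data rather than fixing them by hand.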


Pages: 1324-1336

Page count: 13
