An Experimental Study on Speech Enhancement Based on Deep Neural Networks

Times cited: 707

Authors:

Xu, Yong [1]

Du, Jun [1]

Dai, Li-Rong [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Anhui, Peoples R China

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2014 / Vol. 21 / No. 01

Keywords:

Deep neural networks; noise reduction; regression model; speech enhancement

DOI:

10.1109/LSP.2013.2291240

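Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809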

Abstract:

This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture. In the DNN learning process, a large training set ensures a powerful modeling capability to estimate the complicated nonlinear mapping from observed noisy speech to desired clean signals. Acoustic context was found to improve the continuity of speech to be separated from the background noises successfully without the annoying musical artifact commonly observed in conventional speech enhancement algorithms. A series of pilot experiments were conducted under multi-condition training with more than 100 hours of simulated speech data, resulting in a good generalization capability even in mismatched testing conditions. When compared with the logarithmic minimum mean square error approach, the proposed DNN-based algorithm tends to achieve significant improvements in terms of various objective quality measures. Furthermore, in a subjective preference evaluation with 10 listeners, 76.35% of the subjects were found to prefer DNN-based enhanced speech to that obtained with the conventional technique.


Pages: 65-68

Page count: 4
