Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement

Cited by: 74

Authors:

Zhao, Yan [1]

Wang, Zhong-Qiu [1]

Wang, DeLiang [1, 2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun, Xian 710072, Shaanxi, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2019 / Vol. 27 / No. 01

Keywords:

Deep neural networks; denoising; dereverberation; phase; ideal ratio mask; JOINT OPTIMIZATION; NEURAL-NETWORKS; DEREVERBERATION; DOMAIN; MODEL;

DOI:

10.1109/TASLP.2018.2870725

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In real-world situations, speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also pose a serious problem to many speech-related applications, including automatic speech and speaker recognition. In order to deal with the combined effects of noise and reverberation, we propose a two-stage strategy to enhance corrupted speech, where denoising and dereverberation are conducted sequentially using deep neural networks. In addition, we design a new objective function that incorporates clean phase during model training to better estimate spectral magnitudes, which would in turn yield better phase estimates when combined with iterative phase reconstruction. The two-stage model is then jointly trained to optimize the proposed objective function. Systematic evaluations and comparisons show that the proposed algorithm improves objective metrics of speech intelligibility and quality substantially, and significantly outperforms previous one-stage enhancement systems.
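The pipeline the abstract describes (denoising, then dereverberation, then iterative phase reconstruction) can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The hypothetical callables `mask_estimator` and `dereverb_mapper` stand in for the paper's two DNNs. Stage 1 is assumed to be ratio masking (suggested by the "ideal ratio mask" keyword) and stage 2 a magnitude mapping. The STFT settings (512-point periodic Hann frames, 128-sample hop) are assumed. Classic Griffin-Lim is used as one common form of "iterative phase reconstruction". The paper's phase-aware training objective and joint training are not implemented here.

```python
import numpy as np

N_FFT, HOP = 512, 128  # assumed analysis settings (32 ms / 8 ms at 16 kHz)


def _window(n_fft):
    # Periodic Hann window: sin^2(pi * n / n_fft)
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """One-sided STFT, shape (frames, n_fft // 2 + 1); a trailing partial frame is dropped."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=N_FFT, hop=HOP):
    """Least-squares overlap-add inverse (Griffin & Lim, 1984) of a possibly modified STFT."""
    win = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    length = n_fft + hop * (X.shape[0] - 1)
    num, den = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        num[i * hop:i * hop + n_fft] += frame
        den[i * hop:i * hop + n_fft] += win ** 2
    # Samples with (near-)zero total window weight are left at 0.
    return np.divide(num, den, out=np.zeros(length), where=den > 1e-10)


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM = (|S|^2 / (|S|^2 + |N|^2))^beta; beta = 0.5 is the common square-root form."""
    return (speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + 1e-12)) ** beta


def griffin_lim(mag, init_phase, n_iter=50, n_fft=N_FFT, hop=HOP):
    """Keep the target magnitude; re-estimate the phase from the STFT of the
    current time-domain estimate, and repeat."""
    X = mag * np.exp(1j * init_phase)
    for _ in range(n_iter):
        X = mag * np.exp(1j * np.angle(stft(istft(X, n_fft, hop), n_fft, hop)))
    return istft(X, n_fft, hop)


def two_stage_enhance(noisy, mask_estimator, dereverb_mapper, n_iter=50):
    """Hypothetical two-stage pipeline: stage 1 masks out noise, stage 2 maps the
    denoised (still reverberant) magnitude to an anechoic estimate."""
    Y = stft(noisy)
    mag, phase = np.abs(Y), np.angle(Y)
    denoised = mask_estimator(mag) * mag          # stage 1: ratio masking (denoising)
    dereverbed = dereverb_mapper(denoised)        # stage 2: spectral mapping (dereverberation)
    return griffin_lim(dereverbed, phase, n_iter)  # start from the noisy phase


if __name__ == "__main__":
    # Toy usage: an oracle IRM replaces the stage-1 DNN, and an identity map
    # replaces the stage-2 DNN (this toy signal has no reverberation).
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    noise = 0.3 * rng.standard_normal(t.size)
    irm = ideal_ratio_mask(np.abs(stft(clean)), np.abs(stft(noise)))
    enhanced = two_stage_enhance(clean + noise, lambda mag: irm, lambda mag: mag)
    print(enhanced.shape)  # (16000,)
```

The noisy phase is used only as the starting point. Griffin-Lim then refines it so that it stays consistent with the enhanced magnitude. This is why a better magnitude estimate from the two stages should, in turn, give a better phase estimate.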


Pages: 53–62

Page count: 10

Number of references: 46


← 1 2 3 4 5 →