Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement

Cited by: 11
Authors
Ge, Meng [1 ]
Wang, Longbiao [1 ]
Li, Nan [1 ]
Shi, Hao [1 ]
Dang, Jianwu [1 ,2 ]
Li, Xiangang [3 ]
Affiliations
[1] Tianjin University, College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
[2] Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
[3] Didi Chuxing, Beijing, China
Source
INTERSPEECH 2019 | 2019
Funding
National Natural Science Foundation of China;
Keywords
environment-dependent; attention; convolutional network; recurrent network; speech enhancement; dereverberation;
DOI
10.21437/Interspeech.2019-1477
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification
100104; 100213;
Abstract
Speech enhancement aims to preserve the underlying speech signal while suppressing noise, a prerequisite for robust communication systems. With the success of deep neural networks (DNNs), significant progress has been made. Nevertheless, the accuracy of speech enhancement systems remains unsatisfactory because varied environmental and contextual information is insufficiently exploited in complex acoustic conditions. To address these problems, this research proposes an end-to-end environment-dependent attention-driven approach. A convolutional neural network without pooling operations first captures local frequency-temporal patterns at full resolution. An attention mechanism is then integrated into a bidirectional long short-term memory network to acquire the weighted dynamic context across consecutive frames. Furthermore, dynamic environment estimation and phase correction improve the generalization ability. Extensive experiments on the REVERB challenge demonstrate that the proposed approach outperforms existing methods, improving PESQ from 2.56 to 2.87 and SRMR from 4.95 to 5.50 relative to a conventional DNN.
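The abstract outlines the pipeline without implementation detail. Below is a minimal PyTorch sketch of the CNN-attention-BLSTM core it describes; the class name AttentionBLSTMEnhancer, all layer sizes, and the sigmoid-mask output are illustrative assumptions rather than the authors' implementation, and the dynamic environment estimation and phase-correction stages are omitted.

```python
# A minimal sketch (not the authors' code) of the abstract's pipeline:
# a pooling-free CNN over the magnitude spectrogram, a bidirectional
# LSTM, and a frame-level attention mechanism that forms a weighted
# dynamic context. All sizes are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionBLSTMEnhancer(nn.Module):
    def __init__(self, n_freq=257, channels=32, hidden=256):
        super().__init__()
        # Pooling-free CNN: stride 1 and padding 1 preserve the full
        # time-frequency resolution, as the abstract emphasizes.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(channels * n_freq, hidden,
                             batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)     # attention scores
        self.out = nn.Linear(4 * hidden, n_freq)  # mask estimator

    def forward(self, spec):                      # spec: (B, T, F) magnitude
        x = self.cnn(spec.unsqueeze(1))           # (B, C, T, F)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.blstm(x)                      # (B, T, 2H)
        w = torch.softmax(self.score(h), dim=1)   # weights over frames
        ctx = (w * h).sum(dim=1, keepdim=True)    # (B, 1, 2H) dynamic context
        h = torch.cat([h, ctx.expand(-1, t, -1)], dim=-1)  # (B, T, 4H)
        mask = torch.sigmoid(self.out(h))         # bounded T-F mask
        return mask * spec                        # enhanced magnitude


if __name__ == "__main__":
    net = AttentionBLSTMEnhancer()
    noisy = torch.rand(2, 100, 257)               # toy 100-frame spectra
    print(net(noisy).shape)                       # torch.Size([2, 100, 257])
```

Keeping the CNN pooling-free matches the abstract's stated design, at the cost of a wider BLSTM input; the attention step here is a simple additive frame-scoring scheme standing in for the paper's mechanism.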
Pages: 3153-3157
Page count: 5
References
19 in total
[1] Anonymous, The Handbook of Brain Theory and Neural Networks, 1998.
[2] Bahdanau, D., Cho, K., Bengio, Y., "Neural Machine Translation by Jointly Learning to Align and Translate," arXiv:1409.0473, 2016.
[3] Ernst, O., et al., "Speech Dereverberation Using Fully Convolutional Networks," arXiv:1803.08243, 2018.
[4] Falk, T. H., Zheng, C., Chan, W.-Y., "A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech," IEEE Transactions on Audio, Speech, and Language Processing, 18(7):1766-1774, 2010.
[5] Graves, A., Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence, vol. 385, Springer, 2012, DOI 10.1007/978-3-642-24797-2.
[6] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., Kingsbury, B., "Deep Neural Networks for Acoustic Modeling in Speech Recognition," IEEE Signal Processing Magazine, 29(6):82-97, 2012.
[7] Kinoshita, K., et al., NTT Technical Review, vol. 9, p. 1, 2011.
[8] Kinoshita, K., Delcroix, M., Gannot, S., Habets, E. A. P., Haeb-Umbach, R., Kellermann, W., Leutnant, V., Maas, R., Nakatani, T., Raj, B., Sehr, A., Yoshioka, T., "A Summary of the REVERB Challenge: State-of-the-Art and Remaining Challenges in Reverberant Speech Processing Research," EURASIP Journal on Advances in Signal Processing, 2016:1-19.
[9] Loizou, P. C., Speech Enhancement: Theory and Practice, CRC Press, 2013, DOI 10.1201/b14529.
[10] Maas, A. L., et al., "Recurrent Neural Networks for Noise Reduction in Robust ASR," Proc. INTERSPEECH 2012, p. 22, 2012.