Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement

Cited by: 11
Authors
Ge, Meng [1 ]
Wang, Longbiao [1 ]
Li, Nan [1 ]
Shi, Hao [1 ]
Dang, Jianwu [1 ,2 ]
Li, Xiangang [3 ]
Affiliations
[1] Tianjin University, College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
[2] Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
[3] Didi Chuxing, Beijing, China
Source
INTERSPEECH 2019 | 2019
Funding
National Natural Science Foundation of China;
Keywords
environment-dependent; attention; convolutional network; recurrent network; speech enhancement; dereverberation;
DOI
10.21437/Interspeech.2019-1477
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification
100104; 100213;
Abstract
Speech enhancement aims to preserve the underlying speech signal while suppressing noise, a prerequisite for robust communication systems. With the success of deep neural networks (DNNs), significant progress has been made. Nevertheless, the accuracy of speech enhancement systems remains unsatisfactory because varied environmental and contextual information is insufficiently exploited in complex acoustic conditions. To address these problems, this research proposes an end-to-end environment-dependent attention-driven approach. A convolutional neural network without pooling operations first captures local frequency-temporal patterns at full resolution. An attention mechanism is then integrated into a bidirectional long short-term memory network to acquire the weighted dynamic context across consecutive frames. Furthermore, dynamic environment estimation and phase correction improve the generalization ability. Extensive experiments on the REVERB challenge demonstrate that the proposed approach outperforms existing methods, improving PESQ from 2.56 to 2.87 and SRMR from 4.95 to 5.50 relative to a conventional DNN.
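The abstract outlines the pipeline without implementation detail. Below is a minimal PyTorch sketch of the CNN-attention-BLSTM core it describes; the class name AttentionBLSTMEnhancer, all layer sizes, and the sigmoid-mask output are illustrative assumptions rather than the authors' implementation, and the dynamic environment estimation and phase-correction stages are omitted.

```python
# A minimal sketch (not the authors' code) of the abstract's pipeline:
# a pooling-free CNN over the magnitude spectrogram, a bidirectional
# LSTM, and a frame-level attention mechanism that forms a weighted
# dynamic context. All sizes are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionBLSTMEnhancer(nn.Module):
    def __init__(self, n_freq=257, channels=32, hidden=256):
        super().__init__()
        # Pooling-free CNN: stride 1 and padding 1 preserve the full
        # time-frequency resolution, as the abstract emphasizes.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(channels * n_freq, hidden,
                             batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)     # attention scores
        self.out = nn.Linear(4 * hidden, n_freq)  # mask estimator

    def forward(self, spec):                      # spec: (B, T, F) magnitude
        x = self.cnn(spec.unsqueeze(1))           # (B, C, T, F)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.blstm(x)                      # (B, T, 2H)
        w = torch.softmax(self.score(h), dim=1)   # weights over frames
        ctx = (w * h).sum(dim=1, keepdim=True)    # (B, 1, 2H) dynamic context
        h = torch.cat([h, ctx.expand(-1, t, -1)], dim=-1)  # (B, T, 4H)
        mask = torch.sigmoid(self.out(h))         # bounded T-F mask
        return mask * spec                        # enhanced magnitude


if __name__ == "__main__":
    net = AttentionBLSTMEnhancer()
    noisy = torch.rand(2, 100, 257)               # toy 100-frame spectra
    print(net(noisy).shape)                       # torch.Size([2, 100, 257])
```

Keeping the CNN pooling-free matches the abstract's stated design, at the cost of a wider BLSTM input; the attention step here is a simple additive frame-scoring scheme standing in for the paper's mechanism.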
Pages: 3153-3157
Page count: 5
References
19 in total
[1] Anonymous, The Handbook of Brain Theory and Neural Networks, 1998.
[2] Bahdanau, D., Cho, K., Bengio, Y., "Neural Machine Translation by Jointly Learning to Align and Translate," arXiv:1409.0473, 2016.
[3] Ernst, O., et al., "Speech Dereverberation Using Fully Convolutional Networks," arXiv:1803.08243, 2018.
[4] Falk, T. H., Zheng, C., Chan, W.-Y., "A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech," IEEE Transactions on Audio, Speech, and Language Processing, 18(7):1766-1774, 2010.
[5] Graves, A., Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence, vol. 385, Springer, 2012, DOI 10.1007/978-3-642-24797-2.
[6] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., Kingsbury, B., "Deep Neural Networks for Acoustic Modeling in Speech Recognition," IEEE Signal Processing Magazine, 29(6):82-97, 2012.
[7] Kinoshita, K., et al., NTT Technical Review, vol. 9, p. 1, 2011.
[8] Kinoshita, K., Delcroix, M., Gannot, S., Habets, E. A. P., Haeb-Umbach, R., Kellermann, W., Leutnant, V., Maas, R., Nakatani, T., Raj, B., Sehr, A., Yoshioka, T., "A Summary of the REVERB Challenge: State-of-the-Art and Remaining Challenges in Reverberant Speech Processing Research," EURASIP Journal on Advances in Signal Processing, 2016:1-19.
[9] Loizou, P. C., Speech Enhancement: Theory and Practice, CRC Press, 2013, DOI 10.1201/b14529.
[10] Maas, A. L., et al., "Recurrent Neural Networks for Noise Reduction in Robust ASR," Proc. INTERSPEECH 2012, p. 22, 2012.