Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement

Cited by: 11
Authors
Ge, Meng [1]
Wang, Longbiao [1]
Li, Nan [1]
Shi, Hao [1]
Dang, Jianwu [1,2]
Li, Xiangang [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Didi Chuxing, Beijing, Peoples R China
Source
INTERSPEECH 2019 | 2019
Funding
National Natural Science Foundation of China;
Keywords
environment-dependent; attention; convolutional network; recurrent network; speech enhancement; DEREVERBERATION;
DOI
10.21437/Interspeech.2019-1477
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Codes
100104; 100213;
Abstract
Speech enhancement aims to preserve the underlying clean speech signal while suppressing noise, a prerequisite for building robust communication systems. With the success of deep neural networks (DNNs), significant progress has been made. Nevertheless, the accuracy of speech enhancement systems remains unsatisfactory because varied environmental and contextual information is insufficiently exploited in complex conditions. To address these problems, this research proposes an end-to-end environment-dependent attention-driven approach. First, local frequency-temporal patterns are fully exploited by a convolutional neural network without pooling operations. An attention mechanism is then integrated into a bidirectional long short-term memory (BLSTM) network to acquire the weighted dynamic context between consecutive frames. Furthermore, dynamic environment estimation and phase correction further improve generalization ability. Extensive experiments on the REVERB challenge demonstrate that the proposed approach outperforms existing methods, improving PESQ from 2.56 to 2.87 and SRMR from 4.95 to 5.50 compared with a conventional DNN.
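For intuition, below is a minimal PyTorch sketch of the pipeline the abstract describes: pooling-free convolutional feature extraction, a BLSTM, and frame-level attention that supplies weighted dynamic context before mask estimation. The class name AttentionBLSTMEnhancer, all layer sizes, and the additive (Bahdanau-style) attention form are illustrative assumptions, not the authors' implementation; the environment-dependent estimation and phase-correction stages described in the paper are omitted.

    import torch
    import torch.nn as nn

    class AttentionBLSTMEnhancer(nn.Module):
        """Sketch: pooling-free CNN -> BLSTM -> frame attention -> ratio mask."""
        def __init__(self, n_freq=257, channels=32, hidden=256):
            super().__init__()
            # 2-D convolutions without pooling preserve the full
            # time-frequency resolution of the input spectrogram.
            self.cnn = nn.Sequential(
                nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU())
            # BLSTM over the frame axis models dynamic context in both directions.
            self.blstm = nn.LSTM(channels * n_freq, hidden,
                                 batch_first=True, bidirectional=True)
            # Additive attention: one scalar score per frame (an assumed form).
            self.attn = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh(),
                                      nn.Linear(hidden, 1))
            # Mask head: frame state + attended context -> per-bin mask in [0, 1].
            self.mask = nn.Sequential(nn.Linear(4 * hidden, n_freq), nn.Sigmoid())

        def forward(self, noisy_mag):                   # (batch, frames, n_freq)
            b, t, f = noisy_mag.shape
            x = self.cnn(noisy_mag.unsqueeze(1))        # (b, c, t, f)
            x = x.permute(0, 2, 1, 3).reshape(b, t, -1)
            h, _ = self.blstm(x)                        # (b, t, 2*hidden)
            alpha = torch.softmax(self.attn(h), dim=1)  # attention over frames
            context = (alpha * h).sum(dim=1, keepdim=True)
            h = torch.cat([h, context.expand(-1, t, -1)], dim=-1)
            return noisy_mag * self.mask(h)             # enhanced magnitude

    # Example: enhance a batch of 4 utterances, 100 frames, 257 STFT bins.
    net = AttentionBLSTMEnhancer()
    enhanced = net(torch.rand(4, 100, 257))             # -> (4, 100, 257)

This sketch covers only the magnitude-domain masking path; in the paper, environment estimation further conditions the enhancement and a phase-correction step refines the reconstructed signal.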
Pages: 3153-3157
Number of pages: 5