DRC-NET: DENSELY CONNECTED RECURRENT CONVOLUTIONAL NEURAL NETWORK FOR SPEECH DEREVERBERATION

Cited by: 10
Authors
Liu, Jinjiang [1]
Zhang, Xueliang [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
speech dereverberation; microphone array processing; convolutional recurrent neural network; deep learning; DOMAIN;
DOI
10.1109/ICASSP43922.2022.9747111
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Our previous work on frequency bin-wise independent processing achieved a dramatic reduction in the computational complexity of recurrent neural networks (RNNs). Building on it, this paper realizes a massive deployment of RNNs along the time dimension by using a channel-wise long short-term memory (LSTM) neural network. With this approach, RNN processing along the frequency dimension and along the time dimension of the time-frequency domain is unified. This allows us to combine a convolutional neural network (CNN) and an RNN into a basic neural operator, which leads to the Densely Connected Recurrent Convolutional Neural Network (DRC-NET). DRC-NET fully exploits both the infinite response of the RNN and the finite response of the CNN, and its balanced response characteristics significantly improve system performance. Experimental results show that both the non-causal and the causal versions of DRC-NET outperform the state-of-the-art (SOTA) model on the speech dereverberation task.
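The complexity argument in the abstract can be illustrated with a rough parameter count. The sketch below is not taken from the paper: the tensor sizes are hypothetical, and it assumes that "bin-wise" processing means one small LSTM over the channel axis whose weights are shared across all frequency bins. It uses the standard single-bias LSTM parameter formula.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Parameters of one standard LSTM layer: 4 gates, each with input
    weights, recurrent weights, and a single bias vector."""
    return 4 * hidden_size * (input_size + hidden_size + 1)


# Hypothetical sizes, chosen only for illustration (not from the paper).
channels = 16     # feature channels C of the time-frequency tensor
freq_bins = 161   # frequency bins F

# Conventional layout: channels and bins are flattened into one feature
# vector per time frame, so the LSTM width grows with C * F.
flattened = lstm_param_count(channels * freq_bins, channels * freq_bins)

# Bin-wise layout (assumed): each frequency bin is an independent sequence
# of C-dimensional vectors, and all bins share the same small LSTM.
bin_wise = lstm_param_count(channels, channels)

print(f"flattened LSTM: {flattened:,} parameters")
print(f"bin-wise LSTM:  {bin_wise:,} parameters")
```

Under these assumptions the LSTM width depends only on the number of channels, which is why such a layer could be placed many times throughout a network at modest cost.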
Pages: 166-170
Page count: 5
Related papers
31 in total
  • [1] REAL-TIME DENOISING AND DEREVERBERATION WITH TINY RECURRENT U-NET
    Choi, Hyeong-Seok
    Park, Sungjin
    Lee, Jie Hwan
    Heo, Hoon
    Jeon, Dongsuk
    Lee, Kyogu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5789 - 5793
  • [2] Delcroix M., 2014, Proceedings of REVERB Challenge Workshop
  • [3] Fransen J., 1994, WSJCAM0 CORPUS RECOR, V192
  • [4] Habets E. A., 2006, ROOM IMPULSE RESPONS, V2, P1
  • [5] New Insights Into the MVDR Beamformer in Room Acoustics
    Habets, E. A. P.
    Benesty, J.
    Cohen, I.
    Gannot, S.
    Dmochowski, J.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01): : 158 - 170
  • [6] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Hu, Yanxin
    Liu, Yun
    Lv, Shubo
    Xing, Mengtao
    Zhang, Shimin
    Fu, Yihui
    Wu, Jian
    Zhang, Bihong
    Xie, Lei
    [J]. INTERSPEECH 2020, 2020, : 2472 - 2476
  • [7] Densely Connected Convolutional Networks
    Huang, Gao
    Liu, Zhuang
    van der Maaten, Laurens
    Weinberger, Kilian Q.
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2261 - 2269
  • [8] An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers
    Jensen, Jesper
    Taal, Cees H.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2009 - 2022
  • [9] A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
    Kinoshita, Keisuke
    Delcroix, Marc
    Gannot, Sharon
    Habets, Emanuel A. P.
    Haeb-Umbach, Reinhold
    Kellermann, Walter
    Leutnant, Volker
    Maas, Roland
    Nakatani, Tomohiro
    Raj, Bhiksha
    Sehr, Armin
    Yoshioka, Takuya
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016, : 1 - 19
  • [10] Le Roux J, 2019, INT CONF ACOUST SPEE, P626, DOI 10.1109/ICASSP.2019.8683855