TPARN: TRIPLE-PATH ATTENTIVE RECURRENT NETWORK FOR TIME-DOMAIN MULTICHANNEL SPEECH ENHANCEMENT

Cited by: 20
Authors
Pandey, Ashutosh [1 ]
Xu, Buye [1 ]
Kumar, Anurag [1 ]
Donley, Jacob [1 ]
Calamia, Paul [1 ]
Wang, DeLiang [2 ]
Affiliations
[1] Facebook Real Labs Res, Menlo Pk, CA 94025 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
multichannel; time-domain; MIMO; self-attention; triple-path; fixed array;
DOI
10.1109/ICASSP43922.2022.9747373
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.
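The three paths described in the abstract can be pictured as the same sequence block applied along three different axes of one tensor. The following is a minimal NumPy sketch of that idea only, not the authors' implementation. It assumes a `[channels, chunks, frames, features]` layout, splits the dual-path stage into intra-chunk and inter-chunk passes as is common for dual-path models, and uses a parameter-free self-attention function as a stand-in for the ARN. The names `toy_attention`, `apply_along`, and `triple_path_block` are hypothetical.

```python
import numpy as np


def toy_attention(seq):
    """Stand-in for an ARN block (assumption): parameter-free self-attention
    with a residual connection. seq has shape [batch, length, features]."""
    d = seq.shape[-1]
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(d)  # [batch, length, length]
    scores -= scores.max(axis=-1, keepdims=True)        # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return seq + weights @ seq


def apply_along(x, axis, block):
    """Run `block` over one axis of x [channels, chunks, frames, features],
    treating the remaining non-feature axes as batch."""
    moved = np.moveaxis(x, axis, -2)             # sequence axis next to features
    batch_shape = moved.shape[:-2]
    flat = moved.reshape(-1, *moved.shape[-2:])  # [batch, length, features]
    out = block(flat).reshape(*batch_shape, *moved.shape[-2:])
    return np.moveaxis(out, -2, axis)            # restore the original layout


def triple_path_block(x, block=toy_attention):
    x = apply_along(x, 2, block)  # intra-chunk path: per channel, within a chunk
    x = apply_along(x, 1, block)  # inter-chunk path: per channel, across chunks
    x = apply_along(x, 0, block)  # spatial path: across microphone channels
    return x


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 8, 16))  # 4 channels, 6 chunks, 8 frames, 16 features
y = triple_path_block(x)
print(y.shape)
```

The output keeps the shape of the input, which mirrors the multiple-input and multiple-output design: every channel gets an enhanced counterpart. In this sketch the first two passes never mix channels, so cross-channel information enters only through the third pass.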
Pages: 6497-6501
Number of pages: 5