TPARN: TRIPLE-PATH ATTENTIVE RECURRENT NETWORK FOR TIME-DOMAIN MULTICHANNEL SPEECH ENHANCEMENT

Times cited: 20

Authors:

Pandey, Ashutosh [1]

Xu, Buye [1]

Kumar, Anurag [1]

Donley, Jacob [1]

Calamia, Paul [1]

Wang, DeLiang [2]

Affiliations:

[1] Facebook Real Labs Res, Menlo Pk, CA 94025 USA

[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

multichannel; time-domain; MIMO; self-attention; triple-path; fixed array

DOI:

10.1109/ICASSP43922.2022.9747373

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.
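The three-path ordering in the abstract (per-channel dual-path modeling followed by a path along the spatial dimension) can be illustrated with a minimal sketch in Python. It assumes a DPRNN-style layout in which each channel's signal has already been encoded into S chunks of K frames with N-dimensional features, and it replaces every ARN with a hypothetical, parameter-free self-attention function (`self_attend`) so that only the axis bookkeeping is shown. The names, shapes, and the attention stand-in are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

Vec = List[float]
# Hypothetical layout: x[c][s][k] is an N-dim feature vector for
# microphone c, chunk s, and frame k within that chunk.
Tensor = List[List[List[Vec]]]


def self_attend(seq: List[Vec]) -> List[Vec]:
    """Hypothetical stand-in for an ARN: parameter-free scaled dot-product
    self-attention over a sequence of vectors, plus a residual connection."""
    n = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(n) for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        ctx = [sum(w * v[d] for w, v in zip(weights, seq)) / total for d in range(n)]
        out.append([qi + ci for qi, ci in zip(q, ctx)])
    return out


def triple_path_block(x: Tensor) -> Tensor:
    C, S, K = len(x), len(x[0]), len(x[0][0])
    # Path 1 (intra-chunk): model frames within each chunk, each channel independently.
    x = [[self_attend(x[c][s]) for s in range(S)] for c in range(C)]
    # Path 2 (inter-chunk): model chunks at each frame index, each channel independently.
    y = [[[None] * K for _ in range(S)] for _ in range(C)]
    for c in range(C):
        for k in range(K):
            seq = self_attend([x[c][s][k] for s in range(S)])
            for s in range(S):
                y[c][s][k] = seq[s]
    # Path 3 (spatial): model the microphones at every (chunk, frame) position.
    z = [[[None] * K for _ in range(S)] for _ in range(C)]
    for s in range(S):
        for k in range(K):
            seq = self_attend([y[c][s][k] for c in range(C)])
            for c in range(C):
                z[c][s][k] = seq[c]
    # One output per input channel, i.e. a multiple-input, multiple-output mapping.
    return z


random.seed(0)
C, S, K, N = 4, 3, 5, 8
x = [[[[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]
      for _ in range(S)] for _ in range(C)]
out = triple_path_block(x)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 4 3 5 8
```

Channels interact only in the third path: the first two paths never mix microphones, while the spatial path attends across all microphones at each (chunk, frame) position and still returns one vector per microphone, which keeps the channel count of the input as in a MIMO design. A trained model would use learned recurrent and attention layers in place of `self_attend`.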

Pages: 6497-6501

Number of pages: 5
