DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Cited by: 0

Authors:

Wang, Zhenyu [1,2,4]

Zhou, Yi [1,2]

Gan, Lu [3,4]

Chen, Rilin

Tang, Xinyu [1,2]

Liu, Hongqing [1,2]

Affiliations:

[1] Chongqing University of Posts and Telecommunications, Chongqing 400065, People's Republic of China

[2] Chongqing Key Laboratory of Signal and Information Processing, Chongqing 400065, People's Republic of China

[3] Brunel University, College of Engineering, Design and Physical Sciences, London UB8 3PH, England

[4] Tencent AI Lab, Beijing, People's Republic of China

Source:

2022 IEEE Workshop on Signal Processing Systems (SiPS) | 2022

Keywords:

Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TasNet

DOI:

10.1109/SIPS55645.2022.9919247

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed, which uses convolutions to capture long-time speech sequences. Ablation studies demonstrate that both the deep encoder and the improved transformer benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC), even with lower computational complexity.


Pages: 180-184

Page count: 5
