DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Authors
Wang, Zhenyu [1, 2, 4]
Zhou, Yi [1, 2]
Gan, Lu [3, 4]
Chen, Rilin
Tang, Xinyu [1, 2]
Liu, Hongqing [1, 2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China
[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England
[4] Tencent AI Lab, Beijing, Peoples R China
Source
2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS) | 2022
Keywords
Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TASNET
DOI
10.1109/SIPS55645.2022.9919247
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer that uses convolutions is also developed to capture long speech sequences. Ablation studies demonstrate that both the deep encoder and the improved transformer benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC) while also having lower computational complexity.
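The abstract refers to estimating beamforming filters in the time domain and compares against a filter-and-sum network. As a rough illustration of the generic filter-and-sum operation only, the sketch below applies one FIR filter per microphone channel and sums the results. It is not the paper's implementation: the function name `filter_and_sum`, the toy signals, and the fixed filter taps are all assumptions made for this example, whereas in a network such as the one described the filters would be estimated from the input.

```python
from typing import List, Sequence


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Apply one causal FIR filter per channel and sum the filtered channels.

    channels: one sequence of samples per microphone, all of equal length.
    filters:  one sequence of FIR taps per microphone (hypothetical values
              here; a separation network would estimate them).
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    num_samples = len(channels[0])
    if any(len(ch) != num_samples for ch in channels):
        raise ValueError("all channels must have the same length")

    output = [0.0] * num_samples
    for signal, taps in zip(channels, filters):
        for t in range(num_samples):
            # Causal convolution: output[t] += sum_k taps[k] * signal[t - k]
            for k, tap in enumerate(taps):
                if t - k >= 0:
                    output[t] += tap * signal[t - k]
    return output


# Toy two-microphone example (made-up data): mic2 is mic1 delayed by one sample.
mic1 = [1.0, 2.0, 3.0, 0.0]
mic2 = [0.0, 1.0, 2.0, 3.0]

# Delay mic1 by one sample so the channels align, then average the two.
filters = [[0.0, 0.5], [0.5]]
separated = filter_and_sum([mic1, mic2], filters)
print(separated)
```

With these hand-picked taps the two channels are time-aligned before summation, so the source common to both microphones adds coherently. A learned system would typically estimate such filters per frame rather than use fixed ones.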
Pages: 180-184
Page count: 5