Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6
Authors
Chen, Hangting [1, 2]
Yi, Yang [1, 2]
Feng, Dang [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;
DOI
10.21437/Interspeech.2022-230
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework combines signal estimation with beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of the data-driven network and boosts beamforming performance through an accurate estimation of speech statistics. Such integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", where MC-Conv-TasNet and MVDR can interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which benefit the separation in the second stage. The proposed framework facilitates iterative signal refinement guided by beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset demonstrate that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB under the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
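As a reading aid for the abstract's phrase "estimation of speech statistics", the block below sketches one common MVDR formulation in which network-estimated signals supply the spatial covariance matrices. The abstract does not state which MVDR variant the paper uses, so this is an illustrative assumption. All symbols are introduced here and do not come from the record: y_{t,f} is the multi-channel mixture in the STFT domain, \hat{s}_{c,t,f} is the network's multi-channel estimate of speaker c, u is a one-hot vector selecting the reference microphone, and T is the number of frames.

```latex
% Illustrative sketch only: a widely used MVDR form, not confirmed by this record.
% Residual (interference plus noise) for speaker c:
\hat{n}_{c,t,f} = y_{t,f} - \hat{s}_{c,t,f}

% Spatial covariance matrices estimated from the network outputs:
\Phi^{S_c}_{f} = \frac{1}{T} \sum_{t=1}^{T} \hat{s}_{c,t,f}\, \hat{s}_{c,t,f}^{\mathsf{H}},
\qquad
\Phi^{N_c}_{f} = \frac{1}{T} \sum_{t=1}^{T} \hat{n}_{c,t,f}\, \hat{n}_{c,t,f}^{\mathsf{H}}

% MVDR filter for speaker c at frequency f (reference-channel form):
w_{c,f} = \frac{\left(\Phi^{N_c}_{f}\right)^{-1} \Phi^{S_c}_{f}}
               {\operatorname{tr}\!\left( \left(\Phi^{N_c}_{f}\right)^{-1} \Phi^{S_c}_{f} \right)}\, u

% Beamformed output for speaker c:
\tilde{s}_{c,t,f} = w_{c,f}^{\mathsf{H}}\, y_{t,f}
```

Under this reading, the iterative scheme described in the abstract would feed the beamformed outputs back to the separation network so that the covariance estimates, and hence the filter, can be refined in the next pass.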
Pages: 866-870
Number of pages: 5
Related papers
50 in total
  • [21] Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Zou, Yuexian
    Yu, Dong
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1370 - 1374
  • [22] Time-frequency Domain Filter-and-sum Network for Multi-channel Speech Separation
    Deng, Zhewen
    Zhou, Yi
    Liu, Hongqing
    INTERSPEECH 2023, 2023, : 3689 - 3693
  • [23] Evaluating Multi-Channel Multi-Device Speech Separation Algorithms in the Wild: A Hardware-Software Solution
    Ceolini, Enea
    Kiselev, Ilya
    Liu, Shih-Chii
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1428 - 1439
  • [24] Implicit Filter-and-sum Network for End-to-end Multi-channel Speech Separation
    Luo, Yi
    Mesgarani, Nima
    INTERSPEECH 2021, 2021, : 3071 - 3075
  • [25] DMANET: DEEP LEARNING-BASED DIFFERENTIAL MICROPHONE ARRAYS FOR MULTI-CHANNEL SPEECH SEPARATION
    Yang, Xiaokang
    Wei, Jianguo
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4363 - 4367
  • [26] Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming
    Bella, Mostafa
    Saylani, Hicham
    Hosseini, Shahram
    Deville, Yannick
    IEEE ACCESS, 2023, 11 : 100632 - 100645
  • [27] Gated Recurrent Fusion of Spatial and Spectral Features for Multi-channel Speech Separation with Deep Embedding Representations
    Fan, Cunhang
    Tao, Jianhua
    Liu, Bin
    Yi, Jiangyan
    Wen, Zhengqi
    INTERSPEECH 2020, 2020, : 3321 - 3325
  • [28] Multi-Channel Speaker Verification for Single and Multi-talker Speech
    Kataria, Saurabh
    Zhang, Shi-Xiong
    Yu, Dong
    INTERSPEECH 2021, 2021, : 4608 - 4612
  • [29] DON'T SHOOT BUTTERFLY WITH RIFLES: MULTI-CHANNEL CONTINUOUS SPEECH SEPARATION WITH EARLY EXIT TRANSFORMER
    Chen, Sanyuan
    Wu, Yu
    Chen, Zhuo
    Yoshioka, Takuya
    Liu, Shujie
    Li, Jinyu
    Yu, Xiangzhan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6139 - 6143
  • [30] MULTI-CHANNEL TARGET SPEECH EXTRACTION WITH CHANNEL DECORRELATION AND TARGET SPEAKER ADAPTATION
    Han, Jiangyu
    Zhou, Xinyuan
    Long, Yanhua
    Li, Yijie
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6094 - 6098