Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Times cited: 6
Authors
Chen, Hangting [1, 2]
Yi, Yang [1, 2]
Feng, Dang [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;
DOI
10.21437/Interspeech.2022-230
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework employs signal estimation followed by beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming: it leverages the strong modeling ability of the data-driven network and boosts beamforming performance through accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which facilitate the separation in the second stage. The proposed framework enables iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset show that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB with the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
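The central mechanism in the abstract is that separated multi-channel source estimates supply the speech statistics from which the MVDR beamformer is computed. Below is a minimal NumPy sketch of that step, assuming the reference-channel (Souden) MVDR formulation with spatial covariance matrices estimated from the separated signals; the paper's exact formulation may differ. The function names, the (channels, frequencies, frames) STFT layout, the diagonal-loading constant `eps`, and the synthetic two-source demo (which uses oracle source images in place of network estimates) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def spatial_covariance(X):
    """Spatial covariance per frequency from a complex STFT X of shape (C, F, T).

    Returns an array of shape (F, C, C), averaged over the T frames.
    """
    _, _, T = X.shape
    Xf = X.transpose(1, 0, 2)                      # (F, C, T)
    return Xf @ Xf.conj().transpose(0, 2, 1) / T   # (F, C, C)


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Reference-channel MVDR (Souden form), one weight vector per frequency.

    w_f = Phi_n^{-1} Phi_s u_ref / trace(Phi_n^{-1} Phi_s); returns shape (F, C).
    """
    C = phi_s.shape[-1]
    phi_n = phi_n + eps * np.eye(C)[None]           # diagonal loading (assumed) for stability
    ratio = np.linalg.solve(phi_n, phi_s)           # Phi_n^{-1} Phi_s, (F, C, C)
    trace = np.trace(ratio, axis1=1, axis2=2)       # (F,)
    return ratio[:, :, ref] / (trace[:, None] + eps)


def apply_beamformer(w, Y):
    """Beamformer output y[f, t] = w_f^H Y[:, f, t] for Y of shape (C, F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), Y)


# Hypothetical demo: two point sources with random per-frequency steering
# vectors plus a weak noise floor. The source images here are oracle; in the
# pipeline described above they would be MC-Conv-TasNet estimates.
rng = np.random.default_rng(0)
C, F, T = 4, 8, 200


def crandn(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


d1, d2 = crandn(C, F), crandn(C, F)     # steering vectors per frequency
s1, s2 = crandn(F, T), crandn(F, T)     # source STFTs
img1 = d1[:, :, None] * s1[None]        # target image at each mic, (C, F, T)
img2 = d2[:, :, None] * s2[None]        # interferer image, (C, F, T)
noise = 1e-3 * crandn(C, F, T)
Y = img1 + img2 + noise                 # multi-channel mixture

# Target statistics from the target image; "noise" statistics from everything else.
w = mvdr_weights(spatial_covariance(img1), spatial_covariance(img2 + noise))
out = apply_beamformer(w, Y)            # estimate of img1 at reference mic 0

err_bf = np.linalg.norm(out - img1[0])
err_mix = np.linalg.norm(Y[0] - img1[0])
print(f"reference-mic error: mixture {err_mix:.2f}, MVDR {err_bf:.4f}")
```

The Souden form is used here because it needs no explicit steering vector: the target covariance alone fixes the look direction, and the output stays distortionless with respect to the chosen reference microphone. In the iterative scheme the abstract describes, beamformed outputs like `out` would then guide a second separation stage; that network is not sketched here.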
Pages: 866-870
Number of pages: 5
Related papers (50 in total)
  • [1] Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation
    Zhang, Zhuohuang
    Xu, Yong
    Yu, Meng
    Zhang, Shi-Xiong
    Chen, Lianwu
    Williamson, Donald S.
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3526 - 3540
  • [2] Iteratively Refined Multi-Channel Speech Separation
    Zhang, Xu
    Bao, Changchun
    Yang, Xue
    Zhou, Jing
    APPLIED SCIENCES-BASEL, 2024, 14 (14):
  • [3] A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation
    Xie, Wupeng
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Liu, Guanghong
    SYMMETRY-BASEL, 2023, 15 (02):
  • [4] Multi-Modal Multi-Channel Target Speech Separation
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Xu, Yong
    Chen, Lianwu
    Zou, Yuexian
    Yu, Dong
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (03) : 530 - 541
  • [5] Multi-channel separation of dynamic speech and sound events
    Fujimura, Takuya
    Scheibler, Robin
    INTERSPEECH 2023, 2023, : 3749 - 3753
  • [6] A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet
    Ditter, David
    Gerkmann, Timo
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 36 - 40
  • [7] Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition
    Li, Guinan
    Yu, Jianwei
    Deng, Jiajun
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6042 - 6046
  • [8] Improving Channel Decorrelation for Multi-Channel Target Speech Extraction
    Han, Jiangyu
    Rao, Wei
    Wang, Yannan
    Long, Yanhua
    INTERSPEECH 2021, 2021, : 1847 - 1851
  • [9] Multi-Band PIT and Model Integration for Improved Multi-Channel Speech Separation
    Chen, Lianwu
    Yu, Meng
    Su, Dan
    Yu, Dong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 705 - 709
  • [10] Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation
    Chen, Zhuo
    Yoshioka, Takuya
    Xiao, Xiong
    Li, Jinyu
    Seltzer, Michael L.
    Gong, Yifan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5384 - 5388