Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6
Authors
Chen, Hangting [1,2]
Yi, Yang [1,2]
Feng, Dang [1,2]
Zhang, Pengyuan [1,2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;
DOI
10.21437/Interspeech.2022-230
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework employs signal estimation followed by beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of the data-driven network and boosts the performance of beamforming with an accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which aid the separation in the second stage. The proposed framework facilitates iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX corpus demonstrate that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB with the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
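To make the beamforming guidance concrete, below is a minimal NumPy sketch of the kind of MVDR step described in the abstract. It is not the authors' released implementation: the helper names (estimate_covariance, mvdr_weights, apply_beamformer), the STFT tensor shapes, and the reference-channel formulation are illustrative assumptions; only the standard MVDR rule w(f) = Phi_nn^{-1}(f) Phi_ss(f) u / tr(Phi_nn^{-1}(f) Phi_ss(f)) is taken as given.

```python
import numpy as np

def estimate_covariance(stft):
    """Per-frequency spatial covariance, averaged over time frames.
    stft: complex array of shape (channels, frames, freqs)."""
    c, t, f = stft.shape
    x = stft.transpose(2, 0, 1)                            # (freqs, channels, frames)
    return np.einsum('fct,fdt->fcd', x, x.conj()) / t      # (freqs, channels, channels)

def mvdr_weights(phi_ss, phi_nn, ref_channel=0, eps=1e-8):
    """MVDR filter per frequency: w = Phi_nn^{-1} Phi_ss u / tr(Phi_nn^{-1} Phi_ss)."""
    f, c, _ = phi_ss.shape
    phi_nn = phi_nn + eps * np.eye(c)                      # regularize the inversion
    numer = np.linalg.solve(phi_nn, phi_ss)                # Phi_nn^{-1} Phi_ss, batched over freqs
    trace = np.trace(numer, axis1=-2, axis2=-1)[:, None]   # (freqs, 1)
    u = np.zeros(c, dtype=phi_ss.dtype)                    # one-hot reference channel
    u[ref_channel] = 1.0
    return (numer @ u) / (trace + eps)                     # (freqs, channels)

def apply_beamformer(w, mix_stft):
    """y(t, f) = w(f)^H x(t, f); mix_stft has shape (channels, frames, freqs)."""
    return np.einsum('fc,ctf->tf', w.conj(), mix_stft)
```

In a Beam-Guided-style cyclic flow, phi_ss would presumably be estimated from a first-stage network output and phi_nn from the residual (mixture minus that estimate), and the beamformed signals for each source would be fed back, together with the mixture, into the next separation stage for refinement.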
Pages: 866-870
Number of pages: 5
Related papers
50 items in total
  • [31] MULTI-CHANNEL NARROW-BAND DEEP SPEECH SEPARATION WITH FULL-BAND PERMUTATION INVARIANT TRAINING
    Quan, Changsheng
    Li, Xiaofei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 541 - 545
  • [32] Audio-visual Multi-channel Recognition of Overlapped Speech
    Yu, Jianwei
    Wu, Bo
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Chen, Lianwu
    Xu, Yong
    Yu, Meng
    Su, Dan
    Yu, Dong
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2020, 2020, : 3496 - 3500
  • [33] Multi-Channel Conversational Speaker Separation via Neural Diarization
    Taherian, Hassan
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2467 - 2476
  • [34] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [35] DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation
    Wang, Zhenyu
    Zhou, Yi
    Gan, Lu
    Chen, Rilin
    Tang, Xinyu
    Liu, Hongqing
    2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2022, : 180 - 184
  • [36] Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech
    Yu, Jianwei
    Zhang, Shi-Xiong
    Wu, Bo
    Liu, Shansong
    Hu, Shoukang
    Geng, Mengzhe
    Liu, Xunying
    Meng, Helen
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2067 - 2082
  • [37] SINGLE-CHANNEL SPEECH SEPARATION INTEGRATING PITCH INFORMATION BASED ON A MULTI-TASK LEARNING FRAMEWORK
    Li, Xiang
    Liu, Rui
    Song, Tao
    Wu, Xihong
    Chen, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7279 - 7283
  • [38] Direction-aware Speaker Beam for Multi-channel Speaker Extraction
    Li, Guanjun
    Liang, Shan
    Nie, Shuai
    Liu, Wenju
    Yu, Meng
    Chen, Lianwu
    Peng, Shouye
    Li, Changliang
    INTERSPEECH 2019, 2019, : 2713 - 2717
  • [39] GLMSNET: SINGLE CHANNEL SPEECH SEPARATION FRAMEWORK IN NOISY AND REVERBERANT ENVIRONMENTS
    Shi, Huiyu
    Chen, Xi
    Kong, Tianlong
    Yin, Shouyi
    Ouyang, Peng
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 663 - 670
  • [40] Modeling and Analysis Framework for Multi-Interface Multi-Channel Cognitive Radio Networks
    Tadayon, Navid
    Aissa, Sonia
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2015, 14 (02) : 935 - 947