Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6
Authors
Chen, Hangting [1,2]
Yi, Yang [1,2]
Feng, Dang [1,2]
Zhang, Pengyuan [1,2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;
DOI
10.21437/Interspeech.2022-230
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework employs signal estimation followed by beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of the data-driven network and boosts the performance of beamforming with an accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which aid the separation in the second stage. The proposed framework facilitates iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX corpus demonstrate that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB with the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
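To make the beamforming guidance concrete, below is a minimal NumPy sketch of the kind of MVDR step described in the abstract. It is not the authors' released implementation: the helper names (estimate_covariance, mvdr_weights, apply_beamformer), the STFT tensor shapes, and the reference-channel formulation are illustrative assumptions; only the standard MVDR rule w(f) = Phi_nn^{-1}(f) Phi_ss(f) u / tr(Phi_nn^{-1}(f) Phi_ss(f)) is taken as given.

```python
import numpy as np

def estimate_covariance(stft):
    """Per-frequency spatial covariance, averaged over time frames.
    stft: complex array of shape (channels, frames, freqs)."""
    c, t, f = stft.shape
    x = stft.transpose(2, 0, 1)                            # (freqs, channels, frames)
    return np.einsum('fct,fdt->fcd', x, x.conj()) / t      # (freqs, channels, channels)

def mvdr_weights(phi_ss, phi_nn, ref_channel=0, eps=1e-8):
    """MVDR filter per frequency: w = Phi_nn^{-1} Phi_ss u / tr(Phi_nn^{-1} Phi_ss)."""
    f, c, _ = phi_ss.shape
    phi_nn = phi_nn + eps * np.eye(c)                      # regularize the inversion
    numer = np.linalg.solve(phi_nn, phi_ss)                # Phi_nn^{-1} Phi_ss, batched over freqs
    trace = np.trace(numer, axis1=-2, axis2=-1)[:, None]   # (freqs, 1)
    u = np.zeros(c, dtype=phi_ss.dtype)                    # one-hot reference channel
    u[ref_channel] = 1.0
    return (numer @ u) / (trace + eps)                     # (freqs, channels)

def apply_beamformer(w, mix_stft):
    """y(t, f) = w(f)^H x(t, f); mix_stft has shape (channels, frames, freqs)."""
    return np.einsum('fc,ctf->tf', w.conj(), mix_stft)
```

In a Beam-Guided-style cyclic flow, phi_ss would presumably be estimated from a first-stage network output and phi_nn from the residual (mixture minus that estimate), and the beamformed signals for each source would be fed back, together with the mixture, into the next separation stage for refinement.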
Pages: 866-870
Number of pages: 5
Related papers
50 items in total
  • [31] MULTI-CHANNEL NARROW-BAND DEEP SPEECH SEPARATION WITH FULL-BAND PERMUTATION INVARIANT TRAINING
    Quan, Changsheng
    Li, Xiaofei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 541 - 545
  • [32] Audio-visual Multi-channel Recognition of Overlapped Speech
    Yu, Jianwei
    Wu, Bo
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Chen, Lianwu
    Xu, Yong
    Yu, Meng
    Su, Dan
    Yu, Dong
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2020, 2020, : 3496 - 3500
  • [33] Multi-Channel Conversational Speaker Separation via Neural Diarization
    Taherian, Hassan
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2467 - 2476
  • [34] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [35] DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation
    Wang, Zhenyu
    Zhou, Yi
    Gan, Lu
    Chen, Rilin
    Tang, Xinyu
    Liu, Hongqing
    2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2022, : 180 - 184
  • [36] Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech
    Yu, Jianwei
    Zhang, Shi-Xiong
    Wu, Bo
    Liu, Shansong
    Hu, Shoukang
    Geng, Mengzhe
    Liu, Xunying
    Meng, Helen
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2067 - 2082
  • [37] SINGLE-CHANNEL SPEECH SEPARATION INTEGRATING PITCH INFORMATION BASED ON A MULTI-TASK LEARNING FRAMEWORK
    Li, Xiang
    Liu, Rui
    Song, Tao
    Wu, Xihong
    Chen, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7279 - 7283
  • [38] Direction-aware Speaker Beam for Multi-channel Speaker Extraction
    Li, Guanjun
    Liang, Shan
    Nie, Shuai
    Liu, Wenju
    Yu, Meng
    Chen, Lianwu
    Peng, Shouye
    Li, Changliang
    INTERSPEECH 2019, 2019, : 2713 - 2717
  • [39] GLMSNET: SINGLE CHANNEL SPEECH SEPARATION FRAMEWORK IN NOISY AND REVERBERANT ENVIRONMENTS
    Shi, Huiyu
    Chen, Xi
    Kong, Tianlong
    Yin, Shouyi
    Ouyang, Peng
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 663 - 670
  • [40] Modeling and Analysis Framework for Multi-Interface Multi-Channel Cognitive Radio Networks
    Tadayon, Navid
    Aissa, Sonia
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2015, 14 (02) : 935 - 947