Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6

Authors:

Chen, Hangting [1,2]

Yi, Yang [1,2]

Feng, Dang [1,2]

Zhang, Pengyuan [1,2]

Affiliations:

[1] Chinese Academy of Sciences, Institute of Acoustics, Key Laboratory of Speech Acoustics and Content Understanding, Beijing, People's Republic of China

[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;

DOI:

10.21437/Interspeech.2022-230

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework combines signal estimation with beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of the data-driven network and boosts beamforming performance through accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which facilitate the separation in the second stage. The proposed framework enables iterative signal refinement guided by beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset show that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB at the same model size and narrowing the gap to the oracle signal-based MVDR to 2 dB.
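
The abstract does not spell out the beamforming formula, so the following is only a minimal sketch of the MVDR step that such a pipeline relies on. It assumes the widely used reference-channel formulation, w(f) = Φ_N(f)^{-1} Φ_S(f) u / trace(Φ_N(f)^{-1} Φ_S(f)), where Φ_S and Φ_N are the spatial covariance matrices of the target and of the interference, and u selects a reference microphone. The function names (`spatial_covariance`, `mvdr_weights`, `apply_beamformer`), the array layout, and the diagonal loading constant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def spatial_covariance(X):
    """Time-averaged spatial covariance of an STFT X with shape (C, F, T).

    Returns an array of shape (F, C, C). Layout is an assumption for this sketch.
    """
    T = X.shape[-1]
    return np.einsum("cft,dft->fcd", X, X.conj()) / T

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Reference-channel MVDR weights, one vector per frequency bin.

    phi_s, phi_n: (F, C, C) target and interference covariance matrices.
    Returns weights of shape (F, C).
    """
    C = phi_s.shape[-1]
    phi_n = phi_n + eps * np.eye(C)           # diagonal loading for stability
    numer = np.linalg.solve(phi_n, phi_s)     # Phi_N^{-1} Phi_S, batched over F
    trace = np.trace(numer, axis1=-2, axis2=-1)[..., None]
    return numer[..., ref] / (trace + eps)    # column `ref` equals (Phi_N^{-1} Phi_S) u

def apply_beamformer(w, X):
    """Apply weights w (F, C) to a mixture STFT X (C, F, T): y = w^H x."""
    return np.einsum("fc,cft->ft", w.conj(), X)
```

In a Beam-TasNet-style system, Φ_S would presumably be computed from the network's multi-channel estimate of one speaker and Φ_N from the remaining signal; an iterative scheme like the one described above would then pass the beamformed outputs to a further separation stage. That wiring is not shown here because the abstract gives no details.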


Pages: 866-870

Number of pages: 5
