Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6

Authors:

Chen, Hangting [1,2]

Yi, Yang [1,2]

Feng, Dang [1,2]

Zhang, Pengyuan [1,2]

Affiliations:

[1] Chinese Academy of Sciences, Institute of Acoustics, Key Laboratory of Speech Acoustics and Content Understanding, Beijing, People's Republic of China

[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Natural Science Foundation of China

Keywords:

Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE;

DOI:

10.21437/Interspeech.2022-230

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework combines signal estimation with beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming; it leverages the strong modeling ability of the data-driven network and boosts beamforming performance through accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system, named "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly through a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which aid the separation in the second stage. The proposed framework enables iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset show that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB with the same model size and narrowing the gap to the oracle-signal-based MVDR to 2 dB.
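
The abstract credits MVDR beamforming with benefiting from an accurate estimate of speech statistics. As a hedged illustration of that step only (not the authors' code), the sketch below computes per-frequency spatial covariance matrices from multi-channel STFT estimates of a target and of its interference, forms MVDR weights in the widely used Souden formulation referenced to one microphone, and applies them. It assumes NumPy, STFT arrays of shape (channels, freqs, frames), and the hypothetical helper names `spatial_covariance`, `mvdr_weights`, and `apply_beamformer`; the diagonal-loading constant `eps` and the reference index `ref` are also assumptions.

```python
import numpy as np


def spatial_covariance(X):
    """Frame-averaged spatial covariance per frequency bin.

    X: complex STFT with shape (channels, freqs, frames).
    Returns shape (freqs, channels, channels): Phi[f] = mean_t x_{f,t} x_{f,t}^H.
    """
    num_frames = X.shape[-1]
    return np.einsum("ift,jft->fij", X, X.conj()) / num_frames


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Souden-style MVDR weights referenced to microphone `ref`.

    w_f = (Phi_n^-1 Phi_s) u_ref / trace(Phi_n^-1 Phi_s),
    returned with shape (freqs, channels).
    """
    num_ch = phi_s.shape[-1]
    # Small diagonal loading keeps Phi_n invertible if the estimate is rank-deficient.
    loaded = phi_n + eps * np.eye(num_ch)
    ratio = np.linalg.solve(loaded, phi_s)      # batched Phi_n^-1 Phi_s, shape (F, C, C)
    trace = np.trace(ratio, axis1=1, axis2=2)   # shape (F,)
    # Multiplying by the one-hot vector u_ref simply selects column `ref`.
    return ratio[:, :, ref] / (trace[:, None] + eps)


def apply_beamformer(w, Y):
    """Single-channel output s_hat[f, t] = w_f^H y_{f,t} for an STFT Y of shape (C, F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), Y)
```

In a Beam-TasNet-style pipeline, `phi_s` would presumably come from the network's multi-channel estimate of one speaker and `phi_n` from the remaining signal (for example, the other speakers' estimates). Per the abstract, Beam-Guided TasNet then uses the beamformed signals to guide a second separation pass; this sketch does not model that loop. The distortionless property can be checked numerically: for a rank-1 target with steering vector `h`, the weights satisfy `w.conj() @ h == h[ref]` up to rounding.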


Pages: 866-870

Number of pages: 5
