Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Cited by: 6
Authors
Chen, Hangting [1, 2]
Yi, Yang [1, 2]
Feng, Dang [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Speech separation; multi-channel speech processing; MVDR; time-domain network; NETWORK; PERFORMANCE
DOI
10.21437/Interspeech.2022-230
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework combines signal estimation with beamforming. For example, Beam-TasNet links the multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of data-driven networks and boosts beamforming performance through accurate estimation of speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which facilitate the separation in the second stage. The proposed framework enables iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset demonstrate that Beam-Guided TasNet achieves an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB at the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
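The beamforming step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the per-source multi-channel estimates (which the paper obtains from MC-Conv-TasNet) are already available as complex STFT arrays, and it uses the common reference-channel MVDR formulation w = (Φ_N⁻¹ Φ_S) u / tr(Φ_N⁻¹ Φ_S). The function names, array layout, and the diagonal-loading constant `eps` are illustrative assumptions.

```python
import numpy as np

def spatial_covariance(X):
    """Time-averaged spatial covariance per frequency bin.

    X: complex STFT of shape (channels, frames, freqs).
    Returns an array of shape (freqs, channels, channels).
    """
    return np.einsum("ctf,dtf->fcd", X, X.conj()) / X.shape[1]

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Reference-channel MVDR weights, one weight vector per frequency bin.

    phi_s, phi_n: target and interference covariances, shape (F, C, C).
    ref: index of the reference microphone (assumed, not from the paper).
    eps: small constant for diagonal loading and a safe division (assumed).
    Returns weights of shape (F, C).
    """
    C = phi_n.shape[-1]
    # Diagonal loading keeps the interference covariance well conditioned.
    load = eps * np.trace(phi_n, axis1=-2, axis2=-1).real[:, None, None] / C
    phi_n = phi_n + load * np.eye(C)
    num = np.linalg.solve(phi_n, phi_s)               # phi_n^{-1} phi_s, (F, C, C)
    tr = np.trace(num, axis1=-2, axis2=-1)[:, None]   # (F, 1)
    return num[..., ref] / (tr + eps)                 # column `ref`, (F, C)

def apply_beamformer(w, Y):
    """Apply y = w^H x per time-frequency bin.

    w: (F, C) weights, Y: (C, T, F) mixture STFT. Returns (T, F).
    """
    return np.einsum("fc,ctf->tf", w.conj(), Y)
```

In an iterative scheme of the kind the abstract describes, the beamformed outputs would be fed back to the separation network as an additional guide; that network is outside the scope of this sketch.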
Pages: 866-870
Page count: 5