TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective

Cited by: 0
Authors
Li, Andong [1, 2]
Meng, Weixin [1, 2]
Yu, Guochen [1]
Liu, Wenzhe [3]
Li, Xiaodong [1, 2]
Zheng, Chengshi [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Tencent Corp, Tencent Ethereal Audio Lab, Shenzhen, Peoples R China
Source
INTERSPEECH 2023 | 2023
Keywords
multi-channel speech enhancement; Taylor's approximation theory; beam-space; deep neural networks
DOI
10.21437/Interspeech.2023-514
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Despite the promising performance of existing frame-wise all-neural beamformers in speech enhancement, their underlying mechanism remains unclear. In this paper, we revisit beamforming behavior from the beam-space dictionary perspective and formulate it as the learning and mixing of different beam-space components. Based on this, we propose an all-neural beamformer called TaylorBM that simulates Taylor's series expansion: the 0th-order term serves as a spatial filter that conducts the beam mixing, and several high-order terms perform residual noise cancellation as post-processing. The whole system is designed to work in an end-to-end manner. Experiments conducted on the spatialized LibriSpeech corpus show that the proposed approach outperforms existing advanced baselines on the evaluation metrics.
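The structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a hypothetical beam dictionary `W`, hypothetical mixing weights `G` (which a network would predict in an all-neural system; here they are random), and the conventional Taylor form in which the q-th order term is scaled by 1/q!. The function names `beam_mix` and `taylor_combine` are illustrative only.

```python
import math
import numpy as np


def beam_mix(Y, W, G):
    """0th-order term: project onto a beam dictionary, then mix the beams.

    Y: (M, F, T) complex multi-channel spectrogram (M microphones).
    W: (B, M, F) complex beam dictionary (assumed; B beams).
    G: (B, F, T) complex mixing weights (assumed; network output in practice).
    Returns an (F, T) complex single-channel estimate.
    """
    # beams[b, f, t] = sum_m conj(W[b, m, f]) * Y[m, f, t]
    beams = np.einsum("bmf,mft->bft", W.conj(), Y)
    return (G * beams).sum(axis=0)


def taylor_combine(s0, high_order_terms):
    """Add high-order residual terms to the 0th-order term.

    Assumes the conventional Taylor scaling 1/q! for the q-th term.
    """
    out = s0.copy()
    for q, term in enumerate(high_order_terms, start=1):
        out = out + term / math.factorial(q)
    return out


rng = np.random.default_rng(0)
M, B, F, T, Q = 4, 6, 5, 7, 3  # illustrative sizes, not from the paper


def crandn(*shape):
    """Random complex array standing in for real data or network outputs."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


Y = crandn(M, F, T)
W = crandn(B, M, F)
G = crandn(B, F, T)

s0 = beam_mix(Y, W, G)                            # spatial filtering via beam mixing
terms = [0.1 * crandn(F, T) for _ in range(Q)]    # placeholder residual terms
s_hat = taylor_combine(s0, terms)                 # final enhanced estimate
```

In this sketch the spatial filtering and the post-processing are separate functions, which mirrors the split between the 0th-order term and the high-order terms; in an end-to-end system both would be trained jointly.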
Pages: 1055-1059
Page count: 5