TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

Cited by: 7
Authors
Li, Andong [1 ,2 ]
Yu, Guochen [1 ]
Zheng, Chengshi [1 ,2 ]
Li, Xiaodong [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
multi-channel speech enhancement; Taylor's approximation theory; all-neural beamformer
DOI
10.21437/Interspeech.2022-159
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
While existing end-to-end beamformers achieve impressive performance in various front-end speech processing tasks, they usually encapsulate the whole process into a black box and thus lack adequate interpretability. As an attempt to fill this gap, we propose a novel neural beamformer for multi-channel speech enhancement, called TaylorBeamformer, inspired by Taylor's approximation theory. The core idea is that the recovery process can be formulated as spatial filtering in the neighborhood of the input mixture. Based on that, we decompose it into the superposition of a 0th-order non-derivative term and high-order derivative terms, where the former serves as the spatial filter and the latter act as residual noise cancellers that further improve speech quality. To enable end-to-end training, we replace the derivative operations with trainable networks so that they can be learned from the training data. Extensive experiments are conducted on a synthesized dataset based on LibriSpeech, and the results show that the proposed approach performs favorably against previous advanced baselines.
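The abstract describes a Taylor-style decomposition: a 0th-order term that acts as a spatial filter, plus higher-order "derivative" terms, replaced by trainable networks, that superimpose residual corrections on the estimate. The following is a minimal PyTorch sketch of that idea only; the module names, layer sizes, chosen order, and input convention (real/imaginary STFT parts stacked along the channel axis) are illustrative assumptions and not the authors' actual architecture.

import torch
import torch.nn as nn


class ZerothOrderFilter(nn.Module):
    """Estimates per-(T, F) spatial filter weights from the multi-channel mixture
    and applies a complex filter-and-sum over microphones (0th-order term)."""
    def __init__(self, num_mics: int, hidden: int = 64):
        super().__init__()
        # Real/imaginary parts of the mixture are stacked along the channel axis.
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_mics, hidden, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(hidden, 2 * num_mics, kernel_size=3, padding=1),
        )

    def forward(self, mix_ri):            # mix_ri: (B, 2*M, T, F)
        w = self.net(mix_ri)              # filter weights, same shape as input
        m = mix_ri.shape[1] // 2
        xr, xi = mix_ri[:, :m], mix_ri[:, m:]
        wr, wi = w[:, :m], w[:, m:]
        # Complex filter-and-sum: y = sum_m w_m * x_m
        yr = (wr * xr - wi * xi).sum(dim=1, keepdim=True)
        yi = (wr * xi + wi * xr).sum(dim=1, keepdim=True)
        return torch.cat([yr, yi], dim=1)  # (B, 2, T, F)


class DerivativeBlock(nn.Module):
    """Trainable stand-in for a higher-order derivative term: predicts a residual
    correction from the mixture and the current estimate."""
    def __init__(self, num_mics: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_mics + 2, hidden, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(hidden, 2, kernel_size=3, padding=1),
        )

    def forward(self, mix_ri, est_ri):
        return self.net(torch.cat([mix_ri, est_ri], dim=1))


class TaylorStyleBeamformer(nn.Module):
    """0th-order spatial filtering plus superimposed learned residual terms."""
    def __init__(self, num_mics: int = 4, order: int = 3):
        super().__init__()
        self.order0 = ZerothOrderFilter(num_mics)
        self.higher = nn.ModuleList([DerivativeBlock(num_mics) for _ in range(order)])

    def forward(self, mix_ri):                 # (B, 2*M, T, F)
        est = self.order0(mix_ri)              # 0th-order spatial filter output
        for block in self.higher:              # add higher-order residual corrections
            est = est + block(mix_ri, est)
        return est                             # (B, 2, T, F) real/imag spectrum


if __name__ == "__main__":
    model = TaylorStyleBeamformer(num_mics=4, order=3)
    mix = torch.randn(1, 8, 100, 161)          # toy 4-mic STFT, real/imag stacked
    print(model(mix).shape)                    # torch.Size([1, 2, 100, 161])

The key structural point this sketch tries to capture is the recursive superposition est = est + block(mix, est): the estimate after the 0th-order filter is refined additively, mirroring the paper's view of higher-order terms as residual noise cancellers.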
Pages: 5413 - 5417
Page count: 5