TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

Cited by: 7
Authors
Li, Andong [1 ,2 ]
Yu, Guochen [1 ]
Zheng, Chengshi [1 ,2 ]
Li, Xiaodong [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
multi-channel speech enhancement; Taylor's approximation theory; all-neural beamformer
DOI
10.21437/Interspeech.2022-159
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
While existing end-to-end beamformers achieve impressive performance in various front-end speech processing tasks, they usually encapsulate the whole process into a black box and thus lack adequate interpretability. As an attempt to fill this gap, we propose a novel neural beamformer for multi-channel speech enhancement, called TaylorBeamformer, inspired by Taylor's approximation theory. The core idea is that the recovery process can be formulated as spatial filtering in the neighborhood of the input mixture. Based on that, we decompose it into the superposition of a 0th-order non-derivative term and high-order derivative terms, where the former serves as the spatial filter and the latter act as residual noise cancellers that further improve speech quality. To enable end-to-end training, we replace the derivative operations with trainable networks so that they can be learned from the training data. Extensive experiments are conducted on a synthesized dataset based on LibriSpeech, and the results show that the proposed approach performs favorably against previous advanced baselines.
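The abstract describes a Taylor-style decomposition: a 0th-order term that acts as a spatial filter, plus higher-order "derivative" terms, replaced by trainable networks, that superimpose residual corrections on the estimate. The following is a minimal PyTorch sketch of that idea only; the module names, layer sizes, chosen order, and input convention (real/imaginary STFT parts stacked along the channel axis) are illustrative assumptions and not the authors' actual architecture.

import torch
import torch.nn as nn


class ZerothOrderFilter(nn.Module):
    """Estimates per-(T, F) spatial filter weights from the multi-channel mixture
    and applies a complex filter-and-sum over microphones (0th-order term)."""
    def __init__(self, num_mics: int, hidden: int = 64):
        super().__init__()
        # Real/imaginary parts of the mixture are stacked along the channel axis.
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_mics, hidden, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(hidden, 2 * num_mics, kernel_size=3, padding=1),
        )

    def forward(self, mix_ri):            # mix_ri: (B, 2*M, T, F)
        w = self.net(mix_ri)              # filter weights, same shape as input
        m = mix_ri.shape[1] // 2
        xr, xi = mix_ri[:, :m], mix_ri[:, m:]
        wr, wi = w[:, :m], w[:, m:]
        # Complex filter-and-sum: y = sum_m w_m * x_m
        yr = (wr * xr - wi * xi).sum(dim=1, keepdim=True)
        yi = (wr * xi + wi * xr).sum(dim=1, keepdim=True)
        return torch.cat([yr, yi], dim=1)  # (B, 2, T, F)


class DerivativeBlock(nn.Module):
    """Trainable stand-in for a higher-order derivative term: predicts a residual
    correction from the mixture and the current estimate."""
    def __init__(self, num_mics: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_mics + 2, hidden, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(hidden, 2, kernel_size=3, padding=1),
        )

    def forward(self, mix_ri, est_ri):
        return self.net(torch.cat([mix_ri, est_ri], dim=1))


class TaylorStyleBeamformer(nn.Module):
    """0th-order spatial filtering plus superimposed learned residual terms."""
    def __init__(self, num_mics: int = 4, order: int = 3):
        super().__init__()
        self.order0 = ZerothOrderFilter(num_mics)
        self.higher = nn.ModuleList([DerivativeBlock(num_mics) for _ in range(order)])

    def forward(self, mix_ri):                 # (B, 2*M, T, F)
        est = self.order0(mix_ri)              # 0th-order spatial filter output
        for block in self.higher:              # add higher-order residual corrections
            est = est + block(mix_ri, est)
        return est                             # (B, 2, T, F) real/imag spectrum


if __name__ == "__main__":
    model = TaylorStyleBeamformer(num_mics=4, order=3)
    mix = torch.randn(1, 8, 100, 161)          # toy 4-mic STFT, real/imag stacked
    print(model(mix).shape)                    # torch.Size([1, 2, 100, 161])

The key structural point this sketch tries to capture is the recursive superposition est = est + block(mix, est): the estimate after the 0th-order filter is refined additively, mirroring the paper's view of higher-order terms as residual noise cancellers.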
Pages: 5413 - 5417
Page count: 5