TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective

Times cited: 0

Authors:

Li, Andong ^{[1,2]}

Meng, Weixin ^{[1,2]}

Yu, Guochen ^{[1]}

Liu, Wenzhe ^{[3]}

Li, Xiaodong ^{[1,2]}

Zheng, Chengshi ^{[1,2]}

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Tencent Corp, Tencent Ethereal Audio Lab, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Keywords:

multi-channel speech enhancement; Taylor's approximation theory; beam-space; deep neural networks

DOI:

10.21437/Interspeech.2023-514

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Despite the promising performance of existing frame-wise all-neural beamformers in the speech enhancement field, their underlying mechanism remains unclear. In this paper, we revisit the beamforming behavior from the beam-space dictionary perspective and formulate it as the learning and mixing of different beam-space components. Based on that, we propose an all-neural beamformer called TaylorBM to simulate Taylor's series expansion operation, in which the 0th-order term serves as a spatial filter to conduct the beam mixing, and several high-order terms are tasked with residual noise cancellation for post-processing. The whole system is devised to work in an end-to-end manner. Experiments are conducted on the spatialized LibriSpeech corpus, and the results show that the proposed approach outperforms existing advanced baselines on the reported evaluation metrics.
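The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `mix_beams`, the array shapes, and the plain additive combination of the high-order terms are assumptions made for illustration. In the paper, the mixing weights and the high-order terms are produced by neural networks; here they are passed in as placeholder arrays.

```python
import numpy as np


def mix_beams(x, beam_dict, mix_weights, high_order_terms=()):
    """Toy beam-space mixing with additive high-order corrections.

    All names and shapes are hypothetical (not taken from the paper):
      x                : (M, F, T) complex multi-channel STFT
      beam_dict        : (B, M, F) complex beamformer weights (the dictionary)
      mix_weights      : (B, F, T) complex mixing weights per time-frequency bin
      high_order_terms : iterable of (F, T) arrays, stand-ins for the
                         residual-noise-cancellation terms
    """
    # Project the input onto every beam in the dictionary:
    # beams[b, f, t] = sum_m conj(beam_dict[b, m, f]) * x[m, f, t]
    beams = np.einsum("bmf,mft->bft", beam_dict.conj(), x)

    # 0th-order term: weighted mix of the beam-space components.
    estimate = np.sum(mix_weights * beams, axis=0)

    # High-order terms: added on top as post-processing corrections
    # (assumed additive here, following the Taylor-series analogy).
    for term in high_order_terms:
        estimate = estimate + term
    return estimate


# Usage: a one-beam dictionary that simply selects microphone 0.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
beam_dict = np.zeros((1, 2, 3), dtype=complex)
beam_dict[0, 0, :] = 1.0
mix_weights = np.ones((1, 3, 4), dtype=complex)
enhanced = mix_beams(x, beam_dict, mix_weights)
```

With this degenerate dictionary and unit mixing weights, the 0th-order output equals the first microphone signal, which makes the role of the mixing step easy to check before swapping in learned weights.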


Pages: 1055-1059

Page count: 5
