A Feature Integration Network for Multi-Channel Speech Enhancement

Citations: 0
Authors
Zeng, Xiao [1 ]
Zhang, Xue [1 ]
Wang, Mingjiang [1 ]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention;
DOI
10.3390/s24227344
Chinese Library Classification
O65 [Analytical Chemistry];
Subject Classification Codes
070302 ; 081704 ;
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global-local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models.
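The abstract states that the model is trained to predict a complex ratio mask (CRM). As a minimal sketch of that output stage only (not the paper's implementation; the function name `apply_crm` and its signature are assumptions for illustration), applying a CRM reduces to element-wise complex multiplication with the noisy STFT, which scales and phase-rotates each time-frequency bin:

```python
import numpy as np

def apply_crm(noisy_stft, mask_real, mask_imag):
    """Apply a complex ratio mask (CRM) to a noisy complex STFT.

    noisy_stft: complex array of shape (freq, time)
    mask_real, mask_imag: real arrays of the same shape, the
        predicted real and imaginary CRM components.
    Returns the enhanced complex STFT, obtained by element-wise
    complex multiplication of mask and noisy spectrogram.
    """
    crm = mask_real + 1j * mask_imag
    return crm * noisy_stft

# Toy example: a single time-frequency bin.
noisy = np.array([[1.0 + 1.0j]])
enhanced = apply_crm(noisy, np.array([[0.5]]), np.array([[-0.5]]))
# (0.5 - 0.5j) * (1 + 1j) = 1 + 0j
```

Because the mask is complex rather than real-valued, it can correct phase as well as magnitude, which is why CRM targets typically yield higher-quality enhanced signals than magnitude-only masks.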
Pages: 13