A Feature Integration Network for Multi-Channel Speech Enhancement

Citations: 0
Authors
Zeng, Xiao [1 ]
Zhang, Xue [1 ]
Wang, Mingjiang [1 ]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention;
DOI
10.3390/s24227344
Chinese Library Classification
O65 [Analytical Chemistry];
Subject Classification Codes
070302 ; 081704 ;
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global-local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models.
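The abstract states that the model is trained to predict a complex ratio mask (CRM). As a minimal sketch of that output stage only (not the paper's implementation; the function name `apply_crm` and its signature are assumptions for illustration), applying a CRM reduces to element-wise complex multiplication with the noisy STFT, which scales and phase-rotates each time-frequency bin:

```python
import numpy as np

def apply_crm(noisy_stft, mask_real, mask_imag):
    """Apply a complex ratio mask (CRM) to a noisy complex STFT.

    noisy_stft: complex array of shape (freq, time)
    mask_real, mask_imag: real arrays of the same shape, the
        predicted real and imaginary CRM components.
    Returns the enhanced complex STFT, obtained by element-wise
    complex multiplication of mask and noisy spectrogram.
    """
    crm = mask_real + 1j * mask_imag
    return crm * noisy_stft

# Toy example: a single time-frequency bin.
noisy = np.array([[1.0 + 1.0j]])
enhanced = apply_crm(noisy, np.array([[0.5]]), np.array([[-0.5]]))
# (0.5 - 0.5j) * (1 + 1j) = 1 + 0j
```

Because the mask is complex rather than real-valued, it can correct phase as well as magnitude, which is why CRM targets typically yield higher-quality enhanced signals than magnitude-only masks.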
Pages: 13