A Feature Integration Network for Multi-Channel Speech Enhancement

Cited: 0
Authors
Zeng, Xiao [1 ]
Zhang, Xue [1 ]
Wang, Mingjiang [1 ]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention;
DOI
10.3390/s24227344
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global-local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models.
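The abstract states that the model predicts a complex ratio mask (CRM) and applies it to obtain the enhanced signal. As a minimal sketch of that final step only (the network itself is not specified here), the snippet below applies a complex-valued mask to a noisy STFT bin; complex multiplication adjusts both magnitude and phase, which is the usual motivation for a CRM over a real-valued mask. The function name `apply_crm` and the toy values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def apply_crm(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a complex ratio mask (CRM) to a noisy STFT.

    Element-wise complex multiplication scales the magnitude and
    rotates the phase of each time-frequency bin, so a CRM can
    correct both, unlike a magnitude-only mask.
    """
    return mask * noisy_stft

# Toy example: a single time-frequency bin.
noisy = np.array([1.0 + 1.0j])   # observed noisy bin
clean = np.array([0.5 + 0.2j])   # target clean bin
ideal_mask = clean / noisy       # the ideal CRM for this bin
enhanced = apply_crm(noisy, ideal_mask)
assert np.allclose(enhanced, clean)
```

In practice the mask is the network's output rather than the ideal ratio, and the enhanced STFT is inverted back to a waveform with an inverse STFT.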
Pages: 13