A Feature Integration Network for Multi-Channel Speech Enhancement

Times Cited: 0
Authors
Zeng, Xiao [1 ]
Zhang, Xue [1 ]
Wang, Mingjiang [1 ]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention;
DOI
10.3390/s24227344
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Codes
070302 ; 081704 ;
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global-local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models.
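The abstract states that the model is trained to predict a complex ratio mask (CRM), which is applied to the noisy spectrogram by complex multiplication to recover the enhanced signal. The sketch below illustrates only that masking step, not the authors' network; the array shapes and mask values are illustrative placeholders.

```python
import numpy as np

def apply_crm(noisy_stft: np.ndarray, crm: np.ndarray) -> np.ndarray:
    """Apply a complex ratio mask to a noisy STFT.

    noisy_stft: complex array, shape (frames, freq_bins)
    crm:        complex array of the same shape; in the paper this would
                be the network's predicted mask (a placeholder here)
    Returns the enhanced spectrogram via element-wise complex multiplication.
    """
    return crm * noisy_stft

# Toy 2x2 "spectrogram" and mask, purely for illustration.
noisy = np.array([[1 + 1j, 2 + 0j],
                  [0 + 3j, 1 - 1j]])
mask = np.array([[0.5 + 0j, 1 + 0j],
                 [0 + 0j, 0.5 + 0.5j]])
enhanced = apply_crm(noisy, mask)
```

Because the mask is complex-valued, it can correct both the magnitude and the phase of each time-frequency bin, which is why CRM targets generally yield better signal quality than magnitude-only masks.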
Pages: 13