TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS

Cited by: 0
Authors
Chakrabarty, Soumitro [1]
Wang, DeLiang [2, 3]
Habets, Emanuel A. P. [1]
Affiliations
[1] Int Audio Labs Erlangen, Erlangen, Germany
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
convolutional neural networks; speech enhancement; microphone array; masking
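DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract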
Speech enhancement in noisy and reverberant conditions remains a challenging task. In this work, a time-frequency masking based method for speech enhancement with multi-channel data using convolutional neural networks (CNN) is proposed, where the CNN is trained to estimate the ideal ratio mask by discriminating a directional speech source from diffuse or spatially uncorrelated noise. The proposed method operates frame by frame on the magnitude and phase components of the short-time Fourier transform coefficients of all frequency sub-bands and microphones. The avoidance of temporal context and explicit feature extraction makes the proposed method suitable for online implementation. In contrast to most speech enhancement methods that utilize multi-channel data, the proposed method does not require information about the spatial position of the desired speech source. Through experimental evaluation with both simulated and real data, we show the robustness of the proposed method to unseen acoustic conditions as well as varying noise levels.
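As a rough illustration of the quantities named in the abstract, the sketch below builds a per-frame input from the magnitude and phase of multi-channel STFT coefficients and computes an ideal ratio mask as a training target. It is not the authors' implementation: the square-root power-ratio form of the mask, the array shapes, the choice of microphone 0 as reference, and all function names are assumptions made for this example, and the CNN itself is omitted.

```python
import numpy as np

def frame_features(stft_frame):
    """Stack magnitude and phase of one STFT frame (hypothetical helper).

    stft_frame: complex array of shape (num_mics, num_bins).
    Returns a real array of shape (2, num_mics, num_bins).
    """
    return np.stack([np.abs(stft_frame), np.angle(stft_frame)])

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ideal ratio mask per frequency bin.

    Assumes the commonly used form sqrt(|S|^2 / (|S|^2 + |N|^2));
    the abstract does not state which variant the paper uses.
    """
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def apply_mask(mask, reference_frame):
    """Scale the noisy reference-microphone frame by the mask."""
    return mask * reference_frame

# Synthetic single frame: 4 microphones, 257 frequency bins (assumed sizes).
rng = np.random.default_rng(0)
num_mics, num_bins = 4, 257
shape = (num_mics, num_bins)
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = speech + noise

features = frame_features(noisy)                      # network input for this frame
mask = ideal_ratio_mask(np.abs(speech[0]), np.abs(noise[0]))  # training target
enhanced = apply_mask(mask, noisy[0])                 # masked reference channel
```

Because each frame is processed on its own, nothing in this sketch needs past or future frames, which matches the abstract's point about avoiding temporal context for online use.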
Pages: 476-480
Page count: 5
Related papers
50 in total
  • [1] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
    Chakrabarty, Soumitro
    Habets, Emanuel A. P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 787 - 799
  • [2] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
    Brutti, Alessio
    Tsiami, Antigoni
    Katsamanis, Athanasios
    Maragos, Petros
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2875 - 2879
  • [3] A time-frequency fusion model for multi-channel speech enhancement
    Zeng, Xiao
    Xu, Shiyun
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [4] Multi-channel speech enhancement using early and late fusion convolutional neural networks
    Priyanka, S. Siva
    Kumar, T. Kishore
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 973 - 979
  • [5] Multi-channel speech enhancement using early and late fusion convolutional neural networks
    S. Siva Priyanka
    T. Kishore Kumar
    Signal, Image and Video Processing, 2023, 17 : 973 - 979
  • [6] TIME-FREQUENCY MASKING STRATEGIES FOR SINGLE-CHANNEL LOW-LATENCY SPEECH ENHANCEMENT USING NEURAL NETWORKS
    Parviainen, Mikko
    Pertila, Pasi
    Virtanen, Tuomas
    Grosche, Peter
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 51 - 55
  • [7] MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS
    Tzirakis, Panagiotis
    Kumar, Anurag
    Donley, Jacob
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3415 - 3419
  • [8] SIMULTANEOUS OPTIMIZATION OF FORGETTING FACTOR AND TIME-FREQUENCY MASK FOR BLOCK ONLINE MULTI-CHANNEL SPEECH ENHANCEMENT
    Togami, Masahito
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2702 - 2706
  • [9] Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain
    Oostermeijer, Koen
    Wang, Qing
    Du, Jun
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 465 - 470
  • [10] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
    Xue, Cheng
    Huang, Weilong
    Chen, Weiguang
    Feng, Jinwei
    INTERSPEECH 2021, 2021, : 1862 - 1866