Dual channel neural network speech enhancement algorithm based on time frequency masking

Cited by: 0
Authors
Jia, Hairong [1]
Mei, Shulin [1]
Zhang, Min [1]
Affiliation
[1] College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
Source
Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition) | 2021, Vol. 49, No. 06
Keywords
Speech intelligibility; Covariance matrix; Signal to noise ratio; Deep neural networks; Reverberation
DOI
Not available
Abstract
To improve the ability of speech enhancement algorithms to eliminate directional noise and suppress reverberation, a dual-channel neural network time-frequency masking speech enhancement algorithm is proposed that combines the advantages of single-channel and multi-channel signal processing. First, improved multi-resolution cochlear dynamic and static features (DSMRACC) are combined with an adaptive mask (AM) optimized according to the signal-to-noise ratio (SNR), and each of the two microphone signals is enhanced separately by a single-channel deep neural network (DNN), so that the nonlinear features of speech are fully exploited to improve perceptual quality. Second, an AM-based steering vector localization method is proposed to calculate the spatial covariance matrix and steering vectors accurately and thus locate the target speech in noisy and reverberant environments. Finally, the signal is fed to a convolutional beamformer for further denoising and reverberation suppression. Experimental results show that, compared with other speech enhancement algorithms, the proposed algorithm produces enhanced speech with better quality and intelligibility. © 2021 Editorial Board of Journal of Huazhong University of Science and Technology. All rights reserved.
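The second step of the abstract (mask-based estimation of the spatial covariance matrix and steering vector) can be illustrated with a minimal sketch for a single frequency bin of a two-microphone recording. This is not the authors' implementation: the function names, the toy data, and the choice of taking the steering vector as the principal eigenvector of the mask-weighted covariance are assumptions made for illustration, and the DNN mask estimator and the convolutional beamformer are not shown.

```python
import math

# Illustrative sketch only. Function names and toy data are hypothetical,
# and the eigenvector-based steering estimate is an assumed stand-in for
# the AM-based localization method described in the abstract.

def masked_covariance(frames, mask):
    """Mask-weighted 2x2 spatial covariance for one frequency bin.

    frames: list of (x1, x2) complex STFT coefficients, one pair per frame.
    mask:   list of weights in [0, 1], one per frame (1 = speech dominated).
    Returns (a, b, c) for the Hermitian matrix [[a, b], [conj(b), c]].
    """
    total = sum(mask)
    if total <= 0:
        raise ValueError("mask selects no frames")
    a = sum(m * abs(x1) ** 2 for (x1, x2), m in zip(frames, mask)) / total
    c = sum(m * abs(x2) ** 2 for (x1, x2), m in zip(frames, mask)) / total
    b = sum(m * x1 * x2.conjugate() for (x1, x2), m in zip(frames, mask)) / total
    return a, b, c


def steering_vector(a, b, c):
    """Principal eigenvector of [[a, b], [conj(b), c]], scaled so that the
    first channel equals 1 (a relative transfer function)."""
    # Largest eigenvalue of a 2x2 Hermitian matrix in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + abs(b) ** 2)
    denom = lam - c
    if denom <= 1e-12:
        raise ValueError("target is not observable on the reference channel")
    # (lam - c, conj(b)) is an eigenvector for lam; divide by its first entry.
    return (1 + 0j, b.conjugate() / denom)


# Toy data: two speech frames generated with relative transfer function 0.5j,
# followed by one noise-only frame that the mask rejects.
frames = [(1 + 0j, 0.5j), (2j, -1 + 0j), (0.3 + 0.1j, -0.7 + 0j)]
mask = [1.0, 1.0, 0.0]

a, b, c = masked_covariance(frames, mask)
d = steering_vector(a, b, c)
```

In this toy case the estimated second entry of `d` comes out at approximately 0.5j, the value used to generate the speech frames. With a soft mask from a real network, the estimate would only approximate the true steering vector.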
Pages: 43-49
Related papers
50 in total
  • [1] Time frequency masking based speech enhancement using deep encoder-decoder neural network
    Shi, Wenhua
    Zhang, Xiongwei
    Zou, Xia
    Sun, Meng
    Li, Li
    Shengxue Xuebao/Acta Acustica, 2020, 45(03): 299-307
  • [2] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
    Xue, Cheng
    Huang, Weilong
    Chen, Weiguang
    Feng, Jinwei
    INTERSPEECH 2021, 2021: 1862-1866
  • [3] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
    Chakrabarty, Soumitro
    Habets, Emanuel A. P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13(04): 787-799
  • [4] A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network
    Han W.
    Zhang X.-W.
    Min G.
    Zhang Q.-Y.
    Zhang, Xiong-Wei (xwzhang9898@163.com), 2017, Science Press (43): 248-258
  • [5] TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS
    Chakrabarty, Soumitro
    Wang, DeLiang
    Habets, Emanuel A. P.
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018: 476-480
  • [6] TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK
    Soni, Meet H.
    Shah, Neil
    Patil, Hemant A.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5039-5043
  • [7] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020: 445-450
  • [8] PHASE TIME-FREQUENCY MASKING BASED SPEECH ENHANCEMENT ALGORITHM USING CIRCULAR MICROPHONE ARRAY
    He, Li
    Zhou, Yi
    Liu, Hongqing
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019: 808-813
  • [9] A time-frequency smoothing neural network for speech enhancement
    Yuan, Wenhao
    SPEECH COMMUNICATION, 2020, 124: 75-84
  • [10] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
    Brutti, Alessio
    Tsiami, Antigoni
    Katsamanis, Athanasios
    Maragos, Petros
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2875-2879