A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72
Authors
Wu, Bo [1]
Li, Kehuang [2]
Yang, Minglei [1]
Lee, Chin-Hui [2]
Affiliations
[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China
Keywords
Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;
DOI
10.1109/TASLP.2016.2623559
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to the sigmoid activation and min-max normalization used in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, the frame shift size in speech framing and the acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms conventional DNNs that do not take the reverberation time into account, while performing only slightly worse than the oracle cases with known reverberation times, even under extremely weak and extremely severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
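The feature-side steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "global" mean-variance normalization means per-dimension statistics pooled over all training frames, and that the context window is counted in frames on each side of the current frame. The RT60 bin edges, the settings table, and all function names are hypothetical placeholders, not values reported in the paper.

```python
from bisect import bisect_right
from statistics import fmean, pstdev

# Hypothetical RT60 bin edges (seconds) and per-bin settings.
# These numbers are arbitrary placeholders, not results from the paper.
RT60_EDGES_S = [0.3, 0.6, 1.0]
RTA_SETTINGS = [
    {"frame_shift_ms": 8, "context": 1},
    {"frame_shift_ms": 6, "context": 3},
    {"frame_shift_ms": 4, "context": 5},
    {"frame_shift_ms": 2, "context": 7},
]


def select_settings(rt60_s):
    """Pick frame shift and context size from an estimated RT60 (seconds)."""
    return RTA_SETTINGS[bisect_right(RT60_EDGES_S, rt60_s)]


def global_mvn_stats(frames):
    """Per-dimension mean and std over all frames (assumed meaning of 'global')."""
    columns = list(zip(*frames))
    means = [fmean(c) for c in columns]
    # Guard against zero variance so normalization never divides by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def normalize(frames, means, stds):
    """Map target features to zero mean and unit variance per dimension."""
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def denormalize(frames, means, stds):
    """Invert normalize(); a linear output layer can produce any real value here."""
    return [[x * s + m for x, m, s in zip(f, means, stds)] for f in frames]


def splice_context(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Edges are padded by repeating the first or last frame.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), last)])
        spliced.append(window)
    return spliced
```

In such a pipeline, an RT60 estimate would be passed to `select_settings`, the log-power spectrum features would be extracted with the chosen frame shift and spliced with the chosen context, and the DNN output would be mapped back with `denormalize`. Unlike min-max scaling paired with a sigmoid output, this mapping does not confine the output to a fixed interval.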
Pages: 102-111
Page count: 10