A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72

Authors:

Wu, Bo [1]

Li, Kehuang [2]

Yang, Minglei [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2017 / Vol. 25 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;

DOI:

10.1109/TASLP.2016.2623559

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min-max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
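
The feature-side steps named in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `RTA_TABLE` thresholds, and the frame shift and context values are all hypothetical placeholders, since the abstract gives no numbers. It shows global mean-variance normalization of target features (which leaves targets unbounded, so a linear rather than sigmoid output layer is the natural match), context-window splicing at the DNN input, and a lookup that picks the frame shift and context size from an estimated RT60.

```python
import math

def global_mvn_stats(frames):
    """Per-bin mean and standard deviation computed over all frames."""
    n, dims = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Fall back to 1.0 when a bin is constant, to avoid dividing by zero.
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
           for d in range(dims)]
    return mean, std

def normalize(frames, mean, std):
    """Map features to zero mean and unit variance per bin."""
    return [[(x - m) / s for x, m, s in zip(f, mean, std)] for f in frames]

def denormalize(frames, mean, std):
    """Invert normalize(); applied to the linear-output predictions."""
    return [[x * s + m for x, m, s in zip(f, mean, std)] for f in frames]

def splice_context(frames, context):
    """Concatenate each frame with `context` neighbors on each side.

    Frames beyond the utterance edges are replaced by the nearest edge frame.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        row = []
        for k in range(t - context, t + context + 1):
            row.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(row)
    return spliced

# Hypothetical placeholders: (RT60 upper bound in s, frame shift in samples,
# context frames per side). These values are NOT taken from the paper.
RTA_TABLE = [(0.4, 64, 3), (0.8, 128, 5), (float("inf"), 256, 7)]

def select_params(rt60_estimate):
    """Return (frame_shift, context) for an estimated reverberation time."""
    for upper, shift, context in RTA_TABLE:
        if rt60_estimate <= upper:
            return shift, context
```

In such a setup, `select_params` would run once per utterance on the estimated RT60, and its output would drive both the framing step and `splice_context` before the normalized features reach the network.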

Pages: 102-111

Page count: 10

50 items in total

[1] A Late Reverberation Power Spectral Density Aware Approach to Speech Dereverberation Based on Deep Neural Networks
Qi, Yuanlei
Yang, Feiran
Yang, Jun
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 1700 - 1703
[2] A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation
Bo Wu
Minglei Yang
Kehuang Li
Zhen Huang
Sabato Marco Siniscalchi
Tong Wang
Chin-Hui Lee
EURASIP Journal on Advances in Signal Processing, 2017
[3] A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation
Wu, Bo
Yang, Minglei
Li, Kehuang
Huang, Zhen
Siniscalchi, Sabato Marco
Wang, Tong
Lee, Chin-Hui
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2017
[4] A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation
Sidheswar Routray
Qirong Mao
Neural Computing and Applications, 2022, 34 : 9831 - 9845
[5] A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation
Routray, Sidheswar
Mao, Qirong
NEURAL COMPUTING & APPLICATIONS, 2022, 34 (12): 9831 - 9845
[6] Speech dereverberation method with convolutional neural network and reverberation time attention
Sun, Xingwei
Li, Junfeng
Yan, Yonghong
Shengxue Xuebao/Acta Acustica, 2021, 46 (06): 1234 - 1241
[7] Speech Dereverberation With Context-Aware Recurrent Neural Networks
Santos, Joao Felipe
Falk, Tiago H.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (07): 1232 - 1242
[8] A Maximum Likelihood Approach to Deep Neural Network Based Speech Dereverberation
Wang, Xin
Du, Jun
Wang, Yannan
2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017: 155 - 158
[9] Speech dereverberation based on blind estimation of a reverberation filter
Zee, Min-Seon
Park, Hyung-Min
IEICE ELECTRONICS EXPRESS, 2009, 6 (20): 1456 - 1461
[10] Phase-Aware Speech Enhancement Based on Deep Neural Networks
Zheng, Naijun
Zhang, Xiao-Lei
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01): 63 - 76
