A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Cited by: 0

Authors:

Wang, Heming [1]

Pandey, Ashutosh [1]

Wang, Deliang [2]

Affiliations:

[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA

[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA

Source:

COMPUTER SPEECH AND LANGUAGE | 2025 / Vol. 89

Keywords:

Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;

DOI:

10.1016/j.csl.2024.101677

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most of the existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs: ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations, and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with proposed techniques achieve superior performance compared with other strong enhancement baselines.


Pages: 12

