A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Citations: 0
Authors
Wang, Heming [1 ]
Pandey, Ashutosh [1 ]
Wang, Deliang [2 ]
Affiliations
[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA
[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA
Keywords
Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;
DOI
10.1016/j.csl.2024.101677
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs, ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that larger window sizes are helpful for dereverberation, and that adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with the proposed techniques achieve superior performance compared with other strong enhancement baselines.
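To make the abstract's description of time-domain processing concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' ARN or DC-CRN implementation) of a waveform enhancer in which overlapping frames are encoded and decoded by learned linear transforms (realized as strided Conv1d/ConvTranspose1d), with the window size exposed as an explicit hyperparameter. All class names, layer sizes, and the simplified attention-plus-LSTM block are illustrative assumptions.

import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    # Learned linear transform over overlapping waveform frames.
    # A Conv1d with stride = hop applies the same linear map to every frame.
    def __init__(self, frame_len=512, hop=256, feat_dim=512):
        super().__init__()
        self.enc = nn.Conv1d(1, feat_dim, kernel_size=frame_len, stride=hop, bias=False)

    def forward(self, wav):                     # wav: (batch, samples)
        return self.enc(wav.unsqueeze(1))       # (batch, feat_dim, frames)


class FrameDecoder(nn.Module):
    # Learned linear transform back to waveform frames; the transposed
    # convolution performs the overlap-add resynthesis.
    def __init__(self, frame_len=512, hop=256, feat_dim=512):
        super().__init__()
        self.dec = nn.ConvTranspose1d(feat_dim, 1, kernel_size=frame_len, stride=hop, bias=False)

    def forward(self, feats):                   # feats: (batch, feat_dim, frames)
        return self.dec(feats).squeeze(1)       # (batch, samples)


class AttentionRNNBlock(nn.Module):
    # Rough stand-in for an attentive recurrent block: self-attention
    # followed by an LSTM. The actual ARN architecture differs in detail.
    def __init__(self, feat_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.rnn = nn.LSTM(feat_dim, feat_dim, batch_first=True)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        a, _ = self.attn(x, x, x)
        y, _ = self.rnn(self.norm(x + a))
        return y


class TimeDomainEnhancer(nn.Module):
    # Encoder -> enhancement blocks -> decoder, all operating on raw waveforms.
    def __init__(self, frame_len=512, hop=256, feat_dim=512, num_blocks=2):
        super().__init__()
        self.encoder = FrameEncoder(frame_len, hop, feat_dim)
        self.blocks = nn.ModuleList(
            [AttentionRNNBlock(feat_dim) for _ in range(num_blocks)]
        )
        self.decoder = FrameDecoder(frame_len, hop, feat_dim)

    def forward(self, wav):
        feats = self.encoder(wav).transpose(1, 2)    # (batch, frames, feat_dim)
        for blk in self.blocks:
            feats = blk(feats)
        return self.decoder(feats.transpose(1, 2))   # (batch, ~samples)


if __name__ == "__main__":
    model = TimeDomainEnhancer(frame_len=512, hop=256)   # window size is a key knob
    noisy = torch.randn(2, 16000)                        # two 1-second clips at 16 kHz
    print(model(noisy).shape)                            # output length depends on framing

In this sketch, increasing frame_len corresponds to the larger analysis windows that the study finds helpful for dereverberation, while the learned encoder/decoder pair plays the role of the transform operations applied to waveform features before and after enhancement.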
Pages: 12
Related papers
50 records in total
  • [31] Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments
    Lemercier, Jean-Marie
    Thiemann, Joachim
    Koning, Raphael
    Gerkmann, Timo
    INTERSPEECH 2022, 2022, : 226 - 230
  • [32] Strategies for distant speech recognition in reverberant environments
    Delcroix, Marc
    Yoshioka, Takuya
    Ogawa, Atsunori
    Kubo, Yotaro
    Fujimoto, Masakiyo
    Ito, Nobutaka
    Kinoshita, Keisuke
    Espi, Miquel
    Araki, Shoko
    Hori, Takaaki
    Nakatani, Tomohiro
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [33] Survey on Approaches to Speech Recognition in Reverberant Environments
    Yoshioka, Takuya
    Sehr, Armin
    Delcroix, Marc
    Kinoshita, Keisuke
    Maas, Roland
    Nakatani, Tomohiro
    Kellermann, Walter
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [35] Reverberant Speech Enhancement by Temporal and Spectral Processing
    Krishnamoorthy, P.
    Prasanna, S. R. Mahadeva
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02): 253 - 266
  • [36] Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments
    Hazrati, Oldooz
    Lee, Jaewook
    Loizou, Philipos
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 162 - 165
  • [37] REAL-TIME SPEECH ENHANCEMENT IN NOISY REVERBERANT MULTI-TALKER ENVIRONMENTS BASED ON A LOCATION-INDEPENDENT ROOM ACOUSTICS MODEL
    Nakatani, Tomohiro
    Yoshioka, Takuya
    Kinoshita, Keisuke
    Miyoshi, Masato
    Juang, Biing-Hwang
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 137 - 140
  • [38] Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems
    Ayllon, David
    Sanchez-Hevia, Hector A.
    Figueroa, Carol
    Lanchantin, Pierre
    INTERSPEECH 2019, 2019, : 1511 - 1515
  • [39] A stereophonic acoustic signal extraction scheme for noisy and reverberant environments
    Reindl, Klaus
    Zheng, Yuanhang
    Schwarz, Andreas
    Meier, Stefan
    Maas, Roland
    Sehr, Armin
    Kellermann, Walter
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) : 726 - 745
  • [40] WHAMR!: NOISY AND REVERBERANT SINGLE-CHANNEL SPEECH SEPARATION
    Maciejewski, Matthew
    Wichern, Gordon
    McQuinn, Emmett
    Le Roux, Jonathan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 696 - 700