A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Citations: 0
Authors
Wang, Heming [1 ]
Pandey, Ashutosh [1 ]
Wang, Deliang [2 ]
Affiliations
[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA
[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA
Keywords
Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;
DOI
10.1016/j.csl.2024.101677
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs, ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that larger window sizes are helpful for dereverberation, and that adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with the proposed techniques achieve superior performance compared with other strong enhancement baselines.
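To make the abstract's description of time-domain processing concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' ARN or DC-CRN implementation) of a waveform enhancer in which overlapping frames are encoded and decoded by learned linear transforms (realized as strided Conv1d/ConvTranspose1d), with the window size exposed as an explicit hyperparameter. All class names, layer sizes, and the simplified attention-plus-LSTM block are illustrative assumptions.

import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    # Learned linear transform over overlapping waveform frames.
    # A Conv1d with stride = hop applies the same linear map to every frame.
    def __init__(self, frame_len=512, hop=256, feat_dim=512):
        super().__init__()
        self.enc = nn.Conv1d(1, feat_dim, kernel_size=frame_len, stride=hop, bias=False)

    def forward(self, wav):                     # wav: (batch, samples)
        return self.enc(wav.unsqueeze(1))       # (batch, feat_dim, frames)


class FrameDecoder(nn.Module):
    # Learned linear transform back to waveform frames; the transposed
    # convolution performs the overlap-add resynthesis.
    def __init__(self, frame_len=512, hop=256, feat_dim=512):
        super().__init__()
        self.dec = nn.ConvTranspose1d(feat_dim, 1, kernel_size=frame_len, stride=hop, bias=False)

    def forward(self, feats):                   # feats: (batch, feat_dim, frames)
        return self.dec(feats).squeeze(1)       # (batch, samples)


class AttentionRNNBlock(nn.Module):
    # Rough stand-in for an attentive recurrent block: self-attention
    # followed by an LSTM. The actual ARN architecture differs in detail.
    def __init__(self, feat_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.rnn = nn.LSTM(feat_dim, feat_dim, batch_first=True)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        a, _ = self.attn(x, x, x)
        y, _ = self.rnn(self.norm(x + a))
        return y


class TimeDomainEnhancer(nn.Module):
    # Encoder -> enhancement blocks -> decoder, all operating on raw waveforms.
    def __init__(self, frame_len=512, hop=256, feat_dim=512, num_blocks=2):
        super().__init__()
        self.encoder = FrameEncoder(frame_len, hop, feat_dim)
        self.blocks = nn.ModuleList(
            [AttentionRNNBlock(feat_dim) for _ in range(num_blocks)]
        )
        self.decoder = FrameDecoder(frame_len, hop, feat_dim)

    def forward(self, wav):
        feats = self.encoder(wav).transpose(1, 2)    # (batch, frames, feat_dim)
        for blk in self.blocks:
            feats = blk(feats)
        return self.decoder(feats.transpose(1, 2))   # (batch, ~samples)


if __name__ == "__main__":
    model = TimeDomainEnhancer(frame_len=512, hop=256)   # window size is a key knob
    noisy = torch.randn(2, 16000)                        # two 1-second clips at 16 kHz
    print(model(noisy).shape)                            # output length depends on framing

In this sketch, increasing frame_len corresponds to the larger analysis windows that the study finds helpful for dereverberation, while the learned encoder/decoder pair plays the role of the transform operations applied to waveform features before and after enhancement.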
Pages: 12
Related papers
50 records in total
  • [31] Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments
    Lemercier, Jean-Marie
    Thiemann, Joachim
    Koning, Raphael
    Gerkmann, Timo
    INTERSPEECH 2022, 2022, : 226 - 230
  • [32] Strategies for distant speech recognition in reverberant environments
    Delcroix, Marc
    Yoshioka, Takuya
    Ogawa, Atsunori
    Kubo, Yotaro
    Fujimoto, Masakiyo
    Ito, Nobutaka
    Kinoshita, Keisuke
    Espi, Miquel
    Araki, Shoko
    Hori, Takaaki
    Nakatani, Tomohiro
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [33] Survey on Approaches to Speech Recognition in Reverberant Environments
    Yoshioka, Takuya
    Sehr, Armin
    Delcroix, Marc
    Kinoshita, Keisuke
    Maas, Roland
    Nakatani, Tomohiro
    Kellermann, Walter
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [35] Reverberant Speech Enhancement by Temporal and Spectral Processing
    Krishnamoorthy, P.
    Prasanna, S. R. Mahadeva
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02): 253 - 266
  • [36] Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments
    Hazrati, Oldooz
    Lee, Jaewook
    Loizou, Philipos
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 162 - 165
  • [37] REAL-TIME SPEECH ENHANCEMENT IN NOISY REVERBERANT MULTI-TALKER ENVIRONMENTS BASED ON A LOCATION-INDEPENDENT ROOM ACOUSTICS MODEL
    Nakatani, Tomohiro
    Yoshioka, Takuya
    Kinoshita, Keisuke
    Miyoshi, Masato
    Juang, Biing-Hwang
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 137 - 140
  • [38] Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems
    Ayllon, David
    Sanchez-Hevia, Hector A.
    Figueroa, Carol
    Lanchantin, Pierre
    INTERSPEECH 2019, 2019, : 1511 - 1515
  • [39] A stereophonic acoustic signal extraction scheme for noisy and reverberant environments
    Reindl, Klaus
    Zheng, Yuanhang
    Schwarz, Andreas
    Meier, Stefan
    Maas, Roland
    Sehr, Armin
    Kellermann, Walter
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) : 726 - 745
  • [40] WHAMR!: NOISY AND REVERBERANT SINGLE-CHANNEL SPEECH SEPARATION
    Maciejewski, Matthew
    Wichern, Gordon
    McQuinn, Emmett
    Le Roux, Jonathan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 696 - 700