A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Authors
Wang, Heming [1]
Pandey, Ashutosh [1]
Wang, Deliang [2]
Affiliations
[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA
[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA
Keywords
Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;
DOI
10.1016/j.csl.2024.101677
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most of the existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs: ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations, and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with proposed techniques achieve superior performance compared with other strong enhancement baselines.
Pages: 12
Related papers
50 in total
  • [41] Evaluation of microphone arrays for enhancing noisy and reverberant speech for coding
    Li, Z
    Hoffman, MW
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (01): 91-95
  • [42] Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation
    Wang, Zhong-Qiu
    Wichern, Gordon
    Le Roux, Jonathan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 3476-3490
  • [43] Modulation enhancement of speech by a pre-processing algorithm for improving intelligibility in reverberant environments
    Kusumoto, A
    Arai, T
    Kinoshita, K
    Hodoshima, N
    Vaughan, N
    SPEECH COMMUNICATION, 2005, 45 (02): 101-113
  • [44] Enhancement of reverberant speech using LP residual signal
    Yegnanarayana, B
    Murthy, PS
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (03): 267-281
  • [45] Ensemble Based Speaker Verification Using Adapted Score Fusion in Noisy Reverberant Environments
    Nakanishi, Ryosuke
    Shiota, Sayaka
    Kiya, Hitoshi
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016
  • [46] AN IMPROVED NON-INTRUSIVE INTELLIGIBILITY METRIC FOR NOISY AND REVERBERANT SPEECH
    Santos, Joao F.
    Senoussaoui, Mohammed
    Falk, Tiago H.
    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2014: 55-59
  • [47] A feature study for masking-based reverberant speech separation
    Delfarah, Masood
    Wang, DeLiang
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 555-559
  • [48] Neural Network Front-ends Based Speech Recognition In Reverberant Environments
    Zhang, Zhen
    Li, Peng
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ELECTRONIC TECHNOLOGY, 2016, 48: 213-218
  • [49] Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation
    Md Jahangir Alam
    Vishwa Gupta
    Patrick Kenny
    Pierre Dumouchel
    EURASIP Journal on Advances in Signal Processing, 2015
  • [50] Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation
    Alam, Md Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    Dumouchel, Pierre
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015: 1-13