Speech enhancement using long short-term memory with trained speech features and adaptive Wiener filter

Cited by: 15
Authors
Garg, Anil [1]
Affiliations
[1] Maharishi Markandeshwar Deemed Univ, Maharishi Markandeshwar Engn Coll, ECE Dept, Ambala 134007, Haryana, India
Funding
UK Research and Innovation (UKRI);
Keywords
Speech processing; Speech enhancement; Empirical mean decomposition; Empirical mean curve decomposition; Wiener filter; Nonnegative matrix factorization; Noise; Masking; Model
DOI
10.1007/s11042-022-13302-3
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Speech enhancement improves the clarity and intelligibility of speech signals degraded by background noise. This research introduces a deep-learning-based speech enhancement model that operates in two phases: (i) training and (ii) testing. In the training phase, the noise spectrum and signal spectrum are estimated from the noisy input signal via Non-negative Matrix Factorization (NMF). Empirical Mean Decomposition (EMD) features are then extracted from the Wiener filter. From the de-noised signal acquired through EMD, the bark frequency is evaluated and the Fractional Delta AMS features are extracted. The key contribution of this study is the use of a Long Short-Term Memory (LSTM) model to estimate the tuning factor eta of the Wiener filter for all input signals. The LSTM is trained on the extracted EMD features via a modified Wiener filter that decomposes the spectral signal, and the output of EMD is the denoised, enhanced speech signal. The proposed model is compared with existing models in terms of error measures.
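The two signal-processing ingredients named in the abstract, NMF and a Wiener filter with a tuning factor, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract gives no formulas, so the Euclidean multiplicative-update NMF and the parametric gain G = S / (S + eta * N) are standard textbook forms chosen here, and the names `nmf`, `wiener_gain`, `speech_psd`, and `noise_psd` are hypothetical. In the described pipeline eta would come from the trained LSTM; here it is a plain argument.

```python
import numpy as np


def nmf(V, rank, n_iter=100, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    Uses the standard multiplicative updates for the Euclidean cost.
    Assumption: the abstract does not say which NMF variant is used.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def wiener_gain(speech_psd, noise_psd, eta=1.0, eps=1e-12):
    """Parametric Wiener gain per time-frequency bin.

    Assumption: G = S / (S + eta * N); eta = 1 gives the classical
    Wiener gain, and a larger eta suppresses noise more aggressively.
    """
    speech_psd = np.asarray(speech_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    return speech_psd / (speech_psd + eta * noise_psd + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((16, 20))           # stand-in for a magnitude spectrogram
    W, H = nmf(V, rank=4)
    print("reconstruction error:", np.linalg.norm(V - W @ H))
    print("gain at 0 dB SNR, eta=2:", wiener_gain(1.0, 1.0, eta=2.0))
```

Exposing eta as an explicit argument keeps the gain function independent of whatever model predicts it, so a fixed value, a per-utterance value, or a per-frame array can be passed in the same way.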
Pages: 3647-3675
Number of pages: 29