Single-channel dereverberation and denoising based on lower band trained SA-LSTMs

Cited by: 4
Authors
Li, Yi [1]
Sun, Yang [2]
Naqvi, Syed Mohsen [1]
Affiliations
[1] Newcastle Univ, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Oxford, Big Data Inst, Oxford, England
Keywords
Fourier transforms; reverberation; speech enhancement; recurrent neural nets; computational complexity; approximation theory; supervised learning; speech mixture; lower band approach; speech enhancement performance; enhanced ratio mask; single-channel dereverberation; supervised single-channel speech enhancement; mixture recording; network parameters; reconstructed speech signal; human speech; noise interferences; signal approximation based neural networks; lower band trained SA-LSTMs; current neural network-based single-channel speech methods; power spectral density; long short-term memory; short-time Fourier transform; reverberant room environments; dereverberation mask; MONAURAL SOURCE SEPARATION; SPEECH DEREVERBERATION; COMPLEX-DOMAIN; MASKING; FEATURES; NOISE;
DOI
10.1049/iet-spr.2020.0134
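Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract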
Supervised single-channel speech enhancement presents one mixture recording at the input of a neural network and updates the network parameters to generate an output that serves as the reconstructed speech signal. However, current neural network-based single-channel speech enhancement methods cannot fully exploit the specific frequency range of speech signals within limited computational complexity. In this study, the authors studied the power spectral density of mixtures of human speech and noise interference. Based on the premise that the speech signal is concentrated in the lower band, they proposed a method to train signal approximation (SA) based neural networks with the lower frequency band of the speech mixture to improve the performance. To realise the lower band approach for single-channel speech enhancement, the method uses a long short-term memory (LSTM) block to exploit the short-time Fourier transform of the desired frequency range. Furthermore, to improve speech enhancement performance in reverberant room environments, the dereverberation mask and the enhanced ratio mask are used as the training targets of two LSTM blocks, respectively. Detailed evaluations confirm that the proposed method outperforms the state-of-the-art methods.
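The lower band idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4 kHz cutoff, the frame length, the synthetic tones, and the generic magnitude ratio mask |S| / (|S| + |N|) are all assumptions made for illustration. The record does not define the paper's dereverberation mask or enhanced ratio mask, so neither is reproduced here. The sketch only shows how an STFT can be cropped to its lower bins before being used as a network input or training target.

```python
import cmath
import math


def stft_frame(frame):
    """One-sided DFT of a Hann-windowed frame (naive O(N^2), demo only)."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
            for k in range(n // 2 + 1)]


def stft(signal, frame_len=64, hop=32):
    """List of frames, each a list of frame_len // 2 + 1 complex bins."""
    return [stft_frame(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def lower_band_bins(frame_len, sample_rate, cutoff_hz):
    """Number of one-sided bins whose centre frequency is <= cutoff_hz."""
    bin_hz = sample_rate / frame_len
    return min(frame_len // 2 + 1, int(cutoff_hz // bin_hz) + 1)


def crop_lower_band(spec, n_bins):
    """Keep only the first n_bins (lowest-frequency) bins of every frame."""
    return [frame[:n_bins] for frame in spec]


def ratio_mask(target_spec, interference_spec, eps=1e-12):
    """Generic magnitude ratio mask (an assumption, not the paper's mask)."""
    return [[abs(s) / (abs(s) + abs(n) + eps) for s, n in zip(sf, nf)]
            for sf, nf in zip(target_spec, interference_spec)]


# Hypothetical signals: a 500 Hz tone stands in for speech,
# a 6 kHz tone stands in for noise interference.
SAMPLE_RATE, FRAME_LEN, CUTOFF_HZ = 16000, 64, 4000
speech = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(256)]
noise = [math.sin(2 * math.pi * 6000 * t / SAMPLE_RATE) for t in range(256)]

speech_spec = stft(speech, FRAME_LEN)
noise_spec = stft(noise, FRAME_LEN)

n_low = lower_band_bins(FRAME_LEN, SAMPLE_RATE, CUTOFF_HZ)
speech_low = crop_lower_band(speech_spec, n_low)
noise_low = crop_lower_band(noise_spec, n_low)
mask_low = ratio_mask(speech_low, noise_low)
```

With these assumed settings the cropped spectrogram keeps roughly half of the one-sided bins, which is where the reduction in input and target size comes from. In a real pipeline the naive DFT would be replaced by an FFT-based STFT.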
Pages: 774-782
Number of pages: 9