Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

Cited by: 1
Authors
Soni, Meet H. [1]
Joshi, Sonal [1]
Panda, Ashish [1]
Affiliations
[1] TCS Innovat Labs, Mumbai, Maharashtra, India
Source
TEXT, SPEECH, AND DIALOGUE (TSD 2019) | 2019 / Vol. 11697
Keywords
Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; FEATURES; NOISE;
DOI
10.1007/978-3-030-27947-9_28
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a comparison of various front-end enhancement techniques and multiconditioned training for Automatic Speech Recognition (ASR) that is robust to additive noise. We compare three front-ends: De-noising Autoencoders (DAEs) based on a Deep Neural Network (DNN) architecture, DAEs based on a Time-Delay Neural Network (TDNN) architecture, and a DNN front-end based on Time-Frequency (T-F) masking. We train these front-ends and evaluate their performance on various seen and unseen noise conditions. In multiconditioned training, we train the acoustic model on various noise conditions and test on seen and unseen noises, along with Noise Aware Training (NAT). The results suggest that all front-ends improve performance in seen noise conditions while degrading it in unseen noise conditions. TDNN-DAE provides the largest improvement in seen conditions and the largest degradation in unseen conditions. To improve the performance of TDNN-DAE in unseen conditions, we train it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS). We show that these enhancement techniques significantly improve the efficacy of the TDNN-DAE in unseen noise conditions. Overall, we observe that multiconditioned training still gives better performance in both seen and unseen noise conditions, although the enhanced TDNN-DAE comes closest among all the front-ends to the performance of multiconditioned training.
Pages: 329-340
Page count: 12