Robust Voice Activity Detection Based on Complementary BLSTM Enhancement Stage

Cited by: 0
Authors
Shahryary, Iman [1]
Seyedin, Sanaz [1]
Ahadi, Seyed Mohammad [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran Polytech, Tehran, Iran
Source
2020 28th Iranian Conference on Electrical Engineering (ICEE) | 2020
Keywords
Bidirectional Long Short-Term Memory; Joint learning; Multi-Resolution Cochleagram; Voice Activity Detection; Speech recognition; Noise; Algorithm
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
In this paper, we propose a new two-stage deep structure with a joint learning technique to improve Voice Activity Detection (VAD) under different noisy conditions, especially unseen noises. The first stage enhances the noisy signal and is complementary to the second stage. It uses a Bidirectional Long Short-Term Memory (BLSTM) architecture to benefit from both previous and upcoming frames. The second stage uses the features of the enhanced frames to predict the speech presence probability. Based on previous studies, we use Multi-Resolution Cochleagram (MRCG) features to achieve higher robustness. We evaluate the proposed method using the Area Under the Curve (AUC) and precision metrics on the TIMIT corpus. In our evaluations, the proposed method outperforms state-of-the-art baseline methods based on deep structures in both AUC and precision. Its AUC improvement over the other methods is significant for noises not seen during training.
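The AUC metric named in the abstract can be illustrated with a minimal sketch. It assumes frame-level binary labels (1 for speech, 0 for non-speech) and per-frame speech presence probabilities; the function name `roc_auc` and the sample values are hypothetical and are not taken from the paper, whose evaluation code is not given here. The sketch uses the standard rank-sum (Mann-Whitney U) identity for the area under the ROC curve.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 frame labels (1 = speech), an assumed convention.
    scores: sequence of predicted speech presence probabilities.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one frame of each class")

    # Sort frame indices by score, then assign 1-based ranks,
    # giving tied scores the average of the ranks they span.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical frame labels and predicted probabilities.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 1.0 means every speech frame scores above every non-speech frame, and 0.5 corresponds to chance-level ranking, which is why the metric is useful for comparing detectors without fixing a decision threshold.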
Pages: 1608-1612
Page count: 5