Robust speaker recognition based on level-building voice activity detection

Cited by: 0
Authors
Xie, Yan-Lu [1]
Zhang, Jing-Song [1]
Liu, Ming-Hui [2]
Huang, Zhong-Wei [2]
Affiliations
[1] College of Information Science, Beijing Language and Culture University
[2] Phonetic Laboratory, Shenzhen University
Source
Shenzhen Daxue Xuebao (Ligong Ban)/Journal of Shenzhen University Science and Engineering | 2012 / Vol. 29 / No. 04
Keywords
Distributed speech recognition; Level-building; Likelihood measurement; Speaker identification; Speech signal processing; Voice activity detection
DOI
10.3724/SP.J.1249.2012.04328
Abstract
A methodology combining level-building with a two-stage Wiener filter is proposed to improve the robustness of distributed noisy speech recognition under the ETSI (European Telecommunications Standards Institute) DSR (Distributed Speech Recognition) AFE (Advanced Front-End) standard. The speech is clustered in an unsupervised manner using a likelihood measurement. A level-building process that divides the speech at each level is introduced to reduce the computational load. As a result, the boundaries between voice and non-voice data are detected precisely. Experiments demonstrate that the performance of the proposed methodology improves by 18.9% in the ETSI-DSR-AFE standard when the SNR of the speech is greater than 0 dB. The recognition rate also improves by 60.7% in comparison with that of a Mel-frequency cepstral coefficient (MFCC) system.
Pages: 328-334
Page count: 6