Robust speaker recognition based on level-building voice activity detection


Authors:

Xie, Yan-Lu [1]

Zhang, Jing-Song [1]

Liu, Ming-Hui [2]

Huang, Zhong-Wei [2]

Affiliations:

[1] College of Information Science, Beijing Language and Culture University

[2] Phonetic Laboratory, Shenzhen University

Source:

Shenzhen Daxue Xuebao (Ligong Ban)/Journal of Shenzhen University Science and Engineering | 2012, Vol. 29, No. 4

Keywords:

Distributed speech recognition; Level-building; Likelihood measurement; Speaker identification; Speech signal processing; Voice activity detection;

DOI:

10.3724/SP.J.1249.2012.04328

Abstract:

A methodology combining level-building with a two-stage Wiener filter is proposed to improve the robustness of distributed speech recognition in noise under the ETSI (European Telecommunications Standards Institute) DSR (Distributed Speech Recognition) AFE (Advanced Front-End) standard. The speech is clustered in an unsupervised manner using a likelihood measurement. A level-building process, which divides the speech at each level, is introduced to reduce the computational load. As a result, the boundaries between voice and non-voice data are detected precisely. Experiments show that, when the SNR of the speech is greater than 0 dB, the proposed methodology improves performance by 18.9% relative to the ETSI-DSR-AFE standard. The recognition rate also improves by 60.7% compared with a Mel-frequency cepstral coefficient (MFCC) system.
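
The abstract gives no algorithmic detail, so the following is only a loose illustration of the general idea of unsupervised, likelihood-driven segmentation refined level by level. It is not the authors' method. Everything in it is an assumption made for illustration: the use of per-frame energies as input, the single-Gaussian likelihood, the hypothetical function names, and the parameters `max_levels`, `min_len`, and `min_gain`. The input is assumed to be a non-empty list of numbers.

```python
import math


def gaussian_loglik(xs):
    """Maximised log-likelihood of xs under one Gaussian (variance floored)."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(xs, min_len):
    """Return (gain, index) for the split that most increases the likelihood."""
    whole = gaussian_loglik(xs)
    best = (0.0, None)
    for i in range(min_len, len(xs) - min_len + 1):
        gain = gaussian_loglik(xs[:i]) + gaussian_loglik(xs[i:]) - whole
        if gain > best[0]:
            best = (gain, i)
    return best


def level_building_segments(energies, max_levels=3, min_len=3, min_gain=5.0):
    """Refine the segmentation one level at a time (assumed scheme).

    At each level every existing segment is split at most once, and only
    if the likelihood gain reaches min_gain. Refinement stops early when
    a level changes nothing.
    """
    segments = [(0, len(energies))]
    for _ in range(max_levels):
        refined = []
        for start, end in segments:
            gain, i = best_split(energies[start:end], min_len)
            if i is not None and gain >= min_gain:
                refined += [(start, start + i), (start + i, end)]
            else:
                refined.append((start, end))
        if refined == segments:
            break
        segments = refined
    return segments


def label_voice(energies, segments):
    """Mark a segment as voice when its mean energy exceeds the global mean."""
    threshold = sum(energies) / len(energies)
    return [(s, e, sum(energies[s:e]) / (e - s) > threshold)
            for s, e in segments]
```

Calling `level_building_segments` on a list of frame energies and passing the result to `label_voice` yields `(start, end, is_voice)` tuples. Splitting each segment at most once per level keeps the search at any level limited to the boundaries inside existing segments, which is one plausible reading of how a level-wise scheme could reduce computation.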

Pages: 328-334

Number of pages: 6
