Speaker adaptation method based on eigenphone speaker subspace for speech recognition

Cited by: 0
Authors
Qu, Dan [1 ]
Zhang, Wen-Lin [1 ]
Affiliations
[1] Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou
Source
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | 2015 / Vol. 37 / No. 06
Keywords
Eigenphone; Eigenphones' speaker subspace; Eigenvoice; Low-rank constraint; Speaker adaptation; Speech signal processing
DOI
10.11999/JEIT141264
Abstract
The eigenphone speaker adaptation method performs well when sufficient adaptation data is available, but it suffers from severe over-fitting when adaptation data is scarce. A speaker adaptation method based on an eigenphone speaker subspace is proposed to overcome this problem. First, a brief overview of the eigenphone speaker adaptation method is presented in the context of a Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) based speech recognition system. Second, a speaker subspace is introduced to model the correlation among different speakers' eigenphones. Third, a new speaker adaptation method based on the eigenphone speaker subspace is derived by estimating a speaker-dependent coordinate vector for each speaker. Finally, the new method is compared in detail with the traditional speaker subspace based method. Experimental results on a Mandarin Chinese continuous speech recognition task show that, compared with the original eigenphone speaker adaptation method, the eigenphone speaker subspace method improves performance significantly when adaptation data is insufficient. Compared with the eigenvoice method, the eigenphone speaker subspace method saves a large amount of storage space at the expense of only minor performance degradation. © 2015, Science Press. All rights reserved.
Pages: 1350-1356
Page count: 6
Related papers
16 in total
[1] Zhang W.-L., Zhang W.-Q., Li B.-C., et al., Bayesian speaker adaptation based on a new hierarchical probabilistic model, IEEE Transactions on Audio, Speech and Language Processing, 20, 7, pp. 2002-2015, (2012)
[2] Solomonoff A., Campbell W.M., Boardman I., Advances in channel compensation for SVM speaker recognition, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 629-632, (2005)
[3] Kumar D.S.P., Prasad N.V., Joshi V., et al., Modified SPLICE and its extension to non-stereo data for noise robust speech recognition, Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 174-179, (2013)
[4] Ghalehjegh S.H., Rose R.C., Two-stage speaker adaptation in subspace Gaussian mixture models, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 6374-6378, (2014)
[5] Wang Y.Q., Gales M.J.F., Tandem system adaptation using multiple linear feature transforms, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7932-7936, (2013)
[6] Kenny P., Boulianne G., Dumouchel P., Eigenvoice modeling with sparse training data, IEEE Transactions on Speech and Audio Processing, 13, 3, pp. 345-354, (2005)
[7] Kenny P., Boulianne G., Dumouchel P., et al., Speaker adaptation using an eigenphone basis, IEEE Transactions on Speech and Audio Processing, 12, 6, pp. 579-589, (2004)
[8] Zhang W.-L., Zhang W.-Q., Li B.-C., Speaker adaptation based on speaker-dependent eigenphone estimation, Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 48-52, (2011)
[9] Zhang W.-L., Zhang L.-H., Chen Q., et al., Low-rank constraint eigenphone speaker adaptation method for speech recognition, Journal of Electronics and Information Technology, 36, 4, pp. 981-987, (2014)
[10] Zhang W.-L., Qu D., Zhang W.-Q., Speaker adaptation based on sparse and low-rank eigenphone matrix estimation, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2972-2976, (2014)