Low-rank constraint eigenphone speaker adaptation method for speech recognition

Cited by: 0
Authors
Affiliations
[1] Institute of Information System Engineering, PLA Information Engineering University
Source
Zhang, W.-L. (zwlin_2004@163.com) | Science Press, Vol. 36
Keywords
Eigenphone; Low-rank constraint; Proximal gradient method; Speaker adaptation; Speech recognition;
DOI
10.3724/SP.J.1146.2013.00848
CLC Number
Subject Classification Number
Abstract
A low-rank constrained eigenphone speaker adaptation method is proposed. The original eigenphone speaker adaptation method performs well when sufficient adaptation data are available, but it suffers from severe overfitting when the adaptation data are insufficient, possibly resulting in performance below that of the unadapted system. Firstly, a simplified estimation algorithm for the eigenphone matrix is derived for a hidden Markov model-Gaussian mixture model (HMM-GMM) based speech recognition system with diagonal covariance matrices. Then, a low-rank constraint is imposed on the estimation of the eigenphone matrix, with the nuclear norm used as a convex approximation of the matrix rank; the weight of the norm term is adjusted to control the complexity of the adaptation model. Finally, an accelerated proximal gradient method is adopted to solve the resulting optimization problem. Experiments on a Mandarin Chinese continuous speech recognition task show that the performance of the original eigenphone method is improved remarkably. The new method outperforms maximum likelihood linear regression followed by maximum a posteriori (MLLR+MAP) adaptation under 5-50 s adaptation data test conditions.
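As a rough illustration of the optimization step only, the sketch below (a simplification for illustration, not the paper's actual estimator) applies an accelerated proximal gradient (FISTA-style) scheme with a nuclear-norm proximal step, i.e. singular value thresholding, to a generic quadratic data-fit term 0.5*||A V - B||_F^2 standing in for the eigenphone likelihood objective; the matrices A and B, the weight lam, and all dimensions are assumed purely for demonstration.

```python
# Minimal sketch: nuclear-norm-regularized matrix estimation via accelerated
# proximal gradient (FISTA). The quadratic data-fit term is a stand-in for the
# paper's eigenphone likelihood objective; all quantities here are illustrative.
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)          # shrink singular values toward zero
    return (U * s) @ Vt

def fista_nuclear(A, B, lam, n_iter=200):
    """Minimize 0.5*||A V - B||_F^2 + lam*||V||_* over the matrix V."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    V = np.zeros((A.shape[1], B.shape[1]))
    Y, t = V.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ Y - B)           # gradient of the smooth term at Y
        V_next = svt(Y - grad / L, lam / L)  # proximal (shrinkage) step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = V_next + ((t - 1.0) / t_next) * (V_next - V)  # momentum extrapolation
        V, t = V_next, t_next
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic low-rank "eigenphone matrix": 40 phone classes x 39 features, rank 3
    true_V = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 39))
    A = rng.standard_normal((200, 40))                       # illustrative design matrix
    B = A @ true_V + 0.01 * rng.standard_normal((200, 39))   # noisy observations
    V_hat = fista_nuclear(A, B, lam=1.0)
    print("estimated rank:", np.linalg.matrix_rank(V_hat, tol=1e-3))
```

Larger values of lam drive more singular values to zero, which is the mechanism the abstract describes for controlling the complexity of the adaptation model when adaptation data are scarce.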
Pages: 981-987
Page count: 6