Basis-Based Speaker Adaptation Using Partitioned HMM Mean Parameters of Training Speaker Models

Cited by: 0
Authors
Yongwon Jeong
Affiliation
[1] Department of Electronics Engineering, Pusan National University
Source
Journal of Signal Processing Systems | 2016 / Vol. 82
Keywords
Eigenvoice adaptation; Speaker adaptation; Speech recognition; Two-dimensional PCA
DOI
Not available
Abstract
This paper presents a basis-based speaker adaptation method that includes approaches based on principal component analysis (PCA) and two-dimensional PCA (2DPCA). The proposed method partitions the hidden Markov model (HMM) mean vectors of the training speaker models into subvectors of smaller dimension. Consequently, the dimension of the sample covariance matrix computed from the partitioned mean vectors depends on the dimension of the subvectors. Basis vectors are constructed from the eigendecomposition of this sample covariance matrix, so their dimension varies with that of the covariance matrix, and the proposed method includes both the PCA-based and the 2DPCA-based approaches. We present the adaptation equations in both the maximum likelihood (ML) and maximum a posteriori (MAP) frameworks. We perform continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus. The results show that a model whose basis vectors have dimensions between those of the PCA-based and 2DPCA-based approaches gives good overall performance. In the MAP framework, the proposed approach yields a further improvement over its ML counterpart when the number of adaptation parameters is large but the amount of available adaptation data is small. Furthermore, the performance of the MAP approach is less sensitive to the choice of model order than that of its ML counterpart.
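The basis construction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `partitioned_basis`, the parameters `sub_dim` and `num_basis`, the contiguous partitioning of each supervector, and the normalization of the covariance by the number of speakers are all assumptions made for illustration, and random numbers stand in for real HMM mean supervectors. The ML and MAP adaptation equations are not covered.

```python
import numpy as np


def partitioned_basis(supervectors, sub_dim, num_basis):
    """Build basis vectors from partitioned mean supervectors (illustrative).

    supervectors: (S, D) array, one concatenated HMM mean vector per
        training speaker (hypothetical layout).
    sub_dim: dimension of each subvector; must divide D.
    num_basis: number of leading eigenvectors to keep.
    """
    num_speakers, full_dim = supervectors.shape
    if full_dim % sub_dim != 0:
        raise ValueError("sub_dim must divide the supervector dimension")
    if num_basis > sub_dim:
        raise ValueError("num_basis cannot exceed sub_dim")

    # Remove the mean over training speakers.
    centered = supervectors - supervectors.mean(axis=0)

    # Cut every supervector into contiguous subvectors of length sub_dim.
    subvectors = centered.reshape(-1, sub_dim)

    # Sample covariance of size (sub_dim, sub_dim); its dimension follows
    # the subvector dimension. Dividing by S is an assumed normalization.
    cov = subvectors.T @ subvectors / num_speakers

    # Eigendecomposition of the symmetric covariance matrix; eigh returns
    # eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:num_basis]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(20, 24))  # 20 speakers, 24-dim supervectors
    for sub_dim in (24, 6, 3):
        _, basis = partitioned_basis(toy, sub_dim, num_basis=2)
        print(sub_dim, basis.shape)  # basis has shape (sub_dim, 2)
```

Setting `sub_dim` to the full supervector dimension gives a PCA-style basis over whole supervectors, while smaller values give shorter basis vectors, in the spirit of the intermediate dimensions the abstract refers to.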
Pages: 303-310
Number of pages: 7