Adaptation of Hidden Markov Models Using Model-as-Matrix Representation

Cited by: 6

Authors:

Jeong, Yongwon [1]

Affiliations:

[1] Pusan Natl Univ, Sch Elect Engn, Pusan 609735, South Korea

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 08

Keywords:

Generalized low rank approximations of matrices; matrix-variate distribution; speaker adaptation; speech recognition; two-dimensional principal component analysis (2DPCA); POSTERIORI LINEAR-REGRESSION; LOW-RANK APPROXIMATIONS; SPEAKER ADAPTATION; MAXIMUM-LIKELIHOOD; 2-DIMENSIONAL PCA;

DOI:

10.1109/TASL.2012.2202649

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we describe basis-based speaker adaptation techniques using the matrix representation of training models. Bases are obtained from training models by decomposition techniques for matrix-variate objects: two-dimensional principal component analysis (2DPCA) and generalized low rank approximations of matrices (GLRAM). The motivation for using matrix representation is that the sample covariance matrix of training models can be more accurately computed and the speaker weight becomes a matrix. Speaker adaptation equations are derived in the maximum-likelihood (ML) framework, and the adaptation equations can be solved using the maximum-likelihood linear regression technique. Additionally, novel applications of probabilistic 2DPCA and GLRAM to speaker adaptation are presented. From the probabilistic 2DPCA/GLRAM of training models, speaker adaptation equations are formulated in the maximum a posteriori (MAP) framework. The adaptation equations can be solved using the MAP linear regression technique. In the isolated-word experiments, the matrix representation-based methods in the ML and MAP frameworks outperformed maximum-likelihood linear regression adaptation, MAP adaptation, eigenvoice, and probabilistic PCA-based model for adaptation data longer than 20 s. Furthermore, the adaptation methods using probabilistic 2DPCA/GLRAM showed additional performance improvement over the adaptation methods using 2DPCA/GLRAM for small amounts of adaptation data.
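The abstract's idea of treating each training model as a matrix and extracting bases with 2DPCA can be illustrated with a minimal NumPy sketch. This is not the author's implementation. The function names, the array layout (S training models, each an R x D matrix), and the choice of the column-side scatter matrix are assumptions made for illustration. The speaker weight here is obtained by projecting a known model, whereas the paper estimates it from adaptation data in the ML or MAP framework.

```python
import numpy as np

def two_d_pca(models, k):
    """Hypothetical helper: 2DPCA over a stack of matrix-valued models.

    models: array of shape (S, R, D), one R x D matrix per training model.
    Returns the mean matrix (R, D) and an orthonormal basis (D, k).
    """
    mean = models.mean(axis=0)
    centered = models - mean
    # Column-side scatter: average over models of A^T A, a D x D matrix.
    scatter = np.einsum("srd,sre->de", centered, centered) / models.shape[0]
    eigvals, eigvecs = np.linalg.eigh(scatter)
    # eigh returns ascending eigenvalues; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order]

def project(model, mean, basis):
    """Weight matrix (R, k) of one model in the 2DPCA basis."""
    return (model - mean) @ basis

def reconstruct(weight, mean, basis):
    """Approximate model (R, D) rebuilt from its weight matrix."""
    return mean + weight @ basis.T
```

With this layout the weight for one speaker is an R x k matrix rather than a k-dimensional vector, which mirrors the abstract's remark that the speaker weight becomes a matrix.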


Pages: 2352-2364

Page count: 13

34 references in total

[1] Anastasakos T, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1137, DOI 10.1109/ICSLP.1996.607807

[2] [Anonymous], 2002, Principal components analysis

[3] Beal MJ, 2003, BAYESIAN STATISTICS 7, P453

[4] Chen K.-t., 2000, INTERSPEECH, P742

[5] Chen KT, 2001, INT CONF ACOUST SPEE, P317, DOI 10.1109/ICASSP.2001.940831

[6] Chesta C., 1999, EUROSPEECH, P211

[7] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J]. DEMPSTER, AP; LAIRD, NM; RUBIN, DB. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): 1-38

[8] The Application of Hidden Markov Models in Speech Recognition [J]. Gales, Mark; Young, Steve. FOUNDATIONS AND TRENDS IN SIGNAL PROCESSING, 2007, 1 (03): 195-304

[9] Cluster adaptive training of hidden Markov models [J]. Gales, MJF. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): 417-428

[10] Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J]. Gauvain, Jean-Luc; Lee, Chin-Hui. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02): 291-298
