Gaussian mixture models with covariances or precisions in shared multiple subspaces

Cited by: 11

Authors:

Dharanipragada, Satya [1]

Visweswariah, Karthik

Affiliations:

[1] Citadel Investment Grp, Chicago, IL 60603 USA

[2] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / Issue 04

Keywords:

covariance matrices; density functions; EM algorithm; factor analysis; Gaussian mixture models (GMMs); speech recognition;

DOI:

10.1109/TSA.2005.860835

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We introduce a class of Gaussian mixture models (GMMs) in which the covariances or the precisions (inverse covariances) are restricted to lie in subspaces spanned by rank-one symmetric matrices. The rank-one basis matrices are shared between the Gaussians according to a sharing structure. We describe an algorithm for estimating the parameters of the GMM in a maximum likelihood framework given a sharing structure. We employ these models for modeling the observations in the hidden states of a hidden Markov model-based speech recognition system. We show that this class of models provides improvement in accuracy and computational efficiency over well-known covariance modeling techniques such as classical factor analysis, shared factor analysis, and maximum likelihood linear transformation-based models, which are special instances of this class of models. We also investigate different sharing mechanisms. We show that, for the same number of parameters, modeling precisions leads to better performance than modeling covariances. Modeling precisions also gives a distinct advantage in computational and memory requirements.
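
As a rough illustration of the precision-side constraint described in the abstract, the sketch below builds each precision matrix as a weighted sum of shared rank-one symmetric matrices, P_g = sum_k w[g][k] * a_k a_k^T, and checks that the quadratic term of the Gaussian log-likelihood can then be evaluated from projections onto the shared directions a_k. The function names, the toy directions, weights, and means are all hypothetical and chosen only for this example; the sketch does not reproduce the paper's estimation algorithm or its sharing structures, and it does not check that the resulting matrices are positive definite.

```python
import math

def precision_from_basis(basis, weights):
    """Build P = sum_k weights[k] * a_k a_k^T from direction vectors a_k."""
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for a, lam in zip(basis, weights):
        for i in range(d):
            for j in range(d):
                P[i][j] += lam * a[i] * a[j]
    return P

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def quad_dense(P, x, mu):
    """(x - mu)^T P (x - mu) using the full d x d precision matrix."""
    v = [xi - mi for xi, mi in zip(x, mu)]
    n = len(v)
    return sum(v[i] * P[i][j] * v[j] for i in range(n) for j in range(n))

def quad_shared(x_proj, mu_proj, weights):
    """Same quantity as sum_k weights[k] * (a_k.x - a_k.mu)^2."""
    return sum(lam * (xp - mp) ** 2
               for lam, xp, mp in zip(weights, x_proj, mu_proj))

# Hypothetical toy setup: 2-D features, three shared directions, two Gaussians.
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[2.0, 1.0, 0.5], [1.0, 3.0, 0.0]]   # one weight vector per Gaussian
means = [[0.0, 0.0], [1.0, -0.5]]
x = [0.5, -1.0]

# Projections of the observation are computed once and reused by every Gaussian;
# projections of the means do not depend on x and can be precomputed.
x_proj = [dot(a, x) for a in basis]
mu_projs = [[dot(a, mu) for a in basis] for mu in means]

dense = [quad_dense(precision_from_basis(basis, w), x, mu)
         for w, mu in zip(weights, means)]
shared = [quad_shared(x_proj, mp, w) for w, mp in zip(weights, mu_projs)]

for g, (qd, qs) in enumerate(zip(dense, shared)):
    print(f"Gaussian {g}: dense={qd:.6f} shared={qs:.6f}")
```

In this toy setting the per-Gaussian storage is one weight per shared direction rather than a full matrix, which is one way to read the abstract's remark about the computational and memory requirements of modeling precisions.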


Pages: 1255-1266

Number of pages: 12
