Gaussian mixture models with covariances or precisions in shared multiple subspaces

Cited by: 11
Authors
Dharanipragada, Satya [1]
Visweswariah, Karthik
Affiliations
[1] Citadel Investment Grp, Chicago, IL 60603 USA
[2] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 04
Keywords
covariance matrices; density functions; EM algorithm; factor analysis; Gaussian mixture models (GMMs); speech recognition
DOI
10.1109/TSA.2005.860835
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We introduce a class of Gaussian mixture models (GMMs) in which the covariances or the precisions (inverse covariances) are restricted to lie in subspaces spanned by rank-one symmetric matrices. The rank-one basis matrices are shared between the Gaussians according to a sharing structure. We describe an algorithm for estimating the parameters of the GMM in a maximum likelihood framework, given a sharing structure. We employ these models to model the observations in the hidden states of a hidden Markov model-based speech recognition system. We show that this class of models provides improvements in accuracy and computational efficiency over well-known covariance modeling techniques such as classical factor analysis, shared factor analysis, and maximum likelihood linear transformation-based models, all of which are special instances of this class. We also investigate different sharing mechanisms. We show that, for the same number of parameters, modeling precisions leads to better performance than modeling covariances. Modeling precisions also gives a distinct advantage in computational and memory requirements.
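As a rough illustration of the constraint described in the abstract, the sketch below builds a precision matrix as a weighted sum of rank-one symmetric matrices and evaluates the quadratic form x^T P x in two ways. The names `basis`, `weights_g1`, and `weights_g2`, the toy values, and the two-Gaussian setup are assumptions made for this example and are not taken from the paper. It does not cover the paper's estimation algorithm or sharing structures.

```python
# Minimal sketch (assumed notation): each Gaussian g has a precision
#   P_g = sum_k weights_g[k] * a_k a_k^T
# where the vectors a_k (the rank-one basis) are shared between Gaussians.

def precision_from_basis(basis, weights):
    """Assemble P = sum_k weights[k] * basis[k] basis[k]^T as a list of rows."""
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for a, w in zip(basis, weights):
        for i in range(d):
            for j in range(d):
                P[i][j] += w * a[i] * a[j]
    return P

def quad_form_dense(P, x):
    """x^T P x computed from the full matrix."""
    d = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(d) for j in range(d))

def squared_projections(basis, x):
    """(a_k . x)^2 for every shared basis vector; computed once per observation."""
    return [sum(ai * xi for ai, xi in zip(a, x)) ** 2 for a in basis]

def quad_form_shared(proj_sq, weights):
    """x^T P x = sum_k weights[k] * (a_k . x)^2, reusing the shared projections."""
    return sum(w * p for w, p in zip(weights, proj_sq))

# Toy data: three shared basis vectors in two dimensions, two Gaussians.
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights_g1 = [2.0, 1.0, 0.5]
weights_g2 = [1.0, 3.0, 0.0]
x = [1.0, 2.0]

proj_sq = squared_projections(basis, x)  # shared by both Gaussians
q1_shared = quad_form_shared(proj_sq, weights_g1)
q2_shared = quad_form_shared(proj_sq, weights_g2)
q1_dense = quad_form_dense(precision_from_basis(basis, weights_g1), x)
q2_dense = quad_form_dense(precision_from_basis(basis, weights_g2), x)
```

Because the projections onto the shared basis are computed once per observation, each additional Gaussian costs only one weighted sum. This is one plausible reading of the computational advantage the abstract attributes to modeling precisions.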
Pages: 1255-1266
Page count: 12