Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions

Cited by: 100
Authors
Andrews, Jeffrey L. [1]
McNicholas, Paul D. [1]
Affiliations
[1] Univ Guelph, Dept Math & Stat, Guelph, ON N1G 2W1, Canada
Funding
Canada Foundation for Innovation;
Keywords
Classification; Clustering; Discriminant analysis; Eigen-decomposition; Mixture models; Model-based clustering; Multivariate t-distribution; MAXIMUM-LIKELIHOOD; VARIABLE SELECTION; FACTOR ANALYZERS; ALGORITHM;
DOI
10.1007/s11222-011-9272-x
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The last decade has seen an explosion of work on the use of mixture models for clustering. The use of the Gaussian mixture model has been common practice, with constraints sometimes imposed upon the component covariance matrices to give families of mixture models. Similar approaches have also been applied, albeit with less fecundity, to classification and discriminant analysis. In this paper, we begin with an introduction to model-based clustering and a succinct account of the state-of-the-art. We then put forth a novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure. This family, which is largely a t-analogue of the well-known MCLUST family, is known as the tEIGEN family. The efficacy of this family for clustering, classification, and discriminant analysis is illustrated with both real and simulated data. The performance of this family is compared to its Gaussian counterpart on three real data sets.
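To make the abstract's central object concrete, the following is a minimal sketch of a mixture of multivariate t-distributions whose component scale matrices are written in the eigen-decomposed form Sigma = lambda * D * A * D^T (lambda a volume constant, D an orthogonal matrix of eigenvectors, A a diagonal matrix with determinant 1), as used by MCLUST-style families. The function names `eigen_covariance`, `mvt_logpdf`, and `mixture_pdf` are hypothetical and chosen for illustration only; this is not the authors' implementation, and it evaluates densities only, with no parameter estimation.

```python
import math
import numpy as np

def eigen_covariance(lam, D, a):
    """Build Sigma = lam * D @ diag(a) @ D.T (hypothetical helper).

    `a` is rescaled so that diag(a) has determinant 1, which makes
    lam the volume term: det(Sigma) == lam ** p.
    """
    a = np.asarray(a, dtype=float)
    a = a / np.prod(a) ** (1.0 / a.size)
    return lam * D @ np.diag(a) @ D.T

def mvt_logpdf(x, mu, Sigma, nu):
    """Log-density of a p-variate t-distribution with nu degrees of freedom."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T
    z = np.linalg.solve(L, x - mu)
    delta = float(z @ z)                   # squared Mahalanobis distance
    logdet = 2.0 * float(np.sum(np.log(np.diag(L))))
    return (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(math.pi * nu)
            - 0.5 * logdet
            - 0.5 * (nu + p) * math.log1p(delta / nu))

def mixture_pdf(x, weights, mus, Sigmas, nus):
    """Density of a finite mixture of multivariate t components."""
    return sum(w * math.exp(mvt_logpdf(x, m, S, v))
               for w, m, S, v in zip(weights, mus, Sigmas, nus))

# Two illustrative components sharing an orientation D but differing in volume.
D = np.eye(2)
S1 = eigen_covariance(1.0, D, [4.0, 1.0])
S2 = eigen_covariance(2.0, D, [4.0, 1.0])
density = mixture_pdf([0.5, -0.2], [0.6, 0.4],
                      [[0.0, 0.0], [3.0, 3.0]], [S1, S2], [5.0, 10.0])
```

Constraining lambda, D, A, and the degrees of freedom to be equal or free across components is what generates a family of models; the sketch above shows only a single such configuration.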
Pages: 1021 - 1029
Page count: 9