Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions

Cited by: 100
Authors
Andrews, Jeffrey L. [1]
McNicholas, Paul D. [1]
Affiliation
[1] Univ Guelph, Dept Math & Stat, Guelph, ON N1G 2W1, Canada
Funding
Canada Foundation for Innovation;
Keywords
Classification; Clustering; Discriminant analysis; Eigen-decomposition; Mixture models; Model-based clustering; Multivariate t-distribution; MAXIMUM-LIKELIHOOD; VARIABLE SELECTION; FACTOR ANALYZERS; ALGORITHM;
DOI
10.1007/s11222-011-9272-x
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
The last decade has seen an explosion of work on the use of mixture models for clustering. The use of the Gaussian mixture model has been common practice, with constraints sometimes imposed upon the component covariance matrices to give families of mixture models. Similar approaches have also been applied, albeit with less fecundity, to classification and discriminant analysis. In this paper, we begin with an introduction to model-based clustering and a succinct account of the state-of-the-art. We then put forth a novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure. This family, which is largely a t-analogue of the well-known MCLUST family, is known as the tEIGEN family. The efficacy of this family for clustering, classification, and discriminant analysis is illustrated with both real and simulated data. The performance of this family is compared to its Gaussian counterpart on three real data sets.
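For orientation, the model described in the abstract can be sketched in standard notation for mixtures of multivariate t-distributions with an MCLUST-style eigen-decomposition. The symbols below ($G$, $p$, $\pi_g$, $\boldsymbol{\mu}_g$, $\boldsymbol{\Sigma}_g$, $\nu_g$, $\lambda_g$, $\mathbf{D}_g$, $\mathbf{A}_g$) are assumptions of this note and do not appear in the record itself.

```latex
% Notation assumed for illustration; not taken from the record.
% Mixture of G multivariate t components for a p-dimensional observation x:
f(\mathbf{x}) = \sum_{g=1}^{G} \pi_g \,
  f_t(\mathbf{x} \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g, \nu_g),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1

% Multivariate t density with location mu, scale matrix Sigma, nu degrees of freedom:
f_t(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)
  = \frac{\Gamma\!\left(\frac{\nu + p}{2}\right) |\boldsymbol{\Sigma}|^{-1/2}}
         {(\pi \nu)^{p/2} \, \Gamma\!\left(\frac{\nu}{2}\right)
          \left[ 1 + \frac{\delta(\mathbf{x}, \boldsymbol{\mu} \mid \boldsymbol{\Sigma})}{\nu} \right]^{(\nu + p)/2}},
\qquad
\delta(\mathbf{x}, \boldsymbol{\mu} \mid \boldsymbol{\Sigma})
  = (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})

% Eigen-decomposition of each component scale matrix:
% lambda_g = |Sigma_g|^{1/p} (volume), D_g = orthogonal matrix of eigenvectors (orientation),
% A_g = diagonal matrix with |A_g| = 1 (shape).
\boldsymbol{\Sigma}_g = \lambda_g \, \mathbf{D}_g \mathbf{A}_g \mathbf{D}_g^{\top}
```

In this notation, constraining $\lambda_g$, $\mathbf{D}_g$, or $\mathbf{A}_g$ (for example, holding them equal across components) is the kind of restriction on component covariance structure that the abstract says gives rise to a family of mixture models.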
Pages: 1021-1029
Number of pages: 9