Model-based clustering using copulas with applications

Authors
Ioannis Kosmidis
Dimitris Karlis
Affiliations
[1] University College London, Department of Statistical Science
[2] Athens University of Economics and Business, Department of Statistics
Source
Statistics and Computing | 2016 / Vol. 26
Keywords
Mixture models; Dependence modelling; Parametric rotations; Multivariate discrete data; Mixed-domain data
DOI
Not available
Abstract
The majority of model-based clustering techniques are based on multivariate normal models and their variants. In this paper, copulas are used to construct flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) an appropriate choice of copulas makes it possible to obtain a range of exotic cluster shapes, and (ii) the explicit choice of marginal distributions for the clusters allows multivariate data of various modes (either discrete or continuous) to be modelled in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using the standard EM algorithm, and, depending on the mode of the data, more efficient procedures are provided that fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data, parametric rotations in the sample space are introduced, with a parallel discussion of parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.
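As a rough illustration of what a copula-based finite mixture looks like for continuous data, the sketch below evaluates a two-component bivariate mixture with density f(y1, y2) = sum_k pi_k c_k(F_k1(y1), F_k2(y2)) f_k1(y1) f_k2(y2), together with the E-step membership probabilities that a standard EM algorithm would compute. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian copula, the normal and exponential margins, the parameterisation `(mu, sigma, rate, rho)`, the function names and the parameter values are all hypothetical choices made for illustration, and the M-step is omitted.

```python
import math
from statistics import NormalDist

# Standard normal, used to map copula arguments u, v in (0, 1) to the real line.
_STD = NormalDist()


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Bivariate Gaussian copula density c(u, v; rho), assuming |rho| < 1.

    Note: inv_cdf raises StatisticsError if u or v is exactly 0 or 1, so
    extreme observations would need clipping in practice (not handled here).
    """
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    q = rho * rho * (x * x + y * y) - 2.0 * rho * x * y
    return math.exp(-q / (2.0 * (1.0 - rho * rho))) / math.sqrt(1.0 - rho * rho)


def component_density(y1: float, y2: float, params: tuple) -> float:
    """Hypothetical copula-based component: c(F1(y1), F2(y2)) * f1(y1) * f2(y2).

    params = (mu, sigma, rate, rho): a Normal(mu, sigma) margin for y1, an
    Exponential(rate) margin for y2, and a Gaussian copula with parameter rho.
    """
    mu, sigma, rate, rho = params
    if y2 <= 0.0:
        return 0.0  # outside the support of the exponential margin
    margin1 = NormalDist(mu, sigma)
    u = margin1.cdf(y1)
    v = -math.expm1(-rate * y2)        # exponential CDF, accurate for small y2
    f2 = rate * math.exp(-rate * y2)   # exponential PDF
    return gaussian_copula_density(u, v, rho) * margin1.pdf(y1) * f2


def mixture_density(y1, y2, weights, components):
    """Finite mixture density: sum over k of pi_k * f_k(y1, y2)."""
    return sum(w * component_density(y1, y2, p) for w, p in zip(weights, components))


def responsibilities(y1, y2, weights, components):
    """EM E-step: posterior membership probabilities for one observation."""
    joint = [w * component_density(y1, y2, p) for w, p in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Made-up two-component model and a single observation, for illustration only.
    weights = [0.4, 0.6]
    components = [(0.0, 1.0, 1.0, 0.7), (3.0, 1.5, 0.5, -0.4)]
    print(mixture_density(0.5, 0.8, weights, components))
    print(responsibilities(0.5, 0.8, weights, components))
```

The design separates the dependence structure (the copula) from the margins, which is what allows each cluster to use its own marginal families. Only continuous margins are handled here; with discrete margins, a component's probability mass function is a finite difference of the copula over a rectangle, so this density form does not apply directly.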
Pages: 1079–1099 (21 pages)