Enhancing the selection of a model-based clustering with external categorical variables

Cited by: 11
Authors
Baudry, Jean-Patrick [1]
Cardoso, Margarida [2]
Celeux, Gilles [3]
Amorim, Maria Jose [4]
Ferreira, Ana Sousa [5]
Affiliations
[1] Univ Paris 06, Sorbonne Univ, EA 3124, LSTA, F-75005 Paris, France
[2] ISCTE Univ Inst Lisbon, Dept Quantitat Methods Management & Econ, Business Res Unit, Lisbon, Portugal
[3] INRIA Saclay Ile de France, Orsay, France
[4] Lisbon Univ Inst, Inst Super Engn Lisboa, Lisbon, Portugal
[5] Univ Lisbon, Fac Psychol, Business Res Unit, P-1699 Lisbon, Portugal
Keywords
Mixture models; Model-based clustering; Number of clusters; Penalised criteria; Categorical variables; BIC; ICL; Mixed type variables clustering
DOI
10.1007/s11634-014-0177-3
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
Pages: 177-196
Page count: 20