Unifying data units and models in (co-)clustering

Times cited: 2
Authors
Biernacki, Christophe [1, 2, 3]
Lourme, Alexandre [4]
Affiliations
[1] Univ Lille, Lille, France
[2] INRIA, Lille, France
[3] CNRS, Lille, France
[4] Univ Bordeaux, Bordeaux, France
Keywords
Measurement units; Mixed data; Mixture models; Model selection; Non-identifiability; Mixture
DOI
10.1007/s11634-018-0325-2
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Statisticians are well aware that any task involving a modeling process (exploration, prediction) depends heavily on the measurement units of the data. The dependence is strong enough that no statistical outcome should be reported without specifying the couple (unit, model). In this work, this general principle is formalized with a particular focus on model-based clustering and co-clustering for possibly mixed data types (continuous, categorical and/or count features), and the formalization is used to revisit what the related data units are. It raises three important points. (i) The couple (unit, model) is not identifiable, so different unit/model interpretations of the same overall modeling process are always possible. (ii) Combining different classical units with different classical models should offer a cheap, wide and meaningful expansion of the family of modeling processes defined by the couple (unit, model). (iii) If necessary, this couple, up to the non-identifiability property, could be selected by any traditional model selection criterion. Experiments on real data sets illustrate in detail the practical benefits of these three points.
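The following is a minimal sketch of points (i) and (iii) for a single continuous feature. It is not the authors' implementation. It assumes positive data, two candidate units (identity and log), a univariate Gaussian mixture fitted by plain EM, and BIC as the selection criterion. One standard way to make couples with different units comparable is to express every likelihood in the original unit. This means adding the log-Jacobian of the unit change, which is -log x for the log unit. The same fitted couple (log unit, Gaussian mixture) then reads as (original unit, lognormal mixture), which illustrates the non-identifiability. All function names and the `UNITS` table are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(x, weights, means, variances):
    """Log-likelihood of a 1-D Gaussian mixture on data x."""
    return sum(
        log_sum_exp([math.log(w) + log_normal_pdf(xi, m, v)
                     for w, m, v in zip(weights, means, variances)])
        for xi in x)


def fit_gaussian_mixture(x, k=2, n_iter=200):
    """Plain EM for a 1-D Gaussian mixture (free weights, means, variances)."""
    n = len(x)
    xs = sorted(x)
    # Deterministic init: means at evenly spaced quantiles, common variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(x) / n
    variances = [sum((xi - mu) ** 2 for xi in x) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities).
        resp = []
        for xi in x:
            logs = [math.log(w) + log_normal_pdf(xi, m, v)
                    for w, m, v in zip(weights, means, variances)]
            lse = log_sum_exp(logs)
            resp.append([math.exp(l - lse) for l in logs])
        # M-step: weighted updates; small floors avoid degenerate components.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            variances[j] = max(
                sum(r[j] * (xi - means[j]) ** 2 for r, xi in zip(resp, x)) / nj,
                1e-6)
    return weights, means, variances


def lognormal_mixture_loglik(x, weights, means, variances):
    """Lognormal-mixture log-likelihood, written directly on the original scale."""
    total = 0.0
    for xi in x:
        lx = math.log(xi)
        total += log_sum_exp([
            math.log(w) - lx - 0.5 * math.log(2.0 * math.pi * v)
            - (lx - m) ** 2 / (2.0 * v)
            for w, m, v in zip(weights, means, variances)])
    return total


# Hypothetical candidate units: (transform T, log|T'(x)|).
UNITS = {
    "identity": (lambda v: v, lambda v: 0.0),
    "log": (math.log, lambda v: -math.log(v)),
}


def bic_in_original_unit(x, unit, k=2):
    """BIC of the couple (unit, Gaussian mixture), likelihood on the original scale."""
    transform, log_jac = UNITS[unit]
    y = [transform(xi) for xi in x]
    w, m, v = fit_gaussian_mixture(y, k)
    loglik = mixture_loglik(y, w, m, v) + sum(log_jac(xi) for xi in x)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(len(x))


# Simulated (not real) data: two well-separated lognormal clusters.
rng = random.Random(42)
data = ([math.exp(rng.gauss(0.0, 0.5)) for _ in range(150)]
        + [math.exp(rng.gauss(3.0, 0.5)) for _ in range(150)])
scores = {u: bic_in_original_unit(data, u) for u in UNITS}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

To add another candidate unit, only another (transform, log-Jacobian) entry is needed. For example, a square root unit has log-Jacobian -log(2 * sqrt(x)). Mixed or categorical features would need their own notion of unit change, which this sketch does not cover.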
Pages: 7-31
Number of pages: 25