Density-based clustering with non-continuous data

Cited by: 2
Authors
Azzalini, Adelchi [1]
Menardi, Giovanna [1]
Affiliations
[1] Univ Padua, Dipartimento Sci Stat, Padua, Italy
Keywords
Density estimation; Mixed variables; Modal clustering; Model-based clustering; Multidimensional scaling; DISCRIMINANT-ANALYSIS; MODEL; TREE
DOI
10.1007/s00180-016-0644-8
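Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract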
Density-based clustering relies on the idea of associating groups with regions of the sample space characterized by high density of the probability distribution underlying the observations. While this approach to cluster analysis exhibits some desirable properties, its use is necessarily limited to continuous data only. The present contribution proposes a simple but working way to circumvent this problem, based on the identification of continuous components underlying the non-continuous variables. The basic idea is explored in a number of variants applied to simulated data, confirming the practical effectiveness of the technique and leading to recommendations for its practical usage. Some illustrations using real data are also presented.
Pages: 771-798
Number of pages: 28