Density-based clustering with non-continuous data

Cited by: 2
Authors
Azzalini, Adelchi [1]
Menardi, Giovanna [1]
Affiliation
[1] Univ Padua, Dipartimento Sci Stat, Padua, Italy
Keywords
Density estimation; Mixed variables; Modal clustering; Model-based clustering; Multidimensional scaling; DISCRIMINANT-ANALYSIS; MODEL; TREE
DOI
10.1007/s00180-016-0644-8
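Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract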
Density-based clustering relies on the idea of associating groups with regions of the sample space characterized by high density of the probability distribution underlying the observations. While this approach to cluster analysis exhibits some desirable properties, its use is necessarily limited to continuous data only. The present contribution proposes a simple but working way to circumvent this problem, based on the identification of continuous components underlying the non-continuous variables. The basic idea is explored in a number of variants applied to simulated data, confirming the practical effectiveness of the technique and leading to recommendations for its practical usage. Some illustrations using real data are also presented.
Pages: 771-798
Number of pages: 28