Merging K-means with hierarchical clustering for identifying general-shaped groups

Cited by: 30
Authors
Peterson, Anna D. [1]
Ghosh, Arka R. [1]
Maitra, Ranjan [1]
Affiliations
[1] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
complete linkage; distance measure; hierarchical clustering; K-means algorithm; single linkage; mixture components; data set; number
DOI
10.1002/sta4.172
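Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract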
Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and K-means clustering are two approaches but have different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from computational complexity in large datasets, while K-means clustering is efficient but designed to identify homogeneous spherically shaped clusters. We present a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using K-means. We next merge these groups using hierarchical methods with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets. Copyright (c) 2018 John Wiley & Sons, Ltd.
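The two-stage idea in the abstract (partition with K-means, then merge the resulting groups hierarchically) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `kmeans` and `merge_groups`, the evenly spaced centroid initialization, the choice of single linkage, and the fixed `threshold` are all assumptions made for illustration; in particular, the paper uses a data-driven distance measure as the stopping criterion, which this sketch replaces with a hand-picked constant.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns (centroids, labels).

    Assumption: initial centroids are taken at evenly spaced positions in
    the input list, which keeps this sketch deterministic.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members
        # (an empty group keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def merge_groups(points, labels, k, threshold):
    """Single-linkage agglomeration of the K-means groups.

    Assumption: merging stops once the closest pair of groups is farther
    apart than a fixed `threshold` (a stand-in for a data-driven rule).
    """
    groups = {j: [p for p, l in zip(points, labels) if l == j]
              for j in range(k)}
    groups = {j: g for j, g in groups.items() if g}
    cluster_of = {j: j for j in groups}  # K-means label -> merged cluster id

    def linkage(a, b):
        # Single linkage: smallest distance between any two members.
        return min(math.dist(p, q) for p in groups[a] for q in groups[b])

    while len(groups) > 1:
        ids = sorted(groups)
        d, a, b = min((linkage(a, b), a, b)
                      for i, a in enumerate(ids) for b in ids[i + 1:])
        if d > threshold:
            break
        groups[a].extend(groups.pop(b))
        for j, c in cluster_of.items():
            if c == b:
                cluster_of[j] = a
    return [cluster_of[l] for l in labels]


# Two elongated (non-spherical) groups: parallel line segments 5 units apart.
line1 = [(0.5 * i, 0.0) for i in range(20)]
line2 = [(0.5 * i, 5.0) for i in range(20)]
points = line1 + line2

centroids, labels = kmeans(points, k=4)
merged = merge_groups(points, labels, k=4, threshold=1.0)
print(len(set(labels)), "K-means groups merged into", len(set(merged)), "clusters")
```

In this toy example K-means deliberately over-partitions each segment into compact pieces, and the merge step reassembles pieces that lie close together. The linkage type and the threshold would need to be chosen with care on real data.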
Pages: 16