Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering

Cited by: 2
Authors
Martin-Fernandez, Jose David [1]
Luna-Romera, Jose Maria [1]
Pontes, Beatriz [1]
Riquelme-Santos, Jose C. [1]
Affiliations
[1] Univ Seville, E-41012 Seville, Spain
Source
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019) | 2020 / Vol. 950
Keywords
Machine Learning; Hierarchical clustering; Internal validation indexes; BIG DATA; ALGORITHMS;
DOI
10.1007/978-3-030-20055-8_1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering analysis is one of the most commonly used techniques for uncovering patterns in data mining. Most clustering methods require the number of clusters to be fixed beforehand. However, given the size of the datasets currently in use, estimating that value is computationally expensive in most cases. In this article, we present a clustering technique that avoids this requirement by using hierarchical clustering. There are many examples of this procedure in the literature, most of them focusing on the divisive (descending) subtype, whereas this article covers the agglomerative (ascending) subtype. Although it is more expensive in computation and time, it yields very valuable information about the membership of elements in clusters and about how the clusters are grouped, that is, the dendrogram. Finally, several datasets of varying dimensionality have been used. For each of them, we compute internal validation indexes to test the algorithm developed and study which index gives the best results for obtaining the best possible clustering.
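The idea in the abstract, building an agglomerative dendrogram once and then scoring each possible cut with an internal validation index, can be illustrated with a minimal sketch. This is not the authors' algorithm: single linkage, the Dunn index, Euclidean distance, and all function names below are assumptions chosen for illustration, and the brute-force search is only suitable for tiny datasets.

```python
from itertools import combinations
from math import dist


def agglomerative_levels(points):
    """Return {k: clusters} for every cut of a single-linkage dendrogram.

    Each cluster is a list of indices into `points`. (Hypothetical helper.)
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # Pick the pair of clusters whose closest members are nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


def dunn_index(points, clusters):
    """Dunn index: smallest inter-cluster distance / largest cluster diameter."""
    min_separation = min(
        dist(points[i], points[j])
        for c1, c2 in combinations(clusters, 2)
        for i in c1
        for j in c2
    )
    max_diameter = max(
        (dist(points[i], points[j]) for c in clusters for i, j in combinations(c, 2)),
        default=0.0,
    )
    return float("inf") if max_diameter == 0 else min_separation / max_diameter


def best_number_of_clusters(points, k_min=2, k_max=None):
    """Score every cut from k_min to k_max and return the best k and all scores."""
    levels = agglomerative_levels(points)
    k_max = k_max or len(points) - 1
    scores = {k: dunn_index(points, levels[k]) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data (made up for this example): three well-separated unit squares.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
best_k, scores = best_number_of_clusters(points)
print(best_k, round(scores[best_k], 3))
```

On this toy input the cut with three clusters has the highest Dunn index, because every other cut either splits a square (small separation) or joins two squares (large diameter). Other linkage criteria or indexes may rank the cuts differently on real data.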
Pages: 3-13
Number of pages: 11