Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering

Cited by: 2

Authors:

David Martin-Fernandez, Jose [1]

Maria Luna-Romera, Jose [1]

Pontes, Beatriz [1]

Riquelme-Santos, Jose C. [1]

Affiliations:

[1] Univ Seville, E-41012 Seville, Spain

Source:

14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019) | 2020 / Vol. 950

Keywords:

Machine Learning; Hierarchical clustering; Internal validation indexes; BIG DATA; ALGORITHMS;

DOI:

10.1007/978-3-030-20055-8_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering analysis is one of the most commonly used techniques for uncovering patterns in data mining. Most clustering methods require the number of clusters to be set beforehand. However, given the size of the data sets currently in use, predicting that value is a computationally expensive task in most cases. In this article, we present a clustering technique that avoids this requirement by using hierarchical clustering. There are many examples of this procedure in the literature, most of them focusing on the divisive (descending) subtype, whereas this article covers the agglomerative (ascending) subtype. Although it is more expensive in computation and time, it yields very valuable information about the membership of elements in clusters and about how those clusters are grouped, that is, their dendrogram. Finally, several data sets of varying dimensionality have been used. For each of them, we compute internal validation indexes to test the algorithm developed and study which index gives the best results for obtaining the best possible clustering.
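The idea summarized in the abstract (build an agglomerative hierarchy, then use an internal validation index to choose where to cut it) can be illustrated with a minimal sketch. This is not the authors' algorithm: the average-linkage criterion, the choice of the Davies-Bouldin index, the toy data, and all function names below are assumptions made only for illustration.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def average_linkage(a, b):
    """Mean pairwise Euclidean distance between two clusters (assumed linkage)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return {k: partition}."""
    clusters = [[p] for p in points]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels

def davies_bouldin(partition):
    """Davies-Bouldin index (lower is better). Needs at least two clusters
    and assumes that no two clusters share the same centroid."""
    cents = [centroid(c) for c in partition]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(partition, cents)
    ]
    total = 0.0
    for i in range(len(partition)):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(len(partition))
            if j != i
        )
    return total / len(partition)

# Hypothetical toy data: three well-separated groups of four 2-D points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

levels = agglomerate(points)
# Score only a small range of k: with k close to len(points) the index
# tends toward 0 trivially because most clusters are singletons.
scores = {k: davies_bouldin(levels[k]) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the index is lowest at k = 3, which matches the three groups the points were drawn from. The naive merge loop is far slower than a production implementation and is only meant to make the cut-selection step concrete.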

Pages: 3-13

Page count: 11

References (18 in total):

[1] [Anonymous], 2003, SOSP

[2] CLUSTER SEPARATION MEASURE [J].