AutoElbow: An Automatic Elbow Detection Method for Estimating the Number of Clusters in a Dataset

Cited by: 15
Authors
Onumanyi, Adeiza James [1]
Molokomme, Daisy Nkele [1]
Isaac, Sherrin John [1]
Abu-Mahfouz, Adnan M. [1, 2]
Affiliations
[1] CSIR, Next Generat Enterprises & Inst, ZA-0001 Pretoria, South Africa
[2] Univ Johannesburg, Dept Elect & Elect Engn Sci, ZA-2006 Johannesburg, South Africa
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15
Keywords
automatic; clustering; elbow method; K-means; unsupervised;
DOI
10.3390/app12157515
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The elbow technique is a well-known method for estimating the number of clusters required as a starting parameter in the K-means algorithm and certain other unsupervised machine-learning algorithms. However, because the method's output is graphical, human assessment is necessary to determine the location of the elbow and, consequently, the number of data clusters. This article presents a simple method for estimating the elbow point, thus enabling the K-means algorithm to be readily automated. First, the elbow-based graph is normalized using the graph's minimum and maximum values along the ordinate and abscissa coordinates. Then, the distances from each point on the graph to the minimum (i.e., the origin) and maximum reference points and to the "heel" of the graph are calculated. The estimated elbow location is thus the point that maximizes the ratio of these distances, which corresponds to an approximate number of clusters in the dataset. We demonstrate that the strategy is effective, stable, and adaptable over different types of datasets characterized by small and large clusters, different cluster shapes, high dimensionality, and unbalanced distributions. We provide the clustering community with a description of the method and present comparative results against other well-known methods in the prior state of the art.
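The steps in the abstract (normalize the curve, measure distances to reference points, maximize a ratio) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' published formula: the abstract does not give the exact ratio or the heel's coordinates, so the sketch assumes the heel sits at (1, 0) on the normalized graph and scores each point as its distance to the maximum reference point (1, 1) divided by the sum of its distances to the origin and the heel. The function name `estimate_elbow` and the sample inertia values are hypothetical.

```python
import math


def estimate_elbow(ks, inertias):
    """Return the k whose normalized curve point maximizes an assumed distance ratio."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    k_min, k_max = min(ks), max(ks)
    v_min, v_max = min(inertias), max(inertias)
    if k_max == k_min or v_max == v_min:
        raise ValueError("curve is flat and cannot be normalized")

    best_k, best_score = None, -math.inf
    for k, v in zip(ks, inertias):
        # Min-max normalize both axes to [0, 1], as the abstract describes.
        x = (k - k_min) / (k_max - k_min)
        y = (v - v_min) / (v_max - v_min)
        d_origin = math.hypot(x, y)            # distance to minimum reference (0, 0)
        d_max = math.hypot(1.0 - x, 1.0 - y)   # distance to maximum reference (1, 1)
        d_heel = math.hypot(1.0 - x, y)        # ASSUMPTION: heel placed at (1, 0)
        # ASSUMPTION: this particular ratio; the abstract only says "ratio of these distances".
        # The denominator is at least 1 (triangle inequality), so it never reaches zero.
        score = d_max / (d_origin + d_heel)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Hypothetical within-cluster sum-of-squares values for k = 1..6.
example_ks = [1, 2, 3, 4, 5, 6]
example_inertias = [100.0, 40.0, 10.0, 8.0, 7.0, 6.0]
print(estimate_elbow(example_ks, example_inertias))
```

Under this assumed ratio, points that lie close to the origin and far from (1, 1) score highest, which matches the visual intuition of an elbow on a decreasing curve. In practice the inertia values would come from running K-means once per candidate k.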
Pages: 17
Related papers
50 in total
  • [31] A fast method for discovering suitable number of clusters for fuzzy clustering
    Hsu, Ping-Yu
    Phan-Anh-Huy Nguyen
    INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1523 - 1538
  • [32] An entropy-based initialization method of K-means clustering on the optimal number of clusters
    Kuntal Chowdhury
    Debasis Chaudhuri
    Arup Kumar Pal
    Neural Computing and Applications, 2021, 33 : 6965 - 6982
  • [33] Automatic selection of the number of clusters using Bayesian clustering and sparsity-inducing priors
    Valle, Denis
    Jameel, Yusuf
    Betancourt, Brenda
    Azeria, Ermias T.
    Attias, Nina
    Cullen, Joshua
    ECOLOGICAL APPLICATIONS, 2022, 32 (03)
  • [34] A Modified Multiobjective EA-based Clustering Algorithm with Automatic Determination of the Number of Clusters
    Tsai, Chun-Wei
    Chen, Wen-Ling
    Chiang, Ming-Chao
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2833 - 2838
  • [35] Approaches for the Clustering of Geographic Metadata and the Automatic Detection of Quasi-Spatial Dataset Series
    Lacasta, Javier
    Javier Lopez-Pellicer, Francisco
    Zarazaga-Soria, Javier
    Bejar, Ruben
    Nogueras-Iso, Javier
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2022, 11 (02)
  • [36] An adaptive optimization method for estimating the number of components in a Gaussian mixture model
    Sun, Shuping
    Tong, Yaonan
    Zhang, Biqiang
    Yang, Bowen
    He, Peiguang
    Song, Wei
    Yang, Wenbo
    Wu, Yilin
    Liu, Guangyu
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 64
  • [37] A Method to Determine the Number of Clusters Based on Multi-validity Index
    Sun, Ning
    Yu, Hong
    ROUGH SETS, IJCRS 2018, 2018, 11103 : 427 - 439
  • [38] Automatic Estimation of Cluster Number in Fuzzy Co-clustering Based on Competition and Elimination of Clusters
    Ubukata, Seiki
    Yanagisawa, Kazuki
    Notsu, Akira
    Honda, Katsuhiro
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 660 - 665
  • [39] A Heuristic Method for Finding the Optimal Number of Clusters with Application in Medical Data
    Bayati, Hamidreza
    Davoudi, Heydar
    Fatemizadeh, Emad
    2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vols 1-8, 2008, : 4684 - 4687
  • [40] Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis
    Gorzalczany, Marian B.
    Rudzinski, Filip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (07) : 2833 - 2845