AutoElbow: An Automatic Elbow Detection Method for Estimating the Number of Clusters in a Dataset

Cited by: 15
Authors
Onumanyi, Adeiza James [1]
Molokomme, Daisy Nkele [1]
Isaac, Sherrin John [1]
Abu-Mahfouz, Adnan M. [1, 2]
Affiliations
[1] CSIR, Next Generat Enterprises & Inst, ZA-0001 Pretoria, South Africa
[2] Univ Johannesburg, Dept Elect & Elect Engn Sci, ZA-2006 Johannesburg, South Africa
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15
Keywords
automatic; clustering; elbow method; K-means; unsupervised
DOI
10.3390/app12157515
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The elbow technique is a well-known method for estimating the number of clusters required as a starting parameter in the K-means algorithm and certain other unsupervised machine-learning algorithms. However, because the method produces a graphical output, human assessment is necessary to locate the elbow and, consequently, to determine the number of data clusters. This article presents a simple method for estimating the elbow point, thus enabling the K-means algorithm to be readily automated. First, the elbow-based graph is normalized using the graph's minimum and maximum values along the ordinate and abscissa coordinates. Then, the distances from each point on the graph to the minimum (i.e., the origin) and maximum reference points, and to the "heel" of the graph, are calculated. The estimated elbow location is thus the point that maximizes the ratio of these distances, which corresponds to an approximate number of clusters in the dataset. We demonstrate that the strategy is effective, stable, and adaptable over different types of datasets characterized by small and large clusters, different cluster shapes, high dimensionality, and unbalanced distributions. We provide the clustering community with a description of the method and present comparative results against other well-known methods in the prior state of the art.
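The steps named in the abstract (normalize, measure distances to reference points, maximize a ratio) can be sketched roughly as follows. The abstract does not state the exact formula, so everything specific here is an assumption made for illustration: the function names `normalize` and `auto_elbow_sketch` are hypothetical, the "maximum" reference point is taken as (1, 1), the "heel" is taken as (1, 0), and the ratio is taken as the distance to the maximum divided by the sum of the distances to the origin and the heel. The published method may define these differently.

```python
import math


def normalize(values):
    """Min-max scale a sequence into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def auto_elbow_sketch(ks, inertias):
    """Return the k whose normalized point maximizes an assumed distance ratio.

    ks       -- candidate cluster counts (abscissa of the elbow graph)
    inertias -- within-cluster sum of squares for each k (ordinate)
    """
    x = normalize(ks)
    y = normalize(inertias)
    best_k, best_score = None, -math.inf
    for k, xi, yi in zip(ks, x, y):
        d_min = math.hypot(xi, yi)                # to the origin (0, 0)
        d_max = math.hypot(xi - 1.0, yi - 1.0)    # to (1, 1), assumed "maximum"
        d_heel = math.hypot(xi - 1.0, yi)         # to (1, 0), assumed "heel"
        # d_min + d_heel >= 1 by the triangle inequality, so no division by zero.
        score = d_max / (d_min + d_heel)          # assumed form of the ratio
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Made-up inertia values with a visible bend at k = 3.
ks = list(range(1, 9))
inertias = [1000, 520, 180, 150, 130, 115, 105, 100]
print(auto_elbow_sketch(ks, inertias))
```

In practice the inertia values would come from running K-means once per candidate k; the list above is invented purely to exercise the sketch.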
Pages: 17
Related papers
50 in total
  • [21] An entropy-based initialization method of K-means clustering on the optimal number of clusters
    Chowdhury, Kuntal
    Chaudhuri, Debasis
    Pal, Arup Kumar
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (12): 6965-6982
  • [22] Automatic Determination of the Appropriate Number of Clusters for Multispectral Image Data
    Koonsanit, Kitti
    Jaruskulchai, Chuleerat
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (05): 1256-1263
  • [23] An Automatic Approach for Solving Clustering Problem with the Number of Clusters Unknown
    Dong, Jinxin
    Qi, Minyong
    2010 SECOND ETP/IITA WORLD CONGRESS IN APPLIED COMPUTING, COMPUTER SCIENCE, AND COMPUTER ENGINEERING, 2010: 282-285
  • [24] Local and Global Data Spread Based Index for Determining Number of Clusters in a Dataset
    Riyaz, Romana
    Wani, M. Arif
    2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016: 651-656
  • [25] Estimating the Number of Clusters as a Pre-processing Step to Unsupervised Learning
    Nietto, Paulo Rogerio
    Nicoletti, Maria do Carmo
    INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016), 2017, 557: 25-34
  • [26] Sequential clustering with particle filters - Estimating the number of clusters from data
    Schubert, J
    Sidenbladh, H
    2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005: 122-129
  • [27] Selection of the number of clusters via the bootstrap method
    Fang, Yixin
    Wang, Junhui
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (03): 468-477
  • [28] Efficiently Finding the Optimum Number of Clusters in a Dataset with a New Hybrid Cellular Evolutionary Algorithm
    Arellano-Verdejo, Javier
    Guzman-Arenas, Adolfo
    Godoy-Calderon, Salvador
    Barron Fernandez, Ricardo
    COMPUTACION Y SISTEMAS, 2014, 18 (02): 313-327
  • [29] Estimating the number of clusters in multivariate data by various fittings of the L-curve
    Moustafa, Rida
    Hadi, Ali S.
    COMPUTATIONAL & APPLIED MATHEMATICS, 2025, 44 (01)
  • [30] Estimating the number of clusters in a numerical data set via quantization error modeling
    Kolesnikov, Alexander
    Trichina, Elena
    Kauranne, Tuomo
    PATTERN RECOGNITION, 2015, 48 (03): 941-952