AutoElbow: An Automatic Elbow Detection Method for Estimating the Number of Clusters in a Dataset

Cited by: 15
Authors
Onumanyi, Adeiza James [1]
Molokomme, Daisy Nkele [1]
Isaac, Sherrin John [1]
Abu-Mahfouz, Adnan M. [1, 2]
Affiliations
[1] CSIR, Next Generat Enterprises & Inst, ZA-0001 Pretoria, South Africa
[2] Univ Johannesburg, Dept Elect & Elect Engn Sci, ZA-2006 Johannesburg, South Africa
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15
Keywords
automatic; clustering; elbow method; K-means; unsupervised
DOI
10.3390/app12157515
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The elbow technique is a well-known method for estimating the number of clusters required as a starting parameter in the K-means algorithm and certain other unsupervised machine-learning algorithms. However, because the method's output is graphical, human assessment is necessary to determine the location of the elbow and, consequently, the number of data clusters. This article presents a simple method for estimating the elbow point, thus enabling the K-means algorithm to be readily automated. First, the elbow-based graph is normalized using the graph's minimum and maximum values along the ordinate and abscissa coordinates. Then, the distances from each point on the graph to the minimum (i.e., the origin) and maximum reference points, and to the "heel" of the graph, are calculated. The estimated elbow location is thus the point that maximizes the ratio of these distances, which corresponds to an approximate number of clusters in the dataset. We demonstrate that the strategy is effective, stable, and adaptable over different types of datasets characterized by small and large clusters, different cluster shapes, high dimensionality, and unbalanced distributions. We provide the clustering community with a description of the method and present comparative results against other well-known methods in the prior state of the art.
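The steps in the abstract (normalize the elbow graph, measure distances to reference points, pick the point that maximizes a ratio) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact ratio, so the function name `estimate_elbow`, the placement of the "heel" at the normalized point (1, 0), the use of Euclidean distances, and the ratio `d_max / (d_min + d_heel)` are all assumptions made for this example.

```python
import numpy as np


def estimate_elbow(ks, inertias):
    """Pick the k whose normalized elbow-graph point maximizes an assumed distance ratio.

    Hypothetical helper: the ratio and the heel location are assumptions,
    not taken from the paper. Requires at least two distinct k values and
    at least two distinct inertia values so min-max normalization is defined.
    """
    ks = np.asarray(ks, dtype=float)
    inertias = np.asarray(inertias, dtype=float)

    # Min-max normalize both axes to [0, 1].
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (inertias - inertias.min()) / (inertias.max() - inertias.min())

    d_min = np.hypot(x, y)           # distance to the minimum reference point (0, 0)
    d_max = np.hypot(x - 1, y - 1)   # distance to the maximum reference point (1, 1)
    d_heel = np.hypot(x - 1, y)      # distance to the assumed heel at (1, 0)

    # Assumed score: far from (1, 1), close to the origin and the heel.
    # d_min + d_heel >= 1 by the triangle inequality, so no division by zero.
    score = d_max / (d_min + d_heel)
    return int(ks[np.argmax(score)])


# Synthetic within-cluster sums of squares for k = 1..10 (made-up values).
ks = range(1, 11)
inertias = [1000, 400, 100, 90, 80, 70, 60, 50, 40, 30]
k_hat = estimate_elbow(ks, inertias)
```

In practice the `inertias` values would come from running K-means once per candidate k; the numbers above are invented purely to give the curve a visible bend.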
Pages: 17
Related papers
50 in total
  • [1] A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm
    Shi, Congming
    Wei, Bingtao
    Wei, Shoulin
    Wang, Wen
    Liu, Hai
    Liu, Jialei
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2021, 2021 (01)
  • [2] A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm
    Congming Shi
    Bingtao Wei
    Shoulin Wei
    Wen Wang
    Hai Liu
    Jialei Liu
    EURASIP Journal on Wireless Communications and Networking, 2021
  • [3] DP-Dip: A skinny method for estimating the number and center of clusters
    Xu, Shuaijing
    Bie, Rongfang
    Li, Liangchi
    Yang, Yuqi
    2017 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS, 2018, 129 : 2 - 8
  • [4] Estimating the Number of Clusters Using Cross-Validation
    Fu, Wei
    Perry, Patrick O.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (01) : 162 - 173
  • [5] Universal and automatic elbow detection for learning the effective number of components in model selection problems
    Morgado, Eduardo
    Martino, Luca
    San Millan-Castillo, Roberto
    DIGITAL SIGNAL PROCESSING, 2023, 140
  • [6] Automatic estimation of clusters number for K-means
    Sabri, My Abdelouahed
    Ennouni, Assia
    Aarab, Abdellah
    2016 4TH IEEE INTERNATIONAL COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST), 2016, : 450 - 454
  • [7] Estimating the number of clusters in a data set via the gap statistic
    Tibshirani, R
    Walther, G
    Hastie, T
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2001, 63 : 411 - 423
  • [8] ClusterVote: Automatic Summarization Dataset Construction with Document Clusters
    Chernyshev, Daniil
    Dobrov, Boris
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 99 - 113
  • [9] Estimating the number of clusters in DNA microarray data
    Bolshakova, N
    Azuaje, F
    METHODS OF INFORMATION IN MEDICINE, 2006, 45 (02) : 153 - 157
  • [10] RSQRT: An heuristic for estimating the number of clusters to report
    Carlis, John
    Bruso, Kelsey
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2012, 11 (02) : 152 - 158