Automatic identification of the number of clusters in hierarchical clustering

Cited by: 34
Authors
Karna, Ashutosh [1 ,2 ]
Gibert, Karina [3 ]
Affiliations
[1] HP Inc, Printing & Digital Mfg 3D, Catalonia, Spain
[2] Univ Politecn Cataluna, BarcelonaTech, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
[3] Univ Politecn Cataluna, BarcelonaTech, Knowledge Engn & Machine Learning Grp, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
Keywords
Hierarchical clustering; Calinski-Harabasz index; Scalability; Data Science; 3D Printing; Decision Support; ALGORITHM; PLUS;
DOI
10.1007/s00521-021-05873-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical clustering is one of the most suitable tools for discovering the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on inner data structure and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, and this is a major obstacle to building a fully automatic system such as those required for decision support in Industry 4.0. This research proposes a general criterion to cut a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by the experts, and a winning criterion is determined. This research is framed in a bigger project to build an Intelligent Decision Support System that assesses the performance of 3D printers based on sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and a large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces the dependence on a human expert to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied in an automatic mode in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
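The general idea of cutting a dendrogram by a Calinski-Harabasz criterion can be illustrated with a minimal sketch. This is not one of the six criteria proposed in the paper, which the abstract does not specify; it simply scores every candidate cut with the standard Calinski-Harabasz index and keeps the highest-scoring one. The helper names `calinski_harabasz` and `best_cut`, the Ward linkage, the `max_k` bound, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def calinski_harabasz(X, labels):
    """Standard CH index: (between SS / (k - 1)) / (within SS / (n - k))."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        # Within-cluster dispersion around the cluster centroid.
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def best_cut(X, max_k=10, method="ward"):
    """Hypothetical helper: cut the dendrogram at the k with the highest CH index."""
    Z = linkage(X, method=method)  # build the dendrogram once
    scores = {}
    for k in range(2, max_k + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")
        if len(np.unique(labels)) != k:
            continue  # ties in the tree can make an exact k-cluster cut impossible
        scores[k] = calinski_harabasz(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data (an assumption for the demo): three well-separated 2D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

best_k, scores = best_cut(X, max_k=8)
print("selected number of clusters:", best_k)
```

Because the linkage matrix is computed once and only the cut level varies, scoring each candidate k is cheap relative to building the tree, which is one reason an index-based cut lends itself to automation.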
Pages: 119-134
Page count: 16
Related papers
50 in total
  • [21] Automatic Determination of the Appropriate Number of Clusters for Multispectral Image Data
    Koonsanit, Kitti
    Jaruskulchai, Chuleerat
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (05): : 1256 - 1263
  • [22] Automatic Classification of Securities using Hierarchical Clustering of the 10-Ks
    Yang, Hoseong
    Lee, Hye Jin
    Cho, Sungzoon
    Cho, Eugene
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3936 - 3943
  • [23] Probabilistic hierarchical clustering based identification and segmentation of brain tumors in magnetic resonance imaging
    Vidyarthi, Ankit
    BIOMEDICAL ENGINEERING-BIOMEDIZINISCHE TECHNIK, 2024, 69 (02): : 181 - 192
  • [24] REPLICATION AS A RULE FOR DETERMINING THE NUMBER OF CLUSTERS IN HIERARCHICAL CLUSTER-ANALYSIS
    OVERALL, JE
    MAGEE, KN
    APPLIED PSYCHOLOGICAL MEASUREMENT, 1992, 16 (02) : 119 - 128
  • [25] Estimating the Number of Endmembers in Hyperspectral Imagery Using Hierarchical Agglomerate Clustering
    Wu, Jee-Cheng
    Wu, Heng-Yang
    Tsuei, Gwo-Chyang
    IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XIX, 2013, 8892
  • [26] An Automatic R and T Peak Detection Method Based on the Combination of Hierarchical Clustering and Discrete Wavelet Transform
    Chen, Hanjie
    Maharatna, Koushik
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (10) : 2825 - 2832
  • [27] Subspace Clustering Without Knowing the Number of Clusters: A Parameter Free Approach
    Menon, Vishnu
    Muthukrishnan, Gokularam
    Kalyani, Sheetal
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 (68) : 5047 - 5062
  • [28] Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
    Dinh, Duy-Tai
    Fujinami, Tsutomu
    Huynh, Van-Nam
    KNOWLEDGE AND SYSTEMS SCIENCES, KSS 2019, 2019, 1103 : 1 - 17
  • [29] Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification
    Xin Xu
    Jiaheng Lu
    Wei Wang
    Journal of Computer Science and Technology, 2018, 33 : 807 - 822
  • [30] Identification of Power System Dynamic Signature Using Hierarchical Clustering
    Guo, Tingyan
    Milanovic, J. V.
    2014 IEEE PES GENERAL MEETING - CONFERENCE & EXPOSITION, 2014,