Automatic identification of the number of clusters in hierarchical clustering

Cited by: 34
Authors
Karna, Ashutosh [1 ,2 ]
Gibert, Karina [3 ]
Affiliations
[1] HP Inc, Printing & Digital Mfg 3D, Catalonia, Spain
[2] Univ Politecn Cataluna, BarcelonaTech, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
[3] Univ Politecn Cataluna, BarcelonaTech, Knowledge Engn & Machine Learning Grp, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
Keywords
Hierarchical clustering; Calinski-Harabasz index; Scalability; Data Science; 3D Printing; Decision Support; ALGORITHM; PLUS
DOI
10.1007/s00521-021-05873-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical clustering is one of the most suitable tools for discovering the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not applicable. In many real applications, it offers insight into the inner structure of the data and is preferred over partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram. This is a major obstacle to building fully automatic systems, such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, selected by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion is evaluated on 95 real-life dendrograms of different topologies against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers in real time from sensor data, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers. Comparing the CPU time of the entire clustering method before and after this modification shows a large reduction. The approach also reduces the dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to run automatically in real-life industrial applications and enables continuous monitoring of real 3D printers in production. It also helps build an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
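The central idea of the abstract is to choose where to cut a dendrogram from the Calinski-Harabasz (CH) index instead of by expert inspection. A minimal Python sketch can illustrate this. The abstract does not specify the six criteria or which one wins, so this example uses only the simplest CH-based rule: take the dendrogram level with the highest CH value. It should not be read as the authors' method. Average linkage, the toy data, and the function names (`calinski_harabasz`, `best_cut`) are assumptions made for illustration. For a partition of n points into k clusters, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster and W the within-cluster sum of squares.

```python
from itertools import combinations
from math import dist, inf

# Hypothetical helper types and names, for illustration only.
Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(clusters: list[list[Point]]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)] for a partition into 2 <= k < n clusters."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    overall = centroid([p for c in clusters for p in c])
    cents = [centroid(c) for c in clusters]
    # B: size-weighted squared distance of each cluster centroid to the overall mean
    between = sum(len(c) * sq_dist(m, overall) for c, m in zip(clusters, cents))
    # W: squared distance of each point to its own cluster centroid
    within = sum(sq_dist(p, m) for c, m in zip(clusters, cents) for p in c)
    if within == 0:
        return inf  # every cluster collapses to a single repeated point
    return (between / (k - 1)) / (within / (n - k))


def average_linkage(a: list[Point], b: list[Point]) -> float:
    """Mean Euclidean distance over all cross-cluster pairs (UPGMA); an assumed linkage."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def best_cut(points: list[Point]) -> tuple[int, list[list[Point]]]:
    """Build the dendrogram bottom-up and return the level (k, partition) with maximal CH."""
    if len(points) < 3:
        raise ValueError("need at least 3 points to compare cuts with 2 <= k < n")
    clusters = [[p] for p in points]
    best_k, best_score, best_partition = 0, -inf, []
    # Each merge produces the next dendrogram level, so k runs from n - 1 down to 2.
    while len(clusters) > 2:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        score = calinski_harabasz(clusters)
        if score > best_score:
            best_k, best_score = len(clusters), score
            best_partition = [list(c) for c in clusters]
    return best_k, best_partition


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    k, partition = best_cut(data)
    print(k)  # prints 2
```

On the six toy points, the rule selects the two well-separated groups. Each merge evaluates every point pair, so this naive loop is only meant to keep every step visible. For real data, SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster` (with `criterion="maxclust"`) combined with scikit-learn's `calinski_harabasz_score` would be the usual building blocks for the same search.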
Pages: 119-134
Number of pages: 16