Automatic identification of the number of clusters in hierarchical clustering

Cited by: 34
Authors
Karna, Ashutosh [1 ,2 ]
Gibert, Karina [3 ]
Affiliations
[1] HP Inc, Printing & Digital Mfg 3D, Catalonia, Spain
[2] Univ Politecn Cataluna, BarcelonaTech, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
[3] Univ Politecn Cataluna, BarcelonaTech, Knowledge Engn & Machine Learning Grp, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
Keywords
Hierarchical clustering; Calinski-Harabasz index; Scalability; Data Science; 3D Printing; Decision Support; ALGORITHM; PLUS;
DOI
10.1007/s00521-021-05873-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical clustering is one of the most suitable tools for discovering the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on the inner data structure and is preferred to partitional methods. However, determining the resulting number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, and this represents a major challenge in building a fully automatic system such as those required for decision support in Industry 4.0. This research proposes a general criterion to perform the cut of a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by the experts, and a winning criterion is determined. This research is framed in a bigger project to build an Intelligent Decision Support System to assess the performance of 3D printers based on sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from the 3D printers, and the huge reduction in CPU time is shown by comparing the CPU time before and after this modification of the entire clustering method. It also reduces the dependence on a human expert to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied in an automatic mode in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
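The general idea of cutting a dendrogram with the Calinski-Harabasz index can be illustrated with a minimal sketch. This is not one of the six criteria compared in the paper, which the abstract does not specify; it is only the plain "pick the cut with the highest index" baseline. Ward's merging rule, the function names (`ward_partitions`, `calinski_harabasz`, `best_cut`), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(cluster):
    """Sum of squared distances from each point to its cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def ward_partitions(points):
    """Agglomerate bottom-up (Ward's rule, an assumption of this sketch).

    Returns {k: list of clusters} for every level k of the hierarchy.
    """
    clusters = [[p] for p in points]
    partitions = {len(clusters): [list(c) for c in clusters]}

    def merge_cost(pair):
        # Increase in within-cluster sum of squares caused by the merge.
        a, b = clusters[pair[0]], clusters[pair[1]]
        return within_ss(a + b) - within_ss(a) - within_ss(b)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=merge_cost)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions


def calinski_harabasz(clusters):
    """CH = (B / (k - 1)) / (W / (n - k)); defined for 2 <= k <= n - 1."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    overall = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(within_ss(c) for c in clusters)
    if within == 0:  # degenerate case, e.g. duplicated points
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def best_cut(points, k_max=None):
    """Return (k, clusters, scores) for the cut with the highest CH index."""
    n = len(points)
    k_max = min(k_max or n - 1, n - 1)
    partitions = ward_partitions(points)
    scores = {k: calinski_harabasz(partitions[k]) for k in range(2, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, partitions[best_k], scores


# Hypothetical toy data: three well-separated groups of four 2D points.
toy_points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
best_k, clusters, scores = best_cut(toy_points)
print(best_k)  # 3 for this toy data
```

The exhaustive pair search keeps the sketch short but makes it suitable only for small inputs; a production system would more likely build the hierarchy with an optimized library and score only a limited range of cuts.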
Pages: 119-134
Number of pages: 16