Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning

Authors
Bogdanoski, Konstantin [1]
Mishev, Kostadin [1]
Trajanov, Dimitar [1]
Affiliations
[1] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Rugjer Boshkovikj 16, Skopje, North Macedonia
Source
DELTA: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS | 2022
Keywords
Unsupervised Learning; Clustering; Hierarchical Clustering; Data Visualization; Machine Learning; Algorithm Optimisation; Machine Learning Tools; Blanket Clusterer; Silhouette Score
DOI
10.5220/0011276000003277
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a generic hierarchical clustering algorithm, named Blanket Clusterer, which allows researchers to examine their data and verify the results obtained from other machine learning techniques. We also integrate a three-dimensional visualization plugin that provides a better understanding of the clustering results. We verify the tool on a specific use case: measuring the performance of clustering techniques on a textual dataset based solely on ICD-9 descriptions encoded using Word2Vec distributed representations. The verification shows that Blanket Clusterer provides an efficient pipeline for evaluating and interpreting the most frequently used clustering methods in unsupervised learning.
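The keywords list the silhouette score, a standard measure for comparing clusterings, but this record does not say how the tool computes or uses it. The following is only a minimal sketch of that measure, assuming Euclidean distance and toy 2-D vectors standing in for Word2Vec embeddings; the function names and data are hypothetical and are not taken from Blanket Clusterer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of 2-D vectors.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

A labelling that follows the two groups scores close to 1, while one that cuts across them scores below 0, which is what makes the measure usable for ranking alternative clusterings of the same data.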
Pages: 125-131
Number of pages: 7