Impact of Dataset Scaling on Hierarchical Clustering: A Comparative Analysis of Distance-Based and Ratio-Based Methods

Times cited: 0
Author
Alzahrani, Ali Rashash R. [1]
Affiliation
[1] Umm Al Qura Univ, Fac Sci, Math Dept, Mecca, Saudi Arabia
Keywords
distance-type methods; ratio-type method; median linkage; centroid linkage; average linkage
DOI
10.28924/2291-8639-22-2024-36
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In this study, distance-based agglomerative hierarchical clustering techniques were compared with a ratio-based approach. Two real datasets, previously used by Roux (2018), were considered. It was first observed that the type of scaling applied to the datasets affects the results of hierarchical clustering; therefore, various scaling methods were applied before hierarchical clustering was performed. Two rank-based goodness-of-fit measures were then used to evaluate the hierarchical clustering methods. In contrast to the findings of Roux (2018), the distance-based methods, such as median linkage, average linkage, and centroid linkage, performed better than the ratio-based method. The ratio-based method also produced branch crossings in the hierarchical clustering dendrogram. Consequently, this study shows that, with appropriate dataset scaling, distance-based methods outperform ratio-based methods in terms of goodness-of-fit measures.
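The record does not name the scaling methods or the two rank-based goodness-of-fit measures. The following is a minimal sketch under several assumptions that the source does not confirm. It uses z-score standardization as the scaling step and average linkage as the distance-based method. It takes Spearman's rank correlation between the original pairwise distances and the cophenetic (merge-height) distances as one possible rank-based fit measure. The sketch shows how such a score could be compared on raw and scaled data. The dataset and all function names are hypothetical, and the code is pure Python, so the clustering step is naive and only suited to small examples.

```python
import math
from itertools import combinations


def zscore_columns(data):
    """Scale each column to mean 0 and population standard deviation 1."""
    n = len(data)
    scaled_cols = []
    for col in zip(*data):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # A constant column carries no information; map it to zeros.
        scaled_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_cophenetic(points):
    """Naive average-linkage (UPGMA) clustering.

    Returns (d, coph): dicts keyed by index pairs (i, j) with i < j, holding the
    original Euclidean distance and the cophenetic distance (merge height).
    """
    n = len(points)
    d = {(i, j): euclidean(points[i], points[j])
         for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}
    coph = {}

    def between(a, b):
        # Average linkage: mean of all original distances across the two clusters.
        total = sum(d[(min(p, q), max(p, q))]
                    for p in clusters[a] for q in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    next_id = n
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        height = between(a, b)
        # Every pair split across the two merged clusters joins at this height.
        for p in clusters[a]:
            for q in clusters[b]:
                coph[(min(p, q), max(p, q))] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return d, coph


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


def cophenetic_fit(points):
    """Hypothetical rank-based fit: Spearman(original, cophenetic distances)."""
    d, coph = average_linkage_cophenetic(points)
    keys = sorted(d)
    return spearman([d[k] for k in keys], [coph[k] for k in keys])


if __name__ == "__main__":
    # Hypothetical data: the second variable is on a much larger scale.
    raw = [[1.0, 1000.0], [1.2, 3000.0], [5.0, 1100.0], [5.1, 2900.0], [3.0, 2000.0]]
    print("fit on raw data:     ", round(cophenetic_fit(raw), 3))
    print("fit on z-scored data:", round(cophenetic_fit(zscore_columns(raw)), 3))
```

In the hypothetical data, the raw Euclidean distances are driven almost entirely by the second column. Standardization can therefore change which observations merge first, and with it the fit score. The centroid and median linkages named in the abstract would need a different cluster-distance update rule (for example, the Lance-Williams formula), which this sketch does not implement.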
Pages: 14