How Good Is The Euclidean Distance Metric For The Clustering Problem

Times cited: 18
Author
Bouhmala, Noureddine [1]
Affiliation
[1] Univ Coll Southeast Norway, Dept Technol & Maritime Innovat, Borre, Norway
Source
PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016 | 2016
Keywords
clustering problem; Euclidean distance; K-means; algorithm
DOI
10.1109/IIAI-AAI.2016.26
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Data mining is concerned with the discovery of interesting patterns and knowledge in data repositories. Cluster analysis, one of the core methods of data mining, is the process of discovering homogeneous groups called clusters. Given a data set and some measure of similarity between data objects, the goal of most clustering algorithms is to maximize both the homogeneity within each cluster and the heterogeneity between different clusters. In this work, test cases are used to demonstrate that the Euclidean distance, widely used in the literature, is not a suitable metric for capturing the quality of a clustering.
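The abstract frames clustering quality as within-cluster homogeneity versus between-cluster heterogeneity. As a minimal sketch (not the paper's test cases), the two Euclidean-based quantities commonly used for this can be computed as below. The function names, the toy data, and the choice of minimum centroid separation as the heterogeneity measure are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def group_by_label(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Collect the points of each cluster under its label."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(members: List[Point]) -> Point:
    """Coordinate-wise mean of the points in one cluster."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def within_cluster_sse(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Homogeneity proxy: sum of squared Euclidean distances from each point
    to its cluster centroid (the K-means objective). Lower means tighter clusters."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total


def min_centroid_separation(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Heterogeneity proxy (an assumed, illustrative choice): the smallest
    Euclidean distance between any two cluster centroids. Higher means better
    separated. Requires at least two clusters."""
    cents = [centroid(m) for m in group_by_label(points, labels).values()]
    return min(
        math.dist(cents[i], cents[j])
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )


# Hypothetical toy data: two tight groups far apart on the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
natural = [0, 0, 1, 1]  # groups nearby points together
crossed = [0, 1, 0, 1]  # pairs each point with a distant one

print(within_cluster_sse(points, natural), min_centroid_separation(points, natural))  # 1.0 10.0
print(within_cluster_sse(points, crossed), min_centroid_separation(points, crossed))  # 100.0 1.0
```

On this toy input the natural partition scores better on both counts under the Euclidean view. The paper's claim is that Euclidean-based measures of this kind do not reliably capture clustering quality in general, which a case this simple does not test.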
Pages: 312-315
Page count: 4