A novel cluster validity index based on augmented non-shared nearest neighbors

Cited by: 11
Authors
Duan, Xinjie [1]
Ma, Yan [1]
Zhou, Yuqing [1]
Huang, Hui [1]
Wang, Bin [1]
Affiliations
[1] Shanghai Normal University, College of Information, Mechanical and Electrical Engineering, Shanghai 200234, People's Republic of China
Keywords
Validity index; Within-cluster compactness; Between-cluster separation; Shared nearest neighbors; INTERNAL INDEX; INFORMATION; VALIDATION; FIND
DOI
10.1016/j.eswa.2023.119784
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In practical applications, the true number of clusters in a dataset is rarely known in advance, so a cluster validity index is needed to evaluate clustering results and determine the optimal number of clusters. However, the performance of existing cluster validity indices is sensitive to factors such as cluster shape and density. To address this, the paper proposes a new cluster validity index based on augmented non-shared nearest neighbors (ANCV). The ANCV index rests on two principles: (1) within-cluster compactness can be measured by the distance between pairs of data points that have fewer shared nearest neighbors; (2) between-cluster separation can be estimated from the distances between pairs of data points at the intersection of clusters. These point pairs are then extended to their augmented non-shared nearest neighbors, forming small clusters. The average distances within and between these small clusters are calculated to estimate the within-cluster compactness and the between-cluster separation, respectively. Finally, the optimal number of clusters is determined by the difference between the between-cluster separation and the within-cluster compactness. Experimental results on 12 two-dimensional synthetic datasets and 10 real datasets from UCI show that the ANCV index performs best among all compared indices.
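The "separation minus compactness" idea in the abstract can be illustrated with a minimal sketch. This is not the ANCV index itself: the record gives no formulas, and the extension to augmented non-shared nearest neighbors is omitted here. The function names `k_nearest` and `toy_validity`, the neighborhood size `k`, and the tie-breaking rule are all assumptions made for illustration. Compactness is taken from the same-cluster pair with the fewest shared nearest neighbors, and separation from the closest pair of points across two clusters.

```python
import math
from itertools import combinations


def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neigh = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neigh.append(set(others[:k]))
    return neigh


def toy_validity(points, labels, k=3):
    """Simplified stand-in score (assumption, not ANCV): separation - compactness.

    Higher values indicate a better partition under this toy definition.
    """
    neigh = k_nearest(points, k)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to measure separation")

    # Compactness: per cluster, the distance of the pair sharing the fewest
    # nearest neighbors (ties broken by the larger distance), then averaged.
    compact = []
    for members in clusters.values():
        if len(members) < 2:
            compact.append(0.0)
            continue
        i, j = min(
            combinations(members, 2),
            key=lambda ij: (
                len(neigh[ij[0]] & neigh[ij[1]]),
                -math.dist(points[ij[0]], points[ij[1]]),
            ),
        )
        compact.append(math.dist(points[i], points[j]))

    # Separation: per cluster pair, the smallest cross-cluster distance,
    # then averaged over all cluster pairs.
    separate = [
        min(math.dist(points[i], points[j]) for i in a for j in b)
        for a, b in combinations(clusters.values(), 2)
    ]
    return sum(separate) / len(separate) - sum(compact) / len(compact)


# Two well-separated square blobs (made-up data for the example).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = toy_validity(points, [0, 0, 0, 0, 1, 1, 1, 1])
bad = toy_validity(points, [0, 0, 1, 1, 0, 0, 1, 1])
print(good > bad)
```

On this toy data the labeling that matches the two blobs scores higher than the labeling that mixes them. To pick a cluster number in the same spirit, one would score the partitions produced for several candidate values and keep the one with the largest score.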
Pages: 16