Spectral embedded generalized mean based κ-nearest neighbors clustering with S-distance

被引:37
作者
Sharma, Krishna Kumar [1 ,2 ]
Seal, Ayan [1 ]
机构
[1] PDPM Indian Inst Informat Technol Design & Mfg Ja, Dept Comp Sci & Engn, Jabalpur 482005, Madhya Pradesh, India
[2] Univ Kota, Dept Comp Sci & Informat, Kota Rajasthan 324005, India
关键词
S-distance; Spectral clustering; Symmetry favored k-nearest neighbors; Generalized mean; Distributed computing;
D O I
10.1016/j.eswa.2020.114326
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The spectral clustering algorithm is extensively employed in different aspects, especially in the field of pattern recognition. However, the efficient construction of the neighborhood graph is the main reason for its promising results. Generally, the similarity matrix relies on the applied similarity measure between two data points, selection of k-nearest neighbors (KNN), and approach for the construction of a neighborhood graph. In this study, we integrate S-distance to spectral clustering, which is capable to find out the complex and non-linear cluster structures. Moreover, generalized mean distance-based KNN is proposed to decrease the sensitiveness towards the value of the k. Also, a symmetry-favored KNN method is applied to construct the neighborhood graph, which reduces the impact of outliers and noisy data points. However, spectral clustering faces scalability and speedup issues in the case of large size datasets. Thus, the proposed spectral clustering algorithm is also executed in distributed environments. Several experiments are performed to validate the proposed clustering algorithm on 20 real-world datasets and 3 large size datasets. Experimental results demonstrate that the proposed clustering algorithm outperforms some of the baseline methods in terms of accuracy and clustering error rates. Finally, we conduct Wilcoxon's Rank-Sum test and illustrate that the proposed spectral clustering algorithm is statistically significant.
引用
收藏
页数:10
相关论文
共 63 条
[1]   Clustering for Metric and Nonmetric Distance Measures [J].
Ackermann, Marcel R. ;
Bloemer, Johannes ;
Sohler, Christian .
ACM TRANSACTIONS ON ALGORITHMS, 2010, 6 (04)
[2]  
Almalawi A, 2013, C LOCAL COMPUT NETW, P639, DOI 10.1109/LCN.2013.6761301
[3]  
[Anonymous], 1998, MPI THE COMPLETE REF
[4]  
[Anonymous], 2007, NIPS
[5]  
[Anonymous], 12 PAC C COMP GRAPH
[6]   An updated set of Basic Linear Algebra Subprograms (BLAS) [J].
Blackford, LS ;
Demmel, J ;
Dongarra, J ;
Duff, I ;
Hammarling, S ;
Henry, G ;
Heroux, M ;
Kaufman, L ;
Lumsdaine, A ;
Petitet, A ;
Pozo, R ;
Remington, K ;
Whaley, RC .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2002, 28 (02) :135-151
[7]  
Bridson M. R., 1999, METRIC SPACES NON PO
[8]  
Chakraborty S, 2020, PR MACH LEARN RES, V108, P691
[9]   k-Means clustering with a new divergence-based distance metric: Convergence and performance analysis [J].
Chakraborty, Saptarshi ;
Das, Swagatam .
PATTERN RECOGNITION LETTERS, 2017, 100 :67-73
[10]   Spectral clustering: A semi-supervised approach [J].
Chen, Weifu ;
Feng, Guocan .
NEUROCOMPUTING, 2012, 77 (01) :229-242