Spectral Clustering Approach with K-Nearest Neighbor and Weighted Mahalanobis Distance for Data Mining

Cited by: 9
Authors
Yin, Lifeng [1]
Lv, Lei [1]
Wang, Dingyi [2]
Qu, Yingwei [1]
Chen, Huayue [3]
Deng, Wu [4, 5]
Affiliations
[1] Dalian Jiaotong Univ, Sch Software, Dalian 116028, Peoples R China
[2] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing 100044, Peoples R China
[3] China West Normal Univ, Sch Comp Sci, Nanchong 637002, Peoples R China
[4] Civil Aviat Univ China, Coll Elect Informat & Automat, Tianjin 300300, Peoples R China
[5] Southwest Jiaotong Univ, State Key Lab Tract Power, Chengdu 610031, Peoples R China
Keywords
data mining; spectral clustering; Mahalanobis distance; Laplace matrix; K-means clustering
DOI
10.3390/electronics12153284
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a spectral clustering method using k-means and weighted Mahalanobis distance (referred to as MDLSC) to enhance the degree of correlation between data points and improve the clustering accuracy of Laplacian matrix eigenvectors. First, we used the correlation coefficient as the weight of the Mahalanobis distance to calculate the weighted Mahalanobis distance between any two data points and constructed the weighted Mahalanobis distance matrix of the data set; then, based on this distance matrix, we used the K-nearest neighbor (KNN) algorithm to construct the similarity matrix. Second, the regularized Laplacian matrix was calculated from the similarity matrix, then normalized and decomposed to obtain the feature space for clustering. This method takes into account both the degree of linear correlation between data points and the special spatial structure of the data, and achieves accurate clustering. Finally, various spectral clustering algorithms were used to conduct multi-angle comparative experiments on artificial and UCI data sets. The experimental results show that MDLSC has advantages on each clustering index and yields better clustering quality. The distribution of the eigenvectors also shows that the similarity matrix calculated by MDLSC is more reasonable, and that the eigenvectors of the Laplacian matrix retain the distribution characteristics of the original data to the greatest extent, thereby improving the accuracy of the clustering algorithm.
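The pipeline described in the abstract (weighted Mahalanobis distance matrix, KNN similarity matrix, normalized Laplacian, eigendecomposition, k-means) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no formulas, so the correlation-based feature weights, the Gaussian kernel on KNN edges, the choice of `sigma`, and every function name (`weighted_mahalanobis_matrix`, `knn_similarity`, `spectral_embedding`, `kmeans`, `mdlsc_sketch`) are assumptions made for this example.

```python
import numpy as np


def weighted_mahalanobis_matrix(X):
    """Pairwise distances under an ASSUMED correlation-weighted Mahalanobis metric."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    # Hypothetical weighting: mean absolute correlation of each feature.
    w = np.sqrt(np.abs(corr).mean(axis=0))
    # D^{1/2} S^{-1} D^{1/2} stays positive semidefinite.
    M = w[:, None] * np.linalg.pinv(cov) * w[None, :]
    diff = X[:, None, :] - X[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def knn_similarity(D, k, sigma=None):
    """Symmetric KNN graph with Gaussian weights (the kernel is an assumption)."""
    n = D.shape[0]
    # Skip the first sorted column, which is normally the point itself.
    idx = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    if sigma is None:
        sigma = max(np.median(D[rows, cols]), 1e-12)
    W = np.zeros_like(D)
    W[rows, cols] = np.exp(-D[rows, cols] ** 2 / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it


def spectral_embedding(W, n_clusters):
    """Row-normalized eigenvectors of the symmetric normalized Laplacian."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :n_clusters]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)


def kmeans(U, n_clusters, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def mdlsc_sketch(X, n_clusters, k=10):
    """Chain the four steps; returns one integer label per row of X."""
    D = weighted_mahalanobis_matrix(X)
    W = knn_similarity(D, k)
    U = spectral_embedding(W, n_clusters)
    return kmeans(U, n_clusters)
```

Calling `mdlsc_sketch(X, n_clusters=3, k=8)` on an `(n_samples, n_features)` array returns a label vector of length `n_samples`. The covariance and correlation here are estimated once over the whole data set, which is one possible reading of the abstract; the paper itself should be consulted for the exact weighting and Laplacian regularization.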
Pages: 23