Local standard deviation spectral clustering

Cited by: 6
Authors
Xie, Juanying [1 ]
Zhou, Ying [1 ]
Ding, Lijuan [1 ]
Affiliation
[1] Shaanxi Normal Univ, Sch Comp Sci, Xian, Shaanxi, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP) | 2018
Funding
National Natural Science Foundation of China;
Keywords
Spectral Clustering; NJW; Self-Tuning Spectral Clustering; Standard Deviation; Clustering;
DOI
10.1109/BigComp.2018.00043
CLC Classification
TP301 [Theory and Methods];
Subject Classification
081202;
Abstract
The global scaling parameter sigma may cause the typical spectral clustering algorithm NJW to fail to discover the true clustering of a data set, especially when the data set contains multiple scales. Although the Self-Tuning spectral clustering algorithm overcomes this weakness of NJW by introducing a local scaling parameter sigma_i for each point i, that local scaling parameter can be affected by outliers. To avoid the deficiencies of both the NJW and the Self-Tuning spectral clustering algorithms, the local standard deviation spectral clustering algorithm, named SCSD for short, is proposed in this paper. SCSD defines the local standard deviation scaling parameter sigma_std_i via the standard deviation of point i together with its top p nearest neighbors, and uses it in place of the local scaling parameter sigma_i of Self-Tuning spectral clustering. As a consequence, the affinity matrix in SCSD can reflect the original distribution of a data set as faithfully as possible. The power of the proposed SCSD was tested on benchmark data sets, including challenging synthetic data sets and real-world data sets from the UCI machine learning repository, as well as on synthetically generated big data with noise. Its performance was compared with that of NJW and Self-Tuning in terms of the popular benchmark metrics accuracy (Acc), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI). The extensive experimental results demonstrate that the proposed SCSD spectral clustering algorithm is superior to NJW and Self-Tuning, finds the true distribution of the data sets as faithfully as possible, and can be applied to detect patterns in big data.
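The abstract's key change is replacing Self-Tuning's local scaling parameter sigma_i with a local standard deviation scaling parameter sigma_std_i computed from point i and its top p nearest neighbors. The short Python sketch below illustrates one plausible reading of that construction; the neighborhood size p, the choice of taking the standard deviation over the distances from point i to its p nearest neighbors, and the Self-Tuning-style affinity exp(-d_ij^2 / (sigma_std_i * sigma_std_j)) are assumptions made for illustration, not details confirmed by the paper.

import numpy as np

def scsd_affinity(X, p=7):
    """Illustrative SCSD-style affinity matrix.

    Assumption: sigma_std_i is taken as the standard deviation of the
    distances from point i to its top p nearest neighbors; the paper's
    exact definition may differ.
    """
    # Pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Distances from each point to its p nearest neighbors (excluding itself)
    knn_dists = np.sort(D, axis=1)[:, 1:p + 1]
    # Local standard-deviation scaling parameter sigma_std_i
    sigma_std = knn_dists.std(axis=1) + 1e-12  # guard against division by zero
    # Self-Tuning-style affinity with sigma_std_i in place of sigma_i
    W = np.exp(-(D ** 2) / np.outer(sigma_std, sigma_std))
    np.fill_diagonal(W, 0.0)  # no self-affinity, as in NJW
    return W

The remaining steps would follow NJW: form the normalized Laplacian from W, take the top k eigenvectors, row-normalize them, and run k-means on the rows.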
Pages: 242-250
Page count: 9