Local standard deviation spectral clustering

Times cited: 6
Authors
Xie, Juanying [1]
Zhou, Ying [1]
Ding, Lijuan [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Xian, Shaanxi, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP) | 2018
Funding
National Natural Science Foundation of China
Keywords
Spectral Clustering; NJW; Self-Tuning Spectral Clustering; Standard Deviation; Clustering
DOI
10.1109/BigComp.2018.00043
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The global scaling parameter sigma may cause the typical spectral clustering algorithm NJW to fail to discover the true clustering of a data set, especially when the data set contains multiple scales. Although the Self-Tuning spectral clustering algorithm overcomes this weakness of NJW by introducing a local scaling parameter sigma(i) for each point i, that local scaling parameter may be affected by outliers. To avoid the deficiencies of both the NJW and the Self-Tuning spectral clustering algorithms, this paper proposes local standard deviation spectral clustering, SCSD for short. SCSD defines a local standard deviation scaling parameter sigma(std_i), computed as the standard deviation of point i with its top p nearest neighbors, and uses it in place of the local scaling parameter sigma(i) of Self-Tuning spectral clustering. As a consequence, the affinity matrix in SCSD reflects the original distribution of a data set as closely as possible. SCSD was tested on benchmark data sets, including challenging synthetic data sets and real-world data sets from the UCI machine learning repository, and on synthetically generated comparative big data with noise. Its performance was compared with that of NJW and Self-Tuning in terms of the popular benchmark metrics accuracy (Acc), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI). The extensive experimental results demonstrate that SCSD is superior to NJW and Self-Tuning, can find the true distribution of the data sets as far as possible, and can be applied to detect patterns in big data.
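As a rough illustration of the idea in the abstract, the sketch below builds an affinity matrix from a per-point scale derived from each point's p nearest neighbors. The abstract does not give the exact formula, so two things here are assumptions rather than the authors' definitions: sigma(std_i) is taken as the root mean square of the distances from point i to its p nearest neighbors, and the affinity uses the Self-Tuning form exp(-d(i,j)^2 / (sigma_i * sigma_j)). The function names are hypothetical, and the eigen-decomposition and k-means steps of spectral clustering are omitted.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_std_scales(dist, p):
    """Per-point scale from the p nearest neighbors.

    ASSUMPTION: root mean square of the distances from point i to its
    p nearest neighbors; the paper's exact definition may differ.
    Requires at least two points.
    """
    scales = []
    for i, row in enumerate(dist):
        # Distances to all other points, smallest first, truncated to p.
        neighbours = sorted(d for j, d in enumerate(row) if j != i)[:p]
        rms = math.sqrt(sum(d * d for d in neighbours) / len(neighbours))
        # Guard against a zero scale when neighbors coincide with point i.
        scales.append(max(rms, 1e-12))
    return scales


def affinity_matrix(points, p):
    """Locally scaled Gaussian affinity with a zero diagonal.

    ASSUMPTION: Self-Tuning style kernel exp(-d_ij^2 / (sigma_i * sigma_j)).
    """
    dist = pairwise_distances(points)
    sigma = local_std_scales(dist, p)
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return affinity


# Two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
A = affinity_matrix(points, p=1)
```

In this toy input, points within a pair receive a much larger affinity than points in different pairs, which is the behavior a locally scaled kernel is meant to produce.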
Pages: 242-250
Page count: 9