Large scale spectral clustering based on fast landmark sampling

被引:2
作者
Ye M. [1 ,2 ]
Liu W. [1 ,2 ]
机构
[1] PLA Information Engineering University, Zhengzhou
[2] State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou
来源
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | 2017年 / 39卷 / 02期
基金
中国国家自然科学基金;
关键词
Approximate singular value decomposition; Big data; Landmark sampling; Spectral clustering;
D O I
10.11999/JEIT160260
中图分类号
学科分类号
摘要
The applicability of traditional spectral clustering is limited by its high complexity in large-scale data sets. Through construction of affinity matrix between landmark points and data points, the Landmark-based Spectral Clustering (LSC) algorithm can significantly reduce the computational complexity of spectral embedding. It is vital for clustering results to apply the suitable strategies of the generation of landmark points. While considering big data problems, the existing generation strategies of landmark points face some deficiencies: the unstable results of random sampling, along with the unknown convergence time and the repeatability of data reading in k-means centers method. In this paper, a rapid landmark-sampling spectral clustering algorithm based on the approximate singular value decomposition is designed, which makes the sampling probability of each landmark point decided by the row norm of the approximate singular vector matrix. Compared with LSC algorithm based on random sampling, the clustering result of new algorithm is more stable and accurate; compared with LSC algorithm based on k-means centers, the new algorithm reduces the computational complexity. Moreover, the preservation of information in original data is analyzed for the landmark-sampling results theoretically. At the same time, the performance of new approach is verified by the experiments in some public data sets. © 2017, Science Press. All right reserved.
引用
收藏
页码:278 / 284
页数:6
相关论文
共 23 条
[1]  
He Q., Li N., Luo W., Et al., A survey of machine learning algorithms for big data, Pattern Recognition and Artificial Intelligence, 27, 4, pp. 327-336, (2014)
[2]  
Ding S., Jia H., Zhang L., Et al., Research of semi-supervised spectral clustering algorithm based on pairwise constraints, Neural Computing and Applications, 24, 1, pp. 211-219, (2014)
[3]  
Ng A.Y., Jordan M.I., Weiss Y., On spectral clustering: Analysis and an algorithm, Neural Information Processing Systems: Natural and Synthetic, pp. 849-856, (2001)
[4]  
Fowlkes C., Belongie S., Chung F., Et al., Spectral grouping using the Nystrom method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 2, pp. 214-225, (2004)
[5]  
Li M., Kwok J.T., Lu B.L., Making large-scale Nyström approximation possible, Proceedings of the 27th International Conference on Machine Learning, pp. 631-638, (2010)
[6]  
Li M., Bi W., Kwork J.T., Et al., Large-scale Nyström kernel matrix approximation using randomized SVD, IEEE Transactions on Neural Networks and Learning Systems, 26, 1, pp. 152-164, (2015)
[7]  
Ding S., Jia H., Shi Z., Spectral clustering algorithm based on adaptive Nyström sampling for big data analysis, Journal of Software, 25, 9, pp. 2037-2049, (2014)
[8]  
Yan D., Huang L., Jordan M.I., Fast approximate spectral clustering, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907-916, (2009)
[9]  
Chen X., Cai D., Large scale spectral clustering with landmark-based representation, Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pp. 313-318, (2011)
[10]  
Cai D., Chen X., Large scale spectral clustering via landmark-based sparse representation, IEEE Transactions on Cybernetics, 45, 8, pp. 1669-1680, (2015)