An improved semi-supervised clustering algorithm based on initial center points

Cited by: 0
Authors
Xia, Zhanguo [1]
Cai, Shiyu [1]
Zhang, Wentao [1]
Affiliation
[1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, Jiangsu Province
Keywords
Initial center points; Initialization; K-means; Labeled data; Semi-supervised
DOI
10.4156/jcit.vol7.issue5.38
Abstract
Because labeled data is widely available, this paper proposes an improved semi-supervised clustering algorithm that chooses its initial center points from a limited amount of labeled data. The algorithm uses the small labeled set to guide clustering and improves the initial partition through center-point initialization. It then selects the best clustering according to the labeled information and the objective function. Experimental results on UCI datasets support the accuracy and effectiveness of the algorithm: both clustering accuracy and efficiency are reported to improve considerably.
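The idea of initializing centers from labeled data can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `seeded_kmeans`, its parameters, and the choice of the per-class mean as the initial center are all assumptions made for illustration, and the paper's final selection step is not reproduced. The sketch only returns the within-cluster sum of squared distances, which is the usual K-means objective.

```python
import math


def seeded_kmeans(points, labeled, k, max_iter=100):
    """Hypothetical sketch: K-means whose initial centers come from labeled points.

    points  : list of equal-length tuples of floats
    labeled : dict {point index: class label in range(k)};
              assumes every class has at least one labeled point
    """
    dim = len(points[0])

    def mean(idxs):
        # Component-wise mean of the points at the given indices.
        return tuple(sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim))

    # Initial centers: one per class, averaged over that class's labeled points.
    centers = []
    for c in range(k):
        members = [i for i, lab in labeled.items() if lab == c]
        centers.append(mean(members))

    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: recompute each center from its current members.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)

    # Objective: within-cluster sum of squared Euclidean distances.
    sse = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assign))
    return assign, centers, sse
```

For example, with two well-separated groups of points and one labeled point per group, calling `seeded_kmeans(points, {0: 0, 3: 1}, 2)` starts each center at its labeled point, so the iteration does not depend on a random initialization.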
Pages: 317-324
Page count: 7