An improved initial cluster centers selection algorithm for k-means based on features correlative degree

Cited by: 0
Authors
Chen, Xingshu [1]
Wu, Xiaosong [1]
Wang, Wenxian [1]
Wang, Haizhou [1]
Affiliation
[1] Network and Trusted Computing Inst., College of Computer Sci., Sichuan Univ., Chengdu
Source
Sichuan Daxue Xuebao (Gongcheng Kexue Ban)/Journal of Sichuan University (Engineering Science Edition) | 2015, Vol. 47, No. 1
Keywords
Feature correlative degree; Initial cluster center; K-means; Text clustering
DOI
10.15961/j.jsuese.2015.01.002
Abstract
To address the problem that the K-means algorithm is highly sensitive to the choice of initial cluster centers in text clustering, an initial cluster center selection algorithm based on the correlative degree of features was proposed. After dimensionality reduction, features with a high correlative degree were selected to create a new dataset. A set of candidate initial cluster centers was then constructed by merging similar documents in the new dataset with an “OR operation”. Finally, the best centers were selected from the candidate set by computing document density and applying the minimax principle. On five experimental datasets, most F-scores exceeded 90% and entropies were below 0.5. Compared with the K-means algorithm in Mahout, the improved algorithm selected higher-quality centers and produced better clustering results. © 2015 Editorial Department of Journal of Sichuan University. All rights reserved.
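The abstract describes the center-selection pipeline only at a high level. The Python sketch below illustrates one possible reading of it, under several assumptions that do not come from the paper: documents are already reduced to the high-correlative-degree features and encoded as integer bitmasks (bit i set when feature i occurs); similarity is Jaccard similarity with a hypothetical threshold `sim_threshold`; a document is OR-merged into the first candidate it is similar enough to; a candidate's density is the number of documents within that threshold; and the "minimax principle" is read as the common max-min distance rule (take the densest candidate first, then repeatedly add the candidate whose nearest chosen center is farthest away). All function names are hypothetical.

```python
from typing import List


def popcount(x: int) -> int:
    """Number of set bits, i.e. features present in a document mask."""
    return bin(x).count("1")


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of two binary feature masks."""
    union = a | b
    return popcount(a & b) / popcount(union) if union else 0.0


def build_candidates(docs: List[int], sim_threshold: float) -> List[int]:
    """Hypothetical candidate construction: OR-merge each document into the
    first candidate it is similar enough to, otherwise start a new candidate."""
    candidates: List[int] = []
    for d in docs:
        for i, c in enumerate(candidates):
            if jaccard(d, c) >= sim_threshold:
                candidates[i] = c | d  # the "OR operation" merge
                break
        else:
            candidates.append(d)
    return candidates


def density(candidate: int, docs: List[int], sim_threshold: float) -> int:
    """Assumed density: how many documents lie within the similarity threshold."""
    return sum(1 for d in docs if jaccard(d, candidate) >= sim_threshold)


def select_centers(docs: List[int], k: int, sim_threshold: float = 0.5) -> List[int]:
    """Pick k initial centers: densest candidate first, then max-min distance."""
    candidates = build_candidates(docs, sim_threshold)
    if len(candidates) < k:
        raise ValueError(f"only {len(candidates)} candidates for k={k}")
    dens = [density(c, docs, sim_threshold) for c in candidates]
    chosen = [max(range(len(candidates)), key=lambda i: dens[i])]
    while len(chosen) < k:
        remaining = [i for i in range(len(candidates)) if i not in chosen]

        def min_dist(i: int) -> float:
            # Distance (1 - Jaccard) to the nearest already-chosen center.
            return min(1.0 - jaccard(candidates[i], candidates[j]) for j in chosen)

        # Maximize the minimum distance; break ties by higher density.
        chosen.append(max(remaining, key=lambda i: (min_dist(i), dens[i])))
    return [candidates[i] for i in chosen]


# Two toy topics over six features (bits 0-2 vs. bits 3-5).
docs = [0b000011, 0b000111, 0b000011, 0b110000, 0b111000, 0b110000]
print([bin(c) for c in select_centers(docs, k=2)])  # ['0b111', '0b111000']
```

With these six toy documents, the OR merge collapses each topic into a single candidate mask, and the selection returns one center per topic. Bitmasks keep the OR merge to a single `|`; a real text-clustering system would more likely work with weighted term vectors, and the abstract does not say which representation or distance the authors used.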
Pages: 13-19
Number of pages: 6