Fast modified global k-means algorithm for incremental cluster construction

被引:94
作者
Bagirov, Adil M. [1 ]
Ugon, Julien [1 ]
Webb, Dean [1 ]
机构
[1] Univ Ballarat, Ctr Informat & Appl Optimizat, Grad Sch Informat Technol & Math Sci, Ballarat, Vic 3353, Australia
基金
澳大利亚研究理事会;
关键词
Minimum sum-of-squares clustering; Nonsmooth optimization; k-means algorithm; Global k-means algorithm; MINIMUM SUM; SEARCH; BRANCH;
D O I
10.1016/j.patcog.2010.10.018
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
引用
收藏
页码:866 / 876
页数:11
相关论文
共 35 条
  • [1] Agresti A., 1990, CATEGORICAL DATA ANA
  • [2] ALEX N, 2008, ESANN 2008, P227
  • [3] A TABU SEARCH APPROACH TO THE CLUSTERING PROBLEM
    ALSULTAN, KS
    [J]. PATTERN RECOGNITION, 1995, 28 (09) : 1443 - 1451
  • [4] Bagirov A.M., 2003, TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, V11, P1
  • [5] Bagirov A.M., 2006, Proceedings of the 2006 Workshop on Intelligent Systems for Bioinformatics, V73, P23
  • [6] Modified global k-means algorithm for minimum sum-of-squares clustering problems
    Bagirov, Adil M.
    [J]. PATTERN RECOGNITION, 2008, 41 (10) : 3192 - 3199
  • [7] A Global Optimization Approach to Classification
    Bagirov, Adil M.
    Rubinov, Alexander M.
    Yearwood, John
    [J]. OPTIMIZATION AND ENGINEERING, 2002, 3 (02) : 129 - 155
  • [8] A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems
    Bagirov, AM
    Yearwood, J
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 170 (02) : 578 - 596
  • [9] Bock H. H., 1998, ADV DATA SCI CLASSIF, P265
  • [10] A PRACTICAL APPLICATION OF SIMULATED ANNEALING TO CLUSTERING
    BROWN, DE
    HUNTLEY, CL
    [J]. PATTERN RECOGNITION, 1992, 25 (04) : 401 - 412