Gaussian model-based partitioning using iterated local search

被引:2
作者
Brusco, Michael J. [1 ]
Shireman, Emilie [2 ]
Steinley, Douglas [2 ]
Brudvig, Susan [3 ]
Cradit, J. Dennis [1 ]
机构
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Univ Missouri, Columbia, MO USA
[3] Indiana Univ East, Richmond, IN USA
关键词
clustering; model-based partitioning; heuristics; WITHIN-CLUSTER SUMS; BINARY DATA; ALGORITHM; BRANCH; SELECTION; NUMBER; LIKELIHOOD; VARIABLES; CRITERIA;
D O I
10.1111/bmsp.12084
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
The emergence of Gaussian model-based partitioning as a viable alternative to K-means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model-based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple-restart (multistart) approach commonly used for K-means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart-ILS procedures for minimizing a very general model-based criterion that assumes no restrictions on cluster size or within-group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10-min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.
引用
收藏
页码:1 / 24
页数:24
相关论文
共 87 条
  • [1] An improved column generation algorithm for minimum sum-of-squares clustering
    Aloise, Daniel
    Hansen, Pierre
    Liberti, Leo
    [J]. MATHEMATICAL PROGRAMMING, 2012, 131 (1-2) : 195 - 220
  • [2] NP-hardness of Euclidean sum-of-squares clustering
    Aloise, Daniel
    Deshpande, Amit
    Hansen, Pierre
    Popat, Preyas
    [J]. MACHINE LEARNING, 2009, 75 (02) : 245 - 248
  • [3] Anderberg M.R., 1973, Probability and Mathematical Statistics, DOI DOI 10.1016/C2013-0-06161-0
  • [4] [Anonymous], MULTIVARIATE ANAL 2
  • [5] [Anonymous], THESIS
  • [6] [Anonymous], US MATLAB VERS 6
  • [7] [Anonymous], 1956, B ACAD POL SCI
  • [8] Banfield C. F., 1977, Applied Statistics, V26, P206, DOI 10.2307/2347039
  • [9] MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING
    BANFIELD, JD
    RAFTERY, AE
    [J]. BIOMETRICS, 1993, 49 (03) : 803 - 821
  • [10] Bartholomew D, 2011, WILEY SER PROBAB ST, P1, DOI 10.1002/9781119970583