Smoothed Analysis of the k-Means Method

被引:60
作者
Arthur, David [1 ]
Manthey, Bodo [2 ,4 ]
Roeglin, Heiko [3 ]
机构
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Twente, Dept Appl Math, NL-7500 AE Enschede, Netherlands
[3] Univ Bonn, Inst Informat, Abt 1, D-53113 Bonn, Germany
[4] Univ Saarland, Dept Comp Sci, D-6600 Saarbrucken, Germany
关键词
Algorithms; Theory; Data clustering; k-means method; smoothed analysis; ALGORITHM;
D O I
10.1145/2027216.2027217
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this article, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/sigma, where sigma is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
引用
收藏
页数:31
相关论文
共 32 条
[1]   Clustering for Metric and Nonmetric Distance Measures [J].
Ackermann, Marcel R. ;
Bloemer, Johannes ;
Sohler, Christian .
ACM TRANSACTIONS ON ALGORITHMS, 2010, 6 (04)
[2]  
Ackermann MR, 2009, PROCEEDINGS OF THE TWENTIETH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1088
[3]   NP-hardness of Euclidean sum-of-squares clustering [J].
Aloise, Daniel ;
Deshpande, Amit ;
Hansen, Pierre ;
Popat, Preyas .
MACHINE LEARNING, 2009, 75 (02) :245-248
[4]  
[Anonymous], 1991, Probability: theory and examples
[5]  
[Anonymous], 1973, Pattern Classification and Scene Analysis
[6]  
Arthur D., 2006, Proceedings of the Twenty-Second Annual Symposium on Computational Geometry (SCG'06), P144, DOI 10.1145/1137856.1137880
[7]  
Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
[8]   k-Means has Polynomial Smoothed Complexity [J].
Arthur, David ;
Manthey, Bodo ;
Roeglin, Heiko .
2009 50TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE: FOCS 2009, PROCEEDINGS, 2009, :405-414
[9]   WORST-CASE AND SMOOTHED ANALYSIS OF THE ICP ALGORITHM, WITH AN APPLICATION TO THE k-MEANS METHOD [J].
Arthur, David ;
Vassilvitskii, Sergei .
SIAM JOURNAL ON COMPUTING, 2009, 39 (02) :766-782
[10]  
Badoiu M, 2002, P 34 ANN ACM S THEOR, P250, DOI DOI 10.1145/509907.509947