HOW THE INITIALIZATION AFFECTS THE STABILITY OF THE k-MEANS ALGORITHM

Cited by: 27
Authors
Bubeck, Sebastien [1]
Meila, Marina [2]
von Luxburg, Ulrike [3]
Affiliations
[1] Ctr Recerca Matemat Barcelona, Barcelona, Spain
[2] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[3] Max Planck Inst Biol Cybernet, Tübingen, Germany
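Keywords
Clustering; k-means; stability; model selection
DOI
10.1051/ps/2012013
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract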
We investigate the role of initialization in the stability of the k-means clustering algorithm. Unlike other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we exploit the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the cost of the solution. We analyze when different initializations lead to the same local optimum and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
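The idea in the abstract can be illustrated with a minimal sketch, which is not the paper's own protocol. It assumes a plain Lloyd iteration started from k data points drawn uniformly at random, and it measures instability as the average pairwise disagreement (one minus the Rand index) between runs that differ only in their initialization. The function names `lloyd`, `disagreement`, and `instability`, the synthetic three-blob data, and the choice of distance are all illustrative assumptions.

```python
import numpy as np


def lloyd(X, k, rng, n_iter=100):
    """Plain Lloyd iterations, started from k distinct data points chosen at random."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute centers; an empty cluster keeps its old center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # reached a fixed point (a local optimum of the objective)
        centers = new_centers
    return labels


def disagreement(a, b):
    """Fraction of point pairs grouped differently by two clusterings (1 - Rand index)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return np.mean(same_a[upper] != same_b[upper])


def instability(X, k, n_runs=10, seed=0):
    """Mean pairwise disagreement between runs that differ only in initialization."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(X, k, rng) for _ in range(n_runs)]
    scores = [
        disagreement(runs[i], runs[j])
        for i in range(n_runs)
        for j in range(i + 1, n_runs)
    ]
    return float(np.mean(scores))


# Hypothetical data: three well-separated Gaussian blobs in the plane.
data_rng = np.random.default_rng(0)
X = np.vstack([
    data_rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
])

for k in range(2, 6):
    print(k, instability(X, k))
```

A low score for a given k means that most initializations ended in the same clustering, while a high score means they ended in different local optima. Selecting k by comparing such scores is the kind of procedure the abstract refers to, although the paper's formal setting and distance between clusterings may differ from this sketch.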
Pages: 436-452
Page count: 17