Approximating k-means-type clustering via semidefinite programming

被引:111
作者
Peng, Jiming [1 ]
Wei, Yu [1 ]
机构
[1] McMaster Univ, Dept Comp & Software, Adv Optimizat Lab, Hamilton, ON L8S 4K1, Canada
关键词
k-means clustering; principal component analysis; 0-1; SDP; relaxation; computational complexity; approximation;
D O I
10.1137/050641983
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
One of the fundamental clustering problems is to assign n points into k clusters based on minimal sum-of-squared distances (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for several clustering approaches such as normalized k- cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via the linear programming and SDP relaxations. Second, we consider the issue of how to extract a feasible solution of the original 0-1 SDP model from the optimal solution of the relaxed SDP problem. By using principal component analysis, we develop a rounding procedure to construct a feasible partitioning from a solution of the relaxed problem. In our rounding procedure, we need to solve a K-means clustering problem in Rk - 1, which can be done in O(n(k2 - 2k + 2)) time. In case of biclustering, the running time of our rounding procedure can be reduced to O(n log n). We show that our algorithm provides a 2-approximate solution to the original problem. Promising numerical results for biclustering based on our new method are reported.
引用
收藏
页码:186 / 205
页数:20
相关论文
共 45 条
[1]   Exact and approximation algorithms for clustering [J].
Agarwal, PK ;
Procopiuc, CM .
ALGORITHMICA, 2002, 33 (02) :201-226
[2]  
[Anonymous], NIPS
[3]  
BACH FR, 2004, ADV NEURAL INF PROCE, V16
[4]  
Bradley P. S., 2000, MSRTR200065 MICR RES
[5]   Mathematical programming for data mining: Formulations and challenges [J].
Bradley, PS ;
Fayyad, UM ;
Mangasarian, OL .
INFORMS JOURNAL ON COMPUTING, 1999, 11 (03) :217-238
[6]  
DING C, 2004, P 21 ANN INT C MACH
[7]   Clustering large graphs via the Singular Value Decomposition [J].
Drineas, P ;
Frieze, A ;
Kannan, R ;
Vempala, S ;
Vinay, V .
MACHINE LEARNING, 2004, 56 (1-3) :9-33
[8]  
DRINEAS P, 2005, COMMUNICATION
[9]   An interior point algorithm for minimum sum-of-squares clustering [J].
Du Merle, O ;
Hansen, P ;
Jaumard, B ;
Mladenovic, N .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2000, 21 (04) :1485-1505
[10]  
Ghosh J, 2003, HUM FAC ER, P247