Linear Time Maximum Margin Clustering

被引:52
作者
Wang, Fei [1 ]
Zhao, Bin [1 ]
Zhang, Changshui [1 ]
机构
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
来源
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2010年 / 21卷 / 02期
关键词
Clustering; concave-convex procedure (CCP); cutting plane; maximum margin; OPTIMIZATION; ALGORITHM;
D O I
10.1109/TNN.2009.2036998
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Maximum margin clustering (MMC) is a newly proposed clustering method which has shown promising performance in recent studies. It extends the computational techniques of support vector machine (SVM) to the unsupervised scenario. Traditionally, MMC is formulated as a nonconvex integer programming problem which makes it difficult to solve. Several methods have been proposed in the literature to solve the MMC problem based on either semidefinite programming (SDP) or alternating optimization. However, these methods are still time demanding when handling large scale data sets, which limits its application in real-world problems. In this paper, we propose a cutting plane maximum margin clustering (CPMMC) algorithm. It first decomposes the nonconvex MMC problem into a series of convex subproblems by making use of the constrained concave-convex procedure (CCCP), then for each subproblem, our algorithm adopts the cutting plane algorithm to solve it. Moreover, we show that the CPMMC algorithm takes O(sn) time to converge with guaranteed accuracy, where is the number of samples in the data set and is the sparsity of the data set, i.e., the average number of nonzero features of the data samples. We also derive the multiclass version of our CPMMC algorithm. Experimental evaluations on several real-world data sets show that CPMMC performs better than existing MMC methods, both in efficiency and accuracy.
引用
收藏
页码:319 / 332
页数:14
相关论文
共 38 条
[1]  
AN LTH, 2002, P 1 INT WORKSH GLOB, P1
[2]  
[Anonymous], 2003, NONLINEAR PROGRAMMIN
[3]  
[Anonymous], 2005, INT WORKSHOP ARTIFIC
[4]  
[Anonymous], 2006, P ACMSIGKDD INT C KN
[5]  
[Anonymous], 2001, Journal of Machine Learning Research
[6]  
[Anonymous], P NAT C ART INT
[7]   CSDP, a C library for semidefinite programming [J].
Borchers, B .
OPTIMIZATION METHODS & SOFTWARE, 1999, 11-2 (1-4) :613-623
[8]   SPECTRAL K-WAY RATIO-CUT PARTITIONING AND CLUSTERING [J].
CHAN, PK ;
SCHLAG, MDF ;
ZIEN, JY .
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 1994, 13 (09) :1088-1096
[9]  
Chapelle O., 2005, P 10 INT WORKSH ART, P57
[10]  
CHEUNG PM, 2006, P 23 INT C MACH LEAR, P193