A Local and Global Discriminative Framework and Optimization for Balanced Clustering

被引:20
作者
Han, Junwei [1 ]
Liu, Hanyang [1 ]
Nie, Feiping [2 ,3 ]
机构
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[3] Northwestern Polytech Univ, Ctr Opt Imagery Anal & Learning, Xian 710072, Shaanxi, Peoples R China
关键词
Balanced clustering; linear regression; spectral clustering (SC); local learning (LL); ALGORITHM;
D O I
10.1109/TNNLS.2018.2870131
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
For many specific applications in data mining and machine learning, we face explicit or latent size constraint for each cluster that leads to the "balanced clustering" problem. Many existing clustering algorithms perform well in partitioning but fail in producing balanced clusters and preserving the naturally balanced structure of some data. In this paper, we propose a novel balanced clustering framework that flexibly utilizes local and global information of data. First, we propose the global balanced clustering (GBC), in which a global discriminative partitioning model is combined with the minimization of the distribution entropy of data. Then, we show that the proposed GBC can be further used to globally regularize some widely used local clustering models, so as to transform them into balanced clustering that simultaneously capture local and global data. We apply our global balanced regularization to spectral clustering (SC) and local learning (LL)-based clustering, respectively, and propose another two novel balanced clustering models: the local and global balanced SC (LGB-SC) and LGB-LL. Finding the optimal balanced partition is nondeterministic polynomial-time (NP)-hard in general. We adopt the method of augmented Lagrange multipliers to help optimize our model. Comprehensive experiments on several real world benchmarks demonstrate the advantage of our framework to yield balanced clusters while preserving good clustering quality. Our proposed LGB-SC and LGB-LL also outperform SC and LI, as well as other classical clustering methods.
引用
收藏
页码:3059 / 3071
页数:13
相关论文
共 53 条
[1]  
[Anonymous], 2000, RES REDMOND
[2]  
[Anonymous], 2006, ADV NEURAL INFORM PR
[3]  
[Anonymous], 2008, Advances in neural information processing systems
[4]  
[Anonymous], SERIES GESELLSCHAFT
[5]  
[Anonymous], STAT LEARN COMPUT VI
[6]  
[Anonymous], 2001, The elements of statistical learning: data mining, inference and prediction
[7]  
[Anonymous], P ADV NEUR INF PROC
[8]  
[Anonymous], P ICML
[9]   Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres [J].
Banerjee, A ;
Ghosh, J .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2004, 15 (03) :702-719
[10]  
Banerjee A, 2002, SIAM PROC S, P333