Fast Clustering with Flexible Balance Constraints

被引:0
作者
Liu, Hongfu [1 ]
Huang, Ziming [2 ]
Chen, Qi [3 ]
Li, Mingqin [3 ]
Fu, Yun [4 ]
Zhang, Lintao [3 ]
机构
[1] Brandeis Univ, Dept Comp Sci, Waltham, MA 02453 USA
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Sch EECS, Beijing, Peoples R China
[3] Northeastern Univ, Microsoft, Boston, MA 02115 USA
[4] Northeastern Univ, Coll Engn, Boston, MA 02115 USA
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Clustering; K-means; Balance Constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Balanced clustering aims at partitioning a dataset with roughly even cluster sizes while exploiting the intrinsic structure of the data. Despite attracting increased attention recently in both the academia and the industry, most existing balanced clustering algorithms still have high run time complexities that prevent them from being applied to large datasets. To cope with this challenge, we propose a Fast Clustering with Flexible balance Constraints FCFC, a simple, fast and effective clustering algorithm that can deal with flexible balance constraints. In essence, FCFC employs K-means as the core clustering algorithm and the cluster size variances as the penalty for imbalance. The objective function consists of the combined classical K-means clustering cost as well as the imbalance penalty. By exploiting a new insight of the second term, FCFC is able to employ an efficient K-means-like optimization procedure that can scale to big datasets. Furthermore, we also extend our model for multiple balance constraints with theoretical supports. Extensive experimental results show that our method exceeds several state-of-the-art methods by large margins in terms of efficiency and clustering quality. Finally, a real-world application for Bing search is provided, where data are organized in multiple machines with data size and query frequency balancing objectives. In the simulated scenario, our solution achieves the same fidelity score while reduces cost by 75% compared to the baseline method.
引用
收藏
页码:743 / 750
页数:8
相关论文
共 50 条
  • [41] A fast particle swarm optimization for clustering
    Chun-Wei Tsai
    Ko-Wei Huang
    Chu-Sing Yang
    Ming-Chao Chiang
    Soft Computing, 2015, 19 : 321 - 338
  • [42] Fast reciprocal nearest neighbors clustering
    Lopez-Sastre, Roberto J.
    Onoro-Rubio, Daniel
    Gil-Jimenez, Pedro
    Maldonado-Bascon, Saturnino
    SIGNAL PROCESSING, 2012, 92 (01) : 270 - 275
  • [43] Fast Clustering for Interactive Tractography Segmentation
    Olivetti, Emanuele
    Thien Bao Nguyen
    Garyfallidis, Eleftherios
    Agarwal, Nivedita
    Avesani, Paolo
    2013 3RD INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION IN NEUROIMAGING (PRNI 2013), 2013, : 42 - 45
  • [44] Fast hierarchical clustering and its validation
    Dash, M
    Liu, H
    Scheuermann, P
    Tan, KL
    DATA & KNOWLEDGE ENGINEERING, 2003, 44 (01) : 109 - 138
  • [45] A fast algorithm for robust constrained clustering
    Fritz, Heinrich
    Garcia-Escudero, Luis A.
    Mayo-Iscar, Agustin
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 61 : 124 - 136
  • [46] A fast particle swarm optimization for clustering
    Tsai, Chun-Wei
    Huang, Ko-Wei
    Yang, Chu-Sing
    Chiang, Ming-Chao
    SOFT COMPUTING, 2015, 19 (02) : 321 - 338
  • [47] Fast and explainable clustering based on sorting
    Chen, Xinye
    Guettel, Stefan
    PATTERN RECOGNITION, 2024, 150
  • [48] Fast Super-Paramagnetic Clustering
    Yelibi, Lionel
    Gebbie, Tim
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 551
  • [49] A Fast Weighted Clustering Algorithm for FANET
    Wang, Na
    Xiao, Meng
    Zhao, Zhongliang
    Liu, Yang
    2024 IEEE 99TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2024-SPRING, 2024,
  • [50] Fast Artificial Bee Colony for Clustering
    Girsang, Abba Suganda
    Muliono, Yohan
    Fanny, Fanny
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2018, 42 (02): : 211 - 219