Fast Clustering with Flexible Balance Constraints

被引:0
|
作者
Liu, Hongfu [1 ]
Huang, Ziming [2 ]
Chen, Qi [3 ]
Li, Mingqin [3 ]
Fu, Yun [4 ]
Zhang, Lintao [3 ]
机构
[1] Brandeis Univ, Dept Comp Sci, Waltham, MA 02453 USA
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Sch EECS, Beijing, Peoples R China
[3] Northeastern Univ, Microsoft, Boston, MA 02115 USA
[4] Northeastern Univ, Coll Engn, Boston, MA 02115 USA
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Clustering; K-means; Balance Constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Balanced clustering aims at partitioning a dataset with roughly even cluster sizes while exploiting the intrinsic structure of the data. Despite attracting increased attention recently in both the academia and the industry, most existing balanced clustering algorithms still have high run time complexities that prevent them from being applied to large datasets. To cope with this challenge, we propose a Fast Clustering with Flexible balance Constraints FCFC, a simple, fast and effective clustering algorithm that can deal with flexible balance constraints. In essence, FCFC employs K-means as the core clustering algorithm and the cluster size variances as the penalty for imbalance. The objective function consists of the combined classical K-means clustering cost as well as the imbalance penalty. By exploiting a new insight of the second term, FCFC is able to employ an efficient K-means-like optimization procedure that can scale to big datasets. Furthermore, we also extend our model for multiple balance constraints with theoretical supports. Extensive experimental results show that our method exceeds several state-of-the-art methods by large margins in terms of efficiency and clustering quality. Finally, a real-world application for Bing search is provided, where data are organized in multiple machines with data size and query frequency balancing objectives. In the simulated scenario, our solution achieves the same fidelity score while reduces cost by 75% compared to the baseline method.
引用
收藏
页码:743 / 750
页数:8
相关论文
共 50 条
  • [11] A simple and fast algorithm for K-medoids clustering
    Park, Hae-Sang
    Jun, Chi-Hyuck
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 3336 - 3341
  • [12] Flexible density peak clustering for real-world data
    Hou, Jian
    Lin, Houshen
    Yuan, Huaqiang
    Pelillo, Marcello
    PATTERN RECOGNITION, 2024, 156
  • [13] Improved fast partitional clustering algorithm for text clustering
    Bejos, Sebastian
    Feliciano-Avelino, Ivan
    Martinez-Trinidad, J. Fco.
    Carrasco-Ochoa, J. A.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2137 - 2145
  • [14] Extracting elite pairwise constraints for clustering
    Jiang, He
    Ren, Zhilei
    Xuan, Jifeng
    Wu, Xindong
    NEUROCOMPUTING, 2013, 99 : 124 - 133
  • [15] Boolean Kernels and Clustering with Pairwise Constraints
    Kusunoki, Yoshifumi
    Tanino, Tetsuzo
    2014 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2014, : 141 - 146
  • [16] On Linear Clustering with Constraints on Cluster Size
    Kimoto, Naoya
    Endo, Yasunori
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 832 - 836
  • [17] Team Building by Data Clustering with Constraints
    Yang, Chao-Lung
    Irfana, Maisyatus S.
    Samopa, Febriliyan
    PROCEEDINGS OF THE 2014 IEEE 18TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2014, : 390 - 395
  • [18] Container packing problem with balance constraints
    Moon, Ilkyeong
    Thi Viet Ly Nguyen
    OR SPECTRUM, 2014, 36 (04) : 837 - 878
  • [19] Graph-Based Clustering with Constraints
    Anand, Rajul
    Reddy, Chandan K.
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 : 51 - 62
  • [20] Container packing problem with balance constraints
    Ilkyeong Moon
    Thi Viet Ly Nguyen
    OR Spectrum, 2014, 36 : 837 - 878