Fast Clustering with Flexible Balance Constraints

被引:0
作者
Liu, Hongfu [1 ]
Huang, Ziming [2 ]
Chen, Qi [3 ]
Li, Mingqin [3 ]
Fu, Yun [4 ]
Zhang, Lintao [3 ]
机构
[1] Brandeis Univ, Dept Comp Sci, Waltham, MA 02453 USA
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Sch EECS, Beijing, Peoples R China
[3] Northeastern Univ, Microsoft, Boston, MA 02115 USA
[4] Northeastern Univ, Coll Engn, Boston, MA 02115 USA
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Clustering; K-means; Balance Constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Balanced clustering aims at partitioning a dataset with roughly even cluster sizes while exploiting the intrinsic structure of the data. Despite attracting increased attention recently in both the academia and the industry, most existing balanced clustering algorithms still have high run time complexities that prevent them from being applied to large datasets. To cope with this challenge, we propose a Fast Clustering with Flexible balance Constraints FCFC, a simple, fast and effective clustering algorithm that can deal with flexible balance constraints. In essence, FCFC employs K-means as the core clustering algorithm and the cluster size variances as the penalty for imbalance. The objective function consists of the combined classical K-means clustering cost as well as the imbalance penalty. By exploiting a new insight of the second term, FCFC is able to employ an efficient K-means-like optimization procedure that can scale to big datasets. Furthermore, we also extend our model for multiple balance constraints with theoretical supports. Extensive experimental results show that our method exceeds several state-of-the-art methods by large margins in terms of efficiency and clustering quality. Finally, a real-world application for Bing search is provided, where data are organized in multiple machines with data size and query frequency balancing objectives. In the simulated scenario, our solution achieves the same fidelity score while reduces cost by 75% compared to the baseline method.
引用
收藏
页码:743 / 750
页数:8
相关论文
共 50 条
  • [21] Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems
    Kazakovtsev, Lev A.
    Antamoshkin, Alexander N.
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2014, 38 (03): : 229 - 240
  • [22] Targeted demand response for flexible energy communities using clustering techniques
    Pelekis, Sotiris
    Pipergias, Angelos
    Karakolis, Evangelos
    Mouzakitis, Spiros
    Santori, Francesca
    Ghoreishi, Mohammad
    Askounis, Dimitris
    SUSTAINABLE ENERGY GRIDS & NETWORKS, 2023, 36
  • [23] Parameter-Insensitive Min Cut Clustering With Flexible Size Constrains
    Nie, Feiping
    Xie, Fangyuan
    Yu, Weizhong
    Li, Xuelong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5479 - 5492
  • [24] Characterization of Constraints in Flexible Unknown Environments
    Bhattacharyya, Sam
    Simaan, Nabil
    2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013, : 3994 - 3999
  • [25] Clustering in WSN with latency and energy consumption constraints
    Aoun, Bassam
    Boutaba, Raouf
    JOURNAL OF NETWORK AND SYSTEMS MANAGEMENT, 2006, 14 (03) : 415 - 439
  • [26] Constrained Clustering: General Pairwise and Cardinality Constraints
    Bibi, Adel
    Alqahtani, Ali
    Ghanem, Bernard
    IEEE ACCESS, 2023, 11 : 5824 - 5836
  • [27] Using Minimum Matching for Clustering with Balancing Constraints
    Shirali-Shahreza, Sajad
    Abolhassani, Hassan
    Shirali-Shahreza, M. Hassan
    2009 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL I, 2009, : 225 - +
  • [28] Metric learning with clustering-based constraints
    Guo, Xinyao
    Dang, Chuangyin
    Liang, Jianqing
    Wei, Wei
    Liang, Jiye
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (12) : 3597 - 3605
  • [29] Optimizing MSE for Clustering with Balanced Size Constraints
    Tang, Wei
    Yang, Yang
    Zeng, Lanling
    Zhan, Yongzhao
    SYMMETRY-BASEL, 2019, 11 (03):
  • [30] Fast and Accurate Clustering of Multiple Modality Data via Feature Matching
    Zhang, Bin
    Cai, Hongmin
    Chen, Jiazhou
    Hu, Yu
    Huang, Jie
    Rong, Wentao
    Weng, Wanlin
    Huang, Qinjian
    Wang, Haiyan
    Peng, Hong
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 5040 - 5050