Fast Clustering with Flexible Balance Constraints

被引:0
作者
Liu, Hongfu [1 ]
Huang, Ziming [2 ]
Chen, Qi [3 ]
Li, Mingqin [3 ]
Fu, Yun [4 ]
Zhang, Lintao [3 ]
机构
[1] Brandeis Univ, Dept Comp Sci, Waltham, MA 02453 USA
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Sch EECS, Beijing, Peoples R China
[3] Northeastern Univ, Microsoft, Boston, MA 02115 USA
[4] Northeastern Univ, Coll Engn, Boston, MA 02115 USA
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Clustering; K-means; Balance Constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Balanced clustering aims at partitioning a dataset with roughly even cluster sizes while exploiting the intrinsic structure of the data. Despite attracting increased attention recently in both the academia and the industry, most existing balanced clustering algorithms still have high run time complexities that prevent them from being applied to large datasets. To cope with this challenge, we propose a Fast Clustering with Flexible balance Constraints FCFC, a simple, fast and effective clustering algorithm that can deal with flexible balance constraints. In essence, FCFC employs K-means as the core clustering algorithm and the cluster size variances as the penalty for imbalance. The objective function consists of the combined classical K-means clustering cost as well as the imbalance penalty. By exploiting a new insight of the second term, FCFC is able to employ an efficient K-means-like optimization procedure that can scale to big datasets. Furthermore, we also extend our model for multiple balance constraints with theoretical supports. Extensive experimental results show that our method exceeds several state-of-the-art methods by large margins in terms of efficiency and clustering quality. Finally, a real-world application for Bing search is provided, where data are organized in multiple machines with data size and query frequency balancing objectives. In the simulated scenario, our solution achieves the same fidelity score while reduces cost by 75% compared to the baseline method.
引用
收藏
页码:743 / 750
页数:8
相关论文
共 50 条
  • [31] Weighted bilateral K-means algorithm for fast co-clustering and fast spectral clustering
    Song, Kun
    Yao, Xiwen
    Nie, Feiping
    Li, Xuelong
    Xu, Mingliang
    PATTERN RECOGNITION, 2021, 109 (109)
  • [32] Capacitated spatial clustering with multiple constraints and attributes
    Lahderanta, Tero
    Loven, Lauri
    Ruha, Leena
    Leppanen, Teemu
    Launonen, Ilkka
    Riekki, Jukka
    Sillanpaa, Mikko J.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [33] The fast clustering algorithm for the big data based on K-means
    Xie, Ting
    Zhang, Taiping
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
  • [34] Study of Fast Clustering Algorithm Based on Foregone Samples in Intrusion Detections
    Liu Tao
    Hou Yuan-bin
    Qi Ai-ling
    Chang Xin-tan
    NSWCTC 2009: INTERNATIONAL CONFERENCE ON NETWORKS SECURITY, WIRELESS COMMUNICATIONS AND TRUSTED COMPUTING, VOL 1, PROCEEDINGS, 2009, : 633 - +
  • [35] Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support
    Ye, Jianbo
    Wu, Panruo
    Wang, James Z.
    Li, Jia
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (09) : 2317 - 2332
  • [36] Metric learning with clustering-based constraints
    Xinyao Guo
    Chuangyin Dang
    Jianqing Liang
    Wei Wei
    Jiye Liang
    International Journal of Machine Learning and Cybernetics, 2021, 12 : 3597 - 3605
  • [37] Clustering and Constraints for Real-time Multicast
    Cheng, Wei
    Cheng, Shi
    Wu, Chanle
    Yue, Jun
    Ye, Gang
    He, Lian
    NAS: 2009 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE, AND STORAGE, 2009, : 184 - 187
  • [38] Clustering in WSN with Latency and Energy Consumption Constraints
    Bassam Aoun
    Raouf Boutaba
    Journal of Network and Systems Management, 2006, 14 : 415 - 439
  • [39] A Clustering Routing Protocol for Energy Balance of WSN based on Genetic Clustering Algorithm
    He, Shijun
    Dai, Yanyan
    Zhou, Ruyan
    Zhao, Shiting
    INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SUPPORTED EDUCATION, 2012, 2 : 788 - 793
  • [40] FastDEC: Clustering by Fast Dominance Estimation
    Yang, Geping
    Lv, Hongzhang
    Yang, Yiyang
    Gong, Zhiguo
    Chen, Xiang
    Hao, Zhifeng
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I, 2023, 13713 : 138 - 156