Fast Clustering with Flexible Balance Constraints

被引:0
|
作者
Liu, Hongfu [1 ]
Huang, Ziming [2 ]
Chen, Qi [3 ]
Li, Mingqin [3 ]
Fu, Yun [4 ]
Zhang, Lintao [3 ]
机构
[1] Brandeis Univ, Dept Comp Sci, Waltham, MA 02453 USA
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Sch EECS, Beijing, Peoples R China
[3] Northeastern Univ, Microsoft, Boston, MA 02115 USA
[4] Northeastern Univ, Coll Engn, Boston, MA 02115 USA
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Clustering; K-means; Balance Constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Balanced clustering aims at partitioning a dataset with roughly even cluster sizes while exploiting the intrinsic structure of the data. Despite attracting increased attention recently in both the academia and the industry, most existing balanced clustering algorithms still have high run time complexities that prevent them from being applied to large datasets. To cope with this challenge, we propose a Fast Clustering with Flexible balance Constraints FCFC, a simple, fast and effective clustering algorithm that can deal with flexible balance constraints. In essence, FCFC employs K-means as the core clustering algorithm and the cluster size variances as the penalty for imbalance. The objective function consists of the combined classical K-means clustering cost as well as the imbalance penalty. By exploiting a new insight of the second term, FCFC is able to employ an efficient K-means-like optimization procedure that can scale to big datasets. Furthermore, we also extend our model for multiple balance constraints with theoretical supports. Extensive experimental results show that our method exceeds several state-of-the-art methods by large margins in terms of efficiency and clustering quality. Finally, a real-world application for Bing search is provided, where data are organized in multiple machines with data size and query frequency balancing objectives. In the simulated scenario, our solution achieves the same fidelity score while reduces cost by 75% compared to the baseline method.
引用
收藏
页码:743 / 750
页数:8
相关论文
共 50 条
  • [1] Fast and Flexible Unsupervised Clustering Algorithm Based on Ultrametric Properties
    Fouchal, Said
    Lavallee, Ivan
    Q2SWINET 11: PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON QOS AND SECURITY FOR WIRELESS AND MOBILE NETWORKS, 2011, : 35 - 42
  • [2] Fast clustering-based anonymization approaches with time constraints for data streams
    Guo, Kun
    Zhang, Qishan
    KNOWLEDGE-BASED SYSTEMS, 2013, 46 : 95 - 108
  • [3] Perspectives of Fast Clustering Techniques
    Savvas, Ilias K.
    Garani, Georgia
    PROCEEDINGS OF THE THIRD INTERNATIONAL SCIENTIFIC CONFERENCE INTELLIGENT INFORMATION TECHNOLOGIES FOR INDUSTRY (IITI'18), VOL 2, 2019, 875 : 31 - 40
  • [4] FAST ISODATA CLUSTERING ALGORITHMS
    VENKATESWARLU, NB
    RAJU, PSVSK
    PATTERN RECOGNITION, 1992, 25 (03) : 335 - 342
  • [5] A fast implementation of the ISODATA clustering algorithm
    Memarsadeghi, Nargess
    Mount, David M.
    Netanyahu, Nathan S.
    Le Moigne, Jacqueline
    INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS, 2007, 17 (01) : 71 - 103
  • [6] Fast and robust general purpose clustering algorithms
    Estivill-Castro, V
    Yang, J
    DATA MINING AND KNOWLEDGE DISCOVERY, 2004, 8 (02) : 127 - 150
  • [7] Clustering by Learning Constraints Priorities
    Okabe, Masayuki
    Yamada, Seiji
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 1050 - 1055
  • [8] Leader Ant Clustering with Constraints
    Vu, Viet-Vu
    Labroche, Nicolas
    Bouchon-Meunier, Bernadette
    2009 IEEE-RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION AND VISION FOR THE FUTURE, 2009, : 79 - +
  • [9] Enhancing Social Network Communication through Dynamic Clustering Balance
    Lamrhari, Soumaya
    Elghazi, Hamid
    Tigani, Small
    El Faker, Abdellatif
    Sadiki, Tayeb
    PROCEEDINGS OF THE 2ND MEDITERRANEAN CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (MEDPRAI-2018), 2018, : 63 - 69
  • [10] Fast K-means for Large Scale Clustering
    Hu, Qinghao
    Wu, Jiaxiang
    Bai, Lu
    Zhang, Yifan
    Cheng, Jian
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2099 - 2102