A three-way cluster ensemble approach for large-scale data

Cited by: 64
Authors
Yu, Hong [1]
Chen, Yun [1]
Lingras, Pawan [2]
Wang, Guoyin [1]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Computat Intelligence, Chongqing 400065, Peoples R China
[2] St Marys Univ, Dept Math & Comp Sci, Halifax, NS B3H 3C3, Canada
Funding
National Natural Science Foundation of China
Keywords
Cluster ensemble; Three-way decisions; Large-scale data; Cluster units; Spark; DECISION; ALGORITHM;
DOI
10.1016/j.ijar.2019.09.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cluster ensemble has emerged as a powerful technique for combining multiple clustering results. To address the problem of clustering large-scale data, this paper presents an efficient three-way cluster ensemble approach based on Spark that can handle both hard and soft clustering. First, inspired by the theory of three-way decisions, the paper proposes a Spark-based framework for three-way cluster ensembles and develops a distributed three-way k-means clustering algorithm. Then, we introduce the concept of a cluster unit, which reflects the minimal granularity distribution structure agreed upon by all the ensemble members. We also introduce quantitative measures of the relationships between units and between clusters. Finally, we propose a consensus clustering algorithm based on cluster units and devise several three-way decision strategies for assigning small cluster units and no-unit objects. Experiments on 19 real-world data sets validate the effectiveness of the proposed approach on indices such as ARI, ACC, NMI and F1-Measure. The results show that the approach can effectively handle large-scale data and that the proposed consensus clustering algorithm has a lower time cost without sacrificing clustering quality. (C) 2019 Elsevier Inc. All rights reserved.
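The abstract describes a cluster unit as the minimal structure agreed upon by all ensemble members. One plausible reading, sketched below in plain Python, is that a unit is a set of objects that every member places in the same cluster. This is an illustrative assumption, not the paper's formal definition; the function name `cluster_units` and the toy labelings are hypothetical, and the sketch covers only hard clusterings on a single machine rather than the Spark-based, three-way setting the paper targets.

```python
from collections import defaultdict


def cluster_units(labelings):
    """Group object indices that every ensemble member places together.

    labelings: list of label lists, one per ensemble member, all of equal length.
    Returns a list of units (lists of object indices), ordered by first index.
    This is an assumed reading of "cluster unit", not the paper's definition.
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same objects")

    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels an object receives across all members is its
        # signature; objects with equal signatures are never separated.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)

    return sorted(groups.values(), key=lambda unit: unit[0])


# Three hypothetical ensemble members clustering six objects.
members = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
units = cluster_units(members)
```

With these toy labelings, objects 0 and 1 form one unit, objects 3 to 5 form another, and object 2 is left alone because the second member disagrees with the others about it. A singleton like this would be a candidate for the kind of deferred assignment that the abstract attributes to its three-way decision strategies for small units.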
Pages: 32-49
Number of pages: 18