Weighted consensus clustering and its application to Big data

Cited by: 17
Authors
Alguliyev, Rasim M. [1]
Aliguliyev, Ramiz M. [1]
Sukhostat, Lyudmila V. [1]
Affiliations
[1] Azerbaijan Natl Acad Sci, Inst Informat Technol, 9A B Vahabzade St, AZ-1141 Baku, Azerbaijan
Keywords
Weighted consensus clustering; Big data; Utility function; Purity-based utility function; Co-association matrix; ENSEMBLE; ALGORITHM; INDEXES;
DOI
10.1016/j.eswa.2020.113294
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The aim of this study is to develop a weighted consensus clustering approach that assigns weights to single clustering methods using the purity utility function. For Big data that does not contain labels, a utility function based on the Davies-Bouldin index is proposed. The Banknote authentication, Phishing, Diabetic, Magic04, Credit card clients, Covertype, Phone accelerometer, and NSL-KDD datasets are used to assess the efficiency of the proposed consensus approach. The approach is evaluated using the Euclidean, Minkowski, squared Euclidean, cosine, and Chebychev distance metrics, and is compared with single clustering algorithms (DBSCAN, OPTICS, CLARANS, k-means, and shared nearby neighbor clustering). The experimental results show that the proposed approach is more effective for Big data clustering than the single clustering methods. The proposed weighted consensus clustering using the squared Euclidean distance metric achieves the highest accuracy, which is a very promising result for Big data clustering. It can be applied to expert systems to help experts make group decisions based on several alternatives. The paper also provides directions for future research on consensus clustering in this area. (C) 2020 Elsevier Ltd. All rights reserved.
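The abstract does not give the authors' formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: each base partition is weighted by its purity against known labels, the weights are normalized and accumulated into a weighted co-association matrix, and a consensus partition is read off by linking pairs whose weighted co-association exceeds 0.5. The function names, the 0.5 threshold, and the connected-components step are illustrative choices, not taken from the paper, and the label-free Davies-Bouldin variant is not covered.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of points lying in the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)


def weighted_coassociation(partitions, weights):
    """Entry (i, j) is the normalized weight of partitions that co-cluster i and j."""
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Assumed consensus step: connected components of the thresholded matrix."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (hypothetical): six points, two true classes, three base partitions.
truth = [0, 0, 0, 1, 1, 1]
partitions = [
    [0, 0, 0, 1, 1, 1],  # agrees with the true classes
    [0, 0, 1, 1, 1, 1],  # misplaces point 2
    [0, 1, 0, 1, 1, 1],  # misplaces point 1
]
weights = [purity(truth, p) for p in partitions]
matrix = weighted_coassociation(partitions, weights)
consensus = consensus_labels(matrix)
print(weights)
print(consensus)
```

With only two base partitions the weighting decides the outcome: a pair that is co-clustered by just one of them is linked only if that partition carries more than half of the total weight.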
Pages: 15
Related papers
50 in total
  • [31] A Review of Clustering Algorithms for Big Data
    Djouzi, Kheyreddine
    Beghdad-Bey, Kadda
    2019 4TH INTERNATIONAL CONFERENCE ON NETWORKING AND ADVANCED SYSTEMS (ICNAS 2019), 2019, : 117 - 122
  • [32] High Performance Big Data Clustering
    Agrawal, Ankit
    Patwary, Md. Mostofa Ali
    Hendrix, William
    Liao, Wei-keng
    Choudhary, Alok
    CLOUD COMPUTING AND BIG DATA, 2013, 23 : 192 - 211
  • [33] A weighted kernel possibilistic c-means algorithm based on cloud computing for clustering big data
    Zhang, Qingchen
    Chen, Zhikui
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2014, 27 (09) : 1378 - 1391
  • [34] A Novel Intelligent Clustering Approach for High Dimensional Data in a Big Data Environment
    Tao, Qian
    Wang, Zhenyu
    Gu, Chunqin
    Chen, Wenyuan
    Lin, Weiqiang
    Lin, Haojie
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017,
  • [35] DENCLUE-IM: A New Approach for Big Data Clustering
    Rehioui, Hajar
    Idrissi, Abdellah
    Abourezq, Manar
    Zegrari, Faouzia
    7TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT 2016) / THE 6TH INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY INFORMATION TECHNOLOGY (SEIT-2016) / AFFILIATED WORKSHOPS, 2016, 83 : 560 - 567
  • [36] Iterative subsampling in solution path clustering of noisy big data
    Marchetti, Yuliya
    Zhou, Qing
    STATISTICS AND ITS INTERFACE, 2016, 9 (04) : 415 - 431
  • [37] Big data clustering techniques based on Spark: a literature review
    Saeed, Mozamel M.
    Al Aghbari, Zaher
    Alsharidah, Mohammed
    PEERJ COMPUTER SCIENCE, 2020,
  • [38] Parallel and distributed clustering framework for big spatial data mining
    Bendechache, Malika
    Tari, A-Kamel
    Kechadi, M-Tahar
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 671 - 689
  • [39] How to Use K-means for Big Data Clustering?
    Mussabayev, Rustam
    Mladenovic, Nenad
    Jarboui, Bassem
    Mussabayev, Ravil
    PATTERN RECOGNITION, 2023, 137
  • [40] The Survey on Approaches to Efficient Clustering and Classification Analysis of Big Data
    Gandhi, Bhagyashri S.
    Deshpande, Leena A.
    2016 INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2016,