Weighted consensus clustering and its application to Big data

Cited by: 17
Authors
Alguliyev, Rasim M. [1]
Aliguliyev, Ramiz M. [1]
Sukhostat, Lyudmila V. [1]
Affiliations
[1] Azerbaijan Natl Acad Sci, Inst Informat Technol, 9A B Vahabzade St, AZ-1141 Baku, Azerbaijan
Keywords
Weighted consensus clustering; Big data; Utility function; Purity-based utility function; Co-association matrix; ENSEMBLE; ALGORITHM; INDEXES
DOI
10.1016/j.eswa.2020.113294
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The aim of this study is to develop a weighted consensus clustering method that assigns weights to single clustering methods using the purity utility function. For Big data that does not contain labels, a utility function based on the Davies-Bouldin index is proposed. The Banknote authentication, Phishing, Diabetic, Magic04, Credit card clients, Covertype, Phone accelerometer, and NSL-KDD datasets are used to assess the efficiency of the proposed consensus approach. The proposed approach is evaluated using the Euclidean, Minkowski, squared Euclidean, cosine, and Chebychev distance metrics. It is compared with single clustering algorithms (DBSCAN, OPTICS, CLARANS, k-means, and shared nearby neighbor clustering). The experimental results show the effectiveness of the proposed approach for Big data clustering in comparison to single clustering methods. The proposed weighted consensus clustering using the squared Euclidean distance metric achieves the highest accuracy, which is a very promising result for Big data clustering. It can be applied to expert systems to help experts make group decisions based on several alternatives. The paper also provides directions for future research on consensus clustering in this area. (C) 2020 Elsevier Ltd. All rights reserved.
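The abstract names three ingredients: per-method weights from a purity utility function, a co-association matrix, and a consensus partition. The sketch below illustrates how these pieces can fit together in general. It is not the authors' algorithm: the weight normalization, the 0.5 threshold, the connected-components consensus step, and all function names are assumptions made for illustration, and the Davies-Bouldin utility for unlabeled data is not shown.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)


def weighted_coassociation(partitions, weights):
    """Entry (i, j) is the normalized weight of the partitions that co-cluster i and j."""
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Assumed consensus step: connected components of the graph that links
    every pair whose weighted co-association exceeds the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (hypothetical): six points, two true classes, four base partitions.
truth = [0, 0, 0, 1, 1, 1]
partitions = [
    [0, 0, 0, 1, 1, 1],  # matches the true classes
    [0, 0, 1, 1, 1, 1],  # one point misplaced
    [0, 1, 0, 1, 1, 1],  # a different point misplaced
    [0, 1, 0, 1, 0, 1],  # poor partition
]
weights = [purity(truth, p) for p in partitions]
matrix = weighted_coassociation(partitions, weights)
consensus = consensus_labels(matrix)
print(consensus)  # -> [0, 0, 0, 1, 1, 1]
```

In this toy case the poor partition receives the lowest weight (purity 4/6), so its disagreements are not enough to push any cross-class pair above the threshold, and the consensus recovers the two true groups.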
Pages: 15
Related papers
50 in total
  • [41] Business Applications for Current Developments in Big Data Clustering: An Overview
    Hass, G.
    Simon, P.
    Kashef, R.
    2020 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT (IEEE IEEM), 2020, : 195 - 199
  • [42] Big data partition management model and its application research
    Zhang, Wenyi
    Xiang, Lianzhi
    Wang, Xiaofang
    Harbin Gongcheng Daxue Xuebao/Journal of Harbin Engineering University, 2014, 35 (03): : 353 - 360
  • [43] FOREWORD: BIG DATA AND ITS APPLICATION IN HEALTH DISPARITIES RESEARCH
    Onukwugha, Eberechukwu
    Duru, O. Kenrik
    Peprah, Emmanuel
    ETHNICITY & DISEASE, 2017, 27 (02) : 69 - 72
  • [44] A new parallel adaptive clustering and its application to streaming data
    McLaughlin, Benjamin
    Ha Kang, Sung
    JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 66
  • [45] Regularized matrix data clustering and its application to image analysis
    Gao, Xu
    Shen, Weining
    Zhang, Liwen
    Hu, Jianhua
    Fortin, Norbert J.
    Frostig, Ron D.
    Ombao, Hernando
    BIOMETRICS, 2021, 77 (03) : 890 - 902
  • [46] The application of parallel clustering analysis based on big data mining in physical community discovery
    Fan Wu
    Rui Zhou
    International Journal of System Assurance Engineering and Management, 2022, 13 : 1054 - 1062
  • [47] A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data
    Dierckens, Karl E.
    Harrison, Adrian B.
    Leung, Carson K.
    Pind, Adrienne V.
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017, : 925 - 932
  • [48] Consensus Clustering by Weight Optimization of Input Partitions
    Alguliyev, Rasim
    Aliguliyev, Ramiz
    Sukhostat, Lyudmila
    2019 IEEE 13TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT 2019), 2019, : 143 - 146
  • [49] p-PIC: Parallel power iteration clustering for big data
    Yan, Weizhong
    Brahmakshatriya, Umang
    Xue, Ya
    Gilder, Mark
    Wise, Bowden
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (03) : 352 - 359
  • [50] Range-based Clustering Supporting Similarity Search in Big Data
    Trong Nhan Phan
    Jaeger, Markus
    Nadschlaeger, Stefan
    Kueng, Josef
    2015 26TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2015, : 120 - 124