Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion

Cited by: 0
Authors
Ping Tian
Huitao Shen
Ahad Abolfathi
Affiliations
[1] Xuchang University, Institute of Applied Mathematics
[2] Islamic Azad University, Department of Computer and Information Technology Engineering, Qazvin Branch
Source
Journal of Grid Computing | 2022 / Vol. 20
Keywords
Hierarchical clustering; Ensemble clustering; MapReduce model; Clusters clustering; Hyper-clusters
DOI
Not available
Abstract
Data plays a fundamental role in daily life, and the rapid growth of data production has led to the big data revolution. Managing and analyzing this data, which is often unlabeled, is a major practical challenge. Clustering is one of the most important branches of data mining; its purpose is to divide data into meaningful subsets called clusters. Hierarchical clustering is an unsupervised learning approach that groups data points with similar properties and is built around the construction and analysis of dendrograms. Over the decades, many clustering algorithms with different approaches have been developed. This paper introduces an efficient ensemble hierarchical clustering algorithm based on a MapReduce-based clusters clustering technique and an innovative similarity criterion. The main idea of ensemble clustering is to combine the results of different single clustering methods. Because they learn from multiple models, ensemble techniques usually produce better results than single methods, so aggregating hierarchical clustering methods can be expected to yield higher-quality clusterings. MapReduce is a programming model for implementing big data applications, and it is used here to implement the hierarchical clustering methods. The similarity between samples is calculated with the innovative similarity criterion. The proposed approach has three steps. In the first step, the data are clustered by several single hierarchical clustering methods. In the second step, hyper-clusters are generated by applying the clusters clustering technique. In the third step, the final clusters are formed by allocating samples to hyper-clusters. Simulations on multiple real-world datasets show that the proposed approach performs better than algorithms such as CHC and RCESCC.
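The three steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not define the innovative similarity criterion, so Jaccard overlap between clusters is used here purely as a stand-in, and the MapReduce distribution is omitted. All function names (`agglomerative`, `cluster_the_clusters`, `assign_samples`, `ensemble`) are hypothetical.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k, linkage):
    """Step 1 (one base method): naive agglomerative clustering down to k clusters."""
    clusters = [{i} for i in range(len(points))]

    def gap(a, b):
        ds = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average linkage

    while len(clusters) > k:
        # Merge the closest pair; a < b, so deleting b keeps index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


def jaccard(a, b):
    """Assumed stand-in similarity between two clusters (sets of sample indices)."""
    return len(a & b) / len(a | b)


def cluster_the_clusters(base_clusters, k):
    """Step 2: group base clusters into k hyper-clusters by average similarity."""
    hyper = [[c] for c in base_clusters]

    def sim(h1, h2):
        return sum(jaccard(a, b) for a in h1 for b in h2) / (len(h1) * len(h2))

    while len(hyper) > k:
        a, b = max(combinations(range(len(hyper)), 2),
                   key=lambda p: sim(hyper[p[0]], hyper[p[1]]))
        hyper[a] += hyper[b]
        del hyper[b]
    return hyper


def assign_samples(n, hyper):
    """Step 3: give each sample the hyper-cluster whose clusters contain it most often."""
    labels = []
    for i in range(n):
        votes = [sum(i in c for c in h) for h in hyper]
        labels.append(votes.index(max(votes)))
    return labels


def ensemble(points, k, linkages=("single", "complete", "average")):
    base = [c for name in linkages for c in agglomerative(points, k, name)]
    hyper = cluster_the_clusters(base, k)
    return assign_samples(len(points), hyper)


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
labels = ensemble(points, k=2)
print(labels)
```

On this toy data the three base methods agree, so the first three points share one label and the last three share the other. In a MapReduce setting, each base clustering would plausibly run as an independent map task, with the hyper-cluster construction and sample allocation done in the reduce phase; that mapping is an assumption, not a detail given in the abstract.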