A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets

Cited by: 0
Authors
Ankita Sinha
Prasanta K. Jana
Affiliations
[1] Department of Computer Science and Engineering, IIT (ISM), Dhanbad
Source
The Journal of Supercomputing | 2018 / Vol. 74
Keywords
Mahalanobis distance; Apache Hadoop; k-means++ initialization; Genetic algorithm
DOI
Not available
Abstract
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and the solution space is large. Genetic algorithms deal effectively with large solution spaces and provide better solutions. In this paper, we propose a novel clustering algorithm for distributed datasets, using a combination of a genetic algorithm (GA) with Mahalanobis distance and the k-means clustering algorithm. The proposed algorithm has two phases. In phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as the fitness value in GA, which considers covariance between the data points and thus provides a better representation of the initial data. In phase 2, k-means with k-means++ initialization is applied on the intermediate output to get the final result. The proposed algorithm is implemented on the Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted on multiple real-life and synthetic datasets to measure the performance of the proposed algorithm. Results were compared with the MapReduce-based algorithms mrk-means, parallel k-means and scaling GA.
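The abstract names two building blocks, Mahalanobis distance and k-means++ initialization, without showing either. The following is a minimal single-machine sketch of both, restricted to 2-D points and written in plain Python. It is not the authors' implementation and involves no Hadoop or GA; the function names (`covariance_2d`, `mahalanobis_2d`, `kmeanspp_seeds`) are hypothetical and chosen only for illustration.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance matrix (2x2) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance sqrt((p - q)^T cov^{-1} (p - q)) in 2-D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(max(quad, 0.0))


def kmeanspp_seeds(points, k, rng):
    """Pick k initial centers with k-means++ (squared Euclidean weights).

    Assumes the data contain at least k distinct points, so that some
    point always has a positive weight.
    """
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Weight each point by its squared distance to the nearest seed.
        weights = [
            min((p[0] - s[0]) ** 2 + (p[1] - s[1]) ** 2 for s in seeds)
            for p in points
        ]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
    cov = covariance_2d(data)
    print(mahalanobis_2d(data[0], data[2], cov))
    print(kmeanspp_seeds(data, 2, random.Random(0)))
```

With an identity covariance matrix the Mahalanobis distance reduces to the ordinary Euclidean distance, which is a convenient sanity check. How the paper turns this distance into a GA fitness value is not specified in the abstract, so that step is deliberately left out here.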
Pages: 1562 - 1579
Page count: 17
Related papers
50 in total
  • [21] Context Quantization Based on The Modified Genetic Algorithm with K-means
    Chen, Min
    Chen, Jianhua
    2013 NINTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2013, : 434 - 438
  • [22] Efficient adaptive large-scale text clustering method based on genetic K-means algorithm
    Dai, Wenhua
    Jiao, Cuizhen
    He, Tingting
    RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 281 - 285
  • [23] Improving Performance of K-Means Clustering by Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy Clustering for Categorization of Diabetic Patients
    Karegowda, Asha Gowda
    Shama, Vidya T.
    Jayaram, M. A.
    Manjunath, A. S.
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, 2013, 174 : 899 - 904
  • [24] ADAPTIVE K-MEANS ALGORITHM FOR OVERLAPPED GRAPH CLUSTERING
    Bello-Orgaz, Gema
    Menendez, Hector D.
    Camacho, David
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2012, 22 (05)
  • [25] GAPBAS: Genetic algorithm-based privacy budget allocation strategy in differential privacy K-means clustering algorithm
    Li, Yong
    Song, Xiao
    Tu, Yuchun
    Liu, Ming
    COMPUTERS & SECURITY, 2024, 139
  • [26] Modifying Genetic Algorithm with Species and Sexual Selection by using K-means Algorithm
    Patel, Rahila
    Raghuwanshi, M. M.
    Jaiswal, Anil N.
    2009 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE, VOLS 1-3, 2009, : 114 - +
  • [27] An Enhanced K-Means Genetic Algorithms for Optimal Clustering
    Anusha, M.
    Sathiaseelan, J. G. R.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 580 - 584
  • [28] On Solving 0/1 Multidimensional Knapsack Problem with a Genetic Algorithm Using a Selection Operator Based on K-Means Clustering Principle
    Laabadi, Soukaina
    Naimi, Mohamed
    El Amri, Hassan
    Achchab, Boujemaa
    FOUNDATIONS OF COMPUTING AND DECISION SCIENCES, 2022, 47 (03) : 247 - 269
  • [29] Locality Preserving Based K-Means Clustering
    Yang, Xiaohuan
    Wang, Xiaoming
    Tian, Yong
    Du, Yajun
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING TECHNIQUES, ISCIDE 2015, PT II, 2015, 9243 : 86 - 95
  • [30] Combining K-MEANS and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering
    Islam, Md Zahidul
    Estivill-Castro, Vladimir
    Rahman, Md Anisur
    Bossomaier, Terry
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 91 : 402 - 417