A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets

Cited by: 0

Authors:

Ankita Sinha

Prasanta K. Jana

Affiliation:

Department of Computer Science and Engineering, IIT (ISM), Dhanbad

Source:

The Journal of Supercomputing | 2018 / Vol. 74

Keywords:

Mahalanobis distance; Apache Hadoop; k-means++ initialization; Genetic algorithm

DOI:

Not available

Abstract:

Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and the solution space is large. A genetic algorithm deals effectively with a large solution space and yields better solutions. In this paper, we propose a novel clustering algorithm for distributed datasets that combines a genetic algorithm (GA) using Mahalanobis distance with the k-means clustering algorithm. The proposed algorithm runs in two phases. In phase 1, the GA is applied in parallel to data chunks located on different machines. Mahalanobis distance is used as the fitness value in the GA; because it takes the covariance between data points into account, it provides a better representation of the initial data. In phase 2, k-means with k-means++ initialization is applied to the intermediate output to obtain the final result. The proposed algorithm is implemented on the Hadoop framework, which is inherently designed to handle distributed datasets in a fault-tolerant manner. Extensive experiments were conducted on multiple real-life and synthetic datasets to measure the performance of the proposed algorithm, and the results were compared with the MapReduce-based algorithms mrk-means, parallel k-means, and scaling GA.
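The abstract describes the two phases only at a high level. The Python sketch below illustrates the data flow under several assumptions that are not taken from the paper: each chunk is a list of numeric vectors processed in-process (standing in for a Hadoop map task); a GA individual encodes k candidate centroids; truncation selection, one-point crossover, Gaussian mutation, the population size, and the mutation scale are illustrative choices; the chunk's sample covariance (plus a small ridge) defines the Mahalanobis metric; and the "intermediate output" handed to phase 2 is assumed to be the pooled per-chunk centroids. All function names are hypothetical.

```python
import math
import random

# Illustrative sketch: all helper names, the GA encoding/operators and the
# parameter values are assumptions, not the authors' Hadoop implementation.

def mean(points):
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]

def covariance(points, ridge=1e-6):
    # Sample covariance plus a small ridge so the matrix stays invertible.
    mu, n = mean(points), len(points)
    d = len(mu)
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / max(n - 1, 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return cov

def invert(m):
    # Gauss-Jordan elimination with partial pivoting on the augmented [m | I].
    d = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        f = a[c][c]
        a[c] = [v / f for v in a[c]]
        for r in range(d):
            if r != c:
                g = a[r][c]
                a[r] = [vr - g * vc for vr, vc in zip(a[r], a[c])]
    return [row[d:] for row in a]

def mahalanobis(x, c, inv_cov):
    # sqrt((x - c)^T S^-1 (x - c)), with S the chunk covariance.
    diff = [xi - ci for xi, ci in zip(x, c)]
    q = sum(diff[i] * inv_cov[i][j] * diff[j]
            for i in range(len(diff)) for j in range(len(diff)))
    return math.sqrt(max(q, 0.0))

def ga_phase1(chunk, k, rng, pop_size=20, generations=30, mut_sigma=0.1):
    # Phase 1 (one "map" task): evolve sets of k candidate centroids.
    inv_cov = invert(covariance(chunk))

    def cost(ind):  # lower is better: summed distance to the nearest centroid
        return sum(min(mahalanobis(x, c, inv_cov) for c in ind) for x in chunk)

    pop = [rng.sample(chunk, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 1
            child = [list(c) for c in a[:cut] + b[cut:]]  # one-point crossover
            for c in child:  # Gaussian mutation of every coordinate
                for j in range(len(c)):
                    c[j] += rng.gauss(0.0, mut_sigma)
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)

def sq_euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp(points, k, rng, iters=50):
    # Phase 2 (the "reduce" step): k-means++ seeding, then Lloyd iterations.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_euclid(p, c) for c in centers) for p in points]
        r, acc = rng.uniform(0, sum(d2)), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:  # chosen with probability proportional to d2
                centers.append(list(p))
                break
        else:  # floating-point round-off left acc just below r
            centers.append(list(points[-1]))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: sq_euclid(p, centers[i]))].append(p)
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def two_phase_cluster(chunks, k, seed=0):
    rng = random.Random(seed)
    # Chunks are looped over here; on Hadoop each would be a separate map task.
    intermediate = [c for chunk in chunks for c in ga_phase1(chunk, k, rng)]
    return kmeans_pp(intermediate, k, rng)
```

As a usage example, calling `two_phase_cluster(chunks, k=2)` on two chunks that each mix points scattered around (0, 0) and (5, 5) should return two centers close to those locations. Using Mahalanobis distance only inside the phase 1 fitness and plain squared Euclidean distance in phase 2 is a further simplifying assumption of this sketch.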

Pages: 1562-1579

Number of pages: 17
