A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Times cited: 11

Authors:

Deng, Chuang [1]

Liu, Yang [1]

Xu, Lixiong [1]

Yang, Jie [1]

Liu, Junyong [1]

Li, Siguang [3]

Li, Maozhen [2,3]

Affiliations:

[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China

[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England

[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016 / Vol. 28 / No. 11

Funding:

National Science Foundation (USA);

Keywords:

CIM verification; stochastic sampling; clustering; MapReduce; load balancing;

DOI:

10.1002/cpe.3580

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202; 0835

Abstract:

The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With the rapid deployment of digitized devices in electric power networks, the volume of data continues to grow, which makes verification of CIM data challenging. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely adopted by the community for data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among heterogeneous computing nodes, further improving computational efficiency. The performance of the parallel K-means is first evaluated on a small-scale in-house MapReduce cluster and then on a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means significantly reduces CIM data-verification time compared with sequential K-means clustering, while achieving high precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
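The abstract does not spell out how K-means is split across MapReduce, so the following is only a generic sketch of the commonly used decomposition, not the authors' implementation. It simulates the map and reduce phases in plain Python (no Hadoop), treats each input split as a list of numeric feature tuples, and leaves out the genetic algorithm-based load balancing, the stochastic sampling, and any CIM-specific feature extraction. The function names `map_split`, `reduce_all`, and `kmeans_mapreduce` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_split(split: Sequence[Point], centroids: Sequence[Point]):
    """Hypothetical mapper with local combining: for one input split, emit
    (cluster_id, (partial_sum, count)) instead of one record per point."""
    partial: Dict[int, Tuple[List[float], int]] = {}
    for point in split:
        k = nearest(point, centroids)
        total, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([t + x for t, x in zip(total, point)], count + 1)
    return list(partial.items())


def reduce_all(pairs, centroids: Sequence[Point]) -> List[Point]:
    """Hypothetical reducer: merge partial sums per cluster, emit new centroids."""
    merged: Dict[int, Tuple[List[float], int]] = {}
    for k, (total, count) in pairs:
        acc, n = merged.get(k, ([0.0] * len(total), 0))
        merged[k] = ([a + t for a, t in zip(acc, total)], n + count)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for k, (total, count) in merged.items():
        new_centroids[k] = tuple(t / count for t in total)
    return new_centroids


def kmeans_mapreduce(splits, centroids, max_iter: int = 20, tol: float = 1e-9):
    """Driver loop: one simulated MapReduce job per K-means iteration."""
    centroids = list(centroids)
    for _ in range(max_iter):
        pairs = [pair for split in splits for pair in map_split(split, centroids)]
        updated = reduce_all(pairs, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c))
            for u, c in zip(updated, centroids)
        )
        centroids = updated
        if shift <= tol:  # stop once no centroid moves noticeably
            break
    return centroids


if __name__ == "__main__":
    splits = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
        [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
    ]
    print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Emitting per-split partial sums rather than individual points plays the role of a combiner and keeps the shuffle volume proportional to the number of clusters rather than the number of records; how the paper itself structures its jobs and distributes splits across heterogeneous nodes is not described in this record.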


Pages: 3096-3114

Number of pages: 19
