A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 11
Authors
Deng, Chuang [1]
Liu, Yang [1]
Xu, Lixiong [1]
Yang, Jie [1]
Liu, Junyong [1]
Li, Siguang [3]
Li, Maozhen [2, 3]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China
[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
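Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016 / Vol. 28 / No. 11
Funding
National Science Foundation (USA);
Keywords
CIM verification; stochastic sampling; clustering; MapReduce; load balancing;
DOI
10.1002/cpe.3580
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract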
The Common Information Model (CIM) is widely used in electric power grids for data exchange among auxiliary systems such as communication, monitoring, and marketing systems. With the rapid deployment of digital devices in power networks, the volume of data grows continuously, which makes verification of CIM data a challenging problem. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely adopted for data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance workloads among heterogeneous computing nodes and thereby further improve computational efficiency. The performance of the parallel K-means is first evaluated on a small-scale in-house MapReduce cluster and then on a commercial cloud computing platform. Finally, it is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means significantly reduces CIM data-verification time compared with sequential K-means clustering, while maintaining high precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
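To make the MapReduce formulation of K-means mentioned in the abstract concrete, here is a minimal single-process sketch of the commonly used pattern: each map task assigns the points of its input split to the nearest centroid, a combiner pre-aggregates per-cluster sums and counts, and a reducer recomputes the centroids; a driver repeats this until the centroids stop moving. All function names (`kmeans_map`, `kmeans_combine`, `kmeans_reduce`, `parallel_kmeans`) and parameters (`max_iter`, `tol`) are hypothetical illustrations. This is not the authors' Hadoop implementation; it does not model CIM-specific features, stochastic sampling, or their genetic algorithm-based load balancing across heterogeneous nodes.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (coordinate-wise sum, number of points)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(split: Iterable[Point], centroids: List[Point]) -> Iterable[Tuple[int, Partial]]:
    """Map: emit (index of nearest centroid, (point, 1)) for every point in one input split."""
    for p in split:
        nearest = min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        yield nearest, (list(p), 1)


def merge(acc: Dict[int, Partial], idx: int, part: Partial) -> None:
    """Add a partial (sum, count) into the accumulator entry for cluster idx."""
    vec, cnt = part
    if idx in acc:
        s, c = acc[idx]
        acc[idx] = ([a + b for a, b in zip(s, vec)], c + cnt)
    else:
        acc[idx] = (list(vec), cnt)


def kmeans_combine(pairs: Iterable[Tuple[int, Partial]]) -> Dict[int, Partial]:
    """Combiner: pre-aggregate sums and counts on the map side to reduce shuffle traffic."""
    acc: Dict[int, Partial] = {}
    for idx, part in pairs:
        merge(acc, idx, part)
    return acc


def kmeans_reduce(partials: List[Dict[int, Partial]], old: List[Point]) -> List[Point]:
    """Reduce: merge the per-split partials and divide each cluster's sum by its count."""
    totals: Dict[int, Partial] = {}
    for part in partials:
        for idx, pc in part.items():
            merge(totals, idx, pc)
    new: List[Point] = []
    for k, prev in enumerate(old):
        if k in totals:
            s, c = totals[k]
            new.append(tuple(v / c for v in s))
        else:
            new.append(prev)  # an empty cluster keeps its previous centroid
    return new


def parallel_kmeans(splits: List[List[Point]], init: List[Point],
                    max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver: one map/combine/reduce round per iteration until centroids stop moving."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        # On a real cluster each split would be handled by a separate map task;
        # here the splits are simply processed one after another.
        partials = [kmeans_combine(kmeans_map(split, centroids)) for split in splits]
        new = kmeans_reduce(partials, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [
        [(1.0, 1.0), (1.5, 2.0)],
        [(8.0, 8.0), (9.0, 9.5)],
        [(1.0, 0.5), (8.5, 9.0)],
    ]
    print(parallel_kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

With the three toy splits above, the driver converges after two rounds to centroids near (1.17, 1.17) and (8.5, 8.83). The combiner is what keeps this pattern scalable: each map task ships only one (sum, count) pair per cluster to the reducer instead of every point.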
Pages: 3096 - 3114
Number of pages: 19
Related papers (50 in total)
  • [41] Large-scale k-means clustering with user-centric privacy-preservation
    Sakuma, Jun
    Kobayashi, Shigenobu
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (02) : 253 - 279
  • [42] Parallel Fault Diagnosis of Power Transformer Based on MapReduce and K-means
    Wang, Dewen
    Liu, Xiaojian
    CURRENT DEVELOPMENT OF MECHANICAL ENGINEERING AND ENERGY, PTS 1 AND 2, 2014, 494-495 : 813 - 816
  • [43] MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era
    Chen, Jiaoyan
    Chen, Huajun
    Wan, Xiangyi
    Zheng, Guozhou
    NEURAL COMPUTING & APPLICATIONS, 2016, 27 (01) : 101 - 110
  • [45] MapReduce-based Parallel Algorithms for Multidimensionnal Data Analysis
    Pan, Jie
    Magoules, Frederic
    Le Biannic, Yann
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2012, 6 (02) : 325 - 350
  • [46] Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
    Benmounah, Zakaria
    Meshoul, Souham
    Batouche, Mohamed
    Lio, Pietro
    APPLIED SOFT COMPUTING, 2018, 69 : 771 - 783
  • [47] An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm
    Sardar T.H.
    Ansari Z.
    Springer, 101 : 641 - 650
  • [48] Large-Scale Stream k-means based on Product-Quantized codes
    Hang, Yuqing
    Yin, Hongwei
    Hu, Wenjun
    Zhong, Longfei
    Ni, Yuzhou
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025
  • [49] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [50] Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-scale Heterogeneous Supercomputers
    Shirahata, Koichi
    Sato, Hitoshi
    Suzumura, Toyotaro
    Matsuoka, Satoshi
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013 : 277 - 284