A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 11
Authors
Deng, Chuang [1 ]
Liu, Yang [1 ]
Xu, Lixiong [1 ]
Yang, Jie [1 ]
Liu, Junyong [1 ]
Li, Siguang [3 ]
Li, Maozhen [2 ,3 ]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China
[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016, Vol. 28, No. 11
Funding
National Science Foundation (USA);
Keywords
CIM verification; stochastic sampling; clustering; MapReduce; load balancing;
DOI
10.1002/cpe.3580
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
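To make the MapReduce formulation of K-means mentioned in the abstract more concrete, the following is a minimal single-process sketch of one iteration, not the authors' implementation. It assumes points are numeric tuples and that the data set is already divided into splits; the names `nearest`, `map_phase`, `reduce_phase`, and `kmeans_iteration` are hypothetical, and the stochastic sampling and genetic algorithm-based load balancing described in the paper are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split: List[Point], centroids: List[Point]) -> List[Tuple[int, Point]]:
    """Mapper (illustrative): emit one (centroid_index, point) pair per input point."""
    return [(nearest(p, centroids), p) for p in split]


def reduce_phase(pairs: List[Tuple[int, Point]], centroids: List[Point]) -> List[Point]:
    """Reducer (illustrative): average the points assigned to each centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for idx, point in pairs:
        if idx not in sums:
            sums[idx] = [0.0] * len(point)
        for d, value in enumerate(point):
            sums[idx][d] += value
        counts[idx] += 1
    # A centroid with no assigned points keeps its previous position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans_iteration(splits: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """Run the mapper on every split, then a single reducer over all emitted pairs."""
    pairs = [pair for split in splits for pair in map_phase(split, centroids)]
    return reduce_phase(pairs, centroids)


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_iteration(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment the mappers would run in parallel on separate nodes and the framework would group pairs by key before the reduce step; the loop here only imitates that data flow in one process.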
Pages: 3096-3114
Page count: 19
Related papers
50 in total
  • [21] Multiple Parallel MapReduce k-means Clustering with Validation and Selection
    Garcia, Kemilly Dearo
    Naldi, Murilo Coelho
    2014 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2014, : 432 - 437
  • [22] MapReduce-Based Crow Search-Adopted Partitional Clustering Algorithms for Handling Large-Scale Data
    Visalakshi, Karthikeyani N.
    Shanthi, S.
    Lakshmi, K.
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [23] Large-scale k-means clustering via variance reduction
    Zhao, Yawei
    Ming, Yuewei
    Liu, Xinwang
    Zhu, En
    Zhao, Kaikai
    Yin, Jianping
    NEUROCOMPUTING, 2018, 307 : 184 - 194
  • [24] Genetic weighted k-means algorithm for clustering large-scale gene expression data
    Wu, Fang-Xiang
    BMC BIOINFORMATICS, 2008, 9 (Suppl 6)
  • [25] Genetic weighted k-means algorithm for clustering large-scale gene expression data
    Fang-Xiang Wu
    BMC Bioinformatics, 9
  • [26] Very large-scale data classification based on K-means clustering and multi-kernel SVM
    Tinglong Tang
    Shengyong Chen
    Meng Zhao
    Wei Huang
    Jake Luo
    Soft Computing, 2019, 23 : 3793 - 3801
  • [27] Very large-scale data classification based on K-means clustering and multi-kernel SVM
    Tang, Tinglong
    Chen, Shengyong
    Zhao, Meng
    Huang, Wei
    Luo, Jake
    SOFT COMPUTING, 2019, 23 (11) : 3793 - 3801
  • [28] Hierarchical K-means Method for Clustering Large-Scale Advanced Metering Infrastructure Data
    Xu, Tian-Shi
    Chiang, Hsiao-Dong
    Liu, Guang-Yi
    Tan, Chin-Woo
    IEEE TRANSACTIONS ON POWER DELIVERY, 2017, 32 (02) : 609 - 616
  • [29] Distributed, MapReduce-based Nearest Neighbor and ε-ball Kernel k-Means
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 509 - 515
  • [30] K-means Clustering Optimization Algorithm Based on MapReduce
    Li, Zhihua
    Song, Xudong
    Zhu, Wenhui
    Chen, Yanxia
    PROCEEDINGS OF THE 2015 INTERNATIONAL SYMPOSIUM ON COMPUTERS & INFORMATICS, 2015, 13 : 198 - 203