Parallel and fault-tolerant k-means clustering based on the actor model

Cited by: 1
Authors
Taamneh, Salah [1]
Qawasmeh, Ahmad [1]
Aljammal, Ashraf H. [1]
Affiliations
[1] Hashemite Univ, Dept Comp Sci & Applicat, Zarqa, Jordan
Keywords
Parallel k-means; actor-model; checkpointing; MEANS ALGORITHM;
DOI
10.3233/MGS-200336
Chinese Library Classification (CLC) number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
The k-means algorithm is a well-known unsupervised machine learning tool that splits a given dataset into a fixed number of clusters through iterative refinement. Running such an algorithm on today's datasets, which are characterized by high dimensionality and huge size, requires fault-tolerance mechanisms to mitigate the impact of possible failures. In this paper, we propose an actor-based implementation of the k-means algorithm. The algorithm was made fault-tolerant by periodically saving the centroids to stable storage during failure-free execution and restarting from the last saved centroids upon a failure. This was implemented in two different ways: optimistic checkpointing (blocking) and pessimistic checkpointing (non-blocking). The actor-based k-means algorithm was evaluated on a machine with eight cores. The experiments showed that the proposed algorithm scales very well as the number of workers increases, and that it can be up to approximately 2x faster than a Java-thread-based implementation of the k-means algorithm. The results also showed that the optimistic algorithm outperformed the pessimistic one, particularly in the presence of competing I/O operations. Several failures were forced to occur during execution to evaluate the performance of the fault-tolerant implementations. The experiments showed that the average amount of lost work ranged from 3% to 6%.
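The checkpoint-and-restart idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' actor-based implementation: it omits the actors and workers entirely, writes each checkpoint synchronously, and every name in it (kmeans, save_checkpoint, load_checkpoint, the every and fail_at parameters, the JSON file layout) is a hypothetical choice made for illustration only.

```python
import json
import os


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members; empty clusters keep their old centroid."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(list(old))
    return new


def save_checkpoint(path, iteration, centroids):
    """Write to a temporary file, then rename, so a crash cannot leave a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "centroids": centroids}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def kmeans(points, init_centroids, path, every=2, max_iter=100, fail_at=None):
    """Run k-means, saving centroids every `every` iterations and resuming from `path` if present.

    `fail_at` (hypothetical, for testing) raises an error at the start of that iteration.
    """
    state = load_checkpoint(path)
    if state:
        start, centroids = state["iteration"], state["centroids"]
    else:
        start, centroids = 0, [list(c) for c in init_centroids]
    for it in range(start, max_iter):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated failure")
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        converged = new_centroids == centroids
        centroids = new_centroids
        if (it + 1) % every == 0:
            # The stored iteration counts completed iterations, so a restart resumes right after it.
            save_checkpoint(path, it + 1, centroids)
        if converged:
            break
    return centroids
```

Because each k-means iteration depends only on the current centroids, calling kmeans again with the same path after a failure repeats at most the iterations completed since the last checkpoint. A smaller every value reduces that lost work at the cost of more frequent writes.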
Pages: 379-396
Page count: 18
Related papers
50 in total
  • [21] Enhanced Parallel Implementation of the K-Means Clustering Algorithm
    Baydoun, Mohammed
    Dawi, Mohammad
    Ghaziri, Hassan
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTATIONAL TOOLS FOR ENGINEERING APPLICATIONS (ACTEA), 2016, : 7 - 11
  • [22] Parallel batch k-means for Big data clustering
    Alguliyev, Rasim M.
    Aliguliyev, Ramiz M.
    Sukhostat, Lyudmila V.
    COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
  • [23] Parallel bisecting k-means with prediction clustering algorithm
    Yanjun Li
    Soon M. Chung
    The Journal of Supercomputing, 2007, 39 : 19 - 37
  • [24] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Iocchi, Luca
    COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118
  • [25] Graph based k-means clustering
    Galluccio, Laurent
    Michel, Olivier
    Comon, Pierre
    Hero, Alfred O., III
    SIGNAL PROCESSING, 2012, 92 (09) : 1970 - 1984
  • [26] A multilevel fault model for integrated parallel fault-tolerant systems
    Fechner, Bernhard
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2012, 24 (07) : 687 - 698
  • [27] A Parallel Multiple K-Means Clustering and Application on Detect Near Native Model
    Wu, Hongjie
    Wu, Chuang
    Cheng, Chen
    Song, Longfei
    Jiang, Min
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772 : 880 - 887
  • [28] Optimized K-Means Clustering Model based on Gap Statistic
    El-Mandouh, Amira M.
    Mahmoud, Hamdi A.
    Abd-Elmegid, Laila A.
    Haggag, Mohamed H.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (01) : 183 - 188
  • [29] An Improved K-Means Clustering Algorithm Based on Semantic Model
    Liu, Zhe
    Bao, Jianmin
    Ding, Fei
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING 2018 (ICITEE '18), 2018,
  • [30] Model Based Modified K-Means Clustering for Microarray Data
    Suresh, R. M.
    Dinakaran, K.
    Valarmathie, P.
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 271 - 273