Parallel and fault-tolerant k-means clustering based on the actor model

被引:1
作者
Taamneh, Salah [1 ]
Qawasmeh, Ahmad [1 ]
Aljammal, Ashraf H. [1 ]
机构
[1] Hashemite Univ, Dept Comp Sci & Applicat, Zarqa, Jordan
关键词
Parallel k-means; actor-model; checkpointing; MEANS ALGORITHM;
D O I
10.3233/MGS-200336
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
K-means algorithm is a well-known unsupervised machine learning tool that aims at splitting a given dataset into a fixed number of clusters via iterative refinement approach. Running such an algorithm on today's datasets that are characterized by its high multidimensionality and huge size requires using fault-tolerance mechanisms to mitigate the impact of possible failures. In this paper, we propose an actor-based implementation of k-means algorithm. The algorithm was made fault-tolerant by periodically saving the centroids into a stable storage during the failure-free execution, and restarting from the last saved centroids upon a failure. This was implemented in two different ways: optimistic checkpointing (blocking) and pessimistic checkpointing (non-blocking). The actor-based k-means algorithm was evaluated on a machine with eight cores. The experiments showed that the proposed algorithm scales very well as the number of workers increases, and can be up to similar to 2x faster than a Java-thread-based implementation of k-means algorithm. The results also showed that the optimistic algorithm outperformed the pessimistic one, specifically, in the presence of competing I/O operations. Several failures were forced to occur during the execution to evaluate the performance of the fault-tolerant implementations. The experiments showed that the average amount of lost work ranged from 3-6%.
引用
收藏
页码:379 / 396
页数:18
相关论文
共 44 条
  • [31] A new weighting k-means type clustering framework with an l2-norm regularization
    Huang, Xiaohui
    Yang, Xiaofei
    Zhao, Junhui
    Xiong, Liyan
    Ye, Yunming
    KNOWLEDGE-BASED SYSTEMS, 2018, 151 : 165 - 179
  • [32] Issues in clustering algorithm consistency in fixed dimensional spaces. Some solutions for k-means
    Mieczysław A. Kłopotek
    Robert A. Kłopotek
    Journal of Intelligent Information Systems, 2021, 57 : 509 - 530
  • [33] Enhancing Parallel k-Means Using Map Reduce for Discovering Knowledge from Big Data
    Moertini, Veronica S.
    Venica, Liptia
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 81 - 87
  • [34] An optimized fuzzy K-means clustering method for automated rock discontinuities extraction from point clouds
    Zhou, Jia-wen
    Chen, Jun-lin
    Li, Hai-bo
    INTERNATIONAL JOURNAL OF ROCK MECHANICS AND MINING SCIENCES, 2024, 173
  • [35] Scalable and efficient fault-tolerant protocol for mobility agents in mobile IP-based systems
    Ahn, J
    Min, SG
    Hwang, CS
    FUTURE GENERATION COMPUTER SYSTEMS, 2002, 18 (05) : 613 - 625
  • [36] Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling
    Kim, Hyunjoong
    Kim, Han Kyul
    Cho, Sungzoon
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 150
  • [37] Energy-efficient fault-tolerant scheduling in a fog-based smart monitoring application
    Sharif, Ahmad
    Nickray, Mohsen
    Shahidinejad, Ali
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2021, 36 (01) : 32 - 49
  • [38] Schedulability analysis for fault-tolerant hard real-time systems based on rollback recovery
    Ding W.-F.
    Guo R.-F.
    Zhao J.
    Liu X.
    Li J.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2011, 33 (07): : 1673 - 1679
  • [39] A Fault-Tolerant Scheduling Algorithm Based on Checkpointing and Redundancy for Distributed Real-Time Systems
    Kada, Barkahoum
    Kalla, Hamoudi
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2019, 10 (03) : 58 - 75
  • [40] A Java']Javaspace-based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications
    Galtier, Virginie
    Makassikis, Constantinos
    Vialle, Stephane
    PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2011, : 272 - 276