Collaborative Learning Based Straggler Prevention in Large-Scale Distributed Computing Framework

Times cited: 49

Authors:

Deshmukh, Shyam [1]

Thirupathi Rao, Komati [1]

Shabaz, Mohammad [2]

Affiliations:

[1] Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Guntur 522502, AP, India

[2] Arba Minch Univ, Arba Minch, Ethiopia

Source:

SECURITY AND COMMUNICATION NETWORKS | 2021 / Vol. 2021

Keywords:

MAPREDUCE;

DOI:

10.1155/2021/8340925

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Modern big data applications tend to favor a cluster computing approach because they are tied to a distributed computing framework that serves users' jobs on demand. Such a framework processes a job rapidly by subdividing it into tasks that execute in parallel. Because of the complex environment and of hardware and software issues, some tasks may run slowly and delay job completion; such slow tasks are known as stragglers. Straggling nodes, caused by factors such as shared resources, heavy system load, or hardware issues, prolong job execution time and are a bottleneck to improving the performance of distributed computing frameworks. Many state-of-the-art approaches train an independent model per node and per workload. As the numbers of nodes and workloads grow, the number of models grows with them. Moreover, even with a large number of nodes, not every node can learn to recognize stragglers, because sufficient training data on straggler patterns may not be available locally, which yields suboptimal straggler prediction. To alleviate these problems, we propose a novel collaborative learning-based approach to straggler prediction built on the alternating direction method of multipliers (ADMM). The approach is resource-efficient and learns to mitigate stragglers efficiently without moving data to a centralized location. The proposed framework shares information among the individual models, which lets them exploit a larger pool of training data and reduces training time by avoiding data transfer. We rigorously evaluate the proposed method on various datasets and obtain high accuracy.
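The abstract states that the per-node models are coupled through ADMM so that nodes exchange model information rather than raw data. The sketch below illustrates that idea with the global-consensus form of ADMM described by Boyd et al. (2010). It is a minimal sketch under stated assumptions, not the authors' method: this record does not give the paper's model, features, or parameters, so regularized least squares stands in for the straggler predictor (for example, regressing a task's runtime on per-node features), and the function name `consensus_admm`, the parameters `lam` and `rho`, and the synthetic data are all hypothetical. Each node keeps its own `(A_i, b_i)`; only the vectors `x_i + u_i` are aggregated into the shared model `z`.

```python
import numpy as np

# Hypothetical sketch: ridge regression stands in for the paper's (unspecified)
# straggler model; features, lam and rho are illustrative assumptions.


def consensus_admm(local_data, lam=0.1, rho=1.0, max_iter=1000, tol=1e-8):
    """Global-consensus ADMM for  min_z sum_i 0.5*||A_i z - b_i||^2 + 0.5*lam*||z||^2.

    local_data is a list of (A_i, b_i) pairs; each pair stays on its own node.
    """
    n_nodes = len(local_data)
    dim = local_data[0][0].shape[1]
    x = np.zeros((n_nodes, dim))  # local model held by each node
    u = np.zeros((n_nodes, dim))  # scaled dual variable held by each node
    z = np.zeros(dim)             # shared consensus model

    # Each node builds its local linear system once from its own data.
    systems = [(A.T @ A + rho * np.eye(dim), A.T @ b) for A, b in local_data]

    for _ in range(max_iter):
        # 1. Local step: every node fits its own data while being pulled toward z.
        for i, (M, Atb) in enumerate(systems):
            x[i] = np.linalg.solve(M, Atb + rho * (z - u[i]))
        # 2. Global step: only the vectors x_i + u_i are aggregated, never raw data.
        z_old = z
        z = rho * (x + u).sum(axis=0) / (lam + n_nodes * rho)
        # 3. Dual step: each node accumulates its disagreement with the consensus.
        u += x - z
        # Stop once both the primal and dual residuals are small.
        primal_res = np.linalg.norm(x - z)
        dual_res = rho * np.sqrt(n_nodes) * np.linalg.norm(z - z_old)
        if primal_res < tol and dual_res < tol:
            break
    return z


# Usage on synthetic data: 4 nodes, 50 samples each, 5 features per sample.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
nodes = []
for _ in range(4):
    A = rng.normal(size=(50, 5)) / np.sqrt(50)
    b = A @ true_w + 0.01 * rng.normal(size=50)
    nodes.append((A, b))

w_admm = consensus_admm(nodes)

# Centralized reference fit on the pooled data (what the nodes avoid computing).
A_all = np.vstack([A for A, _ in nodes])
b_all = np.concatenate([b for _, b in nodes])
w_central = np.linalg.solve(A_all.T @ A_all + 0.1 * np.eye(5), A_all.T @ b_all)
print("max |w_admm - w_central| =", np.abs(w_admm - w_central).max())
```

Because the global step needs only the sum of the local vectors, it could be realized as an all-reduce across nodes. For this convex problem the consensus model should agree with the centralized fit on the pooled data, which is what the final comparison checks.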

Pages: 9

References: 43

