Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres

Citations: 17
Authors
Gill, Sukhpal Singh [1]
Ouyang, Xue [2]
Garraghan, Peter [3]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London, England
[2] Natl Univ Def Technol, Sch Elect Sci, Changsha, Peoples R China
[3] Univ Lancaster, Sch Comp & Commun, Lancaster, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Computing; Stragglers; Cloud computing; Straggler management; Distributed systems; Cloud data centres;
DOI
10.1007/s11227-020-03241-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.
Pages: 10050 - 10089
Page count: 40
Related papers
50 in total
  • [31] A Large-Scale Secure Image Retrieval Method in Cloud Environment
    Xu, Yanyan
    Zhao, Xiao
    Gong, Jiaying
    IEEE ACCESS, 2019, 7 : 160082 - 160090
  • [32] Characterisation of Hidden Periodicity in Large-scale Cloud Datacentre Environments
    Panneerselvam, John
    Liu, Lu
    Antonopoulos, Nick
    2017 IEEE INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA), 2017, : 496 - 503
  • [33] Cloud Computing Applications for Large-Scale Satellite Ground Systems
    Anthony, Richard
    Fritz, John
    Barnhart, Doug
    2011 - MILCOM 2011 MILITARY COMMUNICATIONS CONFERENCE, 2011, : 1894 - 1898
  • [34] An Efficient and Verifiable Encrypted Data Filtering Framework Over Large-Scale Storage in Cloud Edge
    Huang, Qinlong
    Wang, Chao
    Lu, Boyu
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 8248 - 8262
  • [35] Large-scale Ship Fault Data Retrieval Algorithm Supporting Complex Query in Cloud Computing
    Zhang, Shujuan
    JOURNAL OF COASTAL RESEARCH, 2019, : 236 - 241
  • [36] SVM-Based Incremental Learning Algorithm for Large-Scale Data Stream in Cloud Computing
    Wang, Ning
    Yang, Yang
    Feng, Liyuan
    Mi, Zhenqiang
    Meng, Kun
    Ji, Qing
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2014, 8 (10): : 3378 - 3393
  • [37] Remote Data Auditing in Cloud Computing Environments: A Survey, Taxonomy, and Open Issues
    Sookhak, Mehdi
    Gani, Abdullah
    Talebian, Hamid
    Akhunzada, Adnan
    Khan, Samee U.
    Buyya, Rajkumar
    Zomaya, Albert Y.
    ACM COMPUTING SURVEYS, 2015, 47 (04)
  • [38] DATA PLACEMENT IN ERA OF CLOUD COMPUTING: A SURVEY, TAXONOMY AND OPEN RESEARCH ISSUES
    Kaur, Avinash
    Gupta, Pooja
    Singh, Manpreet
    Nayyar, Anand
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2019, 20 (02): : 377 - 398
  • [39] Exploiting Cloud Computing and Web Services to Achieve Data Consistency, Availability, and Partition Tolerance in the Large-Scale Pervasive Systems
    Fadelelmoula, A. A.
    International Journal of Interactive Mobile Technologies, 2021, 15 (15) : 74 - 102
  • [40] An Empirical Failure-Analysis of a Large-Scale Cloud Computing Environment
    Garraghan, Peter
    Townend, Paul
    Xu, Jie
    2014 IEEE 15TH INTERNATIONAL SYMPOSIUM ON HIGH-ASSURANCE SYSTEMS ENGINEERING (HASE), 2014, : 113 - 120