Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres

Cited by: 17
Authors
Gill, Sukhpal Singh [1]
Ouyang, Xue [2]
Garraghan, Peter [3]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London, England
[2] Natl Univ Def Technol, Sch Elect Sci, Changsha, Peoples R China
[3] Univ Lancaster, Sch Comp & Commun, Lancaster, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Computing; Stragglers; Cloud computing; Straggler management; Distributed systems; Cloud data centres
DOI
10.1007/s11227-020-03241-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing systems split compute- and data-intensive jobs into smaller tasks and execute them in parallel on clusters to reduce execution time. However, at increasing scale such systems are exposed to stragglers: abnormally slow tasks within a job that substantially delay job completion. Stragglers are therefore a direct threat to the fast execution of data-intensive jobs in cloud computing. Researchers have proposed a range of mechanisms, frameworks, and management techniques to detect and mitigate stragglers, both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as of proposed management and mitigation techniques, based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions for future straggler research.
Pages: 10050-10089
Page count: 40