Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres

Cited by: 17
Authors
Gill, Sukhpal Singh [1 ]
Ouyang, Xue [2 ]
Garraghan, Peter [3 ]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London, England
[2] Natl Univ Def Technol, Sch Elect Sci, Changsha, Peoples R China
[3] Univ Lancaster, Sch Comp & Commun, Lancaster, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Computing; Stragglers; Cloud computing; Straggler management; Distributed systems; Cloud data centres;
DOI
10.1007/s11227-020-03241-x
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.
Pages: 10050 - 10089
Number of pages: 40
Related papers
50 items in total
  • [1] Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres
    Sukhpal Singh Gill
    Xue Ouyang
    Peter Garraghan
    The Journal of Supercomputing, 2020, 76 : 10050 - 10089
  • [2] A Survey of Large Scale Data Management Approaches in Cloud Environments
    Sakr, Sherif
    Liu, Anna
    Batista, Daniel M.
    Alomari, Mohammad
    IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2011, 13 (03) : 311 - 336
  • [3] MODELLING RESILIENCE IN CLOUD-SCALE DATA CENTRES
    Cartlidge, John
    Sriram, Ilango
    23RD EUROPEAN MODELING & SIMULATION SYMPOSIUM, EMSS 2011, 2011 : 299 - 307
  • [4] Autonomous and Energy-Aware Management of Large-Scale Cloud Infrastructures
    Feller, Eugen
    Morin, Christine
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012 : 2542 - 2545
  • [5] Improving System and Software Deployment on a Large-Scale Cloud Data Center
    Wu, Yu-Sheng
    Juang, Tong-Ying
    Chang, Yue-Shan
    Wang, Wei-Jen
    Lu, Jun-Ting
    2013 FIFTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2013 : 82 - 87
  • [6] Clustered Multicast Source Routing for Large-Scale Cloud Data Centers
    Alqahtani, Jarallah
    Sinky, Hassan H.
    Hamdaoui, Bechir
    IEEE ACCESS, 2021, 9 (09) : 12693 - 12705
  • [7] The Application of Cloud Computing in Large-Scale Statistic
    Sun Xiuli
    Li Ying
    Hu Baofang
    Sun Hongfeng
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52 : 308 - 311
  • [8] RESEARCH BASED ON LARGE-SCALE DATA QUERY WITH MAPREDUCE TECHNOLOGY IN CLOUD COMPUTING
    Wang, Feiping
    Gu, Xiaofeng
    2012 INTERNATIONAL CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (LCWAMTIP), 2012 : 243 - 245
  • [9] A Secure Data Assimilation for Large-Scale Sensor Networks Using an Untrusted Cloud
    Xu, Zhiheng
    Zhu, Quanyan
    IFAC PAPERSONLINE, 2017, 50 (01) : 2609 - 2614
  • [10] Towards Efficient Verifiable Cloud Storage and Distribution for Large-Scale Data Streaming
    Yang, Haining
    Feng, Dengguo
    Qin, Jing
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2025, 36 (03) : 487 - 501