Provenance-based fault tolerance technique recommendation for cloud-based scientific workflows: a practical approach

Cited by: 0
Authors
Thaylon Guedes
Leonardo A. Jesus
Kary A. C. S. Ocaña
Lucia M. A. Drummond
Daniel de Oliveira
Affiliations
[1] Instituto de Computação - Universidade Federal Fluminense
[2] Laboratório Nacional de Computação Científica
Source
Cluster Computing | 2020 / Vol. 23
Keywords
Cloud computing; Scientific workflow; Fault-tolerance; Recommendation
DOI
Not available
Abstract
Scientific workflows are abstractions composed of activities, data and dependencies that model a computer simulation and are managed by complex engines called scientific workflow management systems (SWfMS). Many workflows demand substantial computational resources, since their executions may involve many different programs processing a massive volume of data. Thus, high-performance computing (HPC) and data-intensive scalable computing environments, combined with parallelization techniques, provide the necessary support for executing such workflows. Clouds already offer HPC capabilities, and workflows can exploit them. Although clouds offer advantages such as elasticity and availability, failures are a reality rather than a possibility in this environment, so SWfMS must be fault-tolerant. Several fault tolerance techniques are used in SWfMS, such as Checkpoint/Restart, Re-Execution and Over-provisioning, but it is far from trivial to choose a suitable technique for a workflow execution without jeopardizing the parallel execution. The major problem is that the suitable fault tolerance technique may differ for each workflow, activity or activation, since the programs associated with activities may behave differently. This article analyzes several fault-tolerance techniques in a cloud-based SWfMS named SciCumulus and recommends the suitable one for the user's workflow activities and activations using machine learning techniques and provenance data, with the aim of improving resiliency.
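The abstract does not say which machine learning technique or which provenance attributes the authors use, so the following is only an illustrative sketch of the general idea: learn from past executions which fault tolerance technique suited an activation, then recommend one for a new activation. It swaps in a plain k-nearest-neighbours vote. The feature set (average runtime, input size, observed failure rate), the `HISTORY` records, and the function names are all hypothetical and are not taken from SciCumulus or the article.

```python
from collections import Counter
from math import dist

# Hypothetical provenance records (invented values, not from the article):
# (avg_runtime_s, input_mb, failure_rate) -> technique that worked best before.
HISTORY = [
    ((1200.0, 800.0, 0.20), "checkpoint_restart"),
    ((1500.0, 950.0, 0.25), "checkpoint_restart"),
    ((30.0, 5.0, 0.05), "re_execution"),
    ((45.0, 8.0, 0.02), "re_execution"),
    ((300.0, 100.0, 0.60), "over_provisioning"),
    ((350.0, 120.0, 0.55), "over_provisioning"),
]


def feature_bounds(history):
    """Return the (min, max) of each feature column in the history."""
    columns = zip(*(features for features, _ in history))
    return [(min(col), max(col)) for col in columns]


def normalize(features, bounds):
    """Min-max scale each feature so no single unit dominates the distance."""
    return tuple(
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(features, bounds)
    )


def recommend(features, history=HISTORY, k=3):
    """Recommend a technique by majority vote among the k nearest past records."""
    bounds = feature_bounds(history)
    query = normalize(features, bounds)
    ranked = sorted(
        history, key=lambda rec: dist(query, normalize(rec[0], bounds))
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A long-running, data-heavy activation with a moderate failure rate.
    print(recommend((1300.0, 850.0, 0.22)))
```

In a real setting the history would come from the SWfMS provenance database rather than a hard-coded list, and the choice of features and classifier would need to be validated against measured execution outcomes.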
Pages: 123-148
Page count: 25