Provenance-based fault tolerance technique recommendation for cloud-based scientific workflows: a practical approach

Authors
Thaylon Guedes
Leonardo A. Jesus
Kary A. C. S. Ocaña
Lucia M. A. Drummond
Daniel de Oliveira
Affiliations
[1] Instituto de Computação - Universidade Federal Fluminense
[2] Laboratório Nacional de Computação Científica
Source
Cluster Computing, 2020, Vol. 23
Keywords
Cloud computing; Scientific workflow; Fault-tolerance; Recommendation
DOI
Not available
Abstract
Scientific workflows are abstractions composed of activities, data and dependencies that model a computer simulation; they are managed by complex engines called scientific workflow management systems (SWfMSs). Many workflows demand substantial computational resources, since their executions may involve several different programs processing a massive volume of data. Thus, high-performance computing (HPC) and data-intensive scalable computing environments, combined with parallelization techniques, provide the necessary support for executing such workflows. Clouds already offer HPC capabilities that workflows can exploit. Although clouds offer advantages such as elasticity and availability, failures in this environment are a reality rather than a possibility, so SWfMSs must be fault-tolerant. SWfMSs use several types of fault-tolerance techniques, such as Checkpoint/Restart, Re-Execution and Over-provisioning, but it is far from trivial to choose a suitable technique for a workflow execution, one that does not jeopardize the parallel execution. The major problem is that the suitable technique may differ for each workflow, activity or activation, since the programs associated with activities may behave differently. This article analyzes several fault-tolerance techniques in a cloud-based SWfMS named SciCumulus and recommends the suitable one for the user's workflow activities and activations using machine learning techniques and provenance data, with the goal of improving resiliency.
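The abstract does not specify which learning model is used, so the following is only a minimal sketch of the general idea of recommending a fault-tolerance technique from provenance data. It assumes a k-nearest-neighbours vote over three hypothetical provenance features (`runtime_s`, `failure_rate`, `input_mb`), each past activation being labelled with the technique that worked best. The `ActivationRecord` type, the feature names, the `recommend` function and the sample history are all illustrative inventions, not SciCumulus's provenance schema or the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass


# Hypothetical provenance record for one past activation (illustrative fields only).
@dataclass
class ActivationRecord:
    runtime_s: float      # observed execution time of the activation
    failure_rate: float   # fraction of past runs of this activation that failed
    input_mb: float       # size of the input data consumed
    best_technique: str   # label: technique that performed best for it


def _features(r):
    return (r.runtime_s, r.failure_rate, r.input_mb)


def _ranges(history):
    """Per-feature (min, max) over the history, used for min-max normalization."""
    return [(min(col), max(col)) for col in zip(*map(_features, history))]


def _normalize(x, ranges):
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, ranges)]


def recommend(history, runtime_s, failure_rate, input_mb, k=3):
    """Majority vote among the k past activations closest to the new one
    (Euclidean distance on min-max normalized features)."""
    ranges = _ranges(history)
    query = _normalize((runtime_s, failure_rate, input_mb), ranges)
    nearest = sorted(
        history, key=lambda r: math.dist(_normalize(_features(r), ranges), query)
    )[:k]
    votes = Counter(r.best_technique for r in nearest)
    return votes.most_common(1)[0][0]


# Made-up sample history, not data from the article.
history = [
    ActivationRecord(3600, 0.05, 500, "Checkpoint/Restart"),
    ActivationRecord(4200, 0.10, 800, "Checkpoint/Restart"),
    ActivationRecord(30, 0.20, 5, "Re-Execution"),
    ActivationRecord(45, 0.15, 10, "Re-Execution"),
    ActivationRecord(600, 0.60, 50, "Over-provisioning"),
    ActivationRecord(700, 0.55, 60, "Over-provisioning"),
]

print(recommend(history, 40, 0.18, 8))  # → Re-Execution
```

In this toy setup, short and cheap activations lean toward Re-Execution, long data-heavy ones toward Checkpoint/Restart, and frequently failing ones toward Over-provisioning; a real recommender would derive both the features and the labels from the SWfMS's actual provenance database.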
Pages: 123-148
Number of pages: 25