Pegasus, a workflow management system for science automation

Cited by: 534
Authors
Deelman, Ewa [1]
Vahi, Karan [1]
Juve, Gideon [1]
Rynge, Mats [1]
Callaghan, Scott [2]
Maechling, Philip J. [2]
Mayani, Rajiv [1]
Chen, Weiwei [1]
da Silva, Rafael Ferreira [1]
Livny, Miron [3]
Wenger, Kent [3]
Affiliations
[1] University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
[2] University of Southern California, Los Angeles, CA, USA
[3] University of Wisconsin, Madison, WI, USA
Source
Future Generation Computer Systems: The International Journal of eScience | 2015 / Vol. 46
Funding
U.S. National Science Foundation
Keywords
Scientific workflows; Workflow management system; Pegasus; Provenance; Taverna
DOI
10.1016/j.future.2014.10.008
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics and others. This paper provides an integrated view of the Pegasus system, showing the capabilities that have been developed over time in response to application needs and to the evolution of scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures. © 2014 Elsevier B.V. All rights reserved.
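The abstract's central idea, mapping an abstract workflow description onto concrete computing resources, can be illustrated with a minimal sketch. It assumes an abstract workflow is a directed acyclic graph of tasks that carries no resource information, and that "mapping" means choosing a site for each task plus an execution order that respects dependencies. The task names, site names, and the `plan` function are hypothetical, and the round-robin site choice is a placeholder: this is not Pegasus's planner or API, and the planning described in the paper does considerably more.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: each task maps to the set of tasks it depends on.
# No sites, paths, or schedulers appear here; that is what makes it "abstract".
ABSTRACT_WORKFLOW = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "analyze": {"simulate_a", "simulate_b"},
}


def plan(workflow, sites):
    """Return (task, site) pairs in an order that respects dependencies.

    Site choice is a trivial round-robin placeholder (an assumption),
    standing in for real resource selection.
    """
    order = list(TopologicalSorter(workflow).static_order())
    return [(task, sites[i % len(sites)]) for i, task in enumerate(order)]


if __name__ == "__main__":
    # Hypothetical site names for a campus cluster and a cloud resource.
    for task, site in plan(ABSTRACT_WORKFLOW, ["campus_cluster", "cloud"]):
        print(f"{task} -> {site}")
```

Keeping the abstract description free of site details is what lets the same workflow be re-planned for a different set of resources by changing only the `sites` argument.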
Pages: 17-35
Number of pages: 19