A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows

Cited by: 13
Authors
Vahi, Karan [1]
Harvey, Ian [2]
Samak, Taghrid [3]
Gunter, Daniel [3]
Evans, Kieran [2]
Rogers, David [2]
Taylor, Ian [2]
Goode, Monte [3]
Silva, Fabio [4]
Al-Shakarchi, Eddie [2]
Mehta, Gaurang [1]
Deelman, Ewa [1]
Jones, Andrew [2]
Affiliations
[1] USC Informat Sci Inst, Marina Del Rey, CA USA
[2] Sch Comp Sci, Cardiff, S Glam, Wales
[3] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[4] Univ So Calif, Los Angeles, CA USA
Funding
National Science Foundation (USA)
Keywords
Scientific workflows; Real time monitoring; Common monitoring infrastructure; Log analysis; Troubleshooting; Workflow performance data; SCIENCE;
DOI
10.1007/s10723-013-9265-4
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Scientific workflow systems support various workflow representations, operational modes, and configurations. Regardless of the system used, end users have common needs: to track the status of their workflows in real time, be notified of execution anomalies and failures automatically, perform troubleshooting, and automate the analysis of the workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana Workflow Systems in order to add generic real-time monitoring and troubleshooting capabilities across both systems. Stampede is an infrastructure that provides interoperable monitoring using a three-layer model: (1) a common data model to describe workflow and job executions; (2) high-performance tools to load workflow logs conforming to the data model into a data store; and (3) a common query interface. This paper describes the integration of the Stampede monitoring architecture with Pegasus and Triana and shows the new analysis capabilities that Stampede provides to these workflow systems. The successful integration of Stampede with these workflow engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to provide a common platform for monitoring across scientific workflow engines.
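The three-layer model named in the abstract can be illustrated with a minimal sketch. It is not Stampede's actual schema, log format, or API: the `event` table, the `key=value` log lines, and the functions `parse_line`, `load`, and `failed_jobs` are all hypothetical names chosen for illustration, and SQLite stands in for whatever data store a real deployment would use.

```python
import sqlite3

# Layer 1 (hypothetical data model): one row per workflow or job event.
SCHEMA = """
CREATE TABLE event (
    ts          REAL NOT NULL,   -- event timestamp
    workflow_id TEXT NOT NULL,   -- workflow the event belongs to
    job_id      TEXT,            -- NULL for workflow-level events
    name        TEXT NOT NULL,   -- e.g. 'wf.start', 'job.end'
    status      INTEGER          -- exit status, NULL when not applicable
)
"""


def parse_line(line):
    """Parse an assumed 'key=value key=value' log line into a dict."""
    return dict(pair.split("=", 1) for pair in line.split())


def load(conn, lines):
    """Layer 2: load log lines that conform to the data model into the store."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = parse_line(line)
        status = int(f["status"]) if "status" in f else None
        rows.append((float(f["ts"]), f["wf"], f.get("job"), f["event"], status))
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def failed_jobs(conn, workflow_id):
    """Layer 3: a query that does not depend on which engine wrote the logs."""
    cur = conn.execute(
        "SELECT job_id FROM event "
        "WHERE workflow_id = ? AND name = 'job.end' AND status != 0 "
        "ORDER BY ts",
        (workflow_id,),
    )
    return [row[0] for row in cur]


def demo():
    """Load a few made-up events and list the failed jobs of workflow 'wf1'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    load(conn, [
        "ts=1.0 wf=wf1 event=wf.start",
        "ts=2.0 wf=wf1 job=a event=job.end status=0",
        "ts=3.0 wf=wf1 job=b event=job.end status=1",
        "ts=4.0 wf=wf2 job=c event=job.end status=2",
    ])
    return failed_jobs(conn, "wf1")
```

Because the query layer only sees rows in the shared model, the same `failed_jobs` call would work whether the log lines came from one workflow engine or another, which is the interoperability point the abstract makes.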
Pages: 381-406
Number of pages: 26