Online Task Resource Consumption Prediction for Scientific Workflows

Cited by: 42
Authors
da Silva, Rafael Ferreira [1]
Juve, Gideon [1]
Rynge, Mats [1]
Deelman, Ewa [1]
Livny, Miron [2]
Affiliations
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] Univ Wisconsin, Madison, WI USA
Keywords
Scientific workflow; workflow characterization; online resource usage task estimation; MAPE-K loop
DOI
10.1142/S0129626415410030
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but these estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks' input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated from the ratio of the parameter to the input data size if the two are correlated, or otherwise from the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process yields much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.
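
The estimation rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `pearson`, `estimate`, and `OnlineEstimator`, the use of the Pearson coefficient, the 0.8 correlation threshold, and sampling from the observed values as a stand-in for the parameter's probability distribution are all assumptions made for illustration. The clustering step that splits uncorrelated datasets into subsets is omitted, and only the monitor-and-update portion of the MAPE-K loop is mimicked.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def estimate(input_sizes, values, new_input_size, threshold=0.8, rng=None):
    """Estimate a task parameter (e.g. runtime) for a new input size.

    If the parameter correlates with input size (threshold is an assumed
    cut-off), scale the mean parameter/input-size ratio by the new input
    size. Otherwise draw from the empirical distribution of observed values.
    """
    if not values:
        return None  # nothing observed yet
    if len(values) >= 2 and pearson(input_sizes, values) >= threshold:
        ratios = [v / s for s, v in zip(input_sizes, values) if s > 0]
        if ratios:
            return statistics.fmean(ratios) * new_input_size
    rng = rng or random.Random()
    return rng.choice(values)


class OnlineEstimator:
    """Keeps observations from finished tasks and re-estimates on demand."""

    def __init__(self, threshold=0.8, seed=None):
        self.input_sizes = []
        self.values = []
        self.threshold = threshold
        self.rng = random.Random(seed)

    def observe(self, input_size, value):
        # Monitoring step: record the measured value of a completed task.
        self.input_sizes.append(input_size)
        self.values.append(value)

    def predict(self, new_input_size):
        # Estimates use every observation gathered so far.
        return estimate(self.input_sizes, self.values, new_input_size,
                        self.threshold, self.rng)
```

In this sketch, calling `observe` after each completed task and `predict` before each submission means later predictions draw on more data than earlier ones, which is the intuition behind the online process outperforming a purely offline estimate.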
Pages: 25