Reinforcement Learning based scheduling in a workflow management system

Cited by: 39
Authors
Kintsakis, Athanassios M. [1]
Psomopoulos, Fotis E. [2, 3]
Mitkas, Pericles A. [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Elect & Comp Engn, Thessaloniki, Greece
[2] Ctr Res & Technol Hellas, Inst Appl Biosci, Thessaloniki 57001, Greece
[3] Karolinska Inst, Dept Mol Med & Surg, Stockholm, Sweden
Keywords
Workflow management systems; Scheduling optimization; Machine learning; Reinforcement Learning; Neural networks; Cloud
DOI
10.1016/j.engappai.2019.02.013
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Any computational process, from simple data analytics tasks to training a machine learning model, can be described by a workflow. Many workflow management systems (WMS) exist that undertake the task of scheduling workflows across distributed computational resources. In this work, we introduce a WMS that leverages machine learning to predict workflow task runtime and the probability of failure of task assignments to execution sites. The expected runtime of workflow tasks can be used to approximate the weight of the workflow graph branches with respect to the total workflow workload, and the ability to anticipate task failures can discourage task assignments that are unlikely to succeed. We demonstrate that the proposed machine learning models can lead to significantly more informed scheduling decisions that minimize task failures and utilize execution sites more efficiently, thus leading to reduced workflow runtime. Additionally, we train a modified sequence-to-sequence neural network architecture via reinforcement learning to perform scheduling decisions as part of a WMS. Our approach introduces a WMS that can drastically improve its scheduling performance by independently learning over time, without external intervention or reliance on any specific heuristic or optimization technique. Finally, we test our approach in real-world scenarios utilizing computationally demanding and data-intensive workflows and evaluate its performance against existing scheduling methodologies traditionally used in WMSes. The performance evaluation confirms that the proposed approach consistently and significantly outperforms the other scheduling algorithms, achieving the best execution runtime with the lowest number of failed tasks and the lowest communication costs.
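As a rough illustration of how predicted runtime and predicted failure probability could jointly inform a task-to-site assignment, here is a minimal Python sketch. It is not the method of the paper. The names `Prediction`, `expected_cost`, and `pick_site` are hypothetical, and the cost formula rests on a simplifying assumption made only for this example: a failed attempt costs the full predicted runtime and is retried independently on the same site.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output for one (task, execution site) pair."""
    runtime_s: float  # predicted task runtime on this site, in seconds
    p_fail: float     # predicted probability that the assignment fails


def expected_cost(pred: Prediction) -> float:
    """Expected time to completion under independent retries (assumption).

    With success probability (1 - p_fail), the expected number of attempts
    is 1 / (1 - p_fail); each attempt is assumed to cost runtime_s.
    """
    if not 0.0 <= pred.p_fail < 1.0:
        return float("inf")  # certain failure or invalid input: never prefer
    return pred.runtime_s / (1.0 - pred.p_fail)


def pick_site(predictions: Dict[str, Prediction]) -> str:
    """Return the site with the lowest expected cost for a single task."""
    return min(predictions, key=lambda site: expected_cost(predictions[site]))


# Illustrative numbers only, not taken from the paper.
preds = {
    "site_a": Prediction(runtime_s=100.0, p_fail=0.5),  # fast but unreliable
    "site_b": Prediction(runtime_s=150.0, p_fail=0.0),  # slower but reliable
}
best = pick_site(preds)
```

Under these assumptions the nominally faster site loses, because its expected cost (200 s) exceeds that of the reliable site (150 s). This is the sense in which anticipating failures can discourage assignments that are unlikely to succeed.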
Pages: 94-106
Page count: 13