Scheduling Independent Moldable Tasks on Multi-Cores with GPUs

被引:22
作者
Bleuse, Raphael [1 ]
Hunold, Sascha [3 ]
Kedad-Sidhoum, Safia [2 ]
Monna, Florence [2 ]
Mounie, Gregory [1 ]
Trystram, Denis [1 ]
机构
[1] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP,LIG, F-38000 Grenoble, France
[2] UPMC Univ Paris 06, Sorbonne Univ, UMR 7606, LIP6, F-75005 Paris, France
[3] TU Wien, Fac Informat, Inst Informat Syst, Favoritenstr 16-184-53, A-1040 Vienna, Austria
关键词
Scheduling; heterogeneous computing; moldable tasks; dual approximation scheme; integer linear programming; APPROXIMATION ALGORITHMS; PARALLEL;
D O I
10.1109/TPDS.2017.2675891
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present a new approach for scheduling independent tasks on multiple CPUs and multiple GPUs. The tasks are assumed to be parallelizable on CPUs using the moldable model: the final number of cores allotted to a task can be decided and set by the scheduler. More precisely, we design an algorithm aiming at minimizing the makespan-the maximum completion time of all tasks-for this scheduling problem. The proposed algorithm combines a dual approximation scheme with a fast integer linear program (ILP). It determines both the partitioning of the tasks, i.e., whether a task should be mapped to CPUs or a GPU, and the number of CPUs allotted to a moldable task if mapped to the CPUs. A worst-case analysis shows that the algorithm has an approximation ratio of 3/2 + is an element of. Since the time complexity of the ILP-based algorithm could be non-polynomial, we also present a polynomial-time algorithm with an approximation ratio of 2 + is an element of. We complement the theoretical analysis of our two novel algorithms with a simulation study. In these simulations, we compare our algorithms to a modified version of the classical HEFT algorithm, which we adapted to handle moldable tasks. The simulation results show that our algorithm with the 3 2 + is an element of 3/2 + -approximation ratio produces significantly shorter schedules than the modified HEFT for most of the instances. In addition, our results provide evidence that our ILP-based algorithm can solve larger problem instances in a reasonable amount of time.
引用
收藏
页码:2689 / 2702
页数:14
相关论文
共 32 条
[1]  
Abe Y., 2012, P 2012 USENIX C POWE, V12, P10
[2]  
Agullo E., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P932, DOI 10.1109/IPDPS.2011.90
[3]  
Amdahl G. M., 1967, P APR 18 20 1967 SPR, P483, DOI [10.1145/1465482.1465560, DOI 10.1145/1465482.1465560]
[4]   StarPU: a unified platform for task scheduling on heterogeneous multicore architectures [J].
Augonnet, Cedric ;
Thibault, Samuel ;
Namyst, Raymond ;
Wacrenier, Pierre-Andre .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (02) :187-198
[5]  
Bezanson J, 2014, CORR
[6]  
Blayo E, 1999, LECT NOTES COMPUT SC, V1685, P303
[7]   Scheduling independent tasks on multi-cores with GPU accelerators [J].
Bleuse, Raphael ;
Kedad-Sidhoum, Safia ;
Monna, Florence ;
Mounie, Gregory ;
Trystram, Denis .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (06) :1625-1638
[8]  
Bonifaci V., 2012, CORR
[9]   PaRSRC: Exploiting Heterogeneity to Enhance Scalability [J].
Bosilca, George ;
Bouteiller, Aurelien ;
Danalis, Anthony ;
Faverge, Mathieu ;
Herault, Thomas ;
Dongarra, Jack J. .
COMPUTING IN SCIENCE & ENGINEERING, 2013, 15 (06) :36-45
[10]  
Bougeret M, 2010, LECT NOTES COMPUT SC, V6271, P157, DOI 10.1007/978-3-642-15277-1_16