StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

Cited by: 686
Authors
Augonnet, Cedric [1]
Thibault, Samuel [1]
Namyst, Raymond [1]
Wacrenier, Pierre-Andre [1]
Affiliation
[1] Univ Bordeaux, LaBRI INRIA Bordeaux Sud Ouest, Talence, France
Keywords
GPU; multicore; accelerator; scheduling; runtime system
DOI
10.1002/cpe.1631
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE) or data-parallel accelerators (e.g. GPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offload parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a main challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements regarding execution times, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. We eventually show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way. Copyright (C) 2010 John Wiley & Sons, Ltd.
Pages: 187-198
Page count: 12