We present a runtime system for simple and efficient programming of CPU+GPU clusters. The programmer focuses on core logic, while the system handles task allocation, load balancing, scheduling, and data transfer. Our programming model is based on a shared global address space, made efficient by transaction-style bulk-synchronous semantics. This model broadly targets coarse-grained data-parallel computation of the kind particularly suited to multi-GPU heterogeneous clusters. We describe our computation and communication scheduling system and report its performance on several prototype applications. For example, parallelizing matrix multiplication or 2D FFT with our system requires only the regular CPU/GPU implementations plus about 30 lines of additional C code to set up the runtime. Our runtime system achieves 5.61 TFlop/s multiplying two square matrices of 1.56 billion elements each on a 10-node cluster with 20 GPUs. This performance rests on a number of critical optimizations working in concert: prefetching, pipelining, maximizing the overlap between computation and communication, and scheduling efficiently across heterogeneous devices of vastly different capacities.
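The system's API is not shown in this excerpt, so the following is a minimal sketch of what the roughly 30 lines of setup code might look like. Every `rt_*` identifier is a hypothetical placeholder, not the system's actual interface; the only assumption taken from the abstract is that the programmer supplies ordinary CPU/GPU kernels while the runtime handles partitioning, scheduling, and transfers.

```c
/* Hypothetical sketch only: all rt_* names are invented placeholders,
 * not the system's real API. */
#include <stddef.h>

typedef struct rt_ctx    rt_ctx;     /* runtime context                  */
typedef struct rt_region rt_region;  /* region in the global addr space  */

/* Prototypes of imagined runtime entry points. */
rt_ctx    *rt_init(int *argc, char ***argv);
rt_region *rt_alloc(rt_ctx *c, size_t bytes);
void       rt_map(rt_ctx *c, rt_region *out,
                  void (*cpu_kernel)(void *, const void *, const void *),
                  const char *gpu_kernel,      /* name of the GPU kernel  */
                  rt_region *a, rt_region *b);
void       rt_sync(rt_ctx *c);                 /* bulk-synchronous barrier */
void       rt_finalize(rt_ctx *c);

void matmul_cpu(void *c, const void *a, const void *b); /* regular CPU code */

int main(int argc, char **argv)
{
    const size_t n = 39500;                    /* ~1.56e9 elements per matrix */
    rt_ctx *ctx = rt_init(&argc, &argv);

    rt_region *A = rt_alloc(ctx, n * n * sizeof(float));
    rt_region *B = rt_alloc(ctx, n * n * sizeof(float));
    rt_region *C = rt_alloc(ctx, n * n * sizeof(float));

    /* The runtime tiles C, balances tiles across the cluster's CPUs and
     * GPUs, and moves data; the programmer wrote only the kernels. */
    rt_map(ctx, C, matmul_cpu, "matmul_gpu", A, B);
    rt_sync(ctx);                              /* transaction-style commit */

    rt_finalize(ctx);
    return 0;
}
```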
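As a sanity check on the headline number, the standard $2n^3$ flop count for dense matrix multiplication implies (the $\approx 22$ s run time is a back-of-the-envelope inference, not a figure reported here):

\[
n^2 \approx 1.56 \times 10^{9} \;\Rightarrow\; n \approx 3.95 \times 10^{4}, \qquad
2n^3 \approx 1.23 \times 10^{14}\ \text{flops}, \qquad
t \approx \frac{1.23 \times 10^{14}}{5.61 \times 10^{12}\ \text{flop/s}} \approx 22\ \text{s}.
\]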
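As one concrete illustration of the overlap and pipelining techniques named above (a generic double-buffering pattern in CUDA C, not the paper's actual scheduler), the sketch below uploads tile k on one stream while tile k−1 computes on the other; `scale_tile`, `TILE`, and `NTILES` are stand-ins:

```c
/* Minimal compute/communication overlap via two streams and two buffers. */
#include <cuda_runtime.h>

#define TILE   (1 << 20)        /* elements per tile   */
#define NTILES 16

__global__ void scale_tile(float *t, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) t[i] *= 2.0f;    /* placeholder for real per-tile work */
}

int main(void)
{
    float *host, *dev[2];
    cudaStream_t s[2];

    /* Pinned host memory is required for truly asynchronous copies. */
    cudaMallocHost((void **)&host, (size_t)NTILES * TILE * sizeof(float));
    for (int b = 0; b < 2; ++b) {
        cudaMalloc((void **)&dev[b], TILE * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    for (int k = 0; k < NTILES; ++k) {
        int b = k & 1;          /* alternate buffers and streams */
        /* Upload of tile k overlaps the kernel of tile k-1 on the other
         * stream; in-stream ordering makes reuse of dev[b] at k+2 safe. */
        cudaMemcpyAsync(dev[b], host + (size_t)k * TILE,
                        TILE * sizeof(float), cudaMemcpyHostToDevice, s[b]);
        scale_tile<<<(TILE + 255) / 256, 256, 0, s[b]>>>(dev[b], TILE);
        cudaMemcpyAsync(host + (size_t)k * TILE, dev[b],
                        TILE * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();

    cudaFreeHost(host);
    for (int b = 0; b < 2; ++b) { cudaFree(dev[b]); cudaStreamDestroy(s[b]); }
    return 0;
}
```

A production runtime would additionally prefetch across node boundaries and rebalance tiles between devices; the two-stream pipeline is only the single-GPU core of that idea.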