A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators

Cited by: 5
Authors
Beri, Tarun [1]
Bansal, Sorav [1]
Kumar, Subodh [1]
Affiliation
[1] Indian Inst Technol Delhi, New Delhi, India
Source
2015 IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS) | 2015
DOI
10.1109/IPDPS.2015.12
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We present a runtime system for simple and efficient programming of CPU+GPU clusters. The programmer focuses on core logic, while the system undertakes task allocation, load balancing, scheduling, data transfer, etc. Our programming model is based on a shared global address space, made efficient by transaction style bulk-synchronous semantics. This model broadly targets coarse-grained data-parallel computation particularly suited to multi-GPU heterogeneous clusters. We describe our computation and communication scheduling system and report its performance on a few prototype applications. For example, parallelization of matrix multiplication or 2D FFT using our system requires the regular CPU/GPU implementations and about 30 lines of additional C code to set up the runtime. Our runtime system achieves a performance of 5.61 TFlop/s while multiplying two square matrices of 1.56 billion elements each over a 10-node cluster with 20 GPUs. This performance is possible due to a number of critical optimizations working in concert. These include prefetching, pipelining, maximizing overlap between computation and communication, and scheduling efficiently across heterogeneous devices of vastly different capacities.
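As a rough sanity check on the matrix multiplication figure quoted in the abstract, the sketch below estimates the matrix side length and the implied wall-clock time. It assumes the conventional 2n^3 floating-point operation count for a dense n x n multiply and 1 TFlop/s = 10^12 operations per second; the abstract does not say how operations were counted, so the result is only indicative. The function name `matmul_estimate` is hypothetical and is not part of the authors' runtime.

```python
import math


def matmul_estimate(elements_per_matrix: float, flops_per_second: float):
    """Return (side length n, total flops, seconds) for one dense n x n multiply.

    Assumption (not from the abstract): the multiply costs 2 * n^3 operations,
    i.e. n^3 multiplications plus n^3 additions.
    """
    n = math.sqrt(elements_per_matrix)
    total_flops = 2.0 * n ** 3
    seconds = total_flops / flops_per_second
    return n, total_flops, seconds


# Figures quoted in the abstract: 1.56 billion elements per matrix, 5.61 TFlop/s.
n, total_flops, seconds = matmul_estimate(1.56e9, 5.61e12)
print(f"n ~ {n:.0f}, flops ~ {total_flops:.3e}, time ~ {seconds:.1f} s")
```

Under these assumptions each matrix is roughly 39,500 x 39,500, and the full multiplication would take about 22 seconds at the reported rate.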
Pages: 146-155
Page count: 10