A Pattern Specification and Optimizations Framework for Accelerating Scientific Computations on Heterogeneous Clusters

Cited by: 10
Authors
Chen, Linchuan [1]
Huo, Xin [1]
Agrawal, Gagan [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
2015 IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS) | 2015
DOI
10.1109/IPDPS.2015.13
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Clusters with accelerators at each node have emerged as the dominant high-end architecture in recent years. Such systems can be extremely hard to program because of the underlying heterogeneity and the need to exploit parallelism at multiple levels. Easing parallel programming today therefore requires not only high-level programming models, but models from which hybrid parallelism can be extracted. In this paper, we focus on the following question: can simple APIs be developed for several classes of popular scientific applications, to ease application development and yet maintain parallel efficiency, on clusters with accelerators? We approach this problem by individually considering popular patterns that arise in scientific computations. By developing APIs for generalized reductions, irregular reductions, and stencil computations, we show that several complex scientific applications can be supported. We enable compact specification of these applications (40% of the code size of MPI), while also enabling parallelization across nodes and across devices within a node, with work distributed across CPU and GPU cores. We enable a number of optimizations that scientific programmers normally implement by hand. We compare well against existing MPI applications when scaling across nodes, and against handwritten CUDA applications when executing on a single GPU, and yet can scale by using all levels of parallelism simultaneously. On a cluster with 64 GPUs, we achieve speedups between 600 and 1800 over sequential (single CPU core) versions.
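The three computation patterns named in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's API: the function names, signatures, and the 1-D averaging stencil below are hypothetical, chosen only to show what distinguishes each pattern, and the sketch performs no distribution across nodes, CPUs, or GPUs.

```python
from collections import defaultdict


def generalized_reduction(items, key_of, combine, init):
    """Generalized reduction: each input element updates one entry
    of a shared reduction object, selected by a key."""
    result = defaultdict(lambda: init)
    for item in items:
        k = key_of(item)
        result[k] = combine(result[k], item)
    return dict(result)


def irregular_reduction(num_nodes, edges, interaction):
    """Irregular reduction: each edge updates the two nodes it connects,
    so the written locations are known only through indirection."""
    acc = [0.0] * num_nodes
    for i, j in edges:
        f = interaction(i, j)
        acc[i] += f
        acc[j] -= f
    return acc


def stencil_1d(grid):
    """Stencil: each interior cell is recomputed from a fixed
    neighborhood (here, the average of itself and two neighbors)."""
    out = list(grid)
    for i in range(1, len(grid) - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out
```

In each function the user-visible part is small (a key function and a combiner, an edge interaction, or a neighborhood rule), which is the kind of compact specification the abstract describes; how the loop is partitioned and scheduled is left to a runtime.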
Pages: 591-600
Page count: 10