PARTANS: An Autotuning Framework for Stencil Computation on Multi-GPU Systems

Times cited: 41
Authors
Lutz, Thibaut [1]
Fensch, Christian [1]
Cole, Murray [1]
Affiliation
[1] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Experimentation; Performance; GPGPU; multi-GPU; optimization; stencil computation
DOI
10.1145/2400682.2400718
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)
Abstract
GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, which adds to an already high programming complexity. In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI Express layout. This adds nonuniform characteristics to a seemingly homogeneous setup, causing up to 23% performance loss. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.
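The abstract does not describe how the autotuner works internally. Purely as an illustration of the idea, here is a minimal Python sketch, assuming a 1D three-point stencil, a slab decomposition with one-cell halos (one slab per GPU), and a made-up cost model in which two of four identical GPUs sit behind a slower PCI Express path. The functions `partition`, `distributed_step`, `predicted_time` and `autotune`, and every number used, are hypothetical; this is not the PARTANS API, and the search shown is a generic exhaustive search over candidate splits rather than the authors' method.

```python
from typing import Callable, List, Sequence, Tuple

Bounds = List[Tuple[int, int]]


def stencil_3pt(u: Sequence[float]) -> List[float]:
    """One step of a 1D 3-point averaging stencil; the two end cells are copied unchanged."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def partition(n: int, shares: Sequence[float]) -> Bounds:
    """Split [0, n) into contiguous slabs whose sizes are roughly proportional to `shares`."""
    total = float(sum(shares))
    bounds: Bounds = []
    start, cum = 0, 0.0
    for k, s in enumerate(shares):
        cum += s
        # Rounding the cumulative fraction keeps the bounds monotone and never past n.
        end = n if k == len(shares) - 1 else round(n * cum / total)
        bounds.append((start, end))
        start = end
    return bounds


def distributed_step(u: Sequence[float], bounds: Bounds, halo: int = 1) -> List[float]:
    """Apply stencil_3pt slab by slab, as if each slab lived on its own GPU.

    Each slab is padded with `halo` ghost cells (halo >= 1 for a 3-point stencil).
    """
    n = len(u)
    out = [0.0] * n
    for a, b in bounds:
        lo, hi = max(a - halo, 0), min(b + halo, n)
        local = stencil_3pt(u[lo:hi])    # work one device does on its padded slab
        out[a:b] = local[a - lo:b - lo]  # keep only the cells this slab owns
    return out


def predicted_time(bounds: Bounds, speeds: Sequence[float],
                   halo_cost: Sequence[float]) -> float:
    """Hypothetical cost model: slowest device's compute time plus its halo-exchange cost."""
    return max((b - a) / s + c for (a, b), s, c in zip(bounds, speeds, halo_cost))


def autotune(n: int, candidates: Sequence[Tuple[int, ...]],
             cost: Callable[[Bounds], float]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive search: score every candidate split with `cost` and return the cheapest."""
    best = min(candidates, key=lambda shares: cost(partition(n, shares)))
    return best, cost(partition(n, best))


u = [float(i % 7) for i in range(1000)]
speeds = [1.0, 1.0, 1.0, 1.0]       # hypothetical: four identical GPUs
halo_cost = [5.0, 5.0, 40.0, 40.0]  # hypothetical: GPUs 2 and 3 behind a slower PCIe path
candidates = [(1, 1, 1, 1), (3, 3, 2, 2), (6, 6, 5, 5)]
best, t = autotune(len(u), candidates, lambda bd: predicted_time(bd, speeds, halo_cost))
# The split must not change the numerical result, only where the work runs.
assert distributed_step(u, partition(len(u), best)) == stencil_3pt(u)
print(best, t)  # prints: (6, 6, 5, 5) 278.0
```

Under this toy model the even split is not the fastest: shifting a few cells away from the two GPUs with the expensive halo exchange lowers the predicted time, which mirrors the abstract's point that a seemingly homogeneous setup can behave nonuniformly. A real autotuner in this setting would more plausibly score candidates by timing actual runs on the hardware instead of using a fixed formula.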
Pages: 24