Parameter Tuning Model for Optimizing Application Performance on GPU

Cited by: 3

Authors:

Nhat-Phuong Tran [1]

Lee, Myungho [1]

Affiliations:

[1] Myongji Univ, Dept Comp Sci & Engn, 116 Myongji Ro, Yongin, Gyeonggi Do, South Korea

Source:

2016 IEEE 1ST INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W) | 2016

Keywords:

GPU; High Performance Computing; Performance Optimization;

DOI:

10.1109/FAS-W.2016.28

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Graphics Processing Units (GPUs) are becoming increasingly popular for High Performance Computing (HPC) applications. Although GPUs provide high peak performance, exploiting their full performance potential remains a challenging task for programmers. When launching a parallel kernel of an application on the GPU, the programmer must carefully select the number of blocks (grid size) and the number of threads per block (block size), both of which greatly influence performance. With a huge range of possible combinations of these parameter values, choosing the right grid size and block size is not straightforward. In this paper, we propose a model for tuning the grid size and the block size to reach optimal performance. Our approach significantly reduces the search space compared with the exhaustive search approaches of previous research, which are not practical for real applications.
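
To make the size of the search space mentioned in the abstract concrete, the sketch below counts launch configurations for a one-dimensional kernel. It is not the tuning model proposed in the paper, which this record does not describe. The pruning rule (block sizes that are multiples of the warp size, with the grid sized to just cover the input) is a common rule of thumb used here as an assumption, and all function names are hypothetical. The limits of 32 threads per warp and 1024 threads per block are typical of NVIDIA GPUs of compute capability 2.0 and later.

```python
# Illustrative sketch only: generic CUDA launch limits and a rule-of-thumb
# pruning heuristic. This is NOT the model from the paper.

WARP_SIZE = 32                 # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 1024   # per-block limit (compute capability >= 2.0)


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def exhaustive_count(n_elements):
    """Count every (grid size, block size) pair an exhaustive search would try.

    Assumption: block sizes 1..1024, and for each block size every grid
    size from 1 up to the smallest grid that covers all elements.
    """
    return sum(ceil_div(n_elements, block)
               for block in range(1, MAX_THREADS_PER_BLOCK + 1))


def pruned_candidates(n_elements):
    """Return (grid size, block size) pairs under the assumed heuristic.

    Block sizes are restricted to multiples of the warp size, and the grid
    size is fixed to the smallest value giving one thread per element.
    """
    return [(ceil_div(n_elements, block), block)
            for block in range(WARP_SIZE, MAX_THREADS_PER_BLOCK + 1, WARP_SIZE)]


if __name__ == "__main__":
    n = 1_000_000
    print("exhaustive configurations:", exhaustive_count(n))
    print("pruned candidates:", len(pruned_candidates(n)))
```

For an input of one million elements, the pruned list holds 32 candidates, while the exhaustive count runs into the millions. Each remaining candidate would still need to be timed on the target GPU.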

Citation

Pages: 78-83

Page count: 6

References (9 in total):

[1] [Anonymous], 1998, SC 98, DOI [10.5555/509058.509096, DOI 10.1109/SC.1998.10004]
[2] [Anonymous], 2009, P C HIGH PERFORMANCE
[3] Frigo, M; Johnson, SG. The design and implementation of FFTW3 [J]. PROCEEDINGS OF THE IEEE, 2005, 93 (02): 216-231
[4] Nath Rajib, 2010, PAR MATR ALG APPL PM
[5] Ryoo, Shane; Rodrigues, Christopher I.; Baghsorkhi, Sara S.; Stone, Sam S.; Kirk, David B.; Hwu, Wen-mei W. Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA [J]. PPOPP'08: PROCEEDINGS OF THE 2008 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2008: 73-82
[6] Torres Yuri, 2011, Proceedings of the 2011 International Conference on High Performance Computing and Simulation (HPCS 2011), P631
[7] Vuduc, R; Demmel, JW; Yelick, KA. OSKI: A library of automatically tuned sparse matrix kernels [J]. SCIDAC 2005: SCIENTIFIC DISCOVERY THROUGH ADVANCED COMPUTING, 2005, 16: 521-530
[8] Vuduc R, 2000, LECT NOTES COMPUT SC, V1924, P190
[9] Yang, Yi; Xiang, Ping; Kong, Jingfei; Zhou, Huiyang. An Optimizing Compiler for GPGPU Programs with Input-Data Sharing [J]. ACM SIGPLAN NOTICES, 2010, 45 (05): 343-344
