Auto-tuning for GPGPU applications using performance and energy model

Cited by: 2
Authors
Lin, Chih-Sheng [1]
Teng, Shih-Meng [2]
Hsiung, Pao-Ann [2]
Affiliations
[1] Ind Technol Res Inst, Informat & Commun Res Labs, Hsinchu, Taiwan
[2] Natl Chung Cheng Univ, Dept Comp Sci & Informat Engn, Chiayi, Taiwan
Keywords
GPGPU; Performance and Energy modeling; Auto-tuning; Optimization;
DOI
10.1016/j.sysarc.2015.11.012
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The general-purpose graphics processing unit (GPGPU) is a popular accelerator for general applications such as scientific computing, because such applications are massively parallel and GPUs offer substantial parallel computing power. However, distributing the workload among the large number of cores, that is, choosing the execution configuration of a GPGPU kernel, is still a manual trial-and-error process. Programmers manually try a few configurations and may settle for a sub-optimal one, leading to poor performance and/or high power consumption. This paper presents an auto-tuning approach for GPGPU applications based on performance and power models. First, a model-based analytic approach for estimating the performance and power consumption of kernels is proposed. Second, an auto-tuning framework is proposed for automatically obtaining a near-optimal configuration for a kernel computation. In this work, we formulate the automatic search for an optimal configuration as a constrained optimization problem and solve it using either simulated annealing (SA) or a genetic algorithm (GA). Experimental results show that the fidelity of the proposed models for performance and energy consumption is 0.86 and 0.89, respectively. Further, the optimization algorithms result in normalized optimality offsets of 0.94% and 0.79% for SA and GA, respectively. (C) 2015 Elsevier B.V. All rights reserved.
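As a rough illustration of the search described in the abstract, the following minimal Python sketch runs simulated annealing over candidate thread-block sizes under a resource constraint. It is not the authors' implementation: the cost function `toy_cost`, the register-based constraint `is_feasible`, and all numeric parameters are hypothetical stand-ins for the paper's performance and energy models, which are not given in this record.

```python
import math
import random

# Hypothetical search space: threads per block, in multiples of a 32-thread warp.
BLOCK_SIZES = [32 * k for k in range(1, 33)]  # 32, 64, ..., 1024


def toy_cost(block_size, n_items=1 << 20, max_resident_threads=2048):
    """Toy stand-in for a performance/energy model (NOT the paper's model).

    Penalizes low occupancy and a large number of launched blocks.
    """
    blocks_resident = max_resident_threads // block_size
    occupancy = blocks_resident * block_size / max_resident_threads
    n_blocks = math.ceil(n_items / block_size)
    return 1.0 / occupancy + 1e-4 * n_blocks


def is_feasible(block_size, regs_per_thread=40, regs_available=32768):
    """Assumed constraint: a block must fit in the available registers."""
    return block_size * regs_per_thread <= regs_available


def simulated_annealing(cost, candidates, feasible, start,
                        steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Minimize cost over candidates, moving between adjacent entries."""
    rng = random.Random(seed)
    idx = candidates.index(start)
    current_cost = cost(start)
    best, best_cost = start, current_cost
    temp = t0
    for _ in range(steps):
        j = idx + rng.choice((-1, 1))  # propose a neighboring configuration
        if 0 <= j < len(candidates) and feasible(candidates[j]):
            new_cost = cost(candidates[j])
            delta = new_cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp) to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                idx, current_cost = j, new_cost
                if new_cost < best_cost:
                    best, best_cost = candidates[j], new_cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    best, best_cost = simulated_annealing(toy_cost, BLOCK_SIZES, is_feasible, start=32)
    print("best block size:", best, "cost:", round(best_cost, 4))
```

In a real tuner, `toy_cost` would be replaced by model predictions or measurements for the kernel at hand, and the neighborhood would typically span several configuration parameters rather than block size alone.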
Pages: 40-53
Page count: 14