GPU Auto-tuning Framework for Optimal Performance and Power Consumption

Cited by: 0
Authors
Cheema, Sunbal [1]
Khan, Gul N. [1]
Affiliations
[1] Toronto Metropolitan Univ, Dept Elect Comp & Biomed Engn, Toronto, ON, Canada
Source
15TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPU, GPGPU 2023 | 2023
Keywords
Auto-tuning; Code transformation; Multi-objective optimization; GPU code regeneration; Performance power optimization; EFFICIENCY;
DOI
10.1145/3589236.3589241
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
An auto-tuning framework for GPU devices is presented for tuning OpenCL application kernels. The GPU tuner employs a multi-objective optimization methodology to improve the performance and power consumption of applications. It efficiently explores a user-defined solution space comprising possible tunable algorithmic and hardware counter variations through code transformations. The methodology targets GPU code tuning situations where performance and energy consumption are critical. The proposed framework is evaluated on 2D convolution kernels. It uses a non-dominated sorting genetic algorithm with hardware power sensor data to transform application code through code rewriting and validation. Various algorithmic variations, such as loop unrolling, caching, workgroup size, and memory utilization, are applied. The final Pareto-optimal code configurations used around 30% less power and executed around 4% faster. The analysis shows that the optimization converges, with a 45% improvement in standard deviation.
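The non-dominated selection at the heart of this kind of tuner can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Config` type, the configuration names, and all time and power figures are hypothetical values invented for illustration, and the code covers only extraction of the first non-dominated front, not a full genetic algorithm (no crossover, mutation, or crowding distance).

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One tuned kernel variant with its two measured objectives (hypothetical)."""
    name: str
    time_ms: float   # execution time, lower is better
    power_w: float   # average power draw, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` in both objectives and better in one."""
    no_worse = a.time_ms <= b.time_ms and a.power_w <= b.power_w
    strictly_better = a.time_ms < b.time_ms or a.power_w < b.power_w
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical measurements; none of these numbers come from the paper.
candidates = [
    Config("baseline", 10.0, 100.0),
    Config("unroll4_wg64", 9.6, 70.0),
    Config("unroll2_wg128", 9.8, 90.0),
    Config("cached_wg256", 9.0, 95.0),
    Config("unroll8_wg32", 11.0, 65.0),
]

front = pareto_front(candidates)
for cfg in front:
    print(cfg.name, cfg.time_ms, cfg.power_w)
```

In this made-up data set, the baseline and `unroll2_wg128` are both dominated by `unroll4_wg64`, so three variants remain on the front, each representing a different trade-off between speed and power.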
Pages: 1 / 6
Page count: 6
Related papers
28 in total
  • [1] Online Power Estimation of Graphics Processing Units
    Adhinarayanan, Vignesh
    Subramaniam, Balaji
    Feng, Wu-chun
    [J]. 2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016, : 245 - 254
  • [2] [Anonymous], 2013, CUPTI: User's Guide
  • [3] OpenTuner: An Extensible Framework for Program Autotuning
    Ansel, Jason
    Kamil, Shoaib
    Veeramachaneni, Kalyan
    Ragan-Kelley, Jonathan
    Bosboom, Jeffrey
    O'Reilly, Una-May
    Amarasinghe, Saman
    [J]. PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014, : 303 - 315
  • [4] Survey and unification of local search techniques in metaheuristics for multi-objective combinatorial optimisation
    Blot, Aymeric
    Kessaci, Marie-Eleonore
    Jourdan, Laetitia
    [J]. JOURNAL OF HEURISTICS, 2018, 24 (06) : 853 - 877
  • [5] Understanding GPU Power: A Survey of Profiling, Modeling, and Simulation Methods
    Bridges, Robert A.
    Imam, Neena
    Mintz, Tiffany M.
    [J]. ACM COMPUTING SURVEYS, 2016, 49 (03)
  • [6] BURTSCHER M., 2014, Proceedings of the 7th Workshop on General Purpose Processing Using GPUs, P28
  • [7] An updated survey of GA-based multiobjective optimization techniques
    Coello, CAC
    [J]. ACM COMPUTING SURVEYS, 2000, 32 (02) : 109 - 143
  • [8] A fast and elitist multiobjective genetic algorithm: NSGA-II
    Deb, K
    Pratap, A
    Agarwal, S
    Meyarivan, T
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2002, 6 (02) : 182 - 197
  • [9] OCLoptimizer: an iterative optimization tool for OpenCL
    Fabeiro, Jorge F.
    Andrade, Diego
    Fraguela, Basilio B.
    [J]. 2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18 : 1322 - 1331
  • [10] Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability
    Falch, Thomas L.
    Elster, Anne C.
    [J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1231 - 1240