Benchmarking Optimization Algorithms for Auto-Tuning GPU Kernels

Cited by: 4
Authors
Schoonhoven, Richard Arnoud [1, 2]
van Werkhoven, Ben [1, 3]
Batenburg, Kees Joost [1, 2]
Affiliations
[1] Ctr Wiskunde & Informat, Computat Imaging Grp, NL-1098 XG Amsterdam, Netherlands
[2] Leiden Univ, Leiden Inst Adv Comp Sci, NL-2311 EZ Leiden, Netherlands
[3] Netherlands eSci Ctr, NL-1098 XH Amsterdam, Netherlands
Funding
Dutch Research Council (NWO);
Keywords
Auto-tuning; evolutionary computing; fitness landscape analysis; graphical processing unit (GPU) computing; performance optimization; GLOBAL OPTIMIZATION; SEARCH; IMPLEMENTATION; MODELS;
DOI
10.1109/TEVC.2022.3210654
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent years have witnessed phenomenal growth in the application and capabilities of graphical processing units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging and, generally, only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires retuning after code changes, for different input data, and for different architectures. However, the discrete and nonconvex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels when the time budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from nine different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric based on the PageRank centrality concept as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with the observed tuning performance.
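The abstract does not spell out how the PageRank-based metric is computed, so the following is only a rough sketch of one way such a difficulty measure could be built on a discrete tuning space. It is not the authors' implementation. Everything in it is an assumption made for illustration: the parameter names block_size_x and block_size_y, the made-up toy_runtime function, the one-step neighborhood, the rule that edges point from a configuration to each strictly faster neighbor, the damping factor, and the 10% tolerance. The idea it illustrates is that PageRank mass accumulates in local minima, and the share of that mass held by near-optimal minima hints at how easily a local search would reach a good kernel configuration.

```python
from itertools import product

# Hypothetical tunable parameters (names and values are illustrative only).
SPACE = {
    "block_size_x": [16, 32, 64, 128, 256],
    "block_size_y": [1, 2, 4, 8, 16],
}
NAMES = list(SPACE)


def toy_runtime(cfg):
    """Made-up runtime in ms with two basins; not a real measurement."""
    i = SPACE["block_size_x"].index(cfg[0])
    j = SPACE["block_size_y"].index(cfg[1])
    good = abs(i - 2) + abs(j - 2) + 1.0  # deep basin around (64, 4)
    poor = abs(i - 4) + abs(j - 0) + 2.5  # shallower basin around (256, 1)
    return min(good, poor)


def neighbors(cfg):
    """Configurations reachable by moving one parameter by one step."""
    out = []
    for k, name in enumerate(NAMES):
        values = SPACE[name]
        idx = values.index(cfg[k])
        for step in (-1, 1):
            if 0 <= idx + step < len(values):
                out.append(cfg[:k] + (values[idx + step],) + cfg[k + 1:])
    return out


def pagerank(edges, damping=0.85, iters=200):
    """Power-iteration PageRank; nodes without out-edges spread rank uniformly."""
    nodes = list(edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not edges[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in edges[v]:
                new[w] += damping * rank[v] / len(edges[v])
        rank = new
    return rank


configs = list(product(*SPACE.values()))
runtime = {c: toy_runtime(c) for c in configs}

# Assumed graph: a directed edge from each configuration to every strictly faster neighbor.
edges = {c: [n for n in neighbors(c) if runtime[n] < runtime[c]] for c in configs}
rank = pagerank(edges)

# A local minimum has no strictly faster neighbor, i.e. no outgoing edge.
local_minima = sorted(c for c in configs if not edges[c])
best = min(runtime.values())
near_optimal = [c for c in local_minima if runtime[c] <= 1.10 * best]

# Share of local-minimum centrality held by minima within 10% of the optimum.
share = sum(rank[c] for c in near_optimal) / sum(rank[c] for c in local_minima)
print(local_minima, round(share, 3))
```

In this toy landscape a share close to 1 would suggest that most downhill paths end in a near-optimal configuration, while a small share would suggest that local search is likely to get trapped in poor minima. In practice toy_runtime would be replaced by measured kernel runtimes.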
Pages: 550-564
Page count: 15
Related papers
50 in total
  • [1] Bayesian Optimization for auto-tuning GPU kernels
    Willemsen, Floris-Jan
    van Nieuwpoort, Rob
    van Werkhoven, Ben
    PROCEEDINGS OF PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2021), 2021, : 106 - 117
  • [2] Accelerated Auto-Tuning of GPU Kernels for Tensor Computations
    Li, Chendi
    Xu, Yufan
    Saravani, Sina Mahdipour
    Sadayappan, P.
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 549 - 561
  • [3] A methodology for comparing optimization algorithms for auto-tuning
    Willemsen, Floris-Jan
    Schoonhoven, Richard
    Filipovic, Jiri
    Torring, Jacob O.
    van Nieuwpoort, Rob
    van Werkhoven, Ben
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 159 : 489 - 504
  • [4] Bayesian Optimization for Auto-tuning Convolution Neural Network on GPU
    Zhu, Huming
    Liu, Chendi
    Zhang, Lingyun
    Dong, Ximiao
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT VI, 2024, 14492 : 478 - 489
  • [5] A methodology to evaluate PID auto-tuning algorithms
    Romero, Julio A.
    Sanchis, Roberto
    REVISTA IBEROAMERICANA DE AUTOMATICA E INFORMATICA INDUSTRIAL, 2011, 8 (01): : 112 - +
  • [6] GPU Auto-tuning Framework for Optimal Performance and Power Consumption
    Cheema, Sunbal
    Khan, Gul N.
    15TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPU, GPGPU 2023, 2023, : 1 - 6
  • [7] A Scalable Auto-tuning Framework for Compiler Optimization
    Tiwari, Ananta
    Chen, Chun
    Chame, Jacqueline
    Hall, Mary
    Hollingsworth, Jeffrey K.
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 796 - +
  • [8] PATSMA: Parameter Auto-tuning for Shared Memory Algorithms
    Fernandes, Joao B.
    Santos-da-Silva, Felipe H.
    Barros, Tiago
    Assis, Italo A. S.
    Xavier-de-Souza, Samuel
    SOFTWAREX, 2024, 27
  • [9] Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors
    Lim, Roktaek
    Lee, Yeongha
    Kim, Raehyun
    Choi, Jaeyoung
    Lee, Myungho
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (12) : 7895 - 7908
  • [10] An Optimization and Auto-tuning Method for Scale-free Graph Algorithms on SIMD Architectures
    Lin, Jie
    Tan, Yusong
    Wu, Qingbo
    Li, Xiaoling
    Yu, Jie
    Zhang, Qi
    2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017), 2017, : 533 - 541