Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Citations: 0
Authors
Li, Chendi [1]
Xu, Yufan [1]
Saravani, Sina Mahdipour [1]
Sadayappan, P. [1]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84112 USA
Source
PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024 | 2024
Funding
U.S. National Science Foundation;
Keywords
Auto-tuning; Design space exploration; GPU kernel optimization; Neural networks; Performance modeling; Tile-size optimization;
DOI
10.1145/3650200.3656626
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.
Pages: 549-561
Page count: 13