Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Cited by: 0

Authors:

Li, Chendi [1]

Xu, Yufan [1]

Saravani, Sina Mahdipour [1]

Sadayappan, P. [1]

Affiliations:

[1] University of Utah, Salt Lake City, UT 84112, USA

Source:

PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024 | 2024

Funding:

U.S. National Science Foundation;

Keywords:

Auto-tuning; Design space exploration; GPU kernel optimization; Neural networks; Performance modeling; Tile-size optimization;

DOI:

10.1145/3650200.3656626

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.

Citation

Pages: 549-561

Page count: 13

