Autotuning Convolutions Is Easier Than You Think

Cited by: 10

Authors:

Tollenaere, Nicolas [1]

Iooss, Guillaume [1]

Pouget, Stephane [2]

Brunie, Hugo [1]

Guillon, Christophe [1]

Cohen, Albert [3]

Sadayappan, P. [4]

Rastello, Fabrice [1]

Affiliations:

[1] INRIA, Ctr Rech Inria Rhone Alpes, Antenne Inria GIANT, Minatec Campus, 17 Rue Martyrs, F-38054 Grenoble, France

[2] Univ Calif Los Angeles, 404 Westwood Plaza, Engn 6, Los Angeles, CA 90095 USA

[3] Google, 8 Rue Londres, F-75009 Paris, France

[4] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2023 / Vol. 20 / Issue 02

Keywords:

Code generation; optimisation space; microkernel; convolution; AFFINE SCHEDULING PROBLEM; EFFICIENT SOLUTIONS; COMPILER;

DOI:

10.1145/3570641

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

A wide range of scientific and machine learning applications depend on highly optimized implementations of tensor computations. Exploiting the full capacity of a given processor architecture remains a challenging task, due to the complexity of the microarchitectural features that come into play when seeking near-peak performance. Among the state-of-the-art techniques for loop transformations for performance optimization, AutoScheduler [Zheng et al. 2020a] tends to outperform other systems. It often yields higher performance as compared to vendor libraries, but takes a large number of runs to converge, while also involving a complex training environment. In this article, we define a structured configuration space that enables much faster convergence to high-performance code versions, using only random sampling of candidates. We focus on two-dimensional convolutions on CPUs. Compared to state-of-the-art libraries, our structured search space enables higher performance for typical tensor shapes encountered in convolution stages in deep learning pipelines. Compared to auto-tuning code generators like AutoScheduler, it prunes the search space while increasing the density of efficient implementations. We analyze the impact on convergence speed and performance distribution, on two Intel x86 processors and one ARM AArch64 processor. We match or outperform the performance of the state-of-the-art oneDNN library and TVM's AutoScheduler, while reducing the autotuning effort by at least an order of magnitude.


Pages: 24

