Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures

Cited by: 0

Authors:

Babalad, Shilpa [1]

Shevade, Shirish K. [1]

Thazhuthaveetil, Matthew Jacob [1]

Govindarajan, R. [1]

Affiliations:

[1] Indian Inst Sci, Bengaluru, Karnataka, India

Source:

PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024 | 2024

Keywords:

Loop transformations; Vectorization and Parallelization; Supervised learning; Support Vector Machine; Hierarchical Classifier; TRANSFORMATIONS;

DOI:

10.1145/3650200.3656630

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Loop tiling and loop interchange (or permutation) are techniques that can expose task- and data-level parallelism and exploit the data locality available in multi-dimensional loop nests. Choosing an appropriate tile size and loop order is important for achieving significant performance improvement. However, the effect of these transformations on the performance of a loop nest is not straightforward, owing to the complex interplay of several architectural features in multi-/many-core architectures. In this work, we propose using a supervised learning technique and develop a Support Vector Machine (SVM) based hierarchical classifier to identify the best-performing tile size and loop order for a given loop nest. Our approach identifies tile sizes and loop orders whose performance, on average, is within 18% and 9% of the optimal performance for two sets of loop nests on the Intel Xeon Cascade Lake architecture. Further, our method outperforms the state-of-the-art techniques Pluto and Polly, with a geometric mean speedup of 1.35x to 1.58x.
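The two transformations named in the abstract can be illustrated with a small sketch. This is not the authors' code or benchmark set; it is a minimal example, assuming a square matrix multiplication as the loop nest, and the names `matmul_naive`, `matmul_tiled`, and the `tile` parameter are hypothetical. The tiled version splits each loop into a tile loop and an intra-tile loop (tiling) and runs the intra-tile loops in i-k-j rather than i-j-k order (interchange). Both choices leave the result unchanged; on real hardware they affect only locality and parallelism, which is why selecting them is a performance problem.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k triple loop over n x n lists of lists."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation with tiling and an interchanged (i-k-j) inner order.

    `tile` is the tile edge length (illustrative parameter). The min() bounds
    handle the case where n is not a multiple of the tile size.
    """
    c = [[0] * n for _ in range(n)]
    # Outer loops step over tiles.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Inner loops walk inside one tile, in i-k-j order.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

With integer inputs the two functions return identical matrices for any positive `tile`, so a search over tile sizes and loop orders, whether exhaustive or guided by a learned classifier as the abstract proposes, only needs to compare running times.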


Pages: 388-399

Page count: 12

