Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures

Cited by: 0
Authors
Babalad, Shilpa [1]
Shevade, Shirish K. [1]
Thazhuthaveetil, Matthew Jacob [1]
Govindarajan, R. [1]
Affiliations
[1] Indian Inst Sci, Bengaluru, Karnataka, India
Source
PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024 | 2024
Keywords
Loop transformations; Vectorization and Parallelization; Supervised learning; Support Vector Machine; Hierarchical Classifier; TRANSFORMATIONS;
DOI
10.1145/3650200.3656630
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Loop tiling and loop interchange (or permutation) are techniques that can expose task- and data-level parallelism and exploit the data locality available in multi-dimensional loop nests. Choosing an appropriate tile size and loop order is important for achieving significant performance improvement. However, the effect of these transformations on the performance of a loop nest is not straightforward, owing to the complex interplay of several architectural features in multi-/many-core architectures. In this work, we propose using a supervised learning technique and develop a Support Vector Machine (SVM) based hierarchical classifier to identify the best-performing tile size and loop order for a given loop nest. Our approach identifies tile sizes and loop orders whose performance, on average, is within 18% and 9% of the optimal performance, respectively, for two sets of loop nests on the Intel Xeon Cascade Lake architecture. Further, our method outperforms the state-of-the-art tools Pluto and Polly, with geometric mean speedups of 1.35x to 1.58x.
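To make the two transformations named in the abstract concrete, the following is a minimal sketch of a tiled matrix-multiply loop nest in which both the tile size and the order of the inter-tile loops are parameters. It is an illustration only and is not taken from the paper: the function names `matmul_naive` and `matmul_tiled` and the parameters `tile` and `order` are hypothetical, and Python is used for readability rather than performance (the paper targets compiled loop nests).

```python
from itertools import product


def matmul_naive(a, b, n):
    """Reference i-j-k loop nest over n x n matrices (lists of lists)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=4, order="ikj"):
    """Tiled loop nest (illustrative, not the paper's implementation).

    `tile` is the tile edge length (hypothetical parameter).
    `order` is a permutation of "ijk" giving the inter-tile loop order,
    outermost first (hypothetical parameter).
    """
    c = [[0.0] * n for _ in range(n)]
    starts = range(0, n, tile)
    # Visit tile origins in the requested loop order (loop interchange).
    for origin in product(starts, repeat=3):
        base = dict(zip(order, origin))
        # Intra-tile loops; min() handles tiles that overhang the boundary.
        for i in range(base["i"], min(base["i"] + tile, n)):
            for k in range(base["k"], min(base["k"] + tile, n)):
                a_ik = a[i][k]
                for j in range(base["j"], min(base["j"] + tile, n)):
                    c[i][j] += a_ik * b[k][j]
    return c
```

Every (`tile`, `order`) pair computes the same result but touches memory in a different pattern, which is the search space that a tile-size and loop-order selector has to rank.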
Pages: 388-399
Number of pages: 12
Related papers
37 in total
[1] Agakov F, 2006, INT SYM CODE GENER, P295
[2] OpenTuner: An Extensible Framework for Program Autotuning [J].
Ansel, Jason; Kamil, Shoaib; Veeramachaneni, Kalyan; Ragan-Kelley, Jonathan; Bosboom, Jeffrey; O'Reilly, Una-May; Amarasinghe, Saman.
PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014: 303-315
[3] Cascade Lake: Next Generation Intel Xeon Scalable Processor [J].
Arafa, Mohamed; Fahim, Bahaa; Kottapalli, Sailesh; Kumar, Akhilesh; Looi, Lily P.; Mandava, Sreenivas; Rudoff, Andy; Steiner, Ian M.; Valentine, Bob; Vedaraman, Geetha; Vora, Sujal.
IEEE MICRO, 2019, 39(02): 29-36
[4] MiCOMP: Mitigating the Compiler Phase-Ordering Problem Using Optimization Sub-Sequences and Machine Learning [J].
Ashouri, Amir H.; Bignoli, Andrea; Palermo, Gianluca; Silvano, Cristina; Kulkarni, Sameer; Cavazos, John.
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2017, 14(03)
[5] Babalad Shilpa, 2023, Technical Report
[6] COMPILER TRANSFORMATIONS FOR HIGH-PERFORMANCE COMPUTING [J].
BACON, DF; GRAHAM, SL; SHARP, OJ.
ACM COMPUTING SURVEYS, 1994, 26(04): 345-420
[7] Bailey D., 1995, NAS95020 AM RES CTR
[8] BAILEY DH, 1991, SUPERCOMPUTING 91, P158
[9] The PARSEC Benchmark Suite: Characterization and Architectural Implications [J].
Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal; Li, Kai.
PACT'08: PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2008: 72-81
[10] Bondhugula U, 2008, LECT NOTES COMPUT SC, V4959, P132