Loop Transformations Leveraging Hardware Prefetching

Cited by: 7
Authors
Sioutas, Savvas [1]
Stuijk, Sander [1]
Corporaal, Henk [1]
Basten, Twan [2]
Somers, Lou [3]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, TNO ESI, Eindhoven, Netherlands
[3] Eindhoven Univ Technol, Oce Technol, Eindhoven, Netherlands
Source
PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO'18) | 2018
Keywords
loop optimizations; compiler optimizations; Halide; performance
DOI
10.1145/3168823
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Memory-bound applications depend heavily on system bandwidth to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the right combination of optimizations is not trivial, because most of them alter the application's memory access pattern and thereby interfere with the efficiency of the hardware prefetching mechanisms present in modern architectures. We propose an optimization algorithm that analytically classifies an algorithmic description of a loop nest in order to decide whether it should be optimized for temporal or for spatial locality, while also taking hardware prefetching into account. We implement our technique as a tool to be used with the Halide compiler and test it on a variety of benchmarks. We find an average performance improvement of over 40% compared to previous analytical models targeting the Halide language and compiler.
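The tension the abstract describes can be illustrated with a minimal C++ sketch. This is not the authors' algorithm or their Halide tool; it is a generic matrix transpose chosen only as an example, and the function names and the `tile` parameter are hypothetical. The untiled loop reads the input with unit stride, which a hardware stream prefetcher typically handles well, but writes the output with a large stride. The tiled loop improves locality within each block at the cost of shorter unit-stride runs, which is the kind of access-pattern change the abstract says can interfere with prefetching.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Untiled transpose of a row-major rows x cols matrix.
// Reads `in` with unit stride; writes `out` with a stride of `rows` elements.
std::vector<float> transpose_naive(const std::vector<float>& in,
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];
    return out;
}

// Tiled transpose: same result, but the iteration space is split into
// tile x tile blocks so that each block's reads and writes stay close
// together in memory. `tile` is a hypothetical tuning parameter and must
// be greater than zero; partial tiles at the edges are clamped.
std::vector<float> transpose_tiled(const std::vector<float>& in,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t tile) {
    std::vector<float> out(rows * cols);
    for (std::size_t ii = 0; ii < rows; ii += tile) {
        for (std::size_t jj = 0; jj < cols; jj += tile) {
            const std::size_t i_end = std::min(ii + tile, rows);
            const std::size_t j_end = std::min(jj + tile, cols);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
    }
    return out;
}
```

Both functions produce identical output; only the traversal order differs. Which one runs faster depends on matrix size, cache capacity, and the prefetcher of the target machine, so any tile size would need to be measured rather than assumed.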
Pages: 254-264
Page count: 11