Loop Transformations Leveraging Hardware Prefetching

Citations: 6
Authors
Sioutas, Savvas [1]
Stuijk, Sander [1]
Corporaal, Henk [1]
Basten, Twan [2]
Somers, Lou [3]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, TNO ESI, Eindhoven, Netherlands
[3] Eindhoven Univ Technol, Oce Technol, Eindhoven, Netherlands
Source
PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO'18) | 2018
Keywords
loop optimizations; compiler optimizations; Halide; performance
DOI
10.1145/3168823
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Memory-bound applications heavily depend on the bandwidth of the system in order to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the right combination of optimizations is not a trivial task, due to the fact that most of them alter the memory access pattern of the application and as a result interfere with the efficiency of the hardware prefetching mechanisms present in modern architectures. We propose an optimization algorithm that analytically classifies an algorithmic description of a loop nest in order to decide whether it should be optimized stressing its temporal or spatial locality, while also taking hardware prefetching into account. We implement our technique as a tool to be used with the Halide compiler and test it on a variety of benchmarks. We find an average performance improvement of over 40% compared to previous analytical models targeting the Halide language and compiler.
Pages: 254-264
Page count: 11