Loop Transformations Leveraging Hardware Prefetching

Cited by: 6
Authors
Sioutas, Savvas [1]
Stuijk, Sander [1]
Corporaal, Henk [1]
Basten, Twan [2]
Somers, Lou [3]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, TNO ESI, Eindhoven, Netherlands
[3] Eindhoven Univ Technol, Oce Technol, Eindhoven, Netherlands
Source
PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO'18) | 2018
Keywords
loop optimizations; compiler optimizations; Halide; performance
DOI
10.1145/3168823
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Memory-bound applications depend heavily on system bandwidth to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the right combination of optimizations is not trivial, because most of them alter the application's memory access pattern and thereby interfere with the efficiency of the hardware prefetching mechanisms present in modern architectures. We propose an optimization algorithm that analytically classifies an algorithmic description of a loop nest to decide whether it should be optimized for temporal or spatial locality, while also taking hardware prefetching into account. We implement our technique as a tool to be used with the Halide compiler and test it on a variety of benchmarks. We find an average performance improvement of over 40% compared to previous analytical models targeting the Halide language and compiler.
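As a rough illustration of the kind of loop transformation the abstract refers to, the sketch below contrasts an untiled and a tiled matrix transpose. It is not the paper's algorithm and does not use Halide; the function names and the `tile` parameter are hypothetical, chosen only for this example. Tiling keeps a small block of data in use at a time (temporal locality), but it also cuts the long contiguous runs of the untiled loop into short ones, which is the sort of access-pattern change that can affect a hardware prefetcher.

```python
def transpose_naive(a, n):
    """Transpose an n x n row-major matrix with a plain two-level loop nest.

    Reads walk contiguously along each row; writes stride by n elements.
    """
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[j * n + i] = a[i * n + j]
    return out


def transpose_tiled(a, n, tile=4):
    """Same computation, visited in tile x tile blocks (hypothetical tile size).

    Each block touches only `tile` rows of the input and output at a time,
    at the cost of shorter contiguous runs in the innermost loop.
    """
    out = [0] * (n * n)
    for ii in range(0, n, tile):          # block row
        for jj in range(0, n, tile):      # block column
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```

Both functions compute the same result; only the iteration order differs. Which order runs faster on real hardware depends on cache sizes and prefetcher behavior, which is the trade-off the paper's analytical model is described as addressing.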
Pages: 254-264
Page count: 11