Array-Specific Dataflow Caches for High-Level Synthesis of Memory-Intensive Algorithms on FPGAs

Cited by: 4
Authors
Brignone, Giovanni [1]
Jamal, M. Usman [1]
Lazarescu, Mihai T. [1]
Lavagno, Luciano [1]
Affiliations
[1] Politecn Torino, Dept Elect & Telecommun, I-10129 Turin, Italy
Keywords
Cache; FPGA; high-level synthesis; memory management
DOI
10.1109/ACCESS.2022.3219868
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Designs implemented on field-programmable gate arrays (FPGAs) via high-level synthesis (HLS) suffer from off-chip memory latency and bandwidth bottlenecks. FPGAs can access both large but slow off-chip memories (DRAM) and fast but small on-chip memories (block RAMs and registers). HLS tools allow exploiting the memory hierarchy in a scratchpad-like fashion, which requires significant manual effort. We propose an automation of the FPGA memory management in Xilinx Vitis HLS through a fully configurable C++ source-level cache. Each DRAM-mapped array can be associated with a private level 2 (L2) cache with one or more ports, and each port can optionally provide a level 1 (L1) cache. The L2 cache runs in a separate dataflow task with respect to the application accessing it. This solution isolates off-chip memory accesses and data buffering into dedicated dataflow tasks, resembling the load, compute, store design paradigm, but without the drawback of manual algorithm refactoring. Experimental results collected from an FPGA board show that our cache speeds up the execution of a variety of benchmarks by up to 60 times compared to the out-of-the-box solution provided by HLS, while requiring very limited optimization effort. Our caches are not meant to compete with the quality of results (QoR) of manually optimized implementations, but rather to save significant design effort in exchange for some QoR and to make the HLS flow a bit more software-like, allowing the designer to focus on algorithmic optimizations rather than on explicit memory management. Moreover, caching could be the only feasible memory optimization for algorithms with data-dependent or irregular memory access patterns but good data locality.
Pages: 118858-118877
Page count: 20
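
The per-array cache idea summarized in the abstract can be illustrated with a small software model. The sketch below is not the authors' implementation or API: the class name ArrayCache, its template parameters LINES and WORDS, and the helper sum_with_cache are hypothetical names chosen for illustration. It assumes a direct-mapped, write-back organization and an array length that is a multiple of the line size, and it runs sequentially on a CPU, so it does not model the separate dataflow task, multiple ports, or the optional L1 level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a cache private to one "DRAM-mapped" array.
// Direct-mapped, write-back; LINES and WORDS are illustrative parameters.
// Assumption: the backing array length is a multiple of WORDS.
template <typename T, std::size_t LINES, std::size_t WORDS>
class ArrayCache {
public:
    explicit ArrayCache(T* dram) : dram_(dram) { assert(dram != nullptr); }
    ~ArrayCache() { flush(); }  // write dirty lines back before the cache goes away

    // Read element i through the cache.
    T get(std::size_t i) { return line_for(i).data[i % WORDS]; }

    // Write element i into the cache; the backing array is updated on eviction or flush.
    void set(std::size_t i, T v) {
        Line& l = line_for(i);
        l.data[i % WORDS] = v;
        l.dirty = true;
    }

    // Write every dirty line back to the backing array.
    void flush() {
        for (std::size_t idx = 0; idx < LINES; ++idx)
            if (lines_[idx].valid && lines_[idx].dirty) write_back(lines_[idx], idx);
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    struct Line {
        bool valid = false;
        bool dirty = false;
        std::size_t tag = 0;
        T data[WORDS] = {};
    };

    // Return the line holding element i, filling it from the backing array on a miss.
    Line& line_for(std::size_t i) {
        const std::size_t block = i / WORDS;   // which line-sized block of the array
        const std::size_t idx = block % LINES; // direct-mapped slot
        const std::size_t tag = block / LINES;
        Line& l = lines_[idx];
        if (l.valid && l.tag == tag) { ++hits_; return l; }
        ++misses_;
        if (l.valid && l.dirty) write_back(l, idx);
        for (std::size_t w = 0; w < WORDS; ++w) l.data[w] = dram_[block * WORDS + w];
        l.valid = true;
        l.dirty = false;
        l.tag = tag;
        return l;
    }

    void write_back(Line& l, std::size_t idx) {
        const std::size_t base = (l.tag * LINES + idx) * WORDS;
        for (std::size_t w = 0; w < WORDS; ++w) dram_[base + w] = l.data[w];
        l.dirty = false;
    }

    T* dram_;
    Line lines_[LINES];
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

struct Stats { long sum; std::size_t hits; std::size_t misses; };

// Sequentially sum a 64-element array through a 4-line, 8-word cache.
Stats sum_with_cache() {
    std::vector<int> dram(64);
    for (std::size_t i = 0; i < dram.size(); ++i) dram[i] = static_cast<int>(i);
    ArrayCache<int, 4, 8> cache(dram.data());
    long sum = 0;
    for (std::size_t i = 0; i < dram.size(); ++i) sum += cache.get(i);
    return {sum, cache.hits(), cache.misses()};
}
```

Calling sum_with_cache() from a main function reports 8 misses and 56 hits for the sequential scan: each 8-word line is fetched once and then reused seven times, which is the kind of locality the abstract argues a cache can exploit without manual refactoring of the algorithm.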