HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds

Cited by: 21
Authors
Choi, Young-kyu [1]
Cong, Jason [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Source
2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS | 2018
Funding
National Science Foundation (USA);
Keywords
DOI
10.1145/3240765.3240815
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
To further increase the productivity of field-programmable gate array (FPGA) programmers, several design space exploration (DSE) frameworks for high-level synthesis (HLS) tools have recently been proposed to automatically determine FPGA design parameters. However, a common limitation of these tools is that they cannot find a design point with large speedup for applications with variable loop bounds. The reason is that loops with variable bounds cannot be efficiently parallelized or pipelined by simply inserting HLS directives. In addition, making highly accurate predictions of cycle count and resource consumption over the entire design space is challenging because the HLS tool's cycle prediction is inaccurate and the design space is wide. In this paper, we present an HLS-based FPGA optimization and DSE framework that produces a high-performance design even in the presence of variable loop bounds. We propose code transformations that increase the utilization of compute resources for variable loops, including several computation patterns with loop-carried dependencies such as floating-point reduction and prefix sum. To perform DSE rapidly and with high accuracy, we describe a resource and cycle estimation model constructed from information obtained from actual HLS synthesis. Experiments on applications with variable loop bounds in the Polybench benchmarks with Vivado HLS show that our framework improves the baseline implementation by 75X on average and outperforms current state-of-the-art DSE frameworks.
Pages: 8
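
As an illustration of the loop-carried dependency problem the abstract mentions for floating-point reduction, the following is a minimal C sketch. It is not taken from the paper and may differ from the transformations the authors propose. The function names, the `LANES` constant, and the assumption that the floating-point adder latency is at most `LANES` cycles are all hypothetical. The Vivado HLS pipeline directive appears only as a comment so that the code compiles with any C compiler.

```c
#include <assert.h>

/* Hypothetical number of interleaved accumulators; assumed to be at least
 * as large as the floating-point adder latency in cycles. */
#define LANES 8

/* Baseline: every iteration depends on the previous value of s, so a
 * pipelined loop must wait for the adder to finish before starting the
 * next addition. The trip count n is only known at run time. */
float sum_naive(const float *a, int n) {
    assert(n >= 0);
    float s = 0.0f;
    for (int i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Interleaved variant: each partial sum is updated only once every LANES
 * iterations, so consecutive iterations touch different accumulators and
 * the dependency distance grows from 1 to LANES. */
float sum_interleaved(const float *a, int n) {
    assert(n >= 0);
    float partial[LANES] = {0.0f};
    for (int i = 0; i < n; i++) {
        /* #pragma HLS pipeline II=1   (directive shown as a comment) */
        partial[i % LANES] += a[i];
    }
    /* Short fixed-bound loop that combines the partial sums. */
    float s = 0.0f;
    for (int k = 0; k < LANES; k++) {
        s += partial[k];
    }
    return s;
}
```

Because the interleaved version adds the elements in a different order, its result can differ from the baseline by floating-point rounding; for inputs whose sums are exactly representable, both functions return the same value.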