HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds

Times cited: 29

Authors:

Choi, Young-kyu [1]

Cong, Jason [1]

Affiliation:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA

Source:

2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS | 2018

Funding:

U.S. National Science Foundation


DOI: 10.1145/3240765.3240815

Chinese Library Classification (CLC) number: TP301 [Theory and methods]

Discipline classification code: 081202

Abstract:

To further increase the productivity of field-programmable gate array (FPGA) programmers, several design space exploration (DSE) frameworks for high-level synthesis (HLS) tools have recently been proposed to determine FPGA design parameters automatically. However, a common limitation of these tools is that they cannot find a design point with a large speedup for applications with variable loop bounds, because loops with variable bounds cannot be efficiently parallelized or pipelined by simply inserting HLS directives. Moreover, accurately predicting cycle counts and resource consumption across the entire design space is challenging, because the HLS tool's cycle predictions are inaccurate and the design space is large. In this paper, we present an HLS-based FPGA optimization and DSE framework that produces high-performance designs even in the presence of variable loop bounds. We propose code transformations that increase the utilization of compute resources in variable-bound loops, including for several computation patterns with loop-carried dependencies, such as floating-point reduction and prefix sum. To perform DSE rapidly and with high accuracy, we describe a resource and cycle estimation model constructed from information obtained from actual HLS synthesis. Experiments with Vivado HLS on the Polybench applications that have variable loop bounds show that our framework improves on the baseline implementation by 75x on average and outperforms current state-of-the-art DSE frameworks.

Pages: 8

References (27 in total):

[1] Alpha Data. ALPH DAT ADM PCIE KU, 2017.

[2] Amazon. Amazon EC2 F1 Instances, 2018.

[3] Chandra R. Parallel Programming in OpenMP, 2001. DOI 10.5555/355074.

[4] Choi, Young-Kyu; Cong, Jason. HLScope: High-Level Performance Debugging for FPGA Designs. 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017), 2017: 125-128.

[5] Chong, Nathan; Donaldson, Alastair F.; Ketema, Jeroen. A Sound and Complete Abstraction for Reasoning about Parallel Prefix Sums. ACM SIGPLAN Notices, 2014, 49(1): 397-409.

[6] Cong J. Proceedings of the Annual Design Automation Conference (DAC), 2018: 154. DOI 10.1145/3195970.3195999.

[7] Cong J. Proceedings of the Annual Design Automation Conference (DAC), 2017.

[8] Cong, Jason; Liu, Bin; Neuendorffer, Stephen; Noguera, Juanjo; Vissers, Kees; Zhang, Zhiru. High-Level Synthesis for FPGAs: From Prototyping to Deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, 30(4): 473-491.

[9] Cong, Jason; Zou, Yi. A Comparative Study on the Architecture Templates for Dynamic Nested Loops. 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2010), 2010: 251-254.

[10] Harris M. Parallel Prefix Sum (Scan) with CUDA, 2008.
