HLScope: High-Level Performance Debugging for FPGA Designs

Cited by: 23
Authors
Choi, Young-Kyu [1]
Cong, Jason [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Source
2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017) | 2017
DOI
10.1109/FCCM.2017.44
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In their quest for further optimization, field-programmable gate array (FPGA) designers often spend considerable time trying to identify the performance bottleneck in a current design. But since FPGAs do not have built-in high-level probes for performance analysis, manual effort is required to insert custom hardware monitors. This, however, is a time-consuming process that calls for automation. Previous work automates the process of inserting hardware monitors into the communication channels or the finite-state machine, but the instrumentation is applied in low-level hardware description languages (HDLs), which limits comprehensibility when identifying the root cause of stalls. Instead, we propose a performance debugging methodology based on high-level synthesis (HLS). High-level analysis allows tracing the cause of stalls at the function or loop level, which provides more intuitive feedback that can be used to pinpoint the performance bottleneck. In this paper we propose HLScope, a source-to-source transformation framework based on Vivado HLS for automated performance analysis. We present a method for analyzing the information collected from the software simulation to estimate the stall rate and its cause without the need for FPGA bitstream generation. For detailed analysis, an in-FPGA analysis method is proposed that can be natively integrated into the HLS environment. Experiments show that the parameter extraction from the simulation process is orders of magnitude faster than bitstream generation, with a 2.2% cycle difference on average. The in-FPGA flow consumes only about 170 LUTs and one BRAM per monitored module and provides cycle-accurate results.
Pages: 125-128
Page count: 4