Multilevel Granularity Parallelism Synthesis on FPGAs

Cited by: 28

Authors:

Papakonstantinou, Alexandros [1]

Liang, Yun [2]

Stratton, John A. [1]

Gururaj, Karthik [3]

Chen, Deming [1]

Hwu, Wen-Mei W. [1]

Cong, Jason [3]

Affiliations:

[1] Univ Illinois, Elect & Comp Eng Dept, Urbana, IL 61801 USA

[2] Adv Digital Sci Ctr, Singapore, Singapore

[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA USA

Source:

2011 IEEE 19TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM) | 2011

Keywords:

FPGA; High-Level Synthesis; Parallel Computing; Design Space Exploration

DOI:

10.1109/FCCM.2011.29

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance both in terms of cycles and frequency. Evaluation of the rich design space through the full implementation flow - starting with high-level source code and ending with a routed netlist - is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order of efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models as well as design layout information to derive a performance near-optimal configuration. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.


Pages: 178-185

Page count: 8
