A CAD-based methodology to optimize HLS code via the Roofline model

Cited by: 16

Authors:

Siracusa, Marco [1]

Rabozzi, Marco [2]

Del Sozzo, Emanuele [1]

Di Tucci, Lorenzo [2]

Williams, Samuel [3]

Santambrogio, Marco D. [1]

Affiliations:

[1] Politecnico di Milano, Milan, Italy

[2] Huxelerate srl, Milan, Italy

[3] Lawrence Berkeley National Laboratory, Berkeley, CA, USA

Source:

2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) | 2020

Keywords:

Roofline Model; FPGA; High-Performance Computing; CAD; DSE; Design Space Exploration; Performance Model; Efficient

DOI:

10.1145/3400302.3415730

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

The intrinsic complexity of modern computing systems requires structured methods for analyzing and optimizing application performance. In this context, the Roofline model offers an intuitive, visual method that provides performance insight and optimization guidance for a given architecture. Although this methodology successfully models multicore and GPU performance optimizations, the original formulation does not directly apply to FPGA devices. For this reason, we propose a Roofline model analysis for reconfigurable architectures and an associated CAD tool for assisting HLS optimization of C/C++ applications. We first model FPGA attainable performance by means of an analytical method. Then, we integrate locality walls and a DSE engine for an enhanced optimization process. Starting from a software version of the N-body algorithm, we first illustrate how our methodology helps to quickly achieve performance comparable to a state-of-the-art bespoke FPGA implementation. Then, we illustrate an assisted platform porting of the Smith-Waterman sequence alignment algorithm that provides a 9x speedup. Finally, we evaluate the DSE engine alone on the PolyBench test suite and achieve performance improvements of up to 14.36x compared to previous automated solutions in the literature.

Pages: 9
