Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth

Cited by: 75

Authors:

Sano, Kentaro [1]

Hatsuda, Yoshiaki [2]

Yamamoto, Satoru [1]

Affiliations:

[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan

[2] Kobo Co Ltd, Kawaguchi, Saitama, Japan

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2014 / Vol. 25 / No. 3

Keywords:

Scalable streaming-array; stencil computation; custom computing machine; FPGA; high-performance computation; MODEL;

DOI:

10.1109/TPDS.2013.51

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Stencil computation is one of the important kernels in scientific computations. However, sustained performance is limited owing to restriction on memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs) because of their small operational intensity. In this paper, we present a custom computing machine (CCM), called a scalable streaming-array (SSA), for high-performance stencil computations with multiple field-programmable gate arrays (FPGAs). We design SSA based on a domain-specific programmable concept, where CCMs are programmable with the minimum functionality required for an algorithm domain. We employ a deep pipelining approach over successive iterations to achieve linear scalability for multiple devices with a constant memory bandwidth. Prototype implementation using nine FPGAs demonstrates good agreement with a performance model, and achieves 260 and 236 GFlop/s for 2D and 3D Jacobi computation, which are 87.4 and 83.9 percent of the peak, respectively, with a memory bandwidth of only 2.0 GB/s. We also evaluate the performance of SSA for state-of-the-art FPGAs.
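As a rough illustration of the ideas in the abstract, the sketch below shows a 5-point 2D Jacobi sweep and a software analogue of chaining several sweeps so that the grid is read once and written once. It is not the authors' SSA design: the function names, the 5-point averaging form, and the fixed-boundary handling are all assumptions made for this example. The two helper functions only redo arithmetic on the figures quoted in the abstract (sustained GFlop/s, percent of peak, 2.0 GB/s); the flops-per-byte ratio they produce is a derived estimate, not a number reported in the record.

```python
def jacobi_step(grid):
    """One 5-point Jacobi sweep; boundary cells are held fixed (assumption)."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def pipelined_sweeps(grid, stages):
    """Apply `stages` sweeps back to back.

    Software analogue of a pipeline over successive iterations: the input
    grid is consumed once and only the final result is produced, so external
    memory traffic does not grow with the number of stages.
    """
    for _ in range(stages):
        grid = jacobi_step(grid)
    return grid


def implied_peak(sustained_gflops, percent_of_peak):
    """Peak GFlop/s implied by a sustained rate and its share of the peak."""
    return sustained_gflops / (percent_of_peak / 100.0)


def flops_per_byte(sustained_gflops, bandwidth_gbytes_per_s):
    """Sustained floating-point operations per byte of memory traffic."""
    return sustained_gflops / bandwidth_gbytes_per_s


if __name__ == "__main__":
    hot_edges = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    print(pipelined_sweeps(hot_edges, 3)[1][1])
    print(round(implied_peak(260, 87.4), 1), round(implied_peak(236, 83.9), 1))
    print(flops_per_byte(260, 2.0))
```

Under these assumptions, the quoted 2D figures imply a peak of roughly 297 GFlop/s and about 130 floating-point operations per byte of memory traffic, which gives a sense of how much work each byte read from memory has to support when many iterations are pipelined.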

Pages: 695-705

Page count: 11

