Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology

Cited by: 0

Authors:

Wang, Jinyu ^{[1]}

Kang, Yifei ^{[1]}

Li, Yiwen ^{[1]}

Wu, Weiguo ^{[1]}

Liu, Song ^{[1]}

Wang, Longxiang ^{[1]}

Affiliations:

[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an, People's Republic of China

Source:

19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) | 2021

Keywords:

hexagonal tiling; stencil computation; Field Programmable Gate Array; multiple FPGAs; acceleration; SYSTEMS;

DOI:

10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Accelerators built from multiple Field Programmable Gate Arrays (FPGAs) are now widely used for stencil computation. However, the state-of-the-art hexagonal tiling algorithm, which efficiently improves stencil computation performance, is designed mainly for CPUs and GPUs and is not suitable for direct use on FPGAs, where it leads to lower performance. To address this, this paper proposes a hexagonal tiling based stencil computation architecture for multiple FPGAs and a corresponding optimization algorithm. The architecture uses on-chip registers to store and carry the cell data of a hexagonal tile. In this way, the scale and size of tiles increase dramatically, as does intra-FPGA calculation performance. Then, to take full advantage of the processing ability of multiple FPGAs, a shared-memory, high-speed inter-FPGA data transfer structure is devised. Finally, Mixed-Integer Linear Programming (MILP) is used to optimize an objective function that considers candidate FPGA costs, computation latency, and resource utilization in order to obtain a desirable tile size and layout. The proposed method has been validated on an FPGA cluster consisting of two Xilinx Alveo U50 devices and one Alveo U250 device. Experimental results show performance of up to 580 Gflop/s using one U50 device and 2261 Gflop/s using three FPGAs. The proposed optimizer is also tested with a state-of-the-art multiple-FPGA stencil computation acceleration method, and performance increases by at most 21.8% and by 20.04% on average.


Pages: 697-705

Page count: 9

