Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology

Authors
Wang, Jinyu [1]
Kang, Yifei [1]
Li, Yiwen [1]
Wu, Weiguo [1]
Liu, Song [1]
Wang, Longxiang [1]
Affiliations
[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an, People's Republic of China
Source
19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) | 2021
Keywords
hexagonal tiling; stencil computation; Field Programmable Gate Array; multiple FPGAs; acceleration; SYSTEMS
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Accelerators built from multiple Field Programmable Gate Arrays (FPGAs) are now widely used for stencil computation. However, the state-of-the-art hexagonal tiling algorithm, which efficiently improves stencil computation performance, is designed mainly for CPUs and GPUs; it is not suited to running directly on FPGAs, where it yields lower performance. To address this, this paper proposes a hexagonal tiling based multi-FPGA stencil computation architecture and a corresponding optimization algorithm. The architecture uses on-chip registers to store and carry the cell data of a hexagonal tile, which dramatically increases the scale and size of tiles as well as the intra-FPGA calculation performance. Then, to take full advantage of the processing ability of multiple FPGAs, a memory-shared, high-speed inter-FPGA data transfer structure is devised. Finally, Mixed-Integer Linear Programming (MILP) is used to optimize an objective function that considers candidate FPGA costs, computation latency, and resource utilization, in order to obtain a desirable tile size and layout. The proposed method has been validated on an FPGA cluster consisting of two Xilinx Alveo U50 devices and one Alveo U250 device. Experimental results show performance of up to 580 Gflop/s using one U50 device and 2261 Gflop/s using three FPGAs. The proposed optimizer is also compared with a state-of-the-art multi-FPGA stencil computation acceleration method; performance increases by up to 21.8% and by 20.04% on average.
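For readers unfamiliar with the term, the following is a minimal sketch of what a stencil computation is, assuming a 2D 5-point averaging stencil with fixed boundary cells. The function names `jacobi_step` and `run` are hypothetical and are not taken from the paper. The sketch does not implement hexagonal tiling, the FPGA architecture, or the MILP optimizer; it only shows the kind of repeated neighbor-based update that those techniques are meant to accelerate.

```python
def jacobi_step(grid):
    """Apply one sweep of a 5-point averaging stencil.

    Each interior cell becomes the mean of itself and its four
    neighbors from the previous time step. Boundary cells are
    left unchanged (an assumption made for this illustration).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so reads use the old time step
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.2 * (
                grid[i][j]
                + grid[i - 1][j]
                + grid[i + 1][j]
                + grid[i][j - 1]
                + grid[i][j + 1]
            )
    return new


def run(grid, steps):
    """Apply the stencil for the given number of time steps."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


if __name__ == "__main__":
    # A 5x5 grid with a single non-zero cell in the center.
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0
    for row in run(grid, 1):
        print(row)
```

Because every cell at time step t+1 depends on neighboring cells at time step t, splitting the grid into tiles creates data dependencies across tile borders. Tiling schemes such as the hexagonal tiling named in the abstract are ways of shaping those tiles across space and time.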
Pages: 697-705
Number of pages: 9