OpenCL-enabled Parallel Raytracing for Astrophysical Application on Multiple FPGAs with Optical Links

Cited by: 4
Authors
Fujita, Norihisa [1, 2]
Kobayashi, Ryohei [1, 2]
Yamaguchi, Yoshiki [1, 2]
Boku, Taisuke [1, 2]
Yoshikawa, Kohji [1, 3]
Abe, Makito [1]
Umemura, Masayuki [1, 3]
Affiliations
[1] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Degree Program Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[3] Univ Tsukuba, Degree Program Pure & Appl Sci, Tsukuba, Ibaraki, Japan
Source
PROCEEDINGS OF H2RC 2020: 2020 SIXTH IEEE/ACM INTERNATIONAL WORKSHOP ON HETEROGENEOUS HIGH-PERFORMANCE RECONFIGURABLE COMPUTING (H2RC) | 2020
Keywords
FPGA; OpenCL; HLS; parallel computing; inter-connection
DOI
10.1109/H2RC51942.2020.00011
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In an earlier study, we optimized the Authentic Radiative Transfer (ART) method, which solves space radiative transfer problems in astrophysical simulations of the early universe, for an Intel Arria 10 Field Programmable Gate Array (FPGA). In this paper, we optimize the method for the latest FPGA, an Intel Stratix 10, and evaluate its performance against a GPU implementation on multiple nodes. For multi-FPGA computing and communication, we apply our original framework, the Communication Integrated Reconfigurable CompUting System (CIRCUS), which enables OpenCL-based programming and uses multiple optical links on each FPGA for parallel FPGA processing. This study is the first implementation of a real application using CIRCUS. The FPGA implementation is 4.54, 8.41, and 10.64 times faster than the GPU implementation on one, two, and four nodes, respectively, where the multi-GPU cases use an InfiniBand HDR100 network. It also achieves 94.2% parallel efficiency on four FPGAs. We attribute this efficiency to the low-latency, high-efficiency pipelined communication of CIRCUS, which makes multi-FPGA programming in OpenCL easy for high-performance computing applications.
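As a reading aid for the 94.2% figure, the sketch below shows how parallel efficiency relates to speedup. It assumes the conventional strong-scaling definition (efficiency = speedup / number of devices); the abstract does not state which definition the authors use, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Conventional definition (an assumption here): speedup over one device / device count."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return speedup / n_devices


def implied_speedup(efficiency: float, n_devices: int) -> float:
    """Invert the definition to recover the speedup implied by a reported efficiency."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return efficiency * n_devices


if __name__ == "__main__":
    # Values taken from the abstract: 94.2% efficiency on four FPGAs.
    s = implied_speedup(0.942, 4)
    print(f"Implied speedup on 4 FPGAs over 1 FPGA: {s:.3f}x")
```

Under that assumed definition, the reported efficiency corresponds to roughly a 3.77-fold speedup on four FPGAs relative to one.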
Pages: 48-55
Number of pages: 8