OpenCL-enabled Parallel Raytracing for Astrophysical Application on Multiple FPGAs with Optical Links

Times cited: 4

Authors:

Fujita, Norihisa [1, 2]

Kobayashi, Ryohei [1, 2]

Yamaguchi, Yoshiki [1, 2]

Boku, Taisuke [1, 2]

Yoshikawa, Kohji [1, 3]

Abe, Makito [1]

Umemura, Masayuki [1, 3]

Affiliations:

[1] University of Tsukuba, Center for Computational Sciences, Tsukuba, Ibaraki, Japan

[2] University of Tsukuba, Degree Program in Systems and Information Engineering, Tsukuba, Ibaraki, Japan

[3] University of Tsukuba, Degree Program in Pure and Applied Sciences, Tsukuba, Ibaraki, Japan

Source:

PROCEEDINGS OF H2RC 2020: 2020 SIXTH IEEE/ACM INTERNATIONAL WORKSHOP ON HETEROGENEOUS HIGH-PERFORMANCE RECONFIGURABLE COMPUTING (H2RC) | 2020

Keywords:

FPGA; OpenCL; HLS; parallel computing; interconnection

DOI:

10.1109/H2RC51942.2020.00011

CLC number (Chinese Library Classification):

TP3 [Computing technology, computer technology]

Subject classification code:

0812 (Computer Science and Technology)

Abstract:

In an earlier study, we optimized the Authentic Radiative Transfer (ART) method, which solves space radiative transfer problems in early-universe astrophysical simulations, for an Intel Arria 10 Field Programmable Gate Array (FPGA). In this paper, we optimize the method for the latest FPGA, an Intel Stratix 10, and evaluate its performance against a GPU implementation on multiple nodes. For multi-FPGA computation and communication, we apply our original framework, the Communication Integrated Reconfigurable CompUting System (CIRCUS), which supports OpenCL-based programming and uses multiple optical links on each FPGA for parallel FPGA processing; this study is the first implementation of a real application using CIRCUS. The FPGA implementation is 4.54, 8.41, and 10.64 times faster than the GPU implementation on one, two, and four nodes, respectively, where the multi-GPU cases communicate over an InfiniBand HDR100 network. It also achieves 94.2% parallel efficiency on four FPGAs. We attribute this efficiency to the low-latency, high-efficiency pipelined communication of CIRCUS, which also enables easy multi-FPGA programming with OpenCL for high-performance computing applications.
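The speedup and efficiency figures in the abstract can be related with a little arithmetic. The sketch below is an illustration, not the authors' analysis: the function names are hypothetical, it assumes each reported speedup r_p = T_GPU(p) / T_FPGA(p) compares the two implementations at the same node count p, and it assumes the 94.2% figure is a strong-scaling efficiency E_p = T_1 / (p * T_p). Under those assumptions, r_p / r_1 equals S_FPGA(p) / S_GPU(p), the ratio of the two implementations' own scaling speedups S(p) = T(1) / T(p).

```python
# Hypothetical helpers relating the abstract's speedup and efficiency figures.
# Assumption: efficiency is strong-scaling, E_p = T_1 / (p * T_p).


def parallel_efficiency(t_base: float, t_p: float, p: int) -> float:
    """Strong-scaling efficiency E_p = T_1 / (p * T_p)."""
    return t_base / (p * t_p)


def speedup_from_efficiency(efficiency: float, p: int) -> float:
    """Invert E_p = S_p / p to recover the scaling speedup S_p = T_1 / T_p."""
    return efficiency * p


def relative_scaling(ratio_1: float, ratio_p: float) -> float:
    """Given r = T_GPU / T_FPGA measured at 1 node and at p nodes,
    return S_FPGA(p) / S_GPU(p), since r_p / r_1 reduces to that ratio."""
    return ratio_p / ratio_1


# FPGA-over-GPU speedups reported in the abstract, keyed by node count.
REPORTED = {1: 4.54, 2: 8.41, 4: 10.64}

if __name__ == "__main__":
    for nodes in (2, 4):
        gain = relative_scaling(REPORTED[1], REPORTED[nodes])
        print(f"{nodes} nodes: S_FPGA/S_GPU = {gain:.2f}")
    # Assumption: 94.2% is the strong-scaling efficiency on p = 4 FPGAs.
    print(f"Implied 4-FPGA speedup: {speedup_from_efficiency(0.942, 4):.3f}")
```

With the reported values this prints ratios of 1.85 (two nodes) and 2.34 (four nodes), i.e. the FPGA version gains more from additional nodes than the GPU version does, and an implied 4-FPGA speedup of 3.768. The abstract does not state how many GPUs each node uses, so these ratios describe node-level scaling only, not a per-device comparison.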


Pages: 48-55

Page count: 8
