Performance Evaluation of Pipelined Communication Combined with Computation in OpenCL Programming on FPGA

Cited by: 20
Authors
Fujita, Norihisa [1, 2]
Kobayashi, Ryohei [1, 2]
Yamaguchi, Yoshiki [1, 2]
Ueno, Tomohiro [3]
Sano, Kentaro [3]
Boku, Taisuke [1, 2]
Affiliations
[1] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[3] RIKEN, Ctr Computat Sci, Kobe, Hyogo, Japan
Source
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020) | 2020
Keywords
FPGA; OpenCL; HLS; parallel computing; inter-connection;
DOI
10.1109/IPDPSW50202.2020.00083
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, many High Performance Computing (HPC) researchers have been drawn to using Field Programmable Gate Arrays (FPGAs) for HPC applications. Thanks to their I/O capabilities, FPGAs can be used for communication as well as computation. The difficulty of FPGA development has kept HPC scientists from using FPGAs in their applications; however, High Level Synthesis (HLS) allows them to do so at a reasonable cost. In this study, we propose the Communication Integrated Reconfigurable CompUting System (CIRCUS), which enables the high-speed interconnection of FPGAs to be used from OpenCL. CIRCUS builds a single fused pipeline that combines computation and communication, hiding communication latency by completely overlapping the two. In this paper, we present the details of the implementation and the evaluation results using two benchmarks: a pingpong benchmark and an allreduce benchmark.
Pages: 450-459
Page count: 10