Less for More: Reducing Intra-CGRA Connectivity for Higher Performance and Efficiency in HPC

Cited by: 3
Authors
Adhi, Boma [1]
Cortes, Carlos [1]
Del Sozzo, Emanuele [1]
Ueno, Tomohiro [1]
Tan, Yiyu [2]
Kojima, Takuya [1, 3]
Podobas, Artur [4]
Sano, Kentaro [1]
Affiliations
[1] RIKEN, Ctr Computat Sci R CCS, Kobe, Hyogo, Japan
[2] Iwate Univ, Fac Sci & Engn, Morioka, Iwate, Japan
[3] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
[4] KTH Royal Inst Technol, Stockholm, Sweden
Source
2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW | 2023
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
CGRA; Routing architecture; Design space exploration; HPC; RTL simulation; ARCHITECTURE; ADRES;
DOI
10.1109/IPDPSW59300.2023.00077
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are a class of reconfigurable architectures that inherit the performance of domain-specific accelerators and the reconfigurability of Field-Programmable Gate Arrays (FPGAs). Historically, CGRAs have been used successfully to accelerate embedded applications, and they are now being considered for accelerating High-Performance Computing (HPC) applications in future supercomputers. However, embedded systems and supercomputers are two vastly different domains with different applications and constraints, and it is not yet fully understood which CGRA design decisions adequately cater to the HPC market. One such open design decision concerns the interconnect that facilitates intra-CGRA communication. Our findings show that even the typical king-style, mesh-like topology is often underutilized under a typical HPC workload, leading to inefficiency. This research explores the provisioning of the intra-CGRA interconnect for HPC-oriented workloads and, ultimately, aims to recoup the lost performance and efficiency by reducing the interconnect complexity. We propose several reduced interconnect topologies based on usage statistics, and then evaluate the tradeoffs in hardware cost, routability of data-flow graphs (DFGs), and computational throughput.
Pages: 452-459
Page count: 8
Related papers
23 in total
[1]  
Adhi B., 2022, 1 INT WORKSHOP COARS
[2]  
Adhi B., 2022, 2022 INT C FIELDPROG, P1
[3]   The Cost of Flexibility: Embedded versus Discrete Routers in CGRAs for HPC [J].
Adhi, Boma ;
Cortes, Carlos ;
Tan, Yiyu ;
Kojima, Takuya ;
Podobas, Artur ;
Sano, Kentaro .
2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, :347-356
[4]  
Bouwens F, 2007, LECT NOTES COMPUT SC, V4419, P1
[5]   An Architecture-Agnostic Integer Linear Programming Approach to CGRA Mapping [J].
Chin, S. Alexander ;
Anderson, Jason H. .
2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2018,
[6]  
Chin SA, 2017, IEEE INT CONF ASAP, P184, DOI 10.1109/ASAP.2017.7995277
[7]   Pushing the Level of Abstraction of Digital System Design: A Survey on How to Program FPGAs [J].
Del Sozzo, Emanuele ;
Conficconi, Davide ;
Zeni, Alberto ;
Salaris, Mirko ;
Sciuto, Donatella ;
Santambrogio, Marco D. .
ACM COMPUTING SURVEYS, 2023, 55 (05)
[8]   Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches? [J].
Domke, Jens ;
Matsumura, Kazuaki ;
Wahib, Mohamed ;
Zhang, Haoyu ;
Yashima, Keita ;
Tsuchikawa, Toshiki ;
Tsuji, Yohei ;
Podobas, Artur ;
Matsuoka, Satoshi .
2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019), 2019, :78-88
[9]  
Ebeling C., 1996, Field-Programmable Logic. Smart Applications, New Paradigms and Compilers. 6th International Workshop on Field-Programmable Logic and Applications, FPL '96 Proceedings, P126
[10]  
Friedman S., 2009, PROC INT S FIELD PRO, P191