Towards a Modular RISC-V Based Many-Core Architecture for FPGA Accelerators

Cited by: 19
Authors
Kamaleldin, Ahmed [1]
Hesham, Salma [1]
Gohringer, Diana [1, 2]
Affiliations
[1] Tech Univ Dresden, Adapt Dynam Syst, D-01069 Dresden, Germany
[2] Tech Univ Dresden, Ctr Tactile Internet Human In The Loop CeTI, D-01069 Dresden, Germany
Keywords
Field programmable gate arrays; Hardware; Architecture; Scalability; Open source software; Memory management; Many-core architecture; parallel computing; RISC-V; network-on-chip (NoC); field programmable gate array (FPGA); reconfigurable computing; framework
DOI
10.1109/ACCESS.2020.3015706
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Multi-/many-core architectures are emerging as scalable, high-performance, and energy-efficient computing platforms suitable for a variety of application domains, from edge to cloud computing. The recent emergence of the open-source RISC-V ISA has created new possibilities for developing customized computing platforms with substantial savings in non-recurring engineering costs. Moreover, current trends toward open-source hardware frameworks aim to reduce the design time and cost of complex system-on-chip architectures. In this context, modularity and reusability of hardware components are major challenges for flexible hardware architectures. The motivation behind this work is to introduce a modular, cluster-based many-core architecture for FPGA accelerators that is reusable and flexible, and that can be tailored to implement different many-core taxonomies with less design time and cost by using regular, replicated sets of computing, memory, and interconnection blocks. The proposed many-core architecture is built from multiple processing clusters coupled through a network-on-chip (NoC) for communication, which allows a high degree of design scalability. Each processing cluster features a configurable multi-core architecture consisting of multiple RISC-V processing elements (PEs) tightly coupled through a bus-based interconnect for intra-cluster communication via a parameterized shared scratchpad memory. Each PE features a single RISC-V core with a tightly coupled, parameterized local scratchpad memory and a generic AXI interface. Evaluation results show that the proposed architecture achieves a scalable computing performance of 501 MOp/s for 4 clusters and 878 MOp/s for 8 clusters. Moreover, a scalable memory bandwidth of up to 4.3 GB/s is achieved for 9 clusters, with a power consumption of 1.4 W per cluster and a utilization of 7.7% of the on-chip memory resources.
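The scaling figures quoted above can be put in perspective with simple arithmetic. The following Python sketch only re-derives ratios from the reported numbers (501 and 878 MOp/s for 4 and 8 clusters, 4.3 GB/s for 9 clusters); the helper names are hypothetical and not from the paper, and the per-cluster values are plain averages, not per-cluster measurements.

```python
"""Re-derive scaling ratios from the figures quoted in the abstract."""

# Reported figures (from the abstract)
MOPS_4_CLUSTERS = 501.0          # MOp/s, 4 clusters
MOPS_8_CLUSTERS = 878.0          # MOp/s, 8 clusters
BANDWIDTH_9_CLUSTERS_GBPS = 4.3  # GB/s, 9 clusters


def speedup(perf_small: float, perf_large: float) -> float:
    """Ratio of throughput in the larger configuration to the smaller one."""
    return perf_large / perf_small


def scaling_efficiency(perf_small: float, n_small: int,
                       perf_large: float, n_large: int) -> float:
    """Achieved speedup divided by the ideal (linear) speedup n_large / n_small."""
    return speedup(perf_small, perf_large) / (n_large / n_small)


def per_cluster(total: float, n_clusters: int) -> float:
    """Average share of a total figure per cluster (not a per-cluster measurement)."""
    return total / n_clusters


if __name__ == "__main__":
    s = speedup(MOPS_4_CLUSTERS, MOPS_8_CLUSTERS)
    eff = scaling_efficiency(MOPS_4_CLUSTERS, 4, MOPS_8_CLUSTERS, 8)
    print(f"speedup 4 -> 8 clusters: {s:.3f}x")
    print(f"scaling efficiency:      {eff:.1%}")
    print(f"MOp/s per cluster:       {per_cluster(MOPS_4_CLUSTERS, 4):.2f} (4), "
          f"{per_cluster(MOPS_8_CLUSTERS, 8):.2f} (8)")
    print(f"avg. bandwidth/cluster:  "
          f"{per_cluster(BANDWIDTH_9_CLUSTERS_GBPS, 9) * 1000:.0f} MB/s (9 clusters)")
```

Doubling the cluster count from 4 to 8 yields roughly a 1.75x throughput gain, i.e., about 88% of ideal linear scaling, while the average throughput per cluster drops from about 125 to about 110 MOp/s.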
The many-core architecture is implemented and evaluated on a Xilinx Virtex UltraScale+ FPGA and supports changing the architecture configuration at run time through dynamic partial reconfiguration, which provides additional flexibility and reusability.
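To make the hierarchy described in the abstract concrete (clusters connected by a NoC, PEs on an intra-cluster bus with a shared scratchpad, and a local scratchpad per PE), the following Python sketch models one architecture configuration as nested, immutable parameter sets. All class names, field names, and sizes are hypothetical: the abstract does not list the actual parameters or state which of them can be changed through partial reconfiguration. The `changed_parameters` helper simply reports which parameters differ between two configurations, i.e., what a run-time switch between them would have to alter.

```python
"""Hypothetical parameter model of the cluster-based many-core hierarchy."""
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class PEConfig:
    """One processing element: a single RISC-V core with a private scratchpad.
    The PE's generic AXI interface is not modeled here."""
    local_spm_kib: int


@dataclass(frozen=True)
class ClusterConfig:
    """PEs sharing a bus-based interconnect and a shared scratchpad."""
    num_pes: int
    shared_spm_kib: int
    pe: PEConfig


@dataclass(frozen=True)
class ManyCoreConfig:
    """Replicated clusters connected by a NoC (NoC topology not modeled)."""
    num_clusters: int
    cluster: ClusterConfig

    def total_pes(self) -> int:
        return self.num_clusters * self.cluster.num_pes

    def total_spm_kib(self) -> int:
        """Shared plus local scratchpad capacity summed over all clusters."""
        per_cluster = (self.cluster.shared_spm_kib
                       + self.cluster.num_pes * self.cluster.pe.local_spm_kib)
        return self.num_clusters * per_cluster


def changed_parameters(old, new, prefix: str = "") -> dict:
    """Map each leaf parameter that differs between two configs to (old, new)."""
    diff = {}
    for f in fields(old):
        a, b = getattr(old, f.name), getattr(new, f.name)
        key = prefix + f.name
        if is_dataclass(a):
            # Recurse into nested configs, e.g. "cluster.pe.local_spm_kib"
            diff.update(changed_parameters(a, b, key + "."))
        elif a != b:
            diff[key] = (a, b)
    return diff


if __name__ == "__main__":
    # All sizes below are illustrative placeholders, not values from the paper.
    base = ManyCoreConfig(4, ClusterConfig(4, 64, PEConfig(16)))
    alt = ManyCoreConfig(8, ClusterConfig(4, 64, PEConfig(32)))
    print(base.total_pes(), base.total_spm_kib())
    print(alt.total_pes(), alt.total_spm_kib())
    print(changed_parameters(base, alt))
```

Keeping each configuration as a frozen dataclass makes it a comparable, hashable value; in an actual partial-reconfiguration flow each candidate configuration would correspond to separately generated partial bitstreams, which this sketch does not model.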
Pages: 148812-148826
Number of pages: 15