Data Transfer Modeling and Optimization in Reconfigurable Multi-Accelerator Systems

Authors
Ortiz, Alberto [1]
Rodriguez, Alfonso [1]
Otero, Andres [1]
de la Torre, Eduardo [1]
Affiliation
[1] Universidad Politécnica de Madrid, Centro de Electrónica Industrial, Madrid, Spain
Source
2019 14th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC 2019) | 2019
Keywords
FPGAs; Communication Modeling; Dynamic and Partial Reconfiguration; Hardware Architectures; Design
DOI
10.1109/recosoc48741.2019.9034940
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Accelerator-centric processing architectures are nowadays a reality in application scenarios ranging from the cloud to the edge. However, increasingly stringent operating conditions and requirements continue to drive research on hardware-based processing architectures that provide medium-to-high computing performance while supporting energy-efficient execution. In addition, reconfigurable devices (i.e., FPGAs) provide another degree of freedom, enabling software-like flexibility by time-multiplexing the computing resources. Nevertheless, bus-based computing platforms still face architectural bottlenecks when data transfers are not handled efficiently. In this paper, the communication overhead in a reconfigurable multi-accelerator architecture for high-performance embedded computing is analyzed and modeled. The obtained models are then used to predict acceleration performance and to evaluate two data-transfer patterns: a basic approach, in which data preparation and DMA transfers are executed sequentially, and a pipelined approach, in which data preparation and DMA transfers are executed in parallel. The evaluation uses well-known accelerator benchmarks from the MachSuite suite. Experimental results show that the pipelined data-management approach improves performance by up to 2.6x compared with the sequential alternative, and by up to 26.46x compared with a bare-metal execution of the accelerators (i.e., using neither the reconfigurable multi-accelerator processing architecture nor an operating system).
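The difference between the sequential and pipelined transfer patterns described above can be illustrated with a textbook linear-pipeline timing model. The sketch below is not the authors' model. The split of the workload into equal chunks, the constant per-chunk stage times, and the function names `sequential_time`, `pipelined_time`, and `speedup` are hypothetical assumptions chosen for illustration.

```python
"""Toy timing model: sequential vs. pipelined data transfers.

Hypothetical assumptions (not the paper's calibrated model):
- the workload is split into n_chunks identical chunks;
- each chunk goes through the same stages (e.g. CPU data preparation,
  then DMA transfer), each taking a constant time per chunk;
- in the pipelined pattern every stage runs on its own resource, so
  different chunks can occupy different stages at the same time.
"""
from typing import Sequence


def sequential_time(n_chunks: int, stages: Sequence[float]) -> float:
    # No overlap: each chunk finishes all stages before the next one starts.
    return n_chunks * sum(stages)


def pipelined_time(n_chunks: int, stages: Sequence[float]) -> float:
    # The first chunk traverses every stage (pipeline fill); afterwards one
    # chunk completes every max(stages) seconds, paced by the slowest stage.
    if n_chunks == 0:
        return 0.0
    return sum(stages) + (n_chunks - 1) * max(stages)


def speedup(n_chunks: int, stages: Sequence[float]) -> float:
    # Gain of the pipelined pattern over the sequential one.
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return sequential_time(n_chunks, stages) / pipelined_time(n_chunks, stages)


if __name__ == "__main__":
    # Hypothetical, balanced two-stage case: preparation = DMA = 1.0 time unit.
    for n in (1, 4, 16, 64):
        print(n, round(speedup(n, [1.0, 1.0]), 3))
```

For many chunks the gain approaches `sum(stages) / max(stages)`, so with two balanced stages (preparation and DMA) this toy model saturates at 2x. It ignores bus contention, accelerator reconfiguration, and other overheads, so it should not be read as reproducing the figures the paper obtains from its own models.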
Pages: 20–26
Number of pages: 7