Data Transfer Modeling and Optimization in Reconfigurable Multi-Accelerator Systems

Times cited: 0

Authors:

Ortiz, Alberto [1]

Rodriguez, Alfonso [1]

Otero, Andres [1]

de la Torre, Eduardo [1]

Affiliations:

[1] Univ Politecn Madrid, Ctr Elect Ind, Madrid, Spain

Source:

2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019

Keywords:

FPGAs; Communication Modeling; Dynamic and Partial Reconfiguration; Hardware Architectures; DESIGN;

DOI:

10.1109/recosoc48741.2019.9034940

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods];

Discipline classification code:

081202;

Abstract:

The use of accelerator-centric processing architectures in different application scenarios, ranging from the cloud to the edge, is nowadays a reality. However, increasingly stringent operating conditions and requirements continue to push research on hardware-based processing architectures that can provide medium to high computing performance while at the same time supporting energy-efficient execution. In addition, reconfigurable devices (i.e., FPGAs) provide another degree of freedom, enabling software-like flexibility through time-multiplexing of the computing resources. Nevertheless, bus-based computing platforms still face architectural bottlenecks when data transfers are not handled efficiently. In this paper, the communication overhead in a reconfigurable multi-accelerator architecture for high-performance embedded computing is analyzed and modeled. The obtained models are then used to predict the acceleration performance and to evaluate two different patterns for data transfers: on the one hand, a basic approach in which data preparation and DMA transfers are executed sequentially; on the other hand, a pipelined approach in which data preparation and DMA transfers are executed in parallel. The evaluation method is based on well-known accelerator benchmarks from the MachSuite suite. Experimental results show that using a pipelined data management approach increases performance by up to 2.6x when compared to the sequential alternative, and by up to 26.46x when compared with a bare-metal execution of the accelerators (i.e., without using the reconfigurable multi-accelerator processing architecture or an operating system).
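The difference between the two transfer patterns named in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' model: the function names, the per-chunk timings, and the assumption that preparation of later chunks may freely overlap with the DMA transfer of earlier ones (unbounded buffering, a single DMA engine) are all hypothetical choices made for illustration.

```python
def sequential_time(prep, dma):
    """Total time when each chunk is prepared and then transferred,
    with no overlap between preparation and DMA (basic pattern)."""
    return sum(prep) + sum(dma)


def pipelined_time(prep, dma):
    """Total time when preparation of later chunks overlaps with the
    DMA transfer of earlier ones (two-stage pipeline, assumed model)."""
    prep_done = 0.0  # time at which the current chunk finishes preparation
    dma_done = 0.0   # time at which the DMA engine finishes its last transfer
    for p, d in zip(prep, dma):
        prep_done += p
        # A transfer starts once the chunk is prepared AND the DMA engine is free.
        dma_done = max(prep_done, dma_done) + d
    return dma_done


# Hypothetical timings (arbitrary units) for four equally sized chunks.
prep_times = [2.0, 2.0, 2.0, 2.0]
dma_times = [3.0, 3.0, 3.0, 3.0]

t_seq = sequential_time(prep_times, dma_times)
t_pipe = pipelined_time(prep_times, dma_times)
print(f"sequential: {t_seq}, pipelined: {t_pipe}, speedup: {t_seq / t_pipe:.2f}x")
```

In this simplified model the pipelined total approaches the cost of the slower stage alone as the number of chunks grows, so the gain over the sequential pattern is bounded by the ratio between the summed stage times and the slower stage. The figures reported in the abstract come from measurements on real benchmarks, not from this sketch.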


Pages: 20-26

Page count: 7
