Architecture and Application Co-Design for Beyond-FPGA Reconfigurable Acceleration Devices

Cited by: 5
Authors
Boutros, Andrew [1, 2]
Nurvitadhi, Eriko [2]
Betz, Vaughn [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
[2] Intel Corp, Programmable Solut Grp, Santa Clara, CA 95054 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Deep learning; field-programmable gate arrays; hardware acceleration; network-on-chip; reconfigurable computing; EMBEDDED NETWORKS; CHIP;
DOI
10.1109/ACCESS.2022.3204664
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, field-programmable gate arrays (FPGAs) have been increasingly deployed in datacenters as programmable accelerators that can offer software-like flexibility and custom-hardware-like efficiency for key datacenter workloads. To improve the efficiency of FPGAs for these new datacenter use cases and data-intensive applications, a new class of reconfigurable acceleration devices (RADs) is emerging. In these devices, the FPGA fine-grained reconfigurable fabric is a component of a bigger monolithic or multi-die system-in-package that can incorporate general-purpose software-programmable cores, domain-specialized accelerator blocks, and high-performance networks-on-chip (NoCs) for efficient communication between these system components. The integration of all these components in a RAD results in a huge design space and requires re-thinking the implementation of applications that need to be migrated from conventional FPGAs to these novel devices. In this work, we introduce RAD-Sim, an architecture simulator that allows rapid design space exploration for RADs and facilitates the study of complex interactions between their various components. We also present a case study that highlights the utility of RAD-Sim in re-designing applications for these novel RADs by mapping a state-of-the-art deep learning (DL) inference FPGA overlay to different RAD instances. Our case study illustrates how RAD-Sim can capture a wide variety of reconfigurable architectures, from conventional FPGAs to devices augmented with hard NoCs, specialized matrix-vector blocks, and 3D-stacked multi-die devices. In addition, we show that our tool can help architects evaluate the effect of specific RAD architecture parameters on end-to-end workload performance. Through RAD-Sim, we also show that novel RADs can potentially achieve 2.6x better performance on average compared to conventional FPGAs in the key DL application domain.
Pages: 95067-95082
Number of pages: 16