Architecture and Application Co-Design for Beyond-FPGA Reconfigurable Acceleration Devices

Cited by: 5
Authors
Boutros, Andrew [1, 2]
Nurvitadhi, Eriko [2]
Betz, Vaughn [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
[2] Intel Corp, Programmable Solut Grp, Santa Clara, CA 95054 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Deep learning; field-programmable gate arrays; hardware acceleration; network-on-chip; reconfigurable computing; EMBEDDED NETWORKS; CHIP;
DOI
10.1109/ACCESS.2022.3204664
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, field-programmable gate arrays (FPGAs) have been increasingly deployed in datacenters as programmable accelerators that can offer software-like flexibility and custom-hardware-like efficiency for key datacenter workloads. To improve the efficiency of FPGAs for these new datacenter use cases and data-intensive applications, a new class of reconfigurable acceleration devices (RADs) is emerging. In these devices, the FPGA fine-grained reconfigurable fabric is a component of a bigger monolithic or multi-die system-in-package that can incorporate general-purpose software-programmable cores, domain-specialized accelerator blocks, and high-performance networks-on-chip (NoCs) for efficient communication between these system components. The integration of all these components in a RAD results in a huge design space and requires re-thinking the implementation of applications that need to be migrated from conventional FPGAs to these novel devices. In this work, we introduce RAD-Sim, an architecture simulator that allows rapid design space exploration for RADs and facilitates the study of complex interactions between their various components. We also present a case study that highlights the utility of RAD-Sim in re-designing applications for these novel RADs by mapping a state-of-the-art deep learning (DL) inference FPGA overlay to different RAD instances. Our case study illustrates how RAD-Sim can capture a wide variety of reconfigurable architectures, from conventional FPGAs to devices augmented with hard NoCs, specialized matrix-vector blocks, and 3D-stacked multi-die devices. In addition, we show that our tool can help architects evaluate the effect of specific RAD architecture parameters on end-to-end workload performance. Through RAD-Sim, we also show that novel RADs can potentially achieve 2.6x better performance on average compared to conventional FPGAs in the key DL application domain.
Pages: 95067-95082
Number of pages: 16