Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective

Cited by: 9
Authors
Zeng, Shulin [1 ]
Dai, Guohao [2 ]
Zhang, Niansong [1 ]
Yang, Xinhao [1 ]
Zhang, Haoyu [1 ]
Zhu, Zhenhua [1 ]
Yang, Huazhong [1 ]
Wang, Yu [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100190, Peoples R China
[2] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Field programmable gate arrays; Dynamic scheduling; Optimization; Hardware; Bandwidth; Parallel processing; Multi-tenancy; deep neural network; multi-core; accelerator; FPGA;
DOI
10.1109/TC.2022.3214113
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Deep Neural Network (DNN) INFerence-as-a-Service (INFaaS) is the dominant workload in today's data centers, and FPGAs are promising hardware platforms for it thanks to their high flexibility and energy efficiency. The dynamic, multi-tenant nature of INFaaS requires careful design in three aspects: multi-tenant architecture, multi-DNN scheduling, and multi-core mapping. These three factors are critical to system latency and energy efficiency, yet they are challenging to optimize because they are tightly coupled and correlated. This paper proposes H3M, an automatic Design Space Exploration (DSE) framework that jointly optimizes the architecture, scheduling, and mapping for serving INFaaS on cloud FPGAs. H3M explores: (1) the architecture design space with Heterogeneous spatial Multi-tenant sub-accelerators, (2) layer-wise scheduling for Heterogeneous Multi-DNN workloads, and (3) single-layer mapping to the Homogeneous Multi-core architecture. H3M outperforms the state-of-the-art multi-tenant DNN accelerators Planaria and Herald by up to 7.5x and 3.6x, respectively, in Energy-Delay-Product (EDP) reduction on the ASIC platform. On the Xilinx U200 and U280 FPGA platforms, H3M offers 2.1-5.7x and 1.8-9.0x EDP reduction over Herald.
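To make the joint optimization described in the abstract concrete, the Python sketch below is a toy, hypothetical illustration (not H3M's actual algorithm or cost model) of exhaustively searching a coupled (architecture, schedule, mapping) space and ranking design points by Energy-Delay Product; every candidate name and number in it is a made-up placeholder.

# Toy joint design-space exploration over the three coupled decisions the
# abstract names: sub-accelerator architecture, layer-wise multi-DNN schedule,
# and per-layer multi-core mapping. All candidates and costs are hypothetical.
from itertools import product

ARCHITECTURES = ["2x-large-subacc", "4x-small-subacc"]      # placeholder sub-accelerator splits
SCHEDULES     = ["fcfs-layerwise", "priority-layerwise"]     # placeholder layer-wise schedules
MAPPINGS      = ["weight-stationary", "output-stationary"]   # placeholder single-layer mappings

def evaluate(arch, sched, mapping):
    # Toy cost model returning (latency_ms, energy_mJ) for one design point;
    # a real framework would use analytical or simulated estimates instead.
    latency = 10.0 + 2.0 * ARCHITECTURES.index(arch) - 1.0 * SCHEDULES.index(sched)
    energy = 50.0 - 5.0 * MAPPINGS.index(mapping) + 3.0 * ARCHITECTURES.index(arch)
    return latency, energy

best = None
for arch, sched, mapping in product(ARCHITECTURES, SCHEDULES, MAPPINGS):
    latency, energy = evaluate(arch, sched, mapping)
    edp = latency * energy  # Energy-Delay Product, the metric reported in the abstract
    if best is None or edp < best[0]:
        best = (edp, arch, sched, mapping)

print(f"best EDP = {best[0]:.1f} (arch={best[1]}, schedule={best[2]}, mapping={best[3]})")

A real framework would replace the toy cost model with latency and energy estimates for the FPGA sub-accelerators under the given multi-DNN workload, and would prune the search rather than enumerating every combination.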
Pages: 1314 - 1328
Page count: 15
Related Papers
20 in total
  • [1] Heterogeneous Dataflow Accelerators for Multi-DNN Workloads
    Kwon, Hyoukjun
    Lai, Liangzhen
    Pellauer, Michael
    Krishna, Tushar
    Chen, Yu-Hsin
    Chandra, Vikas
    2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 71 - 83
  • [2] Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
    Balaskas, Konstantinos
    Khdr, Heba
    Sikal, Mohammed Bakr
    Kress, Fabian
    Siozios, Kostas
    Becker, Jurgen
    Henkel, Jorg
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 317 - 320
  • [3] Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators
    Das, Abhijit
    Russo, Enrico
    Palesi, Maurizio
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (08) : 1883 - 1898
  • [4] Temperature-Aware Sizing of Multi-Chip Module Accelerators for Multi-DNN Workloads
    Shukla, Prachi
    Aguren, Derrick
    Burd, Tom
    Coskun, Ayse K.
    Kalamatianos, John
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [5] CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads
    Panopoulos, Ioannis
    Venieris, Stylianos
    Venieris, Iakovos
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (04)
  • [6] Polyform: A Versatile Architecture for Multi-DNN Execution via Spatial and Temporal Acceleration
    Yin, Lingxiang
    Ghazizadeh, Amir
    Tian, Shilin
    Louri, Ahmed
    Zheng, Hao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 166 - 169
  • [7] A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration
    Li, Yuan
    Louri, Ahmed
    Karanth, Avinash
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (01) : 46 - 58
  • [8] Versa-DNN: A Versatile Architecture Enabling High-Performance and Energy-Efficient Multi-DNN Acceleration
    Yang, Jiaqi
    Zheng, Hao
    Louri, Ahmed
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (02) : 349 - 361
  • [9] Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference
    Xiang, Yecheng
    Kim, Hyoseung
    2019 IEEE 40TH REAL-TIME SYSTEMS SYMPOSIUM (RTSS 2019), 2019, : 392 - 405
  • [10] gCFS: Completely Fair Scheduling on Multiple GPUs for Improved Multi-DNN Execution in Terms of Performance Isolation
    Cho, Hojin
    Kim, Myungsun
    THE JOURNAL OF SUPERCOMPUTING, 2023, 79 : 5851 - 5877