Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective

Cited by: 9
Authors
Zeng, Shulin [1 ]
Dai, Guohao [2 ]
Zhang, Niansong [1 ]
Yang, Xinhao [1 ]
Zhang, Haoyu [1 ]
Zhu, Zhenhua [1 ]
Yang, Huazhong [1 ]
Wang, Yu [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100190, Peoples R China
[2] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Field programmable gate arrays; Dynamic scheduling; Optimization; Hardware; Bandwidth; Parallel processing; Multi-tenancy; deep neural network; multi-core; accelerator; FPGA;
DOI
10.1109/TC.2022.3214113
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Deep Neural Network (DNN) INFerence-as-a-Service (INFaaS) is the dominant workload in today's data centers, and FPGAs are promising hardware platforms for it thanks to their high flexibility and energy efficiency. The dynamic, multi-tenant nature of INFaaS requires careful design in three aspects: multi-tenant architecture, multi-DNN scheduling, and multi-core mapping. These three factors are critical to system latency and energy efficiency, yet they are challenging to optimize because they are tightly coupled and correlated. This paper proposes H3M, an automatic Design Space Exploration (DSE) framework that jointly optimizes the architecture, scheduling, and mapping for serving INFaaS on cloud FPGAs. H3M explores: (1) the architecture design space with Heterogeneous spatial Multi-tenant sub-accelerators, (2) layer-wise scheduling for Heterogeneous Multi-DNN workloads, and (3) single-layer mapping to the Homogeneous Multi-core architecture. H3M outperforms the state-of-the-art multi-tenant DNN accelerators Planaria and Herald by up to 7.5x and 3.6x, respectively, in Energy-Delay-Product (EDP) reduction on the ASIC platform. On the Xilinx U200 and U280 FPGA platforms, H3M offers 2.1-5.7x and 1.8-9.0x EDP reduction over Herald.
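To make the joint optimization described in the abstract concrete, the Python sketch below is a toy, hypothetical illustration (not H3M's actual algorithm or cost model) of exhaustively searching a coupled (architecture, schedule, mapping) space and ranking design points by Energy-Delay Product; every candidate name and number in it is a made-up placeholder.

# Toy joint design-space exploration over the three coupled decisions the
# abstract names: sub-accelerator architecture, layer-wise multi-DNN schedule,
# and per-layer multi-core mapping. All candidates and costs are hypothetical.
from itertools import product

ARCHITECTURES = ["2x-large-subacc", "4x-small-subacc"]      # placeholder sub-accelerator splits
SCHEDULES     = ["fcfs-layerwise", "priority-layerwise"]     # placeholder layer-wise schedules
MAPPINGS      = ["weight-stationary", "output-stationary"]   # placeholder single-layer mappings

def evaluate(arch, sched, mapping):
    # Toy cost model returning (latency_ms, energy_mJ) for one design point;
    # a real framework would use analytical or simulated estimates instead.
    latency = 10.0 + 2.0 * ARCHITECTURES.index(arch) - 1.0 * SCHEDULES.index(sched)
    energy = 50.0 - 5.0 * MAPPINGS.index(mapping) + 3.0 * ARCHITECTURES.index(arch)
    return latency, energy

best = None
for arch, sched, mapping in product(ARCHITECTURES, SCHEDULES, MAPPINGS):
    latency, energy = evaluate(arch, sched, mapping)
    edp = latency * energy  # Energy-Delay Product, the metric reported in the abstract
    if best is None or edp < best[0]:
        best = (edp, arch, sched, mapping)

print(f"best EDP = {best[0]:.1f} (arch={best[1]}, schedule={best[2]}, mapping={best[3]})")

A real framework would replace the toy cost model with latency and energy estimates for the FPGA sub-accelerators under the given multi-DNN workload, and would prune the search rather than enumerating every combination.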
Pages: 1314 - 1328
Page count: 15
Related Papers
20 in total
  • [1] Heterogeneous Dataflow Accelerators for Multi-DNN Workloads
    Kwon, Hyoukjun
    Lai, Liangzhen
    Pellauer, Michael
    Krishna, Tushar
    Chen, Yu-Hsin
    Chandra, Vikas
    2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 71 - 83
  • [2] Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
    Balaskas, Konstantinos
    Khdr, Heba
    Sikal, Mohammed Bakr
    Kress, Fabian
    Siozios, Kostas
    Becker, Jurgen
    Henkel, Jorg
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 317 - 320
  • [3] Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators
    Das, Abhijit
    Russo, Enrico
    Palesi, Maurizio
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (08) : 1883 - 1898
  • [4] Temperature-Aware Sizing of Multi-Chip Module Accelerators for Multi-DNN Workloads
    Shukla, Prachi
    Aguren, Derrick
    Burd, Tom
    Coskun, Ayse K.
    Kalamatianos, John
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [5] CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads
    Panopoulos, Ioannis
    Venieris, Stylianos
    Venieris, Iakovos
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (04)
  • [6] Polyform: A Versatile Architecture for Multi-DNN Execution via Spatial and Temporal Acceleration
    Yin, Lingxiang
    Ghazizadeh, Amir
    Tian, Shilin
    Louri, Ahmed
    Zheng, Hao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 166 - 169
  • [7] A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration
    Li, Yuan
    Louri, Ahmed
    Karanth, Avinash
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (01) : 46 - 58
  • [8] Versa-DNN: A Versatile Architecture Enabling High-Performance and Energy-Efficient Multi-DNN Acceleration
    Yang, Jiaqi
    Zheng, Hao
    Louri, Ahmed
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (02) : 349 - 361
  • [9] Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference
    Xiang, Yecheng
    Kim, Hyoseung
    2019 IEEE 40TH REAL-TIME SYSTEMS SYMPOSIUM (RTSS 2019), 2019, : 392 - 405
  • [10] gCFS: Completely Fair Scheduling on Multiple GPUs for Improved Multi-DNN Execution in Terms of Performance Isolation
    Cho, Hojin
    Kim, Myungsun
    THE JOURNAL OF SUPERCOMPUTING, 2023, 79 : 5851 - 5877