Stream: A Modeling Framework for Fine-grained Layer Fusion on Multi-core DNN Accelerators

Cited by: 2
Authors
Symons, Arne [1 ]
Mei, Linyan [1 ]
Colleman, Steven [1 ]
Houshmand, Pouya [1 ]
Karl, Sebastian [1 ,2 ]
Verhelst, Marian [1 ]
Affiliations
[1] Katholieke Univ Leuven, Leuven, Belgium
[2] Tech Univ Munich, Munich, Germany
Source
2023 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2023
Keywords
DNN; multi-core; accelerator; layer fusion; design space exploration
DOI
10.1109/ISPASS57527.2023.00051
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
To keep up with the ever-growing performance demands of DNN processing, specialized hardware (HW) accelerators are shifting towards multi-core architectures. Stream is the first open-source design space exploration (DSE) framework for the co-optimization of HW architecture and fine-grained scheduling of such multi-core DNN accelerators. Stream supports fine-grained layer fusion to optimally trade off energy, latency, and/or on-chip memory footprint for constrained edge devices. Validation against three state-of-the-art (SotA) chips, together with a case study on seven HW architectures with different scheduling granularities, demonstrates the reliability and capabilities of Stream. The results show that high-level architectural decisions greatly impact HW efficiency under the fine-grained scheduling paradigm: compared to traditional scheduling at layer granularity, the energy-delay product is reduced by 2.4x for single-core architectures and by up to 30x for heterogeneous multi-core architectures. Stream is open-source at github.com/ZigZag-Project/stream.
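Illustrative example
As a minimal sketch of the scheduling trade-off the abstract describes (this is not Stream's actual API; all function names, tile counts, and per-tile latencies below are illustrative assumptions), the following self-contained Python toy contrasts classic layer-by-layer scheduling with fine-grained, tile-level layer fusion for a two-layer pipeline mapped onto two cores:

# Toy model (NOT Stream's API): two layers on two cores, each layer split
# into tiles. Compares when layer 2 may start and how many intermediate
# tiles must stay on chip under the two scheduling paradigms.

def layer_by_layer(tiles, t1, t2):
    """Layer 2 only starts after every tile of layer 1 has finished."""
    latency = tiles * t1 + tiles * t2
    peak_tiles_buffered = tiles  # the full intermediate feature map is kept on chip
    return latency, peak_tiles_buffered

def fine_grained_fusion(tiles, t1, t2):
    """Each layer-2 tile starts as soon as its producing layer-1 tile is done."""
    latency = t1 + (tiles - 1) * max(t1, t2) + t2  # two-stage software pipeline
    peak_tiles_buffered = 2  # one tile being produced, one being consumed
    return latency, peak_tiles_buffered

if __name__ == "__main__":
    TILES, T1, T2 = 8, 1.0, 1.2  # hypothetical workload parameters
    for name, schedule in (("layer-by-layer     ", layer_by_layer),
                           ("fine-grained fusion", fine_grained_fusion)):
        lat, buf = schedule(TILES, T1, T2)
        print(f"{name}: latency = {lat:5.1f}, peak intermediate tiles = {buf}")

With these illustrative numbers, the fused schedule cuts latency from 17.6 to 10.6 time units and shrinks the peak intermediate footprint from 8 tiles to 2, mirroring the latency/on-chip-memory trade-off that Stream's DSE explores at much finer granularity and across heterogeneous cores.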
Pages
355-357 (3 pages)