Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators

Cited by: 7
Authors
Jeong, Hyuk-Jin [1 ]
Yeo, JiHwan [1 ]
Bahk, Cheongyo [1 ]
Park, JongHyun [1 ]
Affiliations
[1] Samsung Research, Seoul, South Korea
Keywords
neural networks; accelerator; compiler
DOI
10.1145/3579990.3580017
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Growing interest in on-device AI has led to the proliferation of accelerators dedicated to neural network inference. Most ASIC accelerators are equipped with compiler-controlled scratchpad memory (SPM) used as a last-level cache to reduce the number of accesses to off-chip memory. A widely used strategy for utilizing SPM is fused-layer execution, which divides a DNN model into groups of layers and forwards the intermediate results within each group without evicting them to off-chip memory. However, layer fusion has an inherent limitation: fusing consecutive layers increases the amount of computation, leading to sub-optimal performance. This paper introduces a new dimension of SPM usage, which temporarily pins a feature map in SPM. Pinning reduces off-chip transfers without increasing computation, but it cannot be applied to every feature map because of the limited SPM size. We find that, for MobileNet, superior performance can be achieved by combining pinning and fusion. Based on this observation, we propose a model-level optimization method that jointly applies pinning and fusion to minimize inference latency under memory constraints. Scheduling and allocation schemes are presented for automatic generation of optimized code. Evaluation on a commercial AI accelerator shows that the proposed method reduces off-chip transfer of feature maps by 50% and improves inference latency by 15% on average without additional hardware, compared to the state-of-the-art fusion approach.
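The trade-off the abstract describes, pinning avoids recomputation but is bounded by SPM capacity, while fusion avoids capacity pressure at the cost of extra computation, can be illustrated with a toy cost model. The sketch below is purely illustrative and is not the paper's scheduling or allocation algorithm: the constants (SPM_KB, TRANSFER_COST_PER_KB), the FeatureMap fields, and the greedy per-feature-map policy are all hypothetical assumptions used only to show how the decision between pinning, fusing, and spilling might be weighed.

```python
"""Toy sketch (assumed model, not the paper's method): for a linear chain of
layers, decide per intermediate feature map whether to pin it in SPM, fuse its
producer and consumer, or spill it to off-chip memory."""
from dataclasses import dataclass


@dataclass
class FeatureMap:
    name: str
    size_kb: int        # SPM space needed to keep the feature map resident
    fuse_overhead: int  # hypothetical extra compute cost if producer/consumer are fused


SPM_KB = 512               # hypothetical scratchpad capacity
TRANSFER_COST_PER_KB = 4   # hypothetical cost of moving 1 KB to/from off-chip DRAM


def schedule(fmaps: list[FeatureMap]) -> list[tuple[str, str, int]]:
    """Greedy choice per feature map: pin if it fits in the remaining SPM,
    otherwise fuse if that is cheaper than the off-chip round trip,
    otherwise spill. Returns (name, decision, cost) tuples."""
    plan = []
    spm_free = SPM_KB
    for fm in fmaps:
        spill_cost = 2 * fm.size_kb * TRANSFER_COST_PER_KB  # write out + read back
        if fm.size_kb <= spm_free:
            plan.append((fm.name, "pin", 0))
            spm_free -= fm.size_kb  # simplification: space is never released here;
                                    # a real scheduler frees it once the consumer runs
        elif fm.fuse_overhead < spill_cost:
            plan.append((fm.name, "fuse", fm.fuse_overhead))
        else:
            plan.append((fm.name, "spill", spill_cost))
    return plan


if __name__ == "__main__":
    toy = [  # hypothetical feature maps with made-up sizes and fusion overheads
        FeatureMap("conv1_out", size_kb=300, fuse_overhead=200),
        FeatureMap("conv2_out", size_kb=400, fuse_overhead=150),
        FeatureMap("conv3_out", size_kb=100, fuse_overhead=500),
    ]
    for name, decision, cost in schedule(toy):
        print(f"{name:10s} -> {decision:5s} (cost {cost})")
```

In this toy run, conv1_out and conv3_out fit in the remaining SPM and are pinned at no extra cost, while conv2_out does not fit and is fused because its assumed recomputation overhead is cheaper than the off-chip round trip; the paper instead solves this jointly across the whole model under memory constraints.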
Pages: 224-235
Number of pages: 12