Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators

Cited by: 6
Authors
Kang, Duseok [1]
Kang, Donghyun [1]
Ha, Soonhoi [1]
Affiliation
[1] Seoul Natl Univ, Dept Comp Engn, Seoul 08826, South Korea
Keywords
System-on-chip; Random access memory; Convolution; Memory management; Delays; Frequency modulation; Prefetching; Convolutional neural network; multi-bank memory management; layer fusion; prefetching; data reuse; accelerator;
DOI
10.1109/TC.2021.3076987
CLC classification number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Since off-chip DRAM access significantly affects both performance and power consumption, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we can hide off-chip DRAM access delay by prefetching data into unused banks during computation. Deciding when and where to prefetch data, and how to reuse feature map data between layers, defines the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques that solve the MOMM problem under two different objectives: minimizing the off-chip memory access volume, and minimizing the processing delay caused by DRAM accesses that cannot be hidden. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with a baseline approach that does not reuse feature maps between layers, we reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion, which also aims to reuse feature maps between layers. Experimental results confirm that the proposed hybrid fusion technique outperforms both the per-layer processing technique and the pure fusion technique.
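The two objectives named in the abstract can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the authors' MOMM compiler technique: the function names, the per-tile load/compute cycle counts, the two-bank ping-pong prefetch scheme, and the rule that an output feature map stays on chip whenever it fits in a fixed capacity are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy model, not the paper's algorithm: each tile must be
# loaded from DRAM into an on-chip bank before it can be computed.


def cycles_without_prefetch(loads: List[int], computes: List[int]) -> int:
    """Single buffer: every DRAM load stalls the compute engine."""
    return sum(loads) + sum(computes)


def cycles_with_prefetch(loads: List[int], computes: List[int]) -> int:
    """Two banks (assumed ping-pong): tile i+1 is loaded into the idle bank
    while tile i is computed, so only the part of a load that exceeds the
    overlapping compute time shows up as delay."""
    assert len(loads) == len(computes)
    if not loads:
        return 0
    t = loads[0]  # the first load cannot be hidden
    for i, c in enumerate(computes):
        next_load = loads[i + 1] if i + 1 < len(loads) else 0
        t += max(c, next_load)  # the slower of compute and prefetch sets the pace
    return t


Layer = Tuple[int, int, int]  # (input fmap, weights, output fmap) sizes


def dram_volume(layers: List[Layer], capacity: int, reuse: bool) -> int:
    """DRAM traffic for a chain of layers. With reuse, an output feature map
    that fits in `capacity` stays on chip, so it is neither written back
    nor re-read by the next layer. Weights are always fetched."""
    total = 0
    prev_on_chip = False
    for k, (ifm, weights, ofm) in enumerate(layers):
        total += weights
        if not prev_on_chip:
            total += ifm  # read input feature map from DRAM
        is_last = k == len(layers) - 1
        keep = reuse and not is_last and ofm <= capacity
        if not keep:
            total += ofm  # write output feature map to DRAM
        prev_on_chip = keep
    return total


if __name__ == "__main__":
    loads, computes = [4, 4, 4], [5, 3, 6]
    print(cycles_without_prefetch(loads, computes))  # 26
    print(cycles_with_prefetch(loads, computes))  # 19
    layers = [(100, 10, 50), (50, 20, 80), (80, 30, 40)]
    print(dram_volume(layers, capacity=60, reuse=False))  # 460
    print(dram_volume(layers, capacity=60, reuse=True))  # 360
```

With the sample numbers, prefetching shortens the toy schedule from 26 to 19 cycles; the 5 cycles beyond the 14 compute cycles are the unhideable first load plus one cycle where a load outlasts the computation it overlaps. Keeping the first layer's output on chip cuts DRAM traffic from 460 to 360 units. The two functions are independent here, whereas in the problem the abstract describes, prefetching targets unused banks, so banks holding a retained feature map are presumably unavailable for prefetching and the two objectives can pull against each other.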
Pages: 1181-1193
Number of pages: 13