Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators

Cited by: 6
Authors
Kang, Duseok [1]
Kang, Donghyun [1]
Ha, Soonhoi [1]
Affiliation
[1] Seoul Natl Univ, Dept Comp Engn, Seoul 08826, South Korea
Keywords
System-on-chip; Random access memory; Convolution; Memory management; Delays; Frequency modulation; Prefetching; Convolutional neural network; multi-bank memory management; layer fusion; prefetching; data reuse; accelerator
DOI
10.1109/TC.2021.3076987
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Since off-chip DRAM access significantly affects both performance and power consumption, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we can hide the off-chip DRAM access delay by prefetching data into unused banks during computation. The questions of when and where to prefetch data, and how to reuse feature map data between layers, define the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques to solve the MOMM problem with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with a baseline approach that does not reuse feature maps between layers, the proposed techniques reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion, which aims to reuse feature maps between layers. Experimental results confirm that the proposed hybrid fusion technique outperforms both the per-layer processing technique and the pure fusion technique.
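The latency-hiding idea summarized in the abstract (prefetching into an unused bank while another bank is being computed on) can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the paper's MOMM compiler technique: it assumes exactly two banks used as double buffers, a single DRAM channel that serves loads one at a time, and per-tile load and compute cycle counts supplied by the caller. The function name `unhidden_delay` and its parameters are hypothetical.

```python
from typing import Sequence


def unhidden_delay(load: Sequence[int], compute: Sequence[int], prefetch: bool) -> int:
    """Return the total stall cycles caused by DRAM loads for a sequence of tiles.

    Simplified model (assumption, not the paper's algorithm):
    - tile i must be fully loaded into a bank before its computation starts;
    - prefetch=False: each load is serialized with computation (no overlap);
    - prefetch=True: two banks act as double buffers, so the load of tile i
      into the free bank overlaps the computation of tile i-1.
    """
    if len(load) != len(compute):
        raise ValueError("load and compute must have the same length")
    if not load:
        return 0
    if not prefetch:
        # Without prefetching, every DRAM access is fully exposed.
        return sum(load)
    # The first tile has no earlier computation to hide behind.
    stall = load[0]
    for i in range(1, len(load)):
        # The load of tile i runs while tile i-1 computes; only the part of
        # the load that outlasts that computation delays tile i.
        stall += max(0, load[i] - compute[i - 1])
    return stall
```

For example, with per-tile load times `[10, 10, 10]` and compute times `[8, 12, 5]` cycles, the model gives 30 stall cycles without prefetching and 12 with it: only the first load and the 2-cycle excess of the second load over the first tile's computation remain exposed. A real MOMM solution must additionally decide which bank each tile occupies and which feature maps stay resident for reuse, which this toy model ignores.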
Pages: 1181-1193
Number of pages: 13