Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators

Cited by: 6
Authors
Kang, Duseok [1]
Kang, Donghyun [1]
Ha, Soonhoi [1]
Affiliation
[1] Seoul Natl Univ, Dept Comp Engn, Seoul 08826, South Korea
Keywords
System-on-chip; Random access memory; Convolution; Memory management; Delays; Frequency modulation; Prefetching; Convolutional neural network; multi-bank memory management; layer fusion; prefetching; data reuse; accelerator;
DOI
10.1109/TC.2021.3076987
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Since off-chip DRAM access significantly affects both performance and power consumption, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we may hide off-chip DRAM access delay by prefetching data into unused banks during computation. When and where to prefetch data, and how to reuse feature map data between layers, define the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques to solve the MOMM problem with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with the baseline approach that does not reuse feature maps between layers, we reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion, which aims to reuse feature maps between layers. Experimental results confirm the superiority of the proposed hybrid fusion technique over both the per-layer processing technique and the pure fusion technique.
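The core mechanism in the abstract, hiding DRAM latency by loading the next piece of data into a free bank while the current one is being computed on, can be illustrated with a small timing model. The sketch below is a hypothetical toy, not the authors' MOMM compiler technique: it assumes each job's data occupies exactly one bank, banks are assigned round-robin, a single DRAM channel serves loads in order, and a single compute engine processes jobs in order. The function name `simulate_prefetch` and its parameters are illustrative only. It reports the total cycles and the cycles the compute engine spends waiting on unhidden DRAM loads.

```python
from typing import List, Tuple


def simulate_prefetch(jobs: List[Tuple[int, int]], num_banks: int) -> Tuple[int, int]:
    """Return (total_cycles, stall_cycles) for jobs executed in order.

    Hypothetical toy model: jobs[i] = (load_cycles, compute_cycles).
    Job i's data lives in bank i % num_banks from the start of its DRAM
    load until its compute finishes, so its load can begin only after the
    previous user of that bank (job i - num_banks) has finished computing.
    """
    if num_banks < 1:
        raise ValueError("need at least one bank")
    compute_end: List[int] = []
    dram_free = 0  # time at which the single DRAM channel becomes idle
    prev_end = 0   # time at which the compute engine finished the previous job
    stall = 0      # cycles the compute engine sits idle waiting for data
    for i, (load, compute) in enumerate(jobs):
        # The bank for job i is free once job i - num_banks has finished computing.
        bank_free = compute_end[i - num_banks] if i >= num_banks else 0
        load_end = max(dram_free, bank_free) + load
        dram_free = load_end
        # Compute starts when the data has arrived and the engine is idle.
        start = max(load_end, prev_end)
        stall += start - prev_end
        prev_end = start + compute
        compute_end.append(prev_end)
    return prev_end, stall


if __name__ == "__main__":
    jobs = [(4, 5)] * 4  # each job: 4 cycles of DRAM load, 5 cycles of compute
    for banks in (1, 2):
        total, stall = simulate_prefetch(jobs, banks)
        print(f"{banks} bank(s): total={total}, stall={stall}")
```

With one bank, every load waits for the previous job to release the bank, so all four loads are exposed (36 total cycles, 16 stall cycles). With two banks, each load overlaps the previous job's compute, and only the first load remains exposed (24 total cycles, 4 stall cycles). The toy holds DRAM volume fixed and does not model inter-layer feature-map reuse or layer fusion, so it says nothing about the volume-versus-delay trade-off the abstract reports.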
Pages: 1181-1193
Number of pages: 13