Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators

Times cited: 6
Authors
Kang, Duseok [1]
Kang, Donghyun [1]
Ha, Soonhoi [1]
Affiliation
[1] Seoul Natl Univ, Dept Comp Engn, Seoul 08826, South Korea
Keywords
System-on-chip; Random access memory; Convolution; Memory management; Delays; Frequency modulation; Prefetching; Convolutional neural network; multi-bank memory management; layer fusion; prefetching; data reuse; accelerator;
DOI
10.1109/TC.2021.3076987
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Since off-chip DRAM access significantly affects both performance and power consumption, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we can hide off-chip DRAM access delay by prefetching data into unused banks during computation. Deciding when and where to prefetch data, and how to reuse feature map data between layers, defines the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques that solve the MOMM problem for two different objectives: minimizing the off-chip memory access volume, and minimizing the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with a baseline approach that does not reuse feature maps between layers, we reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. We further extend the proposed techniques to layer fusion, which aims to reuse feature maps between layers. Experimental results confirm that the proposed hybrid fusion technique outperforms both the per-layer processing technique and the pure fusion technique.
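The two cost measures named in the abstract, DRAM access volume and delay from unhidden DRAM accesses, can be illustrated with a toy model. This is a hypothetical sketch, not the authors' compiler technique. The `Layer` fields, the fixed `bytes_per_cycle` bandwidth, prefetching of each layer's loads into an idle bank during the previous layer's compute, and write-backs draining in the background are all simplifying assumptions introduced here. The model also ignores bank capacity, so it cannot reproduce the volume/delay trade-off reported in the paper.

```python
"""Toy model of multi-bank prefetching and inter-layer feature-map reuse.

Hypothetical simplification (not the paper's method): each layer's DRAM
loads are prefetched into an idle bank while the previous layer computes;
any load time beyond that compute window is counted as an unhidden stall.
Write-backs are assumed to drain in the background and never stall.
Bank capacity is ignored.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    weight_bytes: int    # weights that must be loaded from DRAM
    ifmap_bytes: int     # input feature map size in bytes
    ofmap_bytes: int     # output feature map size in bytes
    compute_cycles: int  # compute time once operands are on chip


def simulate(layers: List[Layer], bytes_per_cycle: int,
             reuse_fmap: bool) -> Tuple[int, int]:
    """Return (dram_bytes, stall_cycles) for a layer-by-layer schedule."""
    dram_bytes = 0
    stall_cycles = 0
    hide_window = 0  # compute cycles of the previous layer available for overlap
    last = len(layers) - 1
    for i, layer in enumerate(layers):
        load = layer.weight_bytes
        # Without reuse, every layer reloads its input feature map from DRAM;
        # with reuse, only the first layer's input comes from DRAM.
        if i == 0 or not reuse_fmap:
            load += layer.ifmap_bytes
        # Without reuse, every output is written back; with reuse, only the
        # final layer's output leaves the chip.
        store = layer.ofmap_bytes if (i == last or not reuse_fmap) else 0
        dram_bytes += load + store
        load_cycles = -(-load // bytes_per_cycle)  # ceiling division
        stall_cycles += max(0, load_cycles - hide_window)
        hide_window = layer.compute_cycles
    return dram_bytes, stall_cycles


if __name__ == "__main__":
    # Hypothetical three-layer network with made-up sizes.
    net = [
        Layer("conv1", 1000, 4000, 4000, 500),
        Layer("conv2", 2000, 4000, 2000, 300),
        Layer("conv3", 1000, 2000, 1000, 200),
    ]
    print(simulate(net, bytes_per_cycle=10, reuse_fmap=False))  # (21000, 600)
    print(simulate(net, bytes_per_cycle=10, reuse_fmap=True))   # (9000, 500)
```

In this made-up example, keeping each output feature map on chip cuts DRAM traffic from 21000 to 9000 bytes and stalls from 600 to 500 cycles. The remaining 500 cycles come from the first layer, whose loads have no earlier computation to overlap with.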
Pages: 1181-1193
Page count: 13