Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators

Cited by: 6

Authors:

Kang, Duseok [1]

Kang, Donghyun [1]

Ha, Soonhoi [1]

Affiliation:

[1] Seoul Natl Univ, Dept Comp Engn, Seoul 08826, South Korea

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2022 / Vol. 71 / No. 05

Keywords:

System-on-chip; Random access memory; Convolution; Memory management; Delays; Frequency modulation; Prefetching; Convolutional neural network; multi-bank memory management; layer fusion; prefetching; data reuse; accelerator;

DOI:

10.1109/TC.2021.3076987

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Since off-chip DRAM access affects both performance and power consumption significantly, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we may hide off-chip DRAM access delay by prefetching data to unused banks during computation. When and where to prefetch data and how to reuse the feature map data between layers define the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques to solve the MOMM problem with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with the baseline approach that does not reuse the feature map between layers, we reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion, which aims to reuse feature maps between layers. Experimental results confirm the superiority of the proposed hybrid fusion technique over the per-layer processing technique and the pure fusion technique.

Pages: 1181-1193

Page count: 13

