Breaking High-Resolution CNN Bandwidth Barriers With Enhanced Depth-First Execution

Cited by: 41
Authors
Goetschalckx, Koen [1 ]
Verhelst, Marian [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, MICAS, B-3001 Leuven, Belgium
Funding
Research Foundation Flanders (FWO), Belgium
Keywords
Neural networks; memory management; high resolution imaging; neural network hardware; language; compiler
DOI
10.1109/JETCAS.2019.2905361
CLC classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline codes
0808; 0809
Abstract
Convolutional neural networks (CNNs) are now also reaching impressive performance on non-classification image processing tasks, such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly deployed on very high-resolution images. The resulting high-resolution feature maps, however, place unprecedented demands on the memory system of neural network processors: on-chip memories are too small to store them, while off-chip memories are very costly in terms of I/O bandwidth and power. This paper first shows that classical layer-by-layer inference is bounded in its external I/O bandwidth versus on-chip memory tradeoff space, making it infeasible to scale to very high resolutions at reasonable cost. Next, we demonstrate how an alternative depth-first network computation can reduce I/O bandwidth requirements by more than 200x for a fixed on-chip memory size or, alternatively, reduce on-chip memory requirements by more than 10000x for a fixed I/O bandwidth budget. We further introduce an enhanced depth-first method that exploits both line buffers and tiling to improve the external I/O bandwidth versus on-chip memory capacity tradeoff even further, and we quantify its improvements over the current state of the art.
Pages: 323-331
Page count: 9
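
The following minimal NumPy sketch illustrates the depth-first schedule described in the abstract; it is an assumed illustration, not the paper's implementation. It streams an image row by row through two stacked 3x3 convolutions, keeping only a three-row line buffer for the intermediate feature map instead of storing it in full, and checks the result against the classical layer-by-layer schedule. The two-layer pipeline, the 3x3 kernels, the 64x64 input, and all names (conv3x3, line_buf, etc.) are illustrative assumptions; the paper's enhanced method additionally combines such line buffers with tiling.

# Minimal sketch (assumed, not the authors' implementation): depth-first execution
# of two stacked 3x3 convolutions with a 3-row line buffer, checked against the
# classical layer-by-layer schedule.
import numpy as np

def conv3x3(rows, k):
    # 'valid' 3x3 convolution over a (3, W) row window -> one output row of width W-2
    w = rows.shape[1]
    return np.array([np.sum(rows[:, x:x + 3] * k) for x in range(w - 2)])

H, W = 64, 64
img = np.random.rand(H, W)
k1, k2 = np.random.rand(3, 3), np.random.rand(3, 3)

# Layer-by-layer: the full intermediate feature map fm1 (H-2 rows) is stored.
fm1 = np.stack([conv3x3(img[y:y + 3], k1) for y in range(H - 2)])
ref = np.stack([conv3x3(fm1[y:y + 3], k2) for y in range(fm1.shape[0] - 2)])

# Depth-first: stream input rows and keep only the last 3 rows of the layer-1
# output, so the intermediate feature map never exists in full.
line_buf, out_rows = [], []
for y in range(H - 2):
    line_buf.append(conv3x3(img[y:y + 3], k1))      # produce one layer-1 row
    if len(line_buf) > 3:
        line_buf.pop(0)                             # evict rows no longer needed
    if len(line_buf) == 3:
        out_rows.append(conv3x3(np.stack(line_buf), k2))  # one final output row
depth_first = np.stack(out_rows)

assert np.allclose(ref, depth_first)  # same result; fm1 buffer shrinks from H-2 rows to 3

Running the sketch confirms that the depth-first schedule produces a result numerically identical to the layer-by-layer one while buffering only three intermediate rows at any time.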