Breaking High-Resolution CNN Bandwidth Barriers With Enhanced Depth-First Execution

Cited by: 41
Authors
Goetschalckx, Koen [1 ]
Verhelst, Marian [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, MICAS, B-3001 Leuven, Belgium
Funding
Research Foundation Flanders (FWO), Belgium
Keywords
Neural networks; memory management; high resolution imaging; neural network hardware; language; compiler
DOI
10.1109/JETCAS.2019.2905361
CLC classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline codes
0808; 0809
Abstract
Convolutional neural networks (CNNs) are now also reaching impressive performance on non-classification image processing tasks, such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly deployed on very high-resolution images. The resulting high-resolution feature maps, however, place unprecedented demands on the memory system of neural network processors: on-chip memories are too small to store them, while off-chip memories are very costly in terms of I/O bandwidth and power. This paper first shows that classical layer-by-layer inference is bounded in its external I/O bandwidth versus on-chip memory tradeoff space, making it infeasible to scale to very high resolutions at reasonable cost. Next, we demonstrate how an alternative depth-first network computation can reduce I/O bandwidth requirements by more than 200x for a fixed on-chip memory size or, alternatively, reduce on-chip memory requirements by more than 10000x for a fixed I/O bandwidth budget. We further introduce an enhanced depth-first method that exploits both line buffers and tiling to improve the external I/O bandwidth versus on-chip memory capacity tradeoff even further, and we quantify its improvements over the current state of the art.
Pages: 323-331
Page count: 9
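
The following minimal NumPy sketch illustrates the depth-first schedule described in the abstract; it is an assumed illustration, not the paper's implementation. It streams an image row by row through two stacked 3x3 convolutions, keeping only a three-row line buffer for the intermediate feature map instead of storing it in full, and checks the result against the classical layer-by-layer schedule. The two-layer pipeline, the 3x3 kernels, the 64x64 input, and all names (conv3x3, line_buf, etc.) are illustrative assumptions; the paper's enhanced method additionally combines such line buffers with tiling.

# Minimal sketch (assumed, not the authors' implementation): depth-first execution
# of two stacked 3x3 convolutions with a 3-row line buffer, checked against the
# classical layer-by-layer schedule.
import numpy as np

def conv3x3(rows, k):
    # 'valid' 3x3 convolution over a (3, W) row window -> one output row of width W-2
    w = rows.shape[1]
    return np.array([np.sum(rows[:, x:x + 3] * k) for x in range(w - 2)])

H, W = 64, 64
img = np.random.rand(H, W)
k1, k2 = np.random.rand(3, 3), np.random.rand(3, 3)

# Layer-by-layer: the full intermediate feature map fm1 (H-2 rows) is stored.
fm1 = np.stack([conv3x3(img[y:y + 3], k1) for y in range(H - 2)])
ref = np.stack([conv3x3(fm1[y:y + 3], k2) for y in range(fm1.shape[0] - 2)])

# Depth-first: stream input rows and keep only the last 3 rows of the layer-1
# output, so the intermediate feature map never exists in full.
line_buf, out_rows = [], []
for y in range(H - 2):
    line_buf.append(conv3x3(img[y:y + 3], k1))      # produce one layer-1 row
    if len(line_buf) > 3:
        line_buf.pop(0)                             # evict rows no longer needed
    if len(line_buf) == 3:
        out_rows.append(conv3x3(np.stack(line_buf), k2))  # one final output row
depth_first = np.stack(out_rows)

assert np.allclose(ref, depth_first)  # same result; fm1 buffer shrinks from H-2 rows to 3

Running the sketch confirms that the depth-first schedule produces a result numerically identical to the layer-by-layer one while buffering only three intermediate rows at any time.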