Communication Lower Bound in Convolution Accelerators

Citations: 26
Authors
Chen, Xiaoming [1, 2]
Han, Yinhe [1, 2]
Wang, Yu [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Ctr Intelligent Comp Syst, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020) | 2020
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Convolutional neural network (CNN); CNN accelerator; communication lower bound;
DOI
10.1109/HPCA47549.2020.00050
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides comprehensive analysis and methodologies to minimize the communication for CNN accelerators. For the off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow to reach the lower bound. This fundamental problem has never been solved by prior studies. The on-chip communication is minimized based on an elaborate workload and storage mapping scheme. We in addition design a communication-optimal CNN accelerator architecture. Evaluations based on the 65nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and it is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.
Pages: 529-541
Page count: 13