Communication Lower Bound in Convolution Accelerators

Cited by: 26
Authors
Chen, Xiaoming [1 ,2 ]
Han, Yinhe [1 ,2 ]
Wang, Yu [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Ctr Intelligent Comp Syst, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020) | 2020
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Convolutional neural network (CNN); CNN accelerator; communication lower bound;
DOI
10.1109/HPCA47549.2020.00050
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides a comprehensive analysis and methodologies to minimize communication in CNN accelerators. For off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow that reaches this lower bound; no prior study has solved this fundamental problem. On-chip communication is minimized by an elaborate workload and storage mapping scheme. In addition, we design a communication-optimal CNN accelerator architecture. Evaluations in a 65 nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.
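To make the off-chip communication question concrete, the following is a back-of-the-envelope traffic model: it compares the compulsory DRAM traffic of a convolutional layer (every input, weight, and output element crossing the memory boundary exactly once) against the traffic of a simple output-tiled dataflow, whose halo overlap re-fetches input pixels. This is an illustrative sketch only, not the paper's lower-bound derivation; the function name, the tiling scheme, and all parameters are assumptions for the example.

```python
def conv_traffic(H, W, C, K, R, S, tile_h, tile_w, stride=1):
    """Return (compulsory, tiled) off-chip traffic in element counts
    for a conv layer: H x W x C input, K filters of size R x S x C.
    Illustrative model only -- not the paper's derived lower bound."""
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    # Compulsory traffic: read each input and weight once, write each output once.
    compulsory = H * W * C + K * C * R * S + out_h * out_w * K
    # Output tiling: each tile fetches its input patch (with halo) plus
    # all weights; overlapping halos make total input traffic exceed H*W*C.
    n_tiles = (-(-out_h // tile_h)) * (-(-out_w // tile_w))  # ceil division
    in_patch = ((tile_h - 1) * stride + R) * ((tile_w - 1) * stride + S) * C
    tiled = n_tiles * (in_patch + K * C * R * S) + out_h * out_w * K
    return compulsory, tiled

# Example: a 56x56x64 layer with 64 3x3 filters, 14x14 output tiles.
comp, tiled = conv_traffic(56, 56, 64, 64, 3, 3, tile_h=14, tile_w=14)
print(f"compulsory: {comp}, tiled: {tiled}, ratio: {tiled / comp:.2f}")
```

Even this naive model shows tiled traffic well above the compulsory minimum; the paper's contribution is a tight lower bound for a finite on-chip buffer and a dataflow that attains it.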
Pages: 529-541
Page count: 13
Cited References
41 in total
[11]   EIE: Efficient Inference Engine on Compressed Deep Neural Network [J].
Han, Song ;
Liu, Xingyu ;
Mao, Huizi ;
Pu, Jing ;
Pedram, Ardavan ;
Horowitz, Mark A. ;
Dally, William J. .
2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, :243-254
[12]  
Jia-Wei H., 1981, P 13 ANN ACM S THEOR, P326, DOI 10.1145/800076.802486
[13]   Energy-Efficient Convolution Architecture Based on Rescheduled Dataflow [J].
Jo, Jihyuck ;
Kim, Suchang ;
Park, In-Cheol .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (12) :4196-4207
[14]  
Jouppi N. P., 2013, CACTI-IO TECHNICAL REPORT
[15]   ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex ;
Sutskever, Ilya ;
Hinton, Geoffrey E. .
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90
[16]   Fast Algorithms for Convolutional Neural Networks [J].
Lavin, Andrew ;
Gray, Scott .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :4013-4021
[17]  
Li JJ, 2018, DES AUT TEST EUROPE, P343, DOI 10.23919/DATE.2018.8342033
[18]   Data and Hardware Efficient Design for Convolutional Neural Network [J].
Lin, Yue-Jin ;
Chang, Tian Sheuan .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (05) :1642-1651
[19]   Addressing the Issue of Processing Element Under-Utilization in General-Purpose Systolic Deep Learning Accelerators [J].
Liu, Bosheng ;
Chen, Xiaoming ;
Wang, Ying ;
Han, Yinhe ;
Li, Jiajun ;
Xu, Haobo ;
Li, Xiaowei .
24TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC 2019), 2019, :733-738
[20]  
Liu Q., 2017, RED BLUE STANDARD PEBBLE GAMES