EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

Cited by: 27
Authors
Cavigelli, Lukas [1 ]
Rutishauser, Georg [1 ]
Benini, Luca [1 ,2 ]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab, CH-8092 Zurich, Switzerland
[2] Univ Bologna, Dept Elect Elect & Informat Engn, I-40136 Bologna, Italy
Funding
EU Horizon 2020
Keywords
Hardware acceleration; Training; Algorithm design and analysis; Convolutional neural networks; Compression; Deep learning
DOI
10.1109/JETCAS.2019.2950093
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems at low cost and under tight power and energy constraints, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, their power consumption is dominated by I/O transfers to off-chip memory, and on-chip memories occupy a large share of the silicon area. We introduce and evaluate a novel, hardware-friendly, and lossless compression scheme for the feature maps of convolutional neural networks. We present hardware architectures and synthesis results for the compressor and decompressor in 65 nm. With a throughput of one 8-bit word/cycle at 600 MHz, they fit into 2.8 kGE and 3.0 kGE of silicon area, respectively, together less than the area of seven 8-bit multiply-add units at the same throughput. We show that an average compression ratio of $5.1\times$ for AlexNet, $4\times$ for VGG-16, $2.4\times$ for ResNet-34, and $2.2\times$ for MobileNetV2 can be achieved, a gain of 45-70% over existing methods. Our approach also works effectively for various number formats, has a low frame-to-frame variance in compression ratio, and achieves even better compression factors on gradient maps during training than on feature maps during inference.
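The abstract names bit-plane compression but does not spell out the mechanics. The following Python sketch illustrates only the generic bit-plane idea that EBPC builds on, not the paper's actual encoder: the block size BLOCK, word width BITS, and the functions compress_block/decompress_block are illustrative choices of ours, and all-zero planes are flagged with a single bit here, a simplification of the run-length and delta encoding used in real BPC-style schemes.

# Minimal sketch of the bit-plane compression idea (hypothetical
# simplification, NOT the paper's exact EBPC encoder): slice a block of
# fixed-width words into bit-planes and spend one flag bit on each
# all-zero plane instead of transmitting it.

BLOCK = 8          # words per block (illustrative choice)
BITS = 8           # word width (illustrative choice)

def compress_block(words):
    """Return a list of bits encoding one block of BITS-bit words."""
    assert len(words) == BLOCK and all(0 <= w < 2**BITS for w in words)
    bits = []
    for plane in range(BITS - 1, -1, -1):       # MSB plane first
        plane_bits = [(w >> plane) & 1 for w in words]
        if any(plane_bits):
            bits.append(1)                      # flag: raw plane follows
            bits.extend(plane_bits)
        else:
            bits.append(0)                      # flag: all-zero plane
    return bits

def decompress_block(bits):
    """Inverse of compress_block; consumes exactly one block's bits."""
    words = [0] * BLOCK
    i = 0
    for plane in range(BITS - 1, -1, -1):
        flag, i = bits[i], i + 1
        if flag:
            for j in range(BLOCK):
                words[j] |= bits[i + j] << plane
            i += BLOCK
    return words

if __name__ == "__main__":
    # ReLU feature maps are sparse and small in magnitude, so most
    # high-order bit-planes are all-zero and cost only one flag bit each.
    block = [0, 3, 0, 1, 2, 0, 0, 5]
    enc = compress_block(block)
    assert decompress_block(enc) == block
    print(f"{BLOCK * BITS} bits in, {len(enc)} bits out "
          f"({BLOCK * BITS / len(enc):.1f}x)")

In this toy run the five empty high-order planes cost one bit each, so 64 input bits shrink to 32, a 2x ratio; the paper's higher ratios come from exploiting sparsity and value correlation more aggressively than this sketch does.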
Pages: 723-734
Page count: 12