Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Cited by: 257
Authors
Eckert, Charles [1]
Wang, Xiaowei [1]
Wang, Jingcheng [1]
Subramaniyan, Arun [1]
Iyer, Ravi [2]
Sylvester, Dennis [1]
Blaauw, David [1]
Das, Reetuparna [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Intel Corp, Santa Clara, CA 95051 USA
Source
2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) | 2018
Keywords
Cache; In-memory architecture; Convolution Neural Network; Bit-serial architecture; ENERGY-EFFICIENT; MEMORY;
DOI
10.1109/ISCA.2018.00040
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents the Neural Cache architecture, which repurposes cache structures into massively parallel compute units capable of running inference for deep neural networks. It proposes techniques for performing in-situ arithmetic in SRAM arrays, creating efficient data mappings, and reducing data movement. The Neural Cache architecture can fully execute convolutional, fully connected, and pooling layers in-cache, and it also supports in-cache quantization. Our experimental results show that, for the Inception v3 model, the proposed architecture improves inference latency by 18.3x over a state-of-the-art multi-core CPU (Xeon E5) and by 7.7x over a server-class GPU (Titan Xp). Neural Cache improves inference throughput by 12.4x over the CPU (2.2x over the GPU), while reducing power consumption by 50% relative to the CPU (53% relative to the GPU).
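As a rough illustration of the bit-serial style of arithmetic named in the title and keywords, the sketch below adds many operands in parallel, one bit position per step. It is a software model only and is not taken from the paper: the function names (`to_bit_planes`, `from_bit_planes`, `bit_serial_add`), the list-of-bit-planes layout, and the per-lane carry list are assumptions made for this example.

```python
def to_bit_planes(values, width):
    """Transpose integers into bit planes: planes[i][j] is bit i of values[j]."""
    return [[(v >> i) & 1 for v in values] for i in range(width)]


def from_bit_planes(planes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    lanes = len(planes[0])
    return [sum(planes[i][j] << i for i in range(len(planes)))
            for j in range(lanes)]


def bit_serial_add(a_planes, b_planes):
    """Add all lanes at once, processing one bit position per step (LSB first)."""
    lanes = len(a_planes[0])
    carry = [0] * lanes  # hypothetical one-bit carry state per lane
    out = []
    for a_row, b_row in zip(a_planes, b_planes):
        # Full-adder sum and carry, applied to every lane in the same step.
        sum_row = [a ^ b ^ c for a, b, c in zip(a_row, b_row, carry)]
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_row, b_row, carry)]
        out.append(sum_row)
    out.append(carry)  # the final carry becomes an extra most-significant plane
    return out


a = [3, 7, 12, 255]
b = [5, 1, 9, 1]
result = from_bit_planes(bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)))
print(result)
```

In this model the number of steps depends on the operand width rather than on the number of operands, which is the general reason a bit-serial scheme can still deliver high throughput when very many lanes operate in parallel.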
Pages: 383-396
Page count: 14