TIME: A Training-in-memory Architecture for Memristor-based Deep Neural Networks

Cited by: 46
Authors
Cheng, Ming [1]
Xia, Lixue [1]
Zhu, Zhenhua [1]
Cai, Yi [1]
Xie, Yuan [2]
Wang, Yu [1]
Yang, Huazhong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, TNList, Beijing, Peoples R China
[2] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Source
PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) | 2017
Funding
National Natural Science Foundation of China
DOI
10.1145/3061639.3062326
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
The training of neural networks (NNs) is usually time-consuming and resource-intensive. Memristors have shown their potential for NN computation. In particular, the crossbar structure and multi-bit characteristic of metal-oxide resistive random access memory (RRAM) can perform the matrix-vector product, the most common operation in NNs, with high precision. However, two challenges remain in realizing NN training on RRAM. First, existing architectures support only the inference phase and cannot perform backpropagation (BP) or the weight update of the NN. Second, training requires an enormous number of iterations with constant weight updates to reach convergence, which leads to large energy consumption from the many write and read operations. In this work, we propose a novel architecture, TIME, together with peripheral circuit designs, to enable NN training in RRAM. TIME supports BP and the weight update while maximizing the reuse of the peripheral circuits used for inference on RRAM. Meanwhile, a variability-free tuning scheme and gradually-write circuits are designed to reduce the cost of tuning RRAM. We explore the performance of both supervised learning (SL) and deep reinforcement learning (DRL) on TIME, and a specific mapping method for DRL is also introduced to further improve energy efficiency. Experimental results show that, in SL, TIME achieves 5.3x higher energy efficiency on average than the most powerful application-specific integrated circuit (ASIC) in the literature. In DRL, TIME achieves on average 126x higher energy efficiency than a GPU. If the cost of tuning RRAM can be further reduced, TIME has the potential to boost energy efficiency by two orders of magnitude compared with ASICs.
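As a rough illustration of the two operations the abstract refers to, the NumPy sketch below models an RRAM crossbar computing the forward matrix-vector product (bitline currents as the product of a conductance matrix and input voltages) and the outer-product weight update that BP applies to the array. The CrossbarSketch class, the conductance range, the learning rate, and the clipping are generic assumptions for illustration only, not the circuit-level scheme of TIME.

```python
import numpy as np

# Illustrative crossbar model (assumed, not TIME's actual design):
# weights are stored as conductances G, and applying an input voltage
# vector v yields output currents i = G^T v on the bitlines.
class CrossbarSketch:
    def __init__(self, rows, cols, g_min=1e-6, g_max=1e-4, rng=None):
        self.g_min, self.g_max = g_min, g_max          # assumed conductance range (S)
        rng = rng or np.random.default_rng(0)
        self.G = rng.uniform(g_min, g_max, (rows, cols))

    def forward(self, v):
        """Analog matrix-vector product: one current per column."""
        return self.G.T @ v                            # i_j = sum_i G_ij * v_i

    def update(self, v, delta, lr=1e-7):
        """Outer-product weight update from BP: G -= lr * v delta^T.
        Every element changed here is a costly RRAM write, which is
        why training (unlike inference) stresses tuning cost."""
        self.G -= lr * np.outer(v, delta)
        np.clip(self.G, self.g_min, self.g_max, out=self.G)  # device limits

# Tiny usage example: one forward pass and one update step.
xbar = CrossbarSketch(rows=4, cols=3)
v = np.array([0.1, 0.2, 0.0, 0.3])        # input voltages
i = xbar.forward(v)                        # bitline currents
delta = np.array([0.01, -0.02, 0.005])     # error signal from BP
xbar.update(v, delta)
```

In hardware, the forward product is evaluated in the analog domain in a single step, while each weight update requires physically re-tuning device conductances; this asymmetry is the motivation for the paper's variability-free tuning scheme and gradually-write circuits.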
Pages: 6