Hybrid RRAM/SRAM In-Memory Computing for Robust DNN Acceleration

Cited by: 14
Authors
Krishnan, Gokul [1 ]
Wang, Zhenyu [1 ]
Yeo, Injune [1 ]
Yang, Li [1 ]
Meng, Jian [1 ]
Liehr, Maximilian [2 ]
Joshi, Rajiv, V [3 ]
Cady, Nathaniel C. [2 ]
Fan, Deliang [1 ]
Seo, Jae-Sun [1 ]
Cao, Yu [1 ]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85287 USA
[2] State Univ New York Polytech, Dept Nanobiosci, Albany, NY 12203 USA
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Random access memory; Computer architecture; Training; Quantization (signal); Hardware; Performance evaluation; Resistance; In-memory compute; robust deep neural network (DNN) acceleration; RRAM; SRAM;
DOI
10.1109/TCAD.2022.3197516
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. However, in the presence of RRAM device variations and limited precision, mapping DNNs to RRAM-based IMC suffers severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro through a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The nonideal output of the RRAM macro, caused by device and circuit nonidealities, is compensated by adding the precise output of the SRAM macro, and the programmable shifter enables different compensation scales by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework that trains DNNs to support the hybrid IMC architecture through ensemble learning: it performs weight and activation quantization, pruning, and RRAM IMC-aware training, and it exploits the programmable shifter to learn an ensemble over different compensation scales. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65-nm SUNY process to demonstrate its efficacy. Experimental evaluation shows that the SRAM compensation makes a realistic IMC architecture feasible with multilevel RRAM cells (MLCs), even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.
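
To make the compensation scheme in the abstract concrete, the following Python sketch models a single hybrid MAC operation: a noisy analog RRAM partial sum is corrected by a small, exact digital SRAM partial sum that is left-shifted by a programmable amount. This is a minimal illustration under assumed names (rram_mac, sram_mac, hybrid_mac) and an assumed Gaussian weight-variation model; it is not the authors' silicon implementation or training framework.

import numpy as np

rng = np.random.default_rng(0)

def rram_mac(inputs, weights, sigma=0.1):
    # Analog in-memory MAC: each stored weight is perturbed by device variation
    # (assumed here to be multiplicative Gaussian noise of relative std sigma).
    noisy_weights = weights * (1.0 + rng.normal(0.0, sigma, size=weights.shape))
    return int(np.round(inputs @ noisy_weights))

def sram_mac(inputs, comp_weights):
    # Digital SRAM MAC: exact integer multiply-and-accumulate over small
    # compensation weights (assumed to be learned offline).
    return int(inputs @ comp_weights)

def hybrid_mac(inputs, rram_weights, comp_weights, shift):
    # Hybrid output: RRAM partial sum plus the SRAM compensation term,
    # left-shifted by a programmable amount to select the compensation scale.
    return rram_mac(inputs, rram_weights) + (sram_mac(inputs, comp_weights) << shift)

# Toy usage: a 64-input dot product with 4-bit weights and a small correction term.
x = rng.integers(0, 16, size=64)
w = rng.integers(-8, 8, size=64)
w_comp = rng.integers(-2, 2, size=64)
print(hybrid_mac(x, w, w_comp, shift=1))

In this sketch, sweeping the shift argument plays the role of the programmable shifter: each shift value defines one member of the ensemble that the paper's training framework learns over.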
Pages: 4241-4252
Page count: 12