A 40-nm MLC-RRAM Compute-in-Memory Macro With Sparsity Control, On-Chip Write-Verify, and Temperature-Independent ADC References

Cited by: 38
Authors
Li, Wantong [1]
Sun, Xiaoyu [2]
Huang, Shanshi [1]
Jiang, Hongwu [1]
Yu, Shimeng [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Taiwan Semicond Mfg Co TSMC, San Jose, CA 95132 USA
Keywords
System-on-chip; Resistance; Common Information Model (computing); Sensors; Nonvolatile memory; Programming; Quantization (signal); Emerging non-volatile memories (NVMs); hardware accelerators; in-memory computing; machine learning; MONOLITHICALLY INTEGRATED RRAM; INFERENCE; CMOS
DOI
10.1109/JSSC.2022.3163197
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Resistive random access memory (RRAM)-based compute-in-memory (CIM) has shown great potential for accelerating deep neural network (DNN) inference. However, device characteristics, such as low-resistance values, susceptibility to drift, and single-level cells, may limit the capabilities of RRAM-based CIM. In addition, prior works generally used the off-chip write-verify scheme to tighten RRAM resistance distributions and used off-chip analog-to-digital converter (ADC) references for fine-tuning partial sum quantization. Although off-chip techniques are viable for testing purposes, they may be unsuitable for practical applications. In this work, we present an RRAM-CIM macro to accelerate DNN inference. The chip features: 1) multi-level cell (MLC) RRAM for improving compute performance and density; 2) sparsity-aware input control to leverage the high activation sparsity in DNN models; 3) on-chip write-verify to speed up initial weight programming and periodically refresh cells to compensate for resistance drift under stress; and 4) on-chip ADC reference generation that provides column-wise tunability and stability with varying temperatures to guarantee the CIFAR-10 accuracy of 85.8% at 120 °C. The design is fabricated in a TSMC 40-nm process with embedded RRAM technology and achieves a macro-level peak performance of 97.8 GOPS/mm² and 44.5 TOPS/W for multiply-and-accumulate (MAC) operations on a VGG-8 network with ternary weights.
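
The write-verify idea named in the abstract can be illustrated with a small software sketch: pulse a cell, read it back, and stop once the value falls inside a tolerance window around the target. This is only a toy model under stated assumptions, not the paper's on-chip circuit. The names `ToyCell` and `write_verify`, the arbitrary conductance units, the step size, and the bounded uniform pulse noise are all hypothetical choices made for illustration.

```python
import random


class ToyCell:
    """Hypothetical RRAM cell model: each pulse shifts conductance by a noisy step."""

    def __init__(self, g0, step=1.0, noise=0.3, seed=0):
        self.g = g0              # conductance in arbitrary units (assumption)
        self.step = step         # nominal change per programming pulse (assumption)
        self.noise = noise       # bound on the uniform pulse-to-pulse variation
        self.rng = random.Random(seed)

    def read(self):
        return self.g

    def pulse(self, direction):
        # direction = +1 raises conductance (SET-like), -1 lowers it (RESET-like)
        self.g += direction * (self.step + self.rng.uniform(-self.noise, self.noise))


def write_verify(target, tolerance, read_cell, pulse_cell, max_pulses=50):
    """Pulse until the read-back value is within `tolerance` of `target`.

    Returns (converged, pulses_used).
    """
    pulses = 0
    while True:
        error = target - read_cell()
        if abs(error) <= tolerance:
            return True, pulses          # verify passed
        if pulses == max_pulses:
            return False, pulses         # give up; cell could be flagged for retry
        pulse_cell(1 if error > 0 else -1)
        pulses += 1


if __name__ == "__main__":
    cell = ToyCell(g0=5.0)
    converged, used = write_verify(20.0, 1.0, cell.read, cell.pulse, max_pulses=100)
    print(converged, used, round(cell.g, 2))
```

A tighter `tolerance` yields a narrower final distribution at the cost of more pulses per cell, which is the trade-off a write-verify scheme has to balance.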
Pages: 2868-2877
Page count: 10