Design and Implementation of a Hybrid, ADC/DAC-Free, Input-Sparsity-Aware, Precision Reconfigurable RRAM Processing-in-Memory Chip

Cited by: 0

Authors:

Wang J. [1]

Zhang T. [2]

Liu S. [1]

Liu Y. [1]

Wu Y. [1]

Hu S. [1]

Chen T. [3]

Liu Y. [1]

Yang Y. [2]

Huang R. [2]

Affiliations:

[1] University of Electronic Science and Technology of China, State Key Laboratory of Electronic Thin Films and Integrated Devices, Chengdu

[2] Peking University, School of Integrated Circuits, Beijing

[3] Nanyang Technological University, School of Electrical and Electronic Engineering, Jurong West

Source:

IEEE Journal of Solid-State Circuits | 2024 / Vol. 59 / No. 2

Keywords:

Processing-in-memory (PIM); quantization-aware training (QAT); resistive random access memory (RRAM); sparsity-aware; time-division multiplexing (TDM)

DOI:

10.1109/JSSC.2023.3304174

Abstract:

In this work, we design and implement a 1-Mb resistive random access memory (RRAM) processing-in-memory (PIM) chip in a 180-nm CMOS technology. A time-division multiplexing (TDM) circuit, together with a sparsity-aware sense amplifier (SA) and an asynchronous counter module (ACM), is proposed to free the chip from digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). A sparsity-aware input module (SAIM) is designed to improve computational efficiency by detecting bit-level input sparsity. A technique based on quantization-aware training (QAT), dynamically reconfigurable shifters (RecSTRs), and tree adders (TAs) is used to achieve system reconfigurability for 1-to-8-bit inputs, 1-to-8-bit weights, and 6-to-22-bit outputs. With this technique, optimized quantization to 4-bit weights and 4-bit activations (W4A4) can reduce the number of network parameters to 1/8 of that required for the 32-bit floating-point (FP32) version. The number of computation cycles can also be reduced to 1/4 of that of the FP32 version. This design achieves a weight density of 13.32 Mb/mm² normalized to the 22-nm node and an energy efficiency of 17.36 TOPS/W for 4-bit integer (INT4) activations and weights. © 1966-2012 IEEE.
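The arithmetic behind bit-level input sparsity and shift-and-add accumulation can be illustrated with a minimal software sketch. This is not the chip's circuit, which the abstract describes as operating through TDM, sense amplifiers, and counters; it only assumes that inputs are unsigned integers processed one bit-plane at a time, that an all-zero bit-plane can be skipped, and that partial sums are recombined by shifting. The function names `bit_serial_dot` and `storage_ratio` are hypothetical and chosen for illustration only.

```python
def bit_serial_dot(inputs, weights, input_bits=4):
    """Dot product of unsigned integer inputs with integer weights,
    evaluated one input bit-plane at a time (illustrative assumption)."""
    acc = 0
    skipped = 0
    for b in range(input_bits):
        # Extract bit b of every input: this is one bit-plane.
        plane = [(x >> b) & 1 for x in inputs]
        if not any(plane):
            # Every input has a 0 at this bit position, so the plane
            # contributes nothing and its cycle can be skipped.
            skipped += 1
            continue
        # Sum the weights selected by the 1-bits of this plane.
        partial = sum(w for bit, w in zip(plane, weights) if bit)
        # Weight the partial sum by 2**b (the role of a shifter).
        acc += partial << b
    return acc, skipped


def storage_ratio(weight_bits, baseline_bits=32):
    """Bits per weight relative to a baseline format such as FP32."""
    return weight_bits / baseline_bits


if __name__ == "__main__":
    xs = [1, 2, 3, 0]
    ws = [3, -2, 5, 7]
    result, skipped = bit_serial_dot(xs, ws, input_bits=4)
    print(result, skipped)
    print(storage_ratio(4))
```

In this sketch a 4-bit weight occupies 4/32 = 1/8 of the bits of an FP32 weight, which is one way to read the 1/8 figure quoted in the abstract; the 1/4 cycle reduction depends on details of the design that the abstract does not give, so it is not modeled here.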

Pages: 595-604

Number of pages: 9
