Design and Implementation of a Hybrid, ADC/DAC-Free, Input-Sparsity-Aware, Precision Reconfigurable RRAM Processing-in-Memory Chip

Cited by: 0
Authors
Wang J. [1]
Zhang T. [2]
Liu S. [1]
Liu Y. [1]
Wu Y. [1]
Hu S. [1]
Chen T. [3]
Liu Y. [1]
Yang Y. [2]
Huang R. [2]
Affiliations
[1] University of Electronic Science and Technology of China, State Key Laboratory of Electronic Thin Films and Integrated Devices, Chengdu
[2] Peking University, School of Integrated Circuits, Beijing
[3] Nanyang Technological University, School of Electrical and Electronic Engineering, Jurong West
Keywords
Processing-in-memory (PIM); quantization-aware training (QAT); resistive random access memory (RRAM); sparsity-aware; time-division multiplexing (TDM)
DOI
10.1109/JSSC.2023.3304174
Abstract
In this work, we design and implement a 1-Mb resistive random access memory (RRAM) processing-in-memory (PIM) chip in a 180-nm CMOS technology. A time-division multiplexing (TDM) circuit, together with a sparsity-aware sense amplifier (SA) and an asynchronous counter module (ACM), is proposed to eliminate the need for digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). A sparsity-aware input module (SAIM) is designed to improve computational efficiency through bit-level input sparsity detection. A technique based on quantization-aware training (QAT), dynamically reconfigurable shifters (RecSTRs), and tree adders (TAs) is used to achieve system reconfigurability for 1-8-bit input, 1-8-bit weight, and 6-22-bit output. With this technique, optimized quantization to 4-bit weight and 4-bit activation (W4A4) reduces the number of network parameters to 1/8 of that required for the 32-bit floating-point (FP32) version. The number of computation cycles is also reduced to 1/4 of that of the FP32 version. The design achieves a weight density of 13.32 Mb/mm² normalized to the 22-nm node and an energy efficiency of 17.36 TOPS/W for 4-bit integer (INT4) activation and weight. © 1966-2012 IEEE.
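As a rough illustration of what bit-level input sparsity detection can save, the sketch below computes a dot product one input bit plane at a time, skips any plane in which every input bit is zero, and combines the remaining partial sums by shift-and-add. It is a software analogy only, not the chip's circuit: the function name `bit_serial_mac`, the cycle counter, and the example values are all hypothetical, and the abstract does not describe the SAIM, RecSTR, or TA logic at this level of detail.

```python
def bit_serial_mac(inputs, weights, input_bits=4):
    """Dot product of unsigned inputs and signed weights, one input bit plane at a time.

    Hypothetical software model: an all-zero bit plane is skipped,
    which is the saving that bit-level input sparsity makes possible.
    Returns (result, number_of_bit_planes_actually_processed).
    """
    acc = 0
    cycles = 0
    for b in range(input_bits):
        # Extract bit b of every input (the current bit plane).
        plane = [(x >> b) & 1 for x in inputs]
        if not any(plane):
            continue  # whole plane is zero: no computation needed
        # Sum the weights selected by the 1-bits of this plane.
        partial = sum(w for bit, w in zip(plane, weights) if bit)
        # Shift by the bit position and accumulate (shift-and-add).
        acc += partial << b
        cycles += 1
    return acc, cycles


# Assumed example data: bits 0 and 1 are zero in every input.
inputs = [0, 4, 12, 8]
weights = [3, -2, 7, 1]
result, cycles = bit_serial_mac(inputs, weights)
print(result, cycles)
```

With these example inputs, two of the four bit planes are all zero, so only two planes are processed while the result still equals the ordinary dot product. Separately, the 1/8 parameter-storage figure quoted above matches the bit-width ratio 4/32 between W4 and FP32 weights.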
Pages: 595-604
Number of pages: 9