SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks

Cited by: 19
Authors
Mu, Junjie [1 ]
Kim, Hyunjoon [1 ]
Kim, Bongjin [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Random access memory; Voltage; Inverters; Neural networks; Memory management; Transistors; Voltage control; Mixed-signal; in-memory computing; binarized neural network; multiply-and-accumulate; voltage-mode; SRAM; ACCELERATOR; WEIGHT; ARCHITECTURE; COMPUTATION; TOPS/W; CHIP
DOI
10.1109/TCSI.2022.3152653
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
This paper presents a mixed-signal SRAM-based in-memory computing (IMC) macro for processing binarized neural networks. The IMC macro consists of 128 x 128 (16K) SRAM-based bitcells. Each bitcell consists of a standard 6T SRAM bitcell, an XNOR-based binary multiplier, and a pseudo-differential voltage-mode driver (i.e., an accumulator unit). Multiply-and-accumulate (MAC) operations between 64 pairs of inputs and weights (stored in the first 64 SRAM bitcells) are performed in 128 rows of the macro, all in parallel. A weight-stationary architecture, which minimizes off-chip memory accesses, effectively reduces energy-hungry data communication. A row-by-row analog-to-digital converter (ADC) based on 32 replica bitcells and a sense amplifier reduces the ADC area overhead and compensates for nonlinearity and variation. The ADC converts the MAC result from each row to an N-bit digital output, taking 2^N - 1 cycles per conversion by sweeping the reference level of the 32 replica bitcells. The remaining 32 replica bitcells in the row are utilized for offset calibration. In addition, this paper presents a pseudo-differential voltage-mode accumulator to address issues in current-mode or single-ended voltage-mode accumulators. A test chip including a 16 Kb SRAM IMC bitcell array is fabricated in a 65 nm CMOS technology. The measured energy efficiency is 741 to 87 TOPS/W with a 1- to 5-bit ADC at a 0.5 V supply, and the measured area efficiency is 3.97 TOPS/mm^2.
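The two operations the abstract describes, the XNOR-based binary MAC over 64 input/weight pairs per row and the N-bit conversion that sweeps 2^N - 1 reference levels, can be sketched in software. This is a minimal behavioral model for illustration only; function names, the reference-level spacing, and the voltage-as-integer abstraction are assumptions, not details from the paper's circuit implementation.

```python
def xnor_mac(inputs, weights):
    """Binarized MAC for one row: XNOR each +/-1 input with its stored
    +/-1 weight (product is +1 on match, -1 on mismatch), then accumulate
    across the 64 pairs. The result ranges from -64 to +64."""
    assert len(inputs) == len(weights) == 64
    return sum(1 if x == w else -1 for x, w in zip(inputs, weights))

def sweep_adc(mac_value, n_bits):
    """Model of an N-bit conversion in 2^N - 1 cycles: each cycle compares
    the accumulated row value against one swept reference level, and the
    digital code is the count of references the value meets or exceeds."""
    levels = 2 ** n_bits - 1
    # Hypothetical uniform spacing of reference levels across [-64, +64].
    refs = [-64 + (i + 1) * 128 / (levels + 1) for i in range(levels)]
    return sum(mac_value >= r for r in refs)  # code in [0, 2^N - 1]

row_inputs  = [1, -1] * 32          # 64 binarized activations
row_weights = [1] * 64              # 64 weights stored in the row
mac = xnor_mac(row_inputs, row_weights)   # 32 matches - 32 mismatches = 0
code = sweep_adc(mac, 3)                  # 3-bit code from 7 compare cycles
```

The sketch makes the conversion-time trade-off visible: each extra ADC bit doubles (roughly) the number of compare cycles, which is consistent with the reported energy efficiency falling from 741 to 87 TOPS/W as the ADC resolution grows from 1 to 5 bits.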
Pages: 2412-2422
Number of pages: 11