SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks

Cited by: 19
Authors
Mu, Junjie [1 ]
Kim, Hyunjoon [1 ]
Kim, Bongjin [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Random access memory; Voltage; Inverters; Neural networks; Memory management; Transistors; Voltage control; Mixed-signal; in-memory computing; binarized neural network; multiply-and-accumulate; voltage-mode; SRAM; ACCELERATOR; WEIGHT; ARCHITECTURE; COMPUTATION; TOPS/W; CHIP
DOI
10.1109/TCSI.2022.3152653
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
This paper presents a mixed-signal SRAM-based in-memory computing (IMC) macro for processing binarized neural networks. The IMC macro consists of 128 x 128 (16K) SRAM-based bitcells. Each bitcell consists of a standard 6T SRAM bitcell, an XNOR-based binary multiplier, and a pseudo-differential voltage-mode driver (i.e., an accumulator unit). Multiply-and-accumulate (MAC) operations between 64 pairs of inputs and weights (stored in the first 64 SRAM bitcells) are performed in 128 rows of the macro, all in parallel. A weight-stationary architecture, which minimizes off-chip memory accesses, effectively reduces energy-hungry data communication. A row-by-row analog-to-digital converter (ADC) based on 32 replica bitcells and a sense amplifier reduces the ADC area overhead and compensates for nonlinearity and variation. The ADC converts the MAC result from each row to an N-bit digital output, taking 2^N - 1 cycles per conversion by sweeping the reference level of the 32 replica bitcells. The remaining 32 replica bitcells in the row are utilized for offset calibration. In addition, this paper presents a pseudo-differential voltage-mode accumulator to address issues in current-mode or single-ended voltage-mode accumulators. A test chip including a 16 Kb SRAM IMC bitcell array is fabricated in a 65 nm CMOS technology. The measured energy efficiency is 741 to 87 TOPS/W with a 1- to 5-bit ADC at a 0.5 V supply, and the measured area efficiency is 3.97 TOPS/mm^2.
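The two operations the abstract describes, the XNOR-based binary MAC over 64 input/weight pairs per row and the N-bit conversion that sweeps 2^N - 1 reference levels, can be sketched in software. This is a minimal behavioral model for illustration only; function names, the reference-level spacing, and the voltage-as-integer abstraction are assumptions, not details from the paper's circuit implementation.

```python
def xnor_mac(inputs, weights):
    """Binarized MAC for one row: XNOR each +/-1 input with its stored
    +/-1 weight (product is +1 on match, -1 on mismatch), then accumulate
    across the 64 pairs. The result ranges from -64 to +64."""
    assert len(inputs) == len(weights) == 64
    return sum(1 if x == w else -1 for x, w in zip(inputs, weights))

def sweep_adc(mac_value, n_bits):
    """Model of an N-bit conversion in 2^N - 1 cycles: each cycle compares
    the accumulated row value against one swept reference level, and the
    digital code is the count of references the value meets or exceeds."""
    levels = 2 ** n_bits - 1
    # Hypothetical uniform spacing of reference levels across [-64, +64].
    refs = [-64 + (i + 1) * 128 / (levels + 1) for i in range(levels)]
    return sum(mac_value >= r for r in refs)  # code in [0, 2^N - 1]

row_inputs  = [1, -1] * 32          # 64 binarized activations
row_weights = [1] * 64              # 64 weights stored in the row
mac = xnor_mac(row_inputs, row_weights)   # 32 matches - 32 mismatches = 0
code = sweep_adc(mac, 3)                  # 3-bit code from 7 compare cycles
```

The sketch makes the conversion-time trade-off visible: each extra ADC bit doubles (roughly) the number of compare cycles, which is consistent with the reported energy efficiency falling from 741 to 87 TOPS/W as the ADC resolution grows from 1 to 5 bits.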
Pages: 2412-2422
Number of pages: 11