Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96
Authors
Kim, Hyunjoon [1 ]
Yoo, Taegeun [2 ]
Kim, Tony Tae-Hyoung [1 ]
Kim, Bongjin [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Samsung Elect, Hwaseong 16677, South Korea
[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; accelerator
DOI
10.1109/JSSC.2021.3061508
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communications Technology]
Subject Classification
0808; 0809
Abstract
This article presents Colonnade, a fully digital bit-serial compute-in-memory (CIM) macro. The macro is designed for processing neural networks with input and weight precisions reconfigurable from 1 to 16 bit, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used to compute a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead prevalent in prior analog CIM macros; the bitwise MAC operation in each bitcell is performed in the digital domain using a custom-designed XNOR gate and a full adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. To date, most analog macros have been limited to quantized neural networks with very low input/weight precisions, mainly due to memory-density constraints, and recent digital accelerators with reconfigurable precision remain inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured into a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces area overhead at the cost of latency, since each operation proceeds in bit-by-bit cycles. By combining the benefits of digital CIM, reconfigurability, and bit-serial computing, Colonnade achieves both high performance and high energy efficiency (i.e., the respective strengths of prior digital and analog accelerators) for processing neural networks. A test chip with a 128 × 128 array of SRAM-based bitcells for digital bit-serial computing is implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1-bit and 2.06 TOPS/W at 16-bit precision.
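To make the bit-serial scheme concrete, the following is a minimal behavioral sketch, not the authors' circuit, of a weight-stationary bit-serial dot-product: the weights stay resident in the array, the input vector is streamed one bit per cycle from LSB to MSB, each cycle's single-bit partial products are summed by the column adders (the paper's bitcells do this with an XNOR gate and a full adder; the sketch simplifies the per-bit multiply to an AND of the input bit), and each partial sum is shifted by its bit significance before accumulation. The function and variable names are illustrative, not from the paper.

```python
# Behavioral sketch of a weight-stationary bit-serial dot-product
# (illustrative only; structure and names are assumptions, not the paper's RTL).

def bit_serial_dot(weights, inputs, in_bits=16):
    """Compute sum(w * x) by streaming input bits from LSB to MSB.

    weights: signed integers held stationary in the bitcell array.
    inputs:  unsigned integers, serialized one bit per cycle.
    """
    assert len(weights) == len(inputs)
    acc = 0
    for b in range(in_bits):                        # one cycle per input bit
        # Each bitcell forms the 1-bit product of its stored weight and the
        # current input bit; the column adder chain sums these partials.
        partial = sum(w * ((x >> b) & 1) for w, x in zip(weights, inputs))
        acc += partial << b                         # shift by bit significance
    return acc

# Sanity check against a direct dot-product.
assert bit_serial_dot([3, -1, 2], [5, 2, 7]) == 3*5 + (-1)*2 + 2*7  # = 27
```

Note that latency grows with the serialized precision (one cycle per input bit here, and proportionally more when weights are also processed bit-serially), which is consistent with the lower measured TOPS/W at higher precisions.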
Pages: 2221-2233
Page count: 13