Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96
Authors
Kim, Hyunjoon [1 ]
Yoo, Taegeun [2 ]
Kim, Tony Tae-Hyoung [1 ]
Kim, Bongjin [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Samsung Elect, Hwaseong 16677, South Korea
[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; ACCELERATOR;
DOI
10.1109/JSSC.2021.3061508
CLC Classification Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1-16 bit input and weight precisions, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full-adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bits. So far, most analog macros have been used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles.
Based on the benefits of digital CIM, reconfigurability, and the bit-serial computing architecture, Colonnade achieves both high performance and energy efficiency (i.e., the benefits of both prior analog and digital accelerators) for processing neural networks. A test chip with 128 x 128 SRAM-based bitcells for digital bit-serial computing is implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
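The weight-stationary, LSB-first bit-serial dot-product described in the abstract can be illustrated with a minimal behavioral model. This is only a software sketch of the arithmetic (the actual macro computes each bit-plane with per-bitcell XNOR gates and full-adders in parallel); the function name and unsigned-input assumption are illustrative, not from the paper:

```python
def bit_serial_dot(weights, inputs, in_bits=16):
    """Behavioral sketch of a weight-stationary bit-serial dot-product.

    weights: integer weights held stationary in the bitcell array.
    inputs:  unsigned integer inputs, streamed one bit per cycle, LSB first.
    in_bits: configurable input precision (1-16 in the macro).
    """
    acc = 0
    for b in range(in_bits):                      # one cycle per input bit
        bit_plane = [(x >> b) & 1 for x in inputs]  # serialized input bits
        # Each column multiplies its stored weight by one input bit and the
        # column results are accumulated (the bitwise MAC of the array).
        partial = sum(w * xb for w, xb in zip(weights, bit_plane))
        acc += partial << b                       # weight partial sum by bit position
    return acc

# The bit-serial result matches a direct dot-product:
w, x = [3, 5, 2], [1, 2, 3]
assert bit_serial_dot(w, x) == sum(wi * xi for wi, xi in zip(w, x))  # 19
```

Note the latency/area trade-off stated in the abstract: the loop takes `in_bits` cycles per dot-product, but each cycle only needs 1-bit input handling per column, which is what keeps the bitcell area small.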
Pages: 2221-2233
Page count: 13