Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96
Authors
Kim, Hyunjoon [1 ]
Yoo, Taegeun [2 ]
Kim, Tony Tae-Hyoung [1 ]
Kim, Bongjin [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Samsung Elect, Hwaseong 16677, South Korea
[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; ACCELERATOR;
DOI
10.1109/JSSC.2021.3061508
CLC Classification Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1-16 bit input and weight precisions, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full-adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bits. So far, most analog macros have been used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles.
Based on the benefits of digital CIM, reconfigurability, and the bit-serial computing architecture, Colonnade achieves both high performance and energy efficiency (i.e., the benefits of both prior analog and digital accelerators) for processing neural networks. A test chip with 128 x 128 SRAM-based bitcells for digital bit-serial computing is implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
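The weight-stationary, LSB-first bit-serial dot-product described in the abstract can be illustrated with a minimal behavioral model. This is only a software sketch of the arithmetic (the actual macro computes each bit-plane with per-bitcell XNOR gates and full-adders in parallel); the function name and unsigned-input assumption are illustrative, not from the paper:

```python
def bit_serial_dot(weights, inputs, in_bits=16):
    """Behavioral sketch of a weight-stationary bit-serial dot-product.

    weights: integer weights held stationary in the bitcell array.
    inputs:  unsigned integer inputs, streamed one bit per cycle, LSB first.
    in_bits: configurable input precision (1-16 in the macro).
    """
    acc = 0
    for b in range(in_bits):                      # one cycle per input bit
        bit_plane = [(x >> b) & 1 for x in inputs]  # serialized input bits
        # Each column multiplies its stored weight by one input bit and the
        # column results are accumulated (the bitwise MAC of the array).
        partial = sum(w * xb for w, xb in zip(weights, bit_plane))
        acc += partial << b                       # weight partial sum by bit position
    return acc

# The bit-serial result matches a direct dot-product:
w, x = [3, 5, 2], [1, 2, 3]
assert bit_serial_dot(w, x) == sum(wi * xi for wi, xi in zip(w, x))  # 19
```

Note the latency/area trade-off stated in the abstract: the loop takes `in_bits` cycles per dot-product, but each cycle only needs 1-bit input handling per column, which is what keeps the bitcell area small.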
Pages: 2221-2233
Page count: 13