Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96
Authors
Kim, Hyunjoon [1 ]
Yoo, Taegeun [2 ]
Kim, Tony Tae-Hyoung [1 ]
Kim, Bongjin [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Samsung Elect, Hwaseong 16677, South Korea
[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; accelerator
DOI
10.1109/JSSC.2021.3061508
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communications Technology]
Subject Classification
0808; 0809
Abstract
This article presents Colonnade, a fully digital bit-serial compute-in-memory (CIM) macro. The macro is designed for processing neural networks with input and weight precisions reconfigurable from 1 to 16 bit, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used to compute a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead prevalent in prior analog CIM macros; the bitwise MAC operation in each bitcell is performed in the digital domain using a custom-designed XNOR gate and a full adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. To date, most analog macros have been limited to quantized neural networks with very low input/weight precisions, mainly due to memory-density constraints, and recent digital accelerators with reconfigurable precision remain inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured into a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces area overhead at the cost of latency, since each operation proceeds in bit-by-bit cycles. By combining the benefits of digital CIM, reconfigurability, and bit-serial computing, Colonnade achieves both high performance and high energy efficiency (i.e., the respective strengths of prior digital and analog accelerators) for processing neural networks. A test chip with a 128 × 128 array of SRAM-based bitcells for digital bit-serial computing is implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1-bit and 2.06 TOPS/W at 16-bit precision.
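To make the bit-serial scheme concrete, the following is a minimal behavioral sketch, not the authors' circuit, of a weight-stationary bit-serial dot-product: the weights stay resident in the array, the input vector is streamed one bit per cycle from LSB to MSB, each cycle's single-bit partial products are summed by the column adders (the paper's bitcells do this with an XNOR gate and a full adder; the sketch simplifies the per-bit multiply to an AND of the input bit), and each partial sum is shifted by its bit significance before accumulation. The function and variable names are illustrative, not from the paper.

```python
# Behavioral sketch of a weight-stationary bit-serial dot-product
# (illustrative only; structure and names are assumptions, not the paper's RTL).

def bit_serial_dot(weights, inputs, in_bits=16):
    """Compute sum(w * x) by streaming input bits from LSB to MSB.

    weights: signed integers held stationary in the bitcell array.
    inputs:  unsigned integers, serialized one bit per cycle.
    """
    assert len(weights) == len(inputs)
    acc = 0
    for b in range(in_bits):                        # one cycle per input bit
        # Each bitcell forms the 1-bit product of its stored weight and the
        # current input bit; the column adder chain sums these partials.
        partial = sum(w * ((x >> b) & 1) for w, x in zip(weights, inputs))
        acc += partial << b                         # shift by bit significance
    return acc

# Sanity check against a direct dot-product.
assert bit_serial_dot([3, -1, 2], [5, 2, 7]) == 3*5 + (-1)*2 + 2*7  # = 27
```

Note that latency grows with the serialized precision (one cycle per input bit here, and proportionally more when weights are also processed bit-serially), which is consistent with the lower measured TOPS/W at higher precisions.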
Pages: 2221-2233
Page count: 13