Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96
Authors
Kim, Hyunjoon [1]
Yoo, Taegeun [2]
Kim, Tony Tae-Hyoung [1]
Kim, Bongjin [3]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Samsung Elect, Hwaseong 16677, South Korea
[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; ACCELERATOR;
DOI
10.1109/JSSC.2021.3061508
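Chinese Library Classification (CLC) number
TM [Electrical technology]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract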
This article presents Colonnade, a fully digital bit-serial compute-in-memory (CIM) macro. The macro is designed for processing neural networks with input and weight precisions reconfigurable from 1 to 16 bit, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC, which computes a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work together as a single neuron and compute a dot product, an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process-variation sensitivity, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. Each bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. So far, most analog macros have been used for processing quantized neural networks with very low input/weight precisions, mainly because of memory-density limitations. Recent digital accelerators have implemented reconfigurable precisions, but their energy efficiency suffers from significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured into a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead at the cost of latency, since each input bit requires its own operation cycle.
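As an illustration of the LSB-to-MSB bit-serial dot product described above, the following Python sketch models the arithmetic only, not the circuit. It assumes that in each cycle one bit of every input gates its stored weight (an AND-style bitwise multiply), that the per-cycle partial sum is shifted by the bit position and accumulated, and that signed inputs use two's complement, so the MSB cycle is subtracted. The function names `bit_serial_dot` and `xnor_popcount_dot` are hypothetical. The second function shows the standard XNOR/popcount identity for 1-bit operands under the common binary-network encoding (bit 1 means +1, bit 0 means -1); the abstract does not state which encoding Colonnade's XNOR gate uses.

```python
from typing import Sequence


def bit_serial_dot(inputs: Sequence[int], weights: Sequence[int], in_bits: int,
                   signed_inputs: bool = False) -> int:
    """Hypothetical arithmetic model of a weight-stationary bit-serial dot product.

    Weights stay in place; each loop iteration (one "cycle") consumes one bit of
    every input, LSB first, and adds the shifted partial sum to the accumulator.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if not 1 <= in_bits <= 16:  # assumed cap mirroring the 1-16 bit range
        raise ValueError("in_bits must be in 1..16")
    lo = -(1 << (in_bits - 1)) if signed_inputs else 0
    hi = (1 << (in_bits - 1)) - 1 if signed_inputs else (1 << in_bits) - 1
    if any(not lo <= x <= hi for x in inputs):
        raise ValueError("input out of range for the chosen precision")

    mask = (1 << in_bits) - 1  # view each input as an in_bits-wide bit pattern
    acc = 0
    for t in range(in_bits):  # one cycle per input bit, LSB to MSB
        # Bitwise multiply: the input bit (0 or 1) selects the stored weight.
        partial = sum(w for x, w in zip(inputs, weights) if ((x & mask) >> t) & 1)
        if signed_inputs and t == in_bits - 1:
            acc -= partial << t  # two's-complement MSB has weight -2^(n-1)
        else:
            acc += partial << t
    return acc


def xnor_popcount_dot(a_bits: Sequence[int], b_bits: Sequence[int]) -> int:
    """±1 dot product from 0/1 codes, assuming bit 1 -> +1 and bit 0 -> -1."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same length")
    matches = sum(1 for a, b in zip(a_bits, b_bits) if not (a ^ b))  # XNOR is 1 when bits agree
    return 2 * matches - len(a_bits)


print(bit_serial_dot([3, 0, 7, 5], [2, -1, 4, 1], in_bits=3))  # 39
print(xnor_popcount_dot([1, 0, 1], [1, 1, 0]))  # -1
```

In this model the loop runs `in_bits` times regardless of the weight width, so latency grows linearly with input precision, which is the latency cost the abstract attributes to bit-by-bit operation.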
Building on the benefits of digital CIM, reconfigurability, and the bit-serial computing architecture, Colonnade can achieve both high performance and high energy efficiency for processing neural networks (i.e., the benefits of both prior analog and prior digital accelerators). A test chip with 128 × 128 SRAM-based bitcells for digital bit-serial computing was implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
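To put the two measured efficiency figures on a common scale, TOPS/W can be converted to energy per operation: 1 TOPS/W is 10^12 operations per joule, so the energy per operation is 1/(η · 10^12) J for an efficiency η in TOPS/W. This small sketch assumes the operation count follows the paper's own convention, which the abstract does not define (many CIM papers count one MAC as two operations); the helper name `tops_per_watt_to_fj_per_op` is hypothetical.

```python
def tops_per_watt_to_fj_per_op(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to energy per operation in femtojoules."""
    if tops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    ops_per_joule = tops_per_watt * 1e12  # 1 TOPS/W == 1e12 operations per joule
    return 1e15 / ops_per_joule  # joules per op expressed in femtojoules (1 J = 1e15 fJ)


# Measured figures quoted in the abstract (precision in bits -> TOPS/W).
measured = {1: 117.3, 16: 2.06}
for bits, eff in measured.items():
    print(f"{bits:>2}-bit: {tops_per_watt_to_fj_per_op(eff):.1f} fJ/op")
```

This gives roughly 8.5 fJ/op at 1 bit and 485 fJ/op at 16 bit, so an operation at 16-bit precision costs about 57 times more energy than one at 1-bit precision under the paper's operation-counting convention.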
Pages: 2221-2233
Number of pages: 13