Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

Cited by: 96

Authors:

Kim, Hyunjoon [1]

Yoo, Taegeun [2]

Kim, Tony Tae-Hyoung [1]

Kim, Bongjin [3]

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[2] Samsung Elect, Hwaseong 16677, South Korea

[3] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA

Source:

IEEE JOURNAL OF SOLID-STATE CIRCUITS | 2021, Vol. 56, No. 7

Keywords:

Computer architecture; Common Information Model (computing); Random access memory; Biological neural networks; Hardware; Transistors; Training; All-digital implementation; compute-in-memory (CIM); dot-product; neural network; SRAM; vector-matrix multiply; ACCELERATOR;

DOI:

10.1109/JSSC.2021.3061508

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1-16 bit input and weight precisions, based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bits. So far, most analog macros have been used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1-16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead at the cost of latency, since operations proceed bit by bit over multiple cycles. Based on the benefits of digital CIM, reconfigurability, and the bit-serial computing architecture, Colonnade can achieve both high performance and energy efficiency (i.e., the benefits of both prior analog and digital accelerators) for processing neural networks. A test chip with 128x128 SRAM-based bitcells for digital bit-serial computing is implemented in 65-nm technology and tested with 1-16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
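The bit-serial, weight-stationary dot-product described in the abstract can be illustrated with a small behavioral sketch. This is not the paper's circuit: it assumes unsigned operands, models each cycle as one input bit-plane applied to all stored weights, and the function names (`bit_serial_dot`, `xnor_dot_1bit`) are hypothetical. The 1-bit helper additionally assumes the common binary-network encoding where bit 1 means +1 and bit 0 means -1, so that multiplication reduces to XNOR; the abstract does not state the encoding used.

```python
def bit_serial_dot(weights, inputs, input_bits):
    """Behavioral model of a weight-stationary bit-serial dot product.

    Assumption: unsigned integers only. Inputs are fed one bit per
    cycle, LSB first, while the weights stay fixed (as if stored in
    the memory array).
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    for x in inputs:
        if not 0 <= x < (1 << input_bits):
            raise ValueError("input does not fit in input_bits")

    acc = 0
    for cycle in range(input_bits):  # cycle 0 handles the LSB
        # Each weight is ANDed with the current input bit, then summed.
        partial = sum(w for w, x in zip(weights, inputs) if (x >> cycle) & 1)
        # Weight the partial sum by the significance of this bit position.
        acc += partial << cycle
    return acc


def xnor_dot_1bit(weight_bits, input_bits):
    """1-bit dot product, assuming bit 1 encodes +1 and bit 0 encodes -1.

    Under that encoding the product of two values is +1 exactly when
    the bits match, i.e. when XNOR(w, x) is 1.
    """
    if len(weight_bits) != len(input_bits):
        raise ValueError("operands must have the same length")
    matches = sum(1 for w, x in zip(weight_bits, input_bits) if not (w ^ x))
    return 2 * matches - len(weight_bits)


if __name__ == "__main__":
    w, x = [3, 5, 2], [7, 1, 4]
    print(bit_serial_dot(w, x, input_bits=3))            # 3*7 + 5*1 + 2*4
    print(xnor_dot_1bit([1, 0, 1, 1], [1, 1, 0, 1]))
```

The loop over `cycle` makes the latency trade-off visible: an N-bit input takes N passes over the same stored weights, which is the cost the abstract accepts in exchange for a smaller compute area.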

Pages: 2221-2233

Page count: 13
