A 2.9-33.0 TOPS/W Reconfigurable 1-D/2-D Compute-Near-Memory Inference Accelerator in 10-nm FinFET CMOS

Cited by: 5
Authors
Sumbul, H. Ekin [1]
Chen, Gregory K. [1]
Knag, Phil C. [1]
Kumar, Raghavan [1]
Anders, Mark A. [1]
Kaul, Himanshu [1]
Hsu, Steven K. [1]
Agarwal, Amit [1]
Kar, Monodeep [1]
Kim, Seongjong [1]
Krishnamurthy, Ram K. [1]
Affiliations
[1] Intel Corp, Circuit Res Lab, Hillsboro, OR 97229 USA
Source
IEEE SOLID-STATE CIRCUITS LETTERS, 2020, Vol. 3
Keywords
Compute-near-memory (CNM); deep learning ASIC; deep learning inference; reconfigurable systolic array; variable precision
DOI
10.1109/LSSC.2020.3007185
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
A 10-nm compute-near-memory (CNM) accelerator augments SRAM with multiply-accumulate (MAC) units to reduce interconnect energy and achieve 2.9 8b-TOPS/W for matrix-vector computation. The CNM provides high memory bandwidth by accessing SRAM subarrays to enable low-latency, real-time inference in fully connected and recurrent neural networks with small mini-batch sizes. For workloads with greater arithmetic intensity, such as large-batch convolutional neural networks, the CNM reconfigures into a 2-D systolic array to amortize memory-access energy over a greater number of computations. Variable-precision 8b/4b/2b/1b MACs increase throughput by up to 8x for binary operations at 33.0 1b-TOPS/W.
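The headline efficiency figures imply concrete per-operation energies, since 1 TOPS/W equals 10^12 operations per joule, i.e. 1 pJ per operation. A minimal sketch of that arithmetic, together with the precision-scaled throughput the abstract describes (function names are illustrative, not from the paper):

```python
def energy_per_op_pj(tops_per_watt: float) -> float:
    """1 TOPS/W == 1e12 ops/J, so energy per op in picojoules is 1 / (TOPS/W)."""
    return 1.0 / tops_per_watt

# Headline efficiencies from the abstract.
e_8b = energy_per_op_pj(2.9)   # ~0.345 pJ per 8b MAC
e_1b = energy_per_op_pj(33.0)  # ~0.030 pJ per 1b MAC

def throughput_scaling(precision_bits: int, base_bits: int = 8) -> int:
    """Variable-precision MACs: throughput scales as base/precision,
    so binary (1b) operations run at up to 8x the 8b rate."""
    return base_bits // precision_bits

print(f"8b: {e_8b:.3f} pJ/op, 1b: {e_1b:.3f} pJ/op, "
      f"binary speedup: {throughput_scaling(1)}x")
```

Note the energy gap (roughly 11x) exceeds the 8x throughput gain, consistent with narrower operands also cutting switching energy per MAC, not just packing more operations per cycle.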
Pages: 118-121 (4 pages)
References (11 total)
[1]  
Auth C., 2017, INT EL DEVICES MEET, DOI 10.1109/IEDM.2017.8268472
[2]   PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations [J].
Castaneda, Oscar ;
Bobbett, Maria ;
Gallyas-Sanhueza, Alexandra ;
Studer, Christoph .
2019 IEEE 30TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2019), 2019, :149-156
[3]   Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks [J].
Chen, Yu-Hsin ;
Krishna, Tushar ;
Emer, Joel S. ;
Sze, Vivienne .
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2017, 52 (01) :127-138
[4]  
Diamos Greg, 2016, PMLR, P2024
[5]   A Configurable Cloud-Scale DNN Processor for Real-Time AI [J].
Fowers, Jeremy ;
Ovtcharov, Kalin ;
Papamichael, Michael ;
Massengill, Todd ;
Liu, Ming ;
Lo, Daniel ;
Alkalay, Shlomi ;
Haselman, Michael ;
Adams, Logan ;
Ghandi, Mahdi ;
Heil, Stephen ;
Patel, Prerak ;
Sapek, Adam ;
Weisz, Gabriel ;
Woods, Lisa ;
Lanka, Sitaram ;
Reinhardt, Steven K. ;
Caulfield, Adrian M. ;
Chung, Eric S. ;
Burger, Doug .
2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2018, :1-14
[6]  
Guo Z, 2018, ISSCC DIG TECH PAP I, P196, DOI 10.1109/ISSCC.2018.8310251
[7]   A 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip [J].
Kadetotad, Deepak ;
Berisha, Visar ;
Chakrabarti, Chaitali ;
Seo, Jae-Sun .
IEEE SOLID-STATE CIRCUITS LETTERS, 2019, 2 (09) :119-122
[8]   UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision [J].
Lee, Jinmook ;
Kim, Changhyeon ;
Kang, Sanghoon ;
Shin, Dongjoo ;
Kim, Sangyeob ;
Yoo, Hoi-Jun .
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2019, 54 (01) :173-185
[9]  
Moons B, 2017, ISSCC DIG TECH PAP I, P246, DOI 10.1109/ISSCC.2017.7870353
[10]   Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs [J].
Nurvitadhi, Eriko ;
Kwon, Dongup ;
Jafari, Ali ;
Boutros, Andrew ;
Sim, Jaewoong ;
Tomson, Phillip ;
Sumbul, Huseyin ;
Chen, Gregory ;
Knag, Phil ;
Kumar, Raghavan ;
Krishnamurthy, Ram ;
Gribok, Sergey ;
Pasca, Bogdan ;
Langhammer, Martin ;
Marr, Debbie ;
Dasu, Aravind .
2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, :199-207