A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing

Cited by: 0
Authors
Ferro, Elena [1 ,2 ]
Vasilopoulos, Athanasios [1 ]
Lammie, Corey [1 ]
Le Gallo, Manuel [1 ]
Benini, Luca [2 ]
Boybat, Irem [1 ]
Sebastian, Abu [1 ]
Affiliations
[1] IBM Res Europe, CH-8803 Ruschlikon, Switzerland
[2] IIS ETH Zurich, CH-8092 Zurich, Switzerland
Source
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024 | 2024
Keywords
Near-memory processing; Fixed-point computing; Analog in-memory computing; Deep learning; AI; DEEP NEURAL-NETWORKS
DOI
10.1109/ISCAS58744.2024.10558286
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Analog In-Memory Computing (AIMC) is an emerging technology for fast and energy-efficient Deep Learning (DL) inference. However, a certain amount of digital post-processing is required to deal with circuit mismatches and non-idealities associated with the memory devices. Efficient near-memory digital logic is critical to retain the high area/energy efficiency and low latency of AIMC. Existing systems adopt Floating Point 16 (FP16) arithmetic with limited parallelization capability and high latency. To overcome these limitations, we propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic. It achieves competitive accuracy and higher computing throughput than previous approaches while minimizing the area overhead. Moreover, the NMPU supports standard DL activation steps, such as ReLU and Batch Normalization. We perform a physical implementation of the NMPU design in a 14 nm CMOS technology and provide detailed performance, power, and area assessments. We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations, providing a 139x speed-up, a 7.8x smaller area, and competitive power consumption. Additionally, our approach achieves an inference accuracy of 86.65%/65.06%, with an accuracy drop of just 0.12%/0.4% compared to the FP16 baseline when benchmarked with ResNet9/ResNet32 networks trained on the CIFAR10/CIFAR100 datasets, respectively.
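The abstract does not disclose the NMPU's internal datapath, so the following is only a minimal NumPy sketch of the kind of fixed-point post-processing it describes: per-column calibration scaling of raw AIMC ADC outputs, folded Batch Normalization, ReLU, and requantization, all done with integer multiplies and shifts instead of FP16. The function names, bit-widths, and Q-format parameters (FRAC_BITS, OUT_BITS) are illustrative assumptions, not the paper's design.

```python
# Illustrative sketch (not the paper's hardware): fixed-point post-processing of
# AIMC ADC outputs with per-column scaling, folded batch normalization, and ReLU,
# using only integer multiplies and shifts. Bit-widths below are assumptions.
import numpy as np

FRAC_BITS = 12            # assumed fractional bits of the fixed-point scales
OUT_BITS = 8              # assumed output activation width (signed int8)

def quantize_scale(scale_fp32: np.ndarray) -> np.ndarray:
    """Convert per-channel FP32 scales to Q-format integers (round to nearest)."""
    return np.round(scale_fp32 * (1 << FRAC_BITS)).astype(np.int32)

def nmpu_postprocess(adc_counts: np.ndarray,
                     col_scale: np.ndarray,   # per-column calibration scale (FP32)
                     bn_gamma: np.ndarray,    # batch-norm scale, folded per channel
                     bn_beta: np.ndarray,     # batch-norm shift, folded per channel
                     apply_relu: bool = True) -> np.ndarray:
    """Map raw integer ADC counts to int8 activations with fixed-point arithmetic."""
    # Fold the AIMC calibration scale and the batch-norm scale into one multiplier.
    scale_q = quantize_scale(col_scale * bn_gamma)            # int32, Q(FRAC_BITS)
    beta_q = np.round(bn_beta * (1 << FRAC_BITS)).astype(np.int64)

    acc = adc_counts.astype(np.int64) * scale_q + beta_q      # wide accumulator
    if apply_relu:
        acc = np.maximum(acc, 0)                              # ReLU in the integer domain

    # Rounding right-shift back to integer activations, then saturate to int8.
    acc = (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    lo, hi = -(1 << (OUT_BITS - 1)), (1 << (OUT_BITS - 1)) - 1
    return np.clip(acc, lo, hi).astype(np.int8)

# Example: four crossbar columns' worth of raw ADC counts.
counts = np.array([-120, 0, 37, 255])
scales = np.full(4, 0.018, dtype=np.float32)
gamma, beta = np.ones(4, dtype=np.float32), np.full(4, 0.5, dtype=np.float32)
print(nmpu_postprocess(counts, scales, gamma, beta))
```

Keeping the whole pipeline in integer arithmetic is what allows a narrow, highly parallel datapath near the memory array; the per-column scale and the batch-norm parameters can be merged offline, so the per-sample work reduces to one multiply, one add, a comparison, and a shift.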
Pages: 5