3DL-PIM: A Look-Up Table Oriented Programmable Processing in Memory Architecture Based on the 3-D Stacked Memory for Data-Intensive Applications

Times Cited: 4
Authors
Sutradhar, Purab Ranjan [1 ]
Bavikadi, Sathwika [3 ]
Dinakarrao, Sai Manoj Pudukotai [3 ]
Indovina, Mark A. [2 ]
Ganguly, Amlan [1 ]
Affiliations
[1] Rochester Inst Technol, Dept Comp Engn, Rochester, NY 14623 USA
[2] Rochester Inst Technol, Dept Elect & Microelect Engn, Rochester, NY 14623 USA
[3] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
Funding
US National Science Foundation;
Keywords
3-D memory; data encryption; deep neural networks; look-up table; parallel processing; processing-in-memory; DRAM;
DOI
10.1109/TETC.2023.3293140
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Memory-centric computing systems have demonstrated superior performance and efficiency in memory-intensive applications compared to state-of-the-art CPUs and GPUs. 3-D stacked DRAM architectures unlock higher I/O data bandwidth than traditional 2-D memory architectures and are therefore better suited for incorporating memory-centric processors. However, merely integrating high-precision ALUs into a 3-D stacked memory does not yield an optimized design, since such a design achieves only limited utilization of the chip's internal bandwidth and limited operational parallelism. To address this, we propose 3DL-PIM, a 3-D stacked memory-based Processing in Memory (PIM) architecture that locates a plurality of Look-up Table (LUT)-based low-footprint Processing Elements (PEs) within the memory banks in order to achieve high parallel computing performance by maximizing data-bandwidth utilization. Instead of relying on traditional logic-based ALUs, the PEs are formed by clustering groups of programmable LUTs and can therefore be reprogrammed on the fly to perform various logic/arithmetic operations. Our simulations show that 3DL-PIM achieves up to 2.6x higher processing performance and 2.65x higher area efficiency compared to a state-of-the-art 3-D stacked memory-based accelerator.
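To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's hardware implementation) of a "processing element" built from a programmable look-up table: rather than computing through hardwired ALU logic, the PE stores precomputed results for a chosen operation over small 4-bit operands and evaluates with a single table read, and swapping in a different table reprograms the PE to a different operation. The function names and the 4-bit operand width are assumptions chosen for clarity.

```python
# Hypothetical software model of a LUT-based processing element (PE).
# The real 3DL-PIM PEs are hardware LUT clusters inside DRAM banks;
# this sketch only shows the programming/lookup principle.

def program_lut(op, bits=4):
    """Precompute op(a, b) for every pair of bits-wide operands.

    The resulting flat table plays the role of the PE's programmed
    contents: one entry per (a, b) operand combination.
    """
    size = 1 << bits
    return [op(a, b) for a in range(size) for b in range(size)]

def lut_eval(lut, a, b, bits=4):
    """Evaluate the programmed operation with a single table lookup.

    The concatenated operands (a, b) form the table address, so no
    arithmetic logic is exercised at evaluation time.
    """
    return lut[(a << bits) | b]

# The same PE is "reprogrammed" from a 4-bit adder to a 4-bit multiplier
# simply by loading a different table.
add_lut = program_lut(lambda a, b: a + b)
mul_lut = program_lut(lambda a, b: a * b)

print(lut_eval(add_lut, 9, 6))   # 15
print(lut_eval(mul_lut, 9, 6))   # 54
```

The appeal of this scheme in a PIM setting is that a table read maps naturally onto a DRAM row access, so low-precision operations can be served at the memory's internal bandwidth without per-PE ALU area.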
Pages: 60-72
Page count: 13