3DL-PIM: A Look-Up Table Oriented Programmable Processing in Memory Architecture Based on the 3-D Stacked Memory for Data-Intensive Applications

Times Cited: 4
Authors
Sutradhar, Purab Ranjan [1 ]
Bavikadi, Sathwika [3 ]
Dinakarrao, Sai Manoj Pudukotai [3 ]
Indovina, Mark A. [2 ]
Ganguly, Amlan [1 ]
Affiliations
[1] Rochester Inst Technol, Dept Comp Engn, Rochester, NY 14623 USA
[2] Rochester Inst Technol, Dept Elect & Microelect Engn, Rochester, NY 14623 USA
[3] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
Funding
US National Science Foundation;
Keywords
3-D memory; data encryption; deep neural networks; look-up table; parallel processing; processing-in-memory; DRAM;
DOI
10.1109/TETC.2023.3293140
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Memory-centric computing systems have demonstrated superior performance and efficiency in memory-intensive applications compared to state-of-the-art CPUs and GPUs. 3-D stacked DRAM architectures unlock higher I/O data bandwidth than traditional 2-D memory architectures and are therefore better suited for incorporating memory-centric processors. However, merely integrating high-precision ALUs into a 3-D stacked memory does not yield an optimized design, since such a design achieves only limited utilization of the chip's internal bandwidth and limited operational parallelism. To address this, we propose 3DL-PIM, a 3-D stacked memory-based Processing in Memory (PIM) architecture that locates a plurality of Look-up Table (LUT)-based low-footprint Processing Elements (PEs) within the memory banks in order to achieve high parallel computing performance by maximizing data-bandwidth utilization. Instead of relying on traditional logic-based ALUs, the PEs are formed by clustering groups of programmable LUTs and can therefore be reprogrammed on the fly to perform various logic/arithmetic operations. Our simulations show that 3DL-PIM achieves up to 2.6x higher processing performance and 2.65x higher area efficiency compared to a state-of-the-art 3-D stacked memory-based accelerator.
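To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's hardware implementation) of a "processing element" built from a programmable look-up table: rather than computing through hardwired ALU logic, the PE stores precomputed results for a chosen operation over small 4-bit operands and evaluates with a single table read, and swapping in a different table reprograms the PE to a different operation. The function names and the 4-bit operand width are assumptions chosen for clarity.

```python
# Hypothetical software model of a LUT-based processing element (PE).
# The real 3DL-PIM PEs are hardware LUT clusters inside DRAM banks;
# this sketch only shows the programming/lookup principle.

def program_lut(op, bits=4):
    """Precompute op(a, b) for every pair of bits-wide operands.

    The resulting flat table plays the role of the PE's programmed
    contents: one entry per (a, b) operand combination.
    """
    size = 1 << bits
    return [op(a, b) for a in range(size) for b in range(size)]

def lut_eval(lut, a, b, bits=4):
    """Evaluate the programmed operation with a single table lookup.

    The concatenated operands (a, b) form the table address, so no
    arithmetic logic is exercised at evaluation time.
    """
    return lut[(a << bits) | b]

# The same PE is "reprogrammed" from a 4-bit adder to a 4-bit multiplier
# simply by loading a different table.
add_lut = program_lut(lambda a, b: a + b)
mul_lut = program_lut(lambda a, b: a * b)

print(lut_eval(add_lut, 9, 6))   # 15
print(lut_eval(mul_lut, 9, 6))   # 54
```

The appeal of this scheme in a PIM setting is that a table read maps naturally onto a DRAM row access, so low-precision operations can be served at the memory's internal bandwidth without per-PE ALU area.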
Pages: 60-72
Page count: 13