TransPimLib: Efficient Transcendental Functions for Processing-in-Memory Systems

Cited by: 5
Authors
Item, Maurus [1]
Gomez-Luna, Juan [1]
Guo, Yuxin [1]
Oliveira, Geraldo F. [1]
Sadrosadati, Mohammad [1]
Mutlu, Onur [1]
Affiliation
[1] Swiss Fed Inst Technol, Zurich, Switzerland
Source
2023 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE, ISPASS | 2023
Keywords
processing-in-memory; processing-near-memory; transcendental functions; activation functions; machine learning;
DOI
10.1109/ISPASS57527.2023.00031
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at https://github.com/CMU-SAFARI/transpimlib.
Pages: 235-247
Page count: 13
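
The abstract names CORDIC as one of the two method families the library provides. For readers unfamiliar with the technique, the following is a minimal sketch of rotation-mode CORDIC for sine and cosine using only integer shifts, additions, and a small arctangent table. It is not taken from TransPimLib: the function name `cordic_sin_cos`, the fixed-point format (`FRAC_BITS`), and the iteration count (`ITERATIONS`) are assumptions chosen for illustration, and the sketch is written in Python rather than for the UPMEM toolchain.

```python
import math

# Assumed parameters for illustration only (not from the paper).
FRAC_BITS = 28          # fractional bits of the fixed-point format
ITERATIONS = 24         # number of CORDIC micro-rotations
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

# Table of atan(2^-i) in fixed point, computed once ahead of time.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# Starting from x = K (the inverse of the total gain) cancels that stretch.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(_gain * ONE)


def cordic_sin_cos(theta):
    """Approximate (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    x, y = K_FIXED, 0
    z = round(theta * ONE)  # remaining angle to rotate by, in fixed point
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i) using shifts and adds.
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE
```

Inside the loop the sketch uses no multiplications, which is the usual reason CORDIC is considered for hardware with limited instruction sets. Inputs outside [-pi/2, pi/2] would first need range reduction, which is omitted here.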