A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions

Citations: 0
Authors
Ahmed, Hameeza [1]
Santos, Paulo C. [2]
Lima, Joao P. C. [2]
Moura, Rafael F. [2]
Alves, Marco A. Z. [3]
Beck, Antonio C. S. [2]
Carro, Luigi [2]
Affiliations
[1] NED Univ, Dept Comp & Informat Syst Engn, Karachi, Pakistan
[2] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[3] Univ Fed Parana, Dept Informat, Curitiba, Parana, Brazil
Source
2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE) | 2019
Keywords
Compiler; Processing in Memory; Near-data computing; Vector instructions; SIMD; 3D-Stacked memories
DOI
10.23919/date.2019.8714956
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Although Processing-in-Memory (PIM) is not a new technique, the advent of 3D-stacked technologies, which integrate large memories with logic circuitry able to process large amounts of data, has revived it. PIM increases performance while reducing energy consumption when dealing with large amounts of data. Although several PIM designs are available in the literature, implementing them effectively still burdens the programmer. In addition, multiple PIM instances are required to take advantage of the internal 3D-stacked memories, which further increases the challenges programmers face. To address this, this work presents the Processing-In-Memory cOmpiler (PRIMO). Our compiler efficiently exploits large vector units on a PIM architecture directly from the original code. PRIMO automatically selects suitable PIM operations, allowing them to be offloaded automatically. Moreover, PRIMO accounts for several PIM instances, selecting the most suitable instance while reducing internal communication between different PIM units. Compilation results for different benchmarks show that PRIMO exploits large vectors while achieving near-optimal performance compared to the ideal execution for the case-study PIM. PRIMO achieves a speedup of 38x for specific kernels and an average of 11.8x for a set of benchmarks from the PolyBench suite.
Pages: 564-569
Number of pages: 6
Related papers
50 in total
  • [41] Optimization of OLAP In-Memory Database Management Systems with Processing-In-Memory Architecture
    Hosseinzadeh, Shima
    Parvaresh, Amirhossein
    Fey, Dietmar
    ARCHITECTURE OF COMPUTING SYSTEMS, ARCS 2023, 2023, 13949 : 264 - 278
  • [42] PIM-Quantifier: A Processing-in-Memory Platform for mRNA Quantification
    Zhang, Fan
    Angizi, Shaahin
    Fahmi, Naima Ahmed
    Zhang, Wei
    Fan, Deliang
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 43 - 48
  • [43] Plug N' PIM: An integration strategy for Processing-in-Memory accelerators
    Santos, Paulo C.
    Forlin, Bruno E.
    Alves, Marco A. Z.
    Carro, Luigi
    INTEGRATION-THE VLSI JOURNAL, 2023, 88 : 185 - 195
  • [44] Processing-in-Memory Technology for Machine Learning: From Basic to ASIC
    Taylor, Brady
    Zheng, Qilin
    Li, Ziru
    Li, Shiyu
    Chen, Yiran
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (06) : 2598 - 2603
  • [45] Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory
    Hanindhito, Bagus
    Li, Ruihao
    Gourounas, Dimitrios
    Fathi, Arash
    Govil, Karan
    Trenev, Dimitar
    Gerstlauer, Andreas
    John, Lizy K.
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,
  • [46] NPC: A Non-Conflicting Processing-in-Memory Controller in DDR Memory Systems
    Lee, Seungyong
    Lee, Sanghyun
    Seo, Minseok
    Park, Chunmyung
    Shin, Woojae
    Lee, Hyuk-Jae
    Kim, Hyun
    IEEE TRANSACTIONS ON COMPUTERS, 2025, 74 (03) : 1025 - 1039
  • [47] Towards Memory-Efficient Processing-in-Memory Architecture for Convolutional Neural Networks
    Wang, Yi
    Zhang, Mingxu
    Yang, Jing
    ACM SIGPLAN NOTICES, 2017, 52 (05) : 81 - 90
  • [48] SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator
    Xie, Xinfeng
    Liang, Zheng
    Gu, Peng
    Basak, Abanti
    Deng, Lei
    Liang, Ling
    Hu, Xing
    Xie, Yuan
    2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 570 - 583
  • [49] Aggressive Performance Improvement on Processing-in-Memory Devices by Adopting Hugepages
    Santos, Paulo Cesar
    Forlin, Bruno E.
    Alves, Marco A. Z.
    Carro, Luigi
    2022 IEEE 33RD INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP), 2022, : 60 - 63
  • [50] Accelerating CNN Training With Concurrent Execution of GPU and Processing-in-Memory
    Choi, Jungwoo
    Lee, Hyuk-Jae
    Sohn, Kyomin
    Yu, Hak-Soo
    Rhee, Chae Eun
    IEEE ACCESS, 2024, 12 : 160190 - 160204