RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

Cited: 0
Authors
Jeon, Yunhyeong [1 ]
Jang, Minwoo [1 ]
Lee, Hwanjun [1 ]
Jung, Yeji [1 ]
Jung, Jin [2 ]
Lee, Jonggeon [2 ]
So, Jinin [2 ]
Kim, Daehoon [3 ]
Affiliations
[1] DGIST, Daegu 42988, South Korea
[2] Samsung Electronics, Hwaseong 443743, South Korea
[3] Yonsei Univ, Seoul 03722, South Korea
Keywords
Graphics processing units; Transformers; Random access memory; Kernel; Computer architecture; Natural language processing; Computational modeling; Vectors; Inverters; Encoding; Processing-in-memory; transformer model; rotary positional embedding
DOI
10.1109/LCA.2025.3535470
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline Classification Code
0812
Abstract
The emergence of attention-based Transformer models such as GPT, BERT, and LLaMA has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A key factor behind these improvements is positional embedding, which captures the contextual relationships between tokens in a sequence. However, existing positional embedding methods struggle with the performance overhead of long sequences and with effectively capturing relationships between adjacent tokens. Rotary Positional Embedding (RoPE) addresses these issues, embedding positional information accurately and, even for long sequences, without requiring model retraining. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference: we observe that it accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM uses a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9x performance improvement and 914.1x energy savings compared to conventional systems.
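For context, RoPE rotates each two-dimensional pair of query/key features by an angle proportional to the token's position, and this rotation reduces to element-wise multiply-add operations on the input vector and a rearranged copy of it. The minimal NumPy sketch below shows only the standard RoPE formulation (Su et al.); the function name and the interleaved pairing convention are illustrative assumptions, and this is not RoPIM's in-memory implementation:

    import numpy as np

    def rope(x, base=10000.0):
        # x: (seq_len, d) array of query or key vectors, d even.
        # Applies the standard rotary positional embedding: feature
        # pair (2i, 2i+1) at position m is rotated by angle m * theta_i,
        # where theta_i = base^(-2i/d).
        seq_len, d = x.shape
        inv_freq = base ** (-np.arange(0, d, 2) / d)      # theta_i, shape (d/2,)
        angles = np.outer(np.arange(seq_len), inv_freq)   # m * theta_i, shape (seq_len, d/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x_even, x_odd = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x_even * cos - x_odd * sin         # rotate each 2-D pair
        out[:, 1::2] = x_even * sin + x_odd * cos
        return out

    # Usage: rotate an 8-token sequence of 64-dimensional queries.
    q_rot = rope(np.random.randn(8, 64))

Note that the two output assignments are pure element-wise multiplies and adds over x and a sign-flipped rearrangement of x; performing exactly this multiply-add and rearrangement next to the DRAM banks is, per the abstract, how RoPIM avoids the off-chip data movement and dependency stalls that dominate RoPE's GPU execution time.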
Pages: 41-44
Number of pages: 4