RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

Citations: 0

Authors:

Jeon, Yunhyeong [1]

Jang, Minwoo [1]

Lee, Hwanjun [1]

Jung, Yeji [1]

Jung, Jin [2]

Lee, Jonggeon [2]

So, Jinin [2]

Kim, Daehoon [3]

Affiliations:

[1] DGIST, Daegu 42988, South Korea

[2] Samsung Elect, Hwaseong 443743, South Korea

[3] Yonsei Univ, Seoul 03722, South Korea

Source:

IEEE COMPUTER ARCHITECTURE LETTERS | 2025, Vol. 24, No. 1

Keywords:

Graphics processing units; Transformers; Random access memory; Kernel; Computer architecture; Natural language processing; Computational modeling; Vectors; Inverters; Encoding; Processing-in-memory; transformer model; rotary positional embedding

DOI:

10.1109/LCA.2025.3535470

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The emergence of attention-based Transformer models, such as GPT, BERT, and LLaMA, has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A critical factor driving these improvements is the use of positional embeddings, which are crucial for capturing the contextual relationships between tokens in a sequence. However, current positional embedding methods face challenges, particularly in managing performance overhead for long sequences and effectively capturing relationships between adjacent tokens. In response, Rotary Positional Embedding (RoPE) has emerged as a method that effectively embeds positional information with high accuracy and without necessitating model retraining even with long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference. We observe that RoPE accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM achieves this by utilizing a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9x performance improvement and 914.1x energy savings compared to conventional systems.
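For orientation, the RoPE operation that the abstract refers to can be sketched in a few lines. The following is a minimal pure-Python illustration of the commonly used interleaved-pair RoPE formulation, not RoPIM's implementation. The function name `rope`, the `base` default of 10000, and the pairing of adjacent elements are assumptions made for this example. It shows the two ingredients the abstract mentions: a data rearrangement step (swap each pair and negate one element) followed by element-wise multiply-add with cosine and sine terms.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate one embedding vector x (even length) for token position pos.

    Illustrative sketch only: pairs (x[2k], x[2k+1]) are rotated by the
    angle pos * base**(-2k/d), where d is the vector length.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")

    # Data rearrangement: within each pair, swap the two elements and
    # negate the one that lands in the even slot.
    rotated = [0.0] * d
    for i in range(0, d, 2):
        rotated[i] = -x[i + 1]
        rotated[i + 1] = x[i]

    # Multiply-add: out = x * cos(theta) + rotated * sin(theta).
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i is already 2k here
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c + rotated[i] * s
        out[i + 1] = x[i + 1] * c + rotated[i + 1] * s
    return out


def dot(a, b):
    """Plain dot product, used to check the relative-position property."""
    return sum(p * q for p, q in zip(a, b))
```

Under this formulation, the dot product of a rotated query and a rotated key depends only on the difference of their positions, so `dot(rope(q, 3), rope(k, 1))` and `dot(rope(q, 5), rope(k, 3))` agree up to floating-point error.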


Pages: 41-44

Page count: 4
