In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar

Cited: 0
Authors
Cai, Jack [1 ]
Kaleem, Muhammad Ahsan [1 ]
Genov, Roman [1 ]
Azghadi, Mostafa Rahimi [2 ]
Amirsoleimani, Amirali [3 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
[2] James Cook Univ, Coll Sci & Engn, Townsville, Qld 4811, Australia
[3] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
Keywords
Memristor; In-Memory; Self-Attention; Neural Network Training; Backpropagation; Transformer;
DOI
10.1109/ISCAS58744.2024.10558182
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Codes
081203 ; 0835 ;
Abstract
Transformers have emerged as the state-of-the-art architecture for natural language processing (NLP) and computer vision. However, they are inefficient in both conventional and in-memory computing architectures: because of the self-attention mechanism, doubling the sequence length quadruples their time and memory cost. Traditional approaches optimize self-attention with memory-efficient or approximate algorithms, such as locality-sensitive hashing (LSH) attention, which reduces time and memory complexity from O(L²) to O(L log L). In this work, we propose a hardware-level solution that further improves the computational efficiency of LSH attention by using in-memory computing with semi-passive memristor arrays. We demonstrate that LSH can be performed with low-resolution, energy-efficient 0T1R arrays carrying out stochastic memristive vector-matrix multiplication (VMM). Using circuit-level simulation, we show that the proposed method is feasible as a drop-in approximation in Large Language Models (LLMs) with no degradation in evaluation metrics. Our results lay the foundation for future work on computing the entire transformer architecture in-memory.
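To make the complexity reduction concrete, the sketch below (a minimal NumPy illustration, not the paper's circuit-level method) phrases the LSH step as a single vector-matrix multiplication against a random projection matrix that has been stochastically rounded to a few levels, mirroring the role the abstract assigns to low-resolution 0T1R arrays; tokens then attend only within their hash bucket. The function names (quantize_low_resolution, lsh_buckets, lsh_attention), the 2-bit rounding, and the shared query/key convention are illustrative assumptions.

```python
# Minimal NumPy sketch of LSH attention driven by a single vector-matrix
# multiplication (VMM) with a low-resolution, stochastically rounded projection
# matrix. Illustration of the general technique only; the function names, the
# 2-bit rounding, and the shared query/key convention are assumptions, not the
# paper's circuit-level 0T1R implementation.
import numpy as np


def quantize_low_resolution(w, bits=2, rng=None):
    """Stochastically round w onto 2**bits - 1 levels, loosely mimicking the
    few conductance states available in a low-resolution memristor array."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scaled = (w - lo) / (hi - lo) * levels
    floor = np.floor(scaled)
    q = floor + (rng.random(w.shape) < (scaled - floor))  # stochastic rounding
    return q / levels * (hi - lo) + lo


def lsh_buckets(x, n_hashes=4, bits=2, rng=None):
    """Assign each row of x to a bucket via sign random projections.
    The x @ proj product is the step that maps onto an in-memory VMM."""
    rng = np.random.default_rng(0) if rng is None else rng
    proj = rng.standard_normal((x.shape[-1], n_hashes))       # random hyperplanes
    proj = quantize_low_resolution(proj, bits, rng)           # low-resolution weights
    signs = (x @ proj) > 0                                     # VMM + sign comparator
    return (signs * (2 ** np.arange(n_hashes))).sum(axis=-1)  # pack bits -> bucket id


def lsh_attention(q, k, v, n_hashes=4):
    """Approximate attention: tokens attend only within their LSH bucket,
    avoiding the full L x L score matrix of exact self-attention."""
    buckets = lsh_buckets(q, n_hashes)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = q[idx] @ k[idx].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out


# Usage: a 128-token sequence with 64-dimensional embeddings, shared Q/K
rng = np.random.default_rng(1)
q = k = rng.standard_normal((128, 64))
v = rng.standard_normal((128, 64))
print(lsh_attention(q, k, v).shape)  # -> (128, 64)
```

Because each bucket holds roughly L / 2^n_hashes tokens, the per-bucket score matrices stay small, which is the source of the sub-quadratic saving relative to the full L x L attention matrix.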
Pages: 5
Related Papers
50 records in total
  • [41] A personalized paper recommendation method based on knowledge graph and transformer encoder with a self-attention mechanism
    Gao, Li
    Lan, Yu
    Yu, Zhen
    Zhu, Jian-min
    APPLIED INTELLIGENCE, 2023, 53 (24) : 29991 - 30008
  • [43] Unsupervised Pansharpening Based on Self-Attention Mechanism
    Qu, Ying
    Baghbaderani, Razieh Kaviani
    Qi, Hairong
    Kwan, Chiman
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (04) : 3192 - 3208
  • [44] Linear Complexity Randomized Self-attention Mechanism
    Zheng, Lin
    Wang, Chong
    Kong, Lingpeng
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [45] Keyphrase Generation Based on Self-Attention Mechanism
    Yang, Kehua
    Wang, Yaodong
    Zhang, Wei
    Yao, Jiqing
    Le, Yuquan
CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 61 (02) : 569 - 581
  • [46] Self-Attention Mechanism in GANs for Molecule Generation
    Chinnareddy, Sandeep
    Grandhi, Pranav
    Narayan, Apurva
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 57 - 60
  • [48] Self-rectifying resistive memory in passive crossbar arrays
    Jeon, Kanghyeok
    Kim, Jeeson
    Ryu, Jin Joo
    Yoo, Seung-Jong
    Song, Choongseok
    Yang, Min Kyu
    Jeong, Doo Seok
    Kim, Gun Hwan
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [49] Self-attention mechanism for distributed compressive sensing
    Shu, Feng
    Zhang, Linghua
    Ding, Yin
    Cheng, Qin
    Wang, Xu
    ELECTRONICS LETTERS, 2022, 58 (10) : 405 - 407
  • [50] Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
    Leem, Saebom
    Seo, Hyunseok
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 2956 - 2964