In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar

Cited by: 0
Authors
Cai, Jack [1 ]
Kaleem, Muhammad Ahsan [1 ]
Genov, Roman [1 ]
Azghadi, Mostafa Rahimi [2 ]
Amirsoleimani, Amirali [3 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
[2] James Cook Univ, Coll Sci & Engn, Townsville, Qld 4811, Australia
[3] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
Keywords
Memristor; In-Memory; Self-Attention; Neural Network Training; Backpropagation; Transformer;
DOI
10.1109/ISCAS58744.2024.10558182
Chinese Library Classification
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Transformers have emerged as the state-of-the-art architecture for natural language processing (NLP) and computer vision. However, they are inefficient in both conventional and in-memory computing architectures: doubling their sequence length quadruples their time and memory complexity due to the self-attention mechanism. Traditional approaches optimize self-attention with memory-efficient algorithms or approximate methods, such as locality-sensitive hashing (LSH) attention, which reduces time and memory complexity from O(L^2) to O(L log L). In this work, we propose a hardware-level solution that further improves the computational efficiency of LSH attention by utilizing in-memory computing with semi-passive memristor arrays. We demonstrate that LSH can be performed with low-resolution, energy-efficient 0T1R arrays performing stochastic memristive vector-matrix multiplication (VMM). Using circuit-level simulation, we show that our proposed method is feasible as a drop-in approximation in Large Language Models (LLMs) with no degradation in evaluation metrics. Our results set the foundation for future work on computing the entire transformer architecture in-memory.
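To make the bucketing step concrete, the sketch below shows angular LSH attention in plain NumPy: queries and keys are projected through a shared random matrix via a vector-matrix multiplication (the operation a memristor crossbar computes in-memory), and attention is restricted to tokens that fall in the same hash bucket. This is an illustrative assumption-laden sketch, not the authors' implementation; the function names (lsh_hash, lsh_attention) and parameters (n_buckets, seed) are hypothetical, and the dense mask is used only for clarity in place of the chunked O(L log L) scheduling.

    import numpy as np

    def lsh_hash(vectors, R):
        """Angular LSH: project with R (a VMM, the crossbar-friendly step)
        and take the argmax over [vR, -vR] as the bucket index."""
        projected = vectors @ R
        return np.argmax(np.concatenate([projected, -projected], axis=-1), axis=-1)

    def lsh_attention(q, k, v, n_buckets=8, seed=0):
        """Toy LSH attention: dense attention restricted to same-bucket pairs.
        A real O(L log L) scheme sorts tokens by bucket and attends within
        chunks; the dense mask here only demonstrates the bucketing idea."""
        d = q.shape[-1]
        # Shared random projection for queries and keys. In the paper's setting,
        # this VMM is the part computed stochastically by a low-resolution
        # 0T1R memristor crossbar (an assumption of this sketch).
        R = np.random.default_rng(seed).standard_normal((d, n_buckets // 2))
        bq, bk = lsh_hash(q, R), lsh_hash(k, R)
        scores = (q @ k.T) / np.sqrt(d)
        scores = np.where(bq[:, None] != bk[None, :], -1e9, scores)  # mask cross-bucket pairs
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

Because the random projection tolerates noise (only the argmax of the projected vector matters), low-resolution, stochastic analog VMM is sufficient for the hashing step, which is what makes a passive crossbar a natural fit.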
Pages: 5