In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar

Cited: 0
Authors
Cai, Jack [1 ]
Kaleem, Muhammad Ahsan [1 ]
Genov, Roman [1 ]
Azghadi, Mostafa Rahimi [2 ]
Amirsoleimani, Amirali [3 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
[2] James Cook Univ, Coll Sci & Engn, Townsville, Qld 4811, Australia
[3] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
Keywords
Memristor; In-Memory; Self-Attention; Neural Network Training; Backpropagation; Transformer;
DOI
10.1109/ISCAS58744.2024.10558182
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Classification Code
081203; 0835;
Abstract
Transformers have emerged as the state-of-the-art architecture for natural language processing (NLP) and computer vision. However, they are inefficient on both conventional and in-memory computing architectures: because of their self-attention mechanism, doubling the sequence length quadruples their time and memory complexity. Traditional approaches optimize self-attention with memory-efficient algorithms or approximate methods such as locality-sensitive hashing (LSH) attention, which reduces time and memory complexity from O(L²) to O(L log L). In this work, we propose a hardware-level solution that further improves the computational efficiency of LSH attention by utilizing in-memory computing with semi-passive memristor arrays. We demonstrate that LSH can be performed with low-resolution, energy-efficient 0T1R arrays performing stochastic memristive vector-matrix multiplication (VMM). Using circuit-level simulation, we show that our proposed method is feasible as a drop-in approximation in Large Language Models (LLMs) with no degradation in evaluation metrics. Our results lay the foundation for future work on computing the entire transformer architecture in-memory.
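To make the abstract's core idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of Reformer-style angular-LSH attention: the random-projection matrix multiply that assigns tokens to buckets is the vector-matrix multiplication the paper proposes to execute on a low-resolution 0T1R memristor crossbar, here emulated in software with a crude weight quantization standing in for the low-resolution devices. All function names and parameters below are illustrative assumptions.

```python
# Minimal sketch, assuming Reformer-style shared-QK angular LSH; not from the paper.
import numpy as np

def lsh_bucket_ids(x, n_buckets, rng, weight_bits=None):
    """Hash each row of x (L, d) into one of n_buckets angular-LSH buckets."""
    d = x.shape[-1]
    # Random projection matrix; n_buckets // 2 hyperplanes give n_buckets half-spaces.
    r = rng.standard_normal((d, n_buckets // 2))
    if weight_bits is not None:
        # Crude stand-in for a low-resolution crossbar: quantize the projection weights.
        scale = np.abs(r).max() / (2 ** (weight_bits - 1) - 1)
        r = np.round(r / scale) * scale
    proj = x @ r                                   # the VMM a crossbar would perform
    proj = np.concatenate([proj, -proj], axis=-1)  # angular-LSH trick
    return np.argmax(proj, axis=-1)                # bucket id per token

def lsh_attention(q, k, v, n_buckets=8, weight_bits=4, seed=0):
    """Approximate softmax attention by restricting each query to keys in its bucket."""
    rng = np.random.default_rng(seed)
    buckets = lsh_bucket_ids(q, n_buckets, rng, weight_bits)  # assumes shared QK (q == k)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]            # tokens sharing a bucket
        scores = q[idx] @ k[idx].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

# Example: 128 tokens, 64-dim heads; each query attends only within its bucket.
L, d = 128, 64
rng = np.random.default_rng(1)
q = k = rng.standard_normal((L, d))   # shared QK projection, as in Reformer
v = rng.standard_normal((L, d))
print(lsh_attention(q, k, v).shape)   # (128, 64)
```

Because each query only attends within its bucket, the per-query cost drops from L comparisons to roughly L / n_buckets, which is the source of the O(L log L) scaling the abstract cites; in the proposed hardware, the bucketing projection itself would be computed by the stochastic memristive VMM rather than in floating point.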
Pages: 5
Related Papers
50 records in total
  • [31] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
    Paiva, Pedro V. V.
    Ramos, Josue J. G.
    Gavrilova, Marina
    Carvalho, Marco A. G.
    COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103 : 206 - 228
  • [32] Local self-attention in transformer for visual question answering
    Shen, Xiang
    Han, Dezhi
    Guo, Zihan
    Chen, Chongqing
    Hua, Jie
    Luo, Gaofeng
    APPLIED INTELLIGENCE, 2023, 53 (13) : 16706 - 16723
  • [34] Vision Transformer Based on Reconfigurable Gaussian Self-attention
    Zhao L.
    Zhou J.-K.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (09) : 1976 - 1988
  • [35] Tree Transformer: Integrating Tree Structures into Self-Attention
    Wang, Yau-Shian
    Lee, Hung-Yi
    Chen, Yun-Nung
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1061 - 1070
  • [36] Maximization of Crossbar Array Memory Using Fundamental Memristor Theory
    Eshraghian, Jason K.
    Cho, Kyoung-Rok
    Iu, Herbert H. C.
    Fernando, Tyrone
    Iannella, Nicolangelo
    Kang, Sung-Mo
    Eshraghian, Kamran
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2017, 64 (12) : 1402 - 1406
  • [37] A lightweight transformer with linear self-attention for defect recognition
    Zhai, Yuwen
    Li, Xinyu
    Gao, Liang
    Gao, Yiping
    ELECTRONICS LETTERS, 2024, 60 (17)
  • [38] An efficient parallel self-attention transformer for CSI feedback
    Liu, Ziang
    Song, Tianyu
    Zhao, Ruohan
    Jin, Jiyu
    Jin, Guiyue
    PHYSICAL COMMUNICATION, 2024, 66
  • [39] Transformer Self-Attention Network for Forecasting Mortality Rates
    Roshani, Amin
    Izadi, Muhyiddin
    Khaledi, Baha-Eldin
    JIRSS-JOURNAL OF THE IRANIAN STATISTICAL SOCIETY, 2022, 21 (01): : 81 - 103
  • [40] Keyword Transformer: A Self-Attention Model for Keyword Spotting
    Berg, Axel
    O'Connor, Mark
    Cruz, Miguel Tairum
    INTERSPEECH 2021, 2021, : 4249 - 4253