In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar

Cited by: 0
Authors
Cai, Jack [1 ]
Kaleem, Muhammad Ahsan [1 ]
Genov, Roman [1 ]
Azghadi, Mostafa Rahimi [2 ]
Amirsoleimani, Amirali [3 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
[2] James Cook Univ, Coll Sci & Engn, Townsville, Qld 4811, Australia
[3] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
Keywords
Memristor; In-Memory; Self-Attention; Neural Network Training; Backpropagation; Transformer;
DOI
10.1109/ISCAS58744.2024.10558182
Chinese Library Classification
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Transformers have emerged as the state-of-the-art architecture for natural language processing (NLP) and computer vision. However, they are inefficient in both conventional and in-memory computing architectures: doubling their sequence length quadruples their time and memory complexity due to the self-attention mechanism. Traditional approaches optimize self-attention with memory-efficient algorithms or approximate methods, such as locality-sensitive hashing (LSH) attention, which reduces time and memory complexity from O(L^2) to O(L log L). In this work, we propose a hardware-level solution that further improves the computational efficiency of LSH attention by utilizing in-memory computing with semi-passive memristor arrays. We demonstrate that LSH can be performed with low-resolution, energy-efficient 0T1R arrays performing stochastic memristive vector-matrix multiplication (VMM). Using circuit-level simulation, we show that our proposed method is feasible as a drop-in approximation in Large Language Models (LLMs) with no degradation in evaluation metrics. Our results set the foundation for future work on computing the entire transformer architecture in-memory.
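To make the bucketing step concrete, the sketch below shows angular LSH attention in plain NumPy: queries and keys are projected through a shared random matrix via a vector-matrix multiplication (the operation a memristor crossbar computes in-memory), and attention is restricted to tokens that fall in the same hash bucket. This is an illustrative assumption-laden sketch, not the authors' implementation; the function names (lsh_hash, lsh_attention) and parameters (n_buckets, seed) are hypothetical, and the dense mask is used only for clarity in place of the chunked O(L log L) scheduling.

    import numpy as np

    def lsh_hash(vectors, R):
        """Angular LSH: project with R (a VMM, the crossbar-friendly step)
        and take the argmax over [vR, -vR] as the bucket index."""
        projected = vectors @ R
        return np.argmax(np.concatenate([projected, -projected], axis=-1), axis=-1)

    def lsh_attention(q, k, v, n_buckets=8, seed=0):
        """Toy LSH attention: dense attention restricted to same-bucket pairs.
        A real O(L log L) scheme sorts tokens by bucket and attends within
        chunks; the dense mask here only demonstrates the bucketing idea."""
        d = q.shape[-1]
        # Shared random projection for queries and keys. In the paper's setting,
        # this VMM is the part computed stochastically by a low-resolution
        # 0T1R memristor crossbar (an assumption of this sketch).
        R = np.random.default_rng(seed).standard_normal((d, n_buckets // 2))
        bq, bk = lsh_hash(q, R), lsh_hash(k, R)
        scores = (q @ k.T) / np.sqrt(d)
        scores = np.where(bq[:, None] != bk[None, :], -1e9, scores)  # mask cross-bucket pairs
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

Because the random projection tolerates noise (only the argmax of the projected vector matters), low-resolution, stochastic analog VMM is sufficient for the hashing step, which is what makes a passive crossbar a natural fit.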
Pages: 5