Efficient memristor accelerator for transformer self-attention functionality

Cited: 0
|
Authors
Bettayeb, Meriem [1 ,2 ]
Halawani, Yasmin [3 ]
Khan, Muhammad Umair [1 ]
Saleh, Hani [1 ]
Mohammad, Baker [1 ]
Affiliations
[1] Khalifa Univ, Syst Onchip Lab, Comp & Informat Engn, Abu Dhabi, U Arab Emirates
[2] Abu Dhabi Univ, Coll Engn, Comp Sci & Informat Technol Dept, Abu Dhabi, U Arab Emirates
[3] Univ Dubai, Coll Engn & IT, Dubai, U Arab Emirates
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
ARCHITECTURE;
D O I
10.1038/s41598-024-75021-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07 ; 0710 ; 09 ;
Abstract
The adoption of transformer networks has experienced a notable surge in various AI applications. However, the increased computational complexity, stemming primarily from the self-attention mechanism, constrains the capability and speed of transformers much as convolution operations do for convolutional neural networks (CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, thereby restricting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network, leveraging memristor-based in-memory computing. The design targets the memory bottleneck associated with MatMul operations in the self-attention process, utilizing approximate analog computation and the highly parallel computation enabled by the memristor crossbar architecture. This approach resulted in a reduction of approximately 10 times in the number of multiply-accumulate (MAC) operations in transformer networks, while maintaining 95.47% accuracy on the MNIST dataset, as validated by a comprehensive circuit simulator employing NeuroSim 3.0. Simulation outcomes indicate an area utilization of 6895.7 μm², a latency of 15.52 seconds, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper represents a substantial stride towards a hardware-friendly transformer architecture for edge devices, poised to achieve real-time performance.
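For readers unfamiliar with where the MAC cost arises, the sketch below (not the authors' implementation) shows the two MatMul stages of single-head scaled dot-product self-attention and a toy mapping of one MatMul onto a crossbar-style quantized "conductance × voltage" summation. The shapes, the quantization-level count, and the crossbar model are illustrative assumptions only; real designs split weights into positive/negative arrays and include DAC/ADC stages.

```python
# Minimal sketch, assuming single-head attention and a simple linear crossbar model.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention; Q@K.T and A@V are the MAC-heavy MatMuls
    that an in-memory-computing crossbar would accelerate."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # projection MatMuls
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])    # MatMul 1: QK^T
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)           # softmax
    return A @ V                                 # MatMul 2: A·V

def crossbar_matmul(X, W, levels=16):
    """Toy analog-crossbar MatMul: weights rounded to a few conductance levels,
    outputs formed as Kirchhoff-style current sums (modeled digitally here)."""
    w_max = float(np.abs(W).max()) or 1.0
    G = np.round(W / w_max * (levels - 1)) / (levels - 1) * w_max  # quantized "conductances"
    return X @ G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 8                                  # toy sequence length and token dimension
    X = rng.standard_normal((n, d))
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)
    q_approx = crossbar_matmul(X, Wq)            # e.g. the Q projection done "in-crossbar"
    print("attention output shape:", out.shape)
    print("max |Q_exact - Q_crossbar|:", np.abs(X @ Wq - q_approx).max())
```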
Pages: 15
Related papers
50 records in total
  • [41] Re-Transformer: A Self-Attention Based Model for Machine Translation
    Liu, Huey-Ing
    Chen, Wei-Lin
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 3 - 10
  • [42] Wavelet Frequency Division Self-Attention Transformer Image Deraining Network
    Fang, Siyan
    Liu, Bin
    Computer Engineering and Applications, 2024, 60 (06) : 259 - 273
  • [43] MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
    Wang, Rui
    Ao, Junyi
    Zhou, Long
    Liu, Shujie
    Wei, Zhihua
    Ko, Tom
    Li, Qing
    Zhang, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6732 - 6736
  • [44] Bottleneck Transformer model with Channel Self-Attention for skin lesion classification
    Tada, Masato
    Han, Xian-Hua
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [45] A self-attention armed optronic transformer in imaging through scattering media
    Huang, Zicheng
    Shi, Mengyang
    Ma, Jiahui
    Gao, Yesheng
    Liu, Xingzhao
    OPTICS COMMUNICATIONS, 2024, 571
  • [46] CNN-TRANSFORMER WITH SELF-ATTENTION NETWORK FOR SOUND EVENT DETECTION
    Wakayama, Keigo
    Saito, Shoichiro
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 806 - 810
  • [47] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [48] MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation
    Sun, Yuran
    Dougherty, Alan William
    Zhang, Zhuoying
    Choi, Yi King
    Wu, Chuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14838 - 14847
  • [49] Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps
    Choi, Jeong-Hwan
    Yang, Joon-Young
    Chang, Joon-Hyuk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4580 - 4595
  • [50] DCTN: Dual-Branch Convolutional Transformer Network With Efficient Interactive Self-Attention for Hyperspectral Image Classification
    Zhou, Yunfei
    Huang, Xiaohui
    Yang, Xiaofei
    Peng, Jiangtao
    Ban, Yifang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 16