Efficient memristor accelerator for transformer self-attention functionality

Cited: 0
|
Authors
Bettayeb, Meriem [1 ,2 ]
Halawani, Yasmin [3 ]
Khan, Muhammad Umair [1 ]
Saleh, Hani [1 ]
Mohammad, Baker [1 ]
Affiliations
[1] Khalifa Univ, Syst Onchip Lab, Comp & Informat Engn, Abu Dhabi, U Arab Emirates
[2] Abu Dhabi Univ, Coll Engn, Comp Sci & Informat Technol Dept, Abu Dhabi, U Arab Emirates
[3] Univ Dubai, Coll Engn & IT, Dubai, U Arab Emirates
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
ARCHITECTURE;
D O I
10.1038/s41598-024-75021-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07 ; 0710 ; 09 ;
Abstract
The adoption of transformer networks has experienced a notable surge in various AI applications. However, the increased computational complexity, stemming primarily from the self-attention mechanism, constrains the capability and speed of transformers much as convolution operations do for convolutional neural networks (CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, thereby restricting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network, leveraging memristor-based in-memory computing. The design targets the memory bottleneck associated with MatMul operations in the self-attention process, utilizing approximate analog computation and the highly parallel computation enabled by the memristor crossbar architecture. This approach resulted in a reduction of approximately 10 times in the number of multiply-accumulate (MAC) operations in transformer networks, while maintaining 95.47% accuracy on the MNIST dataset, as validated by a comprehensive circuit simulator employing NeuroSim 3.0. Simulation outcomes indicate an area utilization of 6895.7 μm², a latency of 15.52 seconds, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper represents a substantial stride towards a hardware-friendly transformer architecture for edge devices, poised to achieve real-time performance.
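For readers unfamiliar with where the MAC cost arises, the sketch below (not the authors' implementation) shows the two MatMul stages of single-head scaled dot-product self-attention and a toy mapping of one MatMul onto a crossbar-style quantized "conductance × voltage" summation. The shapes, the quantization-level count, and the crossbar model are illustrative assumptions only; real designs split weights into positive/negative arrays and include DAC/ADC stages.

```python
# Minimal sketch, assuming single-head attention and a simple linear crossbar model.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention; Q@K.T and A@V are the MAC-heavy MatMuls
    that an in-memory-computing crossbar would accelerate."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # projection MatMuls
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])    # MatMul 1: QK^T
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)           # softmax
    return A @ V                                 # MatMul 2: A·V

def crossbar_matmul(X, W, levels=16):
    """Toy analog-crossbar MatMul: weights rounded to a few conductance levels,
    outputs formed as Kirchhoff-style current sums (modeled digitally here)."""
    w_max = float(np.abs(W).max()) or 1.0
    G = np.round(W / w_max * (levels - 1)) / (levels - 1) * w_max  # quantized "conductances"
    return X @ G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 8                                  # toy sequence length and token dimension
    X = rng.standard_normal((n, d))
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)
    q_approx = crossbar_matmul(X, Wq)            # e.g. the Q projection done "in-crossbar"
    print("attention output shape:", out.shape)
    print("max |Q_exact - Q_crossbar|:", np.abs(X @ Wq - q_approx).max())
```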
Pages: 15
Related papers
50 records in total
  • [41] Re-Transformer: A Self-Attention Based Model for Machine Translation
    Liu, Huey-Ing
    Chen, Wei-Lin
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 3 - 10
  • [42] Wavelet Frequency Division Self-Attention Transformer Image Deraining Network
    Fang, Siyan
    Liu, Bin
    Computer Engineering and Applications, 2024, 60 (06) : 259 - 273
  • [43] MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
    Wang, Rui
    Ao, Junyi
    Zhou, Long
    Liu, Shujie
    Wei, Zhihua
    Ko, Tom
    Li, Qing
    Zhang, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6732 - 6736
  • [44] Bottleneck Transformer model with Channel Self-Attention for skin lesion classification
    Tada, Masato
    Han, Xian-Hua
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [45] A self-attention armed optronic transformer in imaging through scattering media
    Huang, Zicheng
    Shi, Mengyang
    Ma, Jiahui
    Gao, Yesheng
    Liu, Xingzhao
    OPTICS COMMUNICATIONS, 2024, 571
  • [46] CNN-TRANSFORMER WITH SELF-ATTENTION NETWORK FOR SOUND EVENT DETECTION
    Wakayama, Keigo
    Saito, Shoichiro
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 806 - 810
  • [47] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [48] MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation
    Sun, Yuran
    Dougherty, Alan William
    Zhang, Zhuoying
    Choi, Yi King
    Wu, Chuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14838 - 14847
  • [49] Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps
    Choi, Jeong-Hwan
    Yang, Joon-Young
    Chang, Joon-Hyuk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4580 - 4595
  • [50] DCTN: Dual-Branch Convolutional Transformer Network With Efficient Interactive Self-Attention for Hyperspectral Image Classification
    Zhou, Yunfei
    Huang, Xiaohui
    Yang, Xiaofei
    Peng, Jiangtao
    Ban, Yifang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 16