On The Computational Complexity of Self-Attention

Times Cited: 0
Authors
Keles, Feyza Duman
Wijewardena, Pruthuvi Mahesakya
Hegde, Chinmay
Affiliations
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, 2023, Vol. 201
Keywords
ALGORITHM
DOI
Not available
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time and space complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of an exponential dependence on the polynomial order.
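The contrast described in the abstract can be made concrete with a small sketch: exact softmax self-attention costs O(n²) in the sequence length n, while truncating the exponential to a first-order Taylor series lets the same computation be rearranged into O(n) matrix products. This is an illustrative NumPy sketch of that general idea, not the paper's exact construction; the function names are our own.

```python
import numpy as np

def attention_quadratic(Q, K, V):
    """Exact softmax self-attention: O(n^2) time and memory in sequence length n."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    W = np.exp(S - S.max(axis=1, keepdims=True))  # shift rows for stability; softmax is shift-invariant
    return (W / W.sum(axis=1, keepdims=True)) @ V

def attention_taylor1(Q, K, V):
    """First-order Taylor approximation exp(s) ~ 1 + s, rearranged to run in O(n) time.

    Weight of key j for query i: 1 + q_i.k_j / sqrt(d). Summing over j factors out
    q_i, so the O(n^2) pairwise score matrix is never formed. (With higher-order
    terms the hidden dimension of the factored products grows exponentially in the
    polynomial order, matching the trade-off stated in the abstract.)
    Caveat: 1 + s can be negative for strongly negative scores.
    """
    n, d = K.shape
    Qs = Q / np.sqrt(d)
    KV = K.T @ V          # (d, d_v), computed once: O(n * d * d_v)
    ksum = K.sum(axis=0)  # (d,)
    vsum = V.sum(axis=0)  # (d_v,)
    num = vsum[None, :] + Qs @ KV   # numerator:  sum_j (1 + q_i.k_j) v_j
    den = n + Qs @ ksum             # denominator: sum_j (1 + q_i.k_j)
    return num / den[:, None]
```

For inputs whose scores are close to zero, the two outputs agree closely, since the first-order truncation error of exp is quadratic in the score.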
Pages: 597-619
Number of pages: 23
Related Papers
50 records in total
[31] Integrated Multi-Head Self-Attention Transformer model for electricity demand prediction incorporating local climate variables [J]. Ghimire, Sujan; Nguyen-Huy, Thong; AL-Musaylh, Mohanad S.; Deo, Ravinesh C.; Casillas-Perez, David; Salcedo-Sanz, Sancho. ENERGY AND AI, 2023, 14.
[32] Uncertain Interruptibility Multiobjective Flexible Job Shop via Deep Reinforcement Learning Based on Heterogeneous Graph Self-Attention [J]. Wang, Zunxun; Li, Junqing; Chen, Xiaolong; Duan, Peiyong; Li, Jiake. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025.
[33] Intelligent Task Allocation and Planning for Unmanned Surface Vehicle (USV) Using Self-Attention Mechanism and Locking Sweeping Method [J]. Luo, Jing; Zhang, Yuhang; Zhuang, Jiayuan; Su, Yumin. JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (01).
[34] A novel intelligent fault diagnosis method of bearing based on multi-head self-attention convolutional neural network [J]. Ren, Hang; Liu, Shaogang; Qiu, Bo; Guo, Hong; Zhao, Dan. AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2024, 38.
[35] STA-GCN: Spatial-Temporal Self-Attention Graph Convolutional Networks for Traffic-Flow Prediction [J]. Chang, Zhihong; Liu, Chunsheng; Jia, Jianmin. APPLIED SCIENCES-BASEL, 2023, 13 (11).
[36] RANDOMNESS - A COMPUTATIONAL COMPLEXITY PERSPECTIVE [J]. Wigderson, A. XVIITH INTERNATIONAL CONGRESS ON MATHEMATICAL PHYSICS, 2014: 254-263.
[37] The Computational Complexity of Linear Optics [J]. Aaronson, Scott; Arkhipov, Alex. STOC 11: PROCEEDINGS OF THE 43RD ACM SYMPOSIUM ON THEORY OF COMPUTING, 2011: 333-342.
[38] The Computational Complexity of Ball Permutations [J]. Aaronson, Scott; Bouland, Adam; Kuperberg, Greg; Mehraban, Saeed. STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2017: 317-327.
[39] Enhancing MIMO-OFDM channel estimation in 5G and beyond with conditional self-attention generative adversarial networks [J]. Alqahtani, Abdullah Saleh; Pandiaraj, Saravanan; Alshmrany, Sami; Almalki, Ali Jaber; Prabhu, Sandeep; Kumar, U. Arun. WIRELESS NETWORKS, 2024, 30 (03): 1719-1736.
[40] Multi-objective cluster head using self-attention based progressive generative adversarial network for secured data aggregation [J]. Sindhuja, M.; Vidhya, S.; Jayasri, B. S.; Shajin, Francis H. AD HOC NETWORKS, 2023, 140.