On The Computational Complexity of Self-Attention

被引:0
作者
Keles, Feyza Duman
Wijewardena, Pruthuvi Mahesakya
Hegde, Chinmay
机构
来源
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201 | 2023年 / 201卷
关键词
ALGORITHM;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Transformer architectures have led to remarkable progress in many state-of-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time- and space-complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear-time, at the cost of having an exponential dependence on the polynomial order.
引用
收藏
页码:597 / 619
页数:23
相关论文
共 50 条
[41]   Estimation of Daily Arctic Winter Sea Ice Thickness from Thermodynamic Parameters Using a Self-Attention Convolutional Neural Network [J].
Liang, Zeyu ;
Ji, Qing ;
Pang, Xiaoping ;
Fan, Pei ;
Yao, Xuedong ;
Chen, Yizhuo ;
Chen, Ying ;
Yan, Zhongnan .
REMOTE SENSING, 2023, 15 (07)
[42]   Reference Crop Evapotranspiration Prediction Based on Gated Recurrent Unit with Quantum Inspired Multi-head Self-attention Mechanism [J].
Gao, Zehai ;
Yang, Dongzhe ;
Li, Baojun ;
Gao, Zijun ;
Li, Chengcheng .
WATER RESOURCES MANAGEMENT, 2025, 39 (03) :1481-1501
[43]   Underwater Equipotential Line Tracking Based on Self-Attention Embedded Multiagent Reinforcement Learning Toward AUV-Based ITS [J].
Lin, Chuan ;
Han, Guangjie ;
Tao, Qiuzi ;
Liu, Li ;
Shah, Syed Bilal Hussain ;
Zhang, Tongwei ;
Li, Zhenglin .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (08) :8580-8591
[44]   A novel hybrid forecasting approach for NOx emission of coal-fired boiler combined with CEEMDAN and self-attention improved by LSTM [J].
Yan, Hua ;
Chen, Yunchi ;
Yang, Bin ;
Yang, Yang ;
Ni, Hu ;
Wang, Ying .
ASIA-PACIFIC JOURNAL OF CHEMICAL ENGINEERING, 2024, 19 (04)
[45]   On the Computational Complexity of Bongartz's Algorithm [J].
Mroz, Andrzej .
FUNDAMENTA INFORMATICAE, 2013, 123 (03) :317-329
[46]   On the Computational Complexity of Compressed Power Series [J].
E. A. Karatsuba .
Mathematical Notes, 2023, 114 :92-98
[47]   Computational Complexity of Jumping Block Puzzles [J].
Kanzaki, Masaaki ;
Otachi, Yota ;
Uehara, Ryuhei .
COMPUTING AND COMBINATORICS (COCOON 2021), 2021, 13025 :655-667
[48]   On the Computational Complexity of Compressed Power Series [J].
Karatsuba, E. A. .
MATHEMATICAL NOTES, 2023, 114 (1-2) :92-98
[49]   Computing equilibria: a computational complexity perspective [J].
Roughgarden, Tim .
ECONOMIC THEORY, 2010, 42 (01) :193-236
[50]   Computational complexity of jumping block puzzles [J].
Kanzaki, Masaaki ;
Otachi, Yota ;
Viglietta, Giovanni ;
Uehara, Ryuhei .
THEORETICAL COMPUTER SCIENCE, 2024, 983