On The Computational Complexity of Self-Attention

Cited by: 0
Authors
Keles, Feyza Duman
Wijewardena, Pruthuvi Mahesakya
Hegde, Chinmay
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time and space complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of having an exponential dependence on the polynomial order.
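
To make the two regimes in the abstract concrete, below is a minimal illustrative sketch, an assumption for exposition rather than the authors' code from the paper: exact softmax self-attention materializes an n x n score matrix and therefore costs O(n^2 d) time, while a degree-p Taylor truncation exp(q.k) ~ sum_{j=0..p} (q.k)^j / j! admits a feature map phi with phi(q).phi(k) equal to that truncated sum, so attention can be rearranged as phi(Q) (phi(K)^T V) and computed in time linear in n; the catch is that the feature dimension grows like d^p, i.e. exponentially in the polynomial order. The function names below (exact_attention, taylor_features, taylor_attention) and the NumPy implementation are hypothetical.

# Illustrative sketch (assumption: NOT the authors' implementation).
# Contrasts O(n^2 d) exact softmax attention with a linear-in-n
# approximation built from a degree-p Taylor expansion of exp(q.k);
# the feature dimension D grows like d^p, i.e. exponentially in p.
import math
from itertools import product
import numpy as np

def exact_attention(Q, K, V):
    """Exact softmax self-attention: materializes the (n, n) score matrix."""
    scores = Q @ K.T
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def taylor_features(X, p):
    """Monomial feature map phi with phi(q) . phi(k) = sum_{j<=p} (q.k)^j / j!."""
    n, d = X.shape
    feats = [np.ones((n, 1))]                       # degree-0 term
    for j in range(1, p + 1):
        idx = list(product(range(d), repeat=j))     # all d^j monomials of degree j
        mono = np.stack([np.prod(X[:, list(t)], axis=1) for t in idx], axis=1)
        feats.append(mono / math.sqrt(math.factorial(j)))
    return np.concatenate(feats, axis=1)            # shape (n, D), D ~ d^p

def taylor_attention(Q, K, V, p=3):
    """Linear-in-n approximation: phi(Q) (phi(K)^T V), normalized row-wise."""
    phi_q, phi_k = taylor_features(Q, p), taylor_features(K, p)
    kv = phi_k.T @ V                                # (D, d_v), one pass over K and V
    z = phi_k.sum(axis=0)                           # (D,) softmax normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 64, 4
Q, K, V = rng.normal(scale=0.3, size=(3, n, d))
err = np.abs(exact_attention(Q, K, V) - taylor_attention(Q, K, V, p=3)).max()
print(f"max approximation error (p=3): {err:.2e}")

On toy inputs with small entries, running the snippet should show a small maximum error already at p = 3, while the feature dimension for d = 4 is 1 + 4 + 16 + 64 = 85, illustrating the linear-in-n versus exponential-in-p trade-off stated in the abstract.
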
Pages: 597-619
Page count: 23
Related Papers
50 records in total
  • [1] Linear Complexity Randomized Self-attention Mechanism
    Zheng, Lin
    Wang, Chong
    Kong, Lingpeng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
    Wu, Yifan
    Kan, Shichao
    Zeng, Min
    Li, Min
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023 : 4433 - 4441
  • [3] Linear Complexity Self-Attention With 3rd Order Polynomials
    Babiloni, Francesca
    Marras, Ioannis
    Deng, Jiankang
    Kokkinos, Filippos
    Maggioni, Matteo
    Chrysos, Grigorios
    Torr, Philip
    Zafeiriou, Stefanos
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12726 - 12737
  • [4] COMPARISON OF LOW COMPLEXITY SELF-ATTENTION MECHANISMS FOR ACOUSTIC EVENT DETECTION
    Komatsu, Tatsuya
    Scheibler, Robin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021 : 1139 - 1143
  • [5] SHYNESS AND SELF-ATTENTION
    CROZIER, WR
    BULLETIN OF THE BRITISH PSYCHOLOGICAL SOCIETY, 1983, 36 (FEB) : A5 - A5
  • [6] Attention and self-attention in random forests
    Utkin, Lev V.
    Konstantinov, Andrei V.
    Kirpichenko, Stanislav R.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2023, 12 (03) : 257 - 273
  • [7] SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
    Parcollet, Titouan
    van Dalen, Rogier
    Zhang, Shucong
    Bhattacharya, Sourav
    INTERSPEECH 2024, 2024 : 3460 - 3464
  • [8] Reconstructing computational spectra using deep learning's self-attention method
    Wu, Hao
    Wu, Hui
    Su, Xinyu
    Wu, Jingjun
    Liu, Shuangli
    OPTICA APPLICATA, 2024, 54 (03) : 383 - 394
  • [9] Self-Attention for Cyberbullying Detection
    Pradhan, Ankit
    Yatam, Venu Madhav
    Bera, Padmalochan
    2020 INTERNATIONAL CONFERENCE ON CYBER SITUATIONAL AWARENESS, DATA ANALYTICS AND ASSESSMENT (CYBER SA 2020), 2020