On The Computational Complexity of Self-Attention

Cited by: 0
Authors
Keles, Feyza Duman
Wijewardena, Pruthuvi Mahesakya
Hegde, Chinmay
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time and space complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of having an exponential dependence on the polynomial order.
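
To make the two regimes in the abstract concrete, below is a minimal illustrative sketch, an assumption for exposition rather than the authors' code from the paper: exact softmax self-attention materializes an n x n score matrix and therefore costs O(n^2 d) time, while a degree-p Taylor truncation exp(q.k) ~ sum_{j=0..p} (q.k)^j / j! admits a feature map phi with phi(q).phi(k) equal to that truncated sum, so attention can be rearranged as phi(Q) (phi(K)^T V) and computed in time linear in n; the catch is that the feature dimension grows like d^p, i.e. exponentially in the polynomial order. The function names below (exact_attention, taylor_features, taylor_attention) and the NumPy implementation are hypothetical.

# Illustrative sketch (assumption: NOT the authors' implementation).
# Contrasts O(n^2 d) exact softmax attention with a linear-in-n
# approximation built from a degree-p Taylor expansion of exp(q.k);
# the feature dimension D grows like d^p, i.e. exponentially in p.
import math
from itertools import product
import numpy as np

def exact_attention(Q, K, V):
    """Exact softmax self-attention: materializes the (n, n) score matrix."""
    scores = Q @ K.T
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def taylor_features(X, p):
    """Monomial feature map phi with phi(q) . phi(k) = sum_{j<=p} (q.k)^j / j!."""
    n, d = X.shape
    feats = [np.ones((n, 1))]                       # degree-0 term
    for j in range(1, p + 1):
        idx = list(product(range(d), repeat=j))     # all d^j monomials of degree j
        mono = np.stack([np.prod(X[:, list(t)], axis=1) for t in idx], axis=1)
        feats.append(mono / math.sqrt(math.factorial(j)))
    return np.concatenate(feats, axis=1)            # shape (n, D), D ~ d^p

def taylor_attention(Q, K, V, p=3):
    """Linear-in-n approximation: phi(Q) (phi(K)^T V), normalized row-wise."""
    phi_q, phi_k = taylor_features(Q, p), taylor_features(K, p)
    kv = phi_k.T @ V                                # (D, d_v), one pass over K and V
    z = phi_k.sum(axis=0)                           # (D,) softmax normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 64, 4
Q, K, V = rng.normal(scale=0.3, size=(3, n, d))
err = np.abs(exact_attention(Q, K, V) - taylor_attention(Q, K, V, p=3)).max()
print(f"max approximation error (p=3): {err:.2e}")

On toy inputs with small entries, running the snippet should show a small maximum error already at p = 3, while the feature dimension for d = 4 is 1 + 4 + 16 + 64 = 85, illustrating the linear-in-n versus exponential-in-p trade-off stated in the abstract.
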
Pages: 597-619
Page count: 23
Related Papers
50 records in total
  • [1] Linear Complexity Randomized Self-attention Mechanism
    Zheng, Lin
    Wang, Chong
    Kong, Lingpeng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
    Wu, Yifan
    Kan, Shichao
    Zeng, Min
    Li, Min
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023 : 4433 - 4441
  • [3] Linear Complexity Self-Attention With 3rd Order Polynomials
    Babiloni, Francesca
    Marras, Ioannis
    Deng, Jiankang
    Kokkinos, Filippos
    Maggioni, Matteo
    Chrysos, Grigorios
    Torr, Philip
    Zafeiriou, Stefanos
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12726 - 12737
  • [4] COMPARISON OF LOW COMPLEXITY SELF-ATTENTION MECHANISMS FOR ACOUSTIC EVENT DETECTION
    Komatsu, Tatsuya
    Scheibler, Robin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021 : 1139 - 1143
  • [5] SHYNESS AND SELF-ATTENTION
    CROZIER, WR
    BULLETIN OF THE BRITISH PSYCHOLOGICAL SOCIETY, 1983, 36 (FEB) : A5 - A5
  • [6] Attention and self-attention in random forests
    Utkin, Lev V.
    Konstantinov, Andrei V.
    Kirpichenko, Stanislav R.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2023, 12 (03) : 257 - 273
  • [7] SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
    Parcollet, Titouan
    van Dalen, Rogier
    Zhang, Shucong
    Bhattacharya, Sourav
    INTERSPEECH 2024, 2024 : 3460 - 3464
  • [8] Reconstructing computational spectra using deep learning's self-attention method
    Wu, Hao
    Wu, Hui
    Su, Xinyu
    Wu, Jingjun
    Liu, Shuangli
    OPTICA APPLICATA, 2024, 54 (03) : 383 - 394
  • [9] Self-Attention for Cyberbullying Detection
    Pradhan, Ankit
    Yatam, Venu Madhav
    Bera, Padmalochan
    2020 INTERNATIONAL CONFERENCE ON CYBER SITUATIONAL AWARENESS, DATA ANALYTICS AND ASSESSMENT (CYBER SA 2020), 2020