DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention

Cited: 0
Authors
Qin, Bosheng [1]
Li, Juncheng [1]
Tang, Siliang [1]
Zhuang, Yueting [1]
Affiliation
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Complexity theory; Attention mechanisms; Memory management; Training; Kernel; Sparse matrices; Optimization; Learning systems; Image coding; Bilinear optimization; dynamic compression; efficient transformer; low-rank attention;
DOI
10.1109/TNNLS.2025.3527046
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress sequence length with predetermined or learned compression matrices. However, these methods fix compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses sequence length using input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that sequence length can be compressed with compression coefficients dynamically determined by the input sequence. In addition, we illustrate that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, thereby introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that consider both the sequence length and hidden state dimension. Moreover, the theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across different tasks with varied sequence length conditions demonstrate that DBA achieves state-of-the-art performance compared to several robust baselines. DBA also maintains higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.
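To make the mechanism described in the abstract concrete, the following is a minimal sketch of attention with an input-dependent sequence-compression matrix and a Johnson-Lindenstrauss-style random projection of the hidden dimension. All names, shapes, and the softmax-based compression parameterization are illustrative assumptions for this sketch, not the paper's actual DBA implementation.

    import torch
    import torch.nn.functional as F

    def dynamic_lowrank_attention(x, w_q, w_k, w_v, w_p, r_proj):
        # x: (n, d) token sequence; w_q/w_k/w_v: (d, d) projections;
        # w_p: (d, k) produces per-sequence compression coefficients;
        # r_proj: (d, d_low) fixed Gaussian (JL-style) projection.
        q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (n, d)
        # The compression matrix is computed from the input, so its
        # coefficients vary per sequence instead of being fixed per position.
        p = F.softmax(x @ w_p, dim=0)                  # (n, k)
        k_c, v_c = p.t() @ k, p.t() @ v                # (k, d): length n -> k
        # Approximate dot products in a reduced hidden dimension.
        q_low, k_low = q @ r_proj, k_c @ r_proj        # (n, d_low), (k, d_low)
        scores = q_low @ k_low.t() / r_proj.shape[1] ** 0.5   # (n, k)
        return F.softmax(scores, dim=-1) @ v_c         # (n, d), O(n*k) time

    n, d, k_len, d_low = 128, 64, 16, 32
    x = torch.randn(n, d)
    w_q, w_k, w_v = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    w_p = torch.randn(d, k_len) / d ** 0.5
    r_proj = torch.randn(d, d_low) / d_low ** 0.5
    print(dynamic_lowrank_attention(x, w_q, w_k, w_v, w_p, r_proj).shape)

Because the attention map here is (n, k) rather than (n, n), time and memory scale linearly in the sequence length n for fixed k and d_low; the paper's bilinear formulation jointly optimizes both compressed axes, which this sketch only approximates with independent projections.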
Pages: 15