DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention

Cited: 0
Authors
Qin, Bosheng [1]
Li, Juncheng [1]
Tang, Siliang [1]
Zhuang, Yueting [1]
Affiliation
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Complexity theory; Attention mechanisms; Memory management; Training; Kernel; Sparse matrices; Optimization; Learning systems; Image coding; Bilinear optimization; dynamic compression; efficient transformer; low-rank attention;
DOI
10.1109/TNNLS.2025.3527046
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress sequence length with predetermined or learned compression matrices. However, these methods fix compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses sequence length using input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that sequence length can be compressed with compression coefficients dynamically determined by the input sequence. In addition, we illustrate that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, thereby introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that consider both the sequence length and hidden state dimension. Moreover, the theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across different tasks with varied sequence length conditions demonstrate that DBA achieves state-of-the-art performance compared to several robust baselines. DBA also maintains higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.
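To make the mechanism described in the abstract concrete, the following is a minimal sketch of attention with an input-dependent sequence-compression matrix and a Johnson-Lindenstrauss-style random projection of the hidden dimension. All names, shapes, and the softmax-based compression parameterization are illustrative assumptions for this sketch, not the paper's actual DBA implementation.

    import torch
    import torch.nn.functional as F

    def dynamic_lowrank_attention(x, w_q, w_k, w_v, w_p, r_proj):
        # x: (n, d) token sequence; w_q/w_k/w_v: (d, d) projections;
        # w_p: (d, k) produces per-sequence compression coefficients;
        # r_proj: (d, d_low) fixed Gaussian (JL-style) projection.
        q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (n, d)
        # The compression matrix is computed from the input, so its
        # coefficients vary per sequence instead of being fixed per position.
        p = F.softmax(x @ w_p, dim=0)                  # (n, k)
        k_c, v_c = p.t() @ k, p.t() @ v                # (k, d): length n -> k
        # Approximate dot products in a reduced hidden dimension.
        q_low, k_low = q @ r_proj, k_c @ r_proj        # (n, d_low), (k, d_low)
        scores = q_low @ k_low.t() / r_proj.shape[1] ** 0.5   # (n, k)
        return F.softmax(scores, dim=-1) @ v_c         # (n, d), O(n*k) time

    n, d, k_len, d_low = 128, 64, 16, 32
    x = torch.randn(n, d)
    w_q, w_k, w_v = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    w_p = torch.randn(d, k_len) / d ** 0.5
    r_proj = torch.randn(d, d_low) / d_low ** 0.5
    print(dynamic_lowrank_attention(x, w_q, w_k, w_v, w_p, r_proj).shape)

Because the attention map here is (n, k) rather than (n, n), time and memory scale linearly in the sequence length n for fixed k and d_low; the paper's bilinear formulation jointly optimizes both compressed axes, which this sketch only approximates with independent projections.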
Pages: 15