DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention

Cited by: 0
Authors
Qin, Bosheng [1 ]
Li, Juncheng [1 ]
Tang, Siliang [1 ]
Zhuang, Yueting [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Complexity theory; Attention mechanisms; Memory management; Training; Kernel; Sparse matrices; Optimization; Learning systems; Image coding; Bilinear optimization; dynamic compression; efficient transformer; low-rank attention;
DOI
10.1109/TNNLS.2025.3527046
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many studies have aimed to improve Transformer efficiency with low-rank methods that compress the sequence length using predetermined or learned compression matrices. However, these methods apply the same compression coefficients to tokens at a given position across all sequences at inference time, ignoring sequence-specific variations. They also overlook the effect of the hidden state dimension on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses the sequence length with input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing the sequence length and the hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that the sequence length can be compressed with compression coefficients determined dynamically by the input sequence. In addition, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that account for both the sequence length and the hidden state dimension. Moreover, theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across tasks with varied sequence lengths demonstrate that DBA achieves state-of-the-art performance compared with several strong baselines, while maintaining higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.
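
The abstract describes the mechanism only in prose, so the NumPy sketch below illustrates one plausible reading of it, assuming a Linformer-style length compression whose coefficients are computed from the input sequence itself, combined with a Johnson-Lindenstrauss-style random projection of the hidden dimension. All names here (dba_attention_sketch, make_dynamic_compressor, wck, wcv, jl, k_len, r) are hypothetical; this is not the authors' released implementation, and the paper's exact bilinear parameterization may differ.

```python
# Minimal sketch, assuming a Linformer-style design with input-dependent
# (dynamic) length compression plus a JL-style hidden-dimension projection.
# Hypothetical names and parameterization; not the paper's code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_dynamic_compressor(x, w):
    # x: (n, d) tokens; w: (d, k_len) learned weights.
    # Returns (k_len, n) compression coefficients that depend on x itself,
    # unlike a fixed projection shared by every sequence at the same positions.
    return softmax(x @ w, axis=0).T

def dba_attention_sketch(x, wq, wk, wv, wck, wcv, jl):
    q, k, v = x @ wq, x @ wk, x @ wv               # (n, d) each
    ck = make_dynamic_compressor(x, wck)           # (k_len, n)
    cv = make_dynamic_compressor(x, wcv)           # (k_len, n)
    k_c, v_c = ck @ k, cv @ v                      # (k_len, d): length compressed
    q_r, k_r = q @ jl, k_c @ jl                    # JL projection: d -> r
    scores = softmax(q_r @ k_r.T / np.sqrt(jl.shape[1]), axis=-1)  # (n, k_len)
    return scores @ v_c                            # (n, d)

n, d, k_len, r = 128, 64, 16, 32
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
wck, wcv = rng.standard_normal((d, k_len)), rng.standard_normal((d, k_len))
jl = rng.standard_normal((d, r)) / np.sqrt(r)      # random JL-style projection
print(dba_attention_sketch(x, wq, wk, wv, wck, wcv, jl).shape)  # (128, 64)
```

Under these assumptions the per-layer cost scales roughly as O(n * k_len * r) rather than O(n^2 * d), i.e., linearly in the sequence length n, which is consistent with the linear time and space complexity stated in the abstract.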
Pages: 15