Dynamic Low-rank Estimation for Transformer-based Language Models

Cited by: 0
Authors
Huai, Ting [1 ]
Lie, Xiao [2 ]
Gao, Shangqian [1 ]
Hsu, Yenchang [2 ]
Shen, Yilin [2 ]
Jin, Hongxia [1 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution remains unchanged during further fine-tuning, and 2) importance is equal across weight matrices in different layers. Furthermore, these methods require a well-trained task-specific model as the starting point and additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically allocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna accomplishes the dual goals of compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna outperforms current SOTA methods under various parameter budget levels, and its advantage grows at higher compression rates.
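The building block underlying the abstract is low-rank factorization of individual weight matrices. The sketch below is a minimal illustration, not the RankDyna algorithm itself (which allocates and updates ranks dynamically across layers during fine-tuning): it only shows how a single linear layer could be compressed by truncated SVD under an assumed rank budget. The function and variable names are hypothetical.

import torch

def svd_compress(weight: torch.Tensor, rank: int):
    # Factor an (out_features, in_features) weight matrix into two low-rank factors.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank), singular values folded in
    B = Vh[:rank, :]             # (rank, in_features)
    return A, B

# Example: a hypothetical 768x768 projection kept at rank 64
# (parameter count drops from 768*768 to 2*768*64).
W = torch.randn(768, 768)
A, B = svd_compress(W, rank=64)
print("relative error:", ((W - A @ B).norm() / W.norm()).item())
# A dynamic scheme such as RankDyna would instead choose a different rank per
# matrix and adjust those ranks while fine-tuning proceeds, rather than using
# one fixed budget as in this sketch.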
Pages: 9275-9287
Number of pages: 13
Related papers
50 records in total
  • [21] The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning
    Shen, Ke
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23419 - 23420
  • [22] Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
    Perez-Mayos, Laura
    Taboas Garcia, Alba
    Mille, Simon
    Wanner, Leo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3799 - 3812
  • [24] Reward modeling for mitigating toxicity in transformer-based language models
    Faal, Farshid
    Schmitt, Ketra
    Yu, Jia Yuan
    APPLIED INTELLIGENCE, 2023, 53 (07) : 8421 - 8435
  • [25] INFERENCE FOR LOW-RANK MODELS
    Chernozhukov, Victor
    Hansen, Christian
    Liao, Yuan
    Zhu, Yinchu
    ANNALS OF STATISTICS, 2023, 51 (03): : 1309 - 1330
  • [26] Common dynamic estimation via structured low-rank approximation with multiple rank constraints
    Fazzi, Antonio
    Guglielmi, Nicola
    Markovsky, Ivan
    Usevich, Konstantin
    IFAC PAPERSONLINE, 2021, 54 (07): : 103 - 107
  • [27] Sparse Low-rank Adaptation of Pre-trained Language Models
    Ding, Ning
    Lv, Xingtai
    Wang, Qiaosen
    Chen, Yulin
    Zhou, Bowen
    Liu, Zhiyuan
    Sun, Maosong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4133 - 4145
  • [28] LORTSAR: Low-Rank Transformer for Skeleton-Based Action Recognition
    Oraki, Soroush
    Zhuang, Harry
    Liang, Jie
    ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT I, 2025, 15046 : 196 - 207
  • [29] Low-rank variance estimation in large-scale GMRF models
    Malioutov, Dmitry M.
    Johnson, Jason K.
    Willsky, Alan S.
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 3127 - 3130
  • [30] Double-Weighted Low-Rank Matrix Recovery Based on Rank Estimation
    Xu, Zhengqin
    Xing, Huasong
    Fang, Shun
    Wu, Shiqian
    Xie, Shoulie
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 172 - 180