Dynamic Low-rank Estimation for Transformer-based Language Models

Cited by: 0
Authors
Huai, Ting [1 ]
Lie, Xiao [2 ]
Gao, Shangqian [1 ]
Hsu, Yenchang [2 ]
Shen, Yilin [2 ]
Jin, Hongxia [1 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution remains unchanged during further fine-tuning, and 2) importance is equal across weight matrices in different layers. Furthermore, these methods require a well-trained task-specific model as the starting point and additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically allocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna accomplishes the dual goals of compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna outperforms current SOTA methods under various parameter budget levels, and its advantage grows at higher compression rates.
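The building block underlying the abstract is low-rank factorization of individual weight matrices. The sketch below is a minimal illustration, not the RankDyna algorithm itself (which allocates and updates ranks dynamically across layers during fine-tuning): it only shows how a single linear layer could be compressed by truncated SVD under an assumed rank budget. The function and variable names are hypothetical.

import torch

def svd_compress(weight: torch.Tensor, rank: int):
    # Factor an (out_features, in_features) weight matrix into two low-rank factors.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank), singular values folded in
    B = Vh[:rank, :]             # (rank, in_features)
    return A, B

# Example: a hypothetical 768x768 projection kept at rank 64
# (parameter count drops from 768*768 to 2*768*64).
W = torch.randn(768, 768)
A, B = svd_compress(W, rank=64)
print("relative error:", ((W - A @ B).norm() / W.norm()).item())
# A dynamic scheme such as RankDyna would instead choose a different rank per
# matrix and adjust those ranks while fine-tuning proceeds, rather than using
# one fixed budget as in this sketch.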
Pages: 9275-9287
Number of pages: 13
Related papers
50 records in total
  • [21] The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning
    Shen, Ke
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23419 - 23420
  • [22] Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
    Perez-Mayos, Laura
    Taboas Garcia, Alba
    Mille, Simon
    Wanner, Leo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3799 - 3812
  • [24] Reward modeling for mitigating toxicity in transformer-based language models
    Faal, Farshid
    Schmitt, Ketra
    Yu, Jia Yuan
    APPLIED INTELLIGENCE, 2023, 53 (07) : 8421 - 8435
  • [25] INFERENCE FOR LOW-RANK MODELS
    Chernozhukov, Victor
    Hansen, Christian
    Liao, Yuan
    Zhu, Yinchu
    ANNALS OF STATISTICS, 2023, 51 (03): : 1309 - 1330
  • [26] Common dynamic estimation via structured low-rank approximation with multiple rank constraints
    Fazzi, Antonio
    Guglielmi, Nicola
    Markovsky, Ivan
    Usevich, Konstantin
    IFAC PAPERSONLINE, 2021, 54 (07): : 103 - 107
  • [27] Sparse Low-rank Adaptation of Pre-trained Language Models
    Ding, Ning
    Lv, Xingtai
    Wang, Qiaosen
    Chen, Yulin
    Zhou, Bowen
    Liu, Zhiyuan
    Sun, Maosong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4133 - 4145
  • [28] LORTSAR: Low-Rank Transformer for Skeleton-Based Action Recognition
    Oraki, Soroush
    Zhuang, Harry
    Liang, Jie
    ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT I, 2025, 15046 : 196 - 207
  • [29] Low-rank variance estimation in large-scale GMRF models
    Malioutov, Dmitry M.
    Johnson, Jason K.
    Willsky, Alan S.
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 3127 - 3130
  • [30] Double-Weighted Low-Rank Matrix Recovery Based on Rank Estimation
    Xu, Zhengqin
    Xing, Huasong
    Fang, Shun
    Wu, Shiqian
    Xie, Shoulie
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 172 - 180