Dynamic Low-rank Estimation for Transformer-based Language Models

Cited by: 0
Authors
Huai, Ting [1 ]
Lie, Xiao [2 ]
Gao, Shangqian [1 ]
Hsu, Yenchang [2 ]
Shen, Yilin [2 ]
Jin, Hongxia [1 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution remains unchanged during further fine-tuning, and 2) importance is equal across weight matrices in different layers. Furthermore, these methods require a well-trained task-specific model as the starting point and additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically allocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna accomplishes the dual goals of compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna outperforms current SOTA methods under various parameter budgets, and its advantage grows at higher compression rates.
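To make the cross-layer budgeting idea concrete, below is a minimal illustrative sketch, not the paper's actual RankDyna algorithm (which tracks importance dynamically during fine-tuning rather than from a one-shot SVD). It shows only the allocation step the abstract describes: singular directions are pooled across all layers and ranked by a stand-in importance score (plain singular-value magnitude), so different layers end up with unequal ranks under one shared global budget. The function allocate_ranks, the magnitude-based importance score, and the toy weight matrices are hypothetical choices for illustration.

```python
# Hypothetical sketch of global rank allocation across layers under a shared
# parameter budget; NOT the paper's RankDyna method.
import numpy as np

def allocate_ranks(weights, budget):
    """Keep the globally most 'important' singular directions across layers.

    weights: list of 2-D weight matrices (one per layer)
    budget:  total number of singular directions to keep across all layers
    Returns per-layer ranks and the truncated factors (U, s, Vt).
    """
    svds = [np.linalg.svd(W, full_matrices=False) for W in weights]
    # Pool (singular value, layer index) pairs and rank them globally, so a
    # layer whose spectrum decays slowly receives more rank than one that
    # decays quickly -- i.e., importance is NOT assumed equal across layers.
    pool = [(s_val, i) for i, (_, s, _) in enumerate(svds) for s_val in s]
    pool.sort(reverse=True)
    ranks = [0] * len(weights)
    for _, i in pool[:budget]:
        ranks[i] += 1
    factors = [(U[:, :r], s[:r], Vt[:r, :])
               for (U, s, Vt), r in zip(svds, ranks)]
    return ranks, factors

# Toy usage: three random "layers" with different spectral scales receive
# visibly uneven ranks under a single global budget.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((64, 64)) * 0.5**k for k in range(3)]
ranks, factors = allocate_ranks(Ws, budget=48)
print("per-layer ranks:", ranks)
```

A per-layer scheme with a fixed rank would instead assign 16 directions to every matrix here; pooling the scores globally is what lets the budget flow toward the layers that need it, which is the assumption RankDyna removes from earlier importance-weighted methods.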
Pages: 9275 - 9287
Number of pages: 13
Related Papers (50 in total)
  • [31] LERE: Learning-Based Low-Rank Matrix Recovery with Rank Estimation
    Xu, Zhengqin
    Zhang, Yulun
    Ma, Chao
    Yan, Yichao
    Peng, Zelin
    Xie, Shoulie
    Wu, Shiqian
    Yang, Xiaokang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16228 - 16236
  • [32] Tweets Topic Classification and Sentiment Analysis Based on Transformer-Based Language Models
    Mandal, Ranju
    Chen, Jinyan
    Becken, Susanne
    Stantic, Bela
    VIETNAM JOURNAL OF COMPUTER SCIENCE, 2023, 10 (02) : 117 - 134
  • [33] Transformer-based Language Models for Semantic Search and Mobile Applications Retrieval
    Coelho, Joao
    Neto, Antonio
    Tavares, Miguel
    Coutinho, Carlos
    Oliveira, Joao
    Ribeiro, Ricardo
    Batista, Fernando
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KDIR), VOL 1, 2021, : 225 - 232
  • [34] Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
    Jo, Jae-young
    Myaeng, Sung-hyon
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3404 - 3417
  • [35] Estimation of low-rank covariance function
    Koltchinskii, V.
    Lounici, K.
    Tsybakov, A. B.
    STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2016, 126 (12) : 3952 - 3967
  • [36] Pre-training and Evaluating Transformer-based Language Models for Icelandic
Daðason, Jón Friðrik
    Loftsson, Hrafn
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7386 - 7391
  • [37] Shared functional specialization in transformer-based language models and the human brain
    Kumar, Sreejan
    Sumers, Theodore R.
    Yamakoshi, Takateru
    Goldstein, Ariel
    Hasson, Uri
    Norman, Kenneth A.
    Griffiths, Thomas L.
    Hawkins, Robert D.
    Nastase, Samuel A.
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [38] ROBUST LOW-RANK MATRIX ESTIMATION
    Elsener, Andreas
    van de Geer, Sara
ANNALS OF STATISTICS, 2018, 46 (6B) : 3481 - 3509
  • [39] Localizing in-domain adaptation of transformer-based biomedical language models
    Buonocore, Tommaso Mario
    Crema, Claudio
    Redolfi, Alberto
    Bellazzi, Riccardo
    Parimbelli, Enea
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 144
  • [40] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
    Zhang, Minjia
    He, Yuxiong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2020), 2020, 33