Dynamic Low-rank Estimation for Transformer-based Language Models

Cited by: 0
Authors
Huai, Ting [1 ]
Lie, Xiao [2 ]
Gao, Shangqian [1 ]
Hsu, Yenchang [2 ]
Shen, Yilin [2 ]
Jin, Hongxia [1 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Source
Findings of the Association for Computational Linguistics (EMNLP 2023) | 2023
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution remains unchanged during further fine-tuning, and 2) importance is distributed equally across the weight matrices of different layers. Furthermore, these methods require a well-trained task-specific model as the starting point and demand additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically reallocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna accomplishes the dual goals of compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna outperforms current SOTA methods under various parameter budgets, and its advantage grows at higher compression rates.
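For context, the sketch below is not the paper's RankDyna method (whose dynamic allocation details are not given in this record); it only illustrates (a) the plain truncated-SVD compression the abstract contrasts against and (b) one generic, static way to split a shared rank budget across layer matrices by global singular-value magnitude. The matrix shapes and the budget `total_rank` are illustrative assumptions.

```python
# Hedged illustration only: truncated-SVD compression plus a one-shot
# global rank-budget allocation across matrices. This is NOT RankDyna.
import numpy as np

def allocate_ranks(weights, total_rank):
    """Give each matrix the count of its singular values that fall in
    the global top-`total_rank` across all matrices (largest first)."""
    svals = []  # (singular value, matrix index)
    for i, W in enumerate(weights):
        S = np.linalg.svd(W, compute_uv=False)
        svals.extend((s, i) for s in S)
    svals.sort(reverse=True)
    ranks = [0] * len(weights)
    for _, i in svals[:total_rank]:
        ranks[i] += 1
    return ranks

def compress(W, r):
    """Rank-r factorization W ~= A @ B, storing r*(m+n) numbers
    instead of m*n."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    r = max(r, 1)  # keep at least one component per matrix
    return U[:, :r] * S[:r], Vt[:r, :]  # singular values folded into A

rng = np.random.default_rng(0)
weights = [rng.standard_normal((768, 768)) for _ in range(4)]  # toy layers
ranks = allocate_ranks(weights, total_rank=256)
for W, r in zip(weights, ranks):
    A, B = compress(W, r)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"rank {r:3d}, relative reconstruction error {err:.3f}")
```

Per the abstract, RankDyna's contribution is performing this kind of cross-layer rank allocation dynamically during a single round of fine-tuning from a general pre-trained model; the one-shot global ranking above is only a static stand-in for that idea.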
Pages: 9275-9287 (13 pages)