Dynamic Low-rank Estimation for Transformer-based Language Models

Cited by: 0
Authors
Huai, Ting [1 ]
Lie, Xiao [2 ]
Gao, Shangqian [1 ]
Hsu, Yenchang [2 ]
Shen, Yilin [2 ]
Jin, Hongxia [1 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution remains unchanged during further fine-tuning, and 2) importance is equal across weight matrices in different layers. Furthermore, these methods require a well-trained task-specific model as the starting point and additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically allocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna accomplishes the dual goals of compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna outperforms current SOTA methods under various parameter budgets, and its advantage grows at higher compression rates.
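To make the cross-layer budgeting idea concrete, below is a minimal illustrative sketch, not the paper's actual RankDyna algorithm (which tracks importance dynamically during fine-tuning rather than from a one-shot SVD). It shows only the allocation step the abstract describes: singular directions are pooled across all layers and ranked by a stand-in importance score (plain singular-value magnitude), so different layers end up with unequal ranks under one shared global budget. The function allocate_ranks, the magnitude-based importance score, and the toy weight matrices are hypothetical choices for illustration.

```python
# Hypothetical sketch of global rank allocation across layers under a shared
# parameter budget; NOT the paper's RankDyna method.
import numpy as np

def allocate_ranks(weights, budget):
    """Keep the globally most 'important' singular directions across layers.

    weights: list of 2-D weight matrices (one per layer)
    budget:  total number of singular directions to keep across all layers
    Returns per-layer ranks and the truncated factors (U, s, Vt).
    """
    svds = [np.linalg.svd(W, full_matrices=False) for W in weights]
    # Pool (singular value, layer index) pairs and rank them globally, so a
    # layer whose spectrum decays slowly receives more rank than one that
    # decays quickly -- i.e., importance is NOT assumed equal across layers.
    pool = [(s_val, i) for i, (_, s, _) in enumerate(svds) for s_val in s]
    pool.sort(reverse=True)
    ranks = [0] * len(weights)
    for _, i in pool[:budget]:
        ranks[i] += 1
    factors = [(U[:, :r], s[:r], Vt[:r, :])
               for (U, s, Vt), r in zip(svds, ranks)]
    return ranks, factors

# Toy usage: three random "layers" with different spectral scales receive
# visibly uneven ranks under a single global budget.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((64, 64)) * 0.5**k for k in range(3)]
ranks, factors = allocate_ranks(Ws, budget=48)
print("per-layer ranks:", ranks)
```

A per-layer scheme with a fixed rank would instead assign 16 directions to every matrix here; pooling the scores globally is what lets the budget flow toward the layers that need it, which is the assumption RankDyna removes from earlier importance-weighted methods.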
Pages: 9275 - 9287
Number of pages: 13
Related Papers (50 in total)
  • [31] LERE: Learning-Based Low-Rank Matrix Recovery with Rank Estimation
    Xu, Zhengqin
    Zhang, Yulun
    Ma, Chao
    Yan, Yichao
    Peng, Zelin
    Xie, Shoulie
    Wu, Shiqian
    Yang, Xiaokang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16228 - 16236
  • [32] Tweets Topic Classification and Sentiment Analysis Based on Transformer-Based Language Models
    Mandal, Ranju
    Chen, Jinyan
    Becken, Susanne
    Stantic, Bela
    VIETNAM JOURNAL OF COMPUTER SCIENCE, 2023, 10 (02) : 117 - 134
  • [33] Transformer-based Language Models for Semantic Search and Mobile Applications Retrieval
    Coelho, Joao
    Neto, Antonio
    Tavares, Miguel
    Coutinho, Carlos
    Oliveira, Joao
    Ribeiro, Ricardo
    Batista, Fernando
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KDIR), VOL 1, 2021, : 225 - 232
  • [34] Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
    Jo, Jae-young
    Myaeng, Sung-hyon
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3404 - 3417
  • [35] Estimation of low-rank covariance function
    Koltchinskii, V.
    Lounici, K.
    Tsybakov, A. B.
    STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2016, 126 (12) : 3952 - 3967
  • [36] Pre-training and Evaluating Transformer-based Language Models for Icelandic
Daðason, Jón Friðrik
    Loftsson, Hrafn
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7386 - 7391
  • [37] Shared functional specialization in transformer-based language models and the human brain
    Kumar, Sreejan
    Sumers, Theodore R.
    Yamakoshi, Takateru
    Goldstein, Ariel
    Hasson, Uri
    Norman, Kenneth A.
    Griffiths, Thomas L.
    Hawkins, Robert D.
    Nastase, Samuel A.
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [38] ROBUST LOW-RANK MATRIX ESTIMATION
    Elsener, Andreas
    van de Geer, Sara
ANNALS OF STATISTICS, 2018, 46 (6B) : 3481 - 3509
  • [39] Localizing in-domain adaptation of transformer-based biomedical language models
    Buonocore, Tommaso Mario
    Crema, Claudio
    Redolfi, Alberto
    Bellazzi, Riccardo
    Parimbelli, Enea
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 144
  • [40] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
    Zhang, Minjia
    He, Yuxiong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2020), 2020, 33