MERGEDISTILL: Merging Pre-trained Language Models using Distillation

Cited by: 0
Authors
Khanuja, Simran [1 ]
Johnson, Melvin [2 ]
Talukdar, Partha [1 ]
Affiliations
[1] Google, Bangalore, Karnataka, India
[2] Google, Mountain View, CA 94043 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021
Keywords
DOI
(not available)
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained multilingual language models (LMs) have achieved state-of-the-art results in cross-lingual transfer, but they often lead to an inequitable representation of languages due to limited capacity, skewed pre-training data, and sub-optimal vocabularies. This has prompted the creation of an ever-growing pre-trained model universe, where each model is trained on large amounts of language- or domain-specific data with a carefully curated, linguistically informed vocabulary. However, doing so brings us back full circle and prevents one from leveraging the benefits of multilinguality. To address the gaps at both ends of the spectrum, we propose MERGEDISTILL, a framework to merge pre-trained LMs in a way that can best leverage their assets with minimal dependencies, using task-agnostic knowledge distillation. We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data and with a fixed model capacity. We also highlight the importance of teacher selection and its impact on student model performance.
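To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of task-agnostic knowledge distillation from multiple pre-existing teacher LMs into a single student: the student is trained to match each teacher's masked-language-modeling output distribution on unlabeled text, with no task labels involved. All names here (ToyMaskedLM, distillation_loss, the toy dimensions and hyperparameters) are illustrative assumptions, not the paper's actual MERGEDISTILL implementation or vocabulary-mapping procedure.

# Illustrative sketch only (PyTorch): distilling several "teacher" masked LMs
# into one "student" using temperature-softened KL divergence at masked positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 1000, 64  # hypothetical shared (merged) vocabulary and width


class ToyMaskedLM(nn.Module):
    """Stand-in for a pre-trained masked LM (teacher or student)."""
    def __init__(self, vocab=VOCAB, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits over the vocabulary: (batch, seq_len, vocab)
        return self.lm_head(self.encoder(self.embed(token_ids)))


def distillation_loss(student_logits, teacher_logits, mask, temperature=2.0):
    """KL divergence between softened teacher and student MLM distributions,
    computed only at masked positions (task-agnostic: no labeled task data)."""
    t = temperature
    s = F.log_softmax(student_logits[mask] / t, dim=-1)
    p = F.softmax(teacher_logits[mask] / t, dim=-1)
    return F.kl_div(s, p, reduction="batchmean") * (t * t)


# One hypothetical training step: each pre-existing teacher scores an unlabeled
# batch (e.g. from its own pre-training corpus); one student learns from all of them.
teachers = [ToyMaskedLM().eval() for _ in range(2)]  # e.g. two monolingual teachers
student = ToyMaskedLM()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

for teacher in teachers:
    batch = torch.randint(0, VOCAB, (8, 32))   # toy batch of token ids
    mask = torch.rand(8, 32) < 0.15            # ~15% of positions treated as masked
    with torch.no_grad():
        teacher_logits = teacher(batch)
    loss = distillation_loss(student(batch), teacher_logits, mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()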
Pages: 2874-2887
Page count: 14
Related papers
50 items in total
  • [21] Code Execution with Pre-trained Language Models
    Liu, Chenxiao
    Lu, Shuai
    Chen, Weizhu
    Jiang, Daxin
    Svyatkovskiy, Alexey
    Fu, Shengyu
    Sundaresan, Neel
    Duan, Nan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4984 - 4999
  • [22] Knowledge Inheritance for Pre-trained Language Models
    Qin, Yujia
    Lin, Yankai
    Yi, Jing
    Zhang, Jiajie
    Han, Xu
    Zhang, Zhengyan
    Su, Yusheng
    Liu, Zhiyuan
    Li, Peng
    Sun, Maosong
    Zhou, Jie
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3921 - 3937
  • [23] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [24] Automated LOINC Standardization Using Pre-trained Large Language Models
    Tu, Tao
    Loreaux, Eric
    Chesley, Emma
    Lelkes, Adam D.
    Gamble, Paul
    Bellaiche, Mathias
    Seneviratne, Martin
    Chen, Ming-Jun
    MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 343 - 355
  • [25] Controlling Translation Formality Using Pre-trained Multilingual Language Models
    Rippeth, Elijah
    Agrawal, Sweta
    Carpuat, Marine
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 327 - 340
  • [26] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121
  • [27] Repairing Security Vulnerabilities Using Pre-trained Programming Language Models
    Huang, Kai
    Yang, Su
    Sun, Hongyu
    Sun, Chengyi
    Li, Xuejun
    Zhang, Yuqing
    52ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOP VOLUME (DSN-W 2022), 2022, : 111 - 116
  • [28] KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation
    Tahaei, Marzieh S.
    Charlaix, Ella
    Nia, Vahid Partovi
    Ghodsi, Ali
    Rezagholizadeh, Mehdi
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2116 - 2127
  • [29] A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
    Lee, Hayeon
Hou, Rui
    Kim, Jongpil
    Liang, Davis
    Hwang, Sung Ju
    Min, Alexander
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11239 - 11246
  • [30] Labeling Explicit Discourse Relations Using Pre-trained Language Models
    Kurfali, Murathan
    TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 79 - 86