MERGEDISTILL: Merging Pre-trained Language Models using Distillation

Cited by: 0
Authors
Khanuja, Simran [1 ]
Johnson, Melvin [2 ]
Talukdar, Partha [1 ]
Affiliations
[1] Google, Bangalore, Karnataka, India
[2] Google, Mountain View, CA 94043 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021
Keywords
DOI
(not available)
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained multilingual language models (LMs) have achieved state-of-the-art results in cross-lingual transfer, but they often lead to an inequitable representation of languages due to limited capacity, skewed pre-training data, and sub-optimal vocabularies. This has prompted the creation of an ever-growing pre-trained model universe, where each model is trained on large amounts of language- or domain-specific data with a carefully curated, linguistically informed vocabulary. However, doing so brings us back full circle and prevents one from leveraging the benefits of multilinguality. To address the gaps at both ends of the spectrum, we propose MERGEDISTILL, a framework to merge pre-trained LMs in a way that can best leverage their assets with minimal dependencies, using task-agnostic knowledge distillation. We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data and with a fixed model capacity. We also highlight the importance of teacher selection and its impact on student model performance.
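To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of task-agnostic knowledge distillation from multiple pre-existing teacher LMs into a single student: the student is trained to match each teacher's masked-language-modeling output distribution on unlabeled text, with no task labels involved. All names here (ToyMaskedLM, distillation_loss, the toy dimensions and hyperparameters) are illustrative assumptions, not the paper's actual MERGEDISTILL implementation or vocabulary-mapping procedure.

# Illustrative sketch only (PyTorch): distilling several "teacher" masked LMs
# into one "student" using temperature-softened KL divergence at masked positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 1000, 64  # hypothetical shared (merged) vocabulary and width


class ToyMaskedLM(nn.Module):
    """Stand-in for a pre-trained masked LM (teacher or student)."""
    def __init__(self, vocab=VOCAB, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits over the vocabulary: (batch, seq_len, vocab)
        return self.lm_head(self.encoder(self.embed(token_ids)))


def distillation_loss(student_logits, teacher_logits, mask, temperature=2.0):
    """KL divergence between softened teacher and student MLM distributions,
    computed only at masked positions (task-agnostic: no labeled task data)."""
    t = temperature
    s = F.log_softmax(student_logits[mask] / t, dim=-1)
    p = F.softmax(teacher_logits[mask] / t, dim=-1)
    return F.kl_div(s, p, reduction="batchmean") * (t * t)


# One hypothetical training step: each pre-existing teacher scores an unlabeled
# batch (e.g. from its own pre-training corpus); one student learns from all of them.
teachers = [ToyMaskedLM().eval() for _ in range(2)]  # e.g. two monolingual teachers
student = ToyMaskedLM()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

for teacher in teachers:
    batch = torch.randint(0, VOCAB, (8, 32))   # toy batch of token ids
    mask = torch.rand(8, 32) < 0.15            # ~15% of positions treated as masked
    with torch.no_grad():
        teacher_logits = teacher(batch)
    loss = distillation_loss(student(batch), teacher_logits, mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()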
Pages: 2874-2887
Page count: 14
Related papers
50 items in total
  • [21] Code Execution with Pre-trained Language Models
    Liu, Chenxiao
    Lu, Shuai
    Chen, Weizhu
    Jiang, Daxin
    Svyatkovskiy, Alexey
    Fu, Shengyu
    Sundaresan, Neel
    Duan, Nan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4984 - 4999
  • [22] Knowledge Inheritance for Pre-trained Language Models
    Qin, Yujia
    Lin, Yankai
    Yi, Jing
    Zhang, Jiajie
    Han, Xu
    Zhang, Zhengyan
    Su, Yusheng
    Liu, Zhiyuan
    Li, Peng
    Sun, Maosong
    Zhou, Jie
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3921 - 3937
  • [23] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [24] Automated LOINC Standardization Using Pre-trained Large Language Models
    Tu, Tao
    Loreaux, Eric
    Chesley, Emma
    Lelkes, Adam D.
    Gamble, Paul
    Bellaiche, Mathias
    Seneviratne, Martin
    Chen, Ming-Jun
    MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 343 - 355
  • [25] Controlling Translation Formality Using Pre-trained Multilingual Language Models
    Rippeth, Elijah
    Agrawal, Sweta
    Carpuat, Marine
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 327 - 340
  • [26] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121
  • [27] Repairing Security Vulnerabilities Using Pre-trained Programming Language Models
    Huang, Kai
    Yang, Su
    Sun, Hongyu
    Sun, Chengyi
    Li, Xuejun
    Zhang, Yuqing
    52ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOP VOLUME (DSN-W 2022), 2022, : 111 - 116
  • [28] KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation
    Tahaei, Marzieh S.
    Charlaix, Ella
    Nia, Vahid Partovi
    Ghodsi, Ali
    Rezagholizadeh, Mehdi
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2116 - 2127
  • [29] A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
    Lee, Hayeon
Hou, Rui
    Kim, Jongpil
    Liang, Davis
    Hwang, Sung Ju
    Min, Alexander
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11239 - 11246
  • [30] Labeling Explicit Discourse Relations Using Pre-trained Language Models
    Kurfali, Murathan
    TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 79 - 86