Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Cited by: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation is widely used in pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature-selection methods. Then, to improve computational efficiency, we propose different optimization strategies to utilize the explanations obtained by these three explanation methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD can also be applied to model compression with different architectures.
Pages: 19
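The abstract describes EGKD's core idea: add an explanation-alignment term to the usual distillation objective so the student mimics not only the teacher's predictions but also the signal behind them. The sketch below illustrates one way this could look for the gradient-based explanation variant, in PyTorch; the names (TinyClassifier, saliency, egkd_loss) and the loss weights are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of explanation-guided distillation, assuming a gradient-based
# saliency explanation. All names and weights here are illustrative, not the
# authors' official EGKD code.
import torch
import torch.nn.functional as F


class TinyClassifier(torch.nn.Module):
    """Stand-in for a (compressed) encoder that maps token embeddings to logits."""

    def __init__(self, hidden=32, num_labels=2):
        super().__init__()
        self.proj = torch.nn.Linear(hidden, num_labels)

    def forward(self, embeddings):          # embeddings: (batch, seq_len, hidden)
        return self.proj(embeddings.mean(dim=1))


def saliency(model, embeddings, labels, create_graph=False):
    """Gradient-based token saliency: |d loss / d embedding|, summed over the
    hidden dimension and normalized over the sequence."""
    embeddings = embeddings.detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeddings), labels)
    (grads,) = torch.autograd.grad(loss, embeddings, create_graph=create_graph)
    sal = grads.abs().sum(dim=-1)           # (batch, seq_len)
    return sal / (sal.sum(dim=-1, keepdim=True) + 1e-8)


def egkd_loss(student, teacher, embeddings, labels, T=2.0, alpha=0.5, beta=0.5):
    """Task loss + logit distillation + explanation-alignment term."""
    with torch.no_grad():
        t_logits = teacher(embeddings)
    s_logits = student(embeddings)

    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T

    # Teacher saliency is a fixed target; the student's saliency keeps its graph
    # so the alignment term back-propagates into the student's parameters.
    t_sal = saliency(teacher, embeddings, labels).detach()
    s_sal = saliency(student, embeddings, labels, create_graph=True)
    expl = F.mse_loss(s_sal, t_sal)

    return ce + alpha * kd + beta * expl


# Toy usage: a single training step on random data.
teacher, student = TinyClassifier(), TinyClassifier()
embeddings = torch.randn(4, 16, 32)
labels = torch.randint(0, 2, (4,))
egkd_loss(student, teacher, embeddings, labels).backward()
```

For the perturbation-based and feature-selection variants mentioned in the abstract, the saliency function would be replaced by the corresponding explanation scores, together with the optimization strategies the authors propose to keep the extra explanation computation affordable.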
Related Papers
50 records in total
  • [1] AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
    Zhou, Qinhong
    Li, Peng
    Liu, Yang
    Guan, Yuyang
    Xing, Qizhou
    Chen, Ming
    Sun, Maosong
    Liu, Yang
    AI OPEN, 2023, 4 : 56 - 63
  • [2] Oversea Cross-Lingual Summarization Service in Multilanguage Pre-Trained Model through Knowledge Distillation
    Yang, Xiwei
    Yun, Jing
    Zheng, Bofei
    Liu, Limin
    Ban, Qi
    ELECTRONICS, 2023, 12 (24)
  • [3] ARC: A Layer Replacement Compression Method Based on Fine-Grained Self-Attention Distillation for Compressing Pre-Trained Language Models
    Yu, Daohan
    Qiu, Liqing
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025, 9 (01): 848 - 860
  • [4] GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
    Gao, Yingying
    Zhang, Shilei
    Deng, Chao
    Feng, Junlan
    INTERSPEECH 2024, 2024, : 3325 - 3329
  • [5] Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
    Han, Minglun
    Chen, Feilong
    Shi, Jing
    Xu, Shuang
    Xu, Bo
    INTERSPEECH 2023, 2023, : 1364 - 1368
  • [6] Uncertainty-Driven Knowledge Distillation for Language Model Compression
    Huang, Tianyu
    Dong, Weisheng
    Wu, Fangfang
    Li, Xin
    Shi, Guangming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2850 - 2858
  • [7] KNOWLEDGE DISTILLATION FOR NEURAL TRANSDUCERS FROM LARGE SELF-SUPERVISED PRE-TRAINED MODELS
    Yang, Xiaoyu
    Li, Qiujia
    Woodland, Philip C.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8527 - 8531
  • [8] Fast and accurate image retrieval using knowledge distillation from multiple deep pre-trained networks
    Salman, Hasan
    Taherinia, Amir Hossein
    Zabihzadeh, Davood
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (22) : 33937 - 33959
  • [9] A Light Bug Triage Framework for Applying Large Pre-trained Language Model
    Lee, Jaehyung
    Han, Kisun
    Yu, Hwanjo
    PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022, 2022,