Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Times Cited: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol, Weihai, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is widely used in pre-trained language model compression: it transfers knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD), which uses explanations to represent the thinking process and improve knowledge distillation. To obtain explanations within our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies for utilizing the explanations obtained by these three methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, EGKD can also be applied to model compression with different architectures.
Pages: 19
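To make the idea in the abstract concrete, below is a minimal PyTorch sketch of one way explanation guidance could be combined with standard distillation: the student is trained with (i) the task cross-entropy, (ii) a temperature-scaled KL term against the teacher's logits, and (iii) an MSE term aligning gradient-based token saliencies (the "explanations") of teacher and student. The ToyClassifier stand-in, the gradient_saliency helper, and the weights T, alpha, and beta are illustrative assumptions, not the authors' released implementation or exact objective.

# Minimal, illustrative sketch of explanation-guided distillation (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyClassifier(nn.Module):
    """Bag-of-embeddings classifier standing in for a (compressed) language model."""
    def __init__(self, vocab_size=1000, dim=64, num_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, input_ids):
        embeds = self.emb(input_ids)                 # (batch, seq, dim)
        logits = self.head(embeds.mean(dim=1))       # (batch, num_labels)
        return logits, embeds

def gradient_saliency(logits, embeds):
    # Token-level explanation: L2 norm of d(max logit)/d(embedding), normalized per example.
    score = logits.max(dim=-1).values.sum()
    grads = torch.autograd.grad(score, embeds, create_graph=True)[0]
    sal = grads.norm(dim=-1)                         # (batch, seq)
    return sal / (sal.sum(dim=-1, keepdim=True) + 1e-8)

teacher = ToyClassifier(dim=128)   # stand-in for the cumbersome teacher
student = ToyClassifier(dim=32)    # stand-in for the lightweight student
input_ids = torch.randint(0, 1000, (4, 16))
labels = torch.randint(0, 2, (4,))

t_logits, t_embeds = teacher(input_ids)
s_logits, s_embeds = student(input_ids)

t_sal = gradient_saliency(t_logits, t_embeds).detach()   # teacher explanation (fixed target)
s_sal = gradient_saliency(s_logits, s_embeds)            # student explanation

T, alpha, beta = 2.0, 0.5, 0.1     # assumed temperature and loss weights
ce_loss = F.cross_entropy(s_logits, labels)
kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                   F.softmax(t_logits.detach() / T, dim=-1),
                   reduction="batchmean") * T * T
expl_loss = F.mse_loss(s_sal, t_sal)                      # align student and teacher explanations

loss = ce_loss + alpha * kd_loss + beta * expl_loss
loss.backward()

In this sketch the explanation term plays the role the paper assigns to gradient-based explanations; perturbation-based or feature-selection explanations would replace gradient_saliency with a different scoring function while the alignment loss stays the same.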
相关论文
共 50 条
  • [31] SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
    Zhong, Shanshan
    Huang, Zhongzhan
    Wen, Wushao
    Qin, Jinghui
    Lin, Liang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 567 - 578
  • [32] Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression
    Hao, Zhiwei
    Luo, Yong
    Hu, Han
    An, Jianping
    Wen, Yonggang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1803 - 1811
  • [33] End-to-end model compression via pruning and knowledge distillation for lightweight image super resolution
    Yanzhe Wang
    Yizhen Wang
    Avinash Rohra
    Baoqun Yin
    Pattern Analysis and Applications, 2025, 28 (2)
  • [34] Model compression via pruning and knowledge distillation for person re-identification
    Xie, Haonan
    Jiang, Wei
    Luo, Hao
    Yu, Hongyan
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 (02) : 2149 - 2161
  • [35] Joint structured pruning and dense knowledge distillation for efficient transformer model compression
    Cui, Baiyun
    Li, Yingming
    Zhang, Zhongfei
    NEUROCOMPUTING, 2021, 458 : 56 - 69
  • [36] Model Compression by Iterative Pruning with Knowledge Distillation and Its Application to Speech Enhancement
    Wei, Zeyuan
    Li, Hao
    Zhang, Xueliang
    INTERSPEECH 2022, 2022, : 941 - 945
  • [37] Model compression via pruning and knowledge distillation for person re-identification
    Haonan Xie
    Wei Jiang
    Hao Luo
    Hongyan Yu
    Journal of Ambient Intelligence and Humanized Computing, 2021, 12 : 2149 - 2161
  • [38] AUGMENTING KNOWLEDGE DISTILLATION WITH PEER-TO-PEER MUTUAL LEARNING FOR MODEL COMPRESSION
    Niyaz, Usma
    Bathula, Deepti R.
    2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022,
  • [39] Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
    Yang, Ze
    Shou, Linjun
    Gong, Ming
    Lin, Wutao
    Jiang, Daxin
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 690 - 698
  • [40] Attention-Fused CNN Model Compression with Knowledge Distillation for Brain Tumor Segmentation
    Xu, Pengcheng
    Kim, Kyungsang
    Liu, Huafeng
    Li, Quanzheng
    MEDICAL IMAGE UNDERSTANDING AND ANALYSIS, MIUA 2022, 2022, 13413 : 328 - 338