Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Cited by: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is widely used in pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies for utilizing the explanations obtained by these three explanation methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD can also be applied to model compression with different architectures.
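For illustration only (this sketch is not taken from the article), the code below shows one plausible way an explanation-consistency term can be combined with a standard distillation loss, using a gradient-based saliency explanation and assuming HuggingFace-style sequence-classification models; the names egkd_loss and saliency, the batch field teacher_saliency, and the weights T, alpha, and beta are hypothetical choices for this example, not the paper's exact formulation.

```python
# Minimal sketch of explanation-guided KD with a gradient-based (saliency) explanation.
# Assumes HuggingFace-style *ForSequenceClassification models; all names and weights
# here are illustrative assumptions, not the method as published.
import torch
import torch.nn.functional as F

def saliency(model, input_ids, attention_mask, labels):
    # Token importance = L2 norm of d(gold-class logit) / d(input embeddings).
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=attention_mask).logits
    score = logits.gather(1, labels.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, embeds, create_graph=True)[0]
    return grads.norm(dim=-1)                      # shape: (batch, seq_len)

def egkd_loss(teacher, student, batch, T=2.0, alpha=0.5, beta=0.1):
    input_ids, mask, labels = batch["input_ids"], batch["attention_mask"], batch["labels"]
    with torch.no_grad():                          # teacher is frozen
        t_logits = teacher(input_ids=input_ids, attention_mask=mask).logits
    s_logits = student(input_ids=input_ids, attention_mask=mask).logits

    # (1) Standard soft-label distillation plus hard-label cross-entropy.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, labels)

    # (2) Explanation consistency: match normalized saliency maps. The teacher
    # saliency is assumed to be precomputed offline and passed in the batch
    # ("teacher_saliency") to keep the extra training cost small.
    t_expl = F.normalize(batch["teacher_saliency"], dim=-1)
    s_expl = F.normalize(saliency(student, input_ids, mask, labels), dim=-1)
    expl = F.mse_loss(s_expl, t_expl)

    return alpha * kd + (1 - alpha) * ce + beta * expl
```

Precomputing the teacher's saliency maps offline, as assumed above, is one way to keep the overhead of explanation guidance low, in the spirit of the efficiency-oriented optimization strategies mentioned in the abstract.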
Pages: 19