Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Times Cited: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is widely used for pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations produced by the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, in this article we propose Explanation Guided Knowledge Distillation (EGKD), which uses explanations to represent the reasoning process and to improve knowledge distillation. To obtain explanations within our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature-selection methods. Then, to improve computational efficiency, we propose tailored optimization strategies for exploiting the explanations produced by each of these methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations improves the performance of the student model. Moreover, EGKD can also be applied to model compression across different architectures.
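To make the idea concrete, the sketch below is a minimal illustration of explanation-guided distillation, not the authors' released implementation: a gradient-based explanation-alignment term is added to a standard distillation loss. It assumes HuggingFace-style sequence-classification models whose forward pass accepts inputs_embeds and returns .logits; the function names, temperature T, and weights alpha and beta are illustrative assumptions.

```python
# Minimal sketch of explanation-guided distillation (assumptions noted above):
# the usual task + soft-label losses are augmented with a term that aligns
# gradient-based token saliency maps of the teacher and the student.
import torch
import torch.nn.functional as F

def saliency(model, embeddings, attention_mask, labels):
    """Gradient-based token saliency: L2 norm of d(gold-class logit)/d(embedding)."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model(inputs_embeds=embeddings, attention_mask=attention_mask).logits
    gold = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(gold, embeddings, create_graph=True)
    return grads.norm(dim=-1)  # shape: (batch, seq_len)

def egkd_loss(teacher, student, batch, T=2.0, alpha=0.5, beta=0.1):
    input_ids, attention_mask, labels = batch
    t_emb = teacher.get_input_embeddings()(input_ids)
    s_emb = student.get_input_embeddings()(input_ids)

    # Teacher outputs and saliency serve as fixed targets.
    with torch.no_grad():
        t_logits = teacher(inputs_embeds=t_emb, attention_mask=attention_mask).logits
    t_sal = saliency(teacher, t_emb, attention_mask, labels).detach()

    # Student outputs and saliency keep their computation graph.
    s_logits = student(inputs_embeds=s_emb, attention_mask=attention_mask).logits
    s_sal = saliency(student, s_emb, attention_mask, labels)

    ce = F.cross_entropy(s_logits, labels)                      # task loss
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),          # soft-label loss
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    expl = F.mse_loss(F.normalize(s_sal, dim=-1),               # explanation alignment
                      F.normalize(t_sal, dim=-1))
    return ce + alpha * kd + beta * expl
```

In this sketch the gradient-based saliency could be swapped for perturbation-based or feature-selection explanations, mirroring the three explanation mechanisms described in the abstract, without changing the overall loss structure.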
Pages: 19