Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Times Cited: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is widely used for pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations produced by the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, in this article we propose Explanation Guided Knowledge Distillation (EGKD), which uses explanations to represent the reasoning process and to improve knowledge distillation. To obtain explanations within our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature-selection methods. Then, to improve computational efficiency, we propose tailored optimization strategies for exploiting the explanations produced by each of these methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations improves the performance of the student model. Moreover, EGKD can also be applied to model compression across different architectures.
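To make the idea concrete, the sketch below is a minimal illustration of explanation-guided distillation, not the authors' released implementation: a gradient-based explanation-alignment term is added to a standard distillation loss. It assumes HuggingFace-style sequence-classification models whose forward pass accepts inputs_embeds and returns .logits; the function names, temperature T, and weights alpha and beta are illustrative assumptions.

```python
# Minimal sketch of explanation-guided distillation (assumptions noted above):
# the usual task + soft-label losses are augmented with a term that aligns
# gradient-based token saliency maps of the teacher and the student.
import torch
import torch.nn.functional as F

def saliency(model, embeddings, attention_mask, labels):
    """Gradient-based token saliency: L2 norm of d(gold-class logit)/d(embedding)."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model(inputs_embeds=embeddings, attention_mask=attention_mask).logits
    gold = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(gold, embeddings, create_graph=True)
    return grads.norm(dim=-1)  # shape: (batch, seq_len)

def egkd_loss(teacher, student, batch, T=2.0, alpha=0.5, beta=0.1):
    input_ids, attention_mask, labels = batch
    t_emb = teacher.get_input_embeddings()(input_ids)
    s_emb = student.get_input_embeddings()(input_ids)

    # Teacher outputs and saliency serve as fixed targets.
    with torch.no_grad():
        t_logits = teacher(inputs_embeds=t_emb, attention_mask=attention_mask).logits
    t_sal = saliency(teacher, t_emb, attention_mask, labels).detach()

    # Student outputs and saliency keep their computation graph.
    s_logits = student(inputs_embeds=s_emb, attention_mask=attention_mask).logits
    s_sal = saliency(student, s_emb, attention_mask, labels)

    ce = F.cross_entropy(s_logits, labels)                      # task loss
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),          # soft-label loss
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    expl = F.mse_loss(F.normalize(s_sal, dim=-1),               # explanation alignment
                      F.normalize(t_sal, dim=-1))
    return ce + alpha * kd + beta * expl
```

In this sketch the gradient-based saliency could be swapped for perturbation-based or feature-selection explanations, mirroring the three explanation mechanisms described in the abstract, without changing the overall loss structure.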
Pages: 19