Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Cited by: 1
Authors
Yang, Zhao [1 ,2 ]
Zhang, Yuanzhe [1 ,2 ]
Sui, Dianbo [3 ]
Ju, Yiming [1 ,2 ]
Zhao, Jun [1 ,2 ]
Liu, Kang [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Harbin Inst Technol Weihai, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Explanation; knowledge distillation; model compression;
DOI
10.1145/3639364
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is widely used in pre-trained language model compression, as it can transfer knowledge from a cumbersome model to a lightweight one. Although knowledge distillation-based model compression has achieved promising performance, we observe that the explanations of the teacher model and the student model are not consistent. We argue that the student model should learn not only the predictions of the teacher model but also its internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies for utilizing the explanations obtained by these three explanation methods, which provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD can also be applied to model compression with different architectures.
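For illustration only (this sketch is not taken from the article), the code below shows one plausible way an explanation-consistency term can be combined with a standard distillation loss, using a gradient-based saliency explanation and assuming HuggingFace-style sequence-classification models; the names egkd_loss and saliency, the batch field teacher_saliency, and the weights T, alpha, and beta are hypothetical choices for this example, not the paper's exact formulation.

```python
# Minimal sketch of explanation-guided KD with a gradient-based (saliency) explanation.
# Assumes HuggingFace-style *ForSequenceClassification models; all names and weights
# here are illustrative assumptions, not the method as published.
import torch
import torch.nn.functional as F

def saliency(model, input_ids, attention_mask, labels):
    # Token importance = L2 norm of d(gold-class logit) / d(input embeddings).
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=attention_mask).logits
    score = logits.gather(1, labels.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, embeds, create_graph=True)[0]
    return grads.norm(dim=-1)                      # shape: (batch, seq_len)

def egkd_loss(teacher, student, batch, T=2.0, alpha=0.5, beta=0.1):
    input_ids, mask, labels = batch["input_ids"], batch["attention_mask"], batch["labels"]
    with torch.no_grad():                          # teacher is frozen
        t_logits = teacher(input_ids=input_ids, attention_mask=mask).logits
    s_logits = student(input_ids=input_ids, attention_mask=mask).logits

    # (1) Standard soft-label distillation plus hard-label cross-entropy.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, labels)

    # (2) Explanation consistency: match normalized saliency maps. The teacher
    # saliency is assumed to be precomputed offline and passed in the batch
    # ("teacher_saliency") to keep the extra training cost small.
    t_expl = F.normalize(batch["teacher_saliency"], dim=-1)
    s_expl = F.normalize(saliency(student, input_ids, mask, labels), dim=-1)
    expl = F.mse_loss(s_expl, t_expl)

    return alpha * kd + (1 - alpha) * ce + beta * expl
```

Precomputing the teacher's saliency maps offline, as assumed above, is one way to keep the overhead of explanation guidance low, in the spirit of the efficiency-oriented optimization strategies mentioned in the abstract.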
Pages: 19