PanDa: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

Cited by: 8
Authors
Zhong, Qihuang [1 ]
Ding, Liang [2 ]
Liu, Juhua [1 ]
Du, Bo [1 ]
Tao, Dacheng [3 ]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci,Hubei Key Lab Multimedia & Network Co, Wuhan 430072, Peoples R China
[2] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2050, Australia
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Measurement; Tuning; Predictive models; Adaptation models; Computational modeling; Training; Prompt-tuning; knowledge distillation; transfer learning; model adaptation;
DOI
10.1109/TKDE.2024.3376453
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt Transfer (PoT) is a recently proposed approach to improving prompt-tuning, in which the target prompt is initialized with an existing prompt trained on a similar source task. However, this vanilla PoT approach usually achieves sub-optimal performance, as (i) PoT is sensitive to the similarity of the source-target pair, and (ii) directly fine-tuning the source-initialized prompt on the target task may cause forgetting of the useful general knowledge learned from the source task. To tackle these issues, we propose a new metric that accurately predicts prompt transferability (addressing (i)), and a novel PoT approach (namely PanDa) that leverages knowledge distillation to effectively alleviate knowledge forgetting (addressing (ii)). Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict prompt transferability; 2) our PanDa consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; and 3) with our PanDa approach, prompt-tuning can achieve competitive and even better performance than model-tuning at various PLM scales.
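The core mechanics described in the abstract — initializing a target prompt from a trained source prompt, then using a distillation term to keep the adapted prompt from drifting away from the source prompt's knowledge — can be sketched in a toy numpy example. Everything here is illustrative, not the paper's implementation: the "PLM" is a frozen random linear map, and the prompt sizes, KD weight `alpha`, and learning rate are arbitrary assumed values.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy stand-in for a frozen PLM: logits depend on the input and a soft prompt
# prepended to it. W is never updated, mirroring prompt-tuning, where only the
# prompt vector is trained.
W = rng.normal(size=(8, 3))

def plm_logits(x, prompt):
    return np.concatenate([prompt, x]) @ W

source_prompt = rng.normal(size=4)    # prompt already trained on a source task
target_prompt = source_prompt.copy()  # vanilla PoT: initialize target from source

x = rng.normal(size=4)                # one target-task example
y = 1                                 # its label
alpha, lr = 0.5, 0.05                 # KD weight and learning rate (assumed)

for _ in range(100):
    t = softmax(plm_logits(x, source_prompt))  # teacher: frozen source prompt
    s = softmax(plm_logits(x, target_prompt))  # student: trainable target prompt
    onehot = np.eye(3)[y]
    # Gradient of [cross-entropy + alpha * KL(teacher || student)] w.r.t. the
    # student logits, backpropagated through the prompt rows of W only.
    grad_logits = (s - onehot) + alpha * (s - t)
    target_prompt -= lr * (W[:4] @ grad_logits)
```

The KD term `alpha * (s - t)` pulls the student's predictions toward those of the frozen source prompt while the cross-entropy term fits the target task, which is the forgetting-mitigation role the abstract ascribes to distillation; in the actual paper the teacher and student are prompt-tuned PLMs, not a linear map.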
Pages: 4835-4848
Page count: 14