Multi-Task Learning with Knowledge Distillation for Dense Prediction

Cited by: 2
Authors
Xu, Yangyang [1 ,2 ]
Yang, Yibo [4 ]
Zhang, Lefei [1 ,2 ,3 ]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Wuhan, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[3] Hubei Luojia Lab, Wuhan, Peoples R China
[4] King Abdullah Univ Sci & Technol, Jeddah, Saudi Arabia
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV51070.2023.01970
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While multi-task learning (MTL) has become an attractive topic, training an MTL model is usually harder than training a single-task one. How to successfully apply knowledge distillation to MTL to improve training efficiency and model performance remains a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction, based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as the teacher during training, instead of the multiple teachers widely adopted in existing studies. Second, we employ the less sensitive Cauchy-Schwarz (CS) divergence in place of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With this less sensitive divergence, our knowledge distillation with an alternative match captures both inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, covering multiple vision tasks such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that the proposed method clearly improves both model performance and practical inference efficiency.
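As a rough illustration of the Cauchy-Schwarz term described above, the sketch below computes a CS divergence between the per-pixel class distributions of a student and a single multi-task teacher. It is a minimal PyTorch-style sketch under my own assumptions, not the authors' exact loss; the names `cs_divergence`, `student_logits`, `teacher_logits`, and the usage snippet are all hypothetical.

```python
import torch
import torch.nn.functional as F

def cs_divergence(student_logits, teacher_logits, dim=1, eps=1e-8):
    """Cauchy-Schwarz divergence between per-pixel class distributions.

    D_CS(p, q) = -log( <p, q> / (||p||_2 * ||q||_2) ): by the Cauchy-Schwarz
    inequality it is non-negative and equals 0 when p == q.  This is a
    generic CS-based distillation term, not the paper's exact formulation.
    """
    p = F.softmax(student_logits, dim=dim)            # student distribution
    q = F.softmax(teacher_logits, dim=dim)            # teacher distribution
    inner = (p * q).sum(dim=dim)                      # <p, q> per pixel
    norms = p.pow(2).sum(dim=dim).sqrt() * q.pow(2).sum(dim=dim).sqrt()
    return -torch.log(inner / (norms + eps) + eps).mean()

# Hypothetical usage: one CS term per dense-prediction task, distilling a
# single multi-task teacher into the student.
# loss_kd = sum(cs_divergence(student_out[t], teacher_out[t].detach())
#               for t in tasks)
```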
Pages: 21493 - 21502
Number of pages: 10
Related Papers
50 records in total
  • [21] Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-Task Learning
    Ju, Lie
    Wang, Xin
    Zhao, Xin
    Lu, Huimin
    Mahapatra, Dwarikanath
    Bonnington, Paul
    Ge, Zongyuan
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (10) : 3709 - 3720
  • [22] Learning behaviour prediction and multi-task recommendation based on a knowledge graph in MOOCs
    Xia, Xiaona
    Qi, Wanxue
    TECHNOLOGY PEDAGOGY AND EDUCATION, 2025
  • [23] Cross-task Attention Mechanism for Dense Multi-task Learning
    Lopes, Ivan
    Vu, Tuan-Hung
    de Charette, Raoul
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2328 - 2337
  • [24] Towards a Unified Conversational Recommendation System: Multi-task Learning via Contextualized Knowledge Distillation
    Jung, Yeongseo
    Jung, Eunseo
    Chen, Lei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 13625 - 13637
  • [25] Knowledge Distillation from Language Model to Acoustic Model: A Hierarchical Multi-Task Learning Approach
    Lee, Mun-Hak
    Chang, Joon-Hyuk
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8392 - 8396
  • [26] HirMTL: Hierarchical Multi-Task Learning for dense scene understanding
    Luo, Huilan
    Hu, Weixia
    Wei, Yixiao
    He, Jianlong
    Yu, Minghao
    NEURAL NETWORKS, 2025, 181
  • [27] Constructing negative samples via entity prediction for multi-task knowledge representation learning
    Chen, Guihai
    Wu, Jianshe
    Luo, Wenyun
    Ding, Jingyi
    KNOWLEDGE-BASED SYSTEMS, 2023, 281
  • [28] Enhancing Romanian Offensive Language Detection Through Knowledge Distillation, Multi-task Learning, and Data Augmentation
    Matei, Vlad-Cristian
    Taiatu, Iulian-Marius
    Smadu, Razvan-Alexandru
    Cercel, Dumitru-Clementin
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 317 - 332
  • [29] Open knowledge base canonicalization with multi-task learning
    Liu, Bingchen
    Peng, Huang
    Zeng, Weixin
    Zhao, Xiang
    Liu, Shijun
    Pan, Li
    Li, Xin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [30] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 : 355 - 369