Multi-Task Learning with Knowledge Distillation for Dense Prediction

Cited by: 2
Authors
Xu, Yangyang [1 ,2 ]
Yang, Yibo [4 ]
Zhang, Lefei [1 ,2 ,3 ]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Wuhan, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[3] Hubei Luojia Lab, Wuhan, Peoples R China
[4] King Abdullah Univ Sci & Technol, Jeddah, Saudi Arabia
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV51070.2023.01970
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While multi-task learning (MTL) has become an attractive topic, training an MTL model is usually harder than training a single-task one. How to successfully apply knowledge distillation to MTL to improve training efficiency and model performance remains a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction, based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as the teacher during training, instead of the multiple teachers widely adopted in existing studies. Second, we employ the less sensitive Cauchy-Schwarz (CS) divergence in place of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With this less sensitive divergence, our knowledge distillation with an alternative match captures both inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, covering multiple vision tasks such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that the proposed method clearly improves both model performance and practical inference efficiency.
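As a rough illustration of the Cauchy-Schwarz term described above, the sketch below computes a CS divergence between the per-pixel class distributions of a student and a single multi-task teacher. It is a minimal PyTorch-style sketch under my own assumptions, not the authors' exact loss; the names `cs_divergence`, `student_logits`, `teacher_logits`, and the usage snippet are all hypothetical.

```python
import torch
import torch.nn.functional as F

def cs_divergence(student_logits, teacher_logits, dim=1, eps=1e-8):
    """Cauchy-Schwarz divergence between per-pixel class distributions.

    D_CS(p, q) = -log( <p, q> / (||p||_2 * ||q||_2) ): by the Cauchy-Schwarz
    inequality it is non-negative and equals 0 when p == q.  This is a
    generic CS-based distillation term, not the paper's exact formulation.
    """
    p = F.softmax(student_logits, dim=dim)            # student distribution
    q = F.softmax(teacher_logits, dim=dim)            # teacher distribution
    inner = (p * q).sum(dim=dim)                      # <p, q> per pixel
    norms = p.pow(2).sum(dim=dim).sqrt() * q.pow(2).sum(dim=dim).sqrt()
    return -torch.log(inner / (norms + eps) + eps).mean()

# Hypothetical usage: one CS term per dense-prediction task, distilling a
# single multi-task teacher into the student.
# loss_kd = sum(cs_divergence(student_out[t], teacher_out[t].detach())
#               for t in tasks)
```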
Pages: 21493 - 21502
Number of pages: 10
Related Papers
50 records in total
  • [21] Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-Task Learning
    Ju, Lie
    Wang, Xin
    Zhao, Xin
    Lu, Huimin
    Mahapatra, Dwarikanath
    Bonnington, Paul
    Ge, Zongyuan
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (10) : 3709 - 3720
  • [22] Learning behaviour prediction and multi-task recommendation based on a knowledge graph in MOOCs
    Xia, Xiaona
    Qi, Wanxue
    TECHNOLOGY PEDAGOGY AND EDUCATION, 2025
  • [23] Cross-task Attention Mechanism for Dense Multi-task Learning
    Lopes, Ivan
    Vu, Tuan-Hung
    de Charette, Raoul
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2328 - 2337
  • [24] Towards a Unified Conversational Recommendation System: Multi-task Learning via Contextualized Knowledge Distillation
    Jung, Yeongseo
    Jung, Eunseo
    Chen, Lei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 13625 - 13637
  • [25] Knowledge Distillation from Language Model to Acoustic Model: A Hierarchical Multi-Task Learning Approach
    Lee, Mun-Hak
    Chang, Joon-Hyuk
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8392 - 8396
  • [26] HirMTL: Hierarchical Multi-Task Learning for dense scene understanding
    Luo, Huilan
    Hu, Weixia
    Wei, Yixiao
    He, Jianlong
    Yu, Minghao
    NEURAL NETWORKS, 2025, 181
  • [27] Constructing negative samples via entity prediction for multi-task knowledge representation learning
    Chen, Guihai
    Wu, Jianshe
    Luo, Wenyun
    Ding, Jingyi
    KNOWLEDGE-BASED SYSTEMS, 2023, 281
  • [28] Enhancing Romanian Offensive Language Detection Through Knowledge Distillation, Multi-task Learning, and Data Augmentation
    Matei, Vlad-Cristian
    Taiatu, Iulian-Marius
    Smadu, Razvan-Alexandru
    Cercel, Dumitru-Clementin
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 317 - 332
  • [29] Open knowledge base canonicalization with multi-task learning
    Liu, Bingchen
    Peng, Huang
    Zeng, Weixin
    Zhao, Xiang
    Liu, Shijun
    Pan, Li
    Li, Xin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [30] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 : 355 - 369