Layer-Specific Knowledge Distillation for Class Incremental Semantic Segmentation

Cited by: 8
Authors
Wang, Qilong [1]
Wu, Yiwen [1]
Yang, Liu [1]
Zuo, Wangmeng [2]
Hu, Qinghua [1,3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[3] Minist Educ Peoples Republ China, Engn Res Ctr City Intelligence & Digital Governance, Beijing 100816, Peoples R China
Keywords
Knowledge distillation; incremental learning; semantic segmentation
DOI
10.1109/TIP.2024.3372448
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Class incremental semantic segmentation (CISS), which targets the practical open-world setting, has recently attracted increasing research interest; its main challenge is the well-known problem of catastrophic forgetting. Knowledge distillation (KD) techniques in particular have been widely studied to alleviate catastrophic forgetting. Despite their promising performance, existing KD-based methods generally apply the same distillation scheme to every intermediate layer when transferring old knowledge, and they control the effect of KD with manually tuned, fixed trade-off weights. These methods ignore the distinct feature characteristics of different intermediate layers, which limits the effectiveness of KD for CISS. In this paper, we propose a layer-specific knowledge distillation (LSKD) method that assigns appropriate distillation schemes and weights to individual intermediate layers according to their feature characteristics, aiming to further explore the potential of KD for improving CISS performance. Specifically, we present a mask-guided distillation (MD) scheme to alleviate background shift on semantic features, which performs distillation after masking out the features affected by the background. Furthermore, a mask-guided context distillation (MCD) scheme is presented to exploit the global context information carried by high-level semantic features. Based on these, LSKD assigns different distillation schemes according to feature characteristics. To adjust the effect of layer-specific distillation adaptively, LSKD introduces a regularized gradient-equilibrium method that learns dynamic trade-off weights. In addition, LSKD learns the distillation schemes and trade-off weights of different layers simultaneously through a bi-level optimization method. Extensive experiments on the widely used Pascal VOC 2012 and ADE20K benchmarks show that LSKD clearly outperforms its counterparts and achieves state-of-the-art results.
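To illustrate the masking idea behind MD, the following is a minimal, hypothetical PyTorch sketch of a mask-guided feature distillation term. The function name, the mask convention (1 marks regions assumed unaffected by background shift), and the mean-squared-error form are assumptions made for illustration; they are not taken from the paper.

import torch
import torch.nn.functional as F


def mask_guided_distillation_loss(feat_new: torch.Tensor,
                                  feat_old: torch.Tensor,
                                  valid_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical mask-guided distillation (MD) term.

    feat_new, feat_old: intermediate features of shape (B, C, H, W) from the
        current model and the frozen old model, respectively.
    valid_mask: (B, 1, h, w) binary mask, 1 where a pixel is assumed NOT to be
        affected by background shift (e.g. pixels not labelled as background
        at the current incremental step).
    """
    # Bring the mask to the feature resolution.
    mask = F.interpolate(valid_mask.float(), size=feat_new.shape[-2:], mode="nearest")
    # Squared feature difference, restricted to the masked positions only.
    diff = (feat_new - feat_old.detach()).pow(2) * mask
    # Normalise by the number of distilled feature entries.
    return diff.sum() / (mask.sum() * feat_new.shape[1]).clamp_min(1.0)

For a higher-level term such as MCD, one would presumably compare context statistics (e.g. pooled or relational descriptors) of the masked features rather than raw activations, but the abstract does not specify that formulation.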
Pages: 1977-1989
Number of pages: 13