Bridging knowledge distillation gap for few-sample unsupervised semantic segmentation

Cited by: 3

Authors:

Li, Ping [1,2]

Chen, Junjie [1]

Tang, Chen [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China

[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China

Source:

INFORMATION SCIENCES | 2024 / Vol. 673

Keywords:

Unsupervised semantic segmentation; Knowledge distillation; Block-wise/channel-wise distillation; Few-sample learning; NETWORK;

DOI:

10.1016/j.ins.2024.120714

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Because of privacy and security concerns and the high cost of labeling images, unsupervised semantic segmentation with very few samples is a promising direction, yet it remains unexplored. This inspires us to introduce the few-sample unsupervised semantic segmentation task, which is very challenging because a few unlabeled images are far from sufficient for a segmentation model to generalize. We address this problem from a knowledge distillation perspective by proposing a medium-sized auxiliary network as a bridge, which narrows the semantic knowledge gap between the large teacher network and the small student network. To this end, we develop the Knowledge Distillation Bridge (KDB) framework for few-sample unsupervised semantic segmentation. It consists of a teacher-auxiliary-student architecture that adopts block-wise distillation, encouraging the auxiliary to imitate the teacher and the student to imitate the auxiliary. In this way, the knowledge gap between the source feature distribution and the target one is reduced, allowing the smaller student network to be readily deployed in highly demanding environments. Meanwhile, each channel of a feature map characterizes different semantics, which motivates us to distill the decoder features in a channel-wise manner. Extensive experiments on two benchmarks, Pascal VOC2012 and Cityscapes, demonstrate the promising performance of the proposed method, which strikes a good balance between precision and speed; e.g., it achieves an inference speed of 230 fps for a 512 × 512 image.
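The two distillation ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the mean-squared-error block loss, the KL-divergence channel loss, the temperature parameter, and all function names below are assumptions chosen for illustration. Features are plain nested lists, and teacher, auxiliary, and student features are assumed to have already been mapped to the same shape.

```python
import math


def log_spatial_softmax(channel, temperature=1.0):
    """Log-probabilities over the spatial positions of one flattened channel."""
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_norm for v in scaled]


def channel_wise_kd(source_feat, target_feat, temperature=1.0):
    """Assumed channel-wise loss: mean over channels of KL(source || target).

    Each feature is a list of channels; each channel is a flat list of
    H*W activations, normalized into a distribution over positions.
    """
    assert len(source_feat) == len(target_feat)
    loss = 0.0
    for src_ch, tgt_ch in zip(source_feat, target_feat):
        log_p = log_spatial_softmax(src_ch, temperature)
        log_q = log_spatial_softmax(tgt_ch, temperature)
        loss += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return loss / len(source_feat)


def block_wise_mse(source_blocks, target_blocks):
    """Assumed block-wise loss: MSE between matching block outputs."""
    assert len(source_blocks) == len(target_blocks)
    total = 0.0
    for src, tgt in zip(source_blocks, target_blocks):
        total += sum((a - b) ** 2 for a, b in zip(src, tgt)) / len(src)
    return total / len(source_blocks)


def bridge_loss(teacher_blocks, auxiliary_blocks, student_blocks):
    """Auxiliary imitates the teacher; student imitates the auxiliary."""
    return (block_wise_mse(teacher_blocks, auxiliary_blocks)
            + block_wise_mse(auxiliary_blocks, student_blocks))
```

In this sketch the student is never compared with the teacher directly; it only sees the auxiliary's block outputs, which is the bridging role the abstract assigns to the medium-sized network.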


Pages: 12

