Multi-scale feature correspondence and pseudo label retraining strategy for weakly supervised semantic segmentation

Cited by: 0

Authors:

Wang, Weizheng [1]

Zhou, Lei [1]

Wang, Haonan [1]

Affiliations:

[1] Changsha University of Science and Technology, School of Computer and Communication Engineering, Changsha 410076, People's Republic of China

Source:

Image and Vision Computing | 2024 / Vol. 150

Funding:

National Natural Science Foundation of China

Keywords:

Weakly supervised semantic segmentation; Vision transformer; Multi-scale feature correspondence; Pseudo label retraining strategy

DOI:

10.1016/j.imavis.2024.105215

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The performance of weakly supervised semantic segmentation (WSSS) has improved significantly in recent years. WSSS that uses only image-level labels has received widespread attention; it employs Class Activation Maps (CAMs) to generate pseudo labels. Compared with conventional training on pixel-level labels, this approach greatly reduces annotation cost by relying on simpler and more readily available image-level annotations. However, because Convolutional Neural Networks (CNNs) perceive only local context, the generated CAMs fail to activate the entire object region. Researchers have found that the Vision Transformer (ViT) can compensate for this CNN limitation, but ViT introduces an over-smoothing problem of its own. Recent work has made good progress on this issue, yet it tends to overlook the intrinsic information in CAMs and their associated segmentation predictions, as well as the relationships between the two. In this paper, we propose a Multi-Scale Feature Correspondence (MSFC) method. MSFC obtains the feature correspondence of CAMs and segmentation predictions at different scales and re-extracts useful semantic information from them, which strengthens the network's learning of feature information and improves CAM quality. To further improve segmentation precision, we also design a Pseudo Label Retraining Strategy (PLRS), which refines accuracy in local regions and raises the quality of the pseudo labels. Experimental results on the PASCAL VOC 2012 and MS COCO 2014 datasets show that our method achieves impressive performance among end-to-end WSSS methods.
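
As background for the abstract's statement that WSSS uses CAMs to generate pseudo labels, the generic CAM baseline can be sketched as follows. This is a minimal illustration, not the MSFC or PLRS method of the paper: the function names, the array shapes, and the background threshold of 0.4 are all assumptions chosen for the example.

```python
import numpy as np

def class_activation_maps(features, weights):
    """Generic CAM: weight the final feature maps by the classifier weights.

    features: (C, H, W) feature map from the last layer (hypothetical input).
    weights:  (K, C) weights of a linear classifier applied after global pooling.
    Returns (K, H, W) maps, clipped at zero and scaled to [0, 1] per class.
    """
    cams = np.einsum("kc,chw->khw", weights, features)
    cams = np.maximum(cams, 0.0)
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    return cams / (peak[:, None, None] + 1e-5)

def cams_to_pseudo_label(cams, image_labels, bg_threshold=0.4):
    """Turn CAMs into a pixel-level pseudo label using the image-level labels.

    image_labels: (K,) binary vector marking which classes are in the image.
    bg_threshold: assumed cut-off; pixels scoring below it become background.
    Returns an (H, W) integer map: 0 is background, k + 1 is class k.
    """
    cams = cams * image_labels[:, None, None]  # suppress classes absent from the image
    best_class = cams.argmax(axis=0)
    best_score = cams.max(axis=0)
    label = best_class + 1
    label[best_score < bg_threshold] = 0
    return label
```

In a sketch like this, only the image-level label vector supervises the result: pixels where no present class responds strongly fall back to background, which is why incomplete activation in the CAM translates directly into incomplete pseudo labels.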

Pages: 11