Contrastive Tokens and Label Activation for Remote Sensing Weakly Supervised Semantic Segmentation

Cited by: 2
Authors
Hu, Zaiyi [1]
Gao, Junyu [1,2]
Yuan, Yuan [1]
Li, Xuelong [3]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] China Telecom Corp Ltd, Inst Artificial Intelligence TeleAI, Beijing 100033, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Remote sensing; Semantic segmentation; Training; Task analysis; Semantics; Convolutional neural networks; Transformers; Deep learning; remote sensing images; vision transformer (ViT); weakly supervised semantic segmentation (WSSS)
DOI
10.1109/TGRS.2024.3385747
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
In recent years, there has been remarkable progress in weakly supervised semantic segmentation (WSSS), with vision transformer (ViT) architectures emerging as a natural fit for such tasks due to their inherent ability to leverage global attention for comprehensive object information perception. However, directly applying ViT to WSSS tasks can introduce challenges. The characteristics of ViT can lead to an oversmoothing problem, particularly in dense scenes of remote sensing images, significantly compromising the effectiveness of class activation maps (CAMs) and posing challenges for segmentation. Moreover, existing methods often adopt multistage strategies, adding complexity and reducing training efficiency. To overcome these challenges, a comprehensive framework Contrastive Token and Foreground Activation (CTFA) based on the ViT architecture for WSSS of remote sensing images is presented. Our proposed method includes a contrastive token learning module (CTLM), incorporating both patch-wise and class-wise token learning to enhance model performance. In patch-wise learning, we leverage the semantic diversity preserved in intermediate layers of ViT and derive a relation matrix from these layers and employ it to supervise the final output tokens, thereby improving the quality of CAM. In class-wise learning, we ensure the consistency of representation between global and local tokens, revealing more entire object regions. Additionally, by activating foreground features in the generated pseudo label using a dual-branch decoder, we further promote the improvement of CAM generation. Our approach demonstrates outstanding results across three well-established datasets, providing a more efficient and streamlined solution for WSSS. Code will be available at: https://github.com/ZaiyiHu/CTFA.
Pages: 1 / 11
Page count: 11
Related papers
50 items in total
  • [31] Group-Wise Learning for Weakly Supervised Semantic Segmentation
    Zhou, Tianfei
    Li, Liulei
    Li, Xueyi
    Feng, Chun-Mei
    Li, Jianwu
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 799 - 811
  • [32] DDAug: Differentiable Data Augmentation for Weakly Supervised Semantic Segmentation
    Li, Boyang
    Zhang, Fei
    Wang, Longguang
    Wang, Yingqian
    Liu, Ting
    Lin, Zaiping
    An, Wei
    Guo, Yulan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4764 - 4775
  • [33] Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation
    Zhang, Tianyi
    Lin, Guosheng
    Cai, Jianfei
    Shen, Tong
    Shen, Chunhua
    Kot, Alex C.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2930 - 2941
  • [34] LSENet: Local and Spatial Enhancement to Improve the Semantic Segmentation of Remote Sensing Images
    Ding, Rong-Xing
    Xu, Yi-Han
    Liu, Jie
    Zhou, Wen
    Chen, Chen
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [35] Semantic Segmentation Network of Remote Sensing Images With Dynamic Loss Fusion Strategy
    Liu, Wenjie
    Zhang, Yongjun
    Yan, Jun
    Zou, Yongjie
    Cui, Zhongwei
    IEEE ACCESS, 2021, 9 : 70406 - 70418
  • [36] Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images
    Qiu, Yu
    Sun, Yuhang
    Mei, Jie
    Xu, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10892 - 10907
  • [37] Multimodal Supervised Contrastive Learning in Remote Sensing Downstream Tasks
    Berg, Paul
    Uzun, Baki
    Pham, Minh-Tan
    Courty, Nicolas
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [38] Domain Adaptation for Remote Sensing Image Semantic Segmentation: An Integrated Approach of Contrastive Learning and Adversarial Learning
    Bai, Lubin
    Du, Shihong
    Zhang, Xiuyuan
    Wang, Haoyu
    Liu, Bo
    Ouyang, Song
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [39] Difference-Complementary Learning and Label Reassignment for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images
    Han, Wenqi
    Jiang, Wen
    Geng, Jie
    Miao, Wang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 566 - 580
  • [40] A Synergistical Attention Model for Semantic Segmentation of Remote Sensing Images
    Li, Xin
    Xu, Feng
    Liu, Fan
    Lyu, Xin
    Tong, Yao
    Xu, Zhennan
    Zhou, Jun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61