Contrastive Tokens and Label Activation for Remote Sensing Weakly Supervised Semantic Segmentation

Times cited: 2
Authors
Hu, Zaiyi [1]
Gao, Junyu [1,2]
Yuan, Yuan [1]
Li, Xuelong [3]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] China Telecom Corp Ltd, Inst Artificial Intelligence TeleAI, Beijing 100033, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Remote sensing; Semantic segmentation; Training; Task analysis; Semantics; Convolutional neural networks; Transformers; Deep learning; remote sensing images; vision transformer (ViT); weakly supervised semantic segmentation (WSSS)
DOI
10.1109/TGRS.2024.3385747
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
In recent years, weakly supervised semantic segmentation (WSSS) has made remarkable progress, and vision transformer (ViT) architectures have emerged as a natural fit for the task because their global attention lets them perceive information across an entire object. However, applying ViT directly to WSSS introduces its own challenges. ViT is prone to oversmoothing, particularly in the dense scenes typical of remote sensing images, which significantly degrades the class activation maps (CAMs) and, in turn, the segmentation. Moreover, existing methods often adopt multistage strategies, which add complexity and reduce training efficiency. To overcome these challenges, we present Contrastive Token and Foreground Activation (CTFA), a comprehensive ViT-based framework for WSSS of remote sensing images. The proposed method includes a contrastive token learning module (CTLM) that combines patch-wise and class-wise token learning. In patch-wise learning, we exploit the semantic diversity preserved in the intermediate layers of ViT: a relation matrix derived from these layers supervises the final output tokens, improving CAM quality. In class-wise learning, we enforce consistent representations between global and local tokens, which reveals more complete object regions. Additionally, a dual-branch decoder activates foreground features in the generated pseudo labels, further improving CAM generation. Our approach achieves outstanding results on three well-established datasets and provides a more efficient and streamlined solution for WSSS. Code will be available at: https://github.com/ZaiyiHu/CTFA.
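The abstract only outlines the two token-learning objectives. The sketch below is one plausible reading of them, written in pure Python so it runs without a deep learning framework. The function names, the cosine-similarity relation matrix, the threshold `tau`, and both loss forms are assumptions for illustration, not the CTFA implementation (the authors' repository is the authoritative source). The patch-wise term binarizes relations among intermediate-layer tokens and uses them to supervise final-layer token similarities; the class-wise term penalizes disagreement between a global class token and local-crop class tokens.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two token vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def relation_matrix(tokens: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between patch tokens."""
    return [[cosine(a, b) for b in tokens] for a in tokens]


def patch_relation_loss(mid_tokens: List[Vector],
                        final_tokens: List[Vector],
                        tau: float = 0.5) -> float:
    """Hypothetical patch-wise loss.

    Pairs whose intermediate-layer similarity exceeds tau are treated as
    'same region' (positive); all other off-diagonal pairs are negative.
    Positive pairs are pulled toward final-layer similarity 1; negative pairs
    are pushed down to at most 0 (negative similarities are not penalized).
    """
    target = relation_matrix(mid_tokens)
    pred = relation_matrix(final_tokens)
    pos, neg = [], []
    for i in range(len(mid_tokens)):
        for j in range(len(mid_tokens)):
            if i == j:
                continue  # skip self-similarity on the diagonal
            if target[i][j] > tau:
                pos.append(1.0 - pred[i][j])
            else:
                neg.append(max(pred[i][j], 0.0))
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


def class_consistency_loss(global_cls: Vector,
                           local_cls_list: List[Vector]) -> float:
    """Hypothetical class-wise term: mean (1 - cosine) between the global
    class token and each local-crop class token."""
    if not local_cls_list:
        return 0.0
    total = sum(1.0 - cosine(global_cls, c) for c in local_cls_list)
    return total / len(local_cls_list)


if __name__ == "__main__":
    mid = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # two similar tokens, one distinct
    oversmoothed = [[1.0, 1.0]] * 3             # final tokens all identical
    faithful = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(patch_relation_loss(mid, oversmoothed))
    print(patch_relation_loss(mid, faithful))
    print(class_consistency_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup, identical final-layer tokens (an oversmoothed output) make every negative pair look similar, so the patch loss is about 1.0; final tokens that reproduce the intermediate-layer grouping drive it to 0.0. This mirrors, in miniature, why intermediate-layer relations can serve as a counterweight to oversmoothing.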
Pages: 1-11
Number of pages: 11