Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

Cited by: 0
Authors
Zhu, Yongshuo [1]
Li, Lu [1]
Chen, Keyan [1, 2]
Liu, Chenyang [1, 2]
Zhou, Fugen [1]
Shi, Zhenwei [1, 2]
Affiliations
[1] Beihang Univ, Image Proc Ctr, Sch Astronaut, Beijing 100191, Peoples R China
[2] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Training; Accuracy; Semantics; Natural languages; Feature extraction; Stability analysis; Decoding; Neck; Sensors; Remote sensing; Change captioning (CC); foundation model; multitask learning (MTL); remote sensing image
DOI
10.1109/TGRS.2024.3497338
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Remote sensing image change captioning (RSICC) aims to articulate the changes in objects of interest within bitemporal remote sensing images using natural language. Given the limitations of current RSICC methods in expressing general features across multitemporal and spatial scenarios, and their deficiency in providing granular, robust, and precise change descriptions, we introduce a novel change captioning (CC) method based on the foundational knowledge and semantic guidance, which we term Semantic-CC. Semantic-CC alleviates the dependency of high-generalization algorithms on extensive annotations by harnessing the latent knowledge of foundation models, and it generates more comprehensive and accurate change descriptions guided by pixel-level semantics from change detection (CD). Specifically, we propose a bitemporal SAM-based encoder for dual-image feature extraction; a multitask semantic aggregation neck for facilitating information interaction between heterogeneous tasks; a straightforward multiscale CD decoder to provide pixel-level semantic guidance; and a change caption decoder based on the large language model (LLM) to generate change description sentences. Moreover, to ensure the stability of the joint training of CD and CC, we propose a three-stage training strategy that supervises different tasks at various stages. We validate the proposed method on the LEVIR-CC and LEVIR-CD datasets. The experimental results corroborate the complementarity of CD and CC, demonstrating that Semantic-CC can generate more accurate change descriptions and achieve optimal performance across both tasks.
Pages: 16