Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

Cited by: 0
Authors
Zhu, Yongshuo [1]
Li, Lu [1]
Chen, Keyan [1, 2]
Liu, Chenyang [1, 2]
Zhou, Fugen [1]
Shi, Zhenwei [1, 2]
Affiliations
[1] Beihang Univ, Image Proc Ctr, Sch Astronaut, Beijing 100191, Peoples R China
[2] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Training; Accuracy; Semantics; Natural languages; Feature extraction; Stability analysis; Decoding; Neck; Sensors; Remote sensing; Change captioning (CC); foundation model; multitask learning (MTL); remote sensing image
DOI
10.1109/TGRS.2024.3497338
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Remote sensing image change captioning (RSICC) aims to articulate the changes in objects of interest within bitemporal remote sensing images using natural language. Given the limitations of current RSICC methods in expressing general features across multitemporal and spatial scenarios, and their deficiency in providing granular, robust, and precise change descriptions, we introduce a novel change captioning (CC) method based on the foundational knowledge and semantic guidance, which we term Semantic-CC. Semantic-CC alleviates the dependency of high-generalization algorithms on extensive annotations by harnessing the latent knowledge of foundation models, and it generates more comprehensive and accurate change descriptions guided by pixel-level semantics from change detection (CD). Specifically, we propose a bitemporal SAM-based encoder for dual-image feature extraction; a multitask semantic aggregation neck for facilitating information interaction between heterogeneous tasks; a straightforward multiscale CD decoder to provide pixel-level semantic guidance; and a change caption decoder based on the large language model (LLM) to generate change description sentences. Moreover, to ensure the stability of the joint training of CD and CC, we propose a three-stage training strategy that supervises different tasks at various stages. We validate the proposed method on the LEVIR-CC and LEVIR-CD datasets. The experimental results corroborate the complementarity of CD and CC, demonstrating that Semantic-CC can generate more accurate change descriptions and achieve optimal performance across both tasks.
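The three-stage training strategy mentioned in the abstract can be pictured as a schedule that switches which task loss is supervised as training progresses. The sketch below is only an illustration of that idea: the abstract does not say which task is supervised in which stage, so the stage ordering, the loss weights, the epoch boundaries, and all names (`StageWeights`, `SCHEDULE`, `stage_for_epoch`, `joint_loss`) are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageWeights:
    """Loss weights for one training stage (illustrative values only)."""
    cd: float  # weight on the change detection (CD) loss
    cc: float  # weight on the change captioning (CC) loss


# ASSUMPTION: the abstract only says that different tasks are supervised
# at different stages. This particular ordering is hypothetical.
SCHEDULE = {
    1: StageWeights(cd=1.0, cc=0.0),  # stage 1: supervise CD only
    2: StageWeights(cd=0.0, cc=1.0),  # stage 2: supervise CC only
    3: StageWeights(cd=1.0, cc=1.0),  # stage 3: supervise both tasks
}


def stage_for_epoch(epoch: int, boundaries: Tuple[int, int] = (10, 20)) -> int:
    """Map a 0-based epoch index to a stage number (boundaries are hypothetical)."""
    first, second = boundaries
    if epoch < first:
        return 1
    if epoch < second:
        return 2
    return 3


def joint_loss(epoch: int, cd_loss: float, cc_loss: float) -> float:
    """Weighted sum of the two task losses for the stage active at this epoch."""
    weights = SCHEDULE[stage_for_epoch(epoch)]
    return weights.cd * cd_loss + weights.cc * cc_loss
```

In a real training loop, `cd_loss` and `cc_loss` would come from the CD decoder and the LLM-based caption decoder respectively; setting a weight to zero is one simple way to leave a task unsupervised during a stage.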
Pages: 16