Intertemporal Interaction and Symmetric Difference Learning for Remote Sensing Image Change Captioning

Cited by: 1
Authors
Li, Yunpeng [1 ]
Zhang, Xiangrong [1 ]
Cheng, Xina [1 ]
Chen, Puhua [1 ]
Jiao, Licheng [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Feature extraction; Semantics; Remote sensing; Decoding; Visualization; Sensors; Cross-temporal attention (CTA); remote sensing change captioning; symmetric change detection; temporal remote sensing images (RSIs); NETWORK;
DOI
10.1109/TGRS.2024.3462091
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry];
Subject Classification Codes
0708 ; 070902 ;
Abstract
Remote sensing image change captioning (RSICC) is more challenging than the remote sensing change detection task: it requires extracting the changes that have occurred in similar remote sensing image (RSI) pairs while also generating a change caption. However, few works have investigated RSICC; the main challenges lie in learning abundant change clues and bridging the modality gap. To handle these problems, we rethink this task from the perspective of obtaining and aligning symmetric change features for temporal RSIs. In this work, the proposed intertemporal interaction and symmetric difference learning network is cascaded through several multitemporal integration units to model differences from coarse to fine representations. Specifically, we design a cross-temporal attention (CTA) mechanism that probes direct interaction between bi-temporal RSIs, encouraging information coupling between intralevel representations and suppressing irrelevant interference. To learn robust change features, a symmetric difference transformer (SDT) module is devised to guarantee temporal symmetry between the "before-to-after" and "after-to-before" change representations. In addition, a bi-directional triplet ranking loss is adopted to guide the network toward strongly discriminative and temporally symmetric change representations. Extensive experiments on the Dubai-CC and LEVIR-CC datasets demonstrate that our framework with the proposed components achieves excellent performance and surpasses recent state-of-the-art methods. Code: https://github.com/romanticLYP/TISDNet
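The bi-directional triplet ranking loss mentioned in the abstract can be illustrated with a minimal sketch: a standard max-margin ranking term is applied to both the "before-to-after" and "after-to-before" change embeddings against a matched caption embedding and a mismatched one. This is an assumption-laden illustration only; the function names, the cosine similarity choice, and the margin value are hypothetical and not taken from the paper's code.

```python
# Illustrative sketch of a bi-directional triplet ranking loss (NOT the
# authors' implementation): both change directions should rank the true
# caption embedding above a mismatched (negative) caption embedding.
from math import sqrt


def cosine(u, v):
    # cosine similarity between two vectors given as plain lists
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u)) or 1.0
    nv = sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)


def triplet_rank(anchor, positive, negative, margin=0.2):
    # standard max-margin ranking term: hinge on (margin - s_pos + s_neg)
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def bidirectional_triplet_loss(before_to_after, after_to_before,
                               caption, neg_caption, margin=0.2):
    # temporal symmetry: the forward and backward change representations
    # are both anchored against the same caption pair, so minimizing the
    # sum pulls them toward a shared, direction-symmetric embedding
    fwd = triplet_rank(before_to_after, caption, neg_caption, margin)
    bwd = triplet_rank(after_to_before, caption, neg_caption, margin)
    return fwd + bwd
```

When both change embeddings already align with the true caption and are orthogonal to the negative, the hinge terms vanish and the loss is zero; mismatches in either direction contribute a positive penalty.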
Pages: 13