Intertemporal Interaction and Symmetric Difference Learning for Remote Sensing Image Change Captioning

Times cited: 1
Authors
Li, Yunpeng [1 ]
Zhang, Xiangrong [1 ]
Cheng, Xina [1 ]
Chen, Puhua [1 ]
Jiao, Licheng [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Feature extraction; Semantics; Remote sensing; Decoding; Visualization; Sensors; Cross-temporal attention (CTA); remote sensing change captioning; symmetric change detection; temporal remote sensing images (RSIs); NETWORK;
DOI
10.1109/TGRS.2024.3462091
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline Codes
0708; 070902;
Abstract
Remote sensing image change captioning (RSICC) is more challenging than the remote sensing change detection task: it requires extracting the changes that have occurred in a pair of similar remote sensing images (RSIs) while generating a caption that describes them. However, few works have investigated RSICC; the main challenges are learning abundant change clues and bridging the modality gap. To handle these problems, we rethink this task from the perspective of obtaining and aligning symmetrical change features for temporal RSIs. In this work, the proposed intertemporal interaction and symmetric difference learning network cascades several multitemporal integration units to model differences from coarse to fine representations. Specifically, we design a cross-temporal attention (CTA) mechanism that enables direct interaction between bi-temporal RSIs, encouraging information coupling between intralevel representations and suppressing irrelevant interference. To learn robust change features, a symmetric difference transformer (SDT) module is devised to guarantee temporal symmetry between the "before-to-after" and "after-to-before" change representations. In addition, a bi-directional triplet ranking loss is adopted to guide the network toward strongly discriminative and temporally symmetric change representations. Extensive experiments on the Dubai-CC and LEVIR-CC datasets demonstrate that our framework achieves excellent performance and surpasses recent state-of-the-art methods. https://github.com/romanticLYP/TISDNet
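The abstract does not give the exact formulation of the bi-directional triplet ranking loss, so the following is a minimal NumPy sketch of one common variant, assuming cosine similarity and a fixed margin; the function names (`bidirectional_triplet_loss`, `triplet_ranking_loss`) and the margin value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity along the last axis."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def triplet_ranking_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: anchor must be closer to positive than to negative by `margin`."""
    return np.maximum(0.0, margin - cosine_sim(anchor, positive)
                           + cosine_sim(anchor, negative))

def bidirectional_triplet_loss(feat_ba, feat_ab, feat_neg, margin=0.2):
    """Symmetric variant: the hinge is applied in both temporal directions,
    pulling the 'before-to-after' and 'after-to-before' change features
    toward each other while pushing both away from a negative sample."""
    loss_fwd = triplet_ranking_loss(feat_ba, feat_ab, feat_neg, margin)
    loss_bwd = triplet_ranking_loss(feat_ab, feat_ba, feat_neg, margin)
    return float(np.mean(loss_fwd + loss_bwd))
```

When the two directional change features coincide and the negative is dissimilar, the loss vanishes; any asymmetry between the two directions, or similarity to the negative, contributes a positive penalty in at least one direction.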
Page count: 13