Interactive Change-Aware Transformer Network for Remote Sensing Image Change Captioning

Cited by: 10
Authors
Cai, Chen [1]
Wang, Yi [2]
Yap, Kim-Hui [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
Keywords
image change captioning; remote sensing; multi-layer change awareness; transformer;
DOI
10.3390/rs15235611
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Remote sensing image change captioning (RSICC) aims to automatically generate sentences describing the differences in content between bitemporal remote sensing images. Recent works extract the changes between bitemporal features and employ a hierarchical approach to fuse multiple changes of interest, yielding change captions. However, these methods directly aggregate all features, potentially passing non-change-focused information from each encoder layer into the change caption decoder and degrading captioning performance. To address this problem, we propose an Interactive Change-Aware Transformer Network (ICT-Net). ICT-Net extracts and incorporates the most critical changes of interest in each encoder layer to improve change description generation. It first extracts bitemporal visual features from a CNN backbone and employs an Interactive Change-Aware Encoder (ICE) to capture the crucial differences between these features. Specifically, the ICE interactively captures the most change-aware discriminative information between the paired bitemporal features through difference and content attention encoding. A Multi-Layer Adaptive Fusion (MAF) module is proposed to adaptively aggregate the relevant change-aware features in the ICE layers while minimizing the impact of irrelevant visual features. Moreover, we extend the ICE to extract multi-scale changes and introduce a novel Cross Gated-Attention (CGA) module into the change caption decoder to select essential discriminative multi-scale features and improve change captioning performance. We evaluate our method on two RSICC datasets (i.e., LEVIR-CC and LEVIRCCD), and the experimental results demonstrate that our method achieves state-of-the-art performance.
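The idea of adaptively weighting change-aware features from several encoder layers, rather than aggregating them all equally, can be illustrated with a toy example. The sketch below is not the MAF module from the paper, whose exact formulation the abstract does not give. It assumes a simple element-wise difference between bitemporal features and a softmax gate over layers; the names `difference`, `adaptive_fusion`, and `gate_vector` are hypothetical and used only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def difference(feat_before, feat_after):
    """Element-wise difference between two feature vectors (assumed change cue)."""
    return [b - a for a, b in zip(feat_before, feat_after)]


def adaptive_fusion(layer_feats, gate_vector):
    """Fuse per-layer feature vectors with softmax weights.

    Each layer is scored by its dot product with `gate_vector`
    (a stand-in for learned parameters, an assumption of this sketch),
    and the fused output is the weighted sum of the layers.
    """
    scores = [sum(f * g for f, g in zip(feat, gate_vector)) for feat in layer_feats]
    weights = softmax(scores)
    dim = len(layer_feats[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, layer_feats)) for i in range(dim)]
    return fused, weights


# Toy bitemporal features: only the second channel changes.
before = [0.2, 0.5, 0.1]
after = [0.2, 0.9, 0.1]

layer1 = difference(before, after)  # layer that responds to the change
layer2 = [0.0, 0.0, 0.0]            # layer with no change response
gate = [1.0, 1.0, 1.0]              # hypothetical gate parameters

fused, weights = adaptive_fusion([layer1, layer2], gate)
```

In this toy setting the layer carrying the change signal receives the larger weight, which is the qualitative behavior the abstract attributes to adaptive fusion. A real model would learn the gate and operate on feature maps rather than short vectors.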
Pages: 21