Remote Sensing Images Change Detection Using Triple Attention Mechanism to Aggregate Global and Local Features

Cited by: 0
Authors
Ding, Chenyin [1 ]
Cheng, Qianwen [1 ]
Lin, Yukun [1 ,2 ]
Yu, Jia [1 ]
Du, Shiqiang [1 ]
Du, Bo [3 ,4 ]
Affiliations
[1] Shanghai Normal Univ, Sch Environm & Geog Sci, Shanghai 200030, Peoples R China
[2] Hubei Prov Key Lab Multimedia & Network Commun Eng, Wuhan 430072, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[4] Wuhan Univ, Hubei Prov Key Lab Multimedia & Network Commun Eng, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Attention mechanisms; Convolution; Vectors; Transformers; Remote sensing; Multimedia communication; Kernel; Fuses; Training; Attention mechanism; change detection; deep learning; global and local features; high spatial resolution remote sensing; NETWORK;
DOI
10.1109/LGRS.2025.3542065
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Subject Classification Codes
0708; 070902;
Abstract
Change detection in high-resolution remote sensing images plays an important role in practical applications. However, most existing studies rely on either local or global attention mechanisms alone to filter change features. In this study, we propose a triple attention multiscale fusion network (TAMFNet) that effectively exploits both global and local attention, improving the ability to locate change areas and delineate them completely. First, a fully convolutional network extracts features from the bitemporal images at multiple scales. Second, three complementary attention mechanisms, namely, the spatial attention mechanism (SAM), the channel attention mechanism (CAM), and the multihead self-attention (MSA) module, are integrated to extract and fuse global and local features. Finally, to address semantic and scale differences, a cross-scale fusion (CSF) module, a pyramid pooling module (PPM), and a pyramid receptive field (PRF) module aggregate features from adjacent scales for comprehensive feature transmission. To demonstrate the effectiveness of our method, we tested it on the LEVIR-CD and WHU-CD datasets, where the model achieved intersection over union (IOU) scores of 80.17% and 77.23%, respectively, outperforming the comparison models. Ablation experiments on the LEVIR-CD dataset confirmed the positive contribution of each intermediate module in TAMFNet, yielding an overall 2.35% increase in IOU.
Pages: 5
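The abstract describes the triple attention design only at a high level. As an illustration, the following is a minimal, hypothetical PyTorch sketch of how one feature scale might pass through a block that combines channel attention (CAM), spatial attention (SAM), and multihead self-attention (MSA). The class names, the CBAM-style gating, and the additive fusion at the end are assumptions for this sketch, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): one possible way to combine
# channel attention (CAM), spatial attention (SAM), and multihead
# self-attention (MSA) on a single-scale feature map.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM-style channel attention: global pooling -> shared MLP -> sigmoid gate."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # (B, C) from max pooling
        gate = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * gate


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-wise pooling -> conv -> sigmoid gate."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        gate = torch.sigmoid(self.conv(pooled))                    # (B, 1, H, W)
        return x * gate


class TripleAttentionBlock(nn.Module):
    """Fuses a local branch (CAM + SAM) with a global branch (MSA) on one scale."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()
        self.msa = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.sam(self.cam(x))                  # local attention branch
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        global_, _ = self.msa(tokens, tokens, tokens)  # global attention branch
        global_ = global_.transpose(1, 2).view(b, c, h, w)
        return local + global_ + x                     # assumed additive fusion


# Example: apply to a 64-channel feature map from one encoder scale.
if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    block = TripleAttentionBlock(channels=64)
    print(block(feats).shape)  # torch.Size([2, 64, 32, 32])
```

The additive fusion of the local and global branches is only one plausible choice; the paper may instead concatenate or weight the branch outputs before the cross-scale fusion (CSF) stage.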