Feature Guided Masked Autoencoder for Self-Supervised Learning in Remote Sensing

Times cited: 5
Authors
Wang, Yi [1]
Hernandez, Hugo Hernandez [1]
Albrecht, Conrad M. [2]
Zhu, Xiao Xiang [1, 3]
Affiliations
[1] Technical University of Munich, Chair of Data Science in Earth Observation, D-80333 Munich, Germany
[2] German Aerospace Center (DLR), Remote Sensing Technology Institute, D-82234 Wessling, Germany
[3] Munich Center for Machine Learning, D-80333 Munich, Germany
Keywords
Image reconstruction; Remote sensing; Radar polarimetry; Image edge detection; Synthetic aperture radar; Histograms; Feature extraction; Noise; Indexes; Earth; Earth observation; geospatial foundation models; masked autoencoder (MAE); remote sensing (RS); self-supervised learning; ORIENTED GRADIENTS; COVER; HISTOGRAMS; BENCHMARK; SCALE; INDEX; NDVI
DOI
10.1109/JSTARS.2024.3493237
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract
Self-supervised learning guided by masked image modeling, such as the masked autoencoder (MAE), has attracted wide attention for pretraining vision transformers in remote sensing. However, MAE tends to focus excessively on pixel details, which limits the model's capacity for semantic understanding, particularly for noisy synthetic aperture radar (SAR) images. In this article, we explore spectral and spatial remote sensing image features as improved MAE reconstruction targets. We first study the reconstruction of various image features and find that all of them perform comparably to or better than raw pixels. Based on these observations, we propose feature guided MAE (FG-MAE), which reconstructs a combination of histograms of oriented gradients (HOG) and normalized difference indices (NDI) for multispectral images, and HOG alone for SAR images. Experimental results on three downstream tasks illustrate the effectiveness of FG-MAE, with a particular boost for SAR imagery (e.g., up to 5% better than MAE on EuroSAT-SAR). Furthermore, we demonstrate that FG-MAE inherits the scalability of MAE, and we release a first series of pretrained vision transformers for medium-resolution SAR and multispectral images.
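The abstract replaces raw pixels in the MAE loss with hand-crafted feature targets: HOG (spatial) and normalized difference indices (spectral). The sketch below is a minimal, dependency-free illustration of that idea under stated assumptions; it is not the authors' implementation. The helper names (`ndi`, `hog_cell`, `multispectral_target`, `random_mask`, `masked_mse`), the 9 orientation bins, the single-cell HOG without block normalization, the choice of NIR/red for the index, and the 75% mask ratio (the default of the original MAE) are all hypothetical choices for illustration. As in MAE, the loss is averaged over masked patches only.

```python
# Minimal sketch of FG-MAE-style targets and loss (hypothetical helpers,
# not the authors' code). Pure Python so it runs without dependencies.
import math
import random
from typing import List, Sequence

Patch = List[List[float]]  # one band of one image patch, indexed [row][col]


def ndi(band_a: Patch, band_b: Patch, eps: float = 1e-6) -> Patch:
    """Per-pixel normalized difference index (a - b) / (a + b).

    With band_a = NIR and band_b = red this is NDVI; eps avoids division by zero.
    """
    return [
        [(a - b) / (a + b + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def hog_cell(patch: Patch, n_bins: int = 9) -> List[float]:
    """Simplified HOG for a single cell: magnitude-weighted histogram of
    unsigned gradient orientations in [0, 180) degrees, L2-normalized.

    Assumption: no block normalization or bin interpolation, unlike full HOG.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central difference in x
            gy = patch[y + 1][x] - patch[y - 1][x]  # central difference in y
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # "% n_bins" guards the float edge case where angle rounds to 180.0
            hist[int(angle // bin_width) % n_bins] += math.hypot(gx, gy)
    norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
    return [v / norm for v in hist]


def multispectral_target(nir: Patch, red: Patch) -> List[float]:
    """Hypothetical combined target: HOG of one band concatenated with the
    flattened NDVI map of the same patch."""
    return hog_cell(nir) + [v for row in ndi(nir, red) for v in row]


def random_mask(n_patches: int, ratio: float = 0.75, seed: int = 0) -> List[bool]:
    """Mark a random subset of patches as masked (True = hidden from encoder)."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n_patches), round(n_patches * ratio)))
    return [i in hidden for i in range(n_patches)]


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               mask: Sequence[bool]) -> float:
    """MAE-style loss: per-patch MSE, averaged over masked patches only."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(tv)
        for pv, tv, m in zip(pred, target, mask)
        if m
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy 4x4 patch with a vertical edge: all gradient energy lands in bin 0.
    edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    print([round(v, 3) for v in hog_cell(edge)])
    print(ndi([[0.8]], [[0.2]]))  # NDVI of a vegetated pixel, about 0.6
```

Using `hog_cell` alone as the target corresponds to the SAR variant described in the abstract. In a real pipeline, targets would be computed per patch from the full (unmasked) input, and the decoder's predictions for the hidden patches would be scored against them with a loss like `masked_mse`.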
Pages: 321-336
Number of pages: 16
Cited references: 55