A General Self-Supervised Framework for Remote Sensing Image Classification

Cited by: 11
Authors
Gao, Yuan [1 ,2 ,3 ]
Sun, Xiaojuan [1 ,2 ,3 ]
Liu, Chao [4 ]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Chinese Acad Sci, Key Lab Technol Geospatial Informat Proc & Applic, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China
[4] Ditu Beijing Technol Co Ltd, Beijing 100089, Peoples R China
Keywords
self-supervised learning; remote sensing; scene classification; deep learning; vision transformer; masked image modeling; attention; model
DOI
10.3390/rs14194824
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Discipline Classification Code
08; 0830
Abstract
This paper provides insights beyond simply combining self-supervised learning (SSL) with remote sensing (RS). Inspired by the improved representation ability that SSL brings to natural image understanding, we explore and analyze the compatibility of SSL with remote sensing. First, we propose a self-supervised pre-training framework that, for the first time, applies the masked image modeling (MIM) method to RS image research in order to enhance its efficacy. The completion proxy task used by MIM encourages the model to reconstruct the masked patches and thereby relate the unseen parts to the seen parts semantically; a minimal illustration of this pre-training step is sketched below. Second, to understand how the pretext task affects downstream performance, we identify the attribution consensus of the pre-trained model and the downstream tasks toward the proxy and classification targets, which differs markedly from that observed in natural image understanding. Moreover, this transferable consensus persists under cross-dataset full or partial fine-tuning, which means that SSL can provide general, model-free representations beyond domain bias and task bias (e.g., classification, segmentation, and detection). Finally, on three publicly accessible RS scene classification datasets, our method outperforms the majority of fully supervised state-of-the-art (SOTA) methods, achieving higher accuracy scores while learning from unlabeled data.
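The sketch below illustrates the masked-patch completion proxy task described in the abstract, in the spirit of MAE/BEiT-style MIM pre-training. It is a minimal PyTorch example under stated assumptions: the class name TinyMIM, the mask ratio, the pixel-reconstruction loss, and all architecture sizes are illustrative choices, not the paper's actual configuration.

# Minimal sketch of one masked-image-modeling (MIM) pre-training step.
# All names and hyperparameters here are hypothetical; the record does not
# specify the paper's exact encoder, mask ratio, or reconstruction target.
import torch
import torch.nn as nn


class TinyMIM(nn.Module):
    """Toy ViT-style encoder with a linear pixel-reconstruction head."""

    def __init__(self, img_size=64, patch=8, dim=128, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        patch_dim = 3 * patch * patch
        self.embed = nn.Linear(patch_dim, dim)            # patch embedding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_dim)             # reconstruct raw pixels

    def patchify(self, imgs):
        # (B, 3, H, W) -> (B, N, 3*p*p) flattened non-overlapping patches
        B, C, H, W = imgs.shape
        p = self.patch
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs, mask_ratio=0.6):
        patches = self.patchify(imgs)                     # reconstruction targets
        tokens = self.embed(patches) + self.pos
        B, N, D = tokens.shape
        # Randomly replace a fraction of patch tokens with a learned mask token
        # (the "completion" proxy task of MIM).
        mask = torch.rand(B, N, device=imgs.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, D), tokens)
        recon = self.head(self.encoder(tokens))           # predict pixels of all patches
        # Loss is taken only on masked positions, so the visible context must
        # carry enough semantics to explain the unseen parts.
        loss = ((recon - patches) ** 2)[mask].mean()
        return loss


if __name__ == "__main__":
    model = TinyMIM()
    imgs = torch.rand(4, 3, 64, 64)                       # stand-in for RS scene crops
    loss = model(imgs)
    loss.backward()
    print(f"masked-patch reconstruction loss: {loss.item():.4f}")

After such pre-training on unlabeled RS imagery, the encoder would typically be kept and fine-tuned (fully or partially) with a classification head on the labeled scene classification datasets, which matches the cross-dataset fine-tuning setting the abstract describes.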
Pages: 18