CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding

Times cited: 28
Authors
Muhtar, Dilxat [1]
Zhang, Xueliang [1]
Xiao, Pengfeng [1]
Li, Zhenshi [1]
Gu, Feng [1]
Affiliations
[1] Nanjing Univ, Sch Geog & Ocean Sci, Jiangsu Prov Key Lab Geog Informat Sci & Technol, Key Lab Land Satellite Remote Sensing Applicat,Min, Nanjing 210023, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Task analysis; Data models; Visualization; Remote sensing; Image reconstruction; Convolutional neural networks; Contrastive learning (CL); deep learning (DL); masked image modeling (MIM); remote sensing (RS) pretraining; self-supervised learning (SSL); OBJECT DETECTION;
DOI
10.1109/TGRS.2023.3268232
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
Self-supervised learning (SSL) has gained widespread attention in the remote sensing (RS) and Earth observation (EO) communities owing to its ability to learn task-agnostic representations without human-annotated labels. Nevertheless, most existing RS SSL methods are limited to learning either globally semantically separable or locally spatially perceptible representations. We argue that this learning strategy is suboptimal in the realm of RS, since the representations required by different RS downstream tasks are often varied and complex. In this study, we propose a unified SSL framework that is better suited for RS image representation learning. The proposed SSL framework, contrastive mask image distillation (CMID), is capable of learning representations with both global semantic separability and local spatial perceptibility by combining contrastive learning (CL) with masked image modeling (MIM) in a self-distillation way. Furthermore, our CMID learning framework is architecture-agnostic and compatible with both convolutional neural networks (CNNs) and vision transformers (ViTs), allowing CMID to be easily adapted to a variety of deep learning (DL) applications for RS understanding. Comprehensive experiments have been carried out on four downstream tasks (i.e., scene classification, semantic segmentation, object detection, and change detection), and the results show that models pretrained using CMID achieve better performance than other state-of-the-art SSL methods on multiple downstream tasks. The code and pretrained models will be made available at https://github.com/NJU-LHRS/official-CMID to facilitate SSL research and speed up the development of DL applications for RS images.
Pages: 17