SatViT: Pretraining Transformers for Earth Observation

Cited by: 22
Authors
Fuller, Anthony [1 ]
Millard, Koreen [2 ]
Green, James R. [1 ]
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Carleton Univ, Dept Geog & Environm Studies, Ottawa, ON K1S 5B6, Canada
Keywords
Task analysis; Transformers; Decoding; Predictive models; Image reconstruction; Data models; Computational modeling; Pretraining; self-supervised learning (SSL); vision transformer (ViT); Representations
DOI
10.1109/LGRS.2022.3201489
CLC Classification Number
P3 [Geophysics]; P59 [Geochemistry]
Discipline Code
0708; 070902
Abstract
Despite the enormous success of the "pretraining and fine-tuning" paradigm, widespread across machine learning, it has yet to pervade remote sensing (RS). To help rectify this, we pretrain a vision transformer (ViT) on 1.3 million satellite-derived RS images. We pretrain SatViT using a state-of-the-art (SOTA) self-supervised learning (SSL) algorithm called masked autoencoding (MAE), which learns general representations by reconstructing held-out image patches. Crucially, this approach does not require annotated data, allowing us to pretrain on unlabeled images acquired from Sentinel-1 and 2. After fine-tuning, SatViT outperforms SOTA ImageNet and RS-specific pretrained models on both of our downstream tasks. We further improve the overall accuracy (OA) (by 3.2% and 0.21%) by continuing to pretrain SatViT, still using MAE, on the unlabeled target datasets. Most importantly, we release our code, pretrained model weights, and tutorials aimed at helping researchers fine-tune our models (https://github.com/antofuller/SatViT).
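The abstract's core mechanism, MAE-style pretraining, works by randomly holding out most image patches and training the model to reconstruct them. Below is a minimal illustrative sketch of that random patch masking in NumPy. It is not the authors' implementation; the function name, shapes, and the 75% mask ratio (MAE's commonly cited default) are assumptions for illustration.

```python
import numpy as np

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Randomly hold out a fraction of image patches, MAE-style.

    patches: (num_patches, patch_dim) array of flattened patches.
    Returns the visible patches (encoder input), their indices, and
    the indices of masked patches the decoder must reconstruct.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))     # patches the encoder sees
    perm = rng.permutation(n)              # random shuffle of indices
    keep_idx = np.sort(perm[:n_keep])      # visible patch positions
    mask_idx = np.sort(perm[n_keep:])      # held-out patch positions
    return patches[keep_idx], keep_idx, mask_idx

# Example: a 16x16 grid of patches (256 total), each flattened to 768 dims
patches = np.random.rand(256, 768).astype(np.float32)
visible, keep_idx, mask_idx = mask_patches(patches)
```

With a 75% ratio, the encoder processes only 64 of 256 patches, which is what makes MAE pretraining cheap enough to scale to millions of unlabeled images.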
Pages: 5