SatViT: Pretraining Transformers for Earth Observation

Cited by: 22
Authors
Fuller, Anthony [1 ]
Millard, Koreen [2 ]
Green, James R. [1 ]
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Carleton Univ, Dept Geog & Environm Studies, Ottawa, ON K1S 5B6, Canada
Keywords
Task analysis; Transformers; Decoding; Predictive models; Image reconstruction; Data models; Computational modeling; Pretraining; self-supervised learning (SSL); vision transformer (ViT); Representations
DOI
10.1109/LGRS.2022.3201489
CLC Classification Number
P3 [Geophysics]; P59 [Geochemistry]
Discipline Code
0708; 070902
Abstract
Despite the enormous success of the "pretraining and fine-tuning" paradigm, widespread across machine learning, it has yet to pervade remote sensing (RS). To help rectify this, we pretrain a vision transformer (ViT) on 1.3 million satellite-derived RS images. We pretrain SatViT using a state-of-the-art (SOTA) self-supervised learning (SSL) algorithm called masked autoencoding (MAE), which learns general representations by reconstructing held-out image patches. Crucially, this approach does not require annotated data, allowing us to pretrain on unlabeled images acquired from Sentinel-1 and 2. After fine-tuning, SatViT outperforms SOTA ImageNet and RS-specific pretrained models on both of our downstream tasks. We further improve the overall accuracy (OA) (by 3.2% and 0.21%) by continuing to pretrain SatViT, still using MAE, on the unlabeled target datasets. Most importantly, we release our code, pretrained model weights, and tutorials aimed at helping researchers fine-tune our models (https://github.com/antofuller/SatViT).
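The abstract's core mechanism, MAE-style pretraining, works by randomly holding out most image patches and training the model to reconstruct them. Below is a minimal illustrative sketch of that random patch masking in NumPy. It is not the authors' implementation; the function name, shapes, and the 75% mask ratio (MAE's commonly cited default) are assumptions for illustration.

```python
import numpy as np

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Randomly hold out a fraction of image patches, MAE-style.

    patches: (num_patches, patch_dim) array of flattened patches.
    Returns the visible patches (encoder input), their indices, and
    the indices of masked patches the decoder must reconstruct.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))     # patches the encoder sees
    perm = rng.permutation(n)              # random shuffle of indices
    keep_idx = np.sort(perm[:n_keep])      # visible patch positions
    mask_idx = np.sort(perm[n_keep:])      # held-out patch positions
    return patches[keep_idx], keep_idx, mask_idx

# Example: a 16x16 grid of patches (256 total), each flattened to 768 dims
patches = np.random.rand(256, 768).astype(np.float32)
visible, keep_idx, mask_idx = mask_patches(patches)
```

With a 75% ratio, the encoder processes only 64 of 256 patches, which is what makes MAE pretraining cheap enough to scale to millions of unlabeled images.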
Pages: 5