Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Cited by: 67

Authors:

Reed, Colorado J. [1,2]

Gupta, Ritwik [1]

Li, Shufan [1]

Brockman, Sarah [3]

Funk, Christopher [3]

Clipp, Brian [3]

Keutzer, Kurt [1]

Candido, Salvatore [2]

Uyttendaele, Matt [2]

Darrell, Trevor [1]

Affiliations:

[1] Berkeley AI Research, Berkeley, CA 94704, USA

[2] Meta AI, FAIR, New York, NY, USA

[3] Kitware Inc., Clifton Park, NY, USA

Source:

2023 IEEE/CVF International Conference on Computer Vision (ICCV) | 2023

Funding:

U.S. National Science Foundation;

Keywords:

IMAGE CLASSIFICATION; SCENE; DATASET;

DOI:

10.1109/ICCV51070.2023.00378

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low- and high-frequency images at lower and higher scales, respectively. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average 2.4% to 5.6% improvement in non-parametric kNN classification across eight remote sensing datasets compared to the current state of the art, and obtains a 0.9 to 1.7 mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
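The abstract states that the ground area covered by an image, not its pixel resolution, sets the scale of the ViT positional encoding. The sketch below illustrates one way to read that: a standard 1D sine-cosine encoding whose positions are multiplied by the ratio of the image's ground sample distance to a reference ground sample distance. The function name `gsd_positional_encoding`, the parameter names, the default reference of 1.0 (metres per pixel), and the exact form of the scaling are assumptions made for illustration; this record does not give the paper's formula.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sine-cosine positional encoding with ground-distance-scaled positions.

    Hypothetical helper: `gsd` and `reference_gsd` are ground sample
    distances (e.g. metres per pixel). Positions are multiplied by
    gsd / reference_gsd before the usual sinusoidal encoding is applied.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            # Standard transformer frequency schedule, applied to the
            # scaled position instead of the raw patch index.
            angle = scale * pos / (10000 ** (2 * i / dim))
            row.append(math.sin(angle))  # even index: sine
            row.append(math.cos(angle))  # odd index: cosine
        table.append(row)
    return table


# Same patch grid, two different ground sample distances.
coarse = gsd_positional_encoding(num_positions=4, dim=8, gsd=1.0)
fine = gsd_positional_encoding(num_positions=4, dim=8, gsd=0.5)
```

Under this assumed scaling, index 2 in the 0.5 m image and index 1 in the 1.0 m image receive the same encoding, since both lie the same ground distance from the origin. An unscaled encoding would treat the two images identically by index, regardless of how much ground each pixel covers.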


Pages: 4065-4076

Page count: 12

