Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Cited by: 67
Authors
Reed, Colorado J. [1, 2]
Gupta, Ritwik [1]
Li, Shufan [1]
Brockman, Sarah [3]
Funk, Christopher [3]
Clipp, Brian [3]
Keutzer, Kurt [1]
Candido, Salvatore [2]
Uyttendaele, Matt [2]
Darrell, Trevor [1]
Affiliations
[1] Berkeley AI Res, Berkeley, CA 94704 USA
[2] Meta AI, FAIR, New York, NY USA
[3] Kitware Inc, Clifton Pk, NY USA
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
Funding
National Science Foundation (USA);
Keywords
IMAGE CLASSIFICATION; SCENE; DATASET;
DOI
10.1109/ICCV51070.2023.00378
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, not the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average 2.4% to 5.6% improvement in non-parametric kNN classification across eight remote sensing datasets compared to the current state of the art, and obtains a 0.9 to 1.7 mIoU improvement on the SpaceNet building segmentation transfer task across a range of evaluation scales.
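The abstract says that the ground area covered by an image, rather than its pixel resolution, sets the scale of the ViT positional encoding, but it does not give the formula. The sketch below illustrates one way this could work, under the assumption that each position index in a standard sinusoidal encoding is multiplied by the ratio of the image's ground sample distance to a reference value. The function name `gsd_positional_encoding` and the parameters `gsd` and `reference_gsd` are hypothetical and are not taken from this record.

```python
import numpy as np


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """1-D sinusoidal positional encoding with scale-aware positions.

    Assumption (not stated in the abstract): each position index is
    multiplied by gsd / reference_gsd, so patches that cover more ground
    per pixel are spaced further apart in the encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")

    # Scale the integer patch positions by the relative ground sample distance.
    positions = np.arange(num_positions, dtype=np.float64) * (gsd / reference_gsd)

    # Standard transformer frequencies: 1 / 10000^(2i / dim).
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2, dtype=np.float64) / dim))

    # Outer product gives one angle per (position, frequency) pair.
    angles = positions[:, None] * freqs[None, :]

    enc = np.empty((num_positions, dim), dtype=np.float64)
    enc[:, 0::2] = np.sin(angles)  # even channels
    enc[:, 1::2] = np.cos(angles)  # odd channels
    return enc
```

With this scaling, the second patch of an image at twice the reference ground sample distance receives the same encoding as the third patch of an image at the reference distance, because both sit the same distance on the ground from their image's first patch.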
Pages: 4065-4076
Number of pages: 12