Learning Representations of Satellite Images From Metadata Supervision

Cited by: 0
Authors
Bourcier, Jules [1, 2]
Dashyan, Gohar [1]
Alahari, Karteek [2]
Chanussot, Jocelyn [2]
Affiliations
[1] Preligens, Paris, France
[2] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP, LJK, Grenoble, France
Source
COMPUTER VISION - ECCV 2024, PT XXVII | 2025 / Vol. 15085
Keywords
Self-supervised and multimodal learning; Remote sensing; BENCHMARK; CLASSIFICATION;
DOI
10.1007/978-3-031-73383-3_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised learning is increasingly applied to Earth observation problems that leverage satellite and other remotely sensed data. Within satellite imagery, metadata such as time and location often hold significant semantic information that improves scene understanding. In this paper, we introduce Satellite Metadata-Image Pretraining (SatMIP), a new approach for harnessing metadata in the pretraining phase through a flexible and unified multimodal learning objective. SatMIP represents metadata as textual captions and aligns images with metadata in a shared embedding space by solving a metadata-image contrastive task. Our model learns a non-trivial image representation that can effectively handle recognition tasks. We further enhance this model by combining image self-supervision and metadata supervision, introducing SatMIPS. As a result, SatMIPS improves over its image-image pretraining baseline, SimCLR, and accelerates convergence. Comparison against four recent contrastive and masked autoencoding-based methods for remote sensing also highlights the efficacy of our approach. Furthermore, our framework enables multimodal classification with metadata to improve the performance of visual features, and yields more robust hierarchical pretraining. Code and pretrained models will be made available at: https://github.com/preligens-lab/satmip.
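The abstract describes two steps: rendering metadata as a textual caption, and aligning image and metadata embeddings through a contrastive task. The sketch below illustrates that general idea only. It is not the authors' implementation: the caption template, the function names (`metadata_to_caption`, `contrastive_loss`), the temperature value, and the choice of a symmetric InfoNCE loss are all assumptions made for illustration, and the embeddings are plain lists standing in for encoder outputs.

```python
import math


def metadata_to_caption(meta):
    """Render a metadata dict as a caption string (hypothetical template)."""
    return "; ".join(f"{key}: {value}" for key, value in sorted(meta.items()))


def _normalize(vec):
    # L2-normalize a vector; assumes the vector is non-zero.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(image_emb, meta_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings (assumed form)."""
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in meta_emb]
    # Cosine similarity between every image and every caption, scaled.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-caption and caption-to-image directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


caption = metadata_to_caption({"month": 7, "latitude": 45.19})
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
mismatched = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the loss is close to zero when each image embedding matches its own caption embedding and much larger when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward.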
Pages: 54-71
Number of pages: 18