Vision-Language Models for Zero-Shot Classification of Remote Sensing Images

Cited by: 6
Authors
Al Rahhal, Mohamad Mahmoud [1]
Bazi, Yakoub [2]
Elgibreen, Hebah [3]
Zuair, Mansour [2]
Affiliations
[1] King Saud Univ, Coll Appl Comp Sci, Appl Comp Sci Dept, Riyadh 11543, Saudi Arabia
[2] King Saud Univ, Comp Engn Dept, Coll Comp & Informat Sci, Riyadh 11543, Saudi Arabia
[3] Coll Comp & Informat Sci, Informat Technol Dept, Riyadh 11543, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 22
Keywords
Contrastive Language-Image Pre-Training models; remote sensing; zero-shot classification
DOI
10.3390/app132212462
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Zero-shot classification is challenging because it requires a model to categorize images from classes it did not encounter during training. Previous research in remote sensing (RS) has approached this task by training image-based models on known RS classes and then predicting unseen classes, but the results have been unsatisfactory. In this paper, we propose an alternative approach that leverages vision-language models (VLMs) pre-trained to learn the associations between images and text from diverse general-purpose computer vision image-text datasets. Specifically, we investigate thirteen VLMs derived from Contrastive Language-Image Pre-Training (CLIP/Open-CLIP) with varying levels of parameter complexity. In our experiments, we identify the most suitable prompt for querying the VLM's language capabilities about RS images. Furthermore, we show that zero-shot classification with these models, particularly the large CLIP models, achieves higher accuracy on three widely recognized RS scene datasets than existing RS solutions.
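The prompt-based zero-shot procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt template, the class names, the toy three-dimensional embeddings, and the `logit_scale` value are all assumptions made for the example. In practice the embeddings would come from the image and text encoders of a CLIP-style model.

```python
import math

def build_prompts(class_names, template="a satellite image of a {}"):
    # The template is a hypothetical example, not the prompt selected in the paper.
    return [template.format(name) for name in class_names]

def l2_normalize(vec):
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    # Cosine similarity between the image and each class prompt,
    # scaled and passed through a numerically stable softmax.
    # logit_scale=100.0 is an assumed temperature, not a value from the paper.
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

# Toy data standing in for real encoder outputs (assumption).
class_names = ["airport", "forest", "harbor"]
prompts = build_prompts(class_names)
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.9, 0.2]

probs = zero_shot_probs(image_emb, text_embs)
predicted = class_names[probs.index(max(probs))]
```

No class-specific training takes place here: a new class is added simply by writing another prompt and embedding it, which is what makes the approach zero-shot.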
Pages: 16
Related papers
50 in total
  • [1] Label Propagation for Zero-shot Classification with Vision-Language Models
    Stojnic, Vladan
    Kalantidis, Yannis
    Tolias, Giorgos
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 23209-23218
  • [2] Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
    Zheng, Zangwei
    Ma, Mingyuan
    Wang, Kai
    Qin, Ziheng
    Yue, Xiangyu
    You, Yang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 19068-19079
  • [3] MLTU: mixup long-tail unsupervised zero-shot image classification on vision-language models
    Jia, Yunpeng
    Ye, Xiufen
    Mei, Xinkui
    Liu, Yusong
    Guo, Shuxiang
    MULTIMEDIA SYSTEMS, 2024, 30 (03)
  • [4] Inference Calibration of Vision-Language Foundation Models for Zero-Shot and Few-Shot Learning
    Hu, Minyang
    Chang, Hong
    Shan, Shiguang
    Chen, Xilin
    PATTERN RECOGNITION LETTERS, 2025, 192: 15-21
  • [5] Multiple Prompt Fusion for Zero-Shot Lesion Detection Using Vision-Language Models
    Guo, Miaotian
    Yi, Huahui
    Qin, Ziyuan
    Wang, Haiying
    Men, Aidong
    Lao, Qicheng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT V, 2023, 14224: 283-292
  • [6] Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
    Shu, Manli
    Nie, Weili
    Huang, De-An
    Yu, Zhiding
    Goldstein, Tom
    Anandkumar, Anima
    Xiao, Chaowei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [7] A Hybrid Embedding for Generalized Zero-Shot Scene Classification in Remote Sensing Images
    Rambabu, Damalla
    Datla, Rajeshreddy
    Chalavadi, Vishnu
    Mohan, C. Krishna
    2024 IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, AVSS 2024, 2024
  • [8] Zero-Shot Scene Classification for High Spatial Resolution Remote Sensing Images
    Li, Aoxue
    Lu, Zhiwu
    Wang, Liwei
    Xiang, Tao
    Wen, Ji-Rong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2017, 55 (07): 4157-4167
  • [9] VLPSR: Enhancing Zero-Shot Object ReID with Vision-Language Model
    Hu, Mingzhe
    ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT II, 2025, 15047: 56-69
  • [10] CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
    Javed, Sajid
    Mahmood, Arif
    Ganapathi, Iyyakutti Iyappan
    Dharejo, Fayaz Ali
    Werghi, Naoufel
    Bennamoun, Mohammed
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 11450-11459