Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior

被引:0
|
作者
Cho, Youngjae [1 ]
Bae, HeeSun [1 ]
Shin, Seungjae [1 ]
Youn, Yeo Dong [2 ]
Joo, Weonyoung [3 ]
Moon, Il-Chul [1 ]
机构
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
[3] Ewha Womans Univ, Seoul, South Korea
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recent Vision-Language Pretrained (VLP) models have become the backbone for many downstream tasks, but they are utilized as frozen model without learning. Prompt learning is a method to improve the pre-trained VLP model by adding a learnable context vector to the inputs of the text encoder. In a few-shot learning scenario of the downstream task, MLE training can lead the context vector to overfit dominant image features in the training data. This overfitting can potentially harm the generalization ability, especially in the presence of a distribution shift between the training and test dataset. This paper presents a Bayesian-based framework of prompt learning, which could alleviate the over-fitting issues on few-shot learning application and increase the adaptability of prompts on unseen instances. Specifically, modeling data-dependent prior enhances the adaptability of text features for both seen and unseen image features without the trade-off of performance between them. Based on the Bayesian framework, we utilize the Wasserstein Gradient Flow in the estimation of our target posterior distribution, which enables our prompt to be flexible in capturing the complex modes of image features. We demonstrate the effectiveness of our method on benchmark datasets for several experiments by showing statistically significant improvements on performance compared to existing methods. The code is available at https://github.com/youngjae-cho/APP.
引用
收藏
页码:11552 / 11560
页数:9
相关论文
共 36 条
  • [1] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [2] Learning to Prompt for Vision-Language Models
    Kaiyang Zhou
    Jingkang Yang
    Chen Change Loy
    Ziwei Liu
    International Journal of Computer Vision, 2022, 130 : 2337 - 2348
  • [3] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804
  • [4] Consistent prompt learning for vision-language models
    Zhang, Yonggang
    Tian, Xinmei
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [5] Learning to Prompt for Vision-Language Emotion Recognition
    Xie, Hongxia
    Chung, Hua
    Shuai, Hong-Han
    Cheng, Wen-Huang
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
  • [6] Cascade Prompt Learning for Vision-Language Model Adaptation
    Wu, Ge
    Zhang, Xin
    Li, Zheng
    Chen, Zhaowei
    Liang, Jiajun
    Yang, Jian
    Li, Xiang
    COMPUTER VISION - ECCV 2024, PT L, 2025, 15108 : 304 - 321
  • [7] CoPL: Contextual Prompt Learning for Vision-Language Understanding
    Goswami, Koustava
    Karanam, Srikrishna
    Udhayanan, Prateksha
    Joseph, K. J.
    Srinivasan, Balaji Vasan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18090 - 18098
  • [8] Learning Domain Invariant Prompt for Vision-Language Models
    Zhao, Cairong
    Wang, Yubin
    Jiang, Xinyang
    Shen, Yifei
    Song, Kaitao
    Li, Dongsheng
    Miao, Duoqian
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1348 - 1360
  • [9] Vision-Language Tracking With CLIP and Interactive Prompt Learning
    Zhu, Hong
    Lu, Qingyang
    Xue, Lei
    Zhang, Pingping
    Yuan, Guanglin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) : 3659 - 3670
  • [10] GalLoP: Learning Global and Local Prompts for Vision-Language Models
    Lafon, Marc
    Ramzi, Elias
    Rambour, Clement
    Audebert, Nicolas
    Thome, Nicolas
    COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119 : 264 - 282