SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Cited by: 0
Authors
Chen, Yi-Syuan [1 ]
Song, Yun-Zhu [1 ]
Yeo, Cheng Yu [1 ]
Liu, Bei [2 ]
Fu, Jianlong [2 ]
Shuai, Hong-Han [1 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
[2] Microsoft Res Asia, Beijing, Peoples R China
DOI
10.1109/ICCV51070.2023.01415
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large pre-trained Transformers exhibit an intriguing capacity for in-context learning: without gradient updates, they can rapidly construct new predictors from demonstrations presented in the input. Recent works extend this ability to the vision-language domain by incorporating visual information into large language models that can already make in-context predictions. However, these methods may inherit issues from the language domain, such as template sensitivity and hallucination, and the scale of these language models imposes substantial computational demands, making them resource-intensive to train and operate. We therefore ask: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?" To answer this, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), which introduces a meta-model that learns on self-supervised prompts consisting of tailored demonstrations. The learned model can be transferred to downstream tasks to make in-context predictions on the fly. Extensive experiments show that SINC outperforms gradient-based methods on various vision-language tasks under few-shot settings. Furthermore, the design of SINC helps us investigate the benefits of in-context learning across different tasks, and our analysis reveals the components essential for the emergence of in-context learning in the vision-language domain.
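The abstract's central idea, gradient-free in-context prediction by a lightweight meta-model, can be illustrated with a minimal sketch. The Python/PyTorch code below is a hypothetical example, not the actual SINC implementation: the class name MetaModel, the fused demonstration embeddings, and all hyperparameters are assumptions made purely for illustration. It shows a small Transformer reading a prompt of demonstrations followed by a query and producing a prediction in a single forward pass, with no weight updates at inference time.

# Hypothetical sketch (not the actual SINC code): a small Transformer
# "meta-model" reads a prompt of demonstration embeddings followed by a
# query embedding and predicts the query's label in one forward pass,
# i.e. in-context prediction without any gradient update.
import torch
import torch.nn as nn

class MetaModel(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, prompt):
        # prompt: (batch, seq_len, d_model); the last position is the query.
        hidden = self.encoder(prompt)
        return self.head(hidden[:, -1])  # logits for the query position

# Toy usage: four demonstrations plus one query, each already fused into a
# single d_model-dimensional embedding (e.g. image + text + answer features).
d_model = 256
demos = torch.randn(1, 4, d_model)   # in-context demonstrations
query = torch.randn(1, 1, d_model)   # the example to be predicted
model = MetaModel(d_model=d_model)

with torch.no_grad():                # no weight updates at inference time
    logits = model(torch.cat([demos, query], dim=1))
    prediction = logits.argmax(dim=-1)

In this toy setup the demonstrations act as the only task specification: changing the demonstration embeddings changes the prediction, while the meta-model's weights stay fixed.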
Pages: 15384-15396 (13 pages)