SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Cited by: 0
Authors
Chen, Yi-Syuan [1]
Song, Yun-Zhu [1]
Yeo, Cheng Yu [1]
Liu, Bei [2]
Fu, Jianlong [2]
Shuai, Hong-Han [1]
Affiliations
[1] National Yang Ming Chiao Tung University, Hsinchu, Taiwan
[2] Microsoft Research Asia, Beijing, People's Republic of China
DOI: 10.1109/ICCV51070.2023.01415
CLC classification number: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
Large pre-trained Transformers exhibit an intriguing capacity for in-context learning: without gradient updates, these models can rapidly construct new predictors from demonstrations presented in their inputs. Recent works bring this ability to the vision-language domain by incorporating visual information into large language models that can already make in-context predictions. However, these methods may inherit issues from the language domain, such as template sensitivity and hallucination. Moreover, the scale of these language models imposes significant computational demands, making them resource-intensive to train and operate. We therefore ask: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?" To answer this question, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), which introduces a meta-model that learns on self-supervised prompts consisting of tailored demonstrations. The learned models can be transferred to downstream tasks to make in-context predictions on the fly. Extensive experiments show that SINC outperforms gradient-based methods on various vision-language tasks under few-shot settings. Furthermore, the design of SINC enables us to investigate the benefits of in-context learning across different tasks, and our analysis reveals the components essential for the emergence of in-context learning in the vision-language domain.
Pages: 15384-15396
Number of pages: 13
Related papers (50 in total)
  • [31] Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
    Akiva, Peri
    Purri, Matthew
    Leotta, Matthew
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8193 - 8205
  • [32] Uncertainty-Aware Self-Supervised Learning of Spatial Perception Tasks
    Nava, Mirko
    Paolillo, Antonio
    Guzzi, Jerome
    Gambardella, Luca Maria
    Giusti, Alessandro
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (04) : 6693 - 6700
  • [33] Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP
    Bansal, Trapit
    Gunasekaran, Karthick
    Wang, Tong
    Munkhdalai, Tsendsuren
    McCallum, Andrew
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5812 - 5824
  • [34] Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning
    Zaiem, Salah
    Parcollet, Titouan
    Essid, Slim
    Heba, Abdelwahab
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1439 - 1453
  • [35] Self-supervised Video Representation Learning by Context and Motion Decoupling
    Huang, Lianghua
    Liu, Yu
    Wang, Bin
    Pan, Pan
    Xu, Yinghui
    Jin, Rong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13881 - 13890
  • [36] Adapting vision-language AI models to cardiology tasks
    Arnaout, Rima
    NATURE MEDICINE, 2024, 30 (05) : 1245 - 1246
  • [37] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [39] Self-Supervised Vision for Climate Downscaling
    Singh, Karandeep
    Jeong, Chaeyoon
    Shidqi, Naufal
    Park, Sungwon
    Nellikkatti, Arjun
    Zeller, Elke
    Cha, Meeyoung
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 7456 - 7464
  • [40] Efficient Self-Supervised Learning Representations for Spoken Language Identification
    Liu, Hexin
    Perera, Leibny Paola Garcia
    Khong, Andy W. H.
    Chng, Eng Siong
    Styles, Suzy J.
    Khudanpur, Sanjeev
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1296 - 1307