SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Cited by: 0

Authors:

Chen, Yi-Syuan [1]

Song, Yun-Zhu [1]

Yeo, Cheng Yu [1]

Liu, Bei [2]

Fu, Jianlong [2]

Shuai, Hong-Han [1]

Affiliations:

[1] National Yang Ming Chiao Tung University, Hsinchu, Taiwan

[2] Microsoft Research Asia, Beijing, People's Republic of China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023


DOI:

10.1109/ICCV51070.2023.01415

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large pre-trained Transformers exhibit an intriguing capacity for in-context learning. Without gradient updates, these models can rapidly construct new predictors from demonstrations presented in the inputs. Recent works promote this ability in the vision-language domain by incorporating visual information into large language models that can already make in-context predictions. However, these methods could inherit issues in the language domain, such as template sensitivity and hallucination. Also, the scale of these language models raises a significant demand for computations, making learning and operating these models resource-intensive. To this end, we raise a question: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?" To answer it, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), that introduces a meta-model to learn on self-supervised prompts consisting of tailored demonstrations. The learned models can be transferred to downstream tasks for making in-context predictions on the fly. Extensive experiments show that SINC outperforms gradient-based methods in various vision-language tasks under few-shot settings. Furthermore, the designs of SINC help us investigate the benefits of in-context learning across different tasks, and the analysis further reveals the essential components for the emergence of in-context learning in the vision-language domain.


Pages: 15384-15396

Number of pages: 13

50 items in total

[31] Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
Akiva, Peri
Purri, Matthew
Leotta, Matthew
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8193 - 8205
[32] Uncertainty-Aware Self-Supervised Learning of Spatial Perception Tasks
Nava, Mirko
Paolillo, Antonio
Guzzi, Jerome
Gambardella, Luca Maria
Giusti, Alessandro
IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (04) : 6693 - 6700
[33] Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP
Bansal, Trapit
Gunasekaran, Karthick
Wang, Tong
Munkhdalai, Tsendsuren
McCallum, Andrew
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5812 - 5824
[34] Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning
Zaiem, Salah
Parcollet, Titouan
Essid, Slim
Heba, Abdelwahab
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1439 - 1453
[35] Self-supervised Video Representation Learning by Context and Motion Decoupling
Huang, Lianghua
Liu, Yu
Wang, Bin
Pan, Pan
Xu, Yinghui
Jin, Rong
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13881 - 13890
[36] Adapting vision-language AI models to cardiology tasks
Arnaout, Rima
NATURE MEDICINE, 2024, 30 (05) : 1245 - 1246
[37] Learning to Prompt for Vision-Language Models
Zhou, Kaiyang
Yang, Jingkang
Loy, Chen Change
Liu, Ziwei
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
[38] Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
International Journal of Computer Vision, 2022, 130 : 2337 - 2348
[39] Self-Supervised Vision for Climate Downscaling
Singh, Karandeep
Jeong, Chaeyoon
Shidqi, Naufal
Park, Sungwon
Nellikkatti, Arjun
Zeller, Elke
Cha, Meeyoung
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 7456 - 7464
[40] Efficient Self-Supervised Learning Representations for Spoken Language Identification
Liu, Hexin
Perera, Leibny Paola Garcia
Khong, Andy W. H.
Chng, Eng Siong
Styles, Suzy J.
Khudanpur, Sanjeev
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1296 - 1307
