Images Speak in Images: A Generalist Painter for In-Context Visual Learning

Cited by: 53
Authors
Wang, Xinlong [1]
Wang, Wen [2]
Cao, Yue [1]
Shen, Chunhua [2]
Huang, Tiejun [1, 3]
Affiliations
[1] Beijing Academy of Artificial Intelligence, Beijing, People's Republic of China
[2] Zhejiang University, Hangzhou, People's Republic of China
[3] Peking University, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00660
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In-context learning, a new paradigm in NLP, allows a model to rapidly adapt to various tasks with only a handful of prompts and examples. In computer vision, however, in-context learning is difficult because tasks vary significantly in their output representations, so it is unclear how to define general-purpose task prompts that a vision model can understand and transfer to out-of-domain tasks. In this work, we present Painter, a generalist model that addresses these obstacles with an "image"-centric solution: redefine the output of core vision tasks as images, and specify task prompts as images as well. With this idea, our training process is extremely simple: it performs standard masked image modeling on stitched pairs of input and output images. This makes the model capable of performing tasks conditioned on visible image patches. Thus, during inference, we can adopt a pair of input and output images from the same task as the input condition that indicates which task to perform. Without bells and whistles, our generalist Painter achieves competitive performance compared to well-established task-specific models on seven representative vision tasks, ranging from high-level visual understanding to low-level image processing. In addition, Painter significantly outperforms recent generalist models on several challenging tasks.
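The stitching and masking idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the vertical layout, the patch size, the mask ratio, the zero placeholder, and the function names `stitch_pair`, `training_mask`, and `build_inference_canvas` are all assumptions made for illustration, and the model itself is omitted.

```python
import numpy as np


def stitch_pair(inp, out):
    """Stack an input image above its output image (both H x W x C).

    The vertical layout is an assumption chosen for this sketch.
    """
    return np.concatenate([inp, out], axis=0)


def training_mask(h, w, patch, ratio, rng):
    """Boolean mask (2h x w) hiding a random share of patches in the output half.

    Assumes h and w are divisible by `patch`. True marks hidden pixels
    that a masked-image-modeling objective would ask the model to reconstruct.
    """
    rows, cols = h // patch, w // patch
    n_patches = rows * cols
    n_hidden = int(round(ratio * n_patches))
    grid = np.zeros(n_patches, dtype=bool)
    grid[rng.choice(n_patches, size=n_hidden, replace=False)] = True
    # Expand the patch-level grid to pixel resolution.
    pixels = grid.reshape(rows, cols).repeat(patch, axis=0).repeat(patch, axis=1)
    mask = np.zeros((2 * h, w), dtype=bool)
    mask[h:, :] = pixels  # only the output half is masked; the input stays visible
    return mask


def build_inference_canvas(prompt_in, prompt_out, query_in):
    """Stitch a task prompt pair with a query whose output is fully hidden.

    Returns the canvas (4H x W x C) and a mask over the region to predict.
    The zero placeholder stands in for the unknown query output.
    """
    h = query_in.shape[0]
    placeholder = np.zeros_like(query_in)
    canvas = np.concatenate(
        [stitch_pair(prompt_in, prompt_out), stitch_pair(query_in, placeholder)],
        axis=0,
    )
    mask = np.zeros(canvas.shape[:2], dtype=bool)
    mask[3 * h:, :] = True  # the last quarter is the query output to be painted
    return canvas, mask
```

In this sketch the visible prompt pair plays the role of the task specification: swapping in a prompt pair from a different task changes what the masked region is expected to contain, without changing the code.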
Pages: 6830-6839
Page count: 10