Pro-Tuning: Unified Prompt Tuning for Vision Tasks

Cited by: 7
Authors
Nie, Xing [1 ,2 ]
Ni, Bolin [1 ,2 ]
Chang, Jianlong [3 ]
Meng, Gaofeng [1 ,2 ,4 ]
Huo, Chunlei [5 ,6 ]
Xiang, Shiming [1 ,2 ]
Tian, Qi [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Huawei Cloud & AI, Beijing 100095, Peoples R China
[4] HK Inst Sci & Innovat, CAS Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Keywords
Task analysis; Adaptation models; Tuning; Computational modeling; Transformers; Visualization; Training; Prompt-based learning; representation learning; task-specific knowledge; transfer learning
DOI
10.1109/TCSVT.2023.3327605
CLC Classification Numbers
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
In computer vision, fine-tuning is the de facto approach to leveraging pre-trained vision models for downstream tasks. However, it is challenging to deploy in practice because it performs parameter-inefficient global updates and relies heavily on high-quality downstream data. Recently, prompt-based learning, which adds a task-relevant prompt to adapt pre-trained models to downstream tasks, has drastically boosted performance on many natural language downstream tasks. In this work, we extend this notable prompt-driven transfer ability to vision models as an alternative to fine-tuning. To this end, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt diverse frozen pre-trained models to a wide variety of downstream vision tasks. The key to Pro-tuning is prompt-based tuning, i.e., learning task-specific vision prompts for downstream input images while keeping the pre-trained model frozen. By training only a small number of additional parameters, Pro-tuning can generate compact and robust downstream models for both CNN-based and transformer-based network architectures. Comprehensive experiments show that the proposed Pro-tuning outperforms fine-tuning on a broad range of vision tasks and scenarios, including image classification (under generic objects, class imbalance, image corruption, natural adversarial examples, and out-of-distribution generalization) and dense prediction tasks such as object detection and semantic segmentation.
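The mechanism described above (a frozen pre-trained backbone, a small set of learnable prompt parameters, and a lightweight task head as the only trainable pieces) can be made concrete with a minimal PyTorch sketch. This is not the authors' Pro-tuning implementation: the class name PromptTuner, the stand-in backbone, and all hyper-parameters are illustrative assumptions, and only the transformer-style token-prompt variant is shown.

    import torch
    import torch.nn as nn

    class PromptTuner(nn.Module):
        """Prepend learnable prompt tokens to a frozen backbone (illustrative)."""
        def __init__(self, backbone, embed_dim, num_prompts=8, num_classes=10):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():   # freeze the pre-trained model
                p.requires_grad = False
            # Task-specific prompts: the only new parameters besides the head.
            self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)
            self.head = nn.Linear(embed_dim, num_classes)

        def forward(self, tokens):
            # tokens: (batch, seq_len, embed_dim) patch embeddings
            prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
            x = torch.cat([prompts, tokens], dim=1)  # prepend the prompts
            x = self.backbone(x)                     # frozen forward pass
            return self.head(x.mean(dim=1))          # pool, then classify

    # Usage with a stand-in "pre-trained" transformer encoder.
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True),
        num_layers=2)
    model = PromptTuner(encoder, embed_dim=192)
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # prompts + head only
    logits = model(torch.randn(2, 196, 192))           # 14x14 patch tokens
    print(logits.shape)                                # torch.Size([2, 10])

Because gradients flow only into self.prompts and self.head, the per-task artifact is tiny: the frozen backbone is shared across tasks, and only the prompt and head parameters need to be stored for each downstream model.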
Pages: 4653-4667
Page count: 15
Related Papers
38 items in total (first 10 shown)
  • [1] Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
    Xing, Yinghui
    Wu, Qirui
    Cheng, De
    Zhang, Shizhou
    Liang, Guoqiang
    Wang, Peng
    Zhang, Yanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 2056-2068
  • [2] When Adversarial Training Meets Prompt Tuning: Adversarial Dual Prompt Tuning for Unsupervised Domain Adaptation
    Cui, Chaoran
    Liu, Ziyi
    Gong, Shuai
    Zhu, Lei
    Zhang, Chunyun
    Liu, Hui
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34: 1427-1440
  • [3] EVA: Enabling Video Attributes With Hierarchical Prompt Tuning for Action Recognition
    Ruan, Xiangning
    Yin, Qixiang
    Su, Fei
    Zhao, Zhicheng
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32: 971-975
  • [4] Prompt Tuning in Code Intelligence: An Experimental Evaluation
    Wang, Chaozheng
    Yang, Yuanhang
    Gao, Cuiyun
    Peng, Yun
    Zhang, Hongyu
    Lyu, Michael R.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (11): 4869-4885
  • [5] Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
    Lan, Long
    Wang, Fengxiang
    Zheng, Xiangtao
    Wang, Zengmao
    Liu, Xinwang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [6] Memory-Tuning: A Unified Parameter-Efficient Tuning Method for Pre-Trained Language Models
    Qi, Wang
    Liu, Rui
    Zuo, Yuan
    Li, Fengzhi
    Chen, Yong
    Wu, Junjie
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2025, 33: 1-10
  • [7] Iterative Soft Prompt-Tuning for Unsupervised Domain Adaptation
    Zhu, Yi
    Wang, Shuqin
    Qiang, Jipeng
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (12): 8580-8592
  • [8] Prompt Tuning of Deep Neural Networks for Speaker-Adaptive Visual Speech Recognition
    Kim, Minsu
    Kim, Hyung-Il
    Ro, Yong Man
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (02): 1042-1055
  • [9] Domain Prompt Tuning via Meta Relabeling for Unsupervised Adversarial Adaptation
    Jin, Xin
    Lan, Cuiling
    Zeng, Wenjun
    Chen, Zhibo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 8333-8347
  • [10] RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and Grounded Tasks
    Wang, Peijin
    Hu, Huiyang
    Tong, Boyuan
    Zhang, Ziqi
    Yao, Fanglong
    Feng, Yingchao
    Zhu, Zining
    Chang, Hao
    Diao, Wenhui
    Ye, Qixiang
    Sun, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63