Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

被引:0
|
作者
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
机构
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[3] Shenzhen Univ, Sch Comp Sci & Software Engn, Shenzhen, Peoples R China
基金
中国国家自然科学基金;
关键词
Prompt tuning; Transfer learning; Contrastive language-image pre-training; Edge intelligence; DATASET;
D O I
10.1016/j.patcog.2025.111460
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The pre-trained vision-language models (VLMs) have been the foundation for diverse intelligent services in human life. Common VLMs hold large parameter scales and require heavy memory overhead for model pre-training, which poses challenges in adapting them to edge devices. To enable memory-efficient VLMs, previous works mainly focus on the prompt engineering technique that utilizes trainable soft prompts instead of manually designing hard prompts. However, to update fewer than 3% of prompt parameters, these studies still require the back-propagation chain to traverse pre-trained models with extensive parameters. Consequently, the intermediate activation variables and gradients occupy a significant amount of memory resources, greatly hindering their adaptation on resource-constrained edge devices. In view of the above, we propose a memory efficient prompt-tuning method, named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent to bypass VLMs during back-propagation for the parameter optimization of the designed multi-model prompt module. The ladder network fuses the intermediate output of VLMs as a guide and selects important parameters of VLMs to initialize for the maintenance of model performance. We also share parameters of the ladder network between text and image data to obtain amore semantically aligned representation across modalities for the optimization of the prompt module. The experiments across seven datasets demonstrate that Prompt-Ladder can significantly reduce memory resource usage by at least 27% compared to baselines while maintaining relatively good performance.
引用
收藏
页数:9
相关论文
共 50 条
  • [1] Adversarial Prompt Tuning for Vision-Language Models
    Zhang, Jiaming
    Ma, Xingjun
    Wang, Xin
    Qiu, Lingyu
    Wang, Jiaqi
    Jiang, Yu-Gang
    Sang, Jitao
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
  • [2] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [3] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629
  • [4] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [5] Learning to Prompt for Vision-Language Models
    Kaiyang Zhou
    Jingkang Yang
    Chen Change Loy
    Ziwei Liu
    International Journal of Computer Vision, 2022, 130 : 2337 - 2348
  • [6] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter
    Xing, Jialu
    Liu, Jianping
    Wang, Jian
    Sun, Lulu
    Chen, Xi
    Gu, Xunxun
    Wang, Yingfei
    COMPUTERS & GRAPHICS-UK, 2024, 119
  • [7] Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
    Wu, Cheng-En
    Tian, Yu
    Yu, Haichao
    Wang, Heng
    Morgado, Pedro
    Hu, Yu Hen
    Yang, Linjie
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15442 - 15451
  • [8] Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
    Kan, Baoshuo
    Wang, Teng
    Lu, Wenpeng
    Zhen, Xiantong
    Guan, Weili
    Zheng, Feng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15624 - 15634
  • [9] Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
    Zhu, Beier
    Niu, Yulei
    Lee, Saeil
    Hur, Minhoe
    Zhang, Hanwang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3834 - 3842
  • [10] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804