Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Citations: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[3] Shenzhen Univ, Sch Comp Sci & Software Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Prompt tuning; Transfer learning; Contrastive language-image pre-training; Edge intelligence; Dataset
DOI
10.1016/j.patcog.2025.111460
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained vision-language models (VLMs) have become the foundation of diverse intelligent services in daily life. Common VLMs have large parameter scales and incur heavy memory overhead during training, which poses challenges in adapting them to edge devices. To enable memory-efficient VLMs, previous works mainly focus on prompt engineering, which uses trainable soft prompts instead of manually designed hard prompts. However, even though fewer than 3% of the parameters are updated, these methods still require the back-propagation chain to traverse the entire pre-trained model. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering adaptation on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as a guide and selects important VLM parameters for its initialization to maintain model performance. We also share the ladder network's parameters between text and image data to obtain a more semantically aligned representation across modalities for optimizing the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining relatively good performance.
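To make the bypass mechanism described in the abstract concrete, here is a minimal PyTorch sketch of a ladder-style side network. All names (`LadderSideNetwork`, `side_dim`, `num_taps`) are hypothetical illustrations, not the paper's actual implementation: the frozen VLM's intermediate activations are detached and fed to a small trainable network, so gradients never traverse the large backbone.

```python
import torch
import torch.nn as nn

class LadderSideNetwork(nn.Module):
    """Minimal ladder-style side network (illustrative, not the paper's code).

    The large VLM backbone is assumed frozen and run under torch.no_grad();
    only this small network (and the soft prompts) receive gradients, so
    back-propagation never traverses the backbone.
    """

    def __init__(self, backbone_dim: int = 512, side_dim: int = 128, num_taps: int = 4):
        super().__init__()
        # Down-projections that fuse intermediate backbone outputs as a guide.
        self.down = nn.ModuleList(
            nn.Linear(backbone_dim, side_dim) for _ in range(num_taps)
        )
        # Tiny trainable blocks standing in for the frozen transformer layers.
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.LayerNorm(side_dim),
                nn.Linear(side_dim, side_dim),
                nn.GELU(),
            )
            for _ in range(num_taps)
        )
        self.out = nn.Linear(side_dim, backbone_dim)

    def forward(self, taps: list[torch.Tensor]) -> torch.Tensor:
        # taps: detached intermediate activations from the frozen backbone,
        # one tensor of shape (batch, backbone_dim) per tapped layer.
        h = torch.zeros(taps[0].shape[0], self.out.in_features)
        for down, block, tap in zip(self.down, self.blocks, taps):
            h = block(h + down(tap))  # inject backbone guidance, then refine
        return self.out(h)


if __name__ == "__main__":
    side = LadderSideNetwork()
    # Stand-ins for activations tapped from a frozen CLIP-like encoder;
    # .detach() mirrors cutting the gradient path into the backbone.
    taps = [torch.randn(8, 512).detach() for _ in range(4)]
    feats = side(taps)              # (8, 512) adapted features
    feats.pow(2).mean().backward()  # gradients reach only the side network
```

Because the same `LadderSideNetwork` instance can be applied to activations from both the text and image encoders (when their widths match), the sketch also hints at the cross-modal parameter sharing the abstract describes.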
Pages: 9
Related Papers (50 in total; entries [41]-[50] shown)
  • [41] A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
    Jin, Woojeong
    Cheng, Yu
    Shen, Yelong
    Chen, Weizhu
    Ren, Xiang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2763 - 2775
  • [42] Modal Interaction-Enhanced Prompt Learning by Transformer Decoder for Vision-Language Models
    Liu, Mingyue
    Zhao, Honggang
    Ma, Longfei
    Li, Xiang
    Ji, Yucheng
    Li, Mingyong
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2023, 2023, 14120 : 163 - 174
  • [43] Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition
    Sun, Hongbo
    He, Xiangteng
    Zhou, Jiahuan
    Peng, Yuxin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5828 - 5836
  • [45] Pre-training A Prompt Pool for Vision-Language Model
    Liu, Jun
    Gu, Yang
    Yang, Zhaohua
    Guo, Shuai
    Liu, Huaqiu
    Chen, Yiqiang
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [46] Modal interaction-enhanced prompt learning by transformer decoder for vision-language models
    Liu, Mingyue
    Zhao, Honggang
    Ma, Longfei
    Li, Mingyong
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [47] Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
    Li, Juncheng
    Gao, Minghe
    Wei, Longhui
    Tang, Siliang
    Zhang, Wenqiao
    Li, Mengze
    Ji, Wei
    Tian, Qi
    Chua, Tat-Seng
    Zhuang, Yueting
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2551 - 2562
  • [48] MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
    Fan, Hao
    Ma, Zhaoyang
    Li, Yong
    Tian, Rui
    Chen, Yunli
    Gao, Chenlong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024, 2024, 14870 : 328 - 339
  • [49] Fine-grained multi-modal prompt learning for vision-language models
    Liu, Yunfei
    Deng, Yunziwei
    Liu, Anqi
    Liu, Yanan
    Li, Shengyang
    NEUROCOMPUTING, 2025, 636
  • [50] MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models
    Miao, Yongzhu
    Li, Shasha
    Tang, Jintao
    Wang, Ting
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 25 - 30