Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Cited by: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[3] Shenzhen Univ, Sch Comp Sci & Software Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Prompt tuning; Transfer learning; Contrastive language-image pre-training; Edge intelligence; DATASET;
DOI
10.1016/j.patcog.2025.111460
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained vision-language models (VLMs) have become the foundation of diverse intelligent services in daily life. Common VLMs have large parameter scales and require heavy memory overhead for model pre-training, which poses challenges in adapting them to edge devices. To enable memory-efficient VLMs, previous works mainly focus on prompt engineering, which uses trainable soft prompts instead of manually designed hard prompts. However, even though fewer than 3% of the parameters (the prompts) are updated, these methods still require the back-propagation chain to traverse the pre-trained model with its extensive parameters. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering adaptation on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as guidance and selects important VLM parameters for its initialization to maintain model performance. We also share the parameters of the ladder network between text and image data to obtain a more semantically aligned representation across modalities for the optimization of the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining relatively good performance.
Pages: 9
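
Below is a minimal PyTorch sketch of the bypass idea described in the abstract, not the authors' implementation. A frozen stand-in encoder plays the role of the VLM, a small ladder network shared across modalities consumes its detached intermediate outputs, and only the ladder and a soft prompt receive gradients. The module names (TinyEncoder, SharedLadder), toy dimensions, and the placeholder objective are assumptions for illustration; the paper's parameter-selection initialization and the full multi-modal prompt design are omitted.

# Minimal sketch of the ladder-bypass idea (assumed names and sizes, not the authors' code).
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for one frozen VLM encoder branch (text or image); not a real CLIP encoder."""
    def __init__(self, dim=512, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = torch.relu(blk(x))
            feats.append(x)          # intermediate outputs later guide the ladder
        return x, feats

class SharedLadder(nn.Module):
    """Lightweight ladder shared by both modalities; the only trainable module besides the prompt."""
    def __init__(self, dim=512, depth=4, rank=64):
        super().__init__()
        self.down = nn.ModuleList([nn.Linear(dim, rank) for _ in range(depth)])
        self.up = nn.Linear(rank, dim)

    def forward(self, prompt, feats):
        h = prompt
        for proj, f in zip(self.down, feats):
            # f.detach(): the VLM activations only guide the ladder;
            # no gradient ever flows back into the frozen backbone.
            h = h + torch.relu(proj(f.detach()))
        return self.up(h)

encoder = TinyEncoder()
for p in encoder.parameters():              # the VLM backbone stays frozen
    p.requires_grad_(False)

ladder = SharedLadder()                     # shared between text and image branches
prompt = nn.Parameter(torch.zeros(1, 64))   # trainable soft prompt (rank-sized here)

x = torch.randn(8, 512)                     # dummy embeddings for one modality
with torch.no_grad():                       # no activation graph is stored for the VLM
    _, feats = encoder(x)

out = ladder(prompt.expand(8, -1), feats)
loss = out.pow(2).mean()                    # placeholder objective
loss.backward()                             # gradients reach only the ladder and the prompt

Because back-propagation never enters the encoder, neither its activations nor its gradients need to be kept in memory, which is the source of the memory savings the abstract reports.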