Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices

Cited by: 0
Authors
Cai, Siqi [1 ]
Liu, Xuan [2 ]
Yuan, Jingling [1 ]
Zhou, Qihua [3 ]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[3] Shenzhen Univ, Sch Comp Sci & Software Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Prompt tuning; Transfer learning; Contrastive language-image pre-training; Edge intelligence; DATASET;
DOI
10.1016/j.patcog.2025.111460
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained vision-language models (VLMs) have become the foundation for diverse intelligent services in daily life. Common VLMs hold large parameter scales and require heavy memory overhead during pre-training, which poses challenges in adapting them to edge devices. To enable memory-efficient VLMs, previous works mainly focus on prompt engineering, which uses trainable soft prompts instead of manually designed hard prompts. However, even though fewer than 3% of the parameters (the prompts) are updated, these studies still require the back-propagation chain to traverse the pre-trained model with its extensive parameters. Consequently, the intermediate activations and gradients occupy a significant amount of memory, greatly hindering adaptation on resource-constrained edge devices. In view of the above, we propose a memory-efficient prompt-tuning method, named Prompt-Ladder. Our main idea is to adopt a lightweight ladder network as an agent that bypasses the VLM during back-propagation when optimizing the parameters of the designed multi-modal prompt module. The ladder network fuses the intermediate outputs of the VLM as guidance and selects important VLM parameters for its initialization to maintain model performance. We also share the ladder network's parameters between the text and image branches to obtain a more semantically aligned cross-modal representation for optimizing the prompt module. Experiments across seven datasets demonstrate that Prompt-Ladder reduces memory usage by at least 27% compared to baselines while maintaining relatively good performance.
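For intuition, below is a minimal PyTorch sketch of the side-network idea described in the abstract: a frozen backbone stands in for the pre-trained VLM, a small ladder module consumes its detached intermediate outputs, and only the ladder and the soft prompt receive gradients, so back-propagation never traverses the backbone. This is not the authors' implementation; the module names (FrozenBackbone, TinyLadder), layer sizes, and the toy objective are illustrative assumptions.

```python
# Minimal sketch of the ladder-as-agent idea, assuming a frozen backbone and a
# small trainable side network; all names and dimensions are illustrative.
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a pre-trained VLM encoder (e.g., one CLIP tower), kept frozen."""
    def __init__(self, dim=512, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        feats = []
        with torch.no_grad():                      # no activation graph kept for the backbone
            for blk in self.blocks:
                x = torch.relu(blk(x))
                feats.append(x)                    # intermediate outputs used only as guidance
        return x, feats

class TinyLadder(nn.Module):
    """Lightweight side network, shareable across modalities; the only trainable module."""
    def __init__(self, dim=512, depth=4):
        super().__init__()
        self.fuse = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, prompt, feats):
        h = prompt
        for layer, f in zip(self.fuse, feats):
            h = torch.relu(layer(h + f.detach()))  # detach: gradients never enter the backbone
        return h

dim = 512
backbone = FrozenBackbone(dim)
ladder = TinyLadder(dim)                           # shared by the text and image branches
prompt = nn.Parameter(torch.zeros(1, dim))         # trainable soft prompt

img = torch.randn(8, dim)                          # toy input features
_, feats = backbone(img)
out = ladder(prompt.expand(8, -1), feats)
loss = out.pow(2).mean()                           # placeholder objective
loss.backward()                                    # back-prop touches only the ladder and prompt
print(prompt.grad is not None)                                  # True
print(all(p.grad is None for p in backbone.parameters()))       # True: backbone untouched
```

Because the backbone runs under torch.no_grad(), its activations are not retained for the backward pass, which is where the memory savings targeted by the paper come from.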
Pages: 9
Related Papers
50 records in total
  • [21] LAPT: Label-Driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
    Zhang, Yabin
    Zhu, Wenjie
    He, Chenhang
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT LXXII, 2025, 15130 : 271 - 288
  • [22] Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
    Gao, Zhengqing
    Ao, Xiang
    Zhang, Xu-Yao
    Liu, Cheng-Lin
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 439 - 452
  • [23] Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
    Shu, Manli
    Nie, Weili
    Huang, De-An
    Yu, Zhiding
    Goldstein, Tom
    Anandkumar, Anima
    Xiao, Chaowei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [24] Learning to Prompt for Vision-Language Emotion Recognition
    Xie, Hongxia
    Chung, Hua
    Shuai, Hong-Han
    Cheng, Wen-Huang
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
  • [25] Constraint embedding for prompt tuning in vision-language pre-trained model
    Cheng, Keyang
    Wei, Liutao
    Tang, Jingfeng
    Zhan, Yongzhao
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [26] Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
    Ding, Kun
    Zhang, Haojian
    Yu, Qiang
    Wang, Ying
    Xiang, Shiming
    Pan, Chunhong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1528 - 1536
  • [27] Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
    Xing, Yinghui
    Wu, Qirui
    Cheng, De
    Zhang, Shizhou
    Liang, Guoqiang
    Wang, Peng
    Zhang, Yanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2056 - 2068
  • [28] Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
    Wang, Yubin
    Jiang, Xinyang
    Cheng, De
    Li, Dongsheng
    Zhao, Cairong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5749 - 5757
  • [29] Concept-Guided Prompt Learning for Generalization in Vision-Language Models
    Zhang, Yi
    Zhang, Ce
    Yu, Ke
    Tang, Yushun
    He, Zhihai
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7377 - 7386
  • [30] SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models
    Ma, Xiaosong
    Zhang, Jie
    Guo, Song
    Xu, Wenchao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,