CSP-DCPE: Category-Specific Prompt with Deep Contextual Prompt Enhancement for Vision-Language Models

Cited by: 0

Authors:

Wu, Chunlei [1,2]

Wu, Yixiang [1,2]

Xu, Qinfu [1,2]

Zi, Xuebin [1,2]

Affiliations:

[1] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China

[2] China Univ Petr East China, Coll Comp Sci & Technol, Shandong Key Lab Intelligent Oil & Gas Ind Softwar, Qingdao 266580, Peoples R China

Source:

ELECTRONICS | 2025, Vol. 14, No. 4

Funding:

National Natural Science Foundation of China;

Keywords:

image classification; pre-trained vision-language models; multi-modal; prompt learning;

DOI:

10.3390/electronics14040673

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Recently, prompt learning has emerged as a viable technique for fine-tuning pre-trained vision-language models (VLMs). The use of prompts allows pre-trained VLMs to be quickly adapted to specific downstream tasks, bypassing the necessity to update the original pre-trained weights. Nevertheless, much of the existing work on prompt learning has focused primarily on the utilization of non-specific prompts, with little attention paid to the category-specific data. In this paper, we present a novel method, the Category-Specific Prompt (CSP), which integrates task-oriented information into our model, thereby augmenting its capacity to comprehend and execute complex tasks. In order to enhance the exploitation of features, thereby optimizing the utilization of the combination of category-specific and non-specific prompts, we introduce a novel deep prompt-learning method, Deep Contextual Prompt Enhancement (DCPE). DCPE outputs features with rich text embedding knowledge that changes in response to input through attention-based interactions, thereby ensuring that our model contains instance-oriented information. Combining the above two methods, our architecture CSP-DCPE contains both task-oriented and instance-oriented information, and achieves state-of-the-art average scores on 11 benchmark image-classification datasets.

Pages: 22

44 entries in total

[1] Learning to Prompt for Vision-Language Models
Zhou, Kaiyang
Yang, Jingkang
Loy, Chen Change
Liu, Ziwei
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
[2] Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
International Journal of Computer Vision, 2022, 130 : 2337 - 2348
[3] CoPL: Contextual Prompt Learning for Vision-Language Understanding
Goswami, Koustava
Karanam, Srikrishna
Udhayanan, Prateksha
Joseph, K. J.
Srinivasan, Balaji Vasan
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18090 - 18098
[4] Conditional Prompt Learning for Vision-Language Models
Zhou, Kaiyang
Yang, Jingkang
Loy, Chen Change
Liu, Ziwei
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804
[5] Consistent prompt learning for vision-language models
Zhang, Yonggang
Tian, Xinmei
KNOWLEDGE-BASED SYSTEMS, 2025, 310
[6] Adversarial Prompt Tuning for Vision-Language Models
Zhang, Jiaming
Ma, Xingjun
Wang, Xin
Qiu, Lingyu
Wang, Jiaqi
Jiang, Yu-Gang
Sang, Jitao
COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
[7] Learning Domain Invariant Prompt for Vision-Language Models
Zhao, Cairong
Wang, Yubin
Jiang, Xinyang
Shen, Yifei
Song, Kaitao
Li, Dongsheng
Miao, Duoqian
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1348 - 1360
[8] DPO: Discrete Prompt Optimization for Vision-Language Models
Liang, Nanhao
Liu, Yong
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 671 - 675
[9] Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models
Jing, Yinuo
Wang, Chunyu
Zhang, Ruxu
Liang, Kongming
Ma, Zhanyu
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5716 - 5724
[10] JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
Guo, Yuncheng
Guo, Xiaodong
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28695 - 28705
