LAPT: Label-Driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Cited by: 1

Authors:

Zhang, Yabin [1,2]

Zhu, Wenjie [1]

He, Chenhang [1]

Zhang, Lei [1,2]

Affiliations:

[1] Hong Kong Polytechnic University, Hong Kong, People's Republic of China

[2] OPPO Research Institute, Shenzhen, People's Republic of China

Source:

COMPUTER VISION - ECCV 2024, PT LXXII | 2025 / Vol. 15130

Keywords:

Out-of-distribution detection; Vision-language models; Automated prompt tuning; Label-driven learning

DOI:

10.1007/978-3-031-73220-1_16

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Code is available at https://github.com/YBZh/LAPT.
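
Illustrative sketch (not taken from the paper): the abstract names a cross-entropy loss combined with cross-modal and cross-distribution mixing but does not give their exact form. The Python below assumes both are mixup-style convex combinations: cross-modal mixing interpolates an image feature with the text feature of its class, and cross-distribution mixing interpolates an ID sample with a negative-label sample together with their labels. All function names, the toy 2-D features, the mixing weights, and the temperature are hypothetical; the authors' repository linked above holds the actual implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mix(a, b, lam):
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]

def soft_cross_entropy(feature, prompt_features, target, temperature=1.0):
    """Cross-entropy between a (possibly soft) target and the softmax over
    cosine similarities between one feature and every prompt feature."""
    f = l2_normalize(feature)
    logits = [
        sum(x * y for x, y in zip(f, l2_normalize(p))) / temperature
        for p in prompt_features
    ]
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

# Toy 2-D features (hypothetical): prompt 0 stands for an ID class name,
# prompt 1 for an automatically mined negative label.
prompt_features = [[1.0, 0.0], [0.0, 1.0]]
id_image = [0.9, 0.3]   # noisy synthesized or retrieved image for the ID class
neg_image = [0.2, 0.8]  # image collected for the negative label

# Cross-modal mixing (assumed form): pull the image feature toward the
# text feature of its own class to dampen image noise.
cross_modal = mix(id_image, prompt_features[0], lam=0.5)
loss_raw = soft_cross_entropy(id_image, prompt_features, [1.0, 0.0])
loss_cross_modal = soft_cross_entropy(cross_modal, prompt_features, [1.0, 0.0])

# Cross-distribution mixing (assumed form): interpolate an ID sample and a
# negative-label sample, and interpolate their one-hot labels the same way.
lam = 0.25
cross_dist = mix(id_image, neg_image, lam)
cross_dist_target = mix([1.0, 0.0], [0.0, 1.0], lam)
loss_cross_dist = soft_cross_entropy(cross_dist, prompt_features, cross_dist_target)
```

In this toy setup the cross-modal sample lies closer to its class prompt than the raw image feature, so its loss is lower. The cross-distribution sample carries a soft label, which is one way to read "exploring the intermediate space between distributions". In actual prompt tuning the prompt features would be learnable and the temperature much smaller than the default of 1.0 used here.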


Pages: 271-288

Page count: 18

