Consistent Prompt Tuning for Generalized Category Discovery

Times cited: 0
Authors
Yang, Muli [1]
Yin, Jie [2]
Gu, Yanan [3]
Deng, Cheng [2]
Zhang, Hanwang [4]
Zhu, Hongyuan [1]
Affiliations
[1] A*STAR, Institute for Infocomm Research (I2R), Singapore, Singapore
[2] Xidian University, School of Electronic Engineering, Xi'an, People's Republic of China
[3] Norinco Group Testing and Research Institute, Xi'an, People's Republic of China
[4] Nanyang Technological University, College of Computing and Data Science, Singapore, Singapore
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Category discovery; Prompt learning; Multimodal learning; Transfer learning
DOI
10.1007/s11263-024-02343-w
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generalized Category Discovery (GCD) aims to discover both known and unknown classes in unlabeled data, using the knowledge learned from a limited set of labeled data. Although today's foundation models are trained on Internet-scale multimodal corpora, we find that they still struggle with GCD because class definitions are ambiguous. In this paper, we present Consistent Prompt Tuning (CPT) to disambiguate classes for large vision-language models (e.g., CLIP). To this end, CPT learns a set of "task + class" prompts for labeled and unlabeled data of both known and unknown classes. The "task" tokens are shared globally across classes and encode a unified class-definition pattern, e.g., "the foreground is an animal named" or "the background scene is". These prompts are optimized with two efficient regularization techniques that encourage consistent global and local relationships between any two matched inputs. We evaluate CPT on various existing GCD benchmarks, as well as in new practical scenarios with fewer annotations and customized class definitions, where it clearly outperforms existing state-of-the-art methods and shows broad versatility.
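The "task + class" prompt structure described above can be illustrated with a minimal sketch. It assumes a CoOp-style setup in which both the shared task tokens and the per-class tokens are learnable embedding vectors; the class `TaskClassPrompts`, the token counts, and the embedding width are hypothetical choices for illustration, not CPT's actual implementation. Feeding the prompts to CLIP's text encoder, the image branch, and the two consistency regularizers are all omitted.

```python
import random

# Hypothetical sizes chosen for illustration only; they are not CPT's settings.
EMBED_DIM = 8          # width of each token embedding
NUM_TASK_TOKENS = 4    # learnable tokens shared by every class (the "task" part)
NUM_CLASS_TOKENS = 2   # learnable tokens owned by a single class (the "class" part)


def init_tokens(n, dim, rng):
    """Return n token embeddings of width dim, drawn from N(0, 0.02^2)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)]


class TaskClassPrompts:
    """Hypothetical container for "task + class" prompts with shared task tokens."""

    def __init__(self, num_classes, seed=0):
        rng = random.Random(seed)
        # Stored once and reused by every class prompt.
        self.task_tokens = init_tokens(NUM_TASK_TOKENS, EMBED_DIM, rng)
        # One independent block of tokens per known or unknown class.
        self.class_tokens = [
            init_tokens(NUM_CLASS_TOKENS, EMBED_DIM, rng) for _ in range(num_classes)
        ]

    def prompt(self, class_index):
        """Concatenate the shared task tokens with one class's tokens."""
        return self.task_tokens + self.class_tokens[class_index]

    def all_prompts(self):
        """Assemble the prompt of every class from the current parameters."""
        return [self.prompt(k) for k in range(len(self.class_tokens))]

    def num_parameters(self):
        """Task tokens are counted once; class tokens are counted once per class."""
        shared = NUM_TASK_TOKENS * EMBED_DIM
        per_class = NUM_CLASS_TOKENS * EMBED_DIM * len(self.class_tokens)
        return shared + per_class


prompts = TaskClassPrompts(num_classes=5)
batch = prompts.all_prompts()
print(len(batch), len(batch[0]), prompts.num_parameters())  # prints "5 6 112"
```

Because the task tokens are stored once and every prompt is reassembled from them, an update to those tokens changes the prompt of every class at the same time, which is one way to realize a definition pattern shared across known and unknown classes.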
Pages: 4014-4041
Number of pages: 28