A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.

Pages: 8954-8975

Page count: 22

50 items in total

[1] Open-Vocabulary And Multitask Image Segmentation
Pan, Lihu
Yang, Yunting
Wang, Zhengkui
Shan, Wen
Yin, Jaili
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1048 - 1049
[2] Open-Vocabulary Camouflaged Object Segmentation
Pang, Youwei
Zhao, Xiaoqi
Zuo, Jiaming
Zhang, Lihe
Lu, Huchuan
COMPUTER VISION - ECCV 2024, PT XLVII, 2025, 15105 : 476 - 495
[3] Generalization Boosted Adapter for Open-Vocabulary Segmentation
Xu, Wenhao
Wang, Changwei
Feng, Xuxiang
Xu, Rongtao
Huang, Longzhao
Zhang, Zherui
Guo, Li
Xu, Shibiao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 520 - 533
[4] Open-Vocabulary RGB-Thermal Semantic Segmentation
Zhao, Guoqiang
Huang, Junjie
Yan, Xiaoyun
Wang, Zhaojing
Tang, Junwei
Ou, Yangjun
Hu, Xinrong
Peng, Tao
COMPUTER VISION - ECCV 2024, PT LXXIV, 2025, 15132 : 304 - 320
[5] Enhancing Open-Vocabulary Semantic Segmentation with Prototype Retrieval
Barsellotti, Luca
Amoroso, Roberto
Baraldi, Lorenzo
Cucchiara, Rita
IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234 : 196 - 208
[6] Open-Vocabulary Object Detection via Scene Graph Discovery
Shi, Hengcan
Hayat, Munawar
Cai, Jianfei
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4012 - 4021
[7] Unified Open-Vocabulary Dense Visual Prediction
Shi, Hengcan
Hayat, Munawar
Cai, Jianfei
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8704 - 8716
[8] Open-Vocabulary Instance Segmentation-Boundary IS-Goal
Tang, Quan
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 420 - 435
[9] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Lan, Mengcheng
Chen, Chaofeng
Ke, Yiping
Wang, Xinjiang
Feng, Litong
Zhang, Wayne
COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 70 - 88
[10] Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
Fang, Hao
Wu, Peng
Li, Yawei
Zhang, Xinxin
Lu, Xiankai
COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128 : 225 - 241
