A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive cost of manual labeling, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.
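The common thread behind several of the surveyed methodologies (most directly visual-semantic space mapping) is to classify region features by their similarity to text embeddings of arbitrary category names, rather than against a fixed classifier head. The following minimal sketch illustrates only that matching step; the region and text features are random placeholders standing in for the outputs of a pretrained vision-language model such as CLIP, and the function name and temperature value are illustrative assumptions, not the survey's implementation.

import torch
import torch.nn.functional as F

def classify_regions(region_feats: torch.Tensor,
                     text_embeds: torch.Tensor,
                     temperature: float = 0.01) -> torch.Tensor:
    """Assign each region to the closest free-form category name in a shared embedding space.

    region_feats: (N, D) visual features of N region proposals (hypothetical placeholder input).
    text_embeds:  (C, D) embeddings of C category names, e.g., encoded from prompts
                  like "a photo of a {category}" by a text encoder.
    Returns an (N, C) matrix of per-region probabilities over the C names.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # Cosine similarity between every region and every category name, scaled by a temperature.
    logits = region_feats @ text_embeds.t() / temperature
    return logits.softmax(dim=-1)

# Toy usage: 4 region proposals, 3 arbitrary category names, 512-dimensional embedding space.
probs = classify_regions(torch.randn(4, 512), torch.randn(3, 512))
print(probs.shape)  # torch.Size([4, 3])

Because the category names enter only through their text embeddings, new classes can be added at inference time simply by embedding their names, which is what allows such models to classify objects beyond the pre-defined training vocabulary.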
Pages: 8954-8975
Number of pages: 22