A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / Issue 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small-scale and pre-defined; as a result, state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years the community has paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, discriminates well between different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.


Pages: 8954 - 8975

Page count: 22
