A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3

Authors:

Zhu, Chaoyang ^{[1]}

Chen, Long ^{[1]}

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / Issue 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years the community has paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals discriminate well between different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.

Pages: 8954-8975

Page count: 22

50 items in total

[31] Expanding Open-Vocabulary Understanding for UAV Aerial Imagery: A Vision-Language Framework to Semantic Segmentation
Huang, Bangju
Li, Junhui
Luan, Wuyang
Tan, Jintao
Li, Chenglong
Huang, Longyang
[J]. DRONES, 2025, 9 (02)
[32] Open-Vocabulary Keyword Spotting With Audio And Text Embeddings
Sacchi, Niccolo
Nanchen, Alexandre
Jaggi, Martin
Cernak, Milos
[J]. INTERSPEECH 2019, 2019, : 3362 - 3366
[33] Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Etchegaray, Djamahl
Huang, Zi
Harada, Tatsuya
Luo, Yadan
[J]. COMPUTER VISION - ECCV 2024, PT XL, 2025, 15098 : 133 - 151
[34] OV-VG: A benchmark for open-vocabulary visual grounding
Wang, Chunlei
Feng, Wenquan
Li, Xiangtai
Cheng, Guangliang
Lyu, Shuchang
Liu, Binghao
Chen, Lijiang
Zhao, Qi
[J]. NEUROCOMPUTING, 2024, 591
[35] Open-vocabulary spoken term detection using graphone-based hybrid recognition systems
Akbacak, Murat
Vergyri, Dimitra
Stolcke, Andreas
[J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5240 - 5243
[36] DeTAL: Open-Vocabulary Temporal Action Localization With Decoupled Networks
Li, Zhiheng
Zhong, Yujie
Song, Ran
Li, Tianjiao
Ma, Lin
Zhang, Wei
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7728 - 7741
[37] SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
Li, Siyuan
Ke, Lei
Yang, Yung-Hsu
Piccinelli, Luigi
Segu, Mattia
Danelljan, Martin
Van Gool, Luc
[J]. COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085 : 1 - 18
[38] Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers
Pham, Khoi
Kafle, Kushal
Lin, Zhe
Ding, Zhihong
Cohen, Scott
Tran, Quan
Shrivastava, Abhinav
[J]. COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685 : 201 - 219
[39] Prompt-guided DETR with RoI-pruned masked attention for open-vocabulary object detection
Song, Hwanjun
Bang, Jihwan
[J]. PATTERN RECOGNITION, 2024, 155
[40] Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
Shin, Hyeon-Kyeong
Han, Hyewon
Kim, Doyeon
Chung, Soo-Whan
Kang, Hong-Goo
[J]. INTERSPEECH 2022, 2022, : 1871 - 1875