A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, the community has in the last few years paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.
Pages: 8954-8975
Page count: 22
Related papers
50 in total
  • [31] Expanding Open-Vocabulary Understanding for UAV Aerial Imagery: A Vision-Language Framework to Semantic Segmentation
    Huang, Bangju
    Li, Junhui
    Luan, Wuyang
    Tan, Jintao
    Li, Chenglong
    Huang, Longyang
    [J]. DRONES, 2025, 9 (02)
  • [32] Open-Vocabulary Keyword Spotting With Audio And Text Embeddings
    Sacchi, Niccolo
    Nanchen, Alexandre
    Jaggi, Martin
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 3362 - 3366
  • [33] Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
    Etchegaray, Djamahl
    Huang, Zi
    Harada, Tatsuya
    Luo, Yadan
    [J]. COMPUTER VISION - ECCV 2024, PT XL, 2025, 15098 : 133 - 151
  • [34] OV-VG: A benchmark for open-vocabulary visual grounding
    Wang, Chunlei
    Feng, Wenquan
    Li, Xiangtai
    Cheng, Guangliang
    Lyu, Shuchang
    Liu, Binghao
    Chen, Lijiang
    Zhao, Qi
    [J]. NEUROCOMPUTING, 2024, 591
  • [35] Open-vocabulary spoken term detection using graphone-based hybrid recognition systems
    Akbacak, Murat
    Vergyri, Dimitra
    Stolcke, Andreas
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5240 - 5243
  • [36] DeTAL: Open-Vocabulary Temporal Action Localization With Decoupled Networks
    Li, Zhiheng
    Zhong, Yujie
    Song, Ran
    Li, Tianjiao
    Ma, Lin
    Zhang, Wei
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7728 - 7741
  • [37] SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
    Li, Siyuan
    Ke, Lei
    Yang, Yung-Hsu
    Piccinelli, Luigi
    Segu, Mattia
    Danelljan, Martin
    Van Gool, Luc
    [J]. COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085 : 1 - 18
  • [38] Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers
    Khoi Pham
    Kafle, Kushal
    Lin, Zhe
    Ding, Zhihong
    Cohen, Scott
    Tran, Quan
    Shrivastava, Abhinav
    [J]. COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685 : 201 - 219
  • [39] Prompt-guided DETR with RoI-pruned masked attention for open-vocabulary object detection
    Song, Hwanjun
    Bang, Jihwan
    [J]. PATTERN RECOGNITION, 2024, 155
  • [40] Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
    Shin, Hyeon-Kyeong
    Han, Hyewon
    Kim, Doyeon
    Chung, Soo-Whan
    Kang, Hong-Goo
    [J]. INTERSPEECH 2022, 2022, : 1871 - 1875