A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 3
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small in number and pre-defined, so state-of-the-art fully supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To address this limitation, the community has paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS) over the last few years. By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, discriminates well between different methodologies, including visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.
Pages: 8954 - 8975
Number of pages: 22
Related papers
50 in total
  • [21] Understanding object descriptions in robotics by open-vocabulary object retrieval and detection
    Guadarrama, Sergio
    Rodner, Erik
    Saenko, Kate
    Darrell, Trevor
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2016, 35 (1-3) : 265 - 280
  • [22] Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching
    Zhang, Hao
    Xu, Lumin
    Lai, Shenqi
    Shao, Wenqi
    Zheng, Nanning
    Luo, Ping
    Qiao, Yu
    Zhang, Kaipeng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 5741 - 5758
  • [23] FreeMix: Open-Vocabulary Domain Generalization of Remote-Sensing Images for Semantic Segmentation
    Wu, Jingyi
    Shi, Jingye
    Zhao, Zeyong
    Liu, Ziyang
    Zhi, Ruicong
    REMOTE SENSING, 2025, 17 (08)
  • [24] Towards Open Vocabulary Learning: A Survey
    Wu, Jianzong
    Li, Xiangtai
    Xu, Shilin
    Yuan, Haobo
    Ding, Henghui
    Yang, Yibo
    Li, Xia
    Zhang, Jiangning
    Tong, Yunhai
    Jiang, Xudong
    Ghanem, Bernard
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (07) : 5092 - 5113
  • [25] Open-vocabulary object detection via debiased curriculum self-training
    Zhang, Hanlue
    Guan, Dayan
    Ke, Xiangrui
    El Saddik, Abdulmotaleb
    Lu, Shijian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [26] CLIP-TSA: CLIP-guided open-vocabulary semantic segmentation with two-level semantic awareness
    Liang, Zhixue
    Dong, Wenyong
    Zhang, Bo
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [27] OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
    Zhang, Hu
    Ku, Jianhua
    Tang, Tao
    Sun, Haiyang
    Huang, Xin
    Huang, Zi
    Yu, Kaicheng
    COMPUTER VISION - ECCV 2024, PT LXXXIV, 2025, 15142 : 1 - 19
  • [28] GCD-Net: Global consciousness-driven open-vocabulary semantic segmentation network
    Wu, Xing
    Xu, Zhenyao
    Qian, Quan
    Huang, Bin
    NEUROCOMPUTING, 2025, 636
  • [29] MVP-SEG: Multi-view Prompt Learning for Open-Vocabulary Semantic Segmentation
    Guo, Jie
    Wang, Qimeng
    Gao, Yan
    Jiang, Xiaolong
    Lin, Shaohui
    Zhang, Baochang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XII, 2024, 14436 : 158 - 171
  • [30] Can Identifier Splitting Improve Open-Vocabulary Language Model of Code
    Shi, Jieke
    Yang, Zhou
    He, Junda
    Xu, Bowen
    Lo, David
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022 : 1134 - 1138