Unified Open-Vocabulary Dense Visual Prediction

Cited by: 11
Authors
Shi, Hengcan [1]
Hayat, Munawar [2]
Cai, Jianfei [2]
Affiliations
[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410012, Peoples R China
[2] Monash Univ, Dept Data Sci & AI, Melbourne 3800, Australia
Keywords
Task analysis; Training; Decoding; Visualization; Feature extraction; Semantics; Object detection; Open-vocabulary; Image segmentation
DOI
10.1109/TMM.2024.3381835
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific, i.e., they tackle each task individually. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, training data for OV dense prediction is relatively scarce. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse data to boost individual tasks. We address two major challenges in unified OV prediction. First, unlike unified methods for fixed-set predictions, OV networks are usually trained with multi-modal data. Therefore, we propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism to better exploit multi-modal information for OV recognition. Second, because UOVN uses data from different tasks for training, there are significant domain and task gaps. We present a UOVN training mechanism to reduce such gaps. Experiments on four datasets demonstrate the effectiveness of our UOVN.
Pages: 8704-8716
Page count: 13