Unified Open-Vocabulary Dense Visual Prediction

Cited by: 11
Authors
Shi, Hengcan [1]
Hayat, Munawar [2]
Cai, Jianfei [2]
Affiliations
[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410012, Peoples R China
[2] Monash Univ, Dept Data Sci & AI, Melbourne 3800, Australia
Keywords
Task analysis; Training; Decoding; Visualization; Feature extraction; Semantics; Object detection; Open-vocabulary; object detection; image segmentation;
DOI
10.1109/TMM.2024.3381835
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific, i.e., they tackle each task individually. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, training data for OV dense prediction is relatively scarce. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse data to boost individual tasks. We address two major challenges in unified OV prediction. First, unlike unified methods for fixed-set prediction, OV networks are usually trained with multi-modal data. Therefore, we propose a multi-modal, multi-scale, and multi-task (MMM) decoding mechanism to better exploit multi-modal information for OV recognition. Second, because UOVN is trained on data from different tasks, there are significant domain and task gaps. We present a UOVN training mechanism to reduce such gaps. Experiments on four datasets demonstrate the effectiveness of UOVN.
Pages: 8704 - 8716
Number of pages: 13