Unified Open-Vocabulary Dense Visual Prediction

Cited by: 11

Authors:

Shi, Hengcan [1]

Hayat, Munawar [2]

Cai, Jianfei [2]

Affiliations:

[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410012, Peoples R China

[2] Monash Univ, Dept Data Sci & AI, Melbourne 3800, Australia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Task analysis; Training; Decoding; Visualization; Feature extraction; Semantics; Object detection; Open-vocabulary; object detection; image segmentation;

DOI:

10.1109/TMM.2024.3381835

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific, i.e., they tackle each task individually. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, training data for OV dense prediction is relatively scarce. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse data to boost individual tasks. We address two major challenges in unified OV prediction. First, unlike unified methods for fixed-set predictions, OV networks are usually trained with multi-modal data. Therefore, we propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism to better exploit multi-modal information for OV recognition. Second, because UOVN uses data from different tasks for training, there are significant domain and task gaps. We present a UOVN training mechanism to reduce such gaps. Experiments on four datasets demonstrate the effectiveness of our UOVN.

Citation

Pages: 8704-8716

Number of pages: 13

50 items in total

[31] Open-vocabulary object detection via debiased curriculum self-training
Zhang, Hanlue
Guan, Dayan
Ke, Xiangrui
El Saddik, Abdulmotaleb
Lu, Shijian
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
[32] DeTAL: Open-Vocabulary Temporal Action Localization With Decoupled Networks
Li, Zhiheng
Zhong, Yujie
Song, Ran
Li, Tianjiao
Ma, Lin
Zhang, Wei
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7728 - 7741
[33] Open-Vocabulary Instance Segmentation-Boundary IS-Goal
Tang, Quan
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 420 - 435
[34] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Lan, Mengcheng
Chen, Chaofeng
Ke, Yiping
Wang, Xinjiang
Feng, Litong
Zhang, Wayne
COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 70 - 88
[35] SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
Li, Siyuan
Ke, Lei
Yang, Yung-Hsu
Piccinelli, Luigi
Segu, Mattia
Danelljan, Martin
Van Gool, Luc
COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085 : 1 - 18
[36] OV-VIS: Open-Vocabulary Video Instance Segmentation
Wang, Haochen
Yan, Cilin
Chen, Keyan
Jiang, Xiaolong
Tang, Xu
Hu, Yao
Kang, Guoliang
Xie, Weidi
Gavves, Efstratios
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11) : 5048 - 5065
[37] Image-text aggregation for open-vocabulary semantic segmentation
Cheng, Shengyang
Huang, Jianyong
Wang, Xiaodong
Huang, Lei
Wei, Zhiqiang
NEUROCOMPUTING, 2025, 630
[38] SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation
Xu, Mengde
Zhang, Zheng
Wei, Fangyun
Hu, Han
Bai, Xiang
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15546 - 15561
[39] LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
Shi, Hengcan
Dao, Son Duy
Cai, Jianfei
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (02) : 742 - 759
[40] Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation
Wang, Xihua
Ji, Lei
Yan, Kun
Sun, Yuchong
Song, Ruihua
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 407 - 419