Unified Open-Vocabulary Dense Visual Prediction

Cited by: 11

Authors:

Shi, Hengcan [1]

Hayat, Munawar [2]

Cai, Jianfei [2]

Affiliations:

[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410012, Peoples R China

[2] Monash Univ, Dept Data Sci & AI, Melbourne 3800, Australia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Task analysis; Training; Decoding; Visualization; Feature extraction; Semantics; Object detection; Open-vocabulary; object detection; image segmentation;

DOI:

10.1109/TMM.2024.3381835

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification number:

0812;

Abstract:

In recent years, open-vocabulary (OV) dense visual prediction (e.g., OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific, i.e., they tackle each task individually. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) that jointly addresses four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, training data for OV dense prediction is relatively scarce. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse data to boost individual tasks. We address two major challenges in unified OV prediction. First, unlike unified methods for fixed-set prediction, OV networks are usually trained with multi-modal data. We therefore propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism to better exploit multi-modal information for OV recognition. Second, because UOVN is trained on data from different tasks, there are significant domain and task gaps between the training sources. We present a UOVN training mechanism to reduce these gaps. Experiments on four datasets demonstrate the effectiveness of UOVN.
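The abstract says OV networks rely on multi-modal data so they can recognize categories outside a fixed label set, but it does not describe how UOVN's MMM decoder does this. For orientation only, the sketch below shows a generic, widely used OV classification step (not the UOVN method): each predicted region or mask embedding is compared with text embeddings of arbitrary class names by cosine similarity, and a softmax over that vocabulary gives the label distribution. The function names, the `temperature` value, and the toy 3-D vectors are hypothetical stand-ins for real image- and text-encoder outputs.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def classify_open_vocab(
    region_embeddings: List[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value; real systems tune or learn it
) -> List[Dict[str, float]]:
    """Generic OV classification sketch (not UOVN's MMM decoder).

    The vocabulary is simply the keys of `text_embeddings`, so class names can be
    added at inference time without retraining the visual branch.
    Returns, for each region/mask, a softmax distribution over the class names.
    """
    names = list(text_embeddings)
    texts = [l2_normalize(text_embeddings[n]) for n in names]
    results = []
    for region in region_embeddings:
        r = l2_normalize(region)
        if any(len(t) != len(r) for t in texts):
            raise ValueError("region and text embeddings must share a dimension")
        # Cosine similarity to every class-name embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(r, t)) / temperature for t in texts]
        # Numerically stable softmax over the open vocabulary.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        results.append({n: e / total for n, e in zip(names, exps)})
    return results


if __name__ == "__main__":
    # Toy stand-ins for text-encoder outputs of class names (hypothetical).
    vocab = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "zebra": [0.0, 0.0, 1.0]}
    # Toy stand-ins for per-region or per-mask visual embeddings (hypothetical).
    regions = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
    for probs in classify_open_vocab(regions, vocab):
        print(max(probs, key=probs.get))
```

Because the label set lives only in the text side, the same visual head can serve detection boxes or segmentation masks; this shared classification interface is one reason OV detection and segmentation tasks lend themselves to a unified model.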


Pages: 8704-8716

Number of pages: 13
