Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

Cited by: 0

Authors:

Xu, Yifan [1,2]

Zhang, Mengdan [3]

Yang, Xiaoshan [1,2]

Xu, Changsheng [1,2]

Affiliations:

[1] Chinese Acad Sci, Univ Chinese Acad Sci, Inst Automat, MAIS, Beijing 100190, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China

[3] Tencent Youtu Lab, Shanghai 200233, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Visualization; Detectors; Object detection; Context modeling; Proposals; Location awareness; Annotations; Vocabulary; Training; open-vocabulary; contextual knowledge;

DOI:

10.1109/TIP.2024.3485518

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We explore multi-modal contextual knowledge learned through multi-modal masked language modeling to provide explicit localization guidance for novel classes in open-vocabulary object detection (OVD). Intuitively, a well-modeled and correctly predicted masked concept word should effectively capture the textual contexts, visual contexts, and the cross-modal correspondence between texts and regions, thereby automatically activating high attention on corresponding regions. In light of this, we propose a multi-modal contextual knowledge distillation framework, MMC-Det, to explicitly supervise a student detector with the context-aware attention of the masked concept words in a teacher fusion transformer. The teacher fusion transformer is trained with our newly proposed diverse multi-modal masked language modeling (D-MLM) strategy, which significantly enhances the fine-grained region-level visual context modeling in the fusion transformer. The proposed distillation process provides additional contextual guidance to the concept-region matching of the detector, thereby further improving the OVD performance. Extensive experiments performed upon various detection datasets show the effectiveness of our multi-modal context learning strategy.

Pages: 6253 - 6267

Number of pages: 15

50 records in total

[1] Open-Vocabulary Object Detection via Scene Graph Discovery
Shi, Hengcan
Hayat, Munawar
Cai, Jianfei
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4012 - 4021
[2] OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition
Chen, Keyan
Jiang, Xiaolong
Wang, Haochen
Yan, Cilin
Gao, Yan
Tang, Xu
Hu, Yao
Xie, Weidi
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11) : 5387 - 5409
[3] Understanding object descriptions in robotics by open-vocabulary object retrieval and detection
Guadarrama, Sergio
Rodner, Erik
Saenko, Kate
Darrell, Trevor
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2016, 35 (1-3) : 265 - 280
[4] Open-Vocabulary Camouflaged Object Segmentation
Pang, Youwei
Zhao, Xiaoqi
Zuo, Jiaming
Zhang, Lihe
Lu, Huchuan
COMPUTER VISION - ECCV 2024, PT XLVII, 2025, 15105 : 476 - 495
[5] Open-vocabulary object detection via debiased curriculum self-training
Zhang, Hanlue
Guan, Dayan
Ke, Xiangrui
El Saddik, Abdulmotaleb
Lu, Shijian
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
[6] OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Zhang, Hu
Ku, Jianhua
Tang, Tao
Sun, Haiyang
Huang, Xin
Huang, Zi
Yu, Kaicheng
COMPUTER VISION - ECCV 2024, PT LXXXIV, 2025, 15142 : 1 - 19
[7] Open-Vocabulary Category-Level Object Pose and Size Estimation
Cai, Junhao
He, Yisheng
Yuan, Weihao
Zhu, Siyu
Dong, Zilong
Bo, Liefeng
Chen, Qifeng
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (9) : 7661 - 7668
[8] Deep Multi-modal Object Detection for Autonomous Driving
Ennajar, Amal
Khouja, Nadia
Boutteau, Remi
Tlili, Fethi
2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021, : 7 - 11
[9] A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Zhu, Chaoyang
Chen, Long
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8954 - 8975
[10] Multi-modal object detection via transformer network
Liu, Wenbing
Wang, Haibo
Gao, Quanxue
Zhu, Zhaorui
IET IMAGE PROCESSING, 2023, 17 (12) : 3541 - 3550
