Multi-modal object detection via transformer network

Cited by: 2

Authors:

Liu, Wenbing [1,2]

Wang, Haibo [1,2]

Gao, Quanxue [1,3]

Zhu, Zhaorui [1]

Affiliations:

[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China

[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China

[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China

Source:

IET IMAGE PROCESSING | 2023 / Vol. 17 / Issue 12

Keywords:

image representations; object detection

DOI:

10.1049/ipr2.12884

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information contained in multi-modal data. This paper therefore presents an object detection method that makes full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities. The aim is to exploit the complementarity between modalities, which helps to improve the performance of multi-modal object detection. Second, a contrastive loss is applied so that label information can be used effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
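
The abstract does not give the form of the contrastive loss. As an illustration only, the sketch below shows one common label-aware variant (a supervised contrastive loss), which may differ from the loss used in the paper. The function name, the temperature value, and the toy feature vectors are all assumptions made for this example.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative label-aware contrastive loss (NOT taken from the paper).

    features: list of equal-length float vectors, one per sample
    labels:   list of class labels, one per sample
    Returns the mean loss over anchors that have at least one positive.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives: other samples that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner is skipped
        # Temperature-scaled similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy features (hypothetical): two clusters in 2-D.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
print(loss_matching < loss_mismatched)
```

When the labels agree with the feature clusters, the loss is small. When same-label samples lie far apart, it is large. This is the sense in which such a loss lets training make use of label information.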


Pages: 3541-3550

Page count: 10
