Multi-modal object detection via transformer network

Cited by: 2

Authors:

Liu, Wenbing [1,2]

Wang, Haibo [1,2]

Gao, Quanxue [1,3]

Zhu, Zhaorui [1]

Affiliations:

[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China

[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China

[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China

Source:

IET IMAGE PROCESSING | 2023 / Volume 17 / Issue 12

Keywords:

image representations; object detection;

DOI:

10.1049/ipr2.12884

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting, in various ways, the complementary information contained in multi-modal data. This paper therefore presents an object detection method that makes full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities. The aim is to exploit the complementarity between modalities, which helps to improve multi-modal object detection performance. Second, a contrastive loss suited to contrastive learning is applied, which enables the method to use label information effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.

Pages: 3541-3550

Page count: 10
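
The abstract names two ingredients: transformer-based fusion of intra-modal and inter-modal features, and a contrastive loss that uses label information. The abstract does not specify the architecture or the exact loss, so the following is only a minimal NumPy sketch under stated assumptions: plain scaled dot-product attention without learned projections, two modalities with the same number of tokens, and a supervised contrastive loss of the commonly used form in which samples sharing a label are positives. The function names `fuse` and `supervised_contrastive_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; q: (n_q, d), k and v: (n_k, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def fuse(tokens_a, tokens_b):
    # Assumption: both modalities provide (n, d) token arrays of equal shape.
    intra_a = attention(tokens_a, tokens_a, tokens_a)  # intra-modal (self-attention)
    intra_b = attention(tokens_b, tokens_b, tokens_b)
    inter_a = attention(tokens_a, tokens_b, tokens_b)  # inter-modal: A queries B
    inter_b = attention(tokens_b, tokens_a, tokens_a)  # inter-modal: B queries A
    # Residual sum per modality, then concatenation along the feature axis.
    return np.concatenate(
        [tokens_a + intra_a + inter_a, tokens_b + intra_b + inter_b], axis=-1
    )

def supervised_contrastive_loss(features, labels, temperature=0.1):
    # features: (n, d) with n >= 2; labels: (n,) integer class ids.
    labels = np.asarray(labels)
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Exclude each sample's similarity to itself from the softmax denominator.
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # anchors that have at least one positive
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

In this sketch, the loss is small when features that share a label are close together and large when they are not, which is one way label information can shape the fused representation.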
