Multi-modal object detection via transformer network

Times cited: 2

Authors:

Liu, Wenbing [1,2]

Wang, Haibo [1,2]

Gao, Quanxue [1,3]

Zhu, Zhaorui [1]

Affiliations:

[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China

[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China

[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China

Source:

IET IMAGE PROCESSING | 2023 / Vol. 17 / No. 12

Keywords:

image representations; object detection;

DOI:

10.1049/ipr2.12884

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information contained in multi-modal data across a variety of tasks. This paper therefore proposes an object detection method that makes full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities, exploiting the complementarity between modalities to improve multi-modal object detection performance. Second, a contrastive loss suited to contrastive learning is applied, which enables effective use of label information. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
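The abstract does not give the network details, so what follows is only a generic sketch of the two ideas it names, under stated assumptions, and not the authors' implementation. It assumes two modalities supplied as token feature matrices (for example RGB and infrared, chosen purely for illustration), single-head scaled dot-product attention with identity query/key/value projections and no learned weights, and it substitutes the supervised contrastive (SupCon) loss of Khosla et al. (2020) for the unspecified "contrastive loss" that uses label information. All function names (`fuse_modalities`, `supervised_contrastive_loss`, and the helpers) are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.
    Assumption: identity Q/K/V projections (no learned weights)."""
    d = len(queries[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in keys]
              for qr in queries]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, values)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise residual addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_modalities(tokens_a: Matrix, tokens_b: Matrix) -> Matrix:
    """Hypothetical two-modality fusion: intra-modal self-attention, then
    inter-modal cross-attention, each followed by a residual connection."""
    # Intra-modal: tokens attend to tokens of the same modality.
    intra_a = add(tokens_a, attention(tokens_a, tokens_a, tokens_a))
    intra_b = add(tokens_b, attention(tokens_b, tokens_b, tokens_b))
    # Inter-modal: queries from one modality, keys/values from the other.
    cross_a = add(intra_a, attention(intra_a, intra_b, intra_b))
    cross_b = add(intra_b, attention(intra_b, intra_a, intra_a))
    # Concatenate the fused token sets for a downstream detection head.
    return cross_a + cross_b


def supervised_contrastive_loss(embeddings: Matrix, labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """SupCon-style loss (assumed stand-in): pulls same-label embeddings
    together and pushes different-label embeddings apart."""
    z = []
    for e in embeddings:  # L2-normalise so dot products are cosine similarities
        norm = math.sqrt(sum(x * x for x in e))
        z.append([x / norm for x in e])
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        m = max(logits.values())  # log-sum-exp over all non-anchor samples
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


rgb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
ir = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
fused = fuse_modalities(rgb, ir)
print(len(fused), len(fused[0]))  # 5 4

emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supervised_contrastive_loss(emb, [0, 0, 1, 1])
      < supervised_contrastive_loss(emb, [0, 1, 0, 1]))  # True
```

The usage lines show the intended behaviour: fusion keeps one output token per input token, and the loss is lower when same-label embeddings are already aligned. A practical detector would learn separate projection matrices, use multiple heads, and apply the loss to region-level embeddings; those choices are omitted here to keep the sketch dependency-free.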


Pages: 3541-3550

Page count: 10
