Multi-modal object detection via transformer network

Cited by: 2
Authors
Liu, Wenbing [1, 2]
Wang, Haibo [1, 2]
Gao, Quanxue [1, 3]
Zhu, Zhaorui [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
Keywords
image representations; object detection
DOI
10.1049/ipr2.12884
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information carried by multi-modal data. This paper therefore presents an object detection method designed to make full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities. The aim is to exploit the complementarity between modalities and thereby improve multi-modal object detection performance. Second, a contrastive loss is applied, which allows the method to make effective use of label information. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
Pages: 3541-3550
Number of pages: 10