Multi-modal object detection via transformer network

被引:2
|
作者
Liu, Wenbing [1 ,2 ]
Wang, Haibo [1 ,2 ]
Gao, Quanxue [1 ,3 ]
Zhu, Zhaorui [1 ]
机构
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
关键词
image representations; object detection;
D O I
10.1049/ipr2.12884
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
According to the fact that single-modal data usually contain limited information, a great deal of effort has been devoted to making use of the complementary information contained in the multi-modal data on various patterns. Thus, this paper is concerned with an object detection method that can fully utilize multi-modal data. First, the method introduces the transformer mechanism to realize the fusion of intra-modal and inter-modal features of different modal data. The aim is to take advantage of the complementarity of data between modalities, which helps to improve the performance of multi-modal object detection. Second, a contrastive loss suitable for contrastive learning is applied. This enables the authors to effectively utilize label information. Extensive experiments are conducted on multiple object detection datasets to demonstrate the effectiveness of our proposed method.
引用
收藏
页码:3541 / 3550
页数:10
相关论文
共 50 条
  • [41] Exploiting Multi-Modal Synergies for Enhancing 3D Multi-Object Tracking
    Xu, Xinglong
    Ren, Weihong
    Chen, Xi'ai
    Fan, Huijie
    Han, Zhi
    Liu, Honghai
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (10): : 8643 - 8650
  • [42] TransMIN: Transformer-Guided Multi-Interaction Network for Remote Sensing Object Detection
    Xu, Guangming
    Song, Tiecheng
    Sun, Xia
    Gao, Chenqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [43] TransMIN: Transformer-Guided Multi-Interaction Network for Remote Sensing Object Detection
    Xu, Guangming
    Song, Tiecheng
    Sun, Xia
    Gao, Chenqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [44] Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing
    Schierl, Jonathan
    Graehling, Quinn
    Aspiras, Theus
    Asari, Vijay
    Van Rynbach, Andre
    Rabb, Dave
    2020 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR): TRUSTED COMPUTING, PRIVACY, AND SECURING MULTIMEDIA, 2020,
  • [45] GraphAlign plus plus : An Accurate Feature Alignment by Graph Matching for Multi-Modal 3D Object Detection
    Song, Ziying
    Jia, Caiyan
    Yang, Lei
    Wei, Haiyue
    Liu, Lin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2619 - 2632
  • [46] GEM: Glare or Gloom, I Can Still See You - End-to-End Multi-Modal Object Detection
    Mazhar, Osama
    Babuska, Robert
    Kober, Jens
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (04): : 6321 - 6328
  • [47] EPNet plus plus : Cascade Bi-Directional Fusion for Multi-Modal 3D Object Detection
    Liu, Zhe
    Huang, Tengteng
    Li, Bingling
    Chen, Xiwu
    Wang, Xi
    Bai, Xiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 8324 - 8341
  • [48] Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection
    Zhou, Taohua
    Chen, Junjie
    Shi, Yining
    Jiang, Kun
    Yang, Mengmeng
    Yang, Diange
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (02): : 1523 - 1535
  • [49] Multi-Modal and Multi-Scale Fusion 3D Object Detection of 4D Radar and LiDAR for Autonomous Driving
    Wang, Li
    Zhang, Xinyu
    Li, Jun
    Xv, Baowei
    Fu, Rong
    Chen, Haifeng
    Yang, Lei
    Jin, Dafeng
    Zhao, Lijun
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (05) : 5628 - 5641
  • [50] Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion
    Yang Chen
    Hou Zhiqiang
    Li Xinyue
    Ma Sugang
    Yang Xiaobao
    ACTA PHOTONICA SINICA, 2024, 53 (03)