Deformable Feature Fusion Network for Multi-Modal 3D Object Detection

Cited: 0
Authors
Guo, Kun [1 ]
Gan, Tong [2 ]
Ding, Zhao [3 ]
Ling, Qiang [1 ]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei, Peoples R China
[2] Anhui ShineAuto Autonomous Driving Technol Co Ltd, Res & Dev Dept, Hefei, Peoples R China
[3] Anhui JiangHuai Automobile Grp Co Ltd, Inst Intelligent & Networked Automobile, Hefei, Peoples R China
Source
2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024 | 2024年
Keywords
3D object detection; multi-modal fusion; feature alignment; VoxelNet
DOI
10.1109/RAIIC61787.2024.10670940
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
LiDAR and cameras are two widely used sensors in 3D object detection. LiDAR point clouds capture the geometry of objects, while RGB images provide semantic information such as color and texture. Effectively fusing their features is key to improving detection performance. This paper proposes a Deformable Feature Fusion Network, which performs LiDAR-camera fusion in a flexible way. We represent multi-modal features in the bird's-eye view (BEV) and build a Deformable-Attention Fusion (DAF) module to conduct feature fusion. Beyond the fusion method itself, feature alignment is also important in multi-modal detection: data augmentation of point clouds may change the projection relationship between RGB images and LiDAR point clouds and cause feature misalignment. We introduce a Feature Alignment Transform (FAT) module that alleviates this problem without introducing any trainable parameters. Experiments on the KITTI dataset evaluate the effectiveness of the proposed modules, and the results show that our method outperforms most existing methods.
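The alignment idea the abstract describes, undoing point-cloud augmentations so that image-derived features land at the correct BEV locations, can indeed be done without trainable parameters. The sketch below is a hypothetical illustration of that principle, not the paper's FAT implementation; the assumed augmentation set (global z-axis rotation, global scaling, x-flip), the application order, and the function name are all assumptions.

```python
import numpy as np

def inverse_augment_coords(bev_xy, rotation_deg=0.0, scale=1.0, flip_x=False):
    """Map augmented BEV query coordinates back to the original LiDAR frame.

    Point-cloud augmentations (here assumed applied as rotate -> scale -> flip)
    break the fixed LiDAR-to-camera projection; applying the inverse transform
    to the BEV coordinates before sampling camera features restores alignment
    with zero trainable parameters. Hypothetical helper, not the paper's code.
    """
    xy = np.asarray(bev_xy, dtype=np.float64).copy()
    if flip_x:
        xy[:, 0] = -xy[:, 0]           # undo flip about the y-axis
    xy /= scale                         # undo global scaling
    theta = -np.deg2rad(rotation_deg)   # undo rotation about the z-axis
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return xy @ rot.T

# Round-trip check: the point (1, 2) rotated 90 deg, scaled by 2, then
# x-flipped lands at (4, 2); the inverse recovers the original coordinates.
print(inverse_augment_coords([[4.0, 2.0]], rotation_deg=90.0,
                             scale=2.0, flip_x=True))  # ~[[1. 2.]]
```

Camera features sampled at the recovered coordinates then line up with the un-augmented projection geometry, which is the essence of a parameter-free alignment transform.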
Pages: 363-367 (5 pages)
Related Papers (50 records)
  • [21] Multi-level and Multi-modal Target Detection Based on Feature Fusion
    Cheng T.; Sun L.; Hou D.; Shi Q.; Zhang J.; Chen J.; Huang H.
    Qiche Gongcheng/Automotive Engineering, 2021, 43 (11): 1602-1610
  • [22] Multi-Modal Feature Pyramid Transformer for RGB-Infrared Object Detection
    Zhu, Yaohui; Sun, Xiaoyu; Wang, Miao; Huang, Hua
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (09): 9984-9995
  • [23] Multi-Modal Fusion Object Tracking Based on Fully Convolutional Siamese Network
    Qi, Ke; Chen, Liji; Zhou, Yicong; Qi, Yutao
    2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023, 2023: 440-444
  • [24] Addressing uncertainty in multi-modal fusion for improved object detection in dynamic environment
    Kumar, Praveen; Mittal, Ankush; Kumar, Padam
    INFORMATION FUSION, 2010, 11 (04): 311-324
  • [25] Multi-Modal Transformer for RGB-D Salient Object Detection
    Song, Peipei; Zhang, Jing; Koniusz, Piotr; Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 2466-2470
  • [26] Lightweight video salient object detection via channel-shuffle enhanced multi-modal fusion network
    Huang, Kan; Xu, Zhijing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1): 1025-1039
  • [28] GATR: Transformer Based on Guided Aggregation Decoder for 3D Multi-Modal Detection
    Luo, Yikai; He, Linyuan; Ma, Shiping; Qi, Zisen; Fan, Zunlin
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (11): 9725-9732
  • [29] Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel
    Gao, Xin; Zhang, Guoying; Xiong, Yijin
    MEASUREMENT, 2022, 194
  • [30] Robust 3D Semantic Segmentation Based on Multi-Phase Multi-Modal Fusion for Intelligent Vehicles
    Ni, Peizhou; Li, Xu; Xu, Wang; Kong, Dong; Hu, Yue; Wei, Kun
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 1602-1614