PPF-Net: Efficient Multimodal 3D Object Detection with Pillar-Point Fusion

Cited by: 0
Authors
Zhang, Lingxiao [1]
Li, Changyong [1]
Affiliation
[1] Xinjiang Univ, Coll Mech Engn, Urumqi 830017, Peoples R China
Source
ELECTRONICS | 2025 / Vol. 14 / No. 4
Keywords
3D object detection; cross-modal data augmentation; sensor fusion; joint regression loss function
DOI
10.3390/electronics14040685
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Detecting objects in 3D space using LiDAR is crucial for robotics and autonomous vehicles, but the sparsity of LiDAR-generated point clouds limits performance. Camera images, rich in semantic information, can effectively compensate for this limitation. We propose a simple yet effective multimodal fusion framework that enhances 3D object detection without complex network designs. We introduce a cross-modal GT-Paste data augmentation method to address challenges such as acquiring 2D objects and the occlusions caused by pasted objects. To better integrate image features with sparse point clouds, we propose Pillar-Point Fusion (PPF), which projects non-empty pillars onto image feature maps and uses an attention mechanism to map semantic features from pillars to their constituent points, fusing them with the points' geometric features. Additionally, we design the BD-IoU loss function, which measures the similarity of 3D bounding boxes, and a joint regression loss that combines BD-IoU and Smooth L1 to guide model training effectively. Our framework achieves consistent improvements across KITTI benchmarks. On the validation set, PPF (PV-RCNN) achieves an AP improvement of at least 1.84% in Cyclist detection across all difficulty levels compared with other multimodal SOTA methods. On the test set, PPF-Net excels in pedestrian detection at the moderate and hard difficulty levels and achieves the best results in low-beam LiDAR scenarios.
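The abstract names a joint regression loss that combines BD-IoU with Smooth L1 but does not define BD-IoU. The following Python sketch illustrates only the general shape of such a combined loss, under stated assumptions: boxes are treated as axis-aligned (heading is ignored), plain axis-aligned 3D IoU stands in for BD-IoU, and the names `Box3D`, `axis_aligned_iou_3d`, `joint_regression_loss` and the weights `w_iou`, `w_l1`, `beta` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned box: center (x, y, z) and size (l, w, h).

    Heading is deliberately ignored in this sketch.
    """
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float

    def params(self):
        return (self.x, self.y, self.z, self.l, self.w, self.h)


def axis_aligned_iou_3d(a: Box3D, b: Box3D) -> float:
    """Axis-aligned 3D IoU, used here as a STAND-IN for the paper's BD-IoU."""
    overlap = 1.0
    # Intersect the boxes' extents along each axis and multiply the overlaps.
    for ca, sa, cb, sb in ((a.x, a.l, b.x, b.l),
                           (a.y, a.w, b.y, b.w),
                           (a.z, a.h, b.z, b.h)):
        lo = max(ca - sa / 2, cb - sb / 2)
        hi = min(ca + sa / 2, cb + sb / 2)
        overlap *= max(0.0, hi - lo)
    union = a.l * a.w * a.h + b.l * b.w * b.h - overlap
    return overlap / union if union > 0 else 0.0


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Smooth L1 penalty: quadratic for |diff| < beta, linear beyond."""
    ad = abs(diff)
    return 0.5 * ad * ad / beta if ad < beta else ad - 0.5 * beta


def joint_regression_loss(pred: Box3D, gt: Box3D,
                          w_iou: float = 1.0, w_l1: float = 1.0,
                          beta: float = 1.0) -> float:
    """Hypothetical joint loss: w_iou * (1 - IoU) + w_l1 * sum of Smooth L1 terms."""
    iou_term = 1.0 - axis_aligned_iou_3d(pred, gt)
    l1_term = sum(smooth_l1(p - g, beta)
                  for p, g in zip(pred.params(), gt.params()))
    return w_iou * iou_term + w_l1 * l1_term


if __name__ == "__main__":
    gt = Box3D(0.0, 0.0, 0.0, 4.0, 2.0, 1.5)
    pred = Box3D(0.5, 0.0, 0.0, 4.0, 2.0, 1.5)  # shifted 0.5 m along x
    print(axis_aligned_iou_3d(pred, gt), joint_regression_loss(pred, gt))
```

For example, shifting a 4 × 2 × 1.5 m box by 0.5 m along x gives an IoU of 7/9, so with unit weights the loss is 2/9 + 0.125 ≈ 0.347. The IoU term scores the overlap of the box as a whole, while the Smooth L1 term still supplies per-parameter gradients when the boxes do not overlap at all and the IoU term is constant.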
Pages: 21
Related papers (50 in total)
  • [1] MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection
    Shi, Peicheng
    Liu, Zhiqiang
    Qi, Heng
    Yang, Aixi
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75(3): 5615-5637
  • [2] PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection
    Xie, Guotao
    Chen, Zhiyuan
    Gao, Ming
    Hu, Manjiang
    Qin, Xiaohui
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25(6): 5598-5611
  • [3] PVF-NET: Point & Voxel Fusion 3D Object Detection Framework for Point Cloud
    Cui, Zhihao
    Zhang, Zhenhua
    2020 17TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV 2020), 2020: 125-133
  • [4] MVX-Net: Multimodal VoxelNet for 3D Object Detection
    Sindagi, Vishwanath A.
    Zhou, Yin
    Tuzel, Oncel
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019: 7276-7282
  • [5] Point-Voxel Fusion for Multimodal 3D Detection
    Wang, Ke
    Zhang, Zhichuang
    2022 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2022: 1716-1719
  • [6] PEPillar: a point-enhanced pillar network for efficient 3D object detection in autonomous driving
    Sun, Libo
    Li, Yifan
    Qin, Wenhu
    VISUAL COMPUTER, 2025, 41(3): 1777-1788
  • [7] FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection
    Xu, Shaoqing
    Zhou, Dingfu
    Fang, Jin
    Yin, Junbo
    Bin, Zhou
    Zhang, Liangjun
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021: 3047-3054
  • [8] Multimodal fusion via voting network for 3D object detection in indoors
    Li, Jianxin
    Si, Guannan
    Liang, Xinyu
    An, Zhaoliang
    Tian, Pengxin
    Zhou, Fengyu
    Wang, Xiaoliang
    PATTERN RECOGNITION, 2025, 164
  • [9] Voxel-to-Pillar: Knowledge Distillation of 3D Object Detection in Point Cloud
    Zhang, Jinbao
    Liu, Jun
    PROCEEDINGS OF THE 4TH EUROPEAN SYMPOSIUM ON SOFTWARE ENGINEERING, ESSE 2023, 2024: 99-104
  • [10] VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection
    Wang, Lin
    Sun, Shiliang
    Zhao, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 10597-10609