STFNet: Sparse Temporal Fusion for 3D Object Detection in LiDAR Point Cloud

Cited by: 0
Authors
Meng, Xin [1]
Zhou, Yuan [2]
Ma, Jun [1]
Jiang, Fangdi [1]
Qi, Yongze [1]
Wang, Cui [3]
Kim, Jonghyuk [4]
Wang, Shifeng [1, 3]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Leapmotor, Hangzhou 310000, Peoples R China
[3] Changchun Univ Sci & Technol, Zhongshan Inst, Zhongshan 528400, Peoples R China
[4] Naif Arab Univ Secur Sci, Ctr Excellence Cybercrimes & Digital Forens, Riyadh 11452, Saudi Arabia
Keywords
Feature extraction; Three-dimensional displays; Point cloud compression; Object detection; Laser radar; History; Sensors; Proposals; Heating systems; Fuses; 3D object detection; autonomous vehicle; LiDAR; point cloud
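DOI
10.1109/JSEN.2024.3519603
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract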
In autonomous driving and robotics, 3D object detection from LiDAR point clouds is a critical task. However, existing single-frame 3D object detection methods suffer from noise, occlusion, and sparsity, which degrade detection performance. To address these challenges, we propose the sparse temporal fusion network (STFNet), which leverages multiframe historical information to improve 3D object detection accuracy. STFNet comprises three core modules: the multihistory feature alignment module (MFAM), the sparse feature extraction module (SFEM), and the temporal fusion transformer (TFformer). MFAM uses ego-motion compensation to align frames, establishing correlations between adjacent frames along the temporal dimension. SFEM performs sparse extraction on features from different time steps to obtain the key features within the time series. TFformer introduces a temporal fusion attention mechanism to facilitate deep interaction between the current and historical frames. We validated STFNet on the nuScenes dataset, where it achieves a nuScenes detection score (NDS) of 71.8% and a mean average precision (mAP) of 67.0%. Compared with the benchmark method, it improves NDS by 1.6% and mAP by 1.5%. Extensive experiments demonstrate that STFNet significantly outperforms most existing methods, highlighting the superiority and generalizability of our approach.
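The ego-motion compensation that the abstract attributes to MFAM can be illustrated with a generic rigid-body alignment of a historical point cloud into the current ego frame. This is only a minimal sketch under stated assumptions, not the paper's implementation: the helper names `pose_matrix` and `align_to_current` are hypothetical, poses are assumed to be 4x4 ego-to-global transforms, and rotation is restricted to yaw for brevity.

```python
import numpy as np

def pose_matrix(yaw, translation):
    """Hypothetical helper: 4x4 ego-to-global pose from a yaw angle and xyz translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def align_to_current(points_hist, pose_hist, pose_cur):
    """Map Nx3 points from a historical ego frame into the current ego frame."""
    # Chain the transforms: historical ego -> global -> current ego.
    T = np.linalg.inv(pose_cur) @ pose_hist
    # Append a homogeneous coordinate so one matrix product applies rotation and translation.
    homo = np.hstack([points_hist, np.ones((len(points_hist), 1))])
    return (homo @ T.T)[:, :3]

# Assumed toy scenario: the ego vehicle drives 2 m forward between the two frames.
pose_hist = pose_matrix(0.0, [0.0, 0.0, 0.0])
pose_cur = pose_matrix(0.0, [2.0, 0.0, 0.0])
points_hist = np.array([[10.0, 1.0, 0.0]])  # a static point seen in the historical frame
aligned = align_to_current(points_hist, pose_hist, pose_cur)
```

In this toy scenario the static point ends up 2 m closer along x in the current frame, which is the kind of correspondence that lets features from adjacent frames be compared at the same spatial location.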
Pages: 5866-5877
Number of pages: 12