Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3
Authors
Li, Yingfei [1 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single-stage object detector
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
In this paper, we propose a novel 3D object detection model that combines the strengths of the Voxel Transformer (VoTr) and the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. Our model adopts VoTr as its backbone, enabling long-range interactions between voxels through self-attention; this overcomes a key limitation of conventional voxel-based 3D detectors, whose restricted receptive fields capture insufficient contextual information. The backbone also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions respectively, handling the natural sparsity of point clouds, in which most voxel positions are empty. Moreover, inspired by the CIA-SSD design, our model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which aligns confidence scores with localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head that additionally predicts box IoU to further improve accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate our model on the KITTI dataset and achieve 76.56% AP3D on the Hard difficulty level.
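
To make the backbone idea concrete, the following minimal PyTorch sketch shows global self-attention over the features of non-empty voxels, the mechanism that gives a voxel transformer its long-range receptive field. The module name, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation; VoTr itself relies on sparse and submanifold attention variants to keep this computation tractable.

```python
# A minimal sketch (assumed names and shapes, not the authors' code) of
# global self-attention over non-empty voxel features.
import torch
import torch.nn as nn


class VoxelSelfAttention(nn.Module):
    """Multi-head self-attention across all non-empty voxels of a scene."""

    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, N, C) -- N non-empty voxels with C-dim features.
        # Every voxel attends to every other voxel, so context is not
        # limited to a local convolutional receptive field.
        attended, _ = self.attn(voxel_feats, voxel_feats, voxel_feats)
        return self.norm(voxel_feats + attended)  # residual + layer norm


if __name__ == "__main__":
    feats = torch.randn(2, 128, 64)  # 2 scenes, 128 non-empty voxels each
    print(VoxelSelfAttention()(feats).shape)  # torch.Size([2, 128, 64])
```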
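
The SSFA module can likewise be sketched as a gated fusion of two feature maps. The softmax gating below is an assumed, simplified form; the paper's actual module may wrap the fusion step in additional convolutional layers.

```python
# An illustrative sketch (assumed form, not CIA-SSD's exact layers) of
# adaptive fusion of spatial and semantic BEV feature maps.
import torch
import torch.nn as nn


class SpatialSemanticFusion(nn.Module):
    """Softmax-gated fusion of a low-level spatial map and a high-level
    semantic map of equal shape (B, C, H, W)."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.spatial_gate = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.semantic_gate = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, spatial: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # One scalar weight map per branch, normalized jointly per location.
        gates = torch.cat(
            [self.spatial_gate(spatial), self.semantic_gate(semantic)], dim=1
        )  # (B, 2, H, W)
        w = torch.softmax(gates, dim=1)
        return w[:, 0:1] * spatial + w[:, 1:2] * semantic
```

Locations dominated by fine geometric detail can thus lean on the spatial branch, while cluttered or ambiguous regions lean on the semantic branch.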
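
Finally, the IoU-aware confidence rectification can be illustrated with a simple scoring function. The geometric-mean form below is a common choice in IoU-aware detectors and is an assumption here; CIA-SSD's exact rectification function may differ.

```python
# A hedged sketch of IoU-aware confidence rectification: blend the
# classification score with the predicted IoU before ranking boxes.
import torch


def rectify_confidence(cls_score: torch.Tensor,
                       pred_iou: torch.Tensor,
                       beta: float = 0.5) -> torch.Tensor:
    """beta controls how strongly localization quality reweights the
    classification confidence (beta = 0 recovers the raw score)."""
    eps = 1e-6  # keep scores positive so fractional powers stay well-behaved
    return cls_score.clamp(min=eps) ** (1.0 - beta) * pred_iou.clamp(min=eps) ** beta
```

Ranking candidate boxes by this rectified score before non-maximum suppression demotes confidently classified but poorly localized boxes, which is exactly the misalignment the rectification module targets.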
Pages: 10
References
29 in total (first 10 listed)
  • [1] Bhattacharyya P., 2020. arXiv:2008.08766.
  • [2] Chen Q., 2019. OBJECT HOTSPOTS ANCH.
  • [3] Chen Y.L. ICCV 2019: 9774. DOI: 10.1109/ICCV.2019.00987.
  • [4] Chen Y., Li Y., Zhang X., Sun J., Jia J. Focal Sparse Convolutional Networks for 3D Object Detection. CVPR 2022: 5418-5427.
  • [5] Du L. CVPR 2020: 13326. DOI: 10.1109/CVPR42600.2020.01334.
  • [6] Geiger A., Lenz P., Stiller C., Urtasun R. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research, 2013, 32(11): 1231-1237.
  • [7] Girshick R. Fast R-CNN. ICCV 2015: 1440-1448.
  • [8] He C.H. CVPR 2020: 11870. DOI: 10.1109/CVPR42600.2020.01189.
  • [9] Kalli S.R., Suresh T., Prasanth A., Muthumanickam T., Mohanram K. An effective motion object detection using adaptive background modeling mechanism in video surveillance system. Journal of Intelligent & Fuzzy Systems, 2021, 41(1): 1777-1789.
  • [10] Lang A.H., Vora S., Caesar H., Zhou L., Yang J., Beijbom O. PointPillars: Fast Encoders for Object Detection from Point Clouds. CVPR 2019: 12689-12697.