Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3
Authors
Li, Yingfei [1 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single-stage object detector
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
In this paper, we propose a novel 3D object detection model that combines the strengths of the Voxel Transformer (VoTr) and the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. Our model adopts VoTr as its backbone, enabling long-range interactions between voxels through self-attention; this overcomes a key limitation of conventional voxel-based 3D detectors, whose restricted receptive fields capture insufficient contextual information. The backbone also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions respectively, handling the natural sparsity of point clouds, in which most voxel positions are empty. Moreover, inspired by the CIA-SSD design, our model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which aligns confidence scores with localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head that additionally predicts box IoU to further improve accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate our model on the KITTI dataset and achieve 76.56% AP3D on the Hard difficulty level.
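
To make the backbone idea concrete, the following minimal PyTorch sketch shows global self-attention over the features of non-empty voxels, the mechanism that gives a voxel transformer its long-range receptive field. The module name, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation; VoTr itself relies on sparse and submanifold attention variants to keep this computation tractable.

```python
# A minimal sketch (assumed names and shapes, not the authors' code) of
# global self-attention over non-empty voxel features.
import torch
import torch.nn as nn


class VoxelSelfAttention(nn.Module):
    """Multi-head self-attention across all non-empty voxels of a scene."""

    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, N, C) -- N non-empty voxels with C-dim features.
        # Every voxel attends to every other voxel, so context is not
        # limited to a local convolutional receptive field.
        attended, _ = self.attn(voxel_feats, voxel_feats, voxel_feats)
        return self.norm(voxel_feats + attended)  # residual + layer norm


if __name__ == "__main__":
    feats = torch.randn(2, 128, 64)  # 2 scenes, 128 non-empty voxels each
    print(VoxelSelfAttention()(feats).shape)  # torch.Size([2, 128, 64])
```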
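
The SSFA module can likewise be sketched as a gated fusion of two feature maps. The softmax gating below is an assumed, simplified form; the paper's actual module may wrap the fusion step in additional convolutional layers.

```python
# An illustrative sketch (assumed form, not CIA-SSD's exact layers) of
# adaptive fusion of spatial and semantic BEV feature maps.
import torch
import torch.nn as nn


class SpatialSemanticFusion(nn.Module):
    """Softmax-gated fusion of a low-level spatial map and a high-level
    semantic map of equal shape (B, C, H, W)."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.spatial_gate = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.semantic_gate = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, spatial: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # One scalar weight map per branch, normalized jointly per location.
        gates = torch.cat(
            [self.spatial_gate(spatial), self.semantic_gate(semantic)], dim=1
        )  # (B, 2, H, W)
        w = torch.softmax(gates, dim=1)
        return w[:, 0:1] * spatial + w[:, 1:2] * semantic
```

Locations dominated by fine geometric detail can thus lean on the spatial branch, while cluttered or ambiguous regions lean on the semantic branch.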
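
Finally, the IoU-aware confidence rectification can be illustrated with a simple scoring function. The geometric-mean form below is a common choice in IoU-aware detectors and is an assumption here; CIA-SSD's exact rectification function may differ.

```python
# A hedged sketch of IoU-aware confidence rectification: blend the
# classification score with the predicted IoU before ranking boxes.
import torch


def rectify_confidence(cls_score: torch.Tensor,
                       pred_iou: torch.Tensor,
                       beta: float = 0.5) -> torch.Tensor:
    """beta controls how strongly localization quality reweights the
    classification confidence (beta = 0 recovers the raw score)."""
    eps = 1e-6  # keep scores positive so fractional powers stay well-behaved
    return cls_score.clamp(min=eps) ** (1.0 - beta) * pred_iou.clamp(min=eps) ** beta
```

Ranking candidate boxes by this rectified score before non-maximum suppression demotes confidently classified but poorly localized boxes, which is exactly the misalignment the rectification module targets.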
Pages: 10
References
29 in total (first 10 listed)
  • [1] Bhattacharyya P., 2020. arXiv:2008.08766.
  • [2] Chen Q., 2019. OBJECT HOTSPOTS ANCH.
  • [3] Chen Y.L. ICCV 2019: 9774. DOI: 10.1109/ICCV.2019.00987.
  • [4] Chen Y., Li Y., Zhang X., Sun J., Jia J. Focal Sparse Convolutional Networks for 3D Object Detection. CVPR 2022: 5418-5427.
  • [5] Du L. CVPR 2020: 13326. DOI: 10.1109/CVPR42600.2020.01334.
  • [6] Geiger A., Lenz P., Stiller C., Urtasun R. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research, 2013, 32(11): 1231-1237.
  • [7] Girshick R. Fast R-CNN. ICCV 2015: 1440-1448.
  • [8] He C.H. CVPR 2020: 11870. DOI: 10.1109/CVPR42600.2020.01189.
  • [9] Kalli S.R., Suresh T., Prasanth A., Muthumanickam T., Mohanram K. An effective motion object detection using adaptive background modeling mechanism in video surveillance system. Journal of Intelligent & Fuzzy Systems, 2021, 41(1): 1777-1789.
  • [10] Lang A.H., Vora S., Caesar H., Zhou L., Yang J., Beijbom O. PointPillars: Fast Encoders for Object Detection from Point Clouds. CVPR 2019: 12689-12697.