VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

Citations: 97
Authors
Chen, Yukang [1 ]
Liu, Jianhui [2 ]
Zhang, Xiangyu [3 ]
Qi, Xiaojuan [2 ]
Jia, Jiaya [1 ,4 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] MEGVII, Beijing, Peoples R China
[4] SmartMore, Hong Kong, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.02076
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be densified and processed by dense prediction heads, which inevitably costs extra computation. In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely. It is an elegant and efficient framework, with no need for sparse-to-dense conversion or NMS post-processing. Our method achieves a better speed-accuracy trade-off than other mainstream detectors on the nuScenes dataset. For the first time, we show that a fully sparse voxel-based representation works decently for LIDAR 3D object detection and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2 benchmarks validate the effectiveness of our approach. Without bells and whistles, our model outperforms all existing LIDAR methods on the nuScenes tracking test benchmark. Code and models are available at github.com/dvlab-research/VoxelNeXt.
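The abstract's core idea, predicting boxes directly from sparse voxel features with no dense BEV map and no NMS, can be illustrated with a minimal NumPy sketch. All shapes, the linear heads, and the top-k selection below are illustrative assumptions standing in for the paper's sparse-conv heads, not the authors' implementation.

```python
# Hypothetical sketch of "fully sparse" prediction: class scores and box
# residuals are regressed per occupied voxel, and top-scoring voxels become
# detections directly -- no sparse-to-dense conversion, no NMS.
import numpy as np

rng = np.random.default_rng(0)
num_voxels, feat_dim, num_classes = 1000, 128, 3

# Sparse voxel representation: only occupied voxels are stored.
coords = rng.integers(0, 400, size=(num_voxels, 3))   # (x, y, z) grid indices
feats = rng.standard_normal((num_voxels, feat_dim))   # per-voxel features

# Illustrative linear heads (stand-ins for small sparse-conv heads).
w_cls = rng.standard_normal((feat_dim, num_classes)) * 0.1
w_box = rng.standard_normal((feat_dim, 7)) * 0.1      # (dx, dy, dz, l, w, h, yaw)

scores = 1.0 / (1.0 + np.exp(-(feats @ w_cls)))       # sigmoid class scores
boxes = feats @ w_box                                  # box residuals per voxel

# Keep the top-k scoring (voxel, class) pairs as final detections directly.
k = 5
top = np.argsort(scores.ravel())[-k:][::-1]            # descending by score
voxel_idx, cls_idx = np.unravel_index(top, scores.shape)

detections = [
    {"voxel": coords[v].tolist(), "class": int(c),
     "score": float(scores[v, c]), "box": boxes[v].tolist()}
    for v, c in zip(voxel_idx, cls_idx)
]
print(len(detections))  # -> 5
```

The point of the sketch is the control flow: no step densifies the voxel features into a grid, and no pairwise box suppression is needed because each detection comes from one query voxel.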
Pages: 21674-21683 (10 pages)