FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

被引:51
作者
Chen, Yilun [1 ]
Yu, Zhiding [3 ]
Chen, Yukang [1 ]
Lan, Shiyi [3 ]
Anandkumar, Anima [2 ,3 ]
Jia, Jiaya [1 ]
Alvarez, Jose M.
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] CALTECH, Pasadena, CA USA
[3] NVIDIA, Santa Clara, CA USA
来源
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023年
关键词
D O I
10.1109/ICCV51070.2023.00771
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi- stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves a 70.5 mAP and 73.9 NDS on nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at https: //github.com/NVlabs/FocalFormer3D.
引用
收藏
页码:8360 / 8371
页数:12
相关论文
共 87 条
[51]   Deep Hough Voting for 3D Object Detection in Point Clouds [J].
Qi, Charles R. ;
Litany, Or ;
He, Kaiming ;
Guibas, Leonidas J. .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9276-9285
[52]   PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [J].
Qi, Charles R. ;
Su, Hao ;
Mo, Kaichun ;
Guibas, Leonidas J. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :77-85
[53]   Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J].
Ren, Shaoqing ;
He, Kaiming ;
Girshick, Ross ;
Sun, Jian .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (06) :1137-1149
[54]   MobileNetV2: Inverted Residuals and Linear Bottlenecks [J].
Sandler, Mark ;
Howard, Andrew ;
Zhu, Menglong ;
Zhmoginov, Andrey ;
Chen, Liang-Chieh .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :4510-4520
[55]   PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection [J].
Shi, Guangsheng ;
Li, Ruifeng ;
Ma, Chao .
COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 :35-52
[56]   PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [J].
Shi, Shaoshuai ;
Guo, Chaoxu ;
Jiang, Li ;
Wang, Zhe ;
Shi, Jianping ;
Wang, Xiaogang ;
Li, Hongsheng .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :10526-10535
[57]   PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud [J].
Shi, Shaoshuai ;
Wang, Xiaogang ;
Li, Hongsheng .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :770-779
[58]  
Shi Shaoshuai, 2022, IJCV, P1
[59]   Training Region-based Object Detectors with Online Hard Example Mining [J].
Shrivastava, Abhinav ;
Gupta, Abhinav ;
Girshick, Ross .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :761-769
[60]   RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection [J].
Sun, Pei ;
Wang, Weiyue ;
Chai, Yuning ;
Elsayed, Gamaleldin ;
Bewley, Alex ;
Zhang, Xiao ;
Sminchisescu, Cristian ;
Anguelov, Dragomir .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :5721-5730