Voxel Field Fusion for 3D Object Detection

Cited by: 62
Authors
Li, Yanwei [1, 3]
Qi, Xiaojuan [2]
Chen, Yukang [1, 3]
Wang, Liwei [1, 3]
Li, Zeming [3]
Sun, Jian [3]
Jia, Jiaya [1, 3, 4]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] MEGVII Technol, Beijing, Peoples R China
[4] SmartMore, Hong Kong, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Keywords
DOI
10.1109/CVPR52688.2022.00119
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, a learnable sampler is first designed to sample vital features from the image plane; these features are projected to the voxel grid in a point-to-ray manner, which maintains consistency in feature representation with spatial context. In addition, ray-wise fusion is conducted to fuse features with the supplemental context in the constructed voxel field. We further develop a mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains on various benchmarks and outperforms previous fusion-based methods on the KITTI and nuScenes datasets. Code is made available at https://github.com/dvlab-research/VFF.
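The point-to-ray idea in the abstract can be illustrated with a small geometric sketch. It is not the authors' implementation: the function name `pixel_ray_voxels`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the fixed depth samples, and the uniform voxel grid expressed in the camera frame are all assumptions made for illustration, and the learnable sampler and ray-wise fusion are left out. The sketch only shows how one image pixel can correspond to a sequence of voxels along its viewing ray rather than to a single 3D point.

```python
from typing import List, Sequence, Tuple


def pixel_ray_voxels(
    u: float,
    v: float,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depths: Sequence[float],
    voxel_size: float,
    grid_min: Tuple[float, float, float],
) -> List[Tuple[int, int, int]]:
    """List the voxel indices crossed by the camera ray through pixel (u, v).

    Hypothetical helper: assumes a pinhole camera and a uniform voxel grid
    whose origin `grid_min` is given in the camera frame.
    """
    voxels: List[Tuple[int, int, int]] = []
    for d in depths:
        # Pinhole back-projection: pixel plus depth gives a 3D point.
        point = ((u - cx) * d / fx, (v - cy) * d / fy, d)
        # Quantize the point to an integer voxel index.
        idx = tuple(int((p - m) // voxel_size) for p, m in zip(point, grid_min))
        # Skip consecutive depth samples that land in the same voxel.
        if not voxels or voxels[-1] != idx:
            voxels.append(idx)
    return voxels


# A pixel at the principal point looks straight down the optical axis,
# so only the depth index changes along the ray.
ray = pixel_ray_voxels(
    u=320.0, v=240.0, fx=100.0, fy=100.0, cx=320.0, cy=240.0,
    depths=[0.5, 1.5, 1.6, 2.5], voxel_size=1.0, grid_min=(-1.0, -1.0, 0.0),
)
print(ray)
```

In a fusion pipeline along these lines, the image feature at the pixel would be associated with every voxel on the returned ray, which is the property the abstract describes as keeping the feature representation consistent with its spatial context.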
Pages: 1110-1119
Page count: 10