ESGN: Efficient Stereo Geometry Network for Fast 3D Object Detection

Cited by: 19
Authors
Gao, Aqi [1]
Pang, Yanwei [1]
Nie, Jing [1]
Shao, Zhuang [2]
Cao, Jiale [1]
Guo, Yishun [1]
Li, Xuelong [3]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Brain Inspired Intelligence Techno, Tianjin 300072, Peoples R China
[2] Newcastle Univ, Sch Engn, Newcastle Upon Tyne NE1 7RU, England
[3] China Telecom Corp Ltd, Inst Artificial Intelligence TeleAI, Beijing 100033, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Cameras; Object detection; Detectors; Laser radar; Representation learning; Autonomous driving; 3D detection; stereo images; computer vision
DOI
10.1109/TCSVT.2022.3202810
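Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract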
Fast stereo-based 3D object detectors have made great progress recently. However, they suffer from inferior accuracy. We argue that the main reason is the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key component of ESGN is an efficient geometry-aware feature generation (EGFG) module. The EGFG module first uses a stereo correlation and reprojection module to construct multi-scale stereo volumes in camera frustum space, and then employs a multi-scale bird's eye view (BEV) projection and fusion module to generate multiple geometry-aware features. In these two steps, we adopt deep multi-scale information fusion for discriminative geometry-aware feature generation, without any complex aggregation networks. In addition, we introduce a deep geometry-aware feature distillation scheme to guide stereo feature learning with a LiDAR-based detector. The experiments are performed on the classical KITTI dataset. On the KITTI test set, ESGN outperforms the fast state-of-the-art detector YOLOStereo3D by 5.14% on mAP3d at 62 ms. To the best of our knowledge, ESGN achieves the best trade-off between accuracy and speed. We hope that our efficient stereo geometry network can provide more possible directions for fast 3D object detection.
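The abstract names a stereo correlation step but does not say how the correlation is computed. As a minimal sketch only, assuming the common channel-averaged dot-product correlation between left and right feature maps over a range of candidate disparities, a single-scale stereo volume could be built as below. The function name `correlation_volume`, the `(C, H, W)` layout, and the `max_disp` parameter are illustrative assumptions, not the authors' implementation; the multi-scale construction, reprojection, and BEV fusion are not covered.

```python
import numpy as np

def correlation_volume(left, right, max_disp):
    """Build a (max_disp, H, W) correlation volume from two (C, H, W) feature maps.

    Hypothetical helper: entry [d, y, x] is the channel-averaged product of the
    left feature at (y, x) and the right feature at (y, x - d). Positions where
    x - d falls outside the image are left at zero.
    """
    c, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left * right).mean(axis=0)
        else:
            # Shift the right features by d pixels before comparing.
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume

# Usage: a right view that is the left view displaced by 2 pixels.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 16))
right = np.zeros_like(left)
right[:, :, :-2] = left[:, :, 2:]
volume = correlation_volume(left, right, max_disp=4)
```

In this toy setup, the slice of the volume at disparity 2 equals the channel-averaged squared left features wherever the shift is valid, which is what a perfect match at that disparity should produce.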
Pages: 2000-2009
Page count: 10