M3DGAF: Monocular 3D Object Detection With Geometric Appearance Awareness and Feature Fusion

Citations: 9
Authors
Chen, Mu [1, 2]
Liu, Pengfei [1]
Zhao, Huaici [1]
Affiliations
[1] Chinese Acad Sci, Shenyang Inst Automat, Key Lab Opto Elect Informat Proc, Shenyang 110016, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Estimation; Object detection; Task analysis; Feature extraction; Sensors; Detectors; 3D object detection; autonomous driving; geometric appearance awareness; feature fusion;
DOI
10.1109/JSEN.2022.3189174
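Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract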
Object detection in autonomous driving scenarios is usually performed by a complex visual sensor system, such as LiDAR, stereo, and monocular sensors. Recent work in autonomous driving uses a monocular sensor with geometric constraints to perform 3D object detection efficiently. These detectors benefit from explicit geometric projection, which bridges the 2D image plane and the 3D world space. However, they tend to focus on optimizing depth estimation and neglect the equally important 3D properties of orientation and dimension. In this work, we propose a Geometric Appearance Awareness (GAA) module that extracts a geometry-guided appearance feature, which is used to estimate orientation reliably. Furthermore, we design a Sample-aware Feature Fusion (SFF) head in the 3D dimension regression branch. This head dynamically accounts for the uniqueness of different samples when learning 3D dimensions. We evaluate our method on the KITTI dataset and achieve significant improvements in 3D object detection. Compared with the latest method, our approach improves AP_3D by 1.08 on the hard level and AP_BEV by 1.53/3.41/5.79 under the easy/moderate/hard settings, respectively.
Pages: 11232-11240
Number of pages: 9