Monocular 3D Object Detection Utilizing Auxiliary Learning With Deformable Convolution

Cited by: 4
Authors
Chen, Jiun-Han [1]
Shieh, Jeng-Lun [1]
Haq, Muhamad Amirul [1]
Ruan, Shanq-Jang [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 10607, Taiwan
Keywords
Three-dimensional displays; Object detection; Solid modeling; Feature extraction; Training; Computational modeling; Task analysis; 3D object detection; monocular camera; driving scene understanding; auxiliary learning; deep learning
DOI
10.1109/TITS.2023.3319556
Chinese Library Classification
TU [Architectural Science];
Discipline Classification Code
0813;
Abstract
In autonomous driving systems, the monocular 3D object detection algorithm is a crucial component, and the safety of autonomous vehicles heavily depends on a well-designed detection system. Developing a robust and efficient 3D object detection algorithm is therefore a major goal for institutes and researchers. A 3D sense is essential in autonomous vehicles and robotics, as it allows the system to understand its surroundings and react accordingly. Compared with stereo-based and LiDAR-based methods, monocular 3D object detection is challenging because it must infer complex 3D features from 2D information alone, yet it is low-cost, less computationally intensive, and holds great potential. However, the performance of monocular methods suffers from the lack of depth information. In this paper, we propose a simple, effective, end-to-end network for monocular 3D object detection that requires no external training data. Our work is inspired by auxiliary learning: we use a robust feature extractor as the backbone and multiple regression heads to learn auxiliary knowledge. These auxiliary regression heads are discarded after training to improve inference efficiency, allowing the model to benefit from auxiliary learning and capture critical information more effectively. The proposed method achieves 17.28% and 20.10% for the moderate level of the Car category on the KITTI benchmark test set and validation set, respectively, outperforming previous monocular 3D object detection approaches.
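The abstract outlines an auxiliary-learning architecture: a shared feature extractor (with deformable convolution, per the title) feeds a primary 3D regression head plus auxiliary regression heads that are used only during training and dropped at inference. The following is a minimal PyTorch sketch of that general pattern, not the authors' implementation; the module names, channel sizes, and choice of auxiliary targets (2D box, keypoints) are illustrative assumptions.

# Minimal sketch of auxiliary learning with a deformable-convolution block.
# Hypothetical names and shapes; for illustration only.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBlock(nn.Module):
    """A plain conv predicts sampling offsets for DeformConv2d,
    letting the receptive field adapt to object geometry."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.deform(x, self.offset(x)))


class AuxMono3D(nn.Module):
    """Shared backbone + primary 3D head + auxiliary heads (training only)."""

    def __init__(self, feat_ch=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            DeformableBlock(feat_ch, feat_ch),
        )
        # Primary head: per-location 3D box parameters (e.g. depth, dims, yaw).
        self.head_3d = nn.Conv2d(feat_ch, 8, 1)
        # Auxiliary heads supervise extra cues during training and are
        # discarded afterwards for faster inference.
        self.aux_heads = nn.ModuleDict({
            "box2d": nn.Conv2d(feat_ch, 4, 1),
            "keypoints": nn.Conv2d(feat_ch, 16, 1),
        })

    def forward(self, x):
        feat = self.backbone(x)
        out = {"box3d": self.head_3d(feat)}
        if self.training:  # auxiliary branches run only in training mode
            out.update({k: h(feat) for k, h in self.aux_heads.items()})
        return out


if __name__ == "__main__":
    model = AuxMono3D()
    img = torch.randn(1, 3, 128, 416)
    train_out = model(img)      # includes auxiliary outputs
    model.eval()
    infer_out = model(img)      # auxiliary heads skipped
    print(sorted(train_out), sorted(infer_out))

In this sketch the auxiliary branches add supervision signals that shape the shared features during training, while inference cost stays that of the backbone plus the single 3D head.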
Pages: 2424-2436
Number of pages: 13