MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

Cited by: 3
Authors
Pan, Huihui [1 ,2 ]
Jia, Yisong [1 ]
Wang, Jue [3 ,4 ]
Sun, Weichao [1 ]
Affiliations
[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Peoples R China
[2] Tongji Univ, Natl Key Lab Autonomous Intelligent Unmanned Syst, Shanghai 201210, Peoples R China
[3] Ningbo Inst Intelligent Equipment Technol Co Ltd, Ningbo 315200, Peoples R China
[4] Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Object detection; Head; Detectors; Neck; Training; Feature extraction; Depth measurement; Convolution; Autonomous vehicles; Monocular 3D object detection; deep learning; autonomous driving; optimizer;
DOI
10.1109/TITS.2025.3525772
Chinese Library Classification (CLC)
TU [Building Science];
Discipline code
0813;
Abstract
Monocular 3D object detection finds applications in various fields, notably in intelligent driving, due to its cost-effectiveness and ease of deployment. However, its accuracy significantly lags behind LiDAR-based methods, primarily because monocular depth estimation is inherently challenging. While some methods leverage additional information to aid network training and enhance performance, they are hindered by their reliance on specific datasets. We contend that many components of monocular 3D object detection lack the necessary adaptability, impeding detector performance. In this paper, we propose six adaptive methods addressing issues related to network structure, loss function, and optimizer. These methods specifically target the rigid components within the detector that hinder adaptability. Simultaneously, we provide theoretical insights into the network output and propose two novel regression methods, which make learning more straightforward for the network. Importantly, our approach does not depend on supplementary information, allowing for end-to-end training. In comparison with existing methods, our proposed approach demonstrates competitive speed and accuracy. On the KITTI dataset, our method achieves 17.72% AP3D (IoU = 0.7, Car, Moderate), outperforming all previous monocular methods. Additionally, our approach prioritizes speed, achieving a runtime of up to 52 FPS on an RTX 2080Ti GPU, surpassing all previous monocular methods. The source code is available at: https://github.com/jiayisong/AMNet.
Pages: 3574-3587
Page count: 14
References
48 references in total
[1]   Kinematic 3D Object Detection in Monocular Video [J].
Brazil, Garrick ;
Pons-Moll, Gerard ;
Liu, Xiaoming ;
Schiele, Bernt .
COMPUTER VISION - ECCV 2020, PT XXIII, 2020, 12368 :135-152
[2]   Monocular 3D Object Detection Utilizing Auxiliary Learning With Deformable Convolution [J].
Chen, Jiun-Han ;
Shieh, Jeng-Lun ;
Haq, Muhamad Amirul ;
Ruan, Shanq-Jang .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (03) :2424-2436
[3]   Shape-Aware Monocular 3D Object Detection [J].
Chen, Wei ;
Zhao, Jie ;
Zhao, Wan-Lei ;
Wu, Song-Yuan .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (06) :6416-6424
[4]  
Chen XZ, 2015, ADV NEUR IN, V28
[5]   Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [J].
Chen, Yi-Nan ;
Dai, Hang ;
Ding, Yong .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :877-887
[6]  
Chong Z., 2022, ARXIV
[7]   TOOD: Task-aligned One-stage Object Detection [J].
Feng, Chengjian ;
Zhong, Yujie ;
Gao, Yu ;
Scott, Matthew R. ;
Huang, Weilin .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :3490-3499
[8]   Homography Loss for Monocular 3D Object Detection [J].
Gu, Jiaqi ;
Wu, Bojian ;
Fan, Lubin ;
Huang, Jianqiang ;
Cao, Shen ;
Xiang, Zhiyu ;
Hua, Xian-Sheng .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :1070-1079
[9]  
Hong Y., 2024, IEEE Transactions on Intelligent Vehicles, P1, DOI 10.1109/TIV.2024.3380066
[10]   Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection [J].
Hong, Yu ;
Dai, Hang ;
Ding, Yong .
COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 :87-104