MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

Cited by: 3
Authors
Pan, Huihui [1 ,2 ]
Jia, Yisong [1 ]
Wang, Jue [3 ,4 ]
Sun, Weichao [1 ]
Affiliations
[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Peoples R China
[2] Tongji Univ, Natl Key Lab Autonomous Intelligent Unmanned Syst, Shanghai 201210, Peoples R China
[3] Ningbo Inst Intelligent Equipment Technol Co Ltd, Ningbo 315200, Peoples R China
[4] Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Object detection; Head; Detectors; Neck; Training; Feature extraction; Depth measurement; Convolution; Autonomous vehicles; Monocular 3D object detection; deep learning; autonomous driving; optimizer;
DOI
10.1109/TITS.2025.3525772
Chinese Library Classification (CLC): TU [Architecture Science];
Discipline code: 0813;
Abstract
Monocular 3D object detection finds applications in various fields, notably intelligent driving, owing to its cost-effectiveness and ease of deployment. However, its accuracy lags significantly behind that of LiDAR-based methods, primarily because monocular depth estimation is inherently challenging. While some methods leverage additional information to aid network training and enhance performance, they are hindered by their reliance on specific datasets. We contend that many components of monocular 3D object detectors lack the necessary adaptability, which impedes detection performance. In this paper, we propose six adaptive methods addressing the network structure, the loss function, and the optimizer; these methods specifically target the rigid components within the detector that hinder adaptability. We also provide theoretical insights into the network output and propose two novel regression methods that make learning more straightforward for the network. Importantly, our approach does not depend on supplementary information, allowing end-to-end training. Compared with existing methods, the proposed approach demonstrates competitive speed and accuracy. On the KITTI dataset, it achieves 17.72% AP3D (IoU = 0.7, Car, Moderate), outperforming all previous monocular methods. Our approach also prioritizes speed, running at up to 52 FPS on an RTX 2080Ti GPU, faster than all previous monocular methods. The source code is available at: https://github.com/jiayisong/AMNet.
Pages: 3574-3587
Page count: 14