MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

Cited by: 3

Authors:

Pan, Huihui ^{[1,2]}

Jia, Yisong ^{[1]}

Wang, Jue ^{[3,4]}

Sun, Weichao ^{[1]}

Affiliations:

[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Peoples R China

[2] Tongji Univ, Natl Key Lab Autonomous Intelligent Unmanned Syst, Shanghai 201210, Peoples R China

[3] Ningbo Inst Intelligent Equipment Technol Co Ltd, Ningbo 315200, Peoples R China

[4] Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2025 / Vol. 26 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Three-dimensional displays; Object detection; Head; Detectors; Neck; Training; Feature extraction; Depth measurement; Convolution; Autonomous vehicles; Monocular 3D object detection; deep learning; autonomous driving; optimizer;

DOI:

10.1109/TITS.2025.3525772

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813 ;

Abstract:

Monocular 3D object detection finds applications in various fields, notably in intelligent driving, due to its cost-effectiveness and ease of deployment. However, its accuracy significantly lags behind LiDAR-based methods, primarily because the monocular depth estimation problem is inherently challenging. While some methods leverage additional information to aid in network training and enhance performance, they are hindered by their reliance on specific datasets. We contend that many components of monocular 3D object detection lack the necessary adaptability, impeding the performance of the detector. In this paper, we propose six adaptive methods addressing issues related to network structure, loss function, and optimizer. These methods specifically target the rigid components within the detector that hinder adaptability. We also provide theoretical insights into the network output and propose two novel regression methods that make learning more straightforward for the network. Importantly, our approach does not depend on supplementary information, allowing for end-to-end training. In comparison with existing methods, our proposed approach demonstrates competitive speed and accuracy. On the KITTI dataset, our method achieves 17.72% AP_{3D} (IoU = 0.7, Car, Moderate), outperforming all previous monocular methods. Additionally, our approach prioritizes speed, achieving a runtime of up to 52 FPS on an RTX 2080Ti GPU, surpassing all previous monocular methods. The source code is available at: https://github.com/jiayisong/AMNet.

Citation

Pages: 3574-3587

Page count: 14

48 references in total

[11]

Hong YD, 2025, IEEE T NEUR NET LEAR, V36, P3904, DOI 10.1109/TNNLS.2022.3169779

[12] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [J].

Huang, Kuan-Chih ;

Wu, Tsung-Han ;

Su, Hung-Ting ;

Hsu, Winston H. .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :4002-4011

[13] Enhancing Monocular 3-D Object Detection Through Data Augmentation Strategies [J].

Jia, Yisong ;

Wang, Jue ;

Pan, Huihui ;

Sun, Weichao .

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 :1-11

[14] Boosting Monocular 3D Object Detection With Object-Centric Auxiliary Depth Supervision [J].

Kim, Youngseok ;

Kim, Sanmin ;

Sim, Sangmin ;

Choi, Jun Won ;

Kum, Dongsuk .

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (02) :1801-1813

[15] A Dual Weighting Label Assignment Scheme for Object Detection [J].

Li, Shuai ;

He, Chenhang ;

Li, Ruihuang ;

Zhang, Lei .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :9377-9386

[16] Densely Constrained Depth Estimator for Monocular 3D Object Detection [J].

Li, Yingyan ;

Chen, Yuntao ;

He, Jiawei ;

Zhang, Zhaoxiang .

COMPUTER VISION, ECCV 2022, PT IX, 2022, 13669 :718-734

[17] Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection [J].

Li, Zhuoling ;

Qu, Zhan ;

Zhou, Yang ;

Liu, Jianzhuang ;

Wang, Haoqian ;

Jiang, Lihui .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :2781-2790

[18] MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [J].

Lian, Qing ;

Li, Peiliang ;

Chen, Xiaozhi .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :1060-1069

[19]

Liu XP, 2022, AAAI CONF ARTIF INTE, P1810

[20] Hierarchical Bi-Directional Feature Perception Network for Person Re-Identification [J].

Liu, Zhipu ;

Zhang, Lei ;

Yang, Yang .

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, :4289-4298
