Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

Cited by: 0
Authors
Wang, Chaoqun [1 ]
Qin, Yiran [1 ]
Kang, Zijian [2 ]
Ma, Ningning [2 ]
Zhang, Ruimao [1 ]
Affiliations
[1] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen Res Inst Big Data, Shenzhen, Peoples R China
[2] NIO, Shenzhen, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICRA57147.2024.10610281
CLC classification number
TP [automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Recent camera-based 3D object detection is limited by the precision of the transformation from image to 3D feature spaces, as well as by the accuracy of object localization within the 3D space. This paper addresses a fundamental problem of camera-based 3D object detection: how to effectively learn depth information for accurate feature lifting and object localization. Unlike previous methods, which directly predict depth distributions with a supervised estimation model, we propose a cascade framework consisting of two depth-aware learning paradigms. First, a depth estimation (DE) scheme leverages relative depth information to realize effective feature lifting from 2D to 3D spaces. Second, a depth calibration (DC) scheme introduces depth reconstruction to further adjust 3D object localization perturbations along the depth axis. In practice, DE is realized explicitly by optimizing both an absolute and a relative depth loss to improve the precision of depth prediction, while the capability of DC is embedded implicitly into the detection Transformer through a depth denoising mechanism during training. The entire model is trained end to end. We propose a baseline detector and evaluate the effectiveness of our proposal, obtaining +2.2%/+2.7% NDS/mAP improvements on the nuScenes benchmark and achieving competitive performance of 55.9%/45.7% NDS/mAP. Furthermore, extensive experiments demonstrate its generality across various detectors, with about +2% NDS improvements.
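The abstract's combined absolute-and-relative depth objective can be illustrated with a minimal sketch. The function name, scalar depth inputs, and weighting term below are assumptions for illustration only, not the paper's actual implementation: the absolute term regresses each predicted depth toward its ground truth, while the relative term penalizes errors in pairwise depth differences, encouraging correct depth ordering between samples.

```python
def cascade_depth_loss(pred, gt, w_rel=1.0):
    """Hypothetical sketch of an absolute + relative depth loss.

    pred, gt: lists of per-object depth values (meters).
    w_rel: assumed weighting between the two terms.
    """
    n = len(pred)
    # Absolute term: mean L1 error on each predicted depth.
    l_abs = sum(abs(p - g) for p, g in zip(pred, gt)) / n
    # Relative term: mean L1 error over all pairwise depth
    # differences, so the relative ordering of depths matters too.
    l_rel = sum(
        abs((pred[i] - pred[j]) - (gt[i] - gt[j]))
        for i in range(n) for j in range(n)
    ) / (n * n)
    return l_abs + w_rel * l_rel
```

A uniform shift of all predictions leaves the relative term at zero while the absolute term still penalizes it, which is one way such a combination can separate ordering quality from scale accuracy.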
Pages: 2006-2012
Page count: 7