Depth Estimation From Surface-Ground Correspondence for Monocular 3D Object Detection

Cited by: 1

Authors:

Ji, Yinshuai [1]

Xu, Jinhua [1]

Affiliations:

[1] East China Normal University, School of Computer Science and Technology, Shanghai 200062, People's Republic of China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024, Vol. 25, No. 11

Keywords:

Three-dimensional displays; Estimation; Object detection; Uncertainty; Task analysis; Head; Feature extraction; Monocular 3D object detection; object detection; depth estimation; ground depth; automatic driving;

DOI:

10.1109/TITS.2024.3411159

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813 ;

Abstract:

Monocular 3D object detection has attracted great attention due to its simplicity and low cost. However, recovering object location in 3D space from a monocular image is challenging because depth information is lost, so estimating the instance depth is the core problem to be solved. Intuitively, the ground depth is continuous and global in essence, independent of the objects in the scene. Ground depth estimation can therefore be easier and more accurate than object depth estimation. Inspired by this, we propose to map a set of surface points of an object onto the ground plane and decompose the object depth problem into ground depth estimation and surface point height estimation. During training, dense ground depth labels are provided by the ground truth (GT) surface depths of objects from LiDAR data. At inference, surface depths are recovered by querying the ground depth map. As a result, a set of instance depth candidates is obtained, and the final instance depth is assembled according to their uncertainties. In addition, since most of the mapped ground points are occluded by the object, which may mislead network learning, we devise a depth expansion strategy to extend the ground depth labels. The proposed method, MonoSGC, achieves state-of-the-art (SOTA) performance on the KITTI and Waymo datasets. Ablation studies demonstrate the effectiveness of the proposed components. The code and model are released at https://github.com/JiYinshuai/MonoSGC.
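The abstract states that the final instance depth is assembled from a set of depth candidates according to their uncertainties, but it does not give the exact rule. The sketch below illustrates one common choice, an inverse-uncertainty weighted average, purely as an assumption. The function name `fuse_depth_candidates` and its parameters are hypothetical and are not taken from the MonoSGC code.

```python
from typing import Sequence


def fuse_depth_candidates(depths: Sequence[float], sigmas: Sequence[float]) -> float:
    """Fuse depth candidates with weights proportional to 1 / uncertainty.

    ASSUMPTION: inverse-uncertainty weighting is an illustrative rule only;
    the abstract does not specify how the paper combines the candidates.
    """
    if len(depths) == 0 or len(depths) != len(sigmas):
        raise ValueError("depths and sigmas must be non-empty and of equal length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("uncertainties must be strictly positive")
    # A candidate with a smaller uncertainty receives a larger weight.
    weights = [1.0 / s for s in sigmas]
    return sum(w * d for w, d in zip(weights, depths)) / sum(weights)
```

With equal uncertainties the result reduces to the plain mean of the candidates; a candidate with a much larger uncertainty contributes correspondingly less to the fused depth.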


Pages: 16312-16322

Page count: 11
