CamRaDepth: Semantic Guided Depth Estimation Using Monocular Camera and Sparse Radar for Automotive Perception

Cited by: 2
Authors
Sauerbeck, Florian [1]
Halperin, Dan [1]
Connert, Lukas [1]
Betz, Johannes [2]
Affiliations
[1] Tech Univ Munich, Munich Inst Robot & Machine Intelligence MIRMI, Inst Automot Technol, TUM Sch Engn & Design, Dept Mobil Syst Engn, D-85748 Garching, Germany
[2] Tech Univ Munich, Munich Inst Robot & Machine Intelligence MIRMI, TUM Sch Engn & Design, Dept Mobil Syst Engn, Professorship Autonomous Veh, D-85748 Garching, Germany
Keywords
Autonomous driving; computer vision; depth prediction; intelligent vehicles; semantic segmentation; sensor fusion;
DOI
10.1109/JSEN.2023.3321886
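Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract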
Our research aims to generate robust, dense 3-D depth maps for robotics, especially autonomous driving applications. Since cameras output 2-D images and active sensors such as LiDAR or radar produce sparse depth measurements, dense depth maps need to be estimated. Recent methods based on visual transformer networks have outperformed conventional deep learning approaches in various computer vision tasks, including depth prediction, but have focused on the use of a single camera image. This article explores the potential of visual transformers applied to the fusion of monocular images, semantic segmentation, and projected sparse radar reflections for robust monocular depth estimation. The addition of a semantic segmentation branch is used to add object-level understanding and is investigated in a supervised and unsupervised manner. We evaluate our new depth estimation approach on the nuScenes dataset where it outperforms existing state-of-the-art camera-radar depth estimation methods. We show that models can benefit from an additional segmentation branch during the training process by transfer learning even without running segmentation at inference. Further studies are needed to investigate the usage of 4-D-imaging radars and enhanced ground-truth generation in more detail. The related code is available as open-source software under https://github.com/TUMFTM/CamRaDepth.
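The abstract mentions projecting sparse radar reflections into the camera view before fusion. As an illustration only, and not the authors' implementation, the following minimal sketch shows one common way such a projection can be done, assuming an ideal pinhole camera and radar points already transformed into the camera frame. The function name `project_radar_to_depth` and all parameters (`fx`, `fy`, `cx`, `cy`, `width`, `height`) are hypothetical.

```python
def project_radar_to_depth(points_cam, fx, fy, cx, cy, width, height):
    """Project 3-D points (camera frame, metres) into a sparse depth image.

    Assumes a pinhole camera with focal lengths fx, fy and principal
    point (cx, cy), all in pixels. Returns a height x width nested list
    in which 0.0 marks pixels that received no radar return.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point lies behind the camera, cannot be imaged
        # Pinhole projection to the nearest pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # If several points hit one pixel, keep the nearest one.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Hypothetical example values, chosen only to exercise the function.
radar_points = [(0.0, 0.0, 10.0), (0.2, 0.0, 10.0), (0.0, 0.0, 5.0)]
sparse_depth = project_radar_to_depth(radar_points, 100.0, 100.0, 4.0, 3.0, 8, 6)
```

A sparse depth image of this kind could then be stacked with the RGB channels as an additional network input; how CamRaDepth actually encodes the radar data is described in the paper and the linked repository.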
Pages: 28442-28453
Page count: 12