Enhancing Self-supervised Monocular Depth Estimation via Piece-Wise Pose Estimation and Geometric Constraints

Cited by: 0
Authors
Shyam, Pranjay [1 ]
Okon, Alexandre [1 ]
Yoo, HyunJin [1 ]
Affiliations
[1] Faurecia IRYStec Inc, Montreal, PQ, Canada
Source
2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024 | 2024
Keywords
AWARE;
DOI
10.1109/WACVW60836.2024.00030
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing single- and multi-frame monocular depth estimation (MDE) approaches lack depth consistency around object edges, and single-frame approaches additionally produce scale-ambiguous depth, albeit at lower computational cost. We revisit the framework design to address these limitations and propose a joint approach that intertwines depth estimation and panoptic segmentation networks. We present an instance-aware patch-based contrastive loss to ensure depth consistency within an object in feature space. This loss disentangles the embedding triplet and independently refines anchor-positive and anchor-negative pairs, yielding coherent depth within objects. Leveraging the panoptic information, we propose masking small objects during photometric loss computation and extracting 6-DoF pose estimates for dynamic objects in a piece-wise manner, thus facilitating depth estimation in dynamic scenes. We demonstrate that this mechanism is suited to both single- and multi-frame MDE. In addition, to ensure scale fidelity in single-frame MDE, we capitalize on the inherent linear relationship between computed depth and ground truth in self-supervised, photometric loss-based MDE. For this, we propose using a multi-frame depth estimation network as a teacher that injects geometric insight into the student MDE via a global scaling factor, thereby producing absolute depth. We further improve the teacher network architecture by introducing a multi-scale feature fusion mechanism that benefits scenarios with significant camera motion. We perform a comprehensive evaluation to validate the efficacy of the proposed mechanism and obtain state-of-the-art performance on the KITTI dataset.
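To make the abstract's teacher-student scale recovery concrete, the following minimal sketch shows one way a global scaling factor could align a scale-ambiguous single-frame (student) depth map with a multi-frame (teacher) depth map. The function names, the median-ratio estimator, and the valid_mask input are illustrative assumptions and do not reproduce the paper's exact formulation; they only exploit the linear relation between self-supervised depth and metric depth noted above.

```python
import torch


def global_scale_factor(student_depth: torch.Tensor,
                        teacher_depth: torch.Tensor,
                        valid_mask: torch.Tensor) -> torch.Tensor:
    """Estimate one global scale aligning the scale-ambiguous student
    (single-frame) depth with the teacher (multi-frame) depth.

    A robust median-ratio estimator over valid pixels is assumed here
    purely for illustration.
    """
    s = student_depth[valid_mask]
    t = teacher_depth[valid_mask]
    return torch.median(t) / torch.median(s).clamp(min=1e-6)


def scale_consistency_loss(student_depth: torch.Tensor,
                           teacher_depth: torch.Tensor,
                           valid_mask: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the rescaled student depth from the teacher,
    nudging the student toward (approximately) absolute depth."""
    scale = global_scale_factor(student_depth, teacher_depth, valid_mask).detach()
    rescaled = student_depth * scale
    return torch.abs(rescaled - teacher_depth)[valid_mask].mean()
```

In such a setup the scale would typically be detached (as above) so the student is supervised toward the teacher's metric range without the gradient collapsing the scale estimate itself; the actual distillation objective used in the paper may differ.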
Pages: 221-231
Page count: 11