Multi-resolution distillation for self-supervised monocular depth estimation

Cited by: 3
Authors
Lee, Sebin [1]
Im, Woobin [1]
Yoon, Sung-Eui [1]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, 291 Daehak ro, Daejeon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Monocular depth estimation; Self-supervised learning; Self-distillation; Deep learning; VISUAL ODOMETRY; ATTENTION;
DOI
10.1016/j.patrec.2023.11.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Obtaining dense depth ground truth is not trivial, which has led to the introduction of self-supervised monocular depth estimation. Most self-supervised methods use the photometric loss as the primary supervisory signal to optimize a depth network. However, such self-supervised training often falls into an undesirable local minimum due to the ambiguity of the photometric loss. In this paper, we propose a novel self-distillation training scheme that provides a new self-supervision signal, depth consistency among different input resolutions, to the depth network. We further introduce a gradient masking strategy that adjusts the self-supervision signal of the depth consistency during back-propagation to boost its effectiveness. Experiments demonstrate that our method brings meaningful performance improvements when applied to various depth network architectures. Furthermore, our method outperforms existing self-supervised methods on the KITTI, Cityscapes, and DrivingStereo datasets by a noteworthy margin.
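
The abstract does not spell out the loss, so the following is only a rough sketch of what a depth-consistency term between two input resolutions with a per-pixel mask could look like. Everything in it is an assumption made for illustration: the function names `downsample_2x` and `masked_consistency_loss`, the 2x average pooling, the L1 distance, and in particular the masking rule (a pixel counts only where the high-resolution branch has the lower photometric error). It is not the authors' implementation, and it uses plain Python lists rather than a deep learning framework.

```python
def downsample_2x(grid):
    """Average-pool a 2D list by a factor of 2 (height and width must be even)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[y][x] + grid[y][x + 1] + grid[y + 1][x] + grid[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def masked_consistency_loss(depth_low, depth_high, err_low, err_high):
    """Mean L1 gap between a low-res depth map and the pooled high-res one.

    Assumption (hypothetical rule, not taken from the paper): a pixel
    contributes only where the pooled high-res photometric error is lower
    than the low-res one, so the low-res branch is pulled toward the
    prediction that currently looks more reliable.
    """
    target = downsample_2x(depth_high)    # treated as a fixed target
    err_target = downsample_2x(err_high)
    total, count = 0.0, 0
    for y in range(len(depth_low)):
        for x in range(len(depth_low[0])):
            if err_target[y][x] < err_low[y][x]:  # hypothetical mask
                total += abs(depth_low[y][x] - target[y][x])
                count += 1
    return total / count if count else 0.0


# Toy data: a 4x4 "high-res" depth map and a 2x2 "low-res" one.
depth_high = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
depth_low = [[1.5, 2.0], [3.0, 5.0]]
err_high = [[0.25] * 4 for _ in range(4)]
err_low = [[0.5, 0.5], [0.125, 0.5]]

loss = masked_consistency_loss(depth_low, depth_high, err_low, err_high)
print(loss)  # 0.5
```

In the toy data the bottom-left pixel is masked out because the low-resolution error there (0.125) is already below the pooled high-resolution error (0.25), so only three pixels contribute to the mean. In an actual training setup the target would be detached from the computation graph and the mask would act on gradients during back-propagation; how the paper defines that mask is not stated in the abstract.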
Pages: 215-222
Number of pages: 8