Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation

Cited by: 26
Authors
Zhou, Zhengming [1, 2, 3]
Dong, Qiulei [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
Source
COMPUTER VISION - ECCV 2022, PT I | 2022 / Vol. 13661
Funding
National Natural Science Foundation of China;
Keywords
Monocular depth estimation; Self-supervised learning; Self-distilled feature aggregation;
DOI
10.1007/978-3-031-19769-7_41
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised monocular depth estimation has received much attention recently in computer vision. Most existing works in the literature aggregate multi-scale features for depth prediction via either straightforward concatenation or element-wise addition. However, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. To address this problem, we propose the Self-Distilled Feature Aggregation (SDFA) module, which simultaneously aggregates a pair of low-scale and high-scale features and maintains their contextual consistency. The SDFA employs three branches to learn three feature offset maps: one offset map for refining the input low-scale feature, and the other two for refining the input high-scale feature in a designed self-distillation manner. We then propose an SDFA-based network for self-supervised monocular depth estimation and design a self-distilled training strategy to train the proposed network with the SDFA module. Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the compared state-of-the-art methods in most cases. The code is available at https://github.com/ZM-Zhou/SDFA-Net_pytorch.
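As a rough illustration of the two baseline aggregation operations the abstract mentions (concatenation and element-wise addition), and of what refining a feature map with an offset map could look like, here is a minimal pure-Python sketch. It is not the authors' SDFA implementation: the function names, the nearest-neighbour upsampling, the integer offsets, and the border clamping are all assumptions made for illustration, and `coarse`/`fine` simply denote a lower-resolution and a higher-resolution feature map.

```python
# Illustrative sketch only; not the SDFA module from the paper.
# A feature is a list of channels; each channel is a 2D list (H x W).

def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in feat
    ]

def aggregate_concat(coarse, fine):
    """Baseline 1: upsample the coarse feature, then stack channels."""
    return upsample_nearest(coarse) + fine

def aggregate_add(coarse, fine):
    """Baseline 2: upsample the coarse feature, then add element-wise."""
    up = upsample_nearest(coarse)
    return [
        [[a + b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(up, fine)
    ]

def refine_with_offsets(feat, offsets):
    """Hypothetical refinement: output[y][x] reads feat at (y+dy, x+dx).

    offsets[y][x] is an integer pair (dy, dx); reads are clamped to the
    border. A learned module would predict fractional offsets and sample
    bilinearly; integer offsets keep this sketch short.
    """
    h, w = len(feat[0]), len(feat[0][0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    return [
        [[ch[clamp(y + offsets[y][x][0], h - 1)]
            [clamp(x + offsets[y][x][1], w - 1)]
          for x in range(w)]
         for y in range(h)]
        for ch in feat
    ]

coarse = [[[1, 2]]]                                  # 1 channel, 1 x 2
fine = [[[10, 20, 30, 40], [50, 60, 70, 80]]]        # 1 channel, 2 x 4
shift_right = [[(0, 1)] * 4 for _ in range(2)]       # read one pixel to the right

added = aggregate_add(coarse, fine)
stacked = aggregate_concat(coarse, fine)
refined = refine_with_offsets(fine, shift_right)
```

Both baselines combine the two maps position by position after a fixed upsampling step, so neither can correct a spatial misalignment between scales; resampling one map through an offset field before combining is one generic way to allow such a correction.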
Pages: 709-726
Number of pages: 18