Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation

Cited by: 26

Authors:

Zhou, Zhengming [1,2,3]

Dong, Qiulei [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China

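Source:

COMPUTER VISION - ECCV 2022, PT I | 2022 / Vol. 13661

Funding:

National Natural Science Foundation of China;

Keywords:

Monocular depth estimation; Self-supervised learning; Self-distilled feature aggregation;

DOI:

10.1007/978-3-031-19769-7_41

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: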

Self-supervised monocular depth estimation has received much attention recently in computer vision. Most existing works aggregate multi-scale features for depth prediction via either straightforward concatenation or element-wise addition; however, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. To address this problem, we propose the Self-Distilled Feature Aggregation (SDFA) module, which simultaneously aggregates a pair of low-scale and high-scale features and maintains their contextual consistency. The SDFA employs three branches to learn three feature offset maps: one offset map for refining the input low-scale feature and the other two for refining the input high-scale feature in a designed self-distillation manner. We then propose an SDFA-based network for self-supervised monocular depth estimation and design a self-distilled training strategy to train the proposed network with the SDFA module. Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the comparative state-of-the-art methods in most cases. The code is available at https://github.com/ZM-Zhou/SDFA-Net_pytorch.
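The offset-based refinement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes single-channel features stored as nested lists, a low-scale feature at half the resolution of the high-scale one, and hand-written offsets in place of the three learned branches. The helper names `bilinear_sample`, `refine_with_offsets`, `upsample_nearest`, and `aggregate` are hypothetical. The self-distillation step is omitted, so only one high-scale offset map appears.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D grid at a fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def refine_with_offsets(feat, offsets):
    """Resample feat at every position shifted by its (dy, dx) offset."""
    return [
        [
            bilinear_sample(feat, i + offsets[i][j][0], j + offsets[i][j][1])
            for j in range(len(feat[0]))
        ]
        for i in range(len(feat))
    ]


def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling by an integer factor."""
    return [
        [feat[i // factor][j // factor] for j in range(len(feat[0]) * factor)]
        for i in range(len(feat) * factor)
    ]


def aggregate(low, high, off_low, off_high):
    """Assumed pipeline: upsample low, refine both with offsets, then add."""
    low_ref = refine_with_offsets(upsample_nearest(low), off_low)
    high_ref = refine_with_offsets(high, off_high)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low_ref, high_ref)]


# Usage: with all-zero offsets this reduces to plain element-wise addition.
low = [[1.0, 2.0], [3.0, 4.0]]
high = [[1.0] * 4 for _ in range(4)]
zero = [[(0.0, 0.0)] * 4 for _ in range(4)]
out = aggregate(low, high, zero, zero)
```

With zero offsets the sketch is exactly the element-wise addition baseline that the abstract criticizes. Non-zero offsets let each position draw from a shifted location before the two features are summed.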

Citation

Pages: 709-726

Page count: 18
