F2Depth: Self-supervised indoor monocular depth estimation via optical flow consistency and feature map synthesis

Cited by: 4
Authors
Guo, Xiaotong [1 ]
Zhao, Huijie [1 ,2 ,3 ]
Shao, Shuwei [4 ]
Li, Xudong [1 ,5 ]
Zhang, Baochang [6 ]
Affiliations
[1] Beihang Univ, Sch Instrumentat & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[3] Beihang Univ, Qingdao Res Inst, Qingdao 266101, Peoples R China
[4] Beihang Univ, Sch Automat Sci & Elect Engn, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[5] Beihang Univ, Hangzhou Res Inst, Inst Artificial Intelligence, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[6] Nanchang Inst Technol, Nanchang 330044, Peoples R China
Keywords
Deep learning; Self-supervision; Monocular depth estimation; Low-texture; Optical flow estimation
DOI
10.1016/j.engappai.2024.108391
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Self-supervised monocular depth estimation methods have attracted increasing attention because they do not require large labelled datasets. Such methods depend on high-quality salient features and therefore suffer a severe performance drop in indoor scenes, where the low-textured regions that dominate the scene are almost indiscriminative. To address this issue, we propose a self-supervised indoor monocular depth estimation framework called F2Depth. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation in low-textured areas, only patches of points with more discriminative features are used for fine-tuning, based on our well-designed patch-based photometric loss. The fine-tuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation, for which an optical flow consistency loss is designed. Multi-scale feature maps produced by the fine-tuned optical flow network are warped to compute a feature map synthesis loss as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and the proposed losses. To evaluate the generalization ability of F2Depth, we collect a Campus Indoor depth dataset consisting of approximately 1500 points selected from 99 images across 18 scenes. Zero-shot generalization experiments on the 7-Scenes dataset and Campus Indoor achieve δ1 accuracy of 75.8% and 76.0%, respectively. These results show that our model generalizes well to monocular images captured in unknown indoor scenes.
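The abstract names two supervisory signals (an optical flow consistency loss and a feature map synthesis loss) and reports δ1 accuracy. The paper's exact formulations are not reproduced in this record, so the following PyTorch sketch only illustrates a standard way such terms can be computed; all function and variable names here are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def rigid_flow_from_depth(depth, pose, K, K_inv):
    # Flow induced by predicted depth and camera motion (common
    # self-supervised formulation; a sketch, not the paper's code).
    # depth: (B,1,H,W), pose: (B,4,4), K and K_inv: (B,3,3).
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)            # back-project
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)   # homogeneous
    proj = K @ (pose @ cam)[:, :3]                           # re-project
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)           # naive clamp
    return (uv - pix[:, :2]).reshape(B, 2, H, W)

def flow_consistency_loss(pred_flow, depth, pose, K, K_inv):
    # L1 agreement between the flow network's prediction and the
    # rigid flow implied by the depth and pose predictions.
    rigid = rigid_flow_from_depth(depth, pose, K, K_inv)
    return (pred_flow - rigid).abs().mean()

def feature_synthesis_loss(src_feat, tgt_feat, flow):
    # Warp source feature maps with the flow (resized and rescaled
    # to the feature resolution) and compare with target features.
    B, _, H, W = tgt_feat.shape
    sx, sy = W / flow.shape[-1], H / flow.shape[-2]
    f = F.interpolate(flow, size=(H, W), mode="bilinear",
                      align_corners=False)
    f = torch.stack([f[:, 0] * sx, f[:, 1] * sy], 1)
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=f.dtype, device=f.device),
        torch.arange(W, dtype=f.dtype, device=f.device),
        indexing="ij")
    gx = 2 * (xs + f[:, 0]) / (W - 1) - 1    # normalize to [-1, 1]
    gy = 2 * (ys + f[:, 1]) / (H - 1) - 1
    grid = torch.stack([gx, gy], dim=-1)     # (B,H,W,2), (x,y) order
    warped = F.grid_sample(src_feat, grid, align_corners=True)
    return (warped - tgt_feat).abs().mean()

def delta1(pred, gt):
    # Standard threshold accuracy used in the abstract's evaluation:
    # fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    ratio = torch.max(pred / gt, gt / pred)
    return (ratio < 1.25).float().mean()
```

In this reading, the flow consistency term ties the depth and pose networks to the fine-tuned flow network's output, while the synthesis term warps multi-scale features rather than raw pixels, which is more robust in low-textured indoor regions.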
Pages: 13