F2Depth: Self-supervised indoor monocular depth estimation via optical flow consistency and feature map synthesis

Cited by: 4
Authors
Guo, Xiaotong [1 ]
Zhao, Huijie [1 ,2 ,3 ]
Shao, Shuwei [4 ]
Li, Xudong [1 ,5 ]
Zhang, Baochang [6 ]
Affiliations
[1] Beihang Univ, Sch Instrumentat & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[3] Beihang Univ, Qingdao Res Inst, Qingdao 266101, Peoples R China
[4] Beihang Univ, Sch Automat Sci & Elect Engn, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[5] Beihang Univ, Hangzhou Res Inst, Inst Artificial Intelligence, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[6] Nanchang Inst Technol, Nanchang 330044, Peoples R China
Keywords
Deep learning; Self-supervision; Monocular depth estimation; Low-texture; Optical flow estimation
DOI
10.1016/j.engappai.2024.108391
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Self-supervised monocular depth estimation methods have attracted increasing attention because they do not require large labelled datasets. Such methods depend on high-quality salient features and therefore suffer a severe performance drop in indoor scenes, where the low-textured regions that dominate the scene are almost indiscriminative. To address this issue, we propose a self-supervised indoor monocular depth estimation framework called F2Depth. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation in low-textured areas, only patches of points with more discriminative features are used for fine-tuning, based on our well-designed patch-based photometric loss. The fine-tuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation, for which an optical flow consistency loss is designed. Multi-scale feature maps produced by the fine-tuned optical flow network are warped to compute a feature map synthesis loss as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and the proposed losses. To evaluate the generalization ability of F2Depth, we collect a Campus Indoor depth dataset consisting of approximately 1500 points selected from 99 images across 18 scenes. Zero-shot generalization experiments on the 7-Scenes dataset and Campus Indoor achieve δ1 accuracy of 75.8% and 76.0%, respectively. These results show that our model generalizes well to monocular images captured in unknown indoor scenes.
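The abstract names two supervisory signals (an optical flow consistency loss and a feature map synthesis loss) and reports δ1 accuracy. The paper's exact formulations are not reproduced in this record, so the following PyTorch sketch only illustrates a standard way such terms can be computed; all function and variable names here are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def rigid_flow_from_depth(depth, pose, K, K_inv):
    # Flow induced by predicted depth and camera motion (common
    # self-supervised formulation; a sketch, not the paper's code).
    # depth: (B,1,H,W), pose: (B,4,4), K and K_inv: (B,3,3).
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)            # back-project
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)   # homogeneous
    proj = K @ (pose @ cam)[:, :3]                           # re-project
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)           # naive clamp
    return (uv - pix[:, :2]).reshape(B, 2, H, W)

def flow_consistency_loss(pred_flow, depth, pose, K, K_inv):
    # L1 agreement between the flow network's prediction and the
    # rigid flow implied by the depth and pose predictions.
    rigid = rigid_flow_from_depth(depth, pose, K, K_inv)
    return (pred_flow - rigid).abs().mean()

def feature_synthesis_loss(src_feat, tgt_feat, flow):
    # Warp source feature maps with the flow (resized and rescaled
    # to the feature resolution) and compare with target features.
    B, _, H, W = tgt_feat.shape
    sx, sy = W / flow.shape[-1], H / flow.shape[-2]
    f = F.interpolate(flow, size=(H, W), mode="bilinear",
                      align_corners=False)
    f = torch.stack([f[:, 0] * sx, f[:, 1] * sy], 1)
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=f.dtype, device=f.device),
        torch.arange(W, dtype=f.dtype, device=f.device),
        indexing="ij")
    gx = 2 * (xs + f[:, 0]) / (W - 1) - 1    # normalize to [-1, 1]
    gy = 2 * (ys + f[:, 1]) / (H - 1) - 1
    grid = torch.stack([gx, gy], dim=-1)     # (B,H,W,2), (x,y) order
    warped = F.grid_sample(src_feat, grid, align_corners=True)
    return (warped - tgt_feat).abs().mean()

def delta1(pred, gt):
    # Standard threshold accuracy used in the abstract's evaluation:
    # fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    ratio = torch.max(pred / gt, gt / pred)
    return (ratio < 1.25).float().mean()
```

In this reading, the flow consistency term ties the depth and pose networks to the fine-tuned flow network's output, while the synthesis term warps multi-scale features rather than raw pixels, which is more robust in low-textured indoor regions.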
Pages: 13