Depth Estimation of Video Sequences With Perceptual Losses

Cited by: 20
Authors
Wang, Anjie [1]
Fang, Zhijun [1]
Gao, Yongbin [1]
Jiang, Xiaoyan [1]
Ma, Siwei [2]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China
[2] Peking Univ, Dept Elect Engn & Comp Sci, Beijing 100871, Peoples R China
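Source
IEEE Access | 2018 / Vol. 6
Funding
National Natural Science Foundation of China
Keywords
Depth estimation; perceptual losses; unsupervised; deep learning
DOI
10.1109/ACCESS.2018.2846546
Chinese Library Classification number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract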
3-D vision plays an important role in the intelligent perception of robots, but it requires extra 3-D sensors. Depth estimation from monocular videos provides an alternative way to recover 3-D information. In this paper, we propose an unsupervised learning framework that uses a perceptual loss for depth estimation. Depth and pose networks are first trained to estimate the depth and the camera motion of the video sequence, respectively. With the estimated depth and pose of the original frame, the adjacent frame can be reconstructed. The pixel-wise differences between the reconstructed frame and the original frame are used as the per-pixel loss. Meanwhile, high-level features extracted from the reconstructed and original views by a pre-trained network are used to define and optimize perceptual loss functions that assess the quality of the reconstructions. We combine the respective advantages of these two losses and present an approach that generates a depth map by training a feed-forward network with both a per-pixel loss function and a perceptual loss function. The experimental results show that our method can significantly improve the accuracy of the estimated depth map.
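As an illustration of how a per-pixel loss and a perceptual loss might be combined, here is a minimal Python sketch. It is not the authors' implementation. The L1 form of the per-pixel term, the mean-squared form of the feature term, the `perceptual_weight` value, and the `toy_features` function (simple image gradients standing in for features from a pre-trained network) are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def per_pixel_loss(reconstructed, original):
    """Mean absolute pixel-wise difference (the L1 form is an assumption)."""
    return float(np.mean(np.abs(reconstructed - original)))


def toy_features(image):
    """Hypothetical stand-in for a pre-trained feature network.

    Returns horizontal and vertical gradients of a 2-D grayscale image.
    A real perceptual loss would use activations of a pre-trained CNN.
    """
    dx = image[:, 1:] - image[:, :-1]
    dy = image[1:, :] - image[:-1, :]
    return [dx, dy]


def perceptual_loss(reconstructed, original, feature_fn=toy_features):
    """Mean squared difference between feature maps of the two views."""
    feats_r = feature_fn(reconstructed)
    feats_o = feature_fn(original)
    per_layer = [np.mean((fr - fo) ** 2) for fr, fo in zip(feats_r, feats_o)]
    return float(np.mean(per_layer))


def total_loss(reconstructed, original, perceptual_weight=0.5):
    """Weighted sum of both terms; the weight is an assumed hyperparameter."""
    return (per_pixel_loss(reconstructed, original)
            + perceptual_weight * perceptual_loss(reconstructed, original))
```

With these toy features, adding a constant brightness offset to the reconstructed frame raises the per-pixel term by the size of the offset while leaving the feature term essentially at zero, which shows how the two terms can respond to different kinds of reconstruction error.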
Pages: 30536-30546
Page count: 11