NeuralDiff: Segmenting 3D objects that move in egocentric videos

Cited by: 19
Authors
Tschernezki, Vadim [1, 2]
Larlus, Diane [2 ]
Vedaldi, Andrea [1 ]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Oxford, England
[2] NAVER LABS Europe, Meylan, France
Source
2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021) | 2021
Keywords
SEGMENTATION;
DOI
10.1109/3DV53792.2021.00099
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move within the scene. This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion due to the camera's large viewpoint change and parallax. In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them. We achieve this factorization by reconstructing the video via a triple-stream neural rendering network that explains the different motions based on corresponding inductive biases. We demonstrate that our method can successfully separate the different types of motion, outperforming recent neural rendering baselines at this task, and can accurately segment the moving objects. We do so by assessing the method empirically on challenging videos from the EPIC-KITCHENS dataset, which we augment with appropriate annotations to create a new benchmark for the task of dynamic object segmentation on unconstrained video sequences in complex 3D environments.
Pages: 910-919
Page count: 10