Video Object Segmentation without Temporal Information

Cited by: 212
Authors
Maninis, Kevis-Kokitsi [1]
Caelles, Sergi [1]
Chen, Yuhua [1]
Pont-Tuset, Jordi [1]
Leal-Taixe, Laura [2]
Cremers, Daniel [2]
Van Gool, Luc [1]
Affiliations
[1] ETHZ, CH-8092 Zurich, Switzerland
[2] TUM, D-80333 Munich, Germany
Funding
EU Horizon 2020;
Keywords
Video object segmentation; convolutional neural networks; semantic segmentation; instance segmentation;
DOI
10.1109/TPAMI.2018.2838670
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS^S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOS^S is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOS^S obtains competitive results.
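The core idea in the abstract (learn the object's appearance from the single annotated first frame, then segment every frame independently) can be illustrated with a toy sketch. This is not the authors' method: the paper uses a fully-convolutional network pretrained on ImageNet, whereas the sketch below swaps in a hypothetical nearest-mean intensity classifier on grayscale frames stored as nested lists. The function names `fit_appearance`, `segment_frame`, and `segment_video` are made up for illustration.

```python
def fit_appearance(frame, mask):
    """Toy 'one-shot' step: learn mean foreground/background intensity
    from the single annotated frame. Assumes the mask contains at least
    one foreground and one background pixel."""
    fg = [v for row_f, row_m in zip(frame, mask)
          for v, m in zip(row_f, row_m) if m]
    bg = [v for row_f, row_m in zip(frame, mask)
          for v, m in zip(row_f, row_m) if not m]
    return sum(fg) / len(fg), sum(bg) / len(bg)


def segment_frame(frame, model):
    """Label a pixel as foreground (1) when its intensity is closer to the
    learned foreground mean than to the background mean."""
    fg_mean, bg_mean = model
    return [[1 if abs(v - fg_mean) < abs(v - bg_mean) else 0 for v in row]
            for row in frame]


def segment_video(frames, first_mask):
    """Fit the appearance model on frame 0, then segment each frame on its
    own: no optical flow and no mask propagation between frames."""
    model = fit_appearance(frames[0], first_mask)
    return [segment_frame(frame, model) for frame in frames]
```

Because no frame depends on its neighbours in this sketch, a frame in which the object is fully occluded does not affect the segmentation of the frames that follow it, which mirrors the robustness argument made in the abstract.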
Pages: 1515-1530
Page count: 16