Video Object Segmentation without Temporal Information

Cited by: 209
Authors
Maninis, Kevis-Kokitsi [1]
Caelles, Sergi [1]
Chen, Yuhua [1]
Pont-Tuset, Jordi [1]
Leal-Taixe, Laura [2]
Cremers, Daniel [2]
Van Gool, Luc [1]
Affiliations
[1] ETHZ, CH-8092 Zurich, Switzerland
[2] TUM, D-80333 Munich, Germany
Funding
EU Horizon 2020
Keywords
Video object segmentation; convolutional neural networks; semantic segmentation; instance segmentation
DOI
10.1109/TPAMI.2018.2838670
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOS-S obtains competitive results.
Pages: 1515-1530
Page count: 16