Video Object Segmentation with 3D Convolution Network

Cited by: 0
Authors
Tang, Huiyun [1]
Tao, Pin [1]
Ma, Rui [1]
Shi, Yuanchun [1]
Affiliations
[1] Tsinghua Univ, Beijing 100084, Peoples R China
Source
ICCCV 2019: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON CONTROL AND COMPUTER VISION | 2019
Funding
National Natural Science Foundation of China;
Keywords
Video object segmentation; 3-dimension convolution network; Spatiotemporal feature;
DOI
10.1145/3341016.3341031
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
We explore a novel method for semi-supervised video object segmentation built around a dedicated spatiotemporal feature extraction structure. Because a 3-dimensional (3D) convolution network can convolve over a volume of image frames, it offers a direct way to capture both spatial and temporal information. Our network is composed of three parts: the visual module, the motion module, and the decoder module. The visual module learns the appearance features of the object in the first frame so that the network can detect that specific object in the following image sequence. The motion module uses a 3D convolution network to extract spatiotemporal information from the image sequence, learning how the object's appearance and location vary over time. The decoder module produces the foreground object mask from the outputs of the visual and motion modules using a concatenation and upsampling structure. We evaluate our model on the DAVIS segmentation dataset [15]. Thanks to the visual module, our model does not need online training, unlike most detection-based methods. As a result, it takes 0.14 seconds per frame to produce a mask, which is 71 times faster than the state-of-the-art method OSVOS [2]. Our model also outperforms most methods proposed in recent years, and its mean IoU accuracy is comparable with state-of-the-art methods.
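The abstract's point that a 3D convolution captures temporal as well as spatial information can be illustrated with a toy example. The following is a minimal sketch in plain Python and is not the paper's network: the function name `conv3d_valid`, the hand-picked temporal-difference kernel, and the tiny clips are all assumptions made for illustration. As in most deep learning frameworks, the "convolution" here is computed as a cross-correlation with no kernel flipping.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation over nested lists.

    volume: list of frames, indexed as volume[t][y][x]
    kernel: nested list, indexed as kernel[dt][dy][dx]
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0
                # Sum over the kernel's temporal and spatial extent.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical kernel of shape (2, 1, 1): next frame minus current frame.
kernel = [[[-1]], [[1]]]

# Toy clip (3 frames, 1 x 4 pixels): a bright pixel moves one step right per frame.
moving = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]

# Toy clip with the same first frame repeated: nothing moves.
static = [[[1, 0, 0, 0]] for _ in range(3)]

moving_response = conv3d_valid(moving, kernel)
static_response = conv3d_valid(static, kernel)
print(moving_response)
print(static_response)
```

In this sketch the moving clip produces nonzero responses at the positions the pixel leaves and enters, while the static clip produces only zeros. A purely per-frame (2D) filter could not tell the two clips apart on their shared first frame, which is the intuition behind using a kernel that spans time.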
Pages: 28-32
Number of pages: 5
Related papers
32 in total
  • [1] [Anonymous], 2017, CVPR
  • [2] [Anonymous], 2018, EUR C COMP VIS ECCV
  • [3] [Anonymous], 2017, PROC IEEE C COMPUT V
  • [4] [Anonymous], 2018, ARXIV181209834
  • [5] [Anonymous], 2017, IEEE CVPR
  • [6] Barron JT, 2015, PROC CVPR IEEE, P4466, DOI 10.1109/CVPR.2015.7299076
  • [7] SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
    Cheng, Jingchun
    Tsai, Yi-Hsuan
    Wang, Shengjin
    Yang, Ming-Hsuan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 686-695
  • [8] Learning Spatiotemporal Features with 3D Convolutional Networks
    Du Tran
    Bourdev, Lubomir
    Fergus, Rob
    Torresani, Lorenzo
    Paluri, Manohar
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4489-4497
  • [9] Faktor Alon, 2014, BMVC
  • [10] A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
    Galasso, Fabio
    Nagaraja, Naveen Shankar
    Cardenas, Tatiana Jimenez
    Brox, Thomas
    Schiele, Bernt
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013: 3527-3534