Video Object Segmentation with 3D Convolution Network

Cited by: 0

Authors:

Tang, Huiyun [1]

Tao, Pin [1]

Ma, Rui [1]

Shi, Yuanchun [1]

Affiliations:

[1] Tsinghua Univ, Beijing 100084, Peoples R China

Source:

ICCCV 2019: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON CONTROL AND COMPUTER VISION | 2019

Funding:

National Natural Science Foundation of China;

Keywords:

Video object segmentation; 3-dimension convolution network; Spatiotemporal feature;

DOI:

10.1145/3341016.3341031

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We explore a novel method for semi-supervised video object segmentation built around a dedicated spatiotemporal feature extraction structure. Because a 3-dimensional convolution network convolves over a volume of consecutive frames, it captures spatial and temporal information together. Our network has three parts: the visual module, the motion module, and the decoder module. The visual module learns the appearance features of the object in the first frame so that the network can detect that specific object in the following frames. The motion module uses a 3-dimensional convolution network to extract spatiotemporal information from the image sequence, learning how the object's appearance and location vary over time. The decoder module produces the foreground object mask from the outputs of the visual and motion modules through concatenation and upsampling. We evaluate our model on the DAVIS segmentation dataset [15]. Thanks to the visual module, our model needs no online training, unlike most detection-based methods. As a result, it takes 0.14 seconds per frame to produce a mask, which is 71 times faster than the state-of-the-art method OSVOS [2]. Our model also outperforms most methods proposed in recent years, and its mean IoU is comparable with state-of-the-art methods.
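To illustrate why a 3-dimensional convolution picks up temporal as well as spatial structure, here is a minimal pure-Python sketch. It is not the authors' network: the function `conv3d_valid`, the toy clip, and the `temporal_diff` kernel are all hypothetical names and values chosen for illustration. The point is only that a kernel spanning more than one frame responds to change between frames, which a per-frame 2D kernel cannot see.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation over a (T, H, W) volume of nested lists."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal and spatial extent.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 3 frames of 3x3 pixels, with one bright pixel in the
# middle row that moves one column to the right in each frame.
clip = [
    [[1.0 if (y == 1 and x == t) else 0.0 for x in range(3)] for y in range(3)]
    for t in range(3)
]

# Hypothetical kernel spanning 2 frames x 1 x 1: next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, temporal_diff)

# The middle row of each output plane marks where the pixel left (-1)
# and where it arrived (+1) between consecutive frames.
for plane in response:
    print(plane[1])
```

A learned 3D convolution layer stacks many such kernels with trained weights and larger spatial extent; the hand-set difference kernel here just makes the temporal response easy to read.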

Pages: 28-32

Page count: 5

References (32 in total):

[1] [Anonymous], 2017, CVPR
[2] [Anonymous], 2018, EUR C COMP VIS ECCV
[3] [Anonymous], 2017, PROC IEEE C COMPUT V
[4] [Anonymous], 2018, ARXIV181209834
[5] [Anonymous], 2017, IEEE CVPR
[6] Barron JT, 2015, PROC CVPR IEEE, P4466, DOI 10.1109/CVPR.2015.7299076
[7] Cheng, Jingchun; Tsai, Yi-Hsuan; Wang, Shengjin; Yang, Ming-Hsuan. SegFlow: Joint Learning for Video Object Segmentation and Optical Flow. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 686-695
[8] Tran, Du; Bourdev, Lubomir; Fergus, Rob; Torresani, Lorenzo; Paluri, Manohar. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 4489-4497
[9] Faktor Alon, 2014, BMVC
[10] Galasso, Fabio; Nagaraja, Naveen Shankar; Cardenas, Tatiana Jimenez; Brox, Thomas; Schiele, Bernt. A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis. 2013 IEEE International Conference on Computer Vision (ICCV), 2013: 3527-3534
