Video Salient Object Detection via Fully Convolutional Networks

Times cited: 590

Authors:

Wang, Wenguan [1]

Shen, Jianbing [1]

Shao, Ling [2]

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing 100081, Peoples R China

[2] Univ East Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2018 / Vol. 27 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Video saliency; deep learning; synthetic video data; salient object detection; fully convolutional network; OPTIMIZATION; DEEP;

DOI:

10.1109/TIP.2017.2754941

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) training a deep video saliency model in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, which capture the spatial and temporal saliency information, respectively. The dynamic saliency model, which explicitly incorporates saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting given the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimates. We advance the state of the art on the densely annotated video segmentation (DAVIS) data set (MAE of 0.06) and the Freiburg-Berkeley Motion Segmentation (FBMS) data set (MAE of 0.07), and do so with much improved speed (2 fps with all steps).
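The abstract reports accuracy as MAE (mean absolute error: the average per-pixel absolute difference between a predicted saliency map and its ground-truth mask, both scaled to [0, 1]) and credits a synthetic-video augmentation built from annotated images. The Python sketch below illustrates both ideas under stated assumptions, using plain lists so it needs no third-party packages. `mean_absolute_error` follows the standard MAE definition used in salient object detection benchmarks. `translate` and `synthesize_frame_pair` are hypothetical helpers: a random global translation is only a toy stand-in for whatever transformations the authors actually use to simulate motion, and it is not the paper's procedure.

```python
import random
from typing import List, Tuple

# A saliency map, ground-truth mask, or grayscale image: H rows of W floats in [0, 1].
Grid = List[List[float]]


def mean_absolute_error(saliency: Grid, ground_truth: Grid) -> float:
    """Per-pixel MAE between a predicted saliency map and a ground-truth mask."""
    h, w = len(ground_truth), len(ground_truth[0])
    if len(saliency) != h or any(len(row) != w for row in saliency):
        raise ValueError("saliency map and ground truth must have the same shape")
    total = sum(
        abs(s - g)
        for s_row, g_row in zip(saliency, ground_truth)
        for s, g in zip(s_row, g_row)
    )
    return total / (h * w)


def translate(grid: Grid, dx: int, dy: int, fill: float = 0.0) -> Grid:
    """Shift a grid right by dx and down by dy pixels; uncovered cells get `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = grid[src_y][src_x]
    return out


def synthesize_frame_pair(
    image: Grid, mask: Grid, max_shift: int, rng: random.Random
) -> Tuple[Tuple[Grid, Grid], Tuple[Grid, Grid]]:
    """Hypothetical helper (not the paper's method): turn one annotated image
    into a two-frame 'video' by a random global translation.

    The same offset is applied to the image and its mask, so the second
    frame's label stays pixel-aligned with its content.
    """
    dx = rng.randint(-max_shift, max_shift)  # randint bounds are inclusive
    dy = rng.randint(-max_shift, max_shift)
    return (image, mask), (translate(image, dx, dy), translate(mask, dx, dy))


if __name__ == "__main__":
    mask = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 1.0, 0.0],
            [0.0, 1.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
    # Toy image: bright object (0.8) on a dark background (0.2).
    image = [[0.2 + 0.6 * v for v in row] for row in mask]
    (f0, m0), (f1, m1) = synthesize_frame_pair(image, mask, max_shift=1, rng=random.Random(7))
    print("MAE(mask, mask) =", mean_absolute_error(m0, m0))
    print("MAE(frame-0 mask, frame-1 mask) =", mean_absolute_error(m0, m1))
```

Because the same offset moves both the image and its mask, each synthetic frame gets a pixel-accurate label at no annotation cost, which is the property that lets annotated image data sets stand in for scarce pixel-wise annotated video.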

Pages: 38-49

Page count: 12

Number of references: 71
