A Benchmark Dataset and Saliency-Guided Stacked Autoencoders for Video-Based Salient Object Detection

Cited by: 112
Authors
Li, Jia [1, 2]
Xia, Changqun [1]
Chen, Xiaowu [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ, Int Res Inst Multidisciplinary Sci, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; video dataset; stacked autoencoders; model benchmarking; co-segmentation; detection model; optimization
DOI
10.1109/TIP.2017.2762594
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-based salient object detection (SOD) has been extensively studied in past decades. However, video-based SOD is much less explored due to the lack of large-scale video datasets within which salient objects are unambiguously defined and annotated. Toward this end, this paper proposes a video-based SOD dataset that consists of 200 videos. In constructing the dataset, we manually annotate all objects and regions over 7650 uniformly sampled keyframes and collect the eye-tracking data of 23 subjects who free-view all videos. From the user data, we find that salient objects in a video can be defined as objects that consistently pop out throughout the video, and objects with such attributes can be unambiguously annotated by combining manually annotated object/region masks with eye-tracking data of multiple subjects. To the best of our knowledge, it is currently the largest dataset for video-based salient object detection. Based on this dataset, this paper proposes an unsupervised baseline approach for video-based SOD by using saliency-guided stacked autoencoders. In the proposed approach, multiple spatiotemporal saliency cues are first extracted at the pixel, superpixel, and object levels. With these saliency cues, stacked autoencoders are constructed in an unsupervised manner that automatically infers a saliency score for each pixel by progressively encoding the high-dimensional saliency cues gathered from the pixel and its spatiotemporal neighbors. In experiments, the proposed unsupervised approach is compared with 31 state-of-the-art models on the proposed dataset and outperforms 30 of them, including 19 image-based classic (unsupervised or non-deep learning) models, six image-based deep learning models, and five video-based unsupervised models. Moreover, benchmarking results show that the proposed dataset is very challenging and has the potential to boost the development of video-based SOD.
Pages: 349-364
Page count: 16