Video Saliency Detection Using Deep Convolutional Neural Networks

Cited by: 2
Authors
Zhou, Xiaofei [1, 2, 3]
Liu, Zhi [2, 3]
Gong, Chen [4]
Li, Gongyang [2, 3]
Huang, Mengke [2, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Inst Informat & Control, Hangzhou, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China
[3] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
[4] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Minist Educ, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT II | 2018 / Vol. 11257
Funding
National Natural Science Foundation of China;
Keywords
Video saliency; Convolutional neural networks; Feature aggregation; VISUAL-ATTENTION; SEGMENTATION; IMAGE; MODEL;
DOI
10.1007/978-3-030-03335-4_27
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Numerous deep learning based efforts have been made for image saliency detection, so it is natural to build a video saliency model on top of these image saliency models. Moreover, because the number of training videos is limited, existing video saliency models are trained with large-scale synthetic video data. In this paper, we construct a video saliency model based on an existing image saliency model and train it on the limited video data. Concretely, our video saliency model consists of three steps: feature extraction, feature aggregation, and spatial refinement. First, the concatenation of the current frame and its optical flow image is fed into the feature extraction network, yielding feature maps. Then, a tensor consisting of the generated feature maps and the original information (the current frame and the optical flow image) is passed to the aggregation network, where the original information provides complementary cues for aggregation. Finally, to obtain a high-quality saliency map with well-defined boundaries, the output of the aggregation network and the current frame are used to perform spatial refinement, yielding the final saliency map for the current frame. Extensive qualitative and quantitative experiments on two challenging video datasets show that the proposed model consistently outperforms state-of-the-art saliency models for detecting salient objects in videos.
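The data flow of the three steps in the abstract can be traced with a small shape-only sketch. It is not the authors' implementation: it tracks only how many channels enter each stage. The 3-channel optical flow image, the feature-channel count `FEATURE_CHANNELS`, the single-channel saliency outputs, the unchanged spatial resolution, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    """Shape-only stand-in for a (channels, height, width) tensor."""
    channels: int
    height: int
    width: int


def concat(*tensors: Tensor) -> Tensor:
    """Channel-wise concatenation; all spatial sizes must match."""
    h, w = tensors[0].height, tensors[0].width
    if any((t.height, t.width) != (h, w) for t in tensors):
        raise ValueError("spatial sizes differ")
    return Tensor(sum(t.channels for t in tensors), h, w)


# Hypothetical value: the abstract does not state the feature-map count.
FEATURE_CHANNELS = 64


def extract_features(x: Tensor) -> Tensor:
    # Assumption: feature maps keep the input resolution.
    return Tensor(FEATURE_CHANNELS, x.height, x.width)


def aggregate(x: Tensor) -> Tensor:
    # Assumption: the aggregation network emits a 1-channel coarse map.
    return Tensor(1, x.height, x.width)


def refine(x: Tensor) -> Tensor:
    # Assumption: spatial refinement emits the final 1-channel map.
    return Tensor(1, x.height, x.width)


def pipeline_shapes(frame: Tensor, flow_image: Tensor) -> dict:
    """Trace the tensor shape entering and leaving each of the three steps."""
    # Step 1: frame + optical flow image -> feature extraction network.
    extraction_in = concat(frame, flow_image)
    features = extract_features(extraction_in)
    # Step 2: feature maps + original information -> aggregation network.
    aggregation_in = concat(features, frame, flow_image)
    coarse = aggregate(aggregation_in)
    # Step 3: aggregation output + current frame -> spatial refinement.
    refinement_in = concat(coarse, frame)
    final = refine(refinement_in)
    return {
        "extraction_in": extraction_in,
        "features": features,
        "aggregation_in": aggregation_in,
        "coarse": coarse,
        "refinement_in": refinement_in,
        "final": final,
    }


if __name__ == "__main__":
    frame = Tensor(3, 224, 224)       # RGB frame (size is illustrative)
    flow_image = Tensor(3, 224, 224)  # flow rendered as a color image (assumed)
    for name, shape in pipeline_shapes(frame, flow_image).items():
        print(name, shape)
```

The sketch makes one point from the abstract explicit: the current frame is reused at every stage, first as input to feature extraction, then as complementary information for aggregation, and finally as guidance for spatial refinement.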
Pages: 308-319
Number of pages: 12