Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network

Cited by: 61
Authors
Zhang, Kao [1 ]
Chen, Zhenzhong [1 ]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Hubei, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Feature extraction; Predictive models; Streaming media; Visualization; Spatiotemporal phenomena; Computational modeling; Video saliency; spatial-temporal features; visual attention; deep learning; SPATIOTEMPORAL SALIENCY; COMPRESSED-DOMAIN; VISUAL-ATTENTION; MODEL; GAZE;
DOI
10.1109/TCSVT.2018.2883305
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
In this paper, we propose a novel two-stream neural network for video saliency prediction. Unlike traditional methods based on hand-crafted feature extraction and integration, the proposed method automatically learns saliency-related spatiotemporal features from human fixations without any pre-processing, post-processing, or manual tuning. Video frames are routed through the spatial stream network, which computes a static (color) saliency map for each frame. For temporal (dynamic) saliency maps, a new two-stage temporal stream network is proposed, composed of a pre-trained 2D-CNN model (SF-Net) that extracts saliency-related features and a shallow 3D-CNN model (Te-Net) that processes these features; this design reduces the amount of video gaze data required, improves training efficiency, and achieves high performance. A fusion network combines the outputs of both streams to generate the final saliency maps. In addition, a convolutional Gaussian priors (CGP) layer is proposed to learn the bias in viewing behavior and further improve prediction performance. The proposed method is compared with state-of-the-art saliency models on two public video saliency benchmark datasets, and the results demonstrate that it performs favorably against these models.
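The abstract describes the full pipeline: a per-frame spatial stream, a two-stage temporal stream (SF-Net then Te-Net), a fusion network, and a CGP layer for viewing bias. The PyTorch sketch below is a minimal reading of that description, not the authors' code: only the component names come from the abstract, while every layer width, kernel size, the clip length, the 1x1 fusion convolution, the choice of predicting for the last frame, and the Gaussian-prior parametrization are assumptions for illustration.

```python
import torch
import torch.nn as nn


class SpatialStream(nn.Module):
    """Per-frame 2D CNN predicting a static saliency map (layout assumed)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, frame):                    # frame: (B, 3, H, W)
        return self.head(self.features(frame))   # (B, 1, H, W)


class SFNet(nn.Module):
    """Temporal stream, stage 1: a (notionally pre-trained) 2D CNN applied
    frame by frame to extract saliency-related features."""
    def __init__(self, out_ch=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, clip):                     # clip: (B, 3, T, H, W)
        b, c, t, h, w = clip.shape
        x = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.features(x)                     # (B*T, C', H, W)
        return x.view(b, t, -1, h, w).permute(0, 2, 1, 3, 4)  # (B, C', T, H, W)


class TeNet(nn.Module):
    """Temporal stream, stage 2: a shallow 3D CNN over SF-Net features,
    collapsing the time axis into one dynamic saliency map (T must equal clip_len)."""
    def __init__(self, in_ch=32, clip_len=8):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, (clip_len, 1, 1)),  # collapse T -> 1
        )

    def forward(self, feats):                    # feats: (B, C, T, H, W)
        return self.conv3d(feats).squeeze(2)     # (B, 1, H, W)


class ConvGaussianPrior(nn.Module):
    """CGP layer: a bank of learnable 2D Gaussian prior maps modeling viewing
    bias (e.g., center bias), mixed into the prediction by a 1x1 convolution.
    This follows the general 'learned Gaussian prior' idea; the paper's exact
    formulation may differ."""
    def __init__(self, n_priors=8):
        super().__init__()
        self.mu = nn.Parameter(torch.rand(n_priors, 2))          # centers in [0, 1]
        self.log_sigma = nn.Parameter(torch.zeros(n_priors, 2))  # per-axis spread
        self.mix = nn.Conv2d(1 + n_priors, 1, 1)

    def forward(self, sal):                      # sal: (B, 1, H, W)
        b, _, h, w = sal.shape
        ys = torch.linspace(0, 1, h, device=sal.device)
        xs = torch.linspace(0, 1, w, device=sal.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1)                     # (H, W, 2)
        d = grid.unsqueeze(0) - self.mu.view(-1, 1, 1, 2)        # (P, H, W, 2)
        sigma = self.log_sigma.exp().view(-1, 1, 1, 2)
        priors = torch.exp(-0.5 * ((d / sigma) ** 2).sum(-1))    # (P, H, W)
        priors = priors.unsqueeze(0).expand(b, -1, -1, -1)       # (B, P, H, W)
        return self.mix(torch.cat([sal, priors], dim=1))         # (B, 1, H, W)


class TwoStreamSaliency(nn.Module):
    """Fuse static and dynamic maps, then apply the learned Gaussian priors."""
    def __init__(self, clip_len=8):
        super().__init__()
        self.spatial = SpatialStream()
        self.sf_net = SFNet()
        self.te_net = TeNet(clip_len=clip_len)
        self.fuse = nn.Conv2d(2, 1, 1)           # fusion network (assumed 1x1 conv)
        self.cgp = ConvGaussianPrior()

    def forward(self, clip):                     # clip: (B, 3, T, H, W)
        static = self.spatial(clip[:, :, -1])    # static map for the last frame
        dynamic = self.te_net(self.sf_net(clip))
        fused = self.fuse(torch.cat([static, dynamic], dim=1))
        return torch.sigmoid(self.cgp(fused))    # final saliency map (B, 1, H, W)


# Quick shape check on random data:
model = TwoStreamSaliency(clip_len=8)
clip = torch.randn(2, 3, 8, 64, 64)              # two 8-frame RGB clips
print(model(clip).shape)                         # torch.Size([2, 1, 64, 64])
```

Note the division of labor in the temporal stream: the 2D SF-Net does the heavy per-frame feature extraction (and could be initialized from image saliency data), so the 3D Te-Net can stay shallow, which is consistent with the abstract's claim of reduced gaze-data requirements and faster training.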
Pages: 3544-3557
Page count: 14
Related References
81 entries in total
[11] Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita. Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model [J]. IEEE Transactions on Image Processing, 2018, 27(10): 5142-5154.
[12] Cornia M. International Conference on Pattern Recognition (ICPR), 2016: 3488. DOI: 10.1109/ICPR.2016.7900174.
[13] Culibrk, Dubravko; Mirkovic, Milan; Zlokolica, Vladimir; Pokric, Maja; Crnojevic, Vladimir; Kukolj, Dragan. Salient Motion Features for Video Quality Assessment [J]. IEEE Transactions on Image Processing, 2011, 20(4): 948-958.
[14] Dodge, Samuel F.; Karam, Lina J. Visual Saliency Prediction Using a Mixture of Deep Neural Networks [J]. IEEE Transactions on Image Processing, 2018, 27(8): 4080-4090.
[15] Tran, Du; Bourdev, Lubomir; Fergus, Rob; Torresani, Lorenzo; Paluri, Manohar. Learning Spatiotemporal Features with 3D Convolutional Networks [C]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 4489-4497.
[16] Fang, Yuming; Wang, Zhou; Lin, Weisi; Fang, Zhijun. Video Saliency Incorporating Spatiotemporal Cues and Uncertainty Weighting [J]. IEEE Transactions on Image Processing, 2014, 23(9): 3910-3921.
[17] Fang, Yuming; Lin, Weisi; Chen, Zhenzhong; Tsai, Chia-Ming; Lin, Chia-Wen. A Video Saliency Detection Model in Compressed Domain [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(1): 27-38.
[18] Fang, Yuming; Chen, Zhenzhong; Lin, Weisi; Lin, Chia-Wen. Saliency Detection in the Compressed Domain for Adaptive Image Retargeting [J]. IEEE Transactions on Image Processing, 2012, 21(9): 3888-3901.
[19] Girshick R. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2014. DOI: 10.1109/CVPR.2014.81.
[20] Goodale, M. A.; Milner, A. D. Separate Visual Pathways for Perception and Action [J]. Trends in Neurosciences, 1992, 15(1): 20-25.