Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network

Cited by: 57
Authors
Zhang, Kao [1]
Chen, Zhenzhong [1]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Hubei, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Feature extraction; Predictive models; Streaming media; Visualization; Spatiotemporal phenomena; Computational modeling; Video saliency; spatial-temporal features; visual attention; deep learning; SPATIOTEMPORAL SALIENCY; COMPRESSED-DOMAIN; VISUAL-ATTENTION; MODEL; GAZE;
DOI
10.1109/TCSVT.2018.2883305
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
In this paper, we propose a novel two-stream neural network for video saliency prediction. Unlike traditional methods based on hand-crafted feature extraction and integration, the proposed method automatically learns saliency-related spatiotemporal features from human fixations without any pre-processing, post-processing, or manual tuning. Video frames are routed through the spatial stream network, which computes a static (color) saliency map for each frame. For temporal (dynamic) saliency maps, a new two-stage temporal stream network is proposed, composed of a pre-trained 2D-CNN model (SF-Net) that extracts saliency-related features and a shallow 3D-CNN model (Te-Net) that processes these features; this design reduces the amount of video gaze data required, improves training efficiency, and achieves high performance. A fusion network combines the outputs of both streams to generate the final saliency maps. In addition, a convolutional Gaussian priors (CGP) layer is proposed to learn the bias phenomenon in human viewing behavior and further improve prediction performance. The proposed method is compared with state-of-the-art saliency models on two public video saliency benchmark datasets; the results demonstrate that it performs favorably against these models.
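The abstract describes the architecture only at a high level. The PyTorch sketch below illustrates the described layout under explicit assumptions: a spatial stream producing a static saliency map per frame, a two-stage temporal stream (a per-frame 2D-CNN feature extractor standing in for SF-Net, followed by a shallow 3D-CNN standing in for Te-Net), and a small fusion head. All class names other than SF-Net and Te-Net, and every layer shape, are illustrative; the paper's exact architectures are not given in this record.

```python
# Minimal sketch of the two-stream layout from the abstract.
# All shapes and module definitions are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialNet(nn.Module):
    """Spatial stream: one RGB frame -> static saliency map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, frame):                    # (B, 3, H, W)
        return self.head(self.encoder(frame))    # (B, 1, H, W)

class SFNet(nn.Module):
    """Stage 1 of the temporal stream: a (notionally pre-trained)
    2D CNN mapping each frame to saliency-related feature maps."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, frame):                    # (B, 3, H, W)
        return self.features(frame)              # (B, C, H, W)

class TeNet(nn.Module):
    """Stage 2: a shallow 3D CNN over stacked per-frame features,
    producing one dynamic saliency map for the clip."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, feats):                    # (B, C, T, H, W), T >= 3
        x = self.conv3d(feats)                   # (B, 32, T-2, H, W)
        x = x.mean(dim=2)                        # collapse the time axis
        return self.head(x)                      # (B, 1, H, W)

class TwoStreamSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = SpatialNet()
        self.sf_net = SFNet()
        self.te_net = TeNet()
        self.fusion = nn.Conv2d(2, 1, 1)         # combines both stream outputs

    def forward(self, clip):                     # clip: (B, T, 3, H, W)
        center = clip[:, clip.shape[1] // 2]     # current frame
        s_map = self.spatial(center)             # static saliency
        feats = torch.stack([self.sf_net(clip[:, t])
                             for t in range(clip.shape[1])], dim=2)
        t_map = self.te_net(feats)               # dynamic saliency
        return torch.sigmoid(self.fusion(torch.cat([s_map, t_map], dim=1)))

# Usage: a 7-frame clip at 112x112 -> one (1, 1, 112, 112) saliency map.
model = TwoStreamSaliency()
sal = model(torch.randn(1, 7, 3, 112, 112))
```

The 1x1 fusion convolution is only the simplest stand-in for the fusion network named in the abstract; the actual fusion network is presumably deeper.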
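The abstract also names a convolutional Gaussian priors (CGP) layer that learns the bias in human viewing behavior. One plausible realization, sketched below as an assumption rather than the paper's actual design, renders a bank of 2D Gaussians with trainable centers and widths into prior maps and mixes them into the saliency features with a 1x1 convolution, in the spirit of learned Gaussian priors used in earlier saliency models.

```python
# Hedged sketch of a learned-Gaussian-prior layer; the paper's exact
# CGP parameterization is not specified in this record.
import torch
import torch.nn as nn

class GaussianPriorLayer(nn.Module):
    def __init__(self, in_ch, n_priors=8):
        super().__init__()
        # Trainable Gaussian parameters; mu = 0 places each prior at the
        # frame center in [-1, 1] coordinates, reflecting center bias.
        self.mu = nn.Parameter(torch.zeros(n_priors, 2))         # (x, y)
        self.log_sigma = nn.Parameter(torch.zeros(n_priors, 2))  # per-axis widths
        self.mix = nn.Conv2d(in_ch + n_priors, in_ch, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        B, _, H, W = x.shape
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1)     # (H, W, 2)
        sigma = self.log_sigma.exp()
        # Evaluate each Gaussian on the normalized grid: (N, H, W).
        d = (grid.unsqueeze(0) - self.mu.view(-1, 1, 1, 2)) / sigma.view(-1, 1, 1, 2)
        priors = torch.exp(-0.5 * (d ** 2).sum(-1))
        priors = priors.unsqueeze(0).expand(B, -1, -1, -1)  # (B, N, H, W)
        return self.mix(torch.cat([x, priors], dim=1))      # (B, C, H, W)

# Usage: mix learned priors into a 64-channel feature map.
layer = GaussianPriorLayer(in_ch=64)
out = layer(torch.randn(2, 64, 56, 56))          # -> (2, 64, 56, 56)
```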
Pages: 3544-3557
Number of pages: 14
Related Papers (50 total)
  • [1] Spatial-Temporal Analysis-Based Video Quality Assessment: A Two-Stream Convolutional Network Approach
    He, Jianghui
    Wang, Zhe
    Liu, Yi
    Song, Yang
    ELECTRONICS, 2024, 13 (10)
  • [2] A Spatial-Temporal Recurrent Neural Network for Video Saliency Prediction
    Zhang, Kao
    Chen, Zhenzhong
    Liu, Shan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 572 - 587
  • [3] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [4] Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification
    Peng, Yuxin
    Zhao, Yunzhen
    Zhang, Junchao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (03) : 773 - 786
  • [5] A two-stream network with joint spatial-temporal distance for video-based person re-identification
    Han, Zhisong
    Liang, Yaling
    Chen, Zengqun
    Zhou, Zhiheng
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 3769 - 3781
  • [6] Spatial-Temporal Attention Two-Stream Convolution Neural Network for Smoke Region Detection
    Ding, Zhipeng
    Zhao, Yaqin
    Li, Ao
    Zheng, Zhaoxiang
    FIRE-SWITZERLAND, 2021, 4 (04)
  • [7] Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition
    Xia, Limin
    Fu, Weiye
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08) : 11611 - 11626
  • [8] Video Captioning Based on the Spatial-Temporal Saliency Tracing
    Zhou, Yuanen
    Hu, Zhenzhen
    Liu, Xueliang
    Wang, Meng
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 59 - 70
  • [9] Two-Stream Spatial-Temporal Auto-Encoder With Adversarial Training for Video Anomaly Detection
    Guo, Biao
    Liu, Mingrui
    He, Qian
    Jiang, Ming
    IEEE ACCESS, 2024, 12 : 125881 - 125889
  • [10] Contrast Based Hierarchical Spatial-Temporal Saliency for Video
    Le, Trung-Nghia
    Sugimoto, Akihiro
    IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015, 2016, 9431 : 734 - 748