Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Cited by: 2
Authors
Zhang, Yunzuo [1]
Zhang, Tian [1]
Wu, Cunyu [1]
Zheng, Yuxin [1]
Affiliations
[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;
DOI
10.1016/j.imavis.2023.104744
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream method for Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and also lack focus on past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional Network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information on top of the existing flow of deep semantic information. Then, in contrast to simple addition and concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns the fusion weights of adjacent features to integrate them appropriately. Moreover, to utilize previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into the 3D convolution-based method for the first time. It subtly combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
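To make the adaptive-fusion idea from the abstract concrete (learning weights to blend adjacent hierarchical features instead of plain addition or concatenation), the sketch below shows a minimal PyTorch-style module. This is an illustrative assumption only: the class name AdaptiveFusion, the channel-wise sigmoid gating, and the tensor shapes are hypothetical and do not reproduce the authors' HAF implementation.

```python
# Minimal sketch of adaptive fusion of two adjacent 3D feature maps.
# NOT the authors' HAF code; gating design and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Blend a deep (semantic) and a shallow (location) feature with learned weights."""
    def __init__(self, channels):
        super().__init__()
        # Predict per-channel fusion weights from the concatenated pair.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, deep, shallow):
        # Upsample the coarser deep feature to the shallow feature's (T, H, W) size.
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="trilinear", align_corners=False)
        w = self.gate(torch.cat([deep, shallow], dim=1))  # (N, C, 1, 1, 1)
        # Weighted blend: w favors semantic context, (1 - w) favors location detail.
        return w * deep + (1.0 - w) * shallow

# Usage example with illustrative shapes (N, C, T, H, W).
fuse = AdaptiveFusion(channels=64)
deep = torch.randn(2, 64, 4, 14, 14)
shallow = torch.randn(2, 64, 8, 28, 28)
out = fuse(deep, shallow)  # -> (2, 64, 8, 28, 28)
```

The gate produces per-channel weights from the pooled concatenation, so the blend can favor deep semantic context in some channels and shallow location detail in others, rather than treating all channels identically as simple addition would.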
Pages: 12