Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Cited by: 2
Authors
Zhang, Yunzuo [1]
Zhang, Tian [1]
Wu, Cunyu [1]
Zheng, Yuxin [1]
Affiliations
[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;
DOI
10.1016/j.imavis.2023.104744
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream method for Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and also lack focus on past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional Network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information on top of the existing flow of deep semantic information. Then, in contrast to simple addition and concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns the fusion weights of adjacent features to integrate them appropriately. Moreover, to utilize previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into the 3D convolution-based method for the first time. It subtly combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
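To make the adaptive-fusion idea from the abstract concrete (learning weights to blend adjacent hierarchical features instead of plain addition or concatenation), the sketch below shows a minimal PyTorch-style module. This is an illustrative assumption only: the class name AdaptiveFusion, the channel-wise sigmoid gating, and the tensor shapes are hypothetical and do not reproduce the authors' HAF implementation.

```python
# Minimal sketch of adaptive fusion of two adjacent 3D feature maps.
# NOT the authors' HAF code; gating design and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Blend a deep (semantic) and a shallow (location) feature with learned weights."""
    def __init__(self, channels):
        super().__init__()
        # Predict per-channel fusion weights from the concatenated pair.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, deep, shallow):
        # Upsample the coarser deep feature to the shallow feature's (T, H, W) size.
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="trilinear", align_corners=False)
        w = self.gate(torch.cat([deep, shallow], dim=1))  # (N, C, 1, 1, 1)
        # Weighted blend: w favors semantic context, (1 - w) favors location detail.
        return w * deep + (1.0 - w) * shallow

# Usage example with illustrative shapes (N, C, T, H, W).
fuse = AdaptiveFusion(channels=64)
deep = torch.randn(2, 64, 4, 14, 14)
shallow = torch.randn(2, 64, 8, 28, 28)
out = fuse(deep, shallow)  # -> (2, 64, 8, 28, 28)
```

The gate produces per-channel weights from the pooled concatenation, so the blend can favor deep semantic context in some channels and shallow location detail in others, rather than treating all channels identically as simple addition would.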
Pages: 12