Transformer-based multi-level attention integration network for video saliency prediction

Cited by: 0

Authors:

Rui Tan [1]

Minghui Sun [3]

Yanhua Liang [2]

Affiliations:

[1] Jilin University, Software College

[2] Jilin University, College of Computer Science and Technology

[3] Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education

Source:

Multimedia Tools and Applications | 2025 / Vol. 84 / No. 13

Keywords:

Video saliency prediction; Transformer; Spatio-temporal feature; Self-attention

DOI:

10.1007/s11042-024-19404-4

Abstract:

Most existing video saliency prediction models rely heavily on 3D convolutions to extract spatio-temporal features. However, 3D convolution has a local receptive field and may therefore struggle to capture long-range spatio-temporal dependencies. To address this shortcoming, this paper introduces the Transformer-based Multi-level Attention Integration Network (TMAI-Net) for video saliency prediction. TMAI-Net is a two-stream encoder-decoder model that integrates multi-level features of semantic information. It incorporates a Multi-level Interactive Attention (MLIA) module and a Transformer, both based on the self-attention mechanism and placed at different levels of the model to capture long-range spatio-temporal feature dependencies. In addition, the model operates on input video frames and attentional patches, which allows the Transformer module to capture structural similarities between related objects in global features and attention features. This in turn enables the model to allocate more attention to salient areas. The approach is validated through extensive experiments on three widely recognized benchmark datasets.
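
The contrast the abstract draws between a local 3D-convolution receptive field and self-attention can be illustrated with a minimal NumPy sketch. This is not the TMAI-Net implementation: the clip dimensions, the random projection matrices, and the helper name `self_attention` are all assumptions chosen for illustration. It flattens a small spatio-temporal feature volume into tokens and applies single-head scaled dot-product self-attention, so every output position is a weighted mix of all positions in the clip, whereas a single 3x3x3 convolution would only see 27 neighbours.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative helper)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)

# Hypothetical feature volume: frames, height, width, channels.
T, H, W, C = 4, 6, 6, 8
features = rng.standard_normal((T, H, W, C))

# One token per spatio-temporal position.
tokens = features.reshape(T * H * W, C)

# Random projections stand in for learned weights (assumption).
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)

# Each of the T*H*W output tokens has a nonzero weight on every position,
# so its receptive field spans the whole clip.
print(out.shape, weights.shape)
```

The attention matrix has one row and one column per spatio-temporal position, which is why its cost grows quadratically with the number of tokens; practical models typically attend over downsampled features or patches rather than raw pixels.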

Pages: 11833-11854

Page count: 21

50 items in total

[1] Transformer-Based Multi-Scale Feature Integration Network for Video Saliency Prediction
Zhou, Xiaofei
Wu, Songhe
Shi, Ran
Zheng, Bolun
Wang, Shuai
Yin, Haibing
Zhang, Jiyong
Yan, Chenggang
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7696 - 7707
[2] TransVPR: Transformer-Based Place Recognition with Multi-Level Attention Aggregation
Wang, Ruotong
Shen, Yanqing
Zuo, Weiliang
Zhou, Sanping
Zheng, Nanning
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022 : 13638 - 13647
[3] A Deep Multi-Level Network for Saliency Prediction
Cornia, Marcella
Baraldi, Lorenzo
Serra, Giuseppe
Cucchiara, Rita
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016 : 3488 - 3493
[4] Transformer-based attention network for stock movement prediction
Zhang, Qiuyue
Qin, Chao
Zhang, Yunfeng
Bao, Fangxun
Zhang, Caiming
Liu, Peide
EXPERT SYSTEMS WITH APPLICATIONS, 2022, 202
[5] Multi-Level Transformer-Based Social Relation Recognition
Wang, Yuchen
Qing, Linbo
Wang, Zhengyong
Cheng, Yongqiang
Peng, Yonghong
SENSORS, 2022, 22 (15)
[6] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
Zhuang, Xuqiang
Liu, Fangai
Hou, Jian
Hao, Jianhua
Cai, Xiaohong
NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1943 - 1960
[7] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
Xuqiang Zhuang
Fangai Liu
Jian Hou
Jianhua Hao
Xiaohong Cai
Neural Processing Letters, 2022, 54 : 1943 - 1960
[8] SATSal: A Multi-Level Self-Attention Based Architecture for Visual Saliency Prediction
Tliba, Marouane
Kerkouri, Mohamed A.
Ghariba, Bashir
Chetouani, Aladine
Coeltekin, Arzu
Shehata, Mohamed
Bruno, Alessandro
IEEE ACCESS, 2022, 10 : 20701 - 20713
[9] Multi-Level Model for Video Saliency Detection
Bi, Hongbo
Lu, Di
Li, Ning
Yang, Lina
Guan, Huaping
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019 : 4654 - 4658
[10] A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining
Shi, Bowen
Jiang, Dongsheng
Zhang, Xiaopeng
Li, Han
Dai, Wenrui
Zou, Junni
Xiong, Hongkai
Tian, Qi
COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 624 - 639
