SATSal: A Multi-Level Self-Attention Based Architecture for Visual Saliency Prediction

Cited by: 16
Authors
Tliba, Marouane [1 ]
Kerkouri, Mohamed A. [1 ]
Ghariba, Bashir [2 ]
Chetouani, Aladine [1 ]
Çöltekin, Arzu [3 ]
Shehata, Mohamed [4 ]
Bruno, Alessandro [5 ]
Affiliations
[1] Univ Orleans, Lab PRISME, F-45067 Orleans, France
[2] Elmergib Univ, Fac Engn, Dept Elect & Comp Engn, Khoms, Libya
[3] Univ Appl Sci & Arts Northwestern Switzerland, Inst Interact Technol, CH-4132 Windisch, Switzerland
[4] Univ British Columbia, Dept Comp Sci, Kelowna, BC V6T 1Z4, Canada
[5] Bournemouth Univ, Fac Sci & Technol, Dept Comp & Informat, Poole BH12 5BB, Dorset, England
Keywords
Visualization; Feature extraction; Computational modeling; Predictive models; Task analysis; Semantics; Mathematical models; Eye movements; low and high vision; saliency prediction; self-attention; visual attention; IMAGE CLASSIFICATION; EYE-MOVEMENTS; BOTTOM-UP; SCENE
DOI
10.1109/ACCESS.2022.3152189
CLC Classification
TP [Automation and computer technology]
Subject Classification Code
0812
Abstract
Human visual attention modelling is a persistent interdisciplinary research challenge that has gained new interest in recent years, mainly due to the latest developments in deep learning. This is particularly evident in saliency benchmarks. Novel deep learning-based visual saliency models show promising results in capturing high-level (top-down) human visual attention processes, and thus differ strongly from earlier approaches, which were mainly characterised by low-level (bottom-up) visual features. These developments account for innate human selectivity mechanisms that rely on both high- and low-level factors, and the two factors interact with each other. Motivated by the importance of these interactions, in this work we tackle visual saliency modelling holistically, examining whether both the high- and low-level features that govern human attention can be considered jointly. Specifically, we propose a novel method, SAtSal (Self-Attention Saliency). SAtSal leverages both high- and low-level features through a multilevel merging of skip connections during the decoding stage. We incorporate convolutional self-attention modules on the skip connections from the encoder to the decoder network to properly integrate the valuable signals from multilevel spatial features. The self-attention modules thus learn to filter the latent representation of the salient regions out from irrelevant information, in an embedded and joint manner with the main encoder-decoder backbone. Finally, we evaluate SAtSal against various existing solutions to validate our approach, using the well-known standard saliency benchmark MIT300. To further examine SAtSal's robustness on other image types, we also evaluate it on the Le Meur saliency painting benchmark.
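The abstract's core mechanism — attention modules that gate the encoder-to-decoder skip connections so that salient-region features pass through and irrelevant signals are suppressed — can be sketched roughly as follows. This is a minimal NumPy illustration only, not the authors' implementation: an additive attention-gate in the spirit of Attention U-Net stands in for the paper's convolutional self-attention modules, and all shapes, weight matrices, and the sigmoid gating are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_skip(skip, decoder, w_s, w_d, w_a):
    """Gate an encoder skip connection with a learned spatial attention map.

    skip:    encoder features at one level, shape (C, H, W)
    decoder: decoder features at the same resolution, shape (C, H, W)
    w_s, w_d: (C_mid, C) weights, acting as 1x1 convolutions
    w_a:     (1, C_mid) weights projecting to a single attention channel
    """
    # Project both streams to a shared intermediate space (1x1 conv = per-pixel linear map).
    s = np.tensordot(w_s, skip, axes=([1], [0]))       # (C_mid, H, W)
    d = np.tensordot(w_d, decoder, axes=([1], [0]))    # (C_mid, H, W)
    e = np.maximum(s + d, 0.0)                         # ReLU of the combined signal
    # Per-pixel attention map in (0, 1): 1 keeps the skip feature, 0 suppresses it.
    a = sigmoid(np.tensordot(w_a, e, axes=([1], [0]))) # (1, H, W), broadcast over channels
    return skip * a

rng = np.random.default_rng(0)
C, C_mid, H, W = 8, 4, 16, 16
skip = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
w_s = rng.standard_normal((C_mid, C)) * 0.1
w_d = rng.standard_normal((C_mid, C)) * 0.1
w_a = rng.standard_normal((1, C_mid)) * 0.1

gated = gated_skip(skip, dec, w_s, w_d, w_a)
print(gated.shape)  # (8, 16, 16) — same shape as the skip features, ready to merge in the decoder
```

In a full model, one such gated skip would feed each decoder stage, so that the multilevel merging described in the abstract combines already-filtered features; here the gate is trained jointly with the backbone, whereas the sketch uses fixed random weights.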
Pages: 20701-20713 (13 pages)
Related Papers (50 total)
  • [1] Multi-level Net: A Visual Saliency Prediction Model
    Cornia, Marcella
    Baraldi, Lorenzo
    Serra, Giuseppe
    Cucchiara, Rita
    COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 : 302 - 315
  • [2] Spatiotemporal module for video saliency prediction based on self-attention
    Wang, Yuhao
    Liu, Zhuoran
    Xia, Yibo
    Zhu, Chunbo
    Zhao, Danpei
    IMAGE AND VISION COMPUTING, 2021, 112
  • [3] Transformer-based multi-level attention integration network for video saliency prediction
    Tan, Rui
    Sun, Minghui
    Liang, Yanhua
    Multimedia Tools and Applications, 2025, 84 (13) : 11833 - 11854
  • [4] Chinese Entity Relation Extraction Based on Multi-level Gated Recurrent Mechanism and Self-attention
    Zhong, Zicheng
    PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,
  • [5] A Deep Multi-Level Network for Saliency Prediction
    Cornia, Marcella
    Baraldi, Lorenzo
    Serra, Giuseppe
    Cucchiara, Rita
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3488 - 3493
  • [6] Retinal blood vessel segmentation and inpainting networks with multi-level self-attention
    Golias, Matus
    Sikudova, Elena
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 102
  • [7] Cascaded feature fusion with multi-level self-attention mechanism for object detection
    Wang, Chuanxu
    Wang, Huiru
    PATTERN RECOGNITION, 2023, 138
  • [8] Spatio-Temporal Self-Attention Network for Video Saliency Prediction
    Wang, Ziqiang
    Liu, Zhi
    Li, Gongyang
    Wang, Yang
    Zhang, Tianhong
    Xu, Lihua
    Wang, Jijun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1161 - 1174
  • [9] Multi-level feature fusion capsule network with self-attention for facial expression recognition
    Huang, Zhiji
    Yu, Songsen
    Liang, Jun
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (02)
  • [10] Multi-Type Self-Attention Guided Degraded Saliency Detection
    Zhou, Ziqi
    Wang, Zheng
    Lu, Huchuan
    Wang, Song
    Sun, Meijun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13082 - 13089