SATSal: A Multi-Level Self-Attention Based Architecture for Visual Saliency Prediction

Cited by: 16
Authors
Tliba, Marouane [1 ]
Kerkouri, Mohamed A. [1 ]
Ghariba, Bashir [2 ]
Chetouani, Aladine [1 ]
Çöltekin, Arzu [3 ]
Shehata, Mohamed [4 ]
Bruno, Alessandro [5 ]
Affiliations
[1] Univ Orleans, Lab PRISME, F-45067 Orleans, France
[2] Elmergib Univ, Fac Engn, Dept Elect & Comp Engn, Khoms, Libya
[3] Univ Appl Sci & Arts Northwestern Switzerland, Inst Interact Technol, CH-4132 Windisch, Switzerland
[4] Univ British Columbia, Dept Comp Sci, Kelowna, BC V6T 1Z4, Canada
[5] Bournemouth Univ, Fac Sci & Technol, Dept Comp & Informat, Poole BH12 5BB, Dorset, England
Keywords
Visualization; Feature extraction; Computational modeling; Predictive models; Task analysis; Semantics; Mathematical models; Eye movements; low and high vision; saliency prediction; self-attention; visual attention; IMAGE CLASSIFICATION; EYE-MOVEMENTS; BOTTOM-UP; SCENE
DOI
10.1109/ACCESS.2022.3152189
CLC Classification
TP [Automation and computer technology]
Subject Classification Code
0812
Abstract
Human visual attention modelling is a persistent interdisciplinary research challenge that has gained new interest in recent years, mainly due to the latest developments in deep learning. This is particularly evident in saliency benchmarks. Novel deep learning-based visual saliency models show promising results in capturing high-level (top-down) human visual attention processes, and thus differ strongly from earlier approaches, which were mainly characterised by low-level (bottom-up) visual features. These developments account for innate human selectivity mechanisms that rely on both high- and low-level factors, and the two factors interact with each other. Motivated by the importance of these interactions, in this work we tackle visual saliency modelling holistically, examining whether both the high- and low-level features that govern human attention can be considered jointly. Specifically, we propose a novel method, SAtSal (Self-Attention Saliency). SAtSal leverages both high- and low-level features through a multilevel merging of skip connections during the decoding stage. We incorporate convolutional self-attention modules on the skip connections from the encoder to the decoder network to properly integrate the valuable signals from multilevel spatial features. The self-attention modules thus learn to filter the latent representation of the salient regions out from irrelevant information, in an embedded and joint manner with the main encoder-decoder backbone. Finally, we evaluate SAtSal against various existing solutions to validate our approach, using the well-known standard saliency benchmark MIT300. To further examine SAtSal's robustness on other image types, we also evaluate it on the Le Meur saliency painting benchmark.
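The abstract's core mechanism — attention modules that gate the encoder-to-decoder skip connections so that salient-region features pass through and irrelevant signals are suppressed — can be sketched roughly as follows. This is a minimal NumPy illustration only, not the authors' implementation: an additive attention-gate in the spirit of Attention U-Net stands in for the paper's convolutional self-attention modules, and all shapes, weight matrices, and the sigmoid gating are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_skip(skip, decoder, w_s, w_d, w_a):
    """Gate an encoder skip connection with a learned spatial attention map.

    skip:    encoder features at one level, shape (C, H, W)
    decoder: decoder features at the same resolution, shape (C, H, W)
    w_s, w_d: (C_mid, C) weights, acting as 1x1 convolutions
    w_a:     (1, C_mid) weights projecting to a single attention channel
    """
    # Project both streams to a shared intermediate space (1x1 conv = per-pixel linear map).
    s = np.tensordot(w_s, skip, axes=([1], [0]))       # (C_mid, H, W)
    d = np.tensordot(w_d, decoder, axes=([1], [0]))    # (C_mid, H, W)
    e = np.maximum(s + d, 0.0)                         # ReLU of the combined signal
    # Per-pixel attention map in (0, 1): 1 keeps the skip feature, 0 suppresses it.
    a = sigmoid(np.tensordot(w_a, e, axes=([1], [0]))) # (1, H, W), broadcast over channels
    return skip * a

rng = np.random.default_rng(0)
C, C_mid, H, W = 8, 4, 16, 16
skip = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
w_s = rng.standard_normal((C_mid, C)) * 0.1
w_d = rng.standard_normal((C_mid, C)) * 0.1
w_a = rng.standard_normal((1, C_mid)) * 0.1

gated = gated_skip(skip, dec, w_s, w_d, w_a)
print(gated.shape)  # (8, 16, 16) — same shape as the skip features, ready to merge in the decoder
```

In a full model, one such gated skip would feed each decoder stage, so that the multilevel merging described in the abstract combines already-filtered features; here the gate is trained jointly with the backbone, whereas the sketch uses fixed random weights.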
Pages: 20701-20713 (13 pages)
Related Papers (50 total)
  • [1] Multi-level Net: A Visual Saliency Prediction Model
    Cornia, Marcella
    Baraldi, Lorenzo
    Serra, Giuseppe
    Cucchiara, Rita
    COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 : 302 - 315
  • [2] Spatiotemporal module for video saliency prediction based on self-attention
    Wang, Yuhao
    Liu, Zhuoran
    Xia, Yibo
    Zhu, Chunbo
    Zhao, Danpei
    IMAGE AND VISION COMPUTING, 2021, 112
  • [3] Transformer-based multi-level attention integration network for video saliency prediction
    Tan, Rui
    Sun, Minghui
    Liang, Yanhua
    Multimedia Tools and Applications, 2025, 84 (13) : 11833 - 11854
  • [4] Chinese Entity Relation Extraction Based on Multi-level Gated Recurrent Mechanism and Self-attention
    Zhong, Zicheng
    PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,
  • [5] A Deep Multi-Level Network for Saliency Prediction
    Cornia, Marcella
    Baraldi, Lorenzo
    Serra, Giuseppe
    Cucchiara, Rita
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3488 - 3493
  • [6] Retinal blood vessel segmentation and inpainting networks with multi-level self-attention
    Golias, Matus
    Sikudova, Elena
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 102
  • [7] Cascaded feature fusion with multi-level self-attention mechanism for object detection
    Wang, Chuanxu
    Wang, Huiru
    PATTERN RECOGNITION, 2023, 138
  • [8] Spatio-Temporal Self-Attention Network for Video Saliency Prediction
    Wang, Ziqiang
    Liu, Zhi
    Li, Gongyang
    Wang, Yang
    Zhang, Tianhong
    Xu, Lihua
    Wang, Jijun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1161 - 1174
  • [9] Multi-level feature fusion capsule network with self-attention for facial expression recognition
    Huang, Zhiji
    Yu, Songsen
    Liang, Jun
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (02)
  • [10] Multi-Type Self-Attention Guided Degraded Saliency Detection
    Zhou, Ziqi
    Wang, Zheng
    Lu, Huchuan
    Wang, Song
    Sun, Meijun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13082 - 13089