IENet: inheritance enhancement network for video salient object detection

Times Cited: 0
Authors
Jiang, Tao [1 ]
Wang, Yi [2 ]
Hou, Feng [1 ]
Wang, Ruili [1 ]
Affiliations
[1] Massey Univ, Sch Math & Computat Sci, Auckland 0632, New Zealand
[2] Dalian Univ Technol DUT, RU Int Sch Informat Sci & Engn, Dalian 116000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video salient object detection; Feature fusion; Visual transformer; Frame-aware temporal relationships; OPTIMIZATION; CUES;
D O I
10.1007/s11042-024-18408-4
CLC (Chinese Library Classification)
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
Effective utilization of spatiotemporal information is essential for improving the accuracy and robustness of Video Salient Object Detection (V-SOD). However, current methods do not fully exploit historical frame information, resulting in insufficient integration of complementary semantic information. To address this issue, we propose a novel Transformer-based Inheritance Enhancement Network (IENet). The core of IENet is a Heritable Multi-Frame Attention (HMA) module, which fully exploits long-term context and frame-aware temporal modeling during feature extraction through unidirectional cross-frame enhancement. In contrast to existing methods, our heritable strategy is based on a unidirectional inheritance model using attention maps, which ensures that information propagation for each frame is consistent and orderly, avoiding additional interference. Furthermore, we propose an auxiliary attention loss that uses the inherited attention maps to direct the network to focus more on target regions. Experimental results on five popular benchmark datasets demonstrate the effectiveness of IENet in handling challenging scenes. For instance, on VOS and DAVSOD, our method achieves MAE scores of 0.042 and 0.070, respectively, outperforming other competitive models. In particular, IENet excels at inheriting finer details from historical frames even in complex environments. The module and predicted maps are publicly available at https://github.com/TOMMYWHY/IENet.
Pages: 72007-72026
Number of pages: 20