STI-Net: Spatiotemporal integration network for video saliency detection

Cited: 14
Authors
Zhou, Xiaofei [1 ]
Cao, Weipeng [2 ]
Gao, Hanxiao [1 ]
Ming, Zhong [2 ]
Zhang, Jiyong [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatiotemporal saliency; Feature aggregation; Saliency prediction; Saliency fusion; OBJECT DETECTION; FUSION; SEGMENTATION; ATTENTION; FEATURES;
DOI
10.1016/j.ins.2023.01.106
CLC Classification Number
TP [automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Image saliency detection, to which much effort has been devoted in recent years, has advanced significantly. In contrast, the community has paid little attention to video saliency detection. In particular, existing video saliency models are very likely to fail in videos with difficult scenarios such as fast motion, dynamic backgrounds, and nonrigid deformation. Furthermore, performing video saliency detection directly with image saliency models that ignore temporal information is inappropriate. To address these issues, this study proposes a novel end-to-end spatiotemporal integration network (STI-Net) for detecting salient objects in videos. Specifically, our method comprises three key steps: feature aggregation, saliency prediction, and saliency fusion, which are applied sequentially to generate spatiotemporal deep feature maps, coarse saliency predictions, and the final saliency map. The key advantage of our model lies in the comprehensive exploration of spatial and temporal information across the entire network, where the two features interact with each other in the feature aggregation step, are used to construct boundary cues in the saliency prediction step, and also serve as the original information in the saliency fusion step. As a result, the generated spatiotemporal deep feature maps can precisely and completely characterize the salient objects, and the coarse saliency predictions have well-defined boundaries, effectively improving the quality of the final saliency map. Furthermore, "shortcut connections" are introduced into our model to make the proposed network easy to train and to obtain accurate results when the network is deep. Extensive experimental results on two publicly available challenging video datasets demonstrate the effectiveness of the proposed model, which achieves performance comparable to state-of-the-art saliency models.
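The abstract's three-step pipeline (feature aggregation → saliency prediction → saliency fusion, with shortcut connections carrying the original spatial and temporal information forward) can be illustrated with a toy NumPy sketch. All operators below (element-wise interaction, a sigmoid read-out, an additive shortcut) are illustrative stand-ins for the paper's learned layers, not its actual architecture:

```python
import numpy as np

def aggregate(spatial, temporal):
    # Step 1, feature aggregation: let spatial and temporal features
    # interact (here a simple element-wise product plus both inputs;
    # the paper's aggregation module is more elaborate).
    return spatial * temporal + spatial + temporal

def predict_coarse(features, boundary):
    # Step 2, saliency prediction: combine aggregated features with a
    # boundary cue, then squash to [0, 1] with a sigmoid.
    logits = features.mean(axis=-1) + boundary
    return 1.0 / (1.0 + np.exp(-logits))

def fuse(coarse, spatial, temporal):
    # Step 3, saliency fusion: refine the coarse prediction using the
    # original spatial/temporal information via a "shortcut connection".
    shortcut = 0.5 * (spatial.mean(axis=-1) + temporal.mean(axis=-1))
    return np.clip(coarse + 0.1 * shortcut, 0.0, 1.0)

# Toy 4x4 frame with 8-channel spatial and temporal feature maps.
rng = np.random.default_rng(0)
spatial = rng.standard_normal((4, 4, 8))
temporal = rng.standard_normal((4, 4, 8))
boundary = rng.standard_normal((4, 4))

feats = aggregate(spatial, temporal)
coarse = predict_coarse(feats, boundary)
final = fuse(coarse, spatial, temporal)
print(final.shape)  # one saliency value per pixel
```

In the real network each step is a trainable convolutional module and the shortcut connections also ease optimization of the deep network; the sketch only shows how information flows through the three stages.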
Pages: 134 - 147
Page count: 14
Related Papers
50 records total
  • [31] Spatiotemporal Saliency Detection in Traffic Surveillance
    Li, Wei
    Setiawan, Dhoni Putra
    Zhao, Hua-An
    2017 INTERNATIONAL CONFERENCE ON CONTROL, ELECTRONICS, RENEWABLE ENERGY AND COMMUNICATIONS (ICCREC), 2017, : 139 - 142
  • [32] DevsNet: Deep Video Saliency Network using Short-term and Long-term Cues
    Fang, Yuming
    Zhang, Chi
    Min, Xiongkuo
    Huang, Hanqin
    Yi, Yugen
    Zhai, Guangtao
    Lin, Chia-Wen
    PATTERN RECOGNITION, 2020, 103 (103)
  • [33] VIDEO SALIENCY INCORPORATING SPATIOTEMPORAL CUES AND UNCERTAINTY WEIGHTING
    Fang, Yuming
    Wang, Zhou
    Lin, Weisi
    2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME 2013), 2013,
  • [34] Video saliency detection based on low-level saliency fusion and saliency-aware geodesic
    Li, Weisheng
    Feng, Siqin
    Guan, Hua-Ping
    Zhan, Ziwei
    Gong, Cheng
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (01)
  • [35] Fast Video Saliency Detection based on Feature Competition
    Yan, Hang
    Xu, Yiling
    Sun, Jun
    Yang, Le
    Zhang, Yunfei
    Huang, Wei
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 74 - 77
  • [36] VIDEO SALIENCY DETECTION BASED ON RANDOM WALK WITH RESTART
    Kim, Jun-Seong
    Kim, Hansang
    Sim, Jae-Young
    Kim, Chang-Su
    Lee, Sang-Uk
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 2465 - 2469
  • [37] Accurate and Robust Video Saliency Detection via Self-Paced Diffusion
    Li, Yunxiao
    Li, Shuai
    Chen, Chenglizhao
    Hao, Aimin
    Qin, Hong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (05) : 1153 - 1167
  • [38] Superpixel-Based Spatiotemporal Saliency Detection
    Liu, Zhi
    Zhang, Xiang
    Luo, Shuhua
    Le Meur, Olivier
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2014, 24 (09) : 1522 - 1540
  • [39] Motion-Aware Rapid Video Saliency Detection
    Guo, Fang
    Wang, Wenguan
    Shen, Ziyi
    Shen, Jianbing
    Shao, Ling
    Tao, Dacheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4887 - 4898
  • [40] Tensor-Based Spatiotemporal Saliency Detection
    Dou, Hao
    Li, Bin
    Deng, Qianqian
    Zhang, Lirui
    Pan, Zhihong
    Tian, Jinwen
    MIPPR 2017: REMOTE SENSING IMAGE PROCESSING, GEOGRAPHIC INFORMATION SYSTEMS, AND OTHER APPLICATIONS, 2018, 10611