STI-Net: Spatiotemporal integration network for video saliency detection

Cited by: 14
|
Authors
Zhou, Xiaofei [1 ]
Cao, Weipeng [2 ]
Gao, Hanxiao [1 ]
Ming, Zhong [2 ]
Zhang, Jiyong [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatiotemporal saliency; Feature aggregation; Saliency prediction; Saliency fusion; OBJECT DETECTION; FUSION; SEGMENTATION; ATTENTION; FEATURES;
DOI
10.1016/j.ins.2023.01.106
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Image saliency detection has advanced significantly in recent years, whereas video saliency detection has received comparatively little attention. In particular, existing video saliency models tend to fail in videos with challenging scenarios such as fast motion, dynamic backgrounds, and non-rigid deformation. Moreover, applying image saliency models directly to video while ignoring temporal information is inappropriate. To address these issues, this study proposes a novel end-to-end spatiotemporal integration network (STI-Net) for detecting salient objects in videos. Specifically, our method consists of three key steps: feature aggregation, saliency prediction, and saliency fusion, which are performed sequentially to generate spatiotemporal deep feature maps, coarse saliency predictions, and the final saliency map. The key advantage of our model lies in the comprehensive exploration of spatial and temporal information across the entire network: the two kinds of features interact with each other in the feature aggregation step, are used to construct the boundary cue in the saliency prediction step, and serve as the original information in the saliency fusion step. As a result, the generated spatiotemporal deep feature maps characterize the salient objects precisely and completely, and the coarse saliency predictions have well-defined boundaries, which effectively improves the quality of the final saliency map. Furthermore, "shortcut connections" are introduced to make the proposed network easy to train and to obtain accurate results even when the network is deep. Extensive experiments on two publicly available challenging video datasets demonstrate the effectiveness of the proposed model, which achieves performance comparable to state-of-the-art saliency models.
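The sequential three-step pipeline in the abstract (feature aggregation → saliency prediction → saliency fusion, with a shortcut back to the original features) can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names and the element-wise interaction/fusion formulas below are hypothetical stand-ins for the paper's learned convolutional blocks, not the actual STI-Net layers.

```python
import numpy as np

def aggregate(spatial, temporal):
    """Feature aggregation: let spatial and temporal features interact
    (hypothetical element-wise interaction standing in for learned blocks)."""
    return spatial * temporal + spatial + temporal

def predict(features):
    """Saliency prediction: squash features into a coarse saliency map
    with a sigmoid, giving values strictly in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-features))

def fuse(coarse, spatial, temporal):
    """Saliency fusion: combine the coarse map with the original spatial
    and temporal information via a 'shortcut connection'."""
    shortcut = 0.5 * (spatial + temporal)
    return predict(coarse + shortcut)

# Toy single-frame features (in the paper these come from a deep backbone).
h, w = 4, 4
spatial = np.random.rand(h, w)
temporal = np.random.rand(h, w)

# The three steps applied sequentially, as described in the abstract.
final = fuse(predict(aggregate(spatial, temporal)), spatial, temporal)
assert final.shape == (h, w)
```

The sketch only preserves the data flow of the description: the fusion step sees both the coarse prediction and the raw spatiotemporal features, mirroring how the original information re-enters at the final stage.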
Pages: 134 - 147
Number of pages: 14
Related Papers
50 records in total
  • [21] Tong, Yubing; Cheikh, Faouzi Alaya; Guraya, Fahad Fazal Elahi; Konik, Hubert; Tremeau, Alain. A Spatiotemporal Saliency Model for Video Surveillance. COGNITIVE COMPUTATION, 2011, 3(01): 241-263.
  • [22] Bhattacharya, Saumik; Venkatesh, K. Subramanian; Gupta, Sumana. Visual Saliency Detection Using Spatiotemporal Decomposition. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27(04): 1665-1675.
  • [23] Zhang, Ping; Liu, Jingwen; Wang, Xiaoyang; Pu, Tian; Fei, Chun; Guo, Zhengkui. Stereoscopic video saliency detection based on spatiotemporal correlation and depth confidence optimization. NEUROCOMPUTING, 2020, 377: 256-268.
  • [24] Polatsek, Patrik; Benesova, Wanda; Paletta, Lucas; Perko, Roland. Novelty-based Spatiotemporal Saliency Detection for Prediction of Gaze in Egocentric Video. IEEE SIGNAL PROCESSING LETTERS, 2016, 23(03): 394-398.
  • [25] Fang, Yuming; Zhang, Xiaoqiang; Yuan, Feiniu; Imamoglu, Nevrez; Liu, Haiwen. Video saliency detection by gestalt theory. PATTERN RECOGNITION, 2019, 96.
  • [26] Shao, Zhenfeng; Wang, Linggang; Wang, Zhongyuan; Du, Wan; Wu, Wenjing. Saliency-Aware Convolution Neural Network for Ship Detection in Surveillance Video. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30(03): 781-794.
  • [27] Zhou, Xiaofei; Wu, Songhe; Shi, Ran; Zheng, Bolun; Wang, Shuai; Yin, Haibing; Zhang, Jiyong; Yan, Chenggang. Transformer-Based Multi-Scale Feature Integration Network for Video Saliency Prediction. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(12): 7696-7707.
  • [28] Wen, Hongfa; Zhou, Xiaofei; Sun, Yaoqi; Zhang, Jiyong; Yan, Chenggang. Deep fusion based video saliency detection. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 62: 279-285.
  • [29] Ramadan, Hiba; Tairi, Hamid. Pattern mining based video saliency detection. 2017 INTELLIGENT SYSTEMS AND COMPUTER VISION (ISCV), 2017.
  • [30] Chen, Qian; Fu, Keren; Liu, Ze; Chen, Geng; Du, Hongwei; Qiu, Bensheng; Shao, Ling. EF-Net: A novel enhancement and fusion network for RGB-D saliency detection. PATTERN RECOGNITION, 2021, 112.