DevsNet: Deep Video Saliency Network using Short-term and Long-term Cues

Cited: 9
Authors
Fang, Yuming [1 ]
Zhang, Chi [1 ]
Min, Xiongkuo [2 ]
Huang, Hanqin [1 ]
Yi, Yugen [3 ]
Zhai, Guangtao [2 ]
Lin, Chia-Wen [4 ]
Affiliations
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330032, Jiangxi, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[3] Jiangxi Normal Univ, Sch Software, Nanchang, Jiangxi, Peoples R China
[4] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 30013, Taiwan
Keywords
Video saliency detection; Spatiotemporal saliency; 3D convolution network (3D-ConvNet); Bidirectional convolutional long-short term memory network (B-ConvLSTM); VISUAL-ATTENTION; OBJECT DETECTION; OPTIMIZATION; SEGMENTATION; FUSION; TREE;
DOI
10.1016/j.patcog.2020.107294
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, various saliency detection methods based on deep learning techniques have been proposed for still images. However, research on saliency detection for video sequences remains limited. In this study, we introduce a novel deep learning framework for video saliency detection, named Deep Video Saliency Network (DevsNet). DevsNet consists of two main components: a 3D Convolutional Network (3D-ConvNet) and a Bidirectional Convolutional Long Short-Term Memory network (B-ConvLSTM). The 3D-ConvNet is constructed to learn short-term spatiotemporal information, while long-term spatiotemporal features are learned by the B-ConvLSTM. The designed B-ConvLSTM extracts temporal information not only from previous video frames but also from subsequent ones, so the proposed model considers both forward and backward temporal information. By combining short-term and long-term spatiotemporal cues, DevsNet extracts saliency information for video sequences effectively and efficiently. Extensive experiments show that the proposed model outperforms state-of-the-art models in spatiotemporal saliency prediction. (C) 2020 Elsevier Ltd. All rights reserved.
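The abstract describes a two-stage architecture: 3D convolutions capture short-term spatiotemporal features from a clip, and a bidirectional ConvLSTM aggregates long-term context in both temporal directions. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the class names (ConvLSTMCell, DevsNetSketch), layer widths, and the single-cell-per-direction scheme are all illustrative assumptions.

```python
# Minimal sketch of a 3D-ConvNet + bidirectional ConvLSTM pipeline as
# described in the abstract. Layer sizes and structure are assumptions,
# not the published DevsNet architecture.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell: the four LSTM gates come from a 2D convolution."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g
        h = o * c.tanh()
        return h, c


class DevsNetSketch(nn.Module):
    def __init__(self, hid_ch=32):
        super().__init__()
        # Short-term branch: 3D convolutions over a short clip.
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, hid_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hid_ch, hid_ch, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Long-term branch: one ConvLSTM per temporal direction.
        self.fwd = ConvLSTMCell(hid_ch, hid_ch)
        self.bwd = ConvLSTMCell(hid_ch, hid_ch)
        # Fuse both directions into a per-frame saliency map.
        self.head = nn.Conv2d(2 * hid_ch, 1, kernel_size=1)

    def _run(self, cell, feats):
        # Unroll a ConvLSTM cell over a list of per-frame feature maps.
        b, _, h, w = feats[0].shape
        state = (feats[0].new_zeros(b, cell.hid_ch, h, w),) * 2
        outs = []
        for x in feats:
            state = cell(x, state)
            outs.append(state[0])
        return outs

    def forward(self, clip):  # clip: (B, 3, T, H, W)
        feats = self.conv3d(clip)                        # (B, C, T, H, W)
        frames = list(feats.unbind(dim=2))               # T maps of (B, C, H, W)
        f_out = self._run(self.fwd, frames)              # forward in time
        b_out = self._run(self.bwd, frames[::-1])[::-1]  # backward in time
        maps = [self.head(torch.cat([f, b], dim=1)).sigmoid()
                for f, b in zip(f_out, b_out)]
        return torch.stack(maps, dim=2)                  # (B, 1, T, H, W)
```

As a usage check, `DevsNetSketch()(torch.rand(2, 3, 8, 64, 64))` yields per-frame saliency maps of shape (2, 1, 8, 64, 64). The backward unroll over the reversed frame list is what mirrors the abstract's claim that temporal cues come from subsequent frames as well as previous ones.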
Pages: 11