Attention Embedded Spatio-Temporal Network for Video Salient Object Detection

Cited by: 7
Authors
Huang, Lili [1]
Yan, Pengxiang [1]
Li, Guanbin [1]
Wang, Qing [1]
Lin, Liang [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video salient object detection; spatiotemporal modeling; deep learning; representation learning;
DOI
10.1109/ACCESS.2019.2953046
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The main challenge in video salient object detection is how to model object motion and dramatic changes in appearance contrast. In this work, we propose an attention embedded spatio-temporal network (ASTN) to adaptively exploit diverse factors that influence dynamic saliency prediction within a unified framework. To compensate for object movement, we introduce a flow-guided spatial learning (FGSL) module to directly capture effective motion information in the form of attention based on optical flows. However, optical flow represents the motion information of all moving objects, including movements of non-salient objects caused by large camera motion and subtle changes in background. Therefore, using the flow-guided attention map alone causes the spatial saliency to be influenced by all moving objects rather than just the salient objects, resulting in unstable and temporally inconsistent saliency maps. To further enhance the temporal coherence, we develop an attentive bidirectional gated recurrent unit (AB-GRU) module to adaptively exploit sequential feature evolution. With this AB-GRU, we can further refine the spatio-temporal feature representation by incorporating an accommodative attention mechanism. Experimental results demonstrate that our model achieves superior empirical performance on video salient object detection. Moreover, an experiment on the extended application to unsupervised video object segmentation further demonstrates the generalization ability and stability of our proposed method.
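As a rough illustration of the flow-guided attention idea described in the abstract, the sketch below turns an optical-flow field into a spatial attention map and uses it to reweight a feature map. It is not the paper's FGSL module: the function name `flow_guided_attention`, the use of flow magnitude with a spatial softmax, and the residual reweighting are all assumptions made for this example, standing in for whatever learned layers the authors use.

```python
import numpy as np


def flow_guided_attention(features, flow):
    """Reweight features with an attention map derived from optical flow.

    features: array of shape (C, H, W), e.g. per-frame appearance features.
    flow:     array of shape (2, H, W), horizontal and vertical displacement.
    Returns (attended_features, attention) with shapes (C, H, W) and (H, W).

    Hypothetical stand-in: a real model would learn the attention from the
    flow with convolutional layers instead of using the raw magnitude.
    """
    # Per-pixel motion strength.
    magnitude = np.sqrt((flow ** 2).sum(axis=0))

    # Spatial softmax over all pixels (shifted by the max for stability),
    # so the attention map sums to 1.
    weights = np.exp(magnitude - magnitude.max())
    attention = weights / weights.sum()

    # Residual reweighting: static regions keep their original features,
    # moving regions are amplified.
    attended = features * (1.0 + attention[None, :, :])
    return attended, attention


if __name__ == "__main__":
    features = np.ones((3, 4, 4))
    flow = np.zeros((2, 4, 4))
    flow[:, 1, 2] = 3.0  # a single moving pixel
    attended, attention = flow_guided_attention(features, flow)
    print(attention.round(3))
```

Because the map responds to every moving pixel, background or camera motion would be amplified as well, which is the limitation the abstract gives as the motivation for the AB-GRU refinement stage.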
Pages: 166203-166213
Number of pages: 11