SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection

Cited by: 44
Authors
Sun, Meijun [1 ]
Zhou, Ziqi [1 ]
Hu, Qinghua [1 ]
Wang, Zheng [2 ]
Jiang, Jianmin [3 ]
Affiliations
[1] Tianjin Univ, Sch Comp Sci, Tianjin 300350, Peoples R China
[2] Tianjin Univ, Sch Software Engn, Tianjin 300350, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Res Inst Future Media Comp, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Eye fixation detection; fully convolutional neural networks; video saliency; visual attention; co-segmentation; selection; framework; gaze
DOI
10.1109/TCYB.2018.2832053
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Data-driven saliency detection has attracted strong interest as a result of applying convolutional neural networks to the detection of eye fixations. Although a number of image-based salient object and fixation detection models have been proposed, video fixation detection still requires further exploration. Unlike image analysis, motion and temporal information are crucial factors affecting human attention when viewing video sequences. Although existing models based on local contrast and low-level features have been extensively researched, they fail to simultaneously consider interframe motion and temporal information across neighboring video frames, leading to unsatisfactory performance on complex scenes. To this end, we propose a novel and efficient video eye fixation detection model that improves saliency detection performance. By simulating the human memory and visual attention mechanisms at work when watching a video, we propose a step-gained fully convolutional network that combines memory information on the time axis with motion information on the space axis while storing the saliency information of the current frame. The model is obtained through hierarchical training, which ensures detection accuracy. Extensive experiments comparing against 11 state-of-the-art methods show that our proposed model outperforms all 11 methods across a number of publicly available datasets.
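The core idea in the abstract, combining per-frame spatial saliency with interframe motion cues and a memory of the previous frame's saliency, can be illustrated with a minimal NumPy sketch. This is a toy simplification for intuition only, not the authors' step-gained FCN: the function name, the fixed convex weights, and the use of the previous fused map as the memory cue are all assumptions of this sketch.

```python
import numpy as np

def fuse_saliency(spatial_map, motion_map, prev_saliency,
                  w_spatial=0.5, w_motion=0.3, w_memory=0.2):
    """Toy fusion of three cues discussed in the abstract:
    per-frame spatial saliency, interframe motion saliency, and the
    previous frame's saliency map acting as a memory cue.
    All inputs are HxW arrays in [0, 1]; the weights sum to 1.
    (Illustrative convex combination, not the paper's learned fusion.)"""
    fused = (w_spatial * spatial_map
             + w_motion * motion_map
             + w_memory * prev_saliency)
    # Rescale to [0, 1] so the map can serve as memory for the next frame.
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused

# Process a short clip: each frame's fused map becomes the memory input
# for the following frame, mimicking propagation along the time axis.
frames_spatial = [np.random.rand(4, 4) for _ in range(3)]
frames_motion = [np.random.rand(4, 4) for _ in range(3)]
memory = np.zeros((4, 4))  # no memory before the first frame
for s, m in zip(frames_spatial, frames_motion):
    memory = fuse_saliency(s, m, memory)
```

In the actual model, all three cues would be feature maps produced and weighted by learned convolutional layers rather than fixed scalars; the sketch only shows the recurrence structure in which the current output is stored and fed forward in time.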
Pages: 2900-2911
Page count: 12