Shifting More Attention to Video Salient Object Detection

Cited by: 426
Authors
Fan, Deng-Ping [1 ]
Wang, Wenguan [2 ]
Cheng, Ming-Ming [1 ]
Shen, Jianbing [2 ,3 ]
Affiliations
[1] Nankai Univ, CS, TKLNDST, Tianjin, Peoples R China
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Beijing Inst Technol, Beijing, Peoples R China
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Keywords
DETECTION MODEL; SEGMENTATION; OPTIMIZATION; DRIVEN; SCENE
DOI
10.1109/CVPR.2019.00875
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset representative of real dynamic scenes with high-quality annotations. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames covering diverse realistic scenes, objects, instances, and motions. With corresponding real human eye-fixation data, we obtain precise ground truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., that the video salient object(s) may change dynamically. To further contribute a complete benchmark to the community, we systematically assess 17 representative VSOD algorithms over seven existing VSOD datasets and our DAVSOD, with ~84K frames in total (the largest scale to date). Using three well-known metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model equipped with a saliency-shift-aware convLSTM, which efficiently captures video saliency dynamics by learning human attention-shift behavior. Extensive experiments open up promising future directions for model development and comparison.
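The baseline's saliency-shift-aware convLSTM builds on the standard ConvLSTM recurrence, in which the LSTM's gate activations are computed by convolutions over the input frame features and the hidden state rather than by dense matrix products. The sketch below is a minimal single-channel NumPy illustration of that recurrence only; the naive `conv2d_same` helper and the kernel names are assumptions for demonstration, and the paper's actual module additionally incorporates the saliency-shift-aware attention mechanism:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D filtering (cross-correlation, as in DL convention)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: input/forget/output gates and candidate state,
    each computed from convolutions over the input x and hidden state h."""
    i = sigmoid(conv2d_same(x, kernels['wxi']) + conv2d_same(h, kernels['whi']))
    f = sigmoid(conv2d_same(x, kernels['wxf']) + conv2d_same(h, kernels['whf']))
    o = sigmoid(conv2d_same(x, kernels['wxo']) + conv2d_same(h, kernels['who']))
    g = np.tanh(conv2d_same(x, kernels['wxg']) + conv2d_same(h, kernels['whg']))
    c_next = f * c + i * g          # cell state: gated memory update
    h_next = o * np.tanh(c_next)    # hidden state: gated, squashed cell state
    return h_next, c_next
```

Because the gates are spatial maps rather than vectors, the recurrence preserves the spatial layout of saliency across frames, which is what allows a shift-aware variant to track where the salient object moves over time.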
Pages: 8546-8556
Page count: 11