Unified Image and Video Saliency Modeling

Cited by: 88
Authors
Droste, Richard [1]
Jiao, Jianbo [1]
Noble, J. Alison [1]
Affiliations
[1] Univ Oxford, Oxford, England
Source
COMPUTER VISION - ECCV 2020, PT V | 2020 / Vol. 12350
Funding
UK Engineering and Physical Sciences Research Council
Keywords
Visual saliency; Video saliency; Domain adaptation; SPATIOTEMPORAL SALIENCY; VISUAL-ATTENTION; PREDICT;
DOI
10.1007/978-3-030-58558-7_25
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data, and between different video saliency datasets, as a key challenge for effective joint modeling. To address this, we propose four novel domain adaptation techniques (Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing, and Bypass-RNN), in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art for image saliency datasets, despite faster runtime and a 5- to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies which confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal.
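The abstract mentions learned Gaussian priors that are kept domain-specific (Domain-Adaptive Priors): each dataset retains its own prior parameters while the rest of the network is shared. The paper's exact formulation is not reproduced in this record; the sketch below only illustrates the general idea of rendering a per-domain Gaussian prior map from a small set of parameters. The function name, the parameter values, and the per-dataset dictionary are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_prior_map(h, w, mu, sigma):
    """Render a 2D Gaussian prior over an h x w grid.
    mu and sigma are (y, x) pairs in normalized [0, 1] coordinates."""
    ys = np.linspace(0.0, 1.0, h)[:, None]   # column vector of row coords
    xs = np.linspace(0.0, 1.0, w)[None, :]   # row vector of column coords
    return np.exp(-((ys - mu[0]) ** 2 / (2 * sigma[0] ** 2)
                    + (xs - mu[1]) ** 2 / (2 * sigma[1] ** 2)))

# Hypothetical per-domain prior parameters; in a joint model these would be
# learned per dataset while the encoder-RNN-decoder backbone stays shared.
domain_priors = {
    "SALICON": {"mu": (0.5, 0.5), "sigma": (0.25, 0.30)},
    "DHF1K":   {"mu": (0.5, 0.5), "sigma": (0.20, 0.35)},
}

prior = gaussian_prior_map(9, 16, **domain_priors["DHF1K"])
print(prior.shape)  # (9, 16), peaking near the map center
```

At inference, the prior map for the active domain would be combined with the network's feature-derived saliency prediction; swapping the dictionary entry is all that changes between domains.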
Pages: 419-435
Page count: 17