Visual Saliency Modeling with Deep Learning: A Comprehensive Review

Times cited: 4
Authors
Abraham, Shilpa Elsa [1]
Kovoor, Binsu C. [1]
Affiliation
[1] Cochin Univ Sci & Technol, Sch Engn, Dept Informat Technol, Kochi 682022, Kerala, India
Keywords
Eye fixation prediction; saliency prediction; salient object detection; multi-modal saliency prediction; deep learning; convolutional neural networks; transformers; OBJECT DETECTION; NETWORK; ATTENTION; REGION; FRAMEWORK; PREDICT
DOI
10.1142/S0219649222500666
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
Visual saliency models mimic the human visual system, predicting the pixel positions on which gaze fixates and capturing the most conspicuous regions in a scene. They have proven effective in several computer vision applications. This paper provides a comprehensive review of recent deep learning-based advances in eye fixation prediction and salient object detection. It also gives an overview of multimodal saliency prediction, which incorporates audio in dynamic scenes. The underlying network architecture and loss function of each model are examined to explain how saliency models work. The survey also investigates the inclusion of specific low-level priors in deep learning-based saliency models. Public datasets and evaluation metrics are briefly introduced. The paper also discusses key issues in saliency modeling, along with open problems and emerging research directions in the field.
Pages: 59