RINet: Relative Importance-Aware Network for Fixation Prediction

Cited by: 9
Authors
Song, Yingjie [1 ,2 ]
Liu, Zhi [1 ,2 ]
Li, Gongyang [1 ,2 ,3 ]
Zeng, Dan [1 ,2 ]
Zhang, Tianhong
Xu, Lihua
Wang, Jijun
Affiliations
[1] Shanghai Univ, Joint Int Res Lab Specialty Fiber Opt & Adv Commun, Shanghai Inst Adv Commun & Data Sci, Key Laboratory Specialty Fiber Opt & Opt Access N, Shanghai, Peoples R China
[2] Shanghai Univ, Sch Commun & Informat Engn, Joint Int Res Lab Specialty Fiber Opt & Adv Commu, Key Lab Specialty Fiber Opt & Opt Access Networks, Shanghai 200444, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Med, Shanghai Mental Hlth Ctr, Shanghai Key Lab Psychot Disorders, Shanghai 200030, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fixation prediction; relative importance; self-attention mechanism; complexity-relevant focal loss; ENCODER-DECODER NETWORK; VISUAL-ATTENTION; SALIENCY; QUALITY; MODEL;
DOI
10.1109/TMM.2023.3249481
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Fixation prediction aims to simulate the human visual selection mechanism and estimate the visual saliency of regions in a scene. Semantically rich scenes generally contain multiple salient regions, so a fixation prediction model must understand the relative importance relationship among them, that is, identify which region is more important. In practice, existing fixation prediction models implicitly explore this relationship during end-to-end training, but they do not do so well. In this article, we propose a novel Relative Importance-aware Network (RINet) to explicitly model relative importance in fixation prediction. RINet perceives multi-scale local and global relative importance through the Hierarchical Relative Importance Enhancement (HRIE) module. Within a single-scale subspace, the HRIE module, on the one hand, regards the similarity matrix as the local relative importance map and uses it to weight the input feature; on the other hand, it integrates a set of local relative importance maps into one map, defined as the global relative importance map, to grasp global relative importance. Moreover, we propose a Complexity-Relevant Focal (CRF) loss for network training, which progressively emphasizes learning difficult samples to better handle complicated scenarios and further improve performance. Ablation studies confirm the contributions of the key components of RINet, and extensive experiments on five datasets demonstrate that RINet is superior to 28 relevant state-of-the-art models.
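The two mechanisms the abstract names can be sketched roughly as follows: a self-attention-style similarity matrix used as a local relative importance map to weight the input feature, with the local maps integrated into a single global map, plus a focal-style loss that down-weights easy samples. This is a minimal NumPy illustration, not the paper's implementation: all function names, shapes, and the choice of averaging to integrate local maps are assumptions, and the paper's CRF loss additionally depends on scene complexity, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_importance_weighting(feat):
    """feat: (N, C) array of N spatial positions with C channels.
    Local map: pairwise similarity matrix (self-attention style).
    Global map: local maps integrated (here, simply averaged) into
    one per-position importance weight."""
    sim = softmax(feat @ feat.T / np.sqrt(feat.shape[1]))  # (N, N) local map
    weighted = sim @ feat                # weight input feature by local map
    global_map = sim.mean(axis=0)       # (N,) global relative importance
    return weighted * global_map[:, None], global_map

def focal_bce(pred, target, gamma=2.0, eps=1e-7):
    # focal-style binary cross-entropy: the (1 - p)^gamma / p^gamma factor
    # shrinks the loss of well-predicted (easy) samples, so training
    # emphasizes difficult ones
    p = np.clip(pred, eps, 1 - eps)
    w = np.where(target > 0.5, (1 - p) ** gamma, p ** gamma)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return (w * bce).mean()
```

For example, `focal_bce` gives a confidently correct prediction a much smaller loss than a borderline one, which is the "emphasize difficult samples" behavior the abstract attributes to the CRF loss.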
Pages: 9263-9277
Number of pages: 15