DeepGaze III: Modeling free-viewing human scanpaths with deep learning

Cited by: 50
Authors
Kuemmerer, Matthias [1]
Bethge, Matthias [1]
Wallis, Thomas S. A. [2, 3]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] Tech Univ Darmstadt, Inst Psychol, Darmstadt, Germany
[3] Tech Univ Darmstadt, Ctr Cognit Sci, Darmstadt, Germany
Keywords
CONFIDENCE-INTERVALS; SUPERIOR COLLICULUS; EYE-MOVEMENTS; ATTENTION; LOCATIONS; BIASES
DOI
10.1167/jov.22.5.7
Chinese Library Classification (CLC) number
R77 [Ophthalmology]
Subject classification code
100212
Abstract
Humans typically move their eyes in "scanpaths" of fixations linked by saccades. Here we present DeepGaze III, a new model that predicts the spatial location of consecutive fixations in a free-viewing scanpath over static images. DeepGaze III is a deep learning-based model that combines image information with information about the previous fixation history to predict where a participant might fixate next. As a high-capacity and flexible model, DeepGaze III captures many relevant patterns in the human scanpath data, setting a new state of the art in the MIT300 dataset and thereby providing insight into how much information in scanpaths across observers exists in the first place. We use this insight to assess the importance of mechanisms implemented in simpler, interpretable models for fixation selection. Due to its architecture, DeepGaze III allows us to disentangle several factors that play an important role in fixation selection, such as the interplay of scene content and scanpath history. The modular nature of DeepGaze III allows us to conduct ablation studies, which show that scene content has a stronger effect on fixation selection than previous scanpath history in our main dataset. In addition, we can use the model to identify scenes for which the relative importance of these sources of information differs most. These data-driven insights would be difficult to accomplish with simpler models that do not have the computational capacity to capture such patterns, demonstrating an example of how deep learning advances can be used to contribute to scientific understanding.
Pages: 1-27
Page count: 27