Unsupervised Gaze Prediction in Egocentric Videos by Energy-based Surprise Modeling

Cited by: 1
Authors
Aakur, Sathyanarayanan N. [1 ]
Bagavathi, Arunkumar [1 ]
Affiliations
[1] Oklahoma State Univ, Dept Comp Sci, Stillwater, OK 74078 USA
Source
VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP | 2021
Funding
National Science Foundation (USA)
Keywords
Unsupervised Gaze Prediction; Egocentric Vision; Temporal Event Segmentation; Pattern Theory;
DOI
10.5220/0010288009350942
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Egocentric perception has grown rapidly with the advent of immersive computing devices. Human gaze prediction is an important problem in analyzing egocentric videos and has primarily been tackled through either saliency-based modeling or highly supervised learning. We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task on unseen, out-of-domain data. We find that their performance is highly dependent on the training data and is restricted to the domains specified in the training annotations. In this work, we tackle the problem of jointly predicting human gaze points and temporal segmentation of egocentric videos without using any training data. We introduce an unsupervised computational model that draws inspiration from cognitive psychology models of event perception. We use Grenander's pattern theory formalism to represent spatio-temporal features and model surprise as a mechanism to predict gaze fixation points. Extensive evaluation on two publicly available datasets, GTEA and GTEA+, shows that the proposed model significantly outperforms all unsupervised baselines as well as some supervised gaze prediction baselines. Finally, we show that the model can also temporally segment egocentric videos with performance comparable to more complex, fully supervised deep learning baselines.
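The abstract describes the mechanism only at a high level. The following minimal Python sketch illustrates the general idea of energy-based surprise driving gaze prediction; it is an illustration under stated assumptions, not the paper's pattern-theory implementation. The previous-frame predictor, the squared-error energy, and all function names and shapes are assumptions introduced here for clarity.

import numpy as np

def surprise_map(pred_feats, obs_feats):
    """Per-location energy: squared prediction error summed over feature channels.
    pred_feats, obs_feats: arrays of shape (H, W, C)."""
    return np.sum((obs_feats - pred_feats) ** 2, axis=-1)  # shape (H, W)

def predict_gaze(frames):
    """Predict one fixation per frame from a (T, H, W, C) feature sequence.
    A naive stand-in predictor is used here (the previous frame's features);
    the paper instead builds its prediction from a pattern-theory representation."""
    fixations = []
    for t in range(1, frames.shape[0]):
        energy = surprise_map(frames[t - 1], frames[t])           # surprise as energy
        y, x = np.unravel_index(np.argmax(energy), energy.shape)  # most surprising location
        fixations.append((int(y), int(x)))
    return fixations

if __name__ == "__main__":
    # Random arrays stand in for per-frame visual features (e.g., optical flow
    # or CNN activations pooled to an H x W grid).
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 24, 32, 8))
    print(predict_gaze(feats))

In this sketch the fixation is simply the argmax of the frame-to-frame surprise map; the paper's full model additionally performs temporal event segmentation, which this sketch does not attempt.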
Pages: 935-942
Page count: 8
References
27 in total
  • [1] A Perceptual Prediction Framework for Self Supervised Event Segmentation
    Aakur, Sathyanarayanan N.
    Sarkar, Sudeep
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1197 - 1206
  • [2] GENERATING OPEN WORLD DESCRIPTIONS OF VIDEO USING COMMON SENSE KNOWLEDGE IN A PATTERN THEORY FRAMEWORK
    Aakur, Sathyanarayanan N.
    De Souza, Fillipe D. M.
    Sarkar, Sudeep
    [J]. QUARTERLY OF APPLIED MATHEMATICS, 2019, 77 (02) : 323 - 356
  • [3] [Anonymous], 2006, P ADV NEUR INF PROC, DOI 10.7551/MITPRESS/7503.001.0001
  • [4] Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation
    Brox, Thomas
    Malik, Jitendra
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (03) : 500 - 513
  • [5] Fathi A, 2012, LECT NOTES COMPUT SC, V7572, P314, DOI 10.1007/978-3-642-33718-5_23
  • [6] Grenander U., 1996, Elements of Pattern Theory
  • [7] Novelty biases attention and gaze in a surprise trial
    Horstmann, Gernot
    Herwig, Arvid
    [J]. ATTENTION PERCEPTION & PSYCHOPHYSICS, 2016, 78 (01) : 69 - 77
  • [8] Surprise attracts the eyes and binds the gaze
    Horstmann, Gernot
    Herwig, Arvid
    [J]. PSYCHONOMIC BULLETIN & REVIEW, 2015, 22 (03) : 743 - 749
  • [9] Image Signature: Highlighting Sparse Salient Regions
    Hou, Xiaodi
    Harel, Jonathan
    Koch, Christof
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 194 - 201
  • [10] Using gaze patterns to predict task intent in collaboration
    Huang, Chien-Ming
    Andrist, Sean
    Sauppe, Allison
    Mutlu, Bilge
    [J]. FRONTIERS IN PSYCHOLOGY, 2015, 6