Unsupervised Gaze Prediction in Egocentric Videos by Energy-based Surprise Modeling

Cited by: 1
Authors
Aakur, Sathyanarayanan N. [1 ]
Bagavathi, Arunkumar [1 ]
Affiliations
[1] Oklahoma State Univ, Dept Comp Sci, Stillwater, OK 74078 USA
Source
VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP | 2021
Funding
National Science Foundation (USA)
Keywords
Unsupervised Gaze Prediction; Egocentric Vision; Temporal Event Segmentation; Pattern Theory;
DOI
10.5220/0010288009350942
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Egocentric perception has grown rapidly with the advent of immersive computing devices. Human gaze prediction is an important problem in analyzing egocentric videos and has primarily been tackled through either saliency-based modeling or highly supervised learning. We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task on unseen, out-of-domain data. We find that their performance is highly dependent on the training data and is restricted to the domains specified in the training annotations. In this work, we tackle the problem of jointly predicting human gaze points and temporal segmentation of egocentric videos without using any training data. We introduce an unsupervised computational model that draws inspiration from cognitive psychology models of event perception. We use Grenander's pattern theory formalism to represent spatio-temporal features and model surprise as a mechanism to predict gaze fixation points. Extensive evaluation on two publicly available datasets, GTEA and GTEA+, shows that the proposed model significantly outperforms all unsupervised baselines as well as some supervised gaze prediction baselines. Finally, we show that the model can also temporally segment egocentric videos with performance comparable to more complex, fully supervised deep learning baselines.
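The abstract describes the mechanism only at a high level. The following minimal Python sketch illustrates the general idea of energy-based surprise driving gaze prediction; it is an illustration under stated assumptions, not the paper's pattern-theory implementation. The previous-frame predictor, the squared-error energy, and all function names and shapes are assumptions introduced here for clarity.

import numpy as np

def surprise_map(pred_feats, obs_feats):
    """Per-location energy: squared prediction error summed over feature channels.
    pred_feats, obs_feats: arrays of shape (H, W, C)."""
    return np.sum((obs_feats - pred_feats) ** 2, axis=-1)  # shape (H, W)

def predict_gaze(frames):
    """Predict one fixation per frame from a (T, H, W, C) feature sequence.
    A naive stand-in predictor is used here (the previous frame's features);
    the paper instead builds its prediction from a pattern-theory representation."""
    fixations = []
    for t in range(1, frames.shape[0]):
        energy = surprise_map(frames[t - 1], frames[t])           # surprise as energy
        y, x = np.unravel_index(np.argmax(energy), energy.shape)  # most surprising location
        fixations.append((int(y), int(x)))
    return fixations

if __name__ == "__main__":
    # Random arrays stand in for per-frame visual features (e.g., optical flow
    # or CNN activations pooled to an H x W grid).
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 24, 32, 8))
    print(predict_gaze(feats))

In this sketch the fixation is simply the argmax of the frame-to-frame surprise map; the paper's full model additionally performs temporal event segmentation, which this sketch does not attempt.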
Pages: 935-942
Page count: 8
References
27 in total
  • [1] A Perceptual Prediction Framework for Self Supervised Event Segmentation
    Aakur, Sathyanarayanan N.
    Sarkar, Sudeep
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1197 - 1206
  • [2] GENERATING OPEN WORLD DESCRIPTIONS OF VIDEO USING COMMON SENSE KNOWLEDGE IN A PATTERN THEORY FRAMEWORK
    Aakur, Sathyanarayanan N.
    De Souza, Fillipe D. M.
    Sarkar, Sudeep
    [J]. QUARTERLY OF APPLIED MATHEMATICS, 2019, 77 (02) : 323 - 356
  • [3] [Anonymous], 2006, P ADV NEUR INF PROC, DOI 10.7551/MITPRESS/7503.001.0001
  • [4] Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation
    Brox, Thomas
    Malik, Jitendra
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (03) : 500 - 513
  • [5] Fathi A, 2012, LECT NOTES COMPUT SC, V7572, P314, DOI 10.1007/978-3-642-33718-5_23
  • [6] Grenander U., 1996, Elements of Pattern Theory
  • [7] Novelty biases attention and gaze in a surprise trial
    Horstmann, Gernot
    Herwig, Arvid
    [J]. ATTENTION PERCEPTION & PSYCHOPHYSICS, 2016, 78 (01) : 69 - 77
  • [8] Surprise attracts the eyes and binds the gaze
    Horstmann, Gernot
    Herwig, Arvid
    [J]. PSYCHONOMIC BULLETIN & REVIEW, 2015, 22 (03) : 743 - 749
  • [9] Image Signature: Highlighting Sparse Salient Regions
    Hou, Xiaodi
    Harel, Jonathan
    Koch, Christof
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 194 - 201
  • [10] Using gaze patterns to predict task intent in collaboration
    Huang, Chien-Ming
    Andrist, Sean
    Sauppe, Allison
    Mutlu, Bilge
    [J]. FRONTIERS IN PSYCHOLOGY, 2015, 6