Recognizing Visual Focus of Attention From Head Pose in Natural Meetings

Cited by: 100
Authors
Ba, Sileye O. [1, 2]
Odobez, Jean-Marc [1, 2]
Affiliations
[1] IDIAP, Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2009, Vol. 39, No. 1
Keywords
Head pose tracking; hidden Markov model; maximum a posteriori adaptation; meeting analysis; particle filter; visual focus of attention; gaze
DOI
10.1109/TSMCB.2008.927274
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup, the potential VFOA of a person is not restricted to other participants only. It includes environmental targets as well (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread in the pan as well as tilt gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, an unsupervised parameter adaptation step not using any labeled data is proposed, which accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained either using a magnetic sensor device or a vision-based tracking system. The results clearly show that in such complex but realistic situations, the VFOA recognition performance is highly dependent on how well the visual targets are separated for a given meeting participant. In addition, the results show that the use of a geometric model with unsupervised adaptation achieves better results than the use of training data to set the HMM parameters.
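As an illustration of the modeling idea in the abstract, the following is a minimal sketch, not the authors' implementation. It treats each VFOA target as an HMM hidden state with a Gaussian emission over head pose (pan, tilt) and decodes the most likely target sequence with the Viterbi algorithm. The target directions, the head-to-gaze ratio `KAPPA`, the spread `SIGMA`, and the transition probability `STAY_PROB` are all assumed values chosen for the example. The linear mapping in `predicted_head_pose` is a simplified stand-in for the paper's geometric model.

```python
import math

# Hypothetical gaze directions (pan, tilt) in degrees for each VFOA target.
GAZE_TARGETS = {
    "person_left": (-45.0, 0.0),
    "person_right": (45.0, 0.0),
    "table": (0.0, -40.0),
    "screen": (70.0, 10.0),
}
KAPPA = 0.6      # assumed fraction of a gaze shift carried by the head
SIGMA = 10.0     # assumed isotropic standard deviation of head pose (degrees)
STAY_PROB = 0.9  # assumed probability of keeping the same VFOA between frames


def predicted_head_pose(gaze):
    """Simplified stand-in for a geometric model: head pose as a fraction of gaze."""
    return (KAPPA * gaze[0], KAPPA * gaze[1])


def log_gaussian(obs, mean, sigma=SIGMA):
    """Log density of an isotropic 2D Gaussian."""
    d2 = (obs[0] - mean[0]) ** 2 + (obs[1] - mean[1]) ** 2
    return -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)


def viterbi(observations):
    """Return the most likely VFOA target for each head pose observation."""
    targets = list(GAZE_TARGETS)
    means = {t: predicted_head_pose(GAZE_TARGETS[t]) for t in targets}
    n = len(targets)
    log_stay = math.log(STAY_PROB)
    log_switch = math.log((1 - STAY_PROB) / (n - 1))

    # Uniform prior over targets for the first frame.
    score = {t: -math.log(n) + log_gaussian(observations[0], means[t])
             for t in targets}
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for t in targets:
            best_prev = max(
                targets,
                key=lambda p: score[p] + (log_stay if p == t else log_switch),
            )
            trans = log_stay if best_prev == t else log_switch
            new_score[t] = score[best_prev] + trans + log_gaussian(obs, means[t])
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(targets, key=lambda t: score[t])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    poses = [(-26, 1), (-28, -2), (1, -25), (0, -22), (41, 7)]
    print(viterbi(poses))
```

Setting the emission means from a geometric prediction rather than from labeled frames mirrors the abstract's second contribution. The unsupervised adaptation step it also describes is not shown here.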
Pages: 16-33
Page count: 18