Detecting gaze towards eyes in natural social interactions and its use in child assessment

Cited by: 31
Authors
Chong, Eunji [1]
Chanda, Katha [1]
Ye, Zhefan [2]
Southerland, Audrey [1]
Ruiz, Nataniel [1]
Jones, Rebecca [1]
Rozga, Agata [1]
Rehg, James [3]
Affiliations
[1] Center for Behavioral Imaging and School of Interactive Computing, College of Computing, Georgia Institute of Technology, United States
[2] Computer Science and Engineering, University of Michigan, United States
[3] Weill Cornell Medicine, Center for Autism and the Developing Brain, United States
Keywords
Computer vision - Diseases - Risk assessment - Automation - Behavioral research - Cameras - Wearable technology
DOI
10.1145/3131902
Abstract
Eye contact is a crucial element of non-verbal communication that signifies interest, attention, and participation in social interactions. As a result, measures of eye contact arise in a variety of applications such as the assessment of the social communication skills of children at risk for developmental disorders such as autism, or the analysis of turn-taking and social roles during group meetings. However, the automated measurement of visual attention during naturalistic social interactions is challenging due to the difficulty of estimating a subject's looking direction from video. This paper proposes a novel approach to eye contact detection during adult-child social interactions in which the adult wears a point-of-view camera which captures an egocentric view of the child's behavior. By analyzing the child's face regions and inferring their head pose we can accurately identify the onset and duration of the child's looks to their social partner's eyes. We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating the head pose. We present a fully automated system for eye contact detection that solves the sub-problems of end-to-end feature learning and pose estimation using deep neural networks. To train our models, we use a dataset comprising 22 hours of 156 play session videos from over 100 children, half of whom are diagnosed with Autism Spectrum Disorder. We report an overall precision of 0.76, recall of 0.80, and an area under the precision-recall curve of 0.79, all of which are significant improvements over existing methods. © 2017 ACM.