Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network

Cited by: 79
Authors
Gu, Jinwei [1]
Yang, Xiaodong [1]
De Mello, Shalini [1]
Kautz, Jan [1]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
Source
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) | 2017
DOI
10.1109/CVPR.2017.167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Facial analysis in videos, including head pose estimation and facial landmark localization, is key for many applications such as facial animation capture, human activity recognition, and human-computer interaction. In this paper, we propose to use a recurrent neural network (RNN) for joint estimation and tracking of facial features in videos. We are inspired by the fact that the computation performed in an RNN bears resemblance to Bayesian filters, which have been used for tracking in many previous methods for facial analysis from videos. Bayesian filters used in these methods, however, require complicated, problem-specific design and tuning. In contrast, our proposed RNN-based method avoids such tracker-engineering by learning from training data, similar to how a convolutional neural network (CNN) avoids feature-engineering for image classification. As an end-to-end network, the proposed RNN-based method provides a generic and holistic solution for joint estimation and tracking of various types of facial features from consecutive video frames. Extensive experimental results on head pose estimation and facial landmark localization from videos demonstrate that the proposed RNN-based method outperforms frame-wise models and Bayesian filtering. In addition, we create a large-scale synthetic dataset for head pose estimation, with which we achieve state-of-the-art performance on a benchmark dataset.
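The resemblance between RNN computation and Bayesian filtering mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes the simplest case: a linear Kalman filter whose gain `K` is held fixed, compared against a linear RNN with no nonlinearity or bias. The matrices `F`, `H`, and `K` and the function names are hypothetical values chosen only for illustration. Under these assumptions, the filter update `h_t = F h_{t-1} + K (x_t - H F h_{t-1})` rearranges to `h_t = W h_{t-1} + V x_t` with `W = (I - K H) F` and `V = K`, which is the form of a recurrent update.

```python
import numpy as np


def kalman_fixed_gain_step(h_prev, x, F, H, K):
    """One step of a linear Kalman filter with a fixed gain K (an assumption)."""
    h_pred = F @ h_prev                    # predict with the motion model
    return h_pred + K @ (x - H @ h_pred)   # correct with the measurement residual


def linear_rnn_step(h_prev, x, W, V):
    """One step of a linear RNN: h_t = W h_{t-1} + V x_t (no nonlinearity, no bias)."""
    return W @ h_prev + V @ x


# Hypothetical constant-velocity model: state = (position, velocity),
# measurement = position only. These values are illustrative, not from the paper.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])   # measurement matrix
K = np.array([[0.5],
              [0.1]])        # fixed gain, chosen by hand

# Recurrent weights that reproduce the fixed-gain filter exactly.
W = (np.eye(2) - K @ H) @ F
V = K

rng = np.random.default_rng(0)
measurements = rng.normal(size=(20, 1))  # synthetic 1-D measurements

h_kf = np.zeros(2)
h_rnn = np.zeros(2)
for x in measurements:
    h_kf = kalman_fixed_gain_step(h_kf, x, F, H, K)
    h_rnn = linear_rnn_step(h_rnn, x, W, V)

# Largest difference between the two final states (should be at rounding level).
max_abs_diff = float(np.max(np.abs(h_kf - h_rnn)))
```

In the filter, `F`, `H`, and `K` come from a hand-designed model. In the recurrent form, `W` and `V` are free parameters that can be fitted to training data, which is the sense in which the abstract says the RNN avoids tracker-engineering.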
Pages: 1531-1540
Number of pages: 10