Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

Cited by: 54
Authors
Hasani, Behzad [1]
Mahoor, Mohammad H. [1]
Affiliation
[1] Univ Denver, Dept Elect & Comp Engn, Denver, CO 80208 USA
Source
2017 12TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2017) | 2017
DOI
10.1109/FG.2017.99
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated Facial Expression Recognition (FER) has been a challenging task for decades. Many existing works use hand-crafted features such as LBP, HOG, LPQ, and Histogram of Optical Flow (HOF), combined with classifiers such as Support Vector Machines, for expression recognition. These methods often require rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition. In this paper, we propose a two-part network consisting of a DNN-based architecture followed by a Conditional Random Field (CRF) module for facial expression recognition in videos. The first part captures the spatial relations within facial images using convolutional layers followed by three Inception-ResNet modules and two fully connected layers. To capture the temporal relations between image frames, the second part of the network uses a linear-chain CRF. We evaluate the proposed network on three publicly available databases: CK+, MMI, and FERA. Experiments are performed in subject-independent and cross-database settings. Our experimental results show that cascading the deep network with the CRF module considerably improves the recognition of facial expressions in videos. In particular, it outperforms state-of-the-art methods in the cross-database experiments and yields comparable results in the subject-independent experiments.
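The temporal stage described in the abstract can be illustrated with a minimal sketch of Viterbi decoding for a linear-chain CRF. This is not the authors' implementation: the function name viterbi_decode, the two-label setup, and all score values are hypothetical and chosen only for illustration. The sketch assumes that a spatial network has already produced one score per expression label for each frame, and that a label-to-label transition score matrix has been learned separately.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][k]   unary score of label k at frame t (e.g. network outputs)
    transitions[j][k] pairwise score for moving from label j to label k
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each label
    backpointers = []            # backpointers[t-1][k]: best previous label
    for frame in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda j: scores[j] + transitions[j][k])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + frame[k])
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    path = [max(range(num_labels), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: 3 frames, 2 labels; transitions favour keeping the same label.
frame_scores = [[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]]
label_transitions = [[1.0, -1.0], [-1.0, 1.0]]
print(viterbi_decode(frame_scores, label_transitions))  # [0, 0, 0]
```

In this toy input, classifying each frame independently would give labels 0, 1, 0, whereas the chain model smooths the weakly supported middle frame to label 0. That is the kind of temporal consistency a CRF layer adds on top of per-frame predictions.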
Pages: 790-795
Number of pages: 6