Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning

Cited by: 87
Authors
Zhang, Shiqing [1]
Pan, Xianzhang [1]
Cui, Yueli [1]
Zhao, Xiaoming [1]
Liu, Limei [2]
Affiliations
[1] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou 318000, Peoples R China
[2] Hunan Univ Commerce, Inst Big Data & Internet Innovat, Changsha 410205, Hunan, Peoples R China
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Facial expression recognition; spatio-temporal features; hybrid deep learning; deep convolutional neural networks; deep belief network; HISTORY; MODEL
DOI
10.1109/ACCESS.2019.2901521
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
One key challenge of facial expression recognition (FER) in video sequences is extracting discriminative spatiotemporal features from the facial expression images. In this paper, we propose a new method for FER in video sequences via a hybrid deep learning model. The proposed method first employs two individual deep convolutional neural networks (CNNs), a spatial CNN processing static facial images and a temporal CNN processing optical flow images, to separately learn high-level spatial and temporal features on the divided video segments. These two CNNs are fine-tuned on the target video facial expression datasets from a pre-trained CNN model. Then, the obtained segment-level spatial and temporal features are integrated into a deep fusion network built with a deep belief network (DBN) model. This deep fusion network is used to jointly learn discriminative spatiotemporal features. Finally, average pooling is performed on the learned DBN segment-level features in a video sequence to produce a fixed-length global video feature representation. Based on the global video feature representations, a linear support vector machine (SVM) is employed for facial expression classification. Extensive experiments on three public video-based facial expression datasets, i.e., BAUM-1s, RML, and MMI, show the effectiveness of the proposed method, which outperforms the state of the art.
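The average pooling step described in the abstract can be illustrated with a minimal sketch. It assumes each video segment has already been mapped to a fixed-length feature vector (in the paper these come from the DBN fusion network; here they are made-up toy numbers). The function name `average_pool` and the two-dimensional vectors are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from statistics import fmean
from typing import List, Sequence


def average_pool(segment_features: Sequence[Sequence[float]]) -> List[float]:
    """Average segment-level vectors into one fixed-length video-level vector.

    Hypothetical helper: each inner sequence stands for the learned feature
    vector of one video segment.
    """
    if not segment_features:
        raise ValueError("a video needs at least one segment")
    dim = len(segment_features[0])
    if any(len(features) != dim for features in segment_features):
        raise ValueError("all segment features must share one dimension")
    # zip(*...) groups the i-th component of every segment together,
    # so each output component is the mean over segments.
    return [fmean(column) for column in zip(*segment_features)]


# Toy values (assumed): two videos with different numbers of segments.
video_a = [[1.0, 2.0], [3.0, 4.0]]
video_b = [[0.0, 0.0], [3.0, 6.0], [6.0, 3.0]]

pooled_a = average_pool(video_a)
pooled_b = average_pool(video_b)
```

Both pooled vectors have the same length even though the videos contain different numbers of segments, which is what allows a single classifier such as a linear SVM to be trained on the video-level representations.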
Pages: 32297-32304
Page count: 8