Multimodal 2D+3D Facial Expression Recognition With Deep Fusion Convolutional Neural Network

Cited by: 149
Authors
Li, Huibin [1]
Sun, Jian [1]
Xu, Zongben [1]
Chen, Liming [2]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Informat & Syst Sci, Sch Math & Stat, Xian 710049, Shaanxi, Peoples R China
[2] Ecole Cent Lyon, Dept Math & Informat, LIRIS UMR 5205, F-69134 Lyon, France
Keywords
Deep fusion convolutional neural network (DF-CNN); facial expression recognition (FER); multimodal; textured three-dimensional (3D) face scan; EMOTION RECOGNITION; 3D; FACE; FRAMEWORK; DATABASE;
DOI
10.1109/TMM.2017.2713408
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER). DF-CNN comprises a feature extraction subnet, a feature fusion subnet, and a softmax layer. Each textured three-dimensional (3D) face scan is represented as six types of 2D facial attribute maps (a geometry map, three normal maps, a curvature map, and a texture map), all of which are jointly fed into DF-CNN for feature learning and fusion learning, resulting in a highly concentrated 32-dimensional facial representation. Expression prediction is performed in two ways: 1) learning linear support vector machine classifiers on the 32-dimensional fused deep features, or 2) directly performing softmax prediction using the six-dimensional expression probability vectors. Unlike existing 3D FER methods, DF-CNN combines feature learning and fusion learning into a single end-to-end training framework. To demonstrate its effectiveness, we conducted comprehensive experiments comparing DF-CNN with handcrafted features, pre-trained deep features, fine-tuned deep features, and state-of-the-art methods on three 3D face datasets (BU-3DFE Subset I, BU-3DFE Subset II, and Bosphorus Subset). In all cases, DF-CNN achieved the best results. To the best of our knowledge, this is the first work to introduce deep CNNs to 3D FER and deep learning-based feature-level fusion to multimodal 2D+3D FER.
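The data flow described in the abstract (six attribute maps, a 32-dimensional fused feature, a six-way softmax) can be illustrated with a minimal sketch. This is not the authors' implementation: the convolutional feature extraction subnet is omitted, and the per-map feature size (`PER_MAP_DIM`), the ReLU activation, the normal-map names, and the random weights standing in for trained parameters are all assumptions made for illustration. Only the counts 6, 32, and 6 come from the abstract.

```python
import math
import random

# Six attribute maps per scan (from the abstract); the three normal-map names are assumed.
MAP_TYPES = ["geometry", "normal_x", "normal_y", "normal_z", "curvature", "texture"]
PER_MAP_DIM = 8    # hypothetical size of each per-map feature vector
FUSED_DIM = 32     # fused representation size stated in the abstract
NUM_CLASSES = 6    # six expression classes stated in the abstract


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, out_dim, in_dim):
    """Random weights standing in for trained parameters (assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def fuse_and_predict(per_map_features, fusion_layer, output_layer):
    """Feature-level fusion: concatenate per-map features, project to 32-D, then softmax."""
    concatenated = [v for name in MAP_TYPES for v in per_map_features[name]]
    fused = relu(linear(concatenated, *fusion_layer))
    probs = softmax(linear(fused, *output_layer))
    return fused, probs


rng = random.Random(0)
# Placeholder per-map features; in the paper these would come from the extraction subnet.
features = {name: [rng.random() for _ in range(PER_MAP_DIM)] for name in MAP_TYPES}
fusion_layer = random_layer(rng, FUSED_DIM, PER_MAP_DIM * len(MAP_TYPES))
output_layer = random_layer(rng, NUM_CLASSES, FUSED_DIM)

fused, probs = fuse_and_predict(features, fusion_layer, output_layer)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Here `fused` corresponds to the 32-dimensional feature that the abstract says can be passed to a linear SVM, while `probs` corresponds to the six-dimensional probability vector used for direct softmax prediction.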
Pages: 2816-2831
Page count: 16