An efficient multimodal 2D+3D feature-based approach to automatic facial expression recognition

Cited by: 75
Authors
Li, Huibin [1 ]
Ding, Huaxiong [2 ]
Huang, Di [3 ]
Wang, Yunhong [3 ]
Zhao, Xi [4 ]
Morvan, Jean-Marie [5 ,6 ]
Chen, Liming [2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[2] Ecole Cent Lyon, LIRIS UMR5205, Lyon, France
[3] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[4] Xi An Jiao Tong Univ, Sch Management, Xian 710049, Peoples R China
[5] Univ Lyon 1, Inst Camille Jordan, F-69365 Lyon, France
[6] King Abdullah Univ Sci & Technol, GMSV Res Ctr, Thuwal, Saudi Arabia
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Facial expression recognition; Local texture descriptor; Local shape descriptor; Multimodal fusion; FACE; HISTOGRAMS;
DOI
10.1016/j.cviu.2015.07.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors to achieve efficiency and robustness. First, a large set of fiducial facial landmarks on 2D face images and their corresponding 3D face scans is localized using a novel algorithm, namely incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor is used, in conjunction with the widely used first-order gradient-based SIFT descriptor, to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed from first-order and second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature level and score level to further improve accuracy. Comprehensive experimental results demonstrate that the 2D and 3D descriptors have strongly complementary characteristics. We use the BU-3DFE benchmark to compare our approach with state-of-the-art ones. Our multimodal feature-based approach outperforms the others, achieving an average recognition accuracy of 86.32%. Moreover, good generalization ability is shown on the Bosphorus database. (C) 2015 Elsevier Inc. All rights reserved.
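The abstract distinguishes feature-level fusion from score-level fusion of the descriptor-specific classifiers. The following is a minimal sketch of that distinction only, not the authors' implementation: it assumes feature-level fusion means concatenating descriptor vectors before classification and score-level fusion means a weighted sum of per-class scores. The function names, the equal default weights, and the six-label list are hypothetical and chosen for illustration; the record does not state the paper's actual fusion rule.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical label set for illustration; the record does not list the classes.
LABELS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def feature_level_fusion(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-descriptor feature vectors into a single vector.

    The fused vector would then be fed to one classifier (assumption).
    """
    fused: List[float] = []
    for vec in descriptors:
        fused.extend(vec)
    return fused


def score_level_fusion(
    scores: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Combine per-class scores from several classifiers by a weighted sum.

    `scores` maps a descriptor name to its classifier's per-class scores.
    Equal weights are used when none are given (an assumption of this sketch).
    """
    names = list(scores)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    n_classes = len(scores[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        for k in range(n_classes):
            fused[k] += weights[name] * scores[name][k]
    return fused


def predict(fused_scores: Sequence[float]) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused_scores)), key=lambda k: fused_scores[k])
    return LABELS[best]
```

In this sketch, a 2D classifier and a 3D classifier that disagree can still yield a single decision, which is one simple way complementary descriptors can be exploited.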
Pages: 83-92
Page count: 10