Visual Scene-aware Hybrid Neural Network Architecture for Video-based Facial Expression Recognition

Cited by: 5
Authors
Lee, Min Kyu [1]
Choi, Dong Yoon [1]
Kim, Dae Ha [1]
Song, Byung Cheol [1]
Affiliations
[1] Inha Univ, Dept Elect Engn, Incheon, South Korea
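Source
2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019) | 2019
Keywords
DOI
10.1109/fg.2019.8756551
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract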
With the rapid development of deep learning, facial expression recognition (FER) technology has made considerable progress recently. However, since conventional FER techniques are mainly designed and trained for videos that are artificially acquired in a limited environment, they may not operate robustly on videos acquired in a wild environment. To solve this problem, this paper proposes a scene-aware hybrid neural network (NN) built on a novel combination of a three-dimensional (3D) convolutional NN (CNN), a 2D CNN, and a recurrent NN (RNN). The characteristics of the proposed network are as follows. First, we extract video-based global features and frame-based local features at the same time. In detail, latent features capturing the overall visual scene of a given video are extracted by a 3D CNN with an auxiliary classifier, and a fine-tuned 2D CNN is adopted to extract latent features containing small details from each frame. Second, the RNN not only performs temporal-domain learning but also fuses, feature-wise, the two latent features extracted from these networks. For effective fusion, we also present three RNN schemes. Third, the proposed network, in which the above-mentioned methods collaborate, works robustly in a wild environment as well as in a limited environment. Extensive experiments show that the proposed network provides an average accuracy of 49.9% on the AFEW dataset, i.e., a representative wild dataset, and an accuracy of 98.2% on the CK+ dataset. We also show that the proposed network outperforms state-of-the-art networks.
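The fusion step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the global and per-frame features are plain lists standing in for the 3D CNN and 2D CNN outputs, fusion is done by simple concatenation at each time step, the recurrent unit is a plain tanh cell with random untrained weights, and the function name `fuse_and_classify` and its parameters are hypothetical. The three RNN schemes mentioned in the abstract are not specified in this record, so none of them is reproduced here.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows (untrained, illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(global_feat, frame_feats, hidden_size=8, num_classes=7, seed=0):
    """Hypothetical fusion: concatenate the video-level feature with each
    frame-level feature, run a plain tanh recurrent cell over the frames,
    and classify from the final hidden state."""
    rng = random.Random(seed)
    in_size = len(global_feat) + len(frame_feats[0])
    w_in = make_matrix(hidden_size, in_size, rng)
    w_rec = make_matrix(hidden_size, hidden_size, rng)
    w_out = make_matrix(num_classes, hidden_size, rng)

    hidden = [0.0] * hidden_size
    for local_feat in frame_feats:
        fused = list(global_feat) + list(local_feat)  # assumed concatenation fusion
        from_input = matvec(w_in, fused)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return softmax(matvec(w_out, hidden))


# Usage with made-up feature sizes: one 4-d global feature, five 6-d frame features.
demo_rng = random.Random(1)
global_feat = [demo_rng.random() for _ in range(4)]
frame_feats = [[demo_rng.random() for _ in range(6)] for _ in range(5)]
probs = fuse_and_classify(global_feat, frame_feats)
print(len(probs), round(sum(probs), 6))
```

Concatenating the same global feature at every time step is only one simple way to let a recurrent unit see both feature streams; the paper's own schemes may differ.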
Pages: 153-160
Number of pages: 8