Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

Cited by: 568
Authors
Wang, Kai [1 ,2 ,3 ]
Peng, Xiaojiang [3 ,4 ]
Yang, Jianfei [5 ]
Meng, Debin [3 ,4 ]
Qiao, Yu [3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen Key Lab Comp Vis & Pattern Recognit, Shenzhen 518000, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Shenzhen Inst Artificial Intelligence & Robot Soc, SIAT Branch, Shenzhen 518172, Peoples R China
[4] Chinese Acad Sci, Shenzhen Key Lab Comp Vis & Pattern Recognit, Shenzhen 518000, Peoples R China
[5] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Facial expression recognition; occlusion-robust and pose-invariant; region attention network; deep convolutional neural networks; FACE; REPRESENTATION;
D O I
10.1109/TIP.2019.2956143
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, the occlusion-robust and pose-invariant aspects of FER have received relatively little attention, especially in real-world scenarios. This paper addresses real-world pose- and occlusion-robust FER in the following aspects. First, to stimulate research on FER under real-world occlusions and varying poses, we annotate several in-the-wild FER datasets with pose and occlusion attributes for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion- and pose-variant FER. The RAN aggregates and embeds a variable number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We validate our RAN and region biased loss both on our newly built test datasets and on four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss largely improve the performance of FER under occlusion and pose variation. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.
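The aggregation the abstract describes — a variable number of region features scored by an attention module, normalized, and summed into one fixed-length vector, with a region biased loss pushing the strongest crop above the whole face — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the sigmoid-scorer form of the attention, and the hinge form of the loss are assumptions for clarity.

```python
import numpy as np


def region_attention_pool(region_feats, w, b=0.0):
    """Aggregate k region features (k, d) into one fixed-length (d,) vector.

    Sketch of RAN-style self-attention: a linear scorer plus sigmoid gives
    each region a weight in (0, 1); weights are normalized over regions and
    used for a weighted sum. `w` and `b` are the (hypothetical) scorer
    parameters, learned jointly with the backbone in the real model.
    """
    scores = 1.0 / (1.0 + np.exp(-(region_feats @ w + b)))  # (k,) sigmoid attention
    weights = scores / scores.sum()                          # normalize over regions
    pooled = weights @ region_feats                          # (d,) compact representation
    return pooled, scores


def region_biased_loss(scores, full_face_idx, margin=0.02):
    """Hinge penalty encouraging the best region crop's attention score to
    exceed the whole-face score by a margin (a sketch of the region biased
    loss idea; the paper's exact formulation may differ)."""
    crop_scores = np.delete(scores, full_face_idx)
    return max(0.0, margin + scores[full_face_idx] - crop_scores.max())
```

In use, the (k, d) matrix would hold CNN backbone features for k crops plus the full face; because the weighted sum collapses any k to a single d-dimensional vector, the number of crops can vary per image while the downstream classifier sees a fixed-length input.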
Pages: 4057-4069
Page count: 13