HoloNet: Towards Robust Emotion Recognition in the Wild

Cited by: 72
Authors
Yao, Anbang [1]
Cai, Dongqi [1]
Hu, Ping [1]
Wang, Shandong [1]
Sha, Liang [2]
Chen, Yurong [1]
Affiliations
[1] Intel Labs China, Beijing 100190, People's Republic of China
[2] Beihang University, Lab 673, Beijing 100191, People's Republic of China
Source
ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2016
Keywords
Emotion Recognition; EmotiW 2016 Challenge; Deep Learning; Convolutional Neural Networks
DOI
10.1145/2993148.2997639
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present HoloNet, a well-designed Convolutional Neural Network (CNN) architecture used in our submissions to the video-based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2016 challenge. In contrast to previous related methods, which usually adopt relatively simple and shallow neural network architectures to address the emotion recognition task, HoloNet has three critical considerations in its network design. (1) To reduce redundant filters and enhance the non-saturated nonlinearity in the lower convolutional layers, we use a modified Concatenated Rectified Linear Unit (CReLU) instead of ReLU. (2) To benefit from the accuracy gain of considerably increased network depth while maintaining efficiency, we combine the residual structure and CReLU to construct the middle layers. (3) To broaden the network width and introduce multi-scale feature extraction, the top layers are designed as a variant of the inception-residual structure. The main benefit of grouping these modules into HoloNet is that both the negative and positive phase information implicitly contained in the input data can flow through it along multiple paths. Deep multi-scale features that explicitly capture emotion variation can thus be extracted from multi-path sibling layers and then concatenated for robust recognition. We obtain competitive results in this year's video-based emotion recognition sub-challenge using an ensemble of two HoloNet models trained with the given data only. Specifically, we obtain a mean recognition rate of 57.84%, outperforming the baseline accuracy by an absolute margin of 17.37% and yielding a 4.04% absolute accuracy gain over the result of last year's winning team. Meanwhile, our method runs at a speed of several thousand frames per second on a GPU, making it well suited to real-time scenarios.
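As an illustration of point (1), the following is a minimal sketch of the standard CReLU activation, which concatenates ReLU(x) with ReLU(-x) so that both the positive and negative phase of a response are preserved. It is not the modified CReLU used in HoloNet, whose details the abstract does not give; the function names `relu` and `crelu` and the sample values are assumptions made for illustration only.

```python
def relu(values):
    """Element-wise ReLU: negative entries become zero."""
    return [max(0.0, v) for v in values]


def crelu(values):
    """Standard CReLU (assumed form, not HoloNet's modified variant).

    Concatenates ReLU(x) with ReLU(-x), so the output has twice as many
    channels as the input and keeps both phases of the response.
    """
    positive_phase = relu(values)
    negative_phase = relu([-v for v in values])
    return positive_phase + negative_phase


# Hypothetical filter responses for a single spatial position.
features = [1.5, -2.0, 0.0, -0.5]
activated = crelu(features)
print(len(features), len(activated))
print(activated)
```

Because the output carries twice as many channels as the input, a layer followed by CReLU can use half as many filters for the same output width, which is the sense in which it reduces redundant filters.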
Pages: 472-478
Page count: 7