A convolution-transformer dual branch network for head-pose and occlusion facial expression recognition

Cited by: 28
Authors
Liang, Xingcan [1,2]
Xu, Linsen [5]
Zhang, Wenxiang [3]
Zhang, Yan [4]
Liu, Jinfu [1,2]
Liu, Zhipeng [1,2]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
[3] Changzhou Univ, Sch Microelect & Control Engn, Changzhou 213164, Peoples R China
[4] Anhui Jianzhu Univ, Sch Elect & Informat Engn, Hefei 230009, Peoples R China
[5] Hohai Univ, Coll Mech & Elect Engn, Changzhou 213022, Peoples R China
Keywords
Facial expression recognition; CNNs; Transformers; Feature fusion; Robustness to occlusions and head-pose variations
DOI
10.1007/s00371-022-02413-5
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Facial expression recognition (FER) has attracted increasing attention due to its broad range of applications. Occlusions and head-pose variations are two major obstacles for automatic FER. In this paper, we propose a convolution-transformer dual branch network (CT-DBN) that takes advantage of local and global facial information to achieve FER that is robust to real-world occlusions and head-pose variations. The CT-DBN contains two branches. Taking into account the local modeling ability of CNNs, the first branch utilizes a CNN to capture local edge information. Inspired by the success of transformers in natural language processing, the second branch employs a transformer to obtain a better global representation. Then, a local-global feature fusion module is proposed to adaptively integrate both features into hybrid features and model the relationship between them. With the help of the feature fusion module, our network not only integrates local and global features in an adaptive weighting manner but also learns the corresponding discriminative features autonomously. Experimental results under within-database and cross-database evaluation on four leading facial expression databases illustrate that our proposed CT-DBN outperforms other state-of-the-art methods and achieves robust performance under in-the-wild conditions.
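
The abstract only outlines the architecture, so the following is a minimal PyTorch sketch of the described layout: a CNN branch for local features, a transformer branch for global features, and an adaptive weighted fusion step followed by a classifier. All layer sizes, the patch embedding, the gating-based fusion, and the names (DualBranchFER, LocalGlobalFusion) are illustrative assumptions, not the paper's actual implementation.

# Sketch of a convolution-transformer dual-branch network with adaptive
# local-global fusion. Details beyond the abstract (backbone depth, fusion
# mechanics, token pooling) are assumptions for illustration only.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    """Adaptively weight and merge local (CNN) and global (transformer) features."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate predicts two per-sample weights that sum to one.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 2), nn.Softmax(dim=-1))

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([local_feat, global_feat], dim=-1))  # (B, 2)
        return w[:, :1] * local_feat + w[:, 1:] * global_feat


class DualBranchFER(nn.Module):
    """CNN branch for local edge cues plus transformer branch for global context."""

    def __init__(self, num_classes: int = 7, dim: int = 256):
        super().__init__()
        # Local branch: small convolutional stack standing in for the CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, dim))
        # Global branch: 16x16 patch embedding followed by a shallow transformer encoder.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.fusion = LocalGlobalFusion(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.cnn(x)                                  # (B, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        global_feat = self.transformer(tokens).mean(dim=1)        # (B, dim), token averaging
        return self.classifier(self.fusion(local_feat, global_feat))


if __name__ == "__main__":
    logits = DualBranchFER()(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 7])

The softmax gate is one plausible reading of "adaptive weighting": it produces per-sample convex weights over the two branches, so the network can lean on the transformer when large occlusions or pose changes disrupt local cues, and on the CNN otherwise.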
Pages: 2277-2290
Page count: 14