Attend to Where and When: Cascaded Attention Network for Facial Expression Recognition

Cited by: 21
Authors
Qu, Xiaoye [1]
Zou, Zhikang [2]
Su, Xinxing [1]
Zhou, Pan [1]
Wei, Wei [3]
Wen, Shiping [4]
Wu, Dapeng [5]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Hubei, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol, Beijing 100085, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Engn, Wuhan 430074, Hubei, Peoples R China
[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ctr Artificial Intelligence, Sydney, NSW 2007, Australia
[5] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022, Vol. 6, No. 3
Funding
National Natural Science Foundation of China
Keywords
Face recognition; Task analysis; Feature extraction; Recurrent neural networks; Image recognition; Convolutional neural networks; Spatiotemporal phenomena; Facial Expression Recognition; Landmark-based Spatial Attention; Temporal Attention;
DOI
10.1109/TETCI.2021.3070713
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing human expressions in videos is a challenging task due to dynamic changes in facial actions and diverse visual appearances. The key to designing a reliable video-based expression recognition system is to extract robust spatial features and make full use of temporal modality characteristics. In this paper, we present a novel network architecture called Cascaded Attention Network (CAN), a cascaded spatiotemporal model that incorporates both spatial and temporal attention and is tailored to video-level facial expression recognition. The cascaded base model consists of a transfer convolutional network and a Bidirectional Long Short-Term Memory (BiLSTM) network. Spatial attention is derived from facial landmarks, since facial expressions depend on the actions of key regions of the face (eyebrows, eyes, nose, and mouth). Focusing on these key regions helps reduce the effect of person-specific attributes. Meanwhile, temporal attention is applied to automatically select the peak of an expression and aggregate the video-level representation. The proposed CAN achieves state-of-the-art performance on the three most widely used facial expression datasets: CK+ (99.03%), Oulu-CASIA (88.33%), and MMI (83.55%). Moreover, we conduct an extended experiment on AFEW, a much more complex in-the-wild dataset, and the experimental results further verify the generality of our attention mechanisms.
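The two attention ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything below is an assumption made for illustration: the Gaussian form of the landmark map, the dot-product scoring, and the names `landmark_attention_map`, `temporal_attention_pool`, `query`, and `sigma` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def landmark_attention_map(height, width, landmarks, sigma=1.0):
    """Spatial attention sketch (assumed form, not the paper's).

    Each landmark is a (row, col) position on the feature map. Every cell
    takes the strongest Gaussian response over all landmarks, so the map
    equals 1.0 at a landmark and decays with distance from it.
    """
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            row.append(max(
                math.exp(-((r - lr) ** 2 + (c - lc) ** 2) / (2 * sigma ** 2))
                for lr, lc in landmarks
            ))
        grid.append(row)
    return grid


def temporal_attention_pool(frame_features, query):
    """Temporal attention sketch (assumed form, not the paper's).

    frame_features: T per-frame vectors of length D (e.g. BiLSTM outputs).
    query: hypothetical learned scoring vector of length D.
    Returns the per-frame weights and the weighted video-level vector.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled
```

In this sketch, a frame whose features align with `query` receives the largest weight, which loosely mirrors the idea of selecting the expression peak before aggregating a video-level representation. Multiplying a convolutional feature map elementwise by the landmark map would likewise emphasize regions around the eyebrows, eyes, nose, and mouth.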
Pages: 580-592
Page count: 13