Attend to Where and When: Cascaded Attention Network for Facial Expression Recognition

Cited by: 19

Authors:

Qu, Xiaoye [1]

Zou, Zhikang [2]

Su, Xinxing [1]

Zhou, Pan [1]

Wei, Wei [3]

Wen, Shiping [4]

Wu, Dapeng [5]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Hubei, Peoples R China

[2] Baidu Inc, Dept Comp Vis Technol, Beijing 100085, Peoples R China

[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Engn, Wuhan 430074, Hubei, Peoples R China

[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ctr Artificial Intelligence, Sydney, NSW 2007, Australia

[5] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA

Source:

IEEE Transactions on Emerging Topics in Computational Intelligence | 2022 / Vol. 6 / No. 3

Funding:

National Natural Science Foundation of China

Keywords:

Face recognition; Task analysis; Feature extraction; Recurrent neural networks; Image recognition; Convolutional neural networks; Spatiotemporal phenomena; Facial Expression Recognition; Landmark-based Spatial Attention; Temporal Attention;

DOI:

10.1109/TETCI.2021.3070713

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recognizing human expressions in videos is challenging because of dynamic changes in facial actions and diverse visual appearances. The key to designing a reliable video-based expression recognition system is to extract robust spatial features and make full use of temporal characteristics. In this paper, we present a novel network architecture called the Cascaded Attention Network (CAN), a cascaded spatiotemporal model that incorporates both spatial and temporal attention and is tailored to video-level facial expression recognition. The cascaded base model consists of a transfer convolutional network and a Bidirectional Long Short-Term Memory (BiLSTM) network. Spatial attention is derived from facial landmarks, since facial expressions depend on the actions of key facial regions (eyebrows, eyes, nose, and mouth); focusing on these regions helps reduce the effect of person-specific attributes. Meanwhile, temporal attention automatically selects the expression peaks and aggregates a video-level representation. The proposed CAN achieves state-of-the-art performance on the three most widely used facial expression datasets: CK+ (99.03%), Oulu-CASIA (88.33%), and MMI (83.55%). Moreover, we conduct an extended experiment on the more complex in-the-wild dataset AFEW, and the results further verify the generality of our attention mechanisms.
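The abstract describes two attention steps on top of a CNN + BiLSTM backbone: landmark-driven spatial attention on frame features, and temporal attention that pools per-frame features into one video-level vector. The NumPy sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The Gaussian landmark map (`sigma`), the `1 + attn` residual reweighting, the linear frame-scoring vector `w`, and all helper names are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def landmark_attention_map(landmarks, height, width, sigma=4.0):
    """Spatial attention map peaked at landmark positions (hypothetical Gaussian design).

    landmarks: iterable of (x, y) coordinates in feature-map pixels.
    Returns a (height, width) array with values in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    attn = np.zeros((height, width))
    for x, y in landmarks:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        attn = np.maximum(attn, g)  # keep the strongest response per pixel
    return attn


def apply_spatial_attention(feature_map, attn):
    """Reweight a (C, H, W) feature map; `1 + attn` keeps non-landmark context (assumption)."""
    return feature_map * (1.0 + attn)[None, :, :]


def temporal_attention_pool(frame_feats, w):
    """Pool (T, D) per-frame features into one (D,) video vector.

    w: (D,) hypothetical scoring vector (it would be learned in a real model).
    Returns the pooled vector and the (T,) frame weights, which sum to 1.
    """
    scores = frame_feats @ w          # one scalar score per frame
    alpha = softmax(scores)           # attention weights over frames
    video_feat = alpha @ frame_feats  # weighted sum of frame features
    return video_feat, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Spatial step: 2 hypothetical landmarks on an 8x10 grid with 3 channels.
    attn = landmark_attention_map([(3, 2), (7, 5)], height=8, width=10)
    fmap = apply_spatial_attention(rng.normal(size=(3, 8, 10)), attn)
    # Temporal step: 8 frames of 16-dim features (e.g., BiLSTM outputs).
    feats = rng.normal(size=(8, 16))
    video, alpha = temporal_attention_pool(feats, rng.normal(size=16))
    print(fmap.shape, video.shape, alpha.round(3))
```

Because the softmax weights are non-negative and sum to 1, the pooled vector is a convex combination of frame features; with a trained `w`, the frames that score highest (ideally those near the expression peak) dominate it.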


Pages: 580-592

Page count: 13
