Attend to Where and When: Cascaded Attention Network for Facial Expression Recognition

Cited by: 19
Authors
Qu, Xiaoye [1]
Zou, Zhikang [2]
Su, Xinxing [1]
Zhou, Pan [1]
Wei, Wei [3]
Wen, Shiping [4]
Wu, Dapeng [5]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Hubei, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol, Beijing 100085, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Engn, Wuhan 430074, Hubei, Peoples R China
[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ctr Artificial Intelligence, Sydney, NSW 2007, Australia
[5] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022 / Vol. 6 / No. 3
Funding
National Natural Science Foundation of China;
Keywords
Face recognition; Task analysis; Feature extraction; Recurrent neural networks; Image recognition; Convolutional neural networks; Spatiotemporal phenomena; Facial Expression Recognition; Landmark-based Spatial Attention; Temporal Attention;
DOI
10.1109/TETCI.2021.3070713
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing human expressions in videos is challenging because of dynamic changes in facial actions and diverse visual appearances. The key to designing a reliable video-based expression recognition system is to extract robust spatial features and make full use of temporal characteristics. In this paper, we present a novel network architecture called the Cascaded Attention Network (CAN), a cascaded spatiotemporal model that incorporates both spatial and temporal attention and is tailored to video-level facial expression recognition. The cascaded base model consists of a transfer convolutional network and a Bidirectional Long Short-Term Memory (BiLSTM) network. Spatial attention is derived from facial landmarks, since facial expressions depend on the actions of key facial regions (eyebrows, eyes, nose, and mouth); focusing on these regions helps reduce the effect of person-specific attributes. Meanwhile, temporal attention is applied to automatically select the expression peak and aggregate a video-level representation. The proposed CAN achieves state-of-the-art performance on the three most widely used facial expression datasets: CK+ (99.03%), Oulu-CASIA (88.33%), and MMI (83.55%). Moreover, we conduct an extended experiment on AFEW, a much more complex in-the-wild dataset, and the results further verify the generality of our attention mechanisms.
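The abstract describes the two attention steps only in prose. The following is a minimal pure-Python sketch of the general ideas, not the authors' implementation. Every specific choice here is a hypothetical assumption made for illustration: the Gaussian heatmap centred on each landmark, its width `sigma`, combining landmarks by a per-pixel maximum, and scoring each frame with a dot product against a vector `w` before a softmax over time. `feature_map` stands in for one channel of a CNN feature map, and `frame_features` for the per-frame BiLSTM outputs.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def landmark_attention_mask(height: int, width: int,
                            landmarks: Sequence[Tuple[float, float]],
                            sigma: float = 2.0) -> Matrix:
    """Hypothetical spatial attention map: a Gaussian bump centred on each
    (x, y) landmark, merged by per-pixel maximum, so values lie in (0, 1]
    and peak at the landmark locations."""
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = 0.0
            for lx, ly in landmarks:
                d2 = (x - lx) ** 2 + (y - ly) ** 2
                best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)))
            mask[y][x] = best
    return mask


def apply_spatial_attention(feature_map: Matrix, mask: Matrix) -> Matrix:
    """Reweight a single-channel feature map element-wise by the mask."""
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(feature_map, mask)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features: Sequence[Sequence[float]],
                            w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Score each frame with the (hypothetical) vector w, normalise the
    scores over time with softmax, and return the attention-weighted
    average of the frame features plus the per-frame weights."""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in frame_features]
    alphas = softmax(scores)
    dim = len(frame_features[0])
    video = [sum(a * h[d] for a, h in zip(alphas, frame_features))
             for d in range(dim)]
    return video, alphas
```

For example, `temporal_attention_pool([[0.1, 0.0], [1.0, 0.5], [0.2, 0.1]], w=[2.0, 1.0])` assigns the largest weight to the second frame, which is the kind of "peak frame" emphasis the abstract describes. A learned scorer lets the network decide which frames matter, instead of fixing the peak frame by hand.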
Pages: 580-592
Page count: 13