Patch attention convolutional vision transformer for facial expression recognition with occlusion

Cited by: 44
Authors
Liu, Chang [1, 2]
Hirota, Kaoru [1, 2]
Dai, Yaping [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Automat, 5 Zhongguancun South St, Beijing 100081, Peoples R China
[2] State Key Lab Intelligent Control & Decis Complex, 5 Zhongguancun South St, Beijing 100081, Peoples R China
Keywords
Facial expression recognition; Occlusion; Local and global feature; Self-attention; Vision transformer; FACE RECOGNITION; NETWORK; MULTISCALE
DOI
10.1016/j.ins.2022.11.068
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Despite substantial progress in Facial Expression Recognition (FER) in recent decades, most previous methods have been developed to recognize facial expressions under constrained conditions. Real-world occlusions lead to invisible facial regions and contaminated facial features, which increase the difficulty of FER in the wild. Therefore, a Patch Attention Convolutional Vision Transformer (PACVT) is proposed to tackle FER under occlusion. A backbone convolutional neural network is used to extract facial feature maps, which are cropped into multiple regional patches to extract local and global features. The Patch Attention Unit (PAU) is designed to perceive occluded regions by adaptively calculating the patch-level attention weights of local features for expression recognition. The facial patches are mapped into sequences of visual tokens, and the Vision Transformer (ViT) is employed to capture the interactions and correlations between these visual tokens from a global perspective. The self-attention in ViT enables the PACVT to focus on the salient patches with discriminative features and ignore the occlusion. Experiments are conducted on three widely used expression datasets and their occlusion subsets, and the results demonstrate that the proposed PACVT outperforms state-of-the-art methods on occlusion FER. Cross-dataset experiments demonstrate the generalization ability of the PACVT. © 2022 Elsevier Inc. All rights reserved.
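The pipeline described in the abstract (feature map, regional patches, patch-level attention weights, self-attention over visual tokens) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the sigmoid scoring, the average pooling, the 3 x 3 grid, and the random weights are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def split_into_patches(feature_map, grid):
    """Crop a (C, H, W) feature map into grid x grid non-overlapping patches."""
    c, h, w = feature_map.shape
    ph, pw = h // grid, w // grid
    patches = [
        feature_map[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid) for j in range(grid)
    ]
    return np.stack(patches)  # shape: (grid * grid, C, ph, pw)

def patch_attention(patches, w_score):
    """Hypothetical patch attention: one sigmoid weight per patch.

    Each patch is average-pooled to a C-dimensional vector, scored with a
    linear map, and scaled by its weight in (0, 1). A low weight would
    correspond to a patch judged uninformative, for example an occluded one.
    """
    pooled = patches.mean(axis=(2, 3))                   # (N, C)
    weights = 1.0 / (1.0 + np.exp(-(pooled @ w_score)))  # (N,)
    return pooled * weights[:, None], weights

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(k.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)      # each row sums to 1
    return attn @ v, attn

# Random stand-ins for a CNN feature map and learned parameters (assumed sizes).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 6, 6))      # (C, H, W)
patches = split_into_patches(feature_map, grid=3)
tokens, weights = patch_attention(patches, rng.standard_normal(8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
```

In this sketch the patch weights act locally, on each patch alone, while the attention matrix `attn` relates every patch token to every other one, which mirrors the local and global roles the abstract assigns to the PAU and the ViT.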
Pages: 781-794
Number of pages: 14