Patch attention convolutional vision transformer for facial expression recognition with occlusion

Cited by: 44

Authors:

Liu, Chang [1,2]

Hirota, Kaoru [1,2]

Dai, Yaping [1,2]

Affiliations:

[1] Beijing Inst Technol, Sch Automat, 5 Zhongguancun South St, Beijing 100081, Peoples R China

[2] State Key Lab Intelligent Control & Decis Complex, 5 Zhongguancun South St, Beijing 100081, Peoples R China

Source:

INFORMATION SCIENCES | 2023 / Vol. 619

Keywords:

Facial expression recognition; Occlusion; Local and global feature; Self-attention; Vision transformer; FACE RECOGNITION; NETWORK; MULTISCALE

DOI:

10.1016/j.ins.2022.11.068

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Despite substantial progress in Facial Expression Recognition (FER) in recent decades, most previous methods have been developed to recognize constrained facial expressions. Real-world occlusions lead to invisible facial regions and contaminated facial features, which increase the difficulty of FER in the wild. Therefore, a Patch Attention Convolutional Vision Transformer (PACVT) is proposed to tackle the occlusion FER problem. The backbone convolutional neural network is used to extract facial feature maps, which are cropped into multiple regional patches to extract local and global features. The Patch Attention Unit (PAU) is designed to perceive occluded regions by adaptively calculating the patch-level attention weights of local features for expression recognition. The facial patches are mapped into sequences of visual tokens, and the Vision Transformer (ViT) is employed to capture the interactions and correlations between these visual tokens from a global perspective. The self-attention in ViT enables the PACVT to focus on the salient patches with discriminative features and ignore the occlusion. Experiments are conducted on three widely used expression datasets and their occlusion subsets, and the results demonstrate that the proposed PACVT outperforms state-of-the-art methods on occlusion FER. Cross-dataset experimental results demonstrate the generalization ability of the PACVT. (c) 2022 Elsevier Inc. All rights reserved.
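The pipeline summarized in the abstract (crop a feature map into patches, weight each patch, then let self-attention relate the patch tokens) can be illustrated with a minimal sketch. This is not the authors' implementation. The single-channel feature map, the sigmoid gate, the identity query/key/value projections, and all names (`crop_patches`, `patch_attention`, `self_attention`, `gate_w`) are assumptions made for illustration only.

```python
import math
import random


def crop_patches(feature_map, grid):
    """Split an H x W feature map (list of rows) into grid x grid flattened patches."""
    h, w = len(feature_map), len(feature_map[0])
    ph, pw = h // grid, w // grid
    patches = []
    for gi in range(grid):
        for gj in range(grid):
            patches.append([feature_map[r][c]
                            for r in range(gi * ph, (gi + 1) * ph)
                            for c in range(gj * pw, (gj + 1) * pw)])
    return patches


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def patch_attention(patches, w, b=0.0):
    """Assumed gate: one scalar weight in (0, 1) per patch from a linear score."""
    weights = [sigmoid(sum(wi * xi for wi, xi in zip(w, p)) + b) for p in patches]
    weighted = [[a * x for x in p] for a, p in zip(weights, patches)]
    return weights, weighted


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V (assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        out.append([sum(a * v[j] for a, v in zip(attn, tokens)) for j in range(d)])
    return out


random.seed(0)
# Hypothetical 4 x 4 single-channel feature map standing in for CNN backbone output.
feature_map = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
patches = crop_patches(feature_map, grid=2)          # 4 patches of 4 values each
gate_w = [random.gauss(0, 1) for _ in range(len(patches[0]))]  # untrained gate weights
weights, weighted = patch_attention(patches, gate_w)  # patch-level attention weights
tokens = self_attention(weighted)                     # global interaction between patch tokens
```

In a trained model the gate weights and the attention projections would be learned, so that occluded patches receive low weights; here they are random and only the data flow is shown.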


Pages: 781-794

Page count: 14
