Seeing Through the Mask: Recognition of Genuine Emotion Through Masked Facial Expression

Cited by: 4
Authors
Zhou, Ju [1]
Liu, Xinyu [1]
Wang, Hanpu [1]
Zhang, Zheyuan [1]
Chen, Tong [1,2]
Fu, Xiaolan [2,3]
Liu, Guangyuan [1]
Affiliations
[1] Southwest Univ, Coll Elect & Informat Engn, Chongqing 400715, Peoples R China
[2] Chinese Acad Sci, Inst Psychol, State Key Lab Brain & Cognit Sci, Beijing 100101, Peoples R China
[3] Univ Chinese Acad Sci, Dept Psychol, Beijing 100049, Peoples R China
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 | Volume 11, Issue 6
Keywords
Emotion recognition; Videos; Face recognition; Task analysis; Feature extraction; Transformers; Convolutional neural networks; Intensity modulation; Vision sensors; Decoupled convolution; dynamic action unit intensity features (DAIFs); emotion recognition; hidden emotion; masked facial expression (MFE); vision Transformer (ViT); DATABASE; MODEL;
DOI
10.1109/TCSS.2024.3404611
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The purpose of facial expression recognition is to recognize the corresponding emotions. However, people tend to hide their emotions by displaying facial expressions that differ from those their true feelings would evoke. Such inconsistent facial expressions are referred to as masked facial expressions (MFEs). Automatically recognizing the hidden emotion within an MFE from image data is challenging. In this study, a detailed analysis reveals distinctive movement patterns in the facial action units (AUs) of MFE sequences. Building on these findings, we propose handcrafted features, called dynamic AU intensity features (DAIFs), to represent AU movement. Furthermore, we develop a decoupled AU transformer (DAUT) model for recognition, in which decoupled convolution operators ensure that the temporal information in the DAIFs is not damaged. To further improve recognition performance, we design a self-supervised clip-prediction task for pretraining DAUT. Experimental results demonstrate that the proposed method performs exceptionally well across all tasks on the MFE dataset, nearly doubling accuracy on the most challenging 36-class task. This suggests that leveraging temporal information from facial AU movements is a reliable and effective technique for recognizing MFEs.
Pages: 7159-7172 (14 pages)