Pose-Aware Facial Expression Recognition Assisted by Expression Descriptions

Cited by: 3

Authors:

Wang, Shangfei [1]

Wu, Yi [1]

Chang, Yanan [1]

Li, Guoming [2]

Mao, Meng [2]

Affiliations:

[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Key Lab Comp & Commun Software Anhui Province, Hefei 230027, Anhui, Peoples R China

[2] China Merchants Bank, AI Lab, Shenzhen 518000, Guangdong, Peoples R China

Source:

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING | 2024 / Vol. 15 / No. 1

Keywords:

Pose-aware; expression descriptions; facial expression recognition; cross-modality attention; JOINT POSE; MULTIVIEW;

DOI:

10.1109/TAFFC.2023.3267774

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although expression descriptions provide additional information about facial behaviors regardless of pose, and pose features help a model adapt to pose variation, neither has been fully leveraged in facial expression recognition. This paper proposes a pose-aware, text-assisted facial expression recognition method using cross-modality attention. The method contains three components. The pose feature extractor extracts pose-related features from facial images and is paired with a fully-connected layer for pose classification. When poses can be clearly discriminated and classified, the features obtained from the extractor can represent the corresponding poses. To eliminate bias due to appearance and illumination, cluster centers are taken as the final pose features. The text feature extractor obtains embeddings from expression descriptions. These descriptions are first passed through Intra-Exp attention to obtain preliminary embeddings. To leverage the correlations among expressions, all expression embeddings are then concatenated and passed through Inter-Exp attention. The cross-modality module learns attention maps that distinguish the importance of facial regions by using prior knowledge about poses and expression descriptions. The image features weighted by the attention maps are used to recognize pose and expression jointly. Experiments on three benchmark datasets demonstrate the superiority of the proposed method.
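The two ideas in the abstract that are easiest to misread, cluster centers as pose features and attention over facial regions guided by pose and text, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a plain per-class mean as the cluster center, the element-wise sum that forms the query, and the scaled dot-product scoring are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pose_cluster_centers(features, pose_labels):
    """Return one center per pose class.

    Assumption: the center is the plain mean of that class's feature vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, pose_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modality_attention(region_features, pose_feature, text_embedding):
    """Weight facial-region features by similarity to a pose+text query.

    Assumptions: all vectors share one dimension, the query is the
    element-wise sum of the pose feature and the text embedding, and
    scores are scaled dot products. The paper's module may differ.
    """
    query = [p + t for p, t in zip(pose_feature, text_embedding)]
    scale = math.sqrt(len(query))
    scores = [
        sum(r * q for r, q in zip(region, query)) / scale
        for region in region_features
    ]
    weights = softmax(scores)  # one weight per facial region
    dim = len(region_features[0])
    attended = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, attended
```

In this sketch the attention weights sum to one across regions, so the attended vector is a convex combination of the region features; a downstream classifier would take it as input for joint pose and expression prediction.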

Citation

Pages: 241-253

Page count: 13
