Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition

Cited by: 0

Authors:

Lv, Yuanling ^{[1,2]}

Huang, Guangyu ^{[1,2]}

Yan, Yan ^{[1,2]}

Xue, Jing-Hao ^{[3]}

Chen, Si ^{[4]}

Wang, Hanzi ^{[1,2]}

Affiliations:

[1] Xiamen Univ, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China

[2] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China

[3] UCL, Dept Stat Sci, London WC1E 6BT, England

[4] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Compounds; Feature extraction; Task analysis; Face recognition; Image recognition; Training; Facial expression recognition; Class-incremental learning; Multi-modality learning; Attribute learning; NETWORK; JOINT;

DOI:

10.1109/TMM.2024.3374573

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, in which classifying well-studied, easily accessible basic expressions is the initial task and new compound expressions are learned gradually. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), and thus compound expressions are represented in terms of attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net), mainly consisting of a textual-guided visual module (TVM) and a textual compositional module (TCM), for class-incremental FER. Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating the textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between textual attributes and category names of old and new expressions based on a textual compositional graph. In particular, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs from TVM and TCM are fused to make a final prediction. On the one hand, at each incremental task, the representations of visual attributes are enhanced since visual attributes are shared across old and new expressions. This increases the stability of our method. On the other hand, the textual modality, which involves rich prior knowledge of the relevance between expressions, helps our model identify subtle visual distinctions between compound expressions, improving the plasticity of our method. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
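The two ideas in the abstract that lend themselves to a small illustration are (a) describing a compound expression through basic-expression attributes and (b) fusing the predictions of two branches. The sketch below is not the VTA-Net implementation: the attribute list, the example compound classes, the function names, and the fusion rule (a weighted average of softmax probabilities) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Basic expressions used as attributes (illustrative list, an assumption).
BASIC = ["happy", "sad", "surprised", "angry", "fearful", "disgusted"]

# Hypothetical compound classes, each described by the basic expressions it combines.
COMPOUND = {
    "happily_surprised": ("happy", "surprised"),
    "sadly_angry": ("sad", "angry"),
}


def attribute_vector(parts):
    """Multi-hot vector over BASIC marking which attributes a class uses."""
    return [1.0 if name in parts else 0.0 for name in BASIC]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(visual_logits, textual_logits, weight=0.5):
    """Weighted average of the two branches' class probabilities.

    This fusion rule is an assumption; the abstract only states that
    the outputs of the two modules are fused.
    """
    p_visual = softmax(visual_logits)
    p_textual = softmax(textual_logits)
    return [weight * v + (1.0 - weight) * t for v, t in zip(p_visual, p_textual)]


# Example: scores for the two hypothetical compound classes from each branch.
fused = fuse([2.0, 0.5], [1.0, 1.5])
predicted = list(COMPOUND)[fused.index(max(fused))]
```

Because the attribute vectors are defined over a fixed set of basic expressions, a newly added compound class only needs a new entry in `COMPOUND`; the attribute space itself does not change, which mirrors the abstract's remark that attributes are shared across old and new expressions.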


Pages: 8038-8051

Number of pages: 14

