Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition

Cited by: 0
Authors
Lv, Yuanling [1,2]
Huang, Guangyu [1,2]
Yan, Yan [1,2]
Xue, Jing-Hao [3]
Chen, Si [4]
Wang, Hanzi [1,2]
Affiliations
[1] Xiamen Univ, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Comp, Minist Educ China, Xiamen 361005, Peoples R China
[3] UCL, Dept Stat Sci, London WC1E 6BT, England
[4] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Compounds; Feature extraction; Task analysis; Face recognition; Image recognition; Training; Facial expression recognition; Class-incremental learning; Multi-modality learning; Attribute learning; Network; Joint;
DOI
10.1109/TMM.2024.3374573
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which treats the classification of well-studied and easily accessible basic expressions as the initial task and gradually learns new compound expressions. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), so that compound expressions can be represented in terms of attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net), consisting mainly of a textual-guided visual module (TVM) and a textual compositional module (TCM), for class-incremental FER. Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between the textual attributes and category names of old and new expressions through a textual compositional graph. In particular, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs of TVM and TCM are fused to make the final prediction. On the one hand, at each incremental task, the representations of visual attributes are enhanced because visual attributes are shared across old and new expressions; this increases the stability of our method. On the other hand, the textual modality, which carries rich prior knowledge of the relevance between expressions, helps our model identify subtle visual distinctions between compound expressions, improving its plasticity. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
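The abstract describes a two-branch design whose outputs are fused and whose branches are calibrated by a visual-textual distillation loss. As a rough illustration only, the PyTorch-style sketch below shows one plausible way such a fusion and a distillation term could be wired up; it is not the authors' implementation, and the module internals, feature dimensions, fusion weight `alpha`, and temperature `tau` are all assumptions made for the example.

```python
# Hypothetical sketch of the two-branch fusion and a distillation term,
# loosely following the abstract of VTA-Net. NOT the paper's actual code:
# branch internals, dimensions, alpha, and tau are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFER(nn.Module):
    def __init__(self, feat_dim=512, num_classes=6, alpha=0.5):
        super().__init__()
        # Stand-ins for the TVM (textual-guided visual) and
        # TCM (textual compositional) classification heads.
        self.tvm_head = nn.Linear(feat_dim, num_classes)
        self.tcm_head = nn.Linear(feat_dim, num_classes)
        self.alpha = alpha  # fusion weight between the two branches

    def forward(self, visual_feat, textual_feat):
        logits_tvm = self.tvm_head(visual_feat)
        logits_tcm = self.tcm_head(textual_feat)
        # Fuse the two branch predictions into the final output.
        logits = self.alpha * logits_tvm + (1 - self.alpha) * logits_tcm
        return logits, logits_tvm, logits_tcm

def distillation_loss(new_logits, old_logits, tau=2.0):
    """Generic KL-based distillation over the classes the old model knows,
    standing in for the visual-textual distillation loss that calibrates
    TVM and TCM across incremental tasks."""
    k = old_logits.size(1)  # the old model only covers the old classes
    p_old = F.softmax(old_logits / tau, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :k] / tau, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * tau * tau
```

In an incremental step, one would typically pass each batch through both the frozen old model and the expanded new model and add this distillation term to the standard classification loss; the precise way VTA-Net calibrates TVM and TCM is specified in the paper itself.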
Pages: 8038-8051
Number of pages: 14