Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition

Times cited: 0
Authors
Lv, Yuanling [1, 2]
Huang, Guangyu [1, 2]
Yan, Yan [1, 2]
Xue, Jing-Hao [3]
Chen, Si [4]
Wang, Hanzi [1, 2]
Affiliations
[1] Xiamen Univ, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China
[3] UCL, Dept Stat Sci, London WC1E 6BT, England
[4] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Compounds; Feature extraction; Task analysis; Face recognition; Image recognition; Training; Facial expression recognition; Class-incremental learning; Multi-modality learning; Attribute learning; NETWORK; JOINT;
DOI
10.1109/TMM.2024.3374573
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which treats the classification of well-studied and easily accessible basic expressions as the initial task and then learns new compound expressions incrementally. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), so that compound expressions can be represented in terms of these attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net) for class-incremental FER, consisting mainly of a textual-guided visual module (TVM) and a textual compositional module (TCM). Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between textual attributes and the category names of old and new expressions, based on a textual compositional graph. In addition, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs of TVM and TCM are fused to make the final prediction. On the one hand, because visual attributes are shared across old and new expressions, their representations are enhanced at each incremental task, which increases the stability of our method. On the other hand, the textual modality, which carries rich prior knowledge about the relevance between expressions, helps our model identify subtle visual distinctions between compound expressions, improving the plasticity of our method. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
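The abstract names three mechanisms (compound expressions described by basic-expression attributes, a distillation loss that calibrates the two branches, and fusion of the two branches' outputs) without giving their formulas. The sketch below illustrates each idea in plain Python under loudly stated assumptions, and it is not the VTA-Net implementation: the compound-to-attribute label sets are hypothetical examples, `kl_distillation` is a generic temperature-scaled KL soft-target loss used as a stand-in for the paper's visual-textual distillation loss, and `fuse_predictions` assumes late fusion by a weighted average of the two branches' class probabilities. All function names and the `temperature` and `weight` defaults are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical attribute view: each compound expression is a set of basic expressions.
BASIC: List[str] = ["happy", "sad", "fearful", "angry", "surprised", "disgusted"]
COMPOUND: Dict[str, Set[str]] = {
    "happily surprised": {"happy", "surprised"},
    "sadly angry": {"sad", "angry"},
}


def attribute_vector(compound: str) -> List[float]:
    """Binary attribute encoding of a compound expression over the basic expressions."""
    return [1.0 if b in COMPOUND[compound] else 0.0 for b in BASIC]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation(teacher_logits: Sequence[float],
                    student_logits: Sequence[float],
                    temperature: float = 2.0) -> float:
    """Generic soft-target distillation: T^2 * KL(p_teacher || p_student).
    Assumption: a stand-in, NOT the paper's visual-textual distillation loss."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def fuse_predictions(visual_logits: Sequence[float],
                     textual_logits: Sequence[float],
                     weight: float = 0.5) -> Tuple[List[float], int]:
    """Assumed late fusion: convex combination of the two branches' probabilities."""
    pv = softmax(visual_logits)
    pt = softmax(textual_logits)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(pv, pt)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


if __name__ == "__main__":
    probs, label = fuse_predictions([2.0, 0.5, -1.0], [1.0, 1.5, -0.5])
    print(label, [round(x, 3) for x in probs])
    print(attribute_vector("happily surprised"))
    print(kl_distillation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In this toy run the visual branch favors class 0 and the textual branch favors class 1; averaging their probabilities keeps class 0 as the prediction. A shared attribute encoding means every compound class reuses the same basic-expression dimensions, which is one intuition for why sharing attributes across old and new classes could aid stability.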
Pages: 8038-8051
Number of pages: 14