Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition

Cited by: 0
Authors
Lv, Yuanling [1,2]
Huang, Guangyu [1,2]
Yan, Yan [1,2]
Xue, Jing-Hao [3]
Chen, Si [4]
Wang, Hanzi [1,2]
Affiliations
[1] Xiamen Univ, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Comp, Minist Educ China, Xiamen 361005, Peoples R China
[3] UCL, Dept Stat Sci, London WC1E 6BT, England
[4] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Compounds; Feature extraction; Task analysis; Face recognition; Image recognition; Training; Facial expression recognition; Class-incremental learning; Multi-modality learning; Attribute learning; Network; Joint;
DOI
10.1109/TMM.2024.3374573
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which treats the classification of well-studied and easily accessible basic expressions as the initial task and gradually learns new compound expressions. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), so that compound expressions can be represented in terms of attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net), consisting mainly of a textual-guided visual module (TVM) and a textual compositional module (TCM), for class-incremental FER. Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between the textual attributes and category names of old and new expressions through a textual compositional graph. In particular, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs of TVM and TCM are fused to make the final prediction. On the one hand, at each incremental task, the representations of visual attributes are enhanced because visual attributes are shared across old and new expressions; this increases the stability of our method. On the other hand, the textual modality, which carries rich prior knowledge of the relevance between expressions, helps our model identify subtle visual distinctions between compound expressions, improving its plasticity. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
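The abstract describes a two-branch design whose outputs are fused and whose branches are calibrated by a visual-textual distillation loss. As a rough illustration only, the PyTorch-style sketch below shows one plausible way such a fusion and a distillation term could be wired up; it is not the authors' implementation, and the module internals, feature dimensions, fusion weight `alpha`, and temperature `tau` are all assumptions made for the example.

```python
# Hypothetical sketch of the two-branch fusion and a distillation term,
# loosely following the abstract of VTA-Net. NOT the paper's actual code:
# branch internals, dimensions, alpha, and tau are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFER(nn.Module):
    def __init__(self, feat_dim=512, num_classes=6, alpha=0.5):
        super().__init__()
        # Stand-ins for the TVM (textual-guided visual) and
        # TCM (textual compositional) classification heads.
        self.tvm_head = nn.Linear(feat_dim, num_classes)
        self.tcm_head = nn.Linear(feat_dim, num_classes)
        self.alpha = alpha  # fusion weight between the two branches

    def forward(self, visual_feat, textual_feat):
        logits_tvm = self.tvm_head(visual_feat)
        logits_tcm = self.tcm_head(textual_feat)
        # Fuse the two branch predictions into the final output.
        logits = self.alpha * logits_tvm + (1 - self.alpha) * logits_tcm
        return logits, logits_tvm, logits_tcm

def distillation_loss(new_logits, old_logits, tau=2.0):
    """Generic KL-based distillation over the classes the old model knows,
    standing in for the visual-textual distillation loss that calibrates
    TVM and TCM across incremental tasks."""
    k = old_logits.size(1)  # the old model only covers the old classes
    p_old = F.softmax(old_logits / tau, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :k] / tau, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * tau * tau
```

In an incremental step, one would typically pass each batch through both the frozen old model and the expanded new model and add this distillation term to the standard classification loss; the precise way VTA-Net calibrates TVM and TCM is specified in the paper itself.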
Pages: 8038-8051
Number of pages: 14