Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition

Cited by: 3
Authors
Xie, Weicheng [1]
Peng, Zhibin [1]
Shen, Linlin [1]
Lu, Wenya [1]
Zhang, Yang [1]
Song, Siyang [2]
Affiliations
[1] Shenzhen Univ, Shenzhen Inst Artificial Intelligence, Sch Comp Sci & Software Engn, Comp Vis Inst,Guangdong Key Lab Intelligent Inform, Shenzhen 518060, Peoples R China
[2] Univ Cambridge, Dept Comp Sci & Technol, Cambridge CB2 1TN, England
Keywords
Semantics; Cross layer design; Face recognition; Self-supervised learning; Representation learning; Faces; Task analysis; Facial expression recognition; contrastive learning; latent semantic alignment; multi-layer attention; NETWORK; ATTENTION;
DOI
10.1109/TIP.2024.3378459
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural networks (CNNs) have significantly improved facial expression recognition. However, current training still suffers from inconsistent learning intensities across layers, i.e., the feature representations in shallow layers are not learned as sufficiently as those in deep layers. To address this, this work proposes a contrastive learning framework that aligns the feature semantics of shallow and deep layers, followed by an attention module that represents the multi-scale features in a weight-adaptive manner. The proposed algorithm has three main merits. First, the learning intensity of the shallow-layer features, defined as the magnitude of the backpropagation gradient, is enhanced by cross-layer contrastive learning. Second, the latent semantics in the shallow-layer and deep-layer features are explored and aligned during contrastive learning, so that the fine-grained characteristics of expressions are taken into account in feature representation learning. Third, by integrating the multi-scale features from multiple layers with an attention module, the algorithm achieves state-of-the-art performance of 92.21%, 89.50%, and 62.82% on three in-the-wild expression databases (RAF-DB, FERPlus, and SFEW, respectively) and the second-best performance of 65.29% on the AffectNet dataset. Our code will be made publicly available.
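As a rough illustration of the idea of aligning shallow-layer and deep-layer features contrastively, the sketch below computes a generic InfoNCE-style loss in plain Python. It is not the authors' loss or code: the function name `cross_layer_info_nce`, the temperature value, and the assumption that shallow and deep features have already been projected to a common dimension are all hypothetical choices made for this example.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_layer_info_nce(shallow, deep, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    shallow[i] and deep[i] are assumed to be embeddings of the same image
    taken from a shallow and a deep layer, already projected to one dimension.
    The deep embedding of the same image is the positive; the deep embeddings
    of the other images in the batch act as negatives.
    """
    assert len(shallow) == len(deep) and len(shallow) > 0
    s = [l2_normalize(v) for v in shallow]
    d = [l2_normalize(v) for v in deep]
    total = 0.0
    for i, anchor in enumerate(s):
        # Cosine similarities between this shallow anchor and every deep embedding.
        logits = [dot(anchor, other) / temperature for other in d]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the matching deep embedding.
        total += log_denom - logits[i]
    return total / len(s)

# Toy batch: deep features point the same way as their shallow partners.
aligned = cross_layer_info_nce([[1.0, 0.0], [0.0, 1.0]],
                               [[2.0, 0.0], [0.0, 3.0]])
# Same batch with the deep features swapped, so each pair is mismatched.
misaligned = cross_layer_info_nce([[1.0, 0.0], [0.0, 1.0]],
                                  [[0.0, 3.0], [2.0, 0.0]])
```

In this toy batch the aligned pairs give a loss close to zero and the swapped pairs a much larger one, which is the pressure that pushes shallow features toward the semantics of their deep counterparts. In a real network the gradient of such a loss would flow directly into the shallow layers, which is one plausible reading of how a cross-layer objective could raise their learning intensity.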
Pages: 2514 - 2529
Page count: 16
Related papers
50 in total
  • [31] Joint Deep Learning of Facial Expression Synthesis and Recognition
    Yan, Yan
    Huang, Ying
    Chen, Si
    Shen, Chunhua
    Wang, Hanzi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2792 - 2807
  • [32] Facial expression recognition using dual dictionary learning
    Moeini, Ali
    Faez, Karim
    Moeini, Hossein
    Safai, Armon Matthew
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 45 : 20 - 33
  • [33] Deep Cross-Layer Collaborative Learning Network for Online Knowledge Distillation
    Su, Tongtong
    Liang, Qiyu
    Zhang, Jinsong
    Yu, Zhaoyang
    Xu, Ziyue
    Wang, Gang
    Liu, Xiaoguang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (05) : 2075 - 2087
  • [34] Multimodal learning for facial expression recognition
    Zhang, Wei
    Zhang, Youmei
    Ma, Lin
    Guan, Jingwei
    Gong, Shijie
    PATTERN RECOGNITION, 2015, 48 (10) : 3191 - 3202
  • [35] Learning Dynamic Relationships for Facial Expression Recognition Based on Graph Convolutional Network
    Jin, Xing
    Lai, Zhihui
    Jin, Zhong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 (30) : 7143 - 7155
  • [36] DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition
    Li, Ming
    Fu, Huazhu
    He, Shengfeng
    Fan, Hehe
    Liu, Jun
    Keppo, Jussi
    Shou, Mike Zheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6297 - 6309
  • [37] AU-Oriented Expression Decomposition Learning for Facial Expression Recognition
    Lin, Zehao
    She, Jiahui
    Shen, Qiu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 265 - 277
  • [38] Expression-Guided Deep Joint Learning for Facial Expression Recognition
    Fang, Bei
    Zhao, Yujie
    Han, Guangxin
    He, Juhou
    SENSORS, 2023, 23 (16)
  • [39] Contrastive Learning of View-invariant Representations for Facial Expressions Recognition
    Roy, Shuvendu
    Etemad, Ali
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [40] Robust Transferable Subspace Learning for Cross-Corpus Facial Expression Recognition
    Chen, Dongliang
    Song, Peng
    Zhang, Wenjing
    Zhang, Weijian
    Xu, Bingui
    Zhou, Xuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (10) : 2241 - 2245