Facial action unit detection with emotion consistency: a cross-modal learning approach

Cited: 0
Authors
Song, Wenyu [1 ]
Liu, Dongxin [1 ]
An, Gaoyun [2 ,3 ]
Duan, Yun [1 ]
Wang, Laifu [1 ]
Affiliations
[1] China Telecom Res Inst, Guangzhou, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[3] Beijing Key Lab Adv Informat Sci & Network Technol, Beijing, Peoples R China
Keywords
Facial Action Unit; Emotional expression consistency; Cross-modal learning; Multi-task learning;
DOI
10.1007/s00530-024-01552-0
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Facial Action Unit (AU) detection is essential for understanding emotional expressions. This study explores the intricate relationship among AUs, AU descriptions, and facial expressions, emphasizing emotional expression consistency. AUs represent specific facial muscle movements that form the basis of expressions, so maintaining a solid physiological grounding is crucial for understanding emotional communication. Moreover, AU descriptions serve as linguistic representations, and their semantic alignment with expressions is paramount. The vocabulary in AU descriptions must therefore precisely reflect expression features to ensure coherence between textual and visual cues. Our method, AUTr-emo, employs cross-modal learning, incorporating AU text descriptions as queries and using facial expression recognition as an auxiliary task. This approach highlights the importance of emotional expression consistency across AUs, textual descriptions, and expressions. Extensive experiments are conducted on two challenging datasets, BP4D and DISFA, and the results show that the proposed AUTr-emo achieves performance comparable to the state of the art in AU detection.
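The abstract's pipeline (AU text descriptions serving as cross-modal queries over visual features, with expression recognition as an auxiliary multi-task head) can be sketched at forward-pass level. This is a minimal NumPy illustration, not the authors' implementation: all dimensions, weights, and head designs are hypothetical, chosen only to show the data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

D = 64        # shared embedding dimension (illustrative)
N_AU = 12     # AUs annotated in BP4D
N_EXPR = 7    # number of expression classes (assumption)
N_TOK = 49    # visual feature tokens, e.g. a 7x7 feature map

# Hypothetical inputs: text embeddings of the AU descriptions (the
# queries) and visual tokens from an image backbone.
au_text_queries = rng.standard_normal((N_AU, D))
visual_tokens = rng.standard_normal((N_TOK, D))

# Cross-modal attention: each AU-description query attends over the
# visual tokens and pools an AU-specific visual representation.
attn = softmax(au_text_queries @ visual_tokens.T / np.sqrt(D), axis=-1)
au_features = attn @ visual_tokens               # (N_AU, D)

# AU detection head: one sigmoid logit per AU (multi-label task).
w_au = rng.standard_normal((D,))
au_probs = sigmoid(au_features @ w_au)           # (N_AU,)

# Auxiliary expression head: pooled AU features -> softmax over
# expression classes, tying AU predictions to expression consistency.
w_expr = rng.standard_normal((D, N_EXPR))
expr_probs = softmax(au_features.mean(axis=0) @ w_expr)

print(au_probs.shape, expr_probs.shape)          # (12,) (7,)
```

In training, the AU head would carry a multi-label loss and the expression head an auxiliary classification loss, so that gradients from the expression task regularize the shared AU representations, which is the consistency idea the abstract describes.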
Pages: 13
Related papers
50 records
  • [41] Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
    Nawaz, Shah
    Janjua, Muhammad Kamran
    Gallo, Ignazio
    Mahmood, Arif
    Calefati, Alessandro
    2019 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2019, : 83 - 89
  • [42] Cross-Modal Learning Based Flexible Bimodal Biometric Authentication With Template Protection
    Jiang, Qi
    Zhao, Guichuan
    Ma, Xindi
    Li, Meng
    Tian, Youliang
    Li, Xinghua
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 3593 - 3607
  • [43] Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
    Schindler, Alexander
    Gordea, Sergiu
    Knees, Peter
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 706 - 713
  • [44] Cross-Modal Supervision-Based Multitask Learning With Automotive Radar Raw Data
    Jin, Yi
    Deligiannis, Anastasios
    Fuentes-Michel, Juan-Carlos
    Vossiek, Martin
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (04): : 3012 - 3025
  • [45] Cross-modal learning using privileged information for long-tailed image classification
    Li, Xiangxian
    Zheng, Yuze
    Ma, Haokai
    Qi, Zhuang
    Meng, Xiangxu
    Meng, Lei
    COMPUTATIONAL VISUAL MEDIA, 2024, 10 (05) : 981 - 992
  • [46] PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding
    Zhou, Honggu
    Peng, Xiaogang
    Luo, Yikai
    Wu, Zizhao
    MULTIMEDIA SYSTEMS, 2024, 30 (03)
  • [47] Cross-Modal Learning via Adversarial Loss and Covariate Shift for Enhanced Liver Segmentation
    Ozkan, Savas
    Selver, M. Alper
    Baydar, Bora
    Kavur, Ali Emre
    Candemir, Cemre
    Akar, Gozde Bozdagi
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04): : 2723 - 2735
  • [48] A Cross-Modal transfer approach for histological images: A case study in aquaculture for disease identification using Zero-Shot learning
    Mendieta, Milton
    Romero, Dennis
    2017 IEEE SECOND ECUADOR TECHNICAL CHAPTERS MEETING (ETCM), 2017,
  • [49] FROM INTRA-MODAL TO INTER-MODAL SPACE: MULTI-TASK LEARNING OF SHARED REPRESENTATIONS FOR CROSS-MODAL RETRIEVAL
    Choi, Jaeyoung
    Larson, Martha
    Friedland, Gerald
    Hanjalic, Alan
    2019 IEEE FIFTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2019), 2019, : 1 - 10
  • [50] Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
    Wu, Xiaoyu
    Wang, Tiantian
    Wang, Shengjin
    ELECTRONICS, 2020, 9 (12) : 1 - 17