Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models

Cited by: 5
Authors
Bian, Yifan [1]
Kuester, Dennis [2]
Liu, Hui [2]
Krumhuber, Eva G. [1]
Affiliations
[1] UCL, Department of Experimental Psychology, London WC1H 0AP, England
[2] University of Bremen, Department of Mathematics and Computer Science, D-28359 Bremen, Germany
Keywords
automatic facial expression recognition; naturalistic context; deep learning; multimodal large language model; recognition; emotion; context; face; database
DOI
10.3390/s24010126
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline Classification Codes
070302; 081704
Abstract
This paper provides a comprehensive overview of affective computing systems for facial expression recognition (FER) research in naturalistic contexts. The first section presents an updated account of user-friendly FER toolboxes that incorporate state-of-the-art deep learning models, elaborating on their neural architectures, datasets, and performance across domains. These toolboxes can robustly handle a variety of challenges encountered in the wild, such as variations in illumination and head pose, which would otherwise impair recognition accuracy. The second section discusses multimodal large language models (MLLMs) and their potential applications in affective science. MLLMs exhibit human-level capabilities for FER and enable the quantification of various contextual variables, supporting context-aware emotion inferences. These advancements have the potential to revolutionize current methodological approaches for studying contextual influences on emotion, leading to the development of contextualized emotion models.
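
To make the first part of the abstract concrete, the sketch below illustrates how a user-friendly, deep-learning-based FER toolbox of the kind surveyed here can be applied to a naturalistic photograph. It assumes the open-source DeepFace library and a placeholder image file (example.jpg); it is a minimal illustration of the general workflow of such toolboxes, not the authors' own pipeline.

```python
# Minimal sketch of deep-learning-based FER on an in-the-wild image.
# Assumes the open-source DeepFace toolbox (pip install deepface); the image
# path is a placeholder. Shown only to illustrate the class of user-friendly
# FER toolboxes discussed in the paper.
from deepface import DeepFace

results = DeepFace.analyze(
    img_path="example.jpg",          # placeholder: any naturalistic photo
    actions=["emotion"],             # restrict analysis to facial expression
    detector_backend="retinaface",   # a detector suited to in-the-wild faces
    enforce_detection=False,         # do not raise an error if no face is found
)

# Recent DeepFace versions return one dict per detected face.
for face in (results if isinstance(results, list) else [results]):
    print(face["dominant_emotion"], face["emotion"])
```

Toolboxes of this kind bundle face detection, alignment, and a pretrained expression classifier behind a single call, which is what makes them practical for naturalistic, in-the-wild data.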
Pages: 15