Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models

Cited by: 5
Authors
Bian, Yifan [1]
Kuester, Dennis [2]
Liu, Hui [2]
Krumhuber, Eva G. [1]
Affiliations
[1] UCL, Department of Experimental Psychology, London WC1H 0AP, England
[2] University of Bremen, Department of Mathematics and Computer Science, D-28359 Bremen, Germany
Keywords
automatic facial expression recognition; naturalistic context; deep learning; multimodal large language model; recognition; emotion; context; face; database
DOI
10.3390/s24010126
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline Classification Codes
070302; 081704
Abstract
This paper provides a comprehensive overview of affective computing systems for facial expression recognition (FER) research in naturalistic contexts. The first section presents an updated account of user-friendly FER toolboxes that incorporate state-of-the-art deep learning models, elaborating on their neural architectures, datasets, and performance across domains. These toolboxes can robustly handle a variety of challenges encountered in the wild, such as variations in illumination and head pose, which would otherwise impair recognition accuracy. The second section discusses multimodal large language models (MLLMs) and their potential applications in affective science. MLLMs exhibit human-level capabilities for FER and enable the quantification of various contextual variables, supporting context-aware emotion inferences. These advancements have the potential to revolutionize current methodological approaches for studying contextual influences on emotion, leading to the development of contextualized emotion models.
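
To make the first part of the abstract concrete, the sketch below illustrates how a user-friendly, deep-learning-based FER toolbox of the kind surveyed here can be applied to a naturalistic photograph. It assumes the open-source DeepFace library and a placeholder image file (example.jpg); it is a minimal illustration of the general workflow of such toolboxes, not the authors' own pipeline.

```python
# Minimal sketch of deep-learning-based FER on an in-the-wild image.
# Assumes the open-source DeepFace toolbox (pip install deepface); the image
# path is a placeholder. Shown only to illustrate the class of user-friendly
# FER toolboxes discussed in the paper.
from deepface import DeepFace

results = DeepFace.analyze(
    img_path="example.jpg",          # placeholder: any naturalistic photo
    actions=["emotion"],             # restrict analysis to facial expression
    detector_backend="retinaface",   # a detector suited to in-the-wild faces
    enforce_detection=False,         # do not raise an error if no face is found
)

# Recent DeepFace versions return one dict per detected face.
for face in (results if isinstance(results, list) else [results]):
    print(face["dominant_emotion"], face["emotion"])
```

Toolboxes of this kind bundle face detection, alignment, and a pretrained expression classifier behind a single call, which is what makes them practical for naturalistic, in-the-wild data.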
Pages: 15