Emotion recognition in live broadcasting: a multimodal deep learning framework

Cited by: 0
Authors
Abbas, Rizwan [1]
Schuller, Bjorn W. [2]
Li, Xuewei [1]
Lin, Chi [3]
Li, Xi [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Lingyin St, Hangzhou 310058, Zhejiang, Peoples R China
[2] Imperial Coll London, Dept Comp, South Kensington Campus, London SW7 2AZ, England
[3] Dalian Univ Technol, Sch Software Technol, Dalian 116024, Liaoning, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
Multimodal emotion recognition; Facial expressions; Speech emotion; Tensor train layers; FUSION;
DOI
10.1007/s00530-025-01780-y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multimodal emotion recognition is a rapidly developing field with applications in areas as diverse as entertainment, healthcare, marketing, and education. The emergence of live broadcasting demands real-time emotion recognition, which involves analyzing emotions via body language, voice, facial expressions, and context. Previous studies on multimodal emotion recognition in live broadcasting have faced challenges such as limited computational efficiency, noisy and incomplete data, and difficult camera angles. This research presents a Multimodal Emotion Recognition in Live Broadcasting (MERLB) system that uses the speech, facial expressions, and context displayed in live broadcasts for emotion recognition. For facial emotion recognition, we use a deep convolutional neural network architecture that incorporates inception modules and dense blocks. To enhance computational efficiency, we focus on key segments rather than analyzing the entire utterance. MERLB employs tensor train layers to combine multimodal representations at higher orders. Experiments were conducted on the FIFA, League of Legends, IEMOCAP, and CMU-MOSEI datasets. MERLB improves the F1 score by 6.44% on the FIFA dataset and by 4.71% on the League of Legends dataset, and it outperforms other multimodal emotion recognition methods on the IEMOCAP and CMU-MOSEI datasets. Our code is available at https://github.com/swerizwan/merlb.
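The abstract states only that tensor train layers combine the multimodal representations at higher orders; it gives no layer details. As a rough illustration of the general idea, and not the authors' implementation, the sketch below fuses three modality vectors through a weight tensor stored in tensor-train (TT) form. The feature sizes, TT ranks, random cores, and all function names (`contract_core`, `tt_fuse`, `dense_fuse`) are assumptions made for this example. It also checks that the cheap TT contraction matches the result of applying the full weight tensor to the outer product of the three vectors.

```python
import random


def contract_core(vec, core):
    """Contract a modality vector with a TT core of shape (r_in, d, r_out).

    Returns an (r_in, r_out) matrix as a list of lists.
    """
    r_in, d, r_out = len(core), len(core[0]), len(core[0][0])
    return [[sum(vec[i] * core[a][i][b] for i in range(d))
             for b in range(r_out)] for a in range(r_in)]


def matmul(left, right):
    """Plain matrix product of two list-of-lists matrices."""
    inner, cols = len(right), len(right[0])
    return [[sum(row[k] * right[k][c] for k in range(inner))
             for c in range(cols)] for row in left]


def tt_fuse(vectors, cores):
    """Fuse modality vectors without ever forming their outer product."""
    fused = contract_core(vectors[0], cores[0])
    for vec, core in zip(vectors[1:], cores[1:]):
        fused = matmul(fused, contract_core(vec, core))
    return fused[0]  # the first TT rank is 1, so a single row remains


def dense_fuse(vectors, cores):
    """Reference: rebuild each full weight entry and apply it to the outer product."""
    x1, x2, x3 = vectors
    g1, g2, g3 = cores
    out_dim = len(g3[0][0])
    out = [0.0] * out_dim
    for i, xi in enumerate(x1):
        for j, xj in enumerate(x2):
            for k, xk in enumerate(x3):
                for o in range(out_dim):
                    weight = sum(g1[0][i][a] * g2[a][j][b] * g3[b][k][o]
                                 for a in range(len(g2))
                                 for b in range(len(g3)))
                    out[o] += weight * xi * xj * xk
    return out


def random_core(r_in, d, r_out, rng):
    """Random TT core; a trained layer would learn these values instead."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(r_out)]
             for _ in range(d)] for _ in range(r_in)]


rng = random.Random(0)
dims = (4, 3, 2)      # assumed feature sizes for speech, face, and context
ranks = (1, 2, 2, 5)  # assumed TT ranks; the last one is the fused output size
cores = [random_core(ranks[n], dims[n], ranks[n + 1], rng) for n in range(3)]
vectors = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for d in dims]

fused = tt_fuse(vectors, cores)
reference = dense_fuse(vectors, cores)
print(max(abs(a - b) for a, b in zip(fused, reference)))  # difference is at rounding level
```

In this toy setting the TT form stores 8 + 12 + 20 = 40 core parameters, whereas the full weight tensor would hold 4 × 3 × 2 × 5 = 120 entries. That gap widens quickly as feature sizes grow, which is the usual motivation for TT-format fusion layers.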
Pages: 22