Emotion recognition in live broadcasting: a multimodal deep learning framework

Cited by: 0

Authors:

Abbas, Rizwan [1]

Schuller, Bjorn W. [2]

Li, Xuewei [1]

Lin, Chi [3]

Li, Xi [1]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Lingyin St, Hangzhou 310058, Zhejiang, Peoples R China

[2] Imperial Coll London, Dept Comp, South Kensington Campus, London SW7 2AZ, England

[3] Dalian Univ Technol, Sch Software Technol, Dalian 116024, Liaoning, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2025 / Vol. 31 / Issue 03

Funding:

Natural Science Foundation of Shanghai;

Keywords:

Multimodal emotion recognition; Facial expressions; Speech emotion; Tensor train layers; FUSION;

DOI:

10.1007/s00530-025-01780-y

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Multimodal emotion recognition is a rapidly developing field with applications in areas such as entertainment, healthcare, marketing, and education. The emergence of live broadcasting demands real-time emotion recognition, which involves analyzing emotions via body language, voice, facial expressions, and context. Previous studies of multimodal emotion recognition in live broadcasting have faced challenges such as limited computational efficiency, noisy and incomplete data, and difficult camera angles. This research presents a Multimodal Emotion Recognition in Live Broadcasting (MERLB) system that collects speech, facial expressions, and context displayed in live broadcasting for emotion recognition. We utilize a deep convolutional neural network architecture for facial emotion recognition, incorporating inception modules and dense blocks. We aim to enhance computational efficiency by focusing on key segments rather than analyzing the entire utterance. MERLB employs tensor train layers to combine multimodal representations at higher orders. Experiments were conducted on the FIFA, League of Legends, IEMOCAP, and CMU-MOSEI datasets. MERLB achieves a 6.44% F1 score improvement on the FIFA dataset and 4.71% on League of Legends, and it outperforms other multimodal emotion recognition methods on the IEMOCAP and CMU-MOSEI datasets. Our code is available at https://github.com/swerizwan/merlb.
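
To illustrate what combining multimodal representations with a tensor-train-format weight can look like, here is a minimal Python sketch. It is not the MERLB implementation (the authors' repository linked above is the reference for that). The function names `tt_fusion`, `contract_core`, and `random_core`, the feature dimensions, the rank, and the random weights are all assumptions made for illustration. The sketch applies a linear map to the outer product of the modality vectors, with the weight tensor stored as a chain of small cores so that the full outer product is never built.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def contract_core(core, x):
    """Weighted sum of a core's matrix slices: sum_i x[i] * core[i]."""
    rows, cols = len(core[0]), len(core[0][0])
    return [[sum(x[i] * core[i][r][c] for i in range(len(x)))
             for c in range(cols)] for r in range(rows)]

def tt_fusion(features, cores):
    """Fuse modality vectors with a weight tensor kept in tensor-train form.

    Mathematically this equals a linear map applied to the outer product
    of all feature vectors, but only small matrices are ever multiplied.
    """
    result = contract_core(cores[0], features[0])      # 1 x rank
    for core, x in zip(cores[1:], features[1:]):
        result = matmul(result, contract_core(core, x))
    return result[0]                                   # fused vector

def random_core(n, r_in, r_out, rng):
    """Hypothetical untrained core: n slices, each an r_in x r_out matrix."""
    return [[[rng.uniform(-1, 1) for _ in range(r_out)]
             for _ in range(r_in)] for _ in range(n)]

# Illustrative sizes only: speech (4), face (3), context (2), rank 2, output 5.
rng = random.Random(0)
dims, rank, out_dim = [4, 3, 2], 2, 5
cores = [random_core(dims[0], 1, rank, rng),
         random_core(dims[1], rank, rank, rng),
         random_core(dims[2], rank, out_dim, rng)]
features = [[rng.uniform(-1, 1) for _ in range(d)] for d in dims]
fused = tt_fusion(features, cores)
print(len(fused))
```

In this toy setting the three cores hold 4·2 + 3·2·2 + 2·2·5 = 40 weights, whereas a dense weight over the 4 × 3 × 2 outer product with 5 outputs would hold 120. That kind of saving is the usual motivation for tensor-train layers in higher-order fusion.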

Pages: 22
