Emotion recognition based on brain-like multimodal hierarchical perception

Cited by: 0
Authors
Zhu X. [1]
Huang Y. [1]
Wang X. [1]
Wang R. [1]
Affiliations
[1] School of Communication and Information Engineering, Shanghai University, No. 99, Shangda Road, Shanghai
Funding
National Natural Science Foundation of China
Keywords
Brain-like perception; Emotion recognition; Multimodal
DOI
10.1007/s11042-023-17347-w
Abstract
Emotion recognition has gained prominence in applications ranging from safe driving and e-commerce to healthcare. Traditional approaches often rely on a single modality, such as vision, audio, or text, which limits both reliability and robustness. To address these shortcomings, we introduce a brain-inspired computing model for emotion recognition that mimics the hierarchical processing characteristic of human cognition. The model handles multimodal information cohesively, emulating how humans integrate visual, audio, and textual cues. Our brain-like hierarchical perception architecture comprises three key layers: feature extraction, fusion, and decision-making, integrating cognitive mechanisms with machine learning algorithms for enhanced performance. Specifically, we first extract deep features that emulate the human brain's perception of emotional cues. These features are then fused via a cross-attention mechanism that captures inter-modal correlations. Finally, the aggregated emotional representation is classified. Experimental results show that our approach achieves an average recognition accuracy of 82.7% across four emotion classes, demonstrating its effectiveness and offering a fresh perspective on multimodal emotion recognition. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
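To make the three-layer pipeline concrete, the sketch below shows one plausible PyTorch realization: per-modality projections standing in for the unspecified deep feature extractors, pairwise cross-attention fusion with vision as the query, and a linear decision layer over four emotion classes. All module names, dimensions, and the fusion topology are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the three-layer architecture described in the abstract:
# per-modality feature extraction -> cross-attention fusion -> decision layer.
# All dimensions and the fusion scheme are hypothetical stand-ins.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuses a query modality with a context modality via cross-attention."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # The query sequence attends to the other modality's feature sequence.
        fused, _ = self.attn(query, context, context)
        return self.norm(query + fused)  # residual connection

class MultimodalEmotionNet(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128, txt_dim=768,
                 dim=256, num_classes=4):
        super().__init__()
        # Layer 1: modality-specific projections standing in for the deep
        # feature extractors (e.g., CNN / spectrogram / text encoders).
        self.vis_proj = nn.Linear(vis_dim, dim)
        self.aud_proj = nn.Linear(aud_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)
        # Layer 2: cross-attention fusion across modalities.
        self.fuse_va = CrossModalFusion(dim)
        self.fuse_vt = CrossModalFusion(dim)
        # Layer 3: decision-making over the pooled fused representation.
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, vis, aud, txt):
        # Inputs: (batch, seq_len, modality_dim) feature sequences.
        v, a, t = self.vis_proj(vis), self.aud_proj(aud), self.txt_proj(txt)
        va = self.fuse_va(v, a)            # vision attends to audio
        vt = self.fuse_vt(v, t)            # vision attends to text
        pooled = torch.cat([va.mean(dim=1), vt.mean(dim=1)], dim=-1)
        return self.classifier(pooled)     # logits over 4 emotion classes

# Usage with random stand-in features for a batch of 2 samples.
model = MultimodalEmotionNet()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 50, 128),
               torch.randn(2, 32, 768))
print(logits.shape)  # torch.Size([2, 4])
```

The pairwise query/context choice here is one of several reasonable fusion topologies; the abstract only specifies that a cross-attention mechanism explores inter-modal correlations before classification.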
Pages: 56039–56057
Page count: 18