A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Cited by: 11
Authors
Du, Guanglong [1 ]
Zeng, Yuwen [1 ]
Su, Kang [1 ]
Li, Chunquan [2 ]
Wang, Xueqian [3 ]
Teng, Shaohua [4 ]
Li, Di [5 ]
Liu, Peter Xiaoping [6 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Nanchang Univ, Sch Informat Engn, Nanchang 330031, Jiangxi, Peoples R China
[3] Univ Town Shenzhen, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[4] Guangdong Univ Technol, Dept Artificial Intelligent & Informat Engn, Guangzhou 510006, Peoples R China
[5] South China Univ Technol, Sch Mech & Automot Engn, Guangzhou 510006, Peoples R China
[6] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Feature extraction; Speech recognition; Neural networks; Fuses; Physiology; Face recognition; Body movements; facial expressions; multimodal emotion recognition; psychological problem; text-level feature fusion; RECOGNITION; DYNAMICS;
DOI
10.1109/TIM.2022.3204940
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Emotion computing is a necessary part of advanced human-computer interaction. An apt description of a character's facial expressions, body language, and speaking style in a novel often enables readers to infer the character's emotions. Moreover, multimodal information is complementary and integrated: fusing information from multiple modalities into the textual modality can yield better fusion results and overcome the bias of interpreting any single modality in isolation. Inspired by these facts, we develop a novel emotion-aware method based on the fusion of textual descriptions of speech, body movements, and facial expressions, which reduces the dimensionality of the three modalities by unifying them into a single textual component. Specifically, to fuse multimodal features for emotion recognition, we propose a two-stage neural network. First, a bidirectional long short-term memory-conditional random field (Bi-LSTM-CRF) model and a back-propagation neural network (BPNN) analyze the extracted vocal and visual features of facial expressions, body movements, and speech to obtain textual descriptions of the different features. Second, the textual descriptions of the features are fused through a neural network with a self-organizing map (SOM) layer and compensation layers trained on a web-based corpus. The advantages of this method are that it uses depth information to track facial and body movement and employs an explainable textual intermediate representation to fuse the features. We experimentally tested the emotion-aware system in real-world applications, and the results indicate that it recognizes human emotions quickly and stably. Compared with other unimodal and multimodal-fusion algorithms, our method is more precise, improving accuracy by up to 30% over the unimodal method.
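For concreteness, here is a minimal sketch of the first stage: a Bi-LSTM-CRF tagger that maps a sequence of per-frame vocal or visual feature vectors to textual descriptor labels. The feature dimension, tag set, layer sizes, and the third-party pytorch-crf package are illustrative assumptions, not the authors' published configuration.

# Stage 1 (sketch): Bi-LSTM-CRF tagging of per-frame features with
# textual descriptor labels such as "brows_raised" or "arms_crossed".
# Dimensions and the tag set are assumptions for illustration.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128, num_tags=12):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, feats, tags):
        # feats: (batch, seq_len, feat_dim); tags: (batch, seq_len)
        emissions = self.emit(self.lstm(feats)[0])
        return -self.crf(emissions, tags)  # negative log-likelihood

    def decode(self, feats):
        emissions = self.emit(self.lstm(feats)[0])
        return self.crf.decode(emissions)  # best tag sequence per clip

model = BiLSTMCRFTagger()
frames = torch.randn(2, 30, 64)        # 2 clips, 30 frames of features
tags = torch.randint(0, 12, (2, 30))   # gold descriptor labels
print(model.loss(frames, tags).item())
print(model.decode(frames)[0][:5])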
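The second stage can be sketched in the same spirit: the textual descriptions of the three modalities are embedded, concatenated, and quantized by a SOM, whose winning unit serves as a compact fused representation for a downstream emotion classifier. The toy bag-of-words embedding, vocabulary, SOM size, and the minisom package are assumptions for illustration, not the paper's exact fusion network.

# Stage 2 (sketch): SOM-based fusion of the three textual descriptions.
import numpy as np
from minisom import MiniSom  # pip install minisom

VOCAB = ["smile", "frown", "arms_crossed", "slumped", "loud", "trembling"]

def embed(description):
    # Toy bag-of-words embedding of one modality's textual description.
    words = description.lower().split()
    return np.array([float(w in words) for w in VOCAB])

# One sample: descriptions from face, body, and speech, fused by concatenation.
face, body, speech = "slight frown", "arms_crossed slumped", "trembling voice"
fused = np.concatenate([embed(face), embed(body), embed(speech)])

# Train a small SOM on (here: random placeholder) fused vectors so it
# learns a 2-D topology of multimodal description patterns.
som = MiniSom(5, 5, len(fused), sigma=1.0, learning_rate=0.5, random_seed=0)
data = np.random.randint(0, 2, (200, len(fused))).astype(float)
som.train_random(data, 500)

# The winning unit is the compact fused code a downstream classifier
# would map to an emotion label.
print("winning SOM unit:", som.winner(fused))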
Pages: 16