A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Times Cited: 11
Authors
Du, Guanglong [1 ]
Zeng, Yuwen [1 ]
Su, Kang [1 ]
Li, Chunquan [2 ]
Wang, Xueqian [3 ]
Teng, Shaohua [4 ]
Li, Di [5 ]
Liu, Peter Xiaoping [6 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Nanchang Univ, Sch Informat Engn, Nanchang 330031, Jiangxi, Peoples R China
[3] Univ Town Shenzhen, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[4] Guangdong Univ Technol, Dept Artificial Intelligence & Informat Engn, Guangzhou 510006, Peoples R China
[5] South China Univ Technol, Sch Mech & Automot Engn, Guangzhou 510006, Peoples R China
[6] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Feature extraction; Speech recognition; Neural networks; Fuses; Physiology; Face recognition; Body movements; facial expressions; multimodal emotion recognition; psychological problem; text-level feature fusion; RECOGNITION; DYNAMICS;
DOI
10.1109/TIM.2022.3204940
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Emotion computing is a necessary part of advanced human-computer interaction. In novels, an apt description of a character's facial expressions, body language, and speaking style enables readers to infer the character's emotions. Moreover, multimodal information is complementary, and fusing information from multiple modalities into a textual modality yields better fusion results while overcoming the bias of interpreting any single modality alone. Inspired by these facts, we develop a novel emotion-aware method based on the fusion of textual descriptions of speech, body movements, and facial expressions, which reduces the dimensionality of the three modalities by unifying them into a single textual representation. Specifically, to fuse multimodal features for emotion recognition, we propose a two-stage neural network. First, a bidirectional long short-term memory-conditional random field (Bi-LSTM-CRF) model and a back-propagation neural network (BPNN) analyze the vocal and visual features extracted from facial expressions, body movements, and speech, in order to obtain textual descriptions of the individual features. Second, these textual descriptions are fused through a neural network with a self-organizing map (SOM) layer and compensation layers trained on a web-based corpus. The advantages of this method are that it uses depth information to track facial and body movements and employs an explainable textual intermediate representation to fuse the features. We experimentally tested the emotion-aware system in real-world applications; the results indicate that it recognizes human emotions quickly and stably. Compared with other unimodal and multimodal fusion algorithms, our method is more precise, improving accuracy by up to 30% over the unimodal method.
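The abstract outlines, but does not fully specify, the two-stage architecture. The minimal PyTorch sketch below shows the data flow of such a pipeline under stated assumptions: DescriptionTagger stands in for the Stage 1 Bi-LSTM-CRF (only Bi-LSTM emission scores are shown; the CRF transition layer and the parallel BPNN branch are omitted), while SOMLayer and FusionNet sketch the Stage 2 fusion. All class names, layer sizes, and the plain learnable-prototype treatment of the SOM are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of the two-stage pipeline described in the abstract.
    # All dimensions and names are hypothetical; the paper's CRF layer,
    # BPNN branch, and web-corpus-trained compensation layers are omitted.
    import torch
    import torch.nn as nn

    class DescriptionTagger(nn.Module):
        """Stage 1 (sketch): a Bi-LSTM maps a sequence of vocal/visual
        feature vectors to per-frame textual-description label scores.
        A full Bi-LSTM-CRF would add CRF transition scores on top of
        these emission scores."""
        def __init__(self, feat_dim, hidden_dim, n_labels):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
            self.emit = nn.Linear(2 * hidden_dim, n_labels)

        def forward(self, feats):            # feats: (batch, time, feat_dim)
            h, _ = self.lstm(feats)
            return self.emit(h)              # (batch, time, n_labels)

    class SOMLayer(nn.Module):
        """Sketch of a self-organizing-map layer: each input is mapped to
        its distances from a set of prototype vectors (the best-matching
        unit has the smallest distance). The classic topological
        neighborhood update is omitted; prototypes are treated here as
        ordinary learnable parameters."""
        def __init__(self, in_dim, n_units):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(n_units, in_dim))

        def forward(self, x):                # x: (batch, in_dim)
            return torch.cdist(x, self.prototypes)   # (batch, n_units)

    class FusionNet(nn.Module):
        """Stage 2 (sketch): fuse text-level embeddings of the three
        modalities' descriptions and classify the emotion."""
        def __init__(self, text_dim, n_units, n_emotions):
            super().__init__()
            self.som = SOMLayer(3 * text_dim, n_units)
            self.clf = nn.Sequential(nn.Linear(n_units, 64), nn.ReLU(),
                                     nn.Linear(64, n_emotions))

        def forward(self, face_txt, body_txt, speech_txt):
            fused = torch.cat([face_txt, body_txt, speech_txt], dim=-1)
            return self.clf(self.som(fused))

    if __name__ == "__main__":
        # Toy forward pass with made-up dimensions.
        tagger = DescriptionTagger(feat_dim=32, hidden_dim=64, n_labels=10)
        scores = tagger(torch.randn(2, 50, 32))          # per-frame label scores
        fusion = FusionNet(text_dim=128, n_units=100, n_emotions=7)
        logits = fusion(torch.randn(2, 128),
                        torch.randn(2, 128), torch.randn(2, 128))
        print(scores.shape, logits.shape)                # (2, 50, 10) (2, 7)

In the paper's actual system, the Stage 2 inputs would be embeddings of the textual descriptions produced by Stage 1 rather than random tensors, and the SOM prototypes would live on a topological grid with neighborhood-based updates; this sketch fixes only the overall data flow.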
Pages: 16