Learning Alignment for Multimodal Emotion Recognition from Speech

Cited by: 59
Authors
Xu, Haiyang [1 ]
Zhang, Hui [1 ]
Han, Kun [2 ]
Wang, Yun [3 ]
Peng, Yiping [1 ]
Li, Xiangang [1 ]
Affiliations
[1] DiDi Chuxing, Beijing, Peoples R China
[2] DiDi Res Amer, Mountain View, CA USA
[3] Peking Univ, Beijing, Peoples R China
Source
INTERSPEECH 2019 | 2019
Keywords
Emotion Recognition; Multimodal; Attention; Alignment; Classification
DOI
10.21437/Interspeech.2019-3247
CLC Number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
Speech emotion recognition is challenging because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion-related features from audio signals or employ speech recognition techniques to generate text from speech and then apply natural language processing to analyze the sentiment. Further, although emotion recognition can benefit from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities. One can build separate models for the two input sources and combine them at the decision level, but this approach ignores the interaction between speech and text in the temporal domain. In this paper, we propose to use an attention mechanism to learn the alignment between speech frames and text words, aiming to produce more accurate multimodal feature representations. The aligned multimodal features are fed into a sequential model for emotion recognition. We evaluate the approach on the IEMOCAP dataset, and the experimental results show that the proposed approach achieves state-of-the-art performance on the dataset.
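The abstract describes a cross-modal attention step that soft-aligns each transcript word with speech frames before a sequential classifier. Below is a minimal PyTorch sketch of that idea. The layer sizes, the dot-product attention scoring, the BiLSTM head, the four-class output (a common IEMOCAP setup), and all names are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedMultimodalEmotion(nn.Module):
    # Hypothetical module sketching attention-based audio-text alignment;
    # dimensions and components are assumptions, not the authors' model.
    def __init__(self, audio_dim=40, text_dim=300, hidden_dim=128, num_classes=4):
        super().__init__()
        # Project both modalities into a shared space for attention scoring.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Sequential model over the aligned multimodal features.
        self.rnn = nn.LSTM(2 * hidden_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, audio, text):
        # audio: (B, T_a, audio_dim) speech-frame features, e.g. filterbanks
        # text:  (B, T_w, text_dim)  word embeddings from the transcript
        a = self.audio_proj(audio)                       # (B, T_a, H)
        w = self.text_proj(text)                         # (B, T_w, H)
        # Each word attends over all speech frames: a soft alignment.
        scores = torch.bmm(w, a.transpose(1, 2))         # (B, T_w, T_a)
        align = F.softmax(scores, dim=-1)                # attention weights
        attended_audio = torch.bmm(align, a)             # (B, T_w, H)
        # Concatenate each word with its aligned audio summary.
        fused = torch.cat([w, attended_audio], dim=-1)   # (B, T_w, 2H)
        out, _ = self.rnn(fused)                         # (B, T_w, 2H)
        # Mean-pool over words, then classify the utterance's emotion.
        return self.classifier(out.mean(dim=1))

# Usage with random tensors standing in for real features:
model = AlignedMultimodalEmotion()
logits = model(torch.randn(2, 500, 40), torch.randn(2, 20, 300))
print(logits.shape)  # torch.Size([2, 4])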
Pages: 3569-3573
Number of pages: 5
Related Papers
50 records in total
  • [1] Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Li, Jiahang
    Zhao, Zhengdao
    INTERSPEECH 2021, 2021, : 4518 - 4522
  • [2] Masked Graph Learning With Recurrent Alignment for Multimodal Emotion Recognition in Conversation
    Meng, Tao
    Zhang, Fuchen
    Shou, Yuntao
    Shao, Hongen
    Ai, Wei
    Li, Keqin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4298 - 4312
  • [3] MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
    Ghosh, Sreyan
    Tyagi, Utkarsh
    Ramaneswaran, S.
    Srivastava, Harshvardhan
    Manocha, Dinesh
    INTERSPEECH 2023, 2023, : 1209 - 1213
  • [4] Learning deep multimodal affective features for spontaneous speech emotion recognition
    Zhang, Shiqing
    Tao, Xin
    Chuang, Yuelong
    Zhao, Xiaoming
    SPEECH COMMUNICATION, 2021, 127 : 73 - 81
  • [5] Towards the explainability of Multimodal Speech Emotion Recognition
    Kumar, Puneet
    Kaushik, Vishesh
    Raman, Balasubramanian
    INTERSPEECH 2021, 2021, : 1748 - 1752
  • [6] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [7] Multimodal emotion recognition from expressive faces, body gestures and speech
    Caridakis, George
    Castellano, Ginevra
    Kessous, Loic
    Raouzaiou, Amaryllis
    Malatesta, Lori
    Asteriadis, Stelios
    Karpouzis, Kostas
    ARTIFICIAL INTELLIGENCE AND INNOVATIONS 2007: FROM THEORY TO APPLICATIONS, 2007 : 375+
  • [8] Annotations from speech and heart rate: impact on multimodal emotion recognition
    Sharma, Kaushal
    Chanel, Guillaume
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 51 - 59
  • [9] A multimodal hierarchical approach to speech emotion recognition from audio and text
    Singh, Prabhav
    Srivastava, Ridam
    Rana, K. P. S.
    Kumar, Vineet
    KNOWLEDGE-BASED SYSTEMS, 2021, 229