MULTIMODAL TRANSFORMER WITH LEARNABLE FRONTEND AND SELF ATTENTION FOR EMOTION RECOGNITION

Cited by: 11
Authors
Dutta, Soumya
Ganapathy, Sriram
Affiliations
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Multi-modal emotion recognition; Transformer networks; self-attention models; learnable front-end; SENTIMENT ANALYSIS; FUSION;
DOI
10.1109/ICASSP43922.2022.9747723
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. The audio representations are learned jointly with a learnable audio front-end (LEAF) model feeding a CNN-based classifier. The text representations are derived from pre-trained bidirectional encoder representations from transformers (BERT) combined with a gated recurrent unit (GRU) network. The textual and audio representations are each processed by a bidirectional GRU network with self-attention. Multi-modal information extraction is then achieved with a transformer whose inputs are the textual and audio embeddings at the utterance level. Experiments on the IEMOCAP database show that the proposed framework improves over the current state-of-the-art results under all common test settings, primarily due to improved emotion recognition performance in the audio domain. We also show that the model is more robust to textual errors caused by an automatic speech recognition (ASR) system.
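The utterance-level fusion described above can be illustrated with a minimal scaled dot-product self-attention sketch in NumPy. This is not the paper's implementation: the embedding size, random projection weights, and the stand-in audio/text vectors are placeholders chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a token sequence X of shape (T, d).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 8  # illustrative embedding size, not the paper's configuration
audio_emb = rng.normal(size=d)  # stand-in for the LEAF/CNN audio embedding
text_emb = rng.normal(size=d)   # stand-in for the BERT/GRU text embedding

# Treat the two modality embeddings as a 2-token utterance-level sequence,
# so each modality can attend to itself and to the other modality.
X = np.stack([audio_emb, text_emb])
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = self_attention(X, Wq, Wk, Wv)
print(fused.shape)  # one fused d-dimensional vector per modality token
```

In the actual model a full transformer layer (multi-head attention, residual connections, feed-forward sublayers) operates on these embeddings; this sketch isolates only the attention step that mixes the two modalities.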
Pages: 6917 - 6921
Page count: 5
Related Papers
50 records total
  • [21] Multimodal sentiment and emotion recognition in hyperbolic space
    Arano, Keith April
    Orsenigo, Carlotta
    Soto, Mauricio
    Vercellis, Carlo
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 184
  • [22] Multimodal emotion recognition based on audio and text by using hybrid attention networks
    Zhang, Shiqing
    Yang, Yijiao
    Chen, Chen
    Liu, Ruixin
    Tao, Xin
    Guo, Wenping
    Xu, Yicheng
    Zhao, Xiaoming
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85
  • [23] Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
    Zaidi, Syed Aun Muhammad
    Latif, Siddique
    Qadir, Junaid
    IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, 5 : 684 - 693
  • [24] Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
    Sun Q.
    Wang S.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (02): : 588 - 601
  • [25] Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion
    Xie, Baijun
    Sidulova, Mariia
    Park, Chung Hyuk
    SENSORS, 2021, 21 (14)
  • [26] SS-Trans (Single-Stream Transformer for Multimodal Sentiment Analysis and Emotion Recognition): The Emotion Whisperer-A Single-Stream Transformer for Multimodal Sentiment Analysis
    Ji, Mingyu
    Wei, Ning
    Zhou, Jiawei
    Wang, Xin
    ELECTRONICS, 2024, 13 (21)
  • [27] Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
    Liu, Pengfei
    Li, Kun
    Meng, Helen
    INTERSPEECH 2020, 2020, : 379 - 383
  • [28] Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning
    Le, Hoai-Duy
    Lee, Guee-Sang
    Kim, Soo-Hyung
    Kim, Seungwon
    Yang, Hyung-Jeong
    IEEE ACCESS, 2023, 11 : 14742 - 14751
  • [29] Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism
    Lan, Yu-Ting
    Liu, Wei
    Lu, Bao-Liang
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [30] A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition
    Gong, Peizhu
    Liu, Jin
    Wu, Zhongdai
    Han, Bing
    Wang, Y. Ken
    He, Huihua
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 4203 - 4220