Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions

Cited by: 0
Authors
Yang Liu
Haoqin Sun
Wenbo Guan
Yuqi Xia
Zhen Zhao
Affiliation
[1] School of Information Science and Technology, Qingdao University of Science and Technology
Source
Machine Intelligence Research, 2023, Vol. 20
Keywords
Speech emotion recognition (SER); 3-dimensional (3D) feature; cascaded attention network (CAN); triplet loss; joint loss
DOI
Not available
Abstract
Due to the complexity of emotional expression, recognizing emotions from speech is a critical and challenging task. In many studies, certain emotions are easily misclassified. In this paper, we propose a new framework that integrates a cascaded attention mechanism and a joint loss for speech emotion recognition (SER), aiming to resolve feature confusion among emotions that are difficult to classify correctly. First, we extract mel-frequency cepstral coefficients (MFCCs) together with their deltas and delta-deltas to form 3-dimensional (3D) features, effectively reducing the interference of external factors. Second, we employ spatiotemporal attention to selectively discover target emotion regions in the input features, where self-attention with head fusion captures the long-range dependencies of temporal features. Finally, a joint loss function is employed to distinguish emotional embeddings with high similarity, enhancing overall performance. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database indicate that the method improves weighted accuracy (WA) by 2.49% and unweighted accuracy (UA) by 1.13% over state-of-the-art strategies.
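
As a minimal sketch of the 3D input described in the abstract, the static MFCCs and their first- and second-order dynamics can be stacked into three channels. The sampling rate, the choice of 40 coefficients, and the use of librosa below are assumptions for illustration, not the paper's reported settings.

    import librosa
    import numpy as np

    def extract_3d_mfcc(wav_path, sr=16000, n_mfcc=40):
        # Load the waveform and compute static MFCCs: shape (n_mfcc, T)
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # First- and second-order dynamics (deltas and delta-deltas)
        delta1 = librosa.feature.delta(mfcc, order=1)
        delta2 = librosa.feature.delta(mfcc, order=2)
        # Stack into a 3-channel input of shape (3, n_mfcc, T)
        return np.stack([mfcc, delta1, delta2], axis=0)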
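
The self-attention component can be illustrated with a layer whose heads are fused rather than concatenated. The abstract does not specify the fusion rule, so averaging the per-head outputs in this PyTorch sketch is an assumption about the design.

    import torch
    import torch.nn as nn

    class HeadFusionSelfAttention(nn.Module):
        def __init__(self, dim, num_heads=4):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.out = nn.Linear(self.head_dim, dim)

        def forward(self, x):  # x: (batch, T, dim), T = time frames
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Reshape to (batch, heads, T, head_dim)
            q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                       for z in (q, k, v))
            # Scaled dot-product attention over the time axis
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
            heads = attn @ v                 # (batch, heads, T, head_dim)
            fused = heads.mean(dim=1)        # head fusion: average the heads
            return self.out(fused)           # (batch, T, dim)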
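
Given the triplet-loss keyword, a plausible reading of the joint loss is softmax cross-entropy for classification plus a triplet margin term that separates highly similar emotional embeddings; the weighting factor and margin below are assumptions.

    import torch.nn as nn

    class JointLoss(nn.Module):
        def __init__(self, margin=1.0, weight=0.5):
            super().__init__()
            self.ce = nn.CrossEntropyLoss()
            self.triplet = nn.TripletMarginLoss(margin=margin)
            self.weight = weight  # assumed trade-off between the two terms

        def forward(self, logits, labels, anchor, positive, negative):
            # anchor/positive share an emotion label; negative has a different one
            return self.ce(logits, labels) + self.weight * self.triplet(
                anchor, positive, negative)

In training, the anchor and positive embeddings would come from utterances of the same emotion, while the negative would be drawn from a frequently confused class, which is how such a term could sharpen the boundaries between easily confused emotions.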
Pages: 595–604 (9 pages)