Speech emotion recognition based on emotion perception

Times Cited: 0
Authors
Liu, Gang [1 ]
Cai, Shifang [1 ]
Wang, Ce [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Speech emotion recognition; Emotion perception; Implicit emotional attribute; Multi-task learning; Attention
DOI
10.1186/s13636-023-00289-4
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Speech emotion recognition (SER) is a hot topic in speech signal processing. With cheap computing power widely available and data-driven research proliferating, deep learning approaches are now the prominent solutions to SER. The task nevertheless remains challenging because of the scarcity of datasets and the lack of emotion perception in existing models. Most existing SER networks are borrowed from computer vision and natural language processing, so they are not well suited to extracting emotion. Drawing on findings from brain science on emotion computing, and inspired by the emotional perception process of the human brain, we propose an approach based on emotional perception, which designs a human-like implicit emotional attribute classification and introduces implicit emotional information through multi-task learning. Preliminary experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset show that the proposed method improves unweighted accuracy (UA) by 2.44% and weighted accuracy (WA) by 3.18% (both absolute values), which verifies the effectiveness of our method.
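The abstract does not detail the paper's architecture, but the general multi-task pattern it names — a shared acoustic representation feeding a primary emotion head and an auxiliary implicit-attribute head, trained with a weighted sum of the two losses — can be sketched minimally in NumPy. All names, dimensions, the number of attribute classes, and the auxiliary weight `lam` below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Stand-in for a shared encoder output over a batch of utterances
batch, feat_dim = 8, 16
shared = rng.normal(size=(batch, feat_dim))

# Two task heads (hypothetical sizes): 4 emotion classes,
# 2 classes for the auxiliary implicit-attribute task
W_emo = rng.normal(size=(feat_dim, 4))
W_attr = rng.normal(size=(feat_dim, 2))

y_emo = rng.integers(0, 4, size=batch)    # primary labels
y_attr = rng.integers(0, 2, size=batch)   # auxiliary labels

loss_emo = cross_entropy(softmax(shared @ W_emo), y_emo)
loss_attr = cross_entropy(softmax(shared @ W_attr), y_attr)

lam = 0.3  # auxiliary-task weight (assumed value)
total_loss = loss_emo + lam * loss_attr
print(f"total multi-task loss: {total_loss:.4f}")
```

In a real system both heads would share gradients through the encoder, so the auxiliary implicit-attribute signal shapes the representation used for emotion classification; here the sketch only shows how the two objectives combine.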
Pages: 7