Investigation of the Effect of Increased Dimension Levels in Speech Emotion Recognition

Times Cited: 4
Authors
Wang, Haiyan [1 ]
Zhao, Xiaohui [1 ]
Zhao, Yanping [1 ]
Affiliations
[1] Jilin Univ, Coll Commun Engn, Changchun 130012, Peoples R China
Keywords
Emotion recognition; Speech recognition; Databases; Solid modeling; Support vector machines; Long short term memory; LSTM network; multi-dimensional space; speech emotion recognition; FEATURES; CLASSIFICATION; SELECTION; MODEL;
DOI
10.1109/ACCESS.2022.3194039
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Speech emotion recognition plays a key role in human-machine interaction systems. Recognition of categorical emotions has improved greatly over the last few decades, but emotion recognition from spontaneous speech remains very challenging. This paper investigates emotion recognition from spontaneous speech in a three-dimensional model, where each dimension represents one primitive, generic attribute of an emotion. We introduce intermediate levels for each dimension and employ an LSTM network to estimate them, owing to its effectiveness in speech emotion recognition. In experiments on the IEMOCAP database, the accuracy is 30-35%. The confusion matrices show that our method yields a more concentrated dimension location. Furthermore, the estimated dimensions were applied to categorical emotion recognition. These results indicate that increasing the number of dimension levels makes dimension estimation feasible, and suggest that dimensional attributes can help improve speech emotion recognition.
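The core setup the abstract describes, estimating discrete levels of three emotion dimensions from speech with an LSTM, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensionality, the number of levels per dimension, the dimension names (valence, activation, dominance), the network size, and the three-head design are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): an LSTM that maps
# frame-level acoustic features of an utterance to discrete levels of
# three emotion dimensions (e.g., valence, activation, dominance).
import torch
import torch.nn as nn

class DimensionLevelLSTM(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_levels=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # One classification head per emotion dimension.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, n_levels) for _ in range(3)
        )

    def forward(self, x):
        # x: (batch, time, n_features) frame-level acoustic features.
        _, (h, _) = self.lstm(x)   # h: (num_layers, batch, hidden)
        utt = h[-1]                # utterance-level summary state
        # Logits over the discrete levels of each dimension.
        return [head(utt) for head in self.heads]

# Toy usage: 4 utterances, 200 frames each, 40-dim features (e.g., MFCCs).
model = DimensionLevelLSTM()
logits = model(torch.randn(4, 200, 40))
predicted_levels = [l.argmax(dim=1) for l in logits]
```

Treating each dimension as a separate classification head is one plausible way to realize "increased dimension levels"; the paper's actual estimator and feature set may differ.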
Pages: 78123-78134
Page Count: 12