Combined Feature Representation for Emotion Classification from Russian Speech

Cited by: 0
Authors
Verkholyak, Oxana [1 ,2 ]
Karpov, Alexey [1 ,2 ]
Affiliations
[1] ITMO Univ, St Petersburg, Russia
[2] SPIIRAS Inst, St Petersburg, Russia
Source
ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE | 2018 / Vol. 789
Keywords
Emotion classification; Long Short-Term Memory; Logistic regression; Principal Component Analysis;
DOI
10.1007/978-3-319-71746-3_6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic features for emotion classification can be extracted at different levels. Frame-level features provide low-level descriptors that preserve the temporal structure of the utterance. Utterance-level features, on the other hand, are functionals applied to the low-level descriptors and carry important information about the speaker's emotional state. Utterance-level features are particularly useful for determining emotion intensity; however, they lose information about temporal changes in the signal. Another drawback is that the number of feature vectors is often insufficient for complex classification tasks. One way to overcome these problems is to combine frame-level and utterance-level features and take advantage of both. This paper proposes to obtain a low-level feature representation by feeding frame-level descriptor sequences to a Long Short-Term Memory (LSTM) network, combine the outcome with the Principal Component Analysis (PCA) representation of utterance-level features, and make the final prediction with a logistic regression classifier.
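The pipeline described in the abstract can be illustrated with a rough NumPy-only sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are random, the feature dimensions, hidden size, and number of PCA components are arbitrary, the LSTM uses fixed random weights and is not trained, and the logistic regression is a plain softmax regression fitted by gradient descent. The function names (`lstm_last_hidden`, `pca_fit_transform`, `fit_softmax_regression`) are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(frames, W, U, b):
    """Run a single LSTM layer over (T, d) frames; return the final hidden state."""
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in frames:
        z = x @ W + h @ U + b                      # all four gates at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def pca_fit_transform(X, n_components):
    """Project centered rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fit_softmax_regression(X, y, n_classes, lr=0.05, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    W, b = np.zeros((X.shape[1], n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(X)
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Synthetic stand-ins (assumed sizes): 40 utterances, 4 emotion classes.
n_utt, frame_dim, utt_dim, hidden, n_pca, n_classes = 40, 13, 50, 8, 10, 4
frame_seqs = [rng.normal(size=(int(rng.integers(20, 60)), frame_dim))
              for _ in range(n_utt)]                  # variable-length frame-level descriptors
utt_feats = rng.normal(size=(n_utt, utt_dim))         # utterance-level functionals
labels = rng.integers(0, n_classes, size=n_utt)

# Untrained LSTM with random weights, used only to show the data flow.
W_in = rng.normal(scale=0.1, size=(frame_dim, 4 * hidden))
U_rec = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b_lstm = np.zeros(4 * hidden)

lstm_feats = np.stack([lstm_last_hidden(s, W_in, U_rec, b_lstm) for s in frame_seqs])
pca_feats = pca_fit_transform(utt_feats, n_pca)
combined = np.hstack([lstm_feats, pca_feats])         # combined feature representation

W_clf, b_clf = fit_softmax_regression(combined, labels, n_classes)
pred = np.argmax(combined @ W_clf + b_clf, axis=1)
print(combined.shape, pred[:10])
```

In a real setting the LSTM would be trained on labeled frame sequences, and the PCA projection and classifier would be fitted on training data only; the sketch only shows how the two representations are concatenated before the final classifier.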
Pages: 68-73
Number of pages: 6