SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM

Cited by: 0
Authors
Huang, Kun-Yi [1 ]
Wu, Chung-Hsien [1 ]
Yang, Tsung-Hsien [1 ]
Su, Ming-Hsiang [1 ]
Chou, Jia-Hui [1 ]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
Source
2016 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT) | 2018
Keywords
Speech emotion recognition; bottleneck features; long-short term memory;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A complete emotional expression in a conversation follows a complex temporal course. Prior work on utterance-level and segment-level processing does not account for subtle differences in feature characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS alone.
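The data flow described in the abstract (combine LLD and DSS features, compress them through an autoencoder bottleneck, then model the segment sequence with an LSTM) can be sketched roughly as follows. This is only an illustrative NumPy forward pass with random, untrained weights, not the authors' implementation: every dimension, the single-layer encoder, the gate ordering, and all function names are assumptions made for the example, and the random arrays stand in for real LLD and DSS features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
N_LLD, N_DSS = 32, 48      # per-segment LLD and DSS feature dimensions
N_BOTTLENECK = 16          # width of the autoencoder bottleneck layer
N_HIDDEN = 8               # LSTM hidden state size
N_EMOTIONS = 4             # number of emotion classes
N_SEGMENTS = 10            # segments in one utterance


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(features, w_enc, b_enc):
    """Encoder half of an autoencoder: map features to the bottleneck."""
    return np.tanh(features @ w_enc + b_enc)


def decode(code, w_dec, b_dec):
    """Decoder half, needed only to compute the reconstruction error."""
    return code @ w_dec + b_dec


def lstm_last_state(sequence, w, u, b):
    """Run a single-layer LSTM over the sequence; return the last hidden state."""
    n_hidden = u.shape[0]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in sequence:
        gates = x @ w + h @ u + b              # shape: (4 * n_hidden,)
        i, f, o, g = np.split(gates, 4)        # gate order is a convention chosen here
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)             # update the cell state
        h = o * np.tanh(c)                     # expose the gated hidden state
    return h


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Random stand-ins for the real LLD and DSS features of one utterance.
lld = rng.standard_normal((N_SEGMENTS, N_LLD))
dss = rng.standard_normal((N_SEGMENTS, N_DSS))
combined = np.concatenate([lld, dss], axis=1)  # (segments, N_LLD + N_DSS)

# Autoencoder: the bottleneck activations are the reduced features.
n_in = N_LLD + N_DSS
w_enc = rng.standard_normal((n_in, N_BOTTLENECK)) * 0.1
b_enc = np.zeros(N_BOTTLENECK)
w_dec = rng.standard_normal((N_BOTTLENECK, n_in)) * 0.1
b_dec = np.zeros(n_in)
bottleneck = encode(combined, w_enc, b_enc)
reconstruction = decode(bottleneck, w_dec, b_dec)
reconstruction_error = float(np.mean((reconstruction - combined) ** 2))

# LSTM over the sequence of bottleneck vectors, then a softmax classifier.
w = rng.standard_normal((N_BOTTLENECK, 4 * N_HIDDEN)) * 0.1
u = rng.standard_normal((N_HIDDEN, 4 * N_HIDDEN)) * 0.1
b = np.zeros(4 * N_HIDDEN)
h_last = lstm_last_state(bottleneck, w, u, b)
w_out = rng.standard_normal((N_HIDDEN, N_EMOTIONS)) * 0.1
probs = softmax(h_last @ w_out)
```

In a trained system the encoder weights would be learned by minimizing `reconstruction_error` and the LSTM and output weights by minimizing a classification loss on labeled utterances; here the weights are random, so `probs` only demonstrates the shapes passing through each stage.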
Pages: 1-4
Page count: 4
Related papers
16 in total
  • [1] Deep Scattering Spectrum
    Anden, Joakim
    Mallat, Stephane
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2014, 62 (16) : 4114 - 4128
  • [2] Andrej K., 2014, ARXIV14122306
  • [3] [Anonymous], 1997, Neural Computation
  • [4] [Anonymous], 1997, Affective Computing
  • [5] LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT
    BENGIO, Y
    SIMARD, P
    FRASCONI, P
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02) : 157 - 166
  • [6] BERINGER N, 2004, P 8 INT C SPOK LANG, P2233
  • [7] Eyben F., 2010, P 18 ACM INT C MULT, P1459
  • [8] Reducing the dimensionality of data with neural networks
    Hinton, G. E.
    Salakhutdinov, R. R.
    [J]. SCIENCE, 2006, 313 (5786) : 504 - 507
  • [9] Error Weighted Semi-Coupled Hidden Markov Model for Audio-Visual Emotion Recognition
    Lin, Jen-Chun
    Wu, Chung-Hsien
    Wei, Wen-Li
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (01) : 142 - 156
  • [10] Mower E, 2011, INT CONF ACOUST SPEE, P2372