SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM

Cited by: 0

Authors:

Huang, Kun-Yi [1]

Wu, Chung-Hsien [1]

Yang, Tsung-Hsien [1]

Su, Ming-Hsiang [1]

Chou, Jia-Hui [1]

Affiliations:

[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan, Taiwan

Source:

2016 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT) | 2018

Keywords:

Speech emotion recognition; bottleneck features; long-short term memory

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A complete emotional expression follows a complex temporal course over a conversation. Existing utterance-level and segment-level approaches do not account for subtle differences in characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network characterizes the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS individually.
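
The data flow described in the abstract (concatenate LLD and DSS features, compress them through an autoencoder bottleneck, then let an LSTM summarize the segment sequence) can be sketched roughly as follows. This is only an illustrative forward pass, not the authors' implementation: the feature sizes, the `tanh` encoder, the random untrained weights, the omitted biases and decoder, and every function name are assumptions made for the example.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
N_LLD, N_DSS, N_BOTTLENECK, N_HIDDEN, N_EMOTIONS = 6, 4, 3, 5, 4

def rand_matrix(rows, cols, scale=0.1):
    """Random, untrained weights; a real system would learn these."""
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Encoder half of an autoencoder; the decoder used for training is omitted.
W_enc = rand_matrix(N_BOTTLENECK, N_LLD + N_DSS)

def bottleneck(lld, dss):
    """Concatenate LLD and DSS features, then project to the bottleneck."""
    x = lld + dss  # list concatenation
    return [math.tanh(a) for a in matvec(W_enc, x)]

# One weight matrix per LSTM gate; each gate sees [input; previous hidden].
# Biases are omitted to keep the sketch short.
W_gates = {g: rand_matrix(N_HIDDEN, N_BOTTLENECK + N_HIDDEN) for g in "ifoc"}
W_out = rand_matrix(N_EMOTIONS, N_HIDDEN)

def lstm_step(x, h, c):
    """Standard LSTM cell update for one segment."""
    z = x + h
    i = [sigmoid(a) for a in matvec(W_gates["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(W_gates["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(W_gates["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(W_gates["c"], z)]  # candidate state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def classify(segments):
    """Run the LSTM over all segments; softmax over the final hidden state."""
    h = [0.0] * N_HIDDEN
    c = [0.0] * N_HIDDEN
    for lld, dss in segments:
        h, c = lstm_step(bottleneck(lld, dss), h, c)
    logits = matvec(W_out, h)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Synthetic stand-in for one utterance split into 7 segments.
segments = [([random.random() for _ in range(N_LLD)],
             [random.random() for _ in range(N_DSS)]) for _ in range(7)]
probs = classify(segments)
```

Here `probs` holds one probability per hypothetical emotion class. In a trained system the encoder weights would come from fitting the autoencoder to reconstruct its input, and the LSTM and output weights from supervised training on labeled emotion data.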

Pages: 1 / 4

Number of pages: 4
