Emotion Recognition from Human Speech Using Temporal Information and Deep Learning

Cited by: 33
Authors
Kim, John W. [1]
Saurous, Rif A. [2]
Affiliations
[1] Menlo Sch, Atherton, CA USA
[2] Google Inc, Mountain View, CA USA
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
emotion recognition; temporal information; deep learning; CNN; LSTM
DOI
10.21437/Interspeech.2018-1132
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition by machine is a challenging task, but it has great potential to make empathic human-machine communications possible. In conventional approaches that consist of feature extraction and classifier stages, extensive studies have devoted their effort to developing good feature representations, but relatively little effort was made to make proper use of the important temporal information in these features. In this paper, we propose a model combining features known to be useful for emotion recognition and deep neural networks to exploit temporal information when recognizing emotion status. A benchmark evaluation on EMO-DB demonstrates that the proposed model achieves a state-of-the-art performance of 88.9% recognition rate.
Pages: 937-940
Number of pages: 4