EMOCEPTION: AN INCEPTION INSPIRED EFFICIENT SPEECH EMOTION RECOGNITION NETWORK

Cited by: 0
Authors
Singh, Chirag [1 ]
Kumar, Abhay [1 ]
Nagar, Ajay [1 ]
Tripathi, Suraj [1 ]
Yenigalla, Promod [1 ]
Affiliations
[1] Samsung R&D Inst India, Bangalore, Karnataka, India
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Speech Emotion Recognition; Inception; Multi-Task Learning; CNN;
DOI
10.1109/asru46091.2019.9004020
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This research proposes a Deep Neural Network architecture for Speech Emotion Recognition called Emoception, which takes inspiration from Inception modules. The network takes speech features such as Mel-Frequency Spectral Coefficients (MFSC) or Mel-Frequency Cepstral Coefficients (MFCC) as input and recognizes the relevant emotion in the speech. We use the USC-IEMOCAP dataset for training, but the limited amount of training data and the large depth of the network make it prone to overfitting, reducing validation accuracy. The Emoception network overcomes this problem by extending in width without an increase in computational cost. We also employ a powerful regularization technique, Multi-Task Learning (MTL), to make the network robust. The model using MFSC input with MTL improves accuracy by 1.6% compared to Emoception without MTL. We report an overall accuracy improvement of around 4.6% over existing state-of-the-art methods for four emotion classes on the IEMOCAP dataset.
Pages: 787-791
Number of pages: 5
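The abstract describes two concrete design ideas: an Inception-style block that widens the network with parallel convolution branches (kept cheap via 1x1 bottleneck convolutions) and a Multi-Task Learning head used as a regularizer. The PyTorch sketch below is only an illustration of those ideas under stated assumptions, not the authors' implementation; the branch widths, the MFSC input shape, and the choice of auxiliary task are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): an Inception-style block over
# MFSC/MFCC feature maps plus an auxiliary multi-task head.
# Branch widths, input shape, and the auxiliary task are illustrative assumptions.
import torch
import torch.nn as nn


class InceptionBlock(nn.Module):
    """Parallel conv branches with 1x1 bottlenecks, concatenated on channels."""

    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)                     # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),                  # bottleneck
                                nn.Conv2d(16, 24, 3, padding=1))          # 3x3 branch
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.Conv2d(8, 16, 5, padding=2))           # 5x5 branch
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, 1))                   # pool branch
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Concatenating branches widens the representation: 16+24+16+8 = 64 channels.
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))


class EmoceptionSketch(nn.Module):
    """Toy Emoception-like model: stem conv -> inception blocks -> two task heads."""

    def __init__(self, n_emotions=4, n_aux=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(InceptionBlock(32), InceptionBlock(64))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.emotion_head = nn.Linear(64, n_emotions)   # main task: 4 emotion classes
        self.aux_head = nn.Linear(64, n_aux)            # auxiliary MTL task (assumed)

    def forward(self, x):                               # x: (batch, 1, n_mel, n_frames)
        h = self.pool(self.blocks(self.stem(x))).flatten(1)
        return self.emotion_head(h), self.aux_head(h)


if __name__ == "__main__":
    model = EmoceptionSketch()
    mfsc = torch.randn(8, 1, 40, 300)                   # 40 Mel bands x 300 frames (assumed shape)
    emo_logits, aux_logits = model(mfsc)
    # Joint MTL objective: emotion loss plus a down-weighted auxiliary loss.
    loss = nn.functional.cross_entropy(emo_logits, torch.randint(0, 4, (8,))) \
         + 0.3 * nn.functional.cross_entropy(aux_logits, torch.randint(0, 2, (8,)))
    print(emo_logits.shape, aux_logits.shape, float(loss))
```

The 1x1 bottlenecks keep the added branches cheap, so the block grows wider without a proportional growth in multiply-adds, while the down-weighted auxiliary loss acts as the regularizer the abstract attributes to MTL; the emotion task remains the primary objective.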