STUDY OF DENSE NETWORK APPROACHES FOR SPEECH EMOTION RECOGNITION

Cited by: 0
Authors
Abdelwahab, Mohammed [1]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Speech emotion recognition; Deep Neural Networks; NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Deep neural networks have been proven to be very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still various open questions regarding the best approach to building a speech emotion recognition system. Would the system's performance improve if we had more labeled data? How much do we benefit from data augmentation? Which activation and regularization schemes are most beneficial? How does the depth of the network affect the performance? We are collecting the MSP-Podcast corpus, a large dataset with over 30 hours of data, which provides an ideal resource to address these questions. This study explores various dense architectures to predict arousal, valence and dominance scores. We investigate varying the training set size, the width and depth of the network, and the activation functions used during training. We also study the effect of data augmentation on the network's performance. We find that a bigger training set improves performance. Batch normalization is crucial to achieving good performance in deeper networks. We do not observe significant differences in performance between residual networks and dense networks.
Pages: 5084-5088
Number of pages: 5