Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition

Cited by: 4
Authors
Sridhar, Kusha [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA
Source
INTERSPEECH 2020 | 2020
Keywords
Speech emotion recognition; Monte Carlo dropout; Semi-supervised learning; Teacher-student network; Corpus
DOI
10.21437/Interspeech.2020-2694
Abstract
Reliable and generalizable speech emotion recognition (SER) systems have wide applications in various fields, including healthcare, customer service, and security and defense. Towards this goal, this study presents a novel teacher-student (T-S) framework for SER, relying on an ensemble of probabilistic predictions of teacher embeddings to train an ensemble of students. We use uncertainty modeling with Monte Carlo (MC) dropout to create a distribution over the embeddings of an intermediate dense layer of the teacher. The embeddings guiding the student models are derived by sampling from this distribution. The final prediction combines the results obtained by the student ensemble. The proposed model not only increases the prediction performance over the teacher model, but also generates more consistent predictions. As a T-S formulation, the approach allows the use of unlabeled data to improve the performance of the students in a semi-supervised manner. An ablation analysis shows the importance of the MC-based ensemble and the use of unlabeled data. The results show relative improvements in concordance correlation coefficient (CCC) of up to 4.25% for arousal, 2.67% for valence, and 4.98% for dominance over their baseline results. The results also show that the student ensemble decreases the uncertainty in the predictions, leading to more consistent results.
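The core mechanism in the abstract — keeping dropout active at inference so repeated teacher passes yield a distribution of embeddings, then letting each sampled embedding supervise a different student whose predictions are averaged — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, dropout rate, and the linear "students" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher: a single dense layer whose activations are the "embedding".
W_teacher = rng.normal(size=(10, 8))  # illustrative sizes, not from the paper

def teacher_embedding(x, p_drop=0.5):
    """One stochastic forward pass. Dropout stays ACTIVE at inference
    (Monte Carlo dropout), so repeated calls sample from an implicit
    distribution over the intermediate-layer embeddings."""
    h = np.maximum(x @ W_teacher, 0.0)      # ReLU dense layer
    mask = rng.random(h.shape) > p_drop     # Bernoulli dropout mask
    return h * mask / (1.0 - p_drop)        # inverted-dropout scaling

x = rng.normal(size=(10,))  # stand-in for one utterance's features

# Draw an ensemble of embeddings from the MC-dropout distribution;
# in the paper each sample guides the training of a separate student.
embeddings = [teacher_embedding(x) for _ in range(20)]

# Hypothetical students: here just a shared linear probe producing one
# attribute score (e.g., arousal) per sampled embedding.
w_student = rng.normal(size=(8,))
student_preds = [float(e @ w_student) for e in embeddings]

# Final prediction combines the student ensemble; the spread of the
# predictions serves as an uncertainty estimate.
mean_pred = float(np.mean(student_preds))
uncertainty = float(np.std(student_preds))
```

Averaging over the sampled ensemble is what drives the reported reduction in prediction uncertainty: individual stochastic passes vary, but their mean is a more stable estimate.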
Pages: 516 - 520
Page count: 5