Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition

Cited by: 4
Authors
Sridhar, Kusha [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA
Source
INTERSPEECH 2020 | 2020
Keywords
Speech emotion recognition; Monte Carlo dropout; Semi-supervised learning; Teacher-student network; Corpus
DOI
10.21437/Interspeech.2020-2694
Abstract
Reliable and generalizable speech emotion recognition (SER) systems have wide applications in various fields, including healthcare, customer service, and security and defense. Towards this goal, this study presents a novel teacher-student (T-S) framework for SER, relying on an ensemble of probabilistic predictions of teacher embeddings to train an ensemble of students. We use uncertainty modeling with Monte Carlo (MC) dropout to create a distribution over the embeddings of an intermediate dense layer of the teacher. The embeddings guiding the student models are derived by sampling from this distribution. The final prediction combines the results obtained by the student ensemble. The proposed model not only increases the prediction performance over the teacher model, but also generates more consistent predictions. As a T-S formulation, the approach allows the use of unlabeled data to improve the performance of the students in a semi-supervised manner. An ablation analysis shows the importance of the MC-based ensemble and the use of unlabeled data. The results show relative improvements in concordance correlation coefficient (CCC) of up to 4.25% for arousal, 2.67% for valence, and 4.98% for dominance over their baseline results. The results also show that the student ensemble decreases the uncertainty in the predictions, leading to more consistent results.
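The core mechanism in the abstract — keeping dropout active at inference so repeated teacher passes yield a distribution of embeddings, then letting each sampled embedding supervise a different student whose predictions are averaged — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, dropout rate, and the linear "students" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher: a single dense layer whose activations are the "embedding".
W_teacher = rng.normal(size=(10, 8))  # illustrative sizes, not from the paper

def teacher_embedding(x, p_drop=0.5):
    """One stochastic forward pass. Dropout stays ACTIVE at inference
    (Monte Carlo dropout), so repeated calls sample from an implicit
    distribution over the intermediate-layer embeddings."""
    h = np.maximum(x @ W_teacher, 0.0)      # ReLU dense layer
    mask = rng.random(h.shape) > p_drop     # Bernoulli dropout mask
    return h * mask / (1.0 - p_drop)        # inverted-dropout scaling

x = rng.normal(size=(10,))  # stand-in for one utterance's features

# Draw an ensemble of embeddings from the MC-dropout distribution;
# in the paper each sample guides the training of a separate student.
embeddings = [teacher_embedding(x) for _ in range(20)]

# Hypothetical students: here just a shared linear probe producing one
# attribute score (e.g., arousal) per sampled embedding.
w_student = rng.normal(size=(8,))
student_preds = [float(e @ w_student) for e in embeddings]

# Final prediction combines the student ensemble; the spread of the
# predictions serves as an uncertainty estimate.
mean_pred = float(np.mean(student_preds))
uncertainty = float(np.std(student_preds))
```

Averaging over the sampled ensemble is what drives the reported reduction in prediction uncertainty: individual stochastic passes vary, but their mean is a more stable estimate.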
Pages: 516 - 520
Page count: 5