Joint Discrete and Continuous Emotion Prediction Using Ensemble and End-to-End Approaches

Cited by: 6
Authors
AlBadawy, Ehab A. [1 ]
Kim, Yelin [1 ]
Affiliations
[1] SUNY Albany, Albany, NY 12222 USA
Source
ICMI'18: Proceedings of the 20th ACM International Conference on Multimodal Interaction | 2018
Keywords
Emotion recognition; Continuous emotion prediction; Joint representation; Bidirectional long short-term memory
DOI
10.1145/3242969.3242972
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper presents a novel approach to continuous emotion prediction that characterizes dimensional emotion labels jointly with continuous and discretized representations. Continuous emotion labels can capture subtle emotion variations, but their inherent noise often has negative effects on model training. Recent approaches found a performance gain when converting the continuous labels into a discrete set (e.g., using k-means clustering), despite the resulting label quantization error. To find the optimal trade-off between the continuous and discretized emotion representations, we investigate two joint modeling approaches: ensemble and end-to-end. The ensemble model combines the predictions of two separately trained models, one producing discretized predictions and the other continuous predictions. The end-to-end model, in contrast, is trained to simultaneously optimize both the discretized and continuous prediction tasks as well as the final combination of the two. Our experimental results using a state-of-the-art deep BLSTM network on the RECOLA dataset demonstrate that (i) the joint representation outperforms both individual-representation baselines and the state-of-the-art speech-based results on RECOLA, validating the assumption that combining continuous and discretized emotion representations yields better emotion prediction performance; and (ii) the joint representation helps accelerate convergence, particularly for valence prediction. Our work provides insights into joint discrete and continuous emotion representation and its efficacy for describing dynamically changing affective behavior in valence and activation prediction.
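The abstract outlines the end-to-end joint model: continuous labels are discretized with k-means, a shared BLSTM feeds both a continuous (regression) head and a discretized (classification) head, and the two are combined for the final prediction. Below is a minimal sketch of that idea, not the authors' implementation; the cluster count K, feature dimension, layer sizes, loss weights, and the concatenation-based fusion are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of an end-to-end joint model:
# a shared BLSTM with a continuous regression head, a discretized head over
# k-means clusters of the labels, and a learned fusion producing the final output.
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras import layers, Model

K = 8          # assumed number of k-means clusters for label discretization
N_FEATS = 88   # assumed per-frame acoustic feature dimension
T = 100        # assumed number of frames per sequence

# Toy data standing in for RECOLA features and continuous valence/arousal labels.
x = np.random.randn(32, T, N_FEATS).astype("float32")
y_cont = np.random.uniform(-1.0, 1.0, size=(32, T, 1)).astype("float32")

# 1) Discretize the continuous labels with k-means.
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(y_cont.reshape(-1, 1))
y_disc = km.labels_.reshape(32, T)                 # per-frame cluster indices

# 2) Shared BLSTM backbone with two task heads plus a fusion head.
inp = layers.Input(shape=(None, N_FEATS))
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inp)
cont_head = layers.TimeDistributed(layers.Dense(1), name="continuous")(h)
disc_head = layers.TimeDistributed(layers.Dense(K, activation="softmax"),
                                   name="discrete")(h)
fused = layers.Concatenate()([cont_head, disc_head])
final = layers.TimeDistributed(layers.Dense(1), name="final")(fused)

# Jointly optimize the continuous, discretized, and fused prediction tasks.
model = Model(inp, [cont_head, disc_head, final])
model.compile(optimizer="adam",
              loss=["mse", "sparse_categorical_crossentropy", "mse"],
              loss_weights=[1.0, 1.0, 1.0])
model.fit(x, [y_cont, y_disc, y_cont], epochs=1, batch_size=8, verbose=0)
```

The ensemble variant described in the abstract would instead train the continuous and discretized models separately and combine their predictions only at inference time.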
Pages: 366-375
Page count: 10