A Multitask Approach to Continuous Five-Dimensional Affect Sensing in Natural Speech

Cited by: 37
Authors
Eyben, Florian [1]
Woellmer, Martin [1]
Schuller, Bjoern [1]
Affiliations
[1] TUM, Inst Human Machine Commun, Munich, Germany
Keywords
Algorithms; Experimentation; Human Factors; Neural networks; long short-term memory; emotion recognition; audio features; SEMAINE; dimensional affect;
DOI
10.1145/2133366.2133372
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic affect recognition is important for the ability of future technical systems to interact with us socially in an intelligent way by understanding our current affective state. In recent years there has been a shift in the field of affect recognition from "in the lab" experiments with acted data to "in the wild" experiments with spontaneous and naturalistic data. Two major issues in this setting are the proper segmentation of the input and the adequate description and modeling of affective states. The first issue is crucial for responsive, real-time systems such as virtual agents and robots, where the latency of the analysis must be as small as possible. To address this issue we introduce a novel method of incremental segmentation to be used in combination with supra-segmental modeling. For modeling of continuous affective states we use Long Short-Term Memory Recurrent Neural Networks, with which we can show an improvement in performance over standard recurrent neural networks and feed-forward neural networks as well as Support Vector Regression. For experiments we use the SEMAINE database, which contains recordings of spontaneous and natural human to Wizard-of-Oz conversations. The recordings are annotated continuously in time and magnitude with FeelTrace for five affective dimensions, namely activation, expectation, intensity, power/dominance, and valence. To exploit dependencies between the five affective dimensions we investigate multitask learning of all five dimensions augmented with inter-rater standard deviation. We can show improvements for multitask over single-task modeling. Correlation coefficients of up to 0.81 are obtained for the activation dimension and up to 0.58 for the valence dimension. The performance for the remaining dimensions was found to lie between that for activation and valence.
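The abstract reports results as one correlation coefficient per affective dimension. As a minimal sketch of how such a per-dimension score could be computed, the snippet below assumes the Pearson correlation between a predicted trace and a reference trace; the abstract does not state which correlation measure was used, and the names `pearson`, `per_dimension_correlation`, and `DIMENSIONS` are hypothetical, chosen only for illustration.

```python
import math

# Dimension names as listed in the abstract ("power" stands for power/dominance).
DIMENSIONS = ["activation", "expectation", "intensity", "power", "valence"]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_dimension_correlation(predictions, references):
    """Map each dimension name to the correlation of its predicted and reference trace.

    Both arguments are assumed to be dicts keyed by the names in DIMENSIONS,
    each holding one list of values per dimension (hypothetical data layout).
    """
    return {d: pearson(predictions[d], references[d]) for d in DIMENSIONS}
```

Under these assumptions, a value near 1 for a dimension means the predicted trace rises and falls with the annotated trace, which is the sense in which the reported figures for activation and valence can be compared.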
Pages: 1 / 29
Page count: 29