CHEAVD: a Chinese natural emotional audio–visual database

Cited by: 11
Authors
Ya Li
Jianhua Tao
Linlin Chao
Wei Bao
Yazhu Liu
Affiliations
[1] National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
[2] CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences
[3] School of Computer and Control Engineering, Graduate University of Chinese Academy of Sciences
[4] Institute of Linguistic Sciences, Jiangsu Normal University
Source
Journal of Ambient Intelligence and Humanized Computing | 2017, Vol. 8
Keywords
Audio–visual database; Natural emotion; Corpus annotation; LSTM; Multimodal emotion recognition;
DOI
Not available
Abstract
This paper presents a recently collected natural, multimodal, richly annotated emotion database, the CASIA Chinese Natural Emotional Audio–Visual Database (CHEAVD), which aims to provide a basic resource for research on multimodal multimedia interaction. The corpus contains 140 min of emotional segments extracted from films, TV plays and talk shows. Its 238 speakers, ranging in age from children to the elderly, provide broad coverage of speaker diversity, which makes this database a valuable addition to existing emotional databases. In total, 26 non-prototypical emotional states, including the six basic ones, are labeled by four native speakers. In contrast to other existing emotional databases, we provide multi-emotion labels and fake/suppressed-emotion labels. To the best of our knowledge, this is the first large-scale Chinese natural emotion corpus dealing with multimodal, natural emotion, and it is free for research use. Automatic emotion recognition with Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) is performed on this corpus. Experiments show that an average accuracy of 56 % can be achieved on the six major emotion states.
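As a rough illustration of the classification setup the abstract describes (a minimal sketch, not the authors' implementation), the PyTorch snippet below predicts one of six emotion classes from a sequence of frame-level acoustic features; the feature dimension (39, e.g. MFCCs with deltas), hidden size, and single-layer topology are all assumptions, since the paper's actual features and architecture are given only in the full text.

```python
# Minimal sketch of an LSTM emotion classifier over per-frame acoustic
# features, as suggested by the abstract. Dimensions are assumptions.
import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    def __init__(self, feat_dim=39, hidden_dim=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, feat_dim) sequence of frame-level features
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])  # logits over six emotion classes

model = EmotionLSTM()
logits = model(torch.randn(4, 200, 39))  # e.g. 4 utterances, 200 frames each
pred = logits.argmax(dim=1)              # predicted emotion index per utterance
```

Summarizing each utterance by the LSTM's final hidden state is one common choice for sequence-level emotion labels; frame-level pooling or bidirectional variants are equally plausible readings of "LSTM-RNN" here.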
Pages: 913–924
Page count: 11