A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech

Cited by: 9
Authors
Li, Xingfeng [1]
Akagi, Masato [1]
Affiliations
[1] Japan Advanced Institute of Science and Technology, Nomi, Japan
Source
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies | 2018
Keywords
emotion recognition; emotion dimension; three-layer model; prosodic feature; spectrogram; glottal waveform; RECOGNITION; EXPRESSION; FEATURES; QUALITY
DOI
10.21437/Interspeech.2018-1820
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automated emotion detection from speech has recently shifted from monolingual to multilingual tasks, aiming at human-like interaction in real life, where a system can handle more than a single input language. However, most work on monolingual emotion detection is difficult to generalize to multiple languages, because the optimal feature sets differ from one language to another. Our study proposes a framework to design, implement, and validate an emotion detection system using multiple corpora. A continuous dimensional space of valence and arousal is first used to describe the emotions. A three-layer model incorporating fuzzy inference systems is then used to estimate the two dimensions. Speech features derived from prosodic, spectral, and glottal waveform cues are examined and selected to capture emotional information. The new system outperformed the existing state-of-the-art system, yielding a smaller mean absolute error and a higher correlation between its estimates and human evaluators' ratings. Moreover, results for speaker-independent validation are comparable to those of human evaluators.
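The abstract reports two evaluation measures: mean absolute error and the correlation between system estimates and human ratings. The sketch below shows one common way to compute them for a single dimension such as valence. It assumes Pearson correlation (the abstract does not say which correlation measure was used), and the function names and sample values are hypothetical illustrations, not data or code from the paper.

```python
import math


def mean_absolute_error(estimates, ratings):
    """Average absolute difference between estimates and human ratings."""
    if len(estimates) != len(ratings) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimates, ratings)) / len(estimates)


def pearson_correlation(estimates, ratings):
    """Pearson correlation coefficient (an assumed choice of measure)."""
    if len(estimates) != len(ratings) or len(estimates) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_r = sum(ratings) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimates, ratings))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_r = sum((r - mean_r) ** 2 for r in ratings)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_r)


# Hypothetical valence values on an arbitrary scale, for illustration only.
estimated_valence = [0.5, -0.25, 0.75, 0.0]
rated_valence = [0.25, -0.5, 1.0, 0.25]

mae = mean_absolute_error(estimated_valence, rated_valence)
corr = pearson_correlation(estimated_valence, rated_valence)
print(f"MAE = {mae:.3f}, correlation = {corr:.3f}")
```

A lower mean absolute error and a correlation closer to 1 both indicate closer agreement with the human ratings, which is the sense in which the abstract compares systems.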
Pages: 3643-3647
Number of pages: 5