On the use of speech parameter contours for emotion recognition

Cited by: 0
Authors
Vidhyasaharan Sethu
Eliathamby Ambikairajah
Julien Epps
Institution
[1] School of Electrical Engineering and Telecommunications, The University of New South Wales
Source
EURASIP Journal on Audio, Speech, and Music Processing, Volume 2013
Keywords
Emotion recognition; Paralinguistic information; Pitch contours; Formant contours; Glottal spectrum; Temporal information; LDC emotional prosody speech corpus
DOI: Not available
Abstract
Many features have been proposed for speech-based emotion recognition, and most are frame-based or statistics estimated from frame-based features. Temporal information is typically modelled on a per-utterance basis, with either functionals of frame-based features or a suitable back-end. This paper investigates an approach that combines both, using temporal contours of parameters extracted from a three-component model of speech production as features in an automatic emotion recognition system with a hidden Markov model (HMM)-based back-end. Consequently, the proposed system models information at a segment-by-segment scale that is larger than the frame-based scale but smaller than utterance-level modelling. Specifically, linear approximations to the temporal contours of formant frequencies, glottal parameters and pitch are used to model short-term temporal information over individual segments of voiced speech. HMMs are then used to model longer-term temporal information contained in sequences of voiced segments. Listening tests were conducted to validate the use of linear approximations in this context, and automatic emotion classification experiments were carried out on the Linguistic Data Consortium emotional prosody speech and transcripts corpus and the FAU Aibo corpus to validate the proposed approach.
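The front-end idea in the abstract, fitting a straight line to each voiced segment of a parameter contour and passing the per-segment line parameters to a sequence model, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the pitch values, segment boundaries, and the (intercept, slope) feature layout are hypothetical placeholders.

```python
import numpy as np

def segment_contour_features(contour, segments):
    # Fit a first-degree polynomial (least-squares straight line) to a
    # frame-level parameter contour over each voiced segment, returning
    # one [intercept, slope] pair per segment as its linear approximation.
    features = []
    for start, end in segments:
        frames = np.arange(start, end)
        # np.polyfit returns coefficients highest degree first: slope, intercept.
        slope, intercept = np.polyfit(frames, contour[start:end], deg=1)
        features.append([intercept, slope])
    return np.asarray(features)

# Hypothetical pitch contour: rising over one voiced segment,
# falling over the next (values in Hz, one per frame).
f0 = np.concatenate([np.linspace(120.0, 160.0, 40),
                     np.linspace(150.0, 130.0, 30)])
voiced_segments = [(0, 40), (40, 70)]

X = segment_contour_features(f0, voiced_segments)
print(X)  # one [intercept, slope] row per voiced segment
```

In the approach the abstract describes, analogous linear-fit features for formant frequencies and glottal parameters would be combined with the pitch features, and the resulting sequence of per-segment vectors modelled by the HMM back-end over successive voiced segments.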