Articulation constrained learning with application to speech emotion recognition

Cited by: 0
Authors
Mohit Shah
Ming Tu
Visar Berisha
Chaitali Chakrabarti
Andreas Spanias
Affiliations
[1] School of Electrical, Computer, and Energy Engineering, Arizona State University
[2] Speech and Hearing Science Department, Arizona State University
Source
EURASIP Journal on Audio, Speech, and Music Processing, 2019
Keywords
Emotion recognition; Articulation; Constrained optimization; Cross-corpus
DOI
Not available
Abstract
Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data at scale is often infeasible, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that require the model to reconstruct articulatory data, yielding sparse, interpretable representations jointly optimized for both tasks. Furthermore, articulatory features are required only during training; inference on out-of-sample data requires only acoustic features. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, /UW/ and over complete utterances. Incorporating articulatory information significantly improves performance for valence-based classification. Results for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
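To make the idea concrete, the sketch below illustrates one way such an articulation-constrained objective could be set up. The variable names, the penalty (rather than hard-constraint) form, and the coupling of the two tasks through the sparse weight vector are assumptions made for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch (assumed form, not the paper's exact model): an l1-regularized
# logistic regression on acoustic features, plus a term asking the features selected
# by the sparse weights w to also reconstruct articulatory data Z.
# Z is used only at training time; prediction needs acoustic features alone.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, k = 100, 10, 4                      # samples, acoustic dims, articulatory dims
X = rng.standard_normal((n, d))           # acoustic features
Z = rng.standard_normal((n, k))           # articulatory features (training only)
y = 2 * rng.integers(0, 2, n) - 1         # emotion labels in {-1, +1}

lam, mu = 0.1, 1.0                        # l1 weight, articulatory-reconstruction weight

def objective(theta):
    w = theta[:d]                         # sparse classifier weights
    A = theta[d:].reshape(d, k)           # map from w-weighted acoustics to articulation
    clf = np.mean(np.log1p(np.exp(-y * (X @ w))))            # logistic loss
    rec = np.mean(np.sum((Z - (X * w) @ A) ** 2, axis=1))    # reconstruction penalty
    return clf + lam * np.sum(np.abs(w)) + mu * rec

theta0 = 0.01 * rng.standard_normal(d + d * k)
res = minimize(objective, theta0, method="L-BFGS-B")  # a proximal solver suits the l1 term better
w_hat = res.x[:d]

# Inference on out-of-sample data uses acoustic features only:
predict = lambda X_new: np.sign(X_new @ w_hat)
```

Because the articulatory term enters only the training objective, test-time prediction depends solely on the learned acoustic weights, which mirrors the abstract's claim that articulatory data is not needed for inference.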