Real Life Emotion Classification using Spectral Features and Gaussian Mixture Models

Cited by: 5
Authors
Koolagudi, Shashidhar G. [1]
Barthwal, Anurag [1]
Devliyal, Swati [1]
Rao, K. Sreenivasa [2]
Affiliations
[1] Graph Era Univ, Sch Comp, Dehra Dun 248002, Uttarakhand, India
[2] Indian Inst Technol, Kharagpur 721302, W Bengal, India
Source
INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING | 2012 / Vol. 38
Keywords
emotion classification; spectral features; GMM; MFCC; LPCC; text dependent emotion recognition; text independent emotion recognition; RECOGNITION; SPEECH
DOI
10.1016/j.proeng.2012.06.447
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this work, spectral features are extracted for speech emotion classification. Mel frequency cepstral coefficients (MFCCs) are used as features, and Gaussian mixture models (GMMs) are explored as classifiers. The emotions considered are anger, happy, neutral, sad and surprise. A semi-natural emotional database (Graphic Era University Semi Natural Emotion Speech Corpus) is collected from the dialogues of popular Hindi movies. Average emotion recognition performance on the multiple-speaker database is observed to be around 55.60%. Results for male, female, multiple male and multiple female speakers are compared to study the effect of speaker and gender on the expression of emotions.
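The classification scheme named in the abstract (one GMM per emotion, with the decision going to the model that gives the highest likelihood) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the diagonal covariances, the two components per model, and above all the synthetic 2-D vectors, which stand in for MFCC frames that would in practice be extracted from speech.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as [(weight, mean, var), ...]."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])

def train_gmm(frames, k=2, iters=20, floor=1e-3):
    """Fit a k-component diagonal GMM with plain EM (hypothetical helper)."""
    n, dim = len(frames), len(frames[0])
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, floor)
            for d in range(dim)]
    # Initialise means from evenly spaced frames, variances from the global variance.
    gmm = [(1.0 / k, list(frames[(i * n) // k]), list(gvar)) for i in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        new_gmm = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            mean = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nj, floor)
                   for d in range(dim)]
            new_gmm.append((max(nj / n, 1e-6), mean, var))
        gmm = new_gmm
    return gmm

def classify(frames, models):
    """Return the emotion whose GMM gives the highest total log-likelihood."""
    scores = {emo: sum(frame_loglik(x, g) for x in frames)
              for emo, g in models.items()}
    return max(scores, key=scores.get)

def make_frames(center, n, rng):
    """Synthetic stand-in for MFCC frames: unit-variance noise around a centre."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]

rng = random.Random(0)
# Two of the five emotions, with made-up cluster centres (assumption).
models = {
    "anger": train_gmm(make_frames((0.0, 0.0), 100, rng)),
    "sad": train_gmm(make_frames((5.0, 5.0), 100, rng)),
}
predicted_anger = classify(make_frames((0.0, 0.0), 30, rng), models)
predicted_sad = classify(make_frames((5.0, 5.0), 30, rng), models)
print(predicted_anger, predicted_sad)
```

Summing frame log-likelihoods treats the frames of an utterance as independent, which is the usual simplification when GMMs are used as utterance-level classifiers.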
Pages: 3892-3899
Number of pages: 8