Study of Wavelet Packet Energy Entropy for Emotion Classification in Speech and Glottal Signals

Cited by: 4
Authors
He, Ling [1 ]
Lech, Margaret [2 ]
Zhang, Jing [1 ]
Ren, Xiaomei [1 ]
Deng, Lihua [1 ]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] RMIT Univ, Sch Elect & Comp Engn, Melbourne, Vic, Australia
Source
FIFTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2013) | 2013 / Vol. 8878
Funding
National Science Foundation (United States);
Keywords
Emotion recognition; feature extraction; wavelet packet energy entropy; perceptual wavelet packet; RECOGNITION; DEPRESSION; FEATURES;
DOI
10.1117/12.2030929
Chinese Library Classification
O43 [Optics];
Subject classification code
070207 ; 0803 ;
Abstract
Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area focuses on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical Gaussian mixture model (GMM) algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set, annotated with three different emotions (angry, neutral and soft), and the Berlin Emotional Speech (BES) database, annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
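The abstract does not give the formula for the proposed feature, so the following is only a minimal sketch of one plausible reading of "energy entropy on values within selected Wavelet Packet frequency bands". Everything in it is an assumption rather than the authors' method: the Haar filter, the full (non-perceptual) decomposition tree, the decomposition level, the Shannon entropy of the normalized coefficient energies in each band, and all function names.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half length."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_bands(x, level):
    """Full wavelet packet tree: split every band again at every level."""
    bands = [list(x)]
    for _ in range(level):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands  # 2**level bands, in natural (not frequency) order


def band_energy_entropy(band):
    """Shannon entropy of the normalized energy distribution inside one band."""
    energies = [c * c for c in band]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # silent band: define the entropy as zero
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0.0)


def wpee_features(frame, level=3):
    """Hypothetical feature vector: one energy-entropy value per band."""
    return [band_energy_entropy(b) for b in wavelet_packet_bands(frame, level)]


# Usage: a synthetic 64-sample frame yields 2**3 = 8 feature values.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
feats = wpee_features(frame, level=3)
print(len(feats))
```

Under these assumptions each value lies between 0 (all band energy in one coefficient) and the logarithm of the number of coefficients in the band (energy spread evenly). Per-frame vectors of this kind would then be the input to one GMM per emotion class.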
Pages: 6
Related papers
18 in total
  • [11] Critical analysis of the impact of glottal features in the classification of clinical depression in speech
    Moore, Elliot, II
    Clements, Mark A.
    Peifer, John W.
    Weisser, Lydia
    [J]. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2008, 55 (01) : 96 - 107
  • [12] Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk
    Ozdas, A
    Shiavi, RG
    Silverman, SE
    Silverman, MK
    Wilkes, DM
    [J]. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2004, 51 (09) : 1530 - 1540
  • [13] ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS
    REYNOLDS, DA
    ROSE, RC
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01) : 72 - 83
  • [14] Vocal communication of emotion: A review of research paradigms
    Scherer, KR
    [J]. SPEECH COMMUNICATION, 2003, 40 (1-2) : 227 - 256
  • [15] AUTOMATIC GLOTTAL INVERSE FILTERING FROM SPEECH AND ELECTROGLOTTOGRAPHIC SIGNALS
    VEENEMAN, DE
    BEMENT, SL
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02) : 369 - 377
  • [16] Emotional speech recognition: Resources, features, and methods
    Ververidis, Dimitrios
    Kotropoulos, Constantine
    [J]. SPEECH COMMUNICATION, 2006, 48 (09) : 1162 - 1181
  • [17] LEAST-SQUARES GLOTTAL INVERSE FILTERING FROM THE ACOUSTIC SPEECH WAVEFORM
    WONG, DY
    MARKEL, JD
    GRAY, AH
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (04) : 350 - 355
  • [18] Nonlinear feature based classification of speech under stress
    Zhou, GJ
    Hansen, JHL
    Kaiser, JF
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03) : 201 - 216