Improved speech emotion recognition with Mel frequency magnitude coefficient

Cited: 88
Authors
Ancilin, J. [1 ]
Milton, A. [1 ]
Affiliations
[1] St. Xavier's Catholic College of Engineering, Department of Electronics and Communication Engineering, Nagercoil 629003, Tamil Nadu, India
关键词
Speech emotion recognition; Mel frequency magnitude coefficient; Speech feature; Speech signal processing; Spectral features; Feature selection; Classification
DOI
10.1016/j.apacoust.2021.108046
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Automatic speech emotion recognition using machine learning is a demanding research topic in the field of affective computing. Identifying speech features for emotion recognition is challenging because the features must emphasize the emotional information carried by the speech. Spectral features play a vital role in recognizing emotion from speech signals. In this paper, two modifications are made to the extraction of the Mel frequency cepstral coefficient: the magnitude spectrum is used instead of the energy spectrum, and the discrete cosine transform is excluded, yielding the Mel frequency magnitude coefficient. The Mel frequency magnitude coefficient is the log of the magnitude spectrum on a non-linear Mel frequency scale. The Mel frequency magnitude coefficient and three conventional spectral features, the Mel frequency cepstral coefficient, the log frequency power coefficient and the linear prediction cepstral coefficient, are tested on the Berlin, RAVDESS, SAVEE, EMOVO, eNTERFACE and Urdu databases with a multiclass support vector machine as the classifier. As a standalone feature, the Mel frequency magnitude coefficient recognizes emotion with an accuracy of 81.50% on Berlin, 64.31% on RAVDESS, 75.63% on SAVEE, 73.30% on EMOVO, 56.41% on eNTERFACE and 95.25% on Urdu. The Mel frequency magnitude coefficient is found to be a better spectral feature than the conventional features for identifying emotion from speech. (C) 2021 Elsevier Ltd. All rights reserved.
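A minimal sketch of the two modifications described in the abstract, assuming librosa is available: the Mel filterbank is applied to the frame magnitude spectrum rather than the power spectrum, and the final discrete cosine transform is dropped, so the log Mel-band magnitudes themselves form the feature vector. The function name mfmc and the frame and filterbank sizes below are illustrative choices, not taken from the paper.

```python
import numpy as np
import librosa

def mfmc(y, sr, n_fft=512, hop_length=256, n_mels=26, eps=1e-10):
    """Mel Frequency Magnitude Coefficient sketch (parameters illustrative)."""
    # Short-time Fourier transform -> per-frame magnitude spectrum.
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    # Triangular Mel filterbank applied to the magnitude (not squared) spectrum.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_mag = mel_fb @ mag
    # Log compression; no DCT step, so the log Mel-band magnitudes
    # are returned directly as the per-frame feature vectors.
    return np.log(mel_mag + eps)

def mfcc(y, sr, n_mfcc=13):
    """Conventional MFCC for comparison: power spectrum plus DCT."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Example usage (hypothetical file):
#   y, sr = librosa.load("speech.wav", sr=None)
#   feats = mfmc(y, sr)   # shape: (n_mels, n_frames)
```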
Pages: 10