Improved speech emotion recognition with Mel frequency magnitude coefficient

Cited: 88
Authors
Ancilin, J. [1 ]
Milton, A. [1 ]
Affiliations
[1] St. Xavier's Catholic College of Engineering, Department of Electronics and Communication Engineering, Nagercoil 629003, Tamil Nadu, India
关键词
Speech emotion recognition; Mel frequency magnitude coefficient; Speech feature; Speech signal processing; Spectral features; Feature selection; Classification
DOI
10.1016/j.apacoust.2021.108046
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Automatic speech emotion recognition using machine learning is a demanding research topic in the field of affective computing. Identifying speech features for emotion recognition is challenging because the features must emphasize the emotional information carried by the speech. Spectral features play a vital role in recognizing emotion from speech signals. In this paper, two modifications are made to the extraction of the Mel frequency cepstral coefficient: the magnitude spectrum is used instead of the energy spectrum, and the discrete cosine transform is excluded, yielding the Mel frequency magnitude coefficient. The Mel frequency magnitude coefficient is the log of the magnitude spectrum on a non-linear Mel frequency scale. The Mel frequency magnitude coefficient and three conventional spectral features, the Mel frequency cepstral coefficient, the log frequency power coefficient and the linear prediction cepstral coefficient, are tested on the Berlin, RAVDESS, SAVEE, EMOVO, eNTERFACE and Urdu databases with a multiclass support vector machine as the classifier. As a standalone feature, the Mel frequency magnitude coefficient recognizes emotion with an accuracy of 81.50% on Berlin, 64.31% on RAVDESS, 75.63% on SAVEE, 73.30% on EMOVO, 56.41% on eNTERFACE and 95.25% on Urdu. The Mel frequency magnitude coefficient is found to be a better spectral feature than the conventional features for identifying emotion from speech. (C) 2021 Elsevier Ltd. All rights reserved.
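A minimal sketch of the two modifications described in the abstract, assuming librosa is available: the Mel filterbank is applied to the frame magnitude spectrum rather than the power spectrum, and the final discrete cosine transform is dropped, so the log Mel-band magnitudes themselves form the feature vector. The function name mfmc and the frame and filterbank sizes below are illustrative choices, not taken from the paper.

```python
import numpy as np
import librosa

def mfmc(y, sr, n_fft=512, hop_length=256, n_mels=26, eps=1e-10):
    """Mel Frequency Magnitude Coefficient sketch (parameters illustrative)."""
    # Short-time Fourier transform -> per-frame magnitude spectrum.
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    # Triangular Mel filterbank applied to the magnitude (not squared) spectrum.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_mag = mel_fb @ mag
    # Log compression; no DCT step, so the log Mel-band magnitudes
    # are returned directly as the per-frame feature vectors.
    return np.log(mel_mag + eps)

def mfcc(y, sr, n_mfcc=13):
    """Conventional MFCC for comparison: power spectrum plus DCT."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Example usage (hypothetical file):
#   y, sr = librosa.load("speech.wav", sr=None)
#   feats = mfmc(y, sr)   # shape: (n_mels, n_frames)
```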
Pages: 10