Emotion classification from speech signal based on empirical mode decomposition and non-linear features

Cited by: 51
Authors
Krishnan, Palani Thanaraj [1 ]
Alex Noel, Joseph Raj [2 ]
Rajangam, Vijayarajan [3 ]
Affiliations
[1] St Josephs Coll Engn, Dept Elect & Instrumentat Engn, Chennai, Tamil Nadu, India
[2] Shantou Univ, Dept Elect Engn, Shantou, Peoples R China
[3] Vellore Inst Technol, Div Healthcare Adv Innovat & Res, Chennai, Tamil Nadu, India
Keywords
Speech signal; Emotion perception; Entropy measures; Linear discriminant analysis; Empirical mode decomposition;
DOI
10.1007/s40747-021-00295-z
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Emotion recognition from speech signals is a widely researched topic in the design of Human-Computer Interface (HCI) models, since it provides insight into the mental states of human beings. The emotional condition of a human is often required as cognitive feedback in the HCI. In this paper, an attempt to recognize seven emotional states from speech signals, namely sad, angry, disgust, happy, surprise, pleasant, and neutral, is investigated. The proposed method employs a non-linear signal quantification method based on a randomness measure, known as the entropy feature, for the detection of emotions. Initially, the speech signals are decomposed into Intrinsic Mode Functions (IMFs), which are grouped into dominant frequency bands: the high-frequency, mid-frequency, and base-frequency bands. The entropy measures are computed directly from the high-frequency band in the IMF domain, whereas for the mid- and base-frequency bands, the IMFs are first averaged and their entropy measures are then computed. A feature vector incorporating the randomness measures is formed from the computed entropies for all the emotional signals. The feature vector is then used to train several state-of-the-art classifiers: Linear Discriminant Analysis (LDA), Naive Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Gradient Boosting Machine. Tenfold cross-validation on the publicly available Toronto Emotional Speech dataset shows that the LDA classifier attains a peak balanced accuracy of 93.3%, an F1 score of 87.9%, and an area-under-the-curve value of 0.995 in recognizing emotions from the speech of native English speakers.
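The entropy-based feature extraction described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: the sample-entropy definition follows the standard Richman-Moran formulation (the abstract says only "entropy measures"), the EMD step that would supply the IMF bands is replaced by synthetic signals, and the parameter values (m = 2, tolerance r = 0.2 x standard deviation) are assumptions.

```python
import math
import random

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of sequence x: -log(A/B), where B counts pairs of
    length-m templates matching within tolerance r (Chebyshev distance)
    and A counts matching length-(m+1) templates. Higher = more random."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * sd  # tolerance; 0.2*sd is a common default (assumed here)

    def matches(mm):
        templates = [x[i:i + mm] for i in range(n - mm)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a and b else float("inf")

# In the paper, EMD would decompose speech into IMFs grouped into high-,
# mid-, and base-frequency bands; here three synthetic signals stand in
# for those bands purely to show how the feature vector is assembled.
rng = random.Random(0)
high = [rng.gauss(0.0, 1.0) for _ in range(300)]                  # noise-like
mid = [math.sin(2 * math.pi * 8 * t / 300) for t in range(300)]   # oscillatory
base = [math.sin(2 * math.pi * 2 * t / 300) for t in range(300)]  # slow trend

feature_vector = [sample_entropy(band) for band in (high, mid, base)]
# The noise-like band should score the highest randomness.
```

In the full pipeline, one such feature vector per utterance would be fed to the LDA (or another) classifier under tenfold cross-validation.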
Pages: 1919-1934
Page count: 16