Speech Databases, Speech Features, and Classifiers in Speech Emotion Recognition: A Review

Cited by: 2
Authors
Dar, G. H. Mohmad [1 ]
Delhibabu, Radhakrishnan [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Adv Sci, Vellore 632014, Tamil Nadu, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Speech emotion recognition; machine learning; deep learning; affective computing; support vector machine; random forest; Gaussian mixture model; audio features; databases; classifiers; COMMUNICATING EMOTION; INFORMATION FUSION; REPRESENTATIONS; IMPLEMENTATION; AUTOENCODER; GENERATION; DEPRESSION; EXPRESSION; NETWORKS; VALENCE;
DOI
10.1109/ACCESS.2024.3476960
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Emotion recognition from speech signals plays a crucial role in Human-Machine Interaction (HMI), particularly in the development of applications such as affective computing and interactive systems. This review provides an in-depth examination of current methodologies in speech emotion recognition (SER), focusing on databases, feature extraction techniques, and classification models. Early work relied on low-level descriptors (LLDs) such as Mel-Frequency Cepstral Coefficients (MFCCs), linear predictive coding (LPC), and pitch-based features, fed to classifiers such as Support Vector Machines (SVM), Random Forests (RF), and Gaussian Mixture Models (GMM). The advent of deep learning has since transformed the field: models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have proved better at capturing the complex temporal and spectral characteristics of speech. This paper reviews prominent speech emotion datasets, exploring their linguistic diversity, annotation processes, and emotional labels. It also analyzes the efficacy of different speech features and classifiers in handling challenges such as data imbalance, limited data availability, and cross-lingual variation. The review highlights the need for future work on real-time processing, context-sensitive emotion detection, and the integration of multi-modal data to enhance the performance of SER systems. By consolidating recent advances and identifying areas for further research, the paper aims to chart a clearer path for optimizing feature extraction and classification techniques in emotion recognition.
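As an illustration of the classical LLD-plus-classifier pipeline the abstract describes (not code from the review itself), the sketch below extracts MFCC-style features with a textbook recipe and classifies them. To stay dependency-free it uses pure NumPy, synthetic tones standing in for emotional utterances, and a nearest-centroid classifier standing in for an SVM; all signal parameters and labels are invented for the demo.

```python
# Minimal sketch, assuming a 16 kHz signal and a textbook MFCC recipe:
# frame -> Hamming window -> power spectrum -> mel filter bank -> log -> DCT.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Return the utterance-level mean of MFCC-like coefficients."""
    window = np.hamming(n_fft)
    frames = np.array([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return (logmel @ dct.T).mean(axis=0)  # one feature vector per utterance

def tone(f0, sr=16000, dur=0.5, noise=0.05, seed=0):
    """Synthetic stand-in for an utterance: fundamental plus one harmonic."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    return (np.sin(2 * np.pi * f0 * t)
            + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
            + noise * rng.standard_normal(t.size))

# Tiny toy corpus: low-pitched "calm" vs high-pitched "excited" signals.
X = np.array([mfcc(tone(f0, seed=i))
              for i, f0 in enumerate([110, 120, 130, 280, 300, 320])])
y = np.array(["calm"] * 3 + ["excited"] * 3)
centroids = {lab: X[y == lab].mean(axis=0) for lab in ("calm", "excited")}

def predict(feat):
    """Nearest-centroid decision, a stand-in for the SVM/RF/GMM classifiers."""
    return min(centroids, key=lambda lab: np.linalg.norm(feat - centroids[lab]))

print(predict(mfcc(tone(125, seed=7))), predict(mfcc(tone(310, seed=8))))
```

In a real SER system the hand-crafted feature and distance-based decision above would be replaced by the richer LLD sets and trained classifiers the review surveys, or by a CNN/LSTM operating directly on spectrogram frames.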
Pages: 151122-151152
Number of pages: 31