Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review

Cited: 1
Authors
Rathi, Tarun [1]
Tripathy, Manoj [1]
Affiliation
[1] Indian Institute of Technology, Department of Electrical Engineering, Roorkee 247667, India
Keywords
Speech emotion recognition; Speech emotional data corpus; Speech features; Mel-frequency cepstral coefficients; Deep neural network; Convolutional neural network
DOI
10.1016/j.specom.2024.103102
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review paper examines the complex relationship between two critical factors, the selection of speech data corpora and the extraction of speech features, with respect to speech emotion classification accuracy. Through an extensive analysis of literature from 2014 to 2023, publicly available speech datasets are explored and categorized based on their diversity, scale, linguistic attributes, and emotional classifications. The importance of various speech features, from basic spectral features to sophisticated prosodic cues, and their influence on emotion recognition accuracy is analyzed. In the context of speech data corpora, this review paper unveils trends and insights from comparative studies exploring the repercussions of dataset choice on recognition efficacy. Datasets such as IEMOCAP, EMODB, and MSP-IMPROV are scrutinized in terms of their influence on the classification accuracy of speech emotion recognition (SER) systems. At the same time, potential challenges associated with dataset limitations are also examined. Notable features such as Mel-frequency cepstral coefficients, pitch, intensity, and prosodic patterns are evaluated for their contributions to emotion recognition. Advanced feature extraction methods are also explored for their potential to capture intricate emotional dynamics. Moreover, this review paper offers insights into the methodological aspects of emotion recognition, shedding light on the diverse machine learning and deep learning approaches employed. Through a holistic synthesis of research findings, this review paper observes connections between the choice of speech data corpus, the selection of speech features, and the resulting emotion recognition accuracy. As the field continues to evolve, avenues for future research are proposed, ranging from enhanced feature extraction techniques to the development of standardized benchmark datasets. In essence, this review serves as a compass guiding researchers and practitioners through the intricate landscape of speech emotion recognition, offering a nuanced understanding of the factors that shape its recognition accuracy.
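As a concrete illustration of the kinds of acoustic features the abstract highlights (MFCCs, pitch, intensity), the sketch below extracts them from a single utterance and pools them into a fixed-length vector. This is a minimal example assuming the librosa library and a hypothetical file name, not code taken from the reviewed paper.

```python
# Minimal sketch (not from the paper): extracting acoustic features commonly
# used for SER -- MFCCs, pitch (F0), and intensity (RMS energy).
# Assumes librosa is installed; "utterance.wav" is a hypothetical file path.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)      # mono waveform at 16 kHz

# 13 Mel-frequency cepstral coefficients per analysis frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)

# Fundamental frequency (pitch) contour via the YIN estimator
f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)        # one F0 value per frame

# Frame-level intensity approximated by root-mean-square energy
rms = librosa.feature.rms(y=y)[0]                    # one RMS value per frame

# A common utterance-level representation: mean and std of each feature stream
features = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),             # 26 MFCC statistics
    [f0.mean(), f0.std()],                           # 2 pitch statistics
    [rms.mean(), rms.std()],                         # 2 intensity statistics
])
print(features.shape)                                # (30,)
```

Such pooled statistics are one simple way to feed a classical classifier; the deep-learning approaches surveyed in the review typically consume the frame-level feature matrices directly instead.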
Pages: 24
Related papers
50 records in total
  • [31] Atmaja, Bagus Tris; Sasou, Akira. Effects of Data Augmentations on Speech Emotion Recognition. Sensors, 2022, 22(16).
  • [32] Praseetha, V. M.; Joby, P. P. Speech emotion recognition using data augmentation. International Journal of Speech Technology, 2022, 25: 783-792.
  • [33] Kapoor, Tanisha; Ganguly, Arnaja; Rajeswari, D. Speech Emotion Recognition Using Data Augmentation. 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI 2024), 2024.
  • [34] Praseetha, V. M.; Joby, P. P. Speech emotion recognition using data augmentation. International Journal of Speech Technology, 2021, 25(4): 783-792.
  • [35] El Ayadi, Moataz; Kamel, Mohamed S.; Karray, Fakhri. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 2011, 44(3): 572-587.
  • [36] Han, RongQi; Liu, Xin; Zhang, Hui. AESR: Speech Recognition With Speech Emotion Recogniting Learning. Man-Machine Speech Communication (NCMMSC 2024), 2025, 2312: 91-101.
  • [37] Pao, Tsang-Long; Wang, Chun-Hsiang; Li, Yu-Ji. A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition. 2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), 2012: 157-162.
  • [38] Neumann, Michael; Vu, Ngoc Thang. Improving Speech Emotion Recognition with Unsupervised Representation Learning on Unlabeled Speech. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 7390-7394.
  • [39] Leem, Seong-Gyun; Fulford, Daniel; Onnela, Jukka-Pekka; Gard, David; Busso, Carlos. Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 917-929.
  • [40] Liu, Wenjin; Shi, Jiaqi; Zhang, Shudong; Zhou, Lijuan; Liu, Haoming. E-Speech: Development of a Dataset for Speech Emotion Recognition and Analysis. International Journal of Intelligent Systems, 2024, 2024.