Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review

Cited by: 1
Authors
Rathi, Tarun [1 ]
Tripathy, Manoj [1 ]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Roorkee 247667, India
Keywords
Speech emotion recognition; Speech emotional data corpus; Speech features; Mel-frequency cepstral coefficients; Deep neural network; Convolutional neural network; DEEP; MODEL; NETWORK; DATABASES; RECURRENT; CNN; REPRESENTATIONS; CLASSIFIERS; 1D;
DOI
10.1016/j.specom.2024.103102
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review paper examines the complex relationship between two critical factors, the selection of speech data corpora and the extraction of speech features, and their effect on speech emotion classification accuracy. Through an extensive analysis of literature from 2014 to 2023, publicly available speech datasets are explored and categorized based on their diversity, scale, linguistic attributes, and emotional classifications. The importance of various speech features, from basic spectral features to sophisticated prosodic cues, and their influence on emotion recognition accuracy is analyzed. In the context of speech data corpora, this review unveils trends and insights from comparative studies exploring the repercussions of dataset choice on recognition efficacy. Datasets such as IEMOCAP, EMODB, and MSP-IMPROV are scrutinized in terms of their influence on the classification accuracy of speech emotion recognition (SER) systems, and potential challenges associated with dataset limitations are also examined. Notable features such as Mel-frequency cepstral coefficients, pitch, intensity, and prosodic patterns are evaluated for their contributions to emotion recognition, and advanced feature extraction methods are explored for their potential to capture intricate emotional dynamics. Moreover, this review offers insights into the methodological aspects of emotion recognition, shedding light on the diverse machine learning and deep learning approaches employed. Through a holistic synthesis of research findings, it identifies connections between the choice of speech data corpus, the selection of speech features, and the resulting emotion recognition accuracy. As the field continues to evolve, avenues for future research are proposed, ranging from enhanced feature extraction techniques to the development of standardized benchmark datasets. In essence, this review serves as a compass guiding researchers and practitioners through the intricate landscape of speech emotion recognition, offering a nuanced understanding of the factors that shape its recognition accuracy.
Pages: 24
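The abstract above highlights Mel-frequency cepstral coefficients, pitch, and intensity as widely used inputs to SER classifiers. The following is a minimal, illustrative sketch (not taken from the paper) of how such features might be extracted and pooled into a fixed-length vector, assuming the librosa library is available and using a hypothetical WAV file name from one of the corpora discussed:

# Minimal feature-extraction sketch: pools MFCC, pitch, and energy statistics
# into a fixed-length vector of the kind commonly fed to SER classifiers.
import numpy as np
import librosa

def extract_ser_features(wav_path, sr=16000, n_mfcc=13):
    """Return a fixed-length feature vector (mean/std pooled over frames)."""
    y, sr = librosa.load(wav_path, sr=sr)

    # Spectral features: Mel-frequency cepstral coefficients, shape (n_mfcc, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Prosodic cue: fundamental frequency (pitch) estimated with the YIN algorithm
    f0 = librosa.yin(y, fmin=65.0, fmax=400.0, sr=sr)

    # Intensity proxy: frame-wise root-mean-square energy, shape (1, frames)
    rms = librosa.feature.rms(y=y)

    # Pool each feature stream over time with mean and standard deviation
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [f0.mean(), f0.std()],
        [rms.mean(), rms.std()],
    ])

# Example usage with a hypothetical corpus file (placeholder name):
# x = extract_ser_features("iemocap_session1_utt001.wav")
# x could then be passed to any of the classifiers surveyed in the review (SVM, CNN, etc.).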