Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review

Cited by: 1
Authors
Rathi, Tarun [1 ]
Tripathy, Manoj [1 ]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Roorkee 247667, India
Keywords
Speech emotion recognition; Speech emotional data corpus; Speech features; Mel-frequency cepstral coefficients; Deep neural network; Convolutional neural network; DEEP; MODEL; NETWORK; DATABASES; RECURRENT; CNN; REPRESENTATIONS; CLASSIFIERS; 1D;
DOI
10.1016/j.specom.2024.103102
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review paper examines the complex relationship between two critical factors, the selection of speech data corpora and the extraction of speech features, and their effect on speech emotion classification accuracy. Through an extensive analysis of literature from 2014 to 2023, publicly available speech datasets are explored and categorized based on their diversity, scale, linguistic attributes, and emotional classifications. The importance of various speech features, from basic spectral features to sophisticated prosodic cues, and their influence on emotion recognition accuracy is analyzed. In the context of speech data corpora, this review unveils trends and insights from comparative studies exploring the repercussions of dataset choice on recognition efficacy. Datasets such as IEMOCAP, EMODB, and MSP-IMPROV are scrutinized for their influence on the classification accuracy of speech emotion recognition (SER) systems, and potential challenges associated with dataset limitations are also examined. Notable features such as Mel-frequency cepstral coefficients, pitch, intensity, and prosodic patterns are evaluated for their contributions to emotion recognition. Advanced feature extraction methods are also explored for their potential to capture intricate emotional dynamics. Moreover, this review offers insights into the methodological aspects of emotion recognition, shedding light on the diverse machine learning and deep learning approaches employed. Through a holistic synthesis of research findings, this review identifies connections between the choice of speech data corpus, the selection of speech features, and the resulting emotion recognition accuracy.
As the field continues to evolve, avenues for future research are proposed, ranging from enhanced feature extraction techniques to the development of standardized benchmark datasets. In essence, this review serves as a compass guiding researchers and practitioners through the intricate landscape of speech emotion recognition, offering a nuanced understanding of the factors shaping recognition accuracy.
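To make the abstract's "intensity" and "prosodic" cues concrete, the following is a minimal NumPy sketch (hypothetical, not from the reviewed paper) that frames a signal and computes two classic low-level SER features: short-time energy, a proxy for intensity, and zero-crossing rate, a crude voicing/noisiness cue. The frame length, hop size, and synthetic test signal are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Per-frame sum of squared samples, a proxy for the intensity cue."""
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Synthetic example: 1 s of a 440 Hz tone at 16 kHz, louder in the second half
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) * np.where(t < 0.5, 0.2, 0.8)

frames = frame_signal(x)
energy = short_time_energy(frames)   # rises sharply in the louder half
zcr = zero_crossing_rate(frames)     # roughly constant for a pure tone
```

In a full SER pipeline these frame-level trajectories would typically be summarized with statistics (mean, variance, range) or fed directly to a sequence model, alongside spectral features such as MFCCs.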
Pages: 24
Related papers
50 items in total
  • [41] E-Speech: Development of a Dataset for Speech Emotion Recognition and Analysis
    Liu, Wenjin
    Shi, Jiaqi
    Zhang, Shudong
    Zhou, Lijuan
    Liu, Haoming
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2024, 2024
  • [42] Variational mode decomposition based acoustic and entropy features for speech emotion recognition
    Mishra, Siba Prasad
    Warule, Pankaj
    Deb, Suman
    APPLIED ACOUSTICS, 2023, 212
  • [43] Modulation spectral features for speech emotion recognition using deep neural networks
    Singh, Premjeet
    Sahidullah, Md
    Saha, Goutam
    SPEECH COMMUNICATION, 2023, 146 : 53 - 69
  • [44] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073
  • [45] Speech Emotion Recognition Considering Local Dynamic Features
    Guan, Haotian
    Liu, Zhilei
    Wang, Longbiao
    Dang, Jianwu
    Yu, Ruiguo
    STUDIES ON SPEECH PRODUCTION, 2018, 10733 : 14 - 23
  • [46] Speech Emotion Recognition Using Minimum Extracted Features
    Abdulsalam, Wisal Hashim
    Alhamdani, Rafah Shihab
    Abdullah, Mohammed Najm
    2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 58 - 61
  • [47] Amplitude Modulation Features for Emotion Recognition from Speech
    Alam, Md Jahangir
    Attabi, Yazid
    Dumouchel, Pierre
    Kenny, Patrick
    O'Shaughnessy, D.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2419 - 2423
  • [48] Speech Emotion Recognition Using ANN on MFCC Features
    Dolka, Harshit
    Xavier, Arul V. M.
    Juliet, Sujitha
    ICSPC'21: 2021 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICPSC), 2021, : 431 - 435
  • [49] Deep temporal clustering features for speech emotion recognition
    Lin, Wei-Cheng
    Busso, Carlos
    SPEECH COMMUNICATION, 2024, 157
  • [50] SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES
    Zhang, Heran
    Mimura, Masato
    Kawahara, Tatsuya
    Ishizuka, Kenkichi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7707 - 7711