Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review

Cited by: 1
Authors
Rathi, Tarun [1]
Tripathy, Manoj [1]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Roorkee 247667, India
Keywords
Speech emotion recognition; Speech emotional data corpus; Speech features; Mel-frequency cepstral coefficients; Deep neural network; Convolutional neural network; DEEP; MODEL; NETWORK; DATABASES; RECURRENT; CNN; REPRESENTATIONS; CLASSIFIERS; 1D
DOI
10.1016/j.specom.2024.103102
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review paper examines how two critical factors, the selection of speech data corpora and the extraction of speech features, affect speech emotion classification accuracy. Through an extensive analysis of literature from 2014 to 2023, publicly available speech datasets are explored and categorized based on their diversity, scale, linguistic attributes, and emotional classifications. The importance of various speech features, from basic spectral features to sophisticated prosodic cues, and their influence on emotion recognition accuracy is analyzed. In the context of speech data corpora, this review paper identifies trends and insights from comparative studies exploring the effect of dataset choice on recognition performance. Various datasets such as IEMOCAP, EMODB, and MSP-IMPROV are scrutinized in terms of their influence on the classification accuracy of speech emotion recognition (SER) systems. At the same time, potential challenges associated with dataset limitations are also examined. Notable features like Mel-frequency cepstral coefficients, pitch, intensity, and prosodic patterns are evaluated for their contributions to emotion recognition. Advanced feature extraction methods are also explored for their potential to capture intricate emotional dynamics. Moreover, this review paper offers insights into the methodological aspects of emotion recognition, shedding light on the diverse machine learning and deep learning approaches employed. Through a holistic synthesis of research findings, this review paper observes connections between the choice of speech data corpus, the selection of speech features, and the resulting emotion recognition accuracy.
As the field continues to evolve, avenues for future research are proposed, ranging from enhanced feature extraction techniques to the development of standardized benchmark datasets. In essence, this review serves as a compass guiding researchers and practitioners through the intricate landscape of speech emotion recognition, offering a nuanced understanding of the factors that shape recognition accuracy.
Pages: 24