Gender-Aware Speech Emotion Recognition in Multiple Languages

Authors
Nicolini, Marco [1]
Ntalampiras, Stavros [1]
Affiliations
[1] Univ Milan, Dept Comp Sci, Milan, Italy
Source
PATTERN RECOGNITION APPLICATIONS AND METHODS, ICPRAM 2023 | 2024 / Vol. 14547
Keywords
Audio pattern recognition; Machine learning; Transfer learning; Convolutional neural network; YAMNet; Multilingual speech emotion recognition; CORPUS
DOI
10.1007/978-3-031-54726-3_7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents a solution for Speech Emotion Recognition (SER) in a multilingual setting using a hierarchical approach. The approach involves two levels: the first identifies the gender of the speaker, while the second predicts their emotional state. We evaluate the performance of three classifiers of increasing complexity: k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. The models were trained, validated, and tested on a dataset that includes the big-six emotions and was collected from well-known SER datasets representing six different languages. Our results indicate that, across all classifiers, classification accuracy differs when considering all data versus only female or male data. Interestingly, prior knowledge of the speaker's gender can improve the overall classification performance.
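The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the second level uses one emotion classifier per gender, which the abstract does not state explicitly, and it uses a plain k-NN on made-up two-dimensional features (a normalized pitch and energy value) instead of real audio features, YAMNet embeddings, or a BiLSTM. The names `knn_predict` and `HierarchicalSER` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class HierarchicalSER:
    """Level 1 predicts gender; level 2 predicts emotion.

    Assumption: level 2 uses training data of the predicted gender only.
    """

    def __init__(self, k=3):
        self.k = k
        self.gender_train = []
        self.emotion_train = {}

    def fit(self, features, genders, emotions):
        self.gender_train = list(zip(features, genders))
        self.emotion_train = {}
        for x, g, e in zip(features, genders, emotions):
            self.emotion_train.setdefault(g, []).append((x, e))
        return self

    def predict(self, x):
        gender = knn_predict(self.gender_train, x, self.k)
        emotion = knn_predict(self.emotion_train[gender], x, self.k)
        return gender, emotion


# Toy data (invented for illustration): (normalized pitch, normalized energy).
features = [
    (0.80, 0.80), (0.82, 0.85), (0.78, 0.90),  # female, happy
    (0.80, 0.20), (0.82, 0.15), (0.78, 0.10),  # female, sad
    (0.20, 0.80), (0.22, 0.85), (0.18, 0.90),  # male, happy
    (0.20, 0.20), (0.22, 0.15), (0.18, 0.10),  # male, sad
]
genders = ["female"] * 6 + ["male"] * 6
emotions = (["happy"] * 3 + ["sad"] * 3) * 2

model = HierarchicalSER(k=3).fit(features, genders, emotions)
print(model.predict((0.79, 0.83)))
print(model.predict((0.21, 0.12)))
```

Routing each sample to a gender-specific emotion model is one way to exploit prior knowledge of the speaker's gender; a gender-agnostic baseline would instead call `knn_predict` once on all training data with emotion labels.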
Pages: 111-123
Page count: 13