Gender-Aware Speech Emotion Recognition in Multiple Languages

Cited by: 0

Authors:

Nicolini, Marco [1]

Ntalampiras, Stavros [1]

Affiliations:

[1] Univ Milan, Dept Comp Sci, Milan, Italy

Source:

PATTERN RECOGNITION APPLICATIONS AND METHODS, ICPRAM 2023 | 2024, Vol. 14547

Keywords:

Audio pattern recognition; Machine learning; Transfer learning; Convolutional neural network; YAMNet; Multilingual speech emotion recognition; CORPUS;

DOI:

10.1007/978-3-031-54726-3_7

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article presents a hierarchical approach to Speech Emotion Recognition (SER) in a multilingual setting. The approach has two levels: the first identifies the speaker's gender, and the second predicts the speaker's emotional state. We evaluate three classifiers of increasing complexity: k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. The models were trained, validated, and tested on a dataset that covers the big-six emotions and was assembled from well-known SER datasets representing six different languages. Across all classifiers, classification accuracy differs depending on whether all data or only female or male data are considered. Interestingly, prior knowledge of the speaker's gender can improve overall classification performance.
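The two-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HierarchicalSER`, the helper `knn_predict`, the synthetic 2-D features, and the integer label encodings are all assumptions made for illustration. A real system would use acoustic features extracted from speech. The sketch uses only k-NN, the simplest of the three classifiers named above.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(train_x[None, :, :] - query_x[:, None, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


class HierarchicalSER:
    """Hypothetical two-level classifier: gender first, then emotion."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, x, gender, emotion):
        # Level 1 uses all samples to learn gender.
        self.x, self.gender = x, gender
        # Level 2 keeps one emotion training set per gender.
        self.per_gender = {
            g: (x[gender == g], emotion[gender == g]) for g in np.unique(gender)
        }
        return self

    def predict(self, x):
        g_hat = knn_predict(self.x, self.gender, x, self.k)
        e_hat = np.empty(len(x), dtype=int)
        for g, (gx, ge) in self.per_gender.items():
            mask = g_hat == g
            if mask.any():
                # Route each sample to the emotion model of its predicted gender.
                e_hat[mask] = knn_predict(gx, ge, x[mask], self.k)
        return g_hat, e_hat


# Synthetic, well-separated data (assumption): feature 0 tracks gender,
# feature 1 tracks emotion. Labels are arbitrary integers.
rng = np.random.default_rng(0)
genders = np.repeat([0, 1], 30)
emotions = np.tile(np.repeat([0, 1, 2], 10), 2)
features = np.column_stack([10.0 * genders, 10.0 * emotions])
features = features + rng.normal(scale=0.5, size=features.shape)

model = HierarchicalSER(k=3).fit(features, genders, emotions)
queries = np.array([[0.0, 20.0], [10.0, 0.0]])
g_hat, e_hat = model.predict(queries)
print(g_hat, e_hat)
```

Because the emotion model is selected by the predicted gender, an error at the first level propagates to the second. That is one reason the abstract's comparison between all-data and gender-specific accuracy matters.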

Pages: 111-123

Page count: 13
