Speech Emotion Recognition Using Deep Learning

Cited: 0
Authors
Alagusundari, N. [1 ]
Anuradha, R. [1 ]
Affiliations
[1] Sri Ramakrishna Engineering College, Department of Computer Science and Engineering, Coimbatore, India
Source
ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023 | 2024, Vol. 843
Keywords
Deep learning; SER (speech emotion recognition); TCN (temporal convolutional network); CNN (convolutional neural network); GRU (gated recurrent unit); DANN (domain adversarial neural network); MFCC (mel-frequency cepstral coefficients)
DOI
10.1007/978-981-99-8476-3_25
CLC Classification Number
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Speech emotion recognition (SER) has many applications, chiefly in mental health and human-robot interaction; for example, it can be used to monitor anxiety, depression, and post-traumatic stress disorder, among other mental health conditions. In this work, we developed deep learning models, including CNN, DANN, and TCN, to recognize emotional states from speech signals. Each model was trained on different datasets with different feature extraction techniques, such as MFCC, to recognize various emotions. A speaker's emotional state can be classified from factors such as pitch, tone, and intensity, and from emotion dimensions such as arousal and valence. Four datasets were used to train and evaluate the models. Among the CNN, GRU, DANN, and TCN models evaluated with various feature extraction techniques, TCN performed best on large datasets (58 MFCC features), achieving 93.66% accuracy across eight emotion classes (angry, calm, disgust, fear, happy, neutral, sad, surprise).
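The MFCC features named in the abstract can be sketched end-to-end in plain NumPy. This is a simplified illustration of the standard pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II); the frame size, hop, and filter counts below are illustrative defaults, not the 58-feature configuration used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, center, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, center):
            fb[i - 1, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):
            fb[i - 1, k] = (hi - k) / max(hi - center, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # 1. Slice the waveform into overlapping, Hann-windowed frames
    window = np.hanning(n_fft)
    frames = np.array([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. Log mel-filterbank energies (small epsilon avoids log(0))
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. DCT-II decorrelates the energies into cepstral coefficients
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                  (2 * k + 1) / (2 * n_filters)))
    return log_energies @ dct.T  # shape: (num_frames, n_coeffs)

# One second of a synthetic 440 Hz tone stands in for real speech
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # → (61, 13)
```

In practice a library such as librosa is typically used for this step; the resulting per-frame coefficient matrix is what a CNN, GRU, or TCN consumes as its input sequence.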
Pages: 313-325 (13 pages)