A Combined CNN Architecture for Speech Emotion Recognition

Citations: 1

Authors:

Begazo, Rolinson [1]

Aguilera, Ana [2,3]

Dongo, Irvin [1,4]

Cardinale, Yudith [5]

Affiliations:

[1] Univ Catolica San Pablo, Elect & Elect Engn Dept, Arequipa 04001, Peru

[2] Univ Valparaiso, Fac Ingn, Escuela Ingn Informat, Valparaiso 2340000, Chile

[3] Univ Valparaiso, Interdisciplinary Ctr Biomed Res & Hlth Engn MEDIN, Valparaiso 2340000, Chile

[4] Univ Bordeaux, ESTIA Inst Technol, F-64210 Bidart, France

[5] Univ Int Valencia, Grp Invest Ciencia Datos, Valencia 46002, Spain

Source:

SENSORS | 2024 / Vol. 24 / Issue 17

Keywords:

speech emotion recognition; deep learning; spectral features; spectrogram imaging; feature fusion; convolutional neural network; NEURAL-NETWORKS; FEATURES; CORPUS;

DOI:

10.3390/s24175797

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302; 081704;

Abstract:

Emotion recognition through speech is a technique employed in various scenarios of Human-Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, with the quantity and diversity of data being more notable when deep learning techniques are used. The lack of a standard in feature selection leads to continuous development and experimentation. Choosing and designing the appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach, and developing preprocessing and feature selection stages while constructing a dataset called EmoDSc as a result of combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images, the weighted accuracy reached 90%. These results, although surpassing previous research, highlight the strengths and limitations when operating in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, demonstrates a remarkable accuracy of 96%.
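The fusion idea described in the abstract (a CNN1D branch for spectral features, a CNN2D branch for spectrogram images, and an MLP over the combined representation) can be illustrated with a minimal PyTorch sketch. This is not the authors' architecture: the layer sizes, the number of emotion classes (7), the spectral feature length (162), and the 64x64 spectrogram size are all assumptions chosen only to make the example run, and the class name `FusionSER` is hypothetical.

```python
import torch
import torch.nn as nn


class FusionSER(nn.Module):
    """Two-branch sketch: 1D CNN on spectral features, 2D CNN on
    spectrogram images, MLP on the concatenated embeddings.
    All layer sizes are illustrative assumptions, not the paper's values."""

    def __init__(self, n_classes: int = 7):
        super().__init__()
        # Branch 1: 1D convolution over a spectral feature vector,
        # input shape (batch, 1, n_features).
        self.cnn1d = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
            nn.Flatten(),  # -> (batch, 16 * 8)
        )
        # Branch 2: 2D convolution over a single-channel spectrogram image,
        # input shape (batch, 1, height, width).
        self.cnn2d = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),  # -> (batch, 16 * 4 * 4)
        )
        # Fusion head: MLP over the concatenated branch outputs.
        self.mlp = nn.Sequential(
            nn.Linear(16 * 8 + 16 * 4 * 4, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, features: torch.Tensor, spectrogram: torch.Tensor) -> torch.Tensor:
        # Concatenate the two embeddings along the feature dimension.
        fused = torch.cat([self.cnn1d(features), self.cnn2d(spectrogram)], dim=1)
        return self.mlp(fused)  # unnormalized class scores (logits)


# Random tensors stand in for real inputs (shapes are assumptions).
model = FusionSER(n_classes=7)
features = torch.randn(4, 1, 162)        # 4 utterances, 162 spectral features each
spectrograms = torch.randn(4, 1, 64, 64)  # 4 single-channel 64x64 spectrogram images
logits = model(features, spectrograms)
```

Concatenating the branch outputs before a shared MLP is one common way to fuse heterogeneous inputs; the adaptive pooling layers keep the fused vector a fixed size regardless of the input length or image resolution.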


Pages: 39
