Speech-based emotion recognition using a hybrid RNN-CNN network

Cited: 0
Authors
Ning, Jingtao [1 ]
Zhang, Wenchuan [1 ]
Affiliations
[1] Lanzhou Petrochem Univ Vocat Technol, Coll Informat Engn, Lanzhou 730060, Gansu, Peoples R China
Keywords
Speech emotion recognition; Deep learning; Recurrent neural network; Convolutional neural network; Wide kernel; Classification
DOI
10.1007/s11760-024-03574-7
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Speech emotion recognition (SER) is among the most active areas of speech-signal analysis, aiming to estimate and classify the rich spectrum of emotions conveyed by speakers. This paper develops a novel deep learning (DL) model for detecting speech emotion that addresses several weaknesses of existing intelligent data-driven approaches. A new DL architecture, referred to as the RNN-CNN, is proposed and applied to the SER task by operating directly on raw speech signals. The key design challenge was to combine an initial convolution layer with a wide kernel as an efficient way to mitigate the noise present in raw speech signals. In the experimental analysis, the proposed RNN-CNN model is evaluated on three databases: RML, RAVDESS, and SAVEE. On each dataset, the model achieves higher accuracy than the previous works examined. This assessment validates the robust performance and applicability of the proposed model across diverse speech databases and underlines its potential for further speech-based emotion recognition.
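The abstract describes a pipeline in which a wide-kernel first convolution layer smooths sample-level noise in the raw waveform before a recurrent network summarizes the resulting feature sequence for classification. The paper itself gives no code; the following is a minimal NumPy sketch of that idea only, where every layer size, weight initialization, and the simple tanh recurrence are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_conv1d(x, kernel, stride):
    # Slide one wide kernel (e.g. K = 64 samples) over the raw waveform;
    # a wide receptive field averages out sample-level noise. ReLU output.
    K = len(kernel)
    out = [x[i:i + K] @ kernel for i in range(0, len(x) - K + 1, stride)]
    return np.maximum(np.array(out), 0.0)

def simple_rnn(seq, Wx, Wh):
    # Plain tanh recurrence over the 1-D feature sequence; returns the
    # final hidden state as a fixed-length summary of the utterance.
    h = np.zeros(Wh.shape[0])
    for x_t in seq:
        h = np.tanh(Wx * x_t + Wh @ h)
    return h

# Illustrative sizes (assumptions, not taken from the paper).
T, K, stride, H, n_classes = 1600, 64, 8, 16, 6

wave = rng.standard_normal(T)            # stand-in for a raw speech frame
kernel = rng.standard_normal(K) / K      # wide first-layer kernel
Wx = rng.standard_normal(H) * 0.1        # input-to-hidden weights
Wh = rng.standard_normal((H, H)) * 0.1   # hidden-to-hidden weights
Wo = rng.standard_normal((n_classes, H)) * 0.1  # hidden-to-class weights

feats = wide_conv1d(wave, kernel, stride)  # noise-reduced feature sequence
h = simple_rnn(feats, Wx, Wh)              # utterance-level summary state
logits = Wo @ h                            # one score per emotion class
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
```

In a trained system the kernel and weight matrices would be learned end-to-end; the point here is only the data flow: raw waveform, wide-kernel convolution, recurrence, class probabilities.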
Pages: 10