Speech-based emotion recognition using a hybrid RNN-CNN network

Cited: 0
Authors
Ning, Jingtao [1 ]
Zhang, Wenchuan [1 ]
Affiliations
[1] Lanzhou Petrochem Univ Vocat Technol, Coll Informat Engn, Lanzhou 730060, Gansu, Peoples R China
Keywords
Speech emotion recognition; Deep learning; Recurrent neural network; Convolutional neural network; Wide kernel; Classification
DOI
10.1007/s11760-024-03574-7
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Speech emotion recognition (SER) is among the most active areas of speech-signal analysis, aiming to estimate and classify the rich spectrum of emotions conveyed by speakers. This paper develops a novel deep learning (DL) model for detecting speech emotion that addresses several weaknesses of existing intelligent data-driven approaches. A new DL architecture, referred to as the RNN-CNN, is proposed and applied to the SER task by operating directly on raw speech signals. The key design challenge was to combine an initial convolution layer with a wide kernel as an efficient way to mitigate the noise present in raw speech signals. In the experimental analysis, the proposed RNN-CNN model is evaluated on three databases: RML, RAVDESS, and SAVEE. On each dataset, the model achieves higher accuracy than the previous works examined. This assessment validates the robust performance and applicability of the proposed model across diverse speech databases and underlines its potential for further speech-based emotion recognition.
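The abstract describes a pipeline in which a wide-kernel first convolution layer smooths sample-level noise in the raw waveform before a recurrent network summarizes the resulting feature sequence for classification. The paper itself gives no code; the following is a minimal NumPy sketch of that idea only, where every layer size, weight initialization, and the simple tanh recurrence are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_conv1d(x, kernel, stride):
    # Slide one wide kernel (e.g. K = 64 samples) over the raw waveform;
    # a wide receptive field averages out sample-level noise. ReLU output.
    K = len(kernel)
    out = [x[i:i + K] @ kernel for i in range(0, len(x) - K + 1, stride)]
    return np.maximum(np.array(out), 0.0)

def simple_rnn(seq, Wx, Wh):
    # Plain tanh recurrence over the 1-D feature sequence; returns the
    # final hidden state as a fixed-length summary of the utterance.
    h = np.zeros(Wh.shape[0])
    for x_t in seq:
        h = np.tanh(Wx * x_t + Wh @ h)
    return h

# Illustrative sizes (assumptions, not taken from the paper).
T, K, stride, H, n_classes = 1600, 64, 8, 16, 6

wave = rng.standard_normal(T)            # stand-in for a raw speech frame
kernel = rng.standard_normal(K) / K      # wide first-layer kernel
Wx = rng.standard_normal(H) * 0.1        # input-to-hidden weights
Wh = rng.standard_normal((H, H)) * 0.1   # hidden-to-hidden weights
Wo = rng.standard_normal((n_classes, H)) * 0.1  # hidden-to-class weights

feats = wide_conv1d(wave, kernel, stride)  # noise-reduced feature sequence
h = simple_rnn(feats, Wx, Wh)              # utterance-level summary state
logits = Wo @ h                            # one score per emotion class
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
```

In a trained system the kernel and weight matrices would be learned end-to-end; the point here is only the data flow: raw waveform, wide-kernel convolution, recurrence, class probabilities.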
Pages: 10