Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model

Cited by: 0
Authors
Swami Mishra
Nehal Bhatnagar
Prakasam P
Sureshkumar T. R
Affiliation
[1] Vellore Institute of Technology,School of Electronics Engineering
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Speech emotion recognition; Deep convolutional neural networks; LSTM; MFSC; Ensemble learning;
DOI
Not available
Abstract
Accurate emotion detection from speech utterances is a challenging and active research problem. Speech emotion recognition (SER) systems play an essential role in human-machine interaction, virtual reality, emergency services, and many other real-time systems. The problem remains open because speakers from different regions and linguistic backgrounds convey emotions very differently. Conventional approaches classified emotions using low-level periodic features of the audio samples, such as energy and pitch, but they were neither accurate enough nor able to generalize. With recent advances in computer vision and neural networks, high-level features can be extracted and more accurate recognition achieved. This study proposes an ensemble deep CNN + Bi-LSTM framework for speech emotion recognition and classification of seven different emotions. Paralinguistic log Mel-frequency spectral coefficients (MFSC) are used as the features to train the proposed architecture. The proposed hybrid model is validated on the TESS and SAVEE datasets. Experimental results indicate a classification accuracy of 96.36%. A comparison with existing models indicates that the proposed hybrid deep CNN and Bi-LSTM model outperforms them.
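For readers unfamiliar with the MFSC features named in the abstract, the following is a minimal sketch of how log Mel-frequency spectral coefficients can be computed for a single audio frame. It is not the authors' pipeline: this record does not state their frame length, window, filter count, or mel formula, so the Hamming window, the HTK-style mel formula, the default of 26 filters, and all function names here are illustrative assumptions. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula; an assumed convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights in [0, 1].
    """
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: lower edge, filter centres, upper edge.
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive O(n^2) DFT, demo only)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def log_mel_frame(frame, sample_rate, n_filters=26, eps=1e-10):
    """Log mel filterbank energies (MFSC) for one frame; eps avoids log(0)."""
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    power = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in bank]
```

Stacking the per-frame vectors over time yields a two-dimensional log-mel map, which is the kind of image-like input a CNN front end can consume before a recurrent layer models the sequence. In practice a library FFT would replace the naive DFT used here.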
Pages: 37603 - 37620
Number of pages: 17
Related papers
  • [1] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Mishra, Swami
    Bhatnagar, Nehal
    Prakasam, P.
    Sureshkumar, T. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
  • [2] Hybrid CNN-BiLSTM architecture with multiple attention mechanisms to enhance speech emotion recognition
    Poorna, S. S.
    Menon, Vivek
    Gopalan, Sundararaman
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
  • [3] Speech Emotion Recognition Using CNN
    Huang, Zhengwei
    Dong, Ming
    Mao, Qirong
    Zhan, Yongzhao
    PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 801 - 804
  • [4] Hybrid LSTM-Attention and CNN Model for Enhanced Speech Emotion Recognition
    Makhmudov, Fazliddin
    Kutlimuratov, Alpamis
    Cho, Young-Im
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [5] Human activity recognition using CNN-BiLSTM-LightGBM hybrid model
    Sonmez, Seyma Nur
    Dogru, Ibrahim Alper
    Atacak, Ismail
    Kilic, Kazim
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,
  • [6] A Novel CNN-BiLSTM-GRU Hybrid Deep Learning Model for Human Activity Recognition
    Lalwani, Pooja
    Ganeshan, R.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [7] EEG-based emotion recognition using hybrid CNN and LSTM classification
    Chakravarthi, Bhuvaneshwari
    Ng, Sin-Chun
    Ezilarasan, M. R.
    Leung, Man-Fai
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2022, 16
  • [8] Speech-based emotion recognition using a hybrid RNN-CNN network
    Ning, Jingtao
    Zhang, Wenchuan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
  • [9] A BiLSTM-Transformer and 2D CNN Architecture for Emotion Recognition from Speech
    Kim, Sera
    Lee, Seok-Pil
    ELECTRONICS, 2023, 12 (19)
  • [10] A HYBRID CNN-BILSTM MODEL FOR DRUG NAMED ENTITY RECOGNITION
    Fudholi, Dhomas Hatta
    Nayoan, Royan Abida N.
    Hidayatullah, Ahmad Fathan
    Arianto, Dede Brahma
    JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2022, 17 (01): : 730 - 744