A DCRNN-based ensemble classifier for speech emotion recognition in Odia language

Cited by: 0
Authors
Monorama Swain
Bubai Maji
P. Kabisatpathy
Aurobinda Routray
Affiliations
[1] Silicon Institute of Technology, Department of Electronics and Communication Engineering
[2] CV Raman College of Engineering, Department of Electronics and Instrumentation
[3] Indian Institute of Technology, Department of Electrical Engineering
Source
Complex & Intelligent Systems | 2022 / Vol. 8
Keywords
Speech emotion recognition; Deep convolutional neural network; Bi-directional gated recurrent unit; Ensemble classifier
DOI
Not available
Abstract
Odia is an old Eastern Indo-Aryan language spoken by 46.8 million people across India. We design an ensemble classifier based on a Deep Convolutional Recurrent Neural Network (DCRNN) for Speech Emotion Recognition (SER). This study presents a new approach for SER tasks, motivated by recent research on speech emotion recognition. First, we extract utterance-level log Mel-spectrograms together with their first and second derivatives (static, delta, and delta-delta), which are stacked into 3-D log Mel-spectrograms. A deep convolutional neural network then extracts deep features from the 3-D log Mel-spectrograms. Next, a bi-directional gated recurrent unit network models the long-term temporal dependencies across these features to produce an utterance-level emotion representation. Finally, an ensemble of a Softmax classifier and a Support Vector Machine classifier is used to improve the final recognition rate. The proposed framework is trained and tested on the Odia (seven emotional states) and RAVDESS (eight emotional states) datasets. The experimental results show that the ensemble classifier performs better than a single classifier. The accuracies reached are 85.31% on the Odia dataset and 77.54% on the RAVDESS dataset, outperforming some state-of-the-art frameworks on both.
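The construction of the 3-D input described in the abstract (static, delta, and delta-delta channels) can be illustrated with a minimal, dependency-free sketch. The abstract does not state how the derivatives are computed, so the regression-based delta formula, the window width of 2, and the edge padding used here are assumptions, and the names `delta` and `stack_3d` are hypothetical. The log Mel-spectrogram itself is assumed to be already computed and passed in as a list of frames.

```python
from typing import List

Matrix = List[List[float]]  # layout: [n_frames][n_mels]


def delta(features: Matrix, width: int = 2) -> Matrix:
    """Regression-based time derivative (assumed formula, not from the paper).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    n_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(n_frames):
        row = []
        for m in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = features[min(t + n, n_frames - 1)][m]
                behind = features[max(t - n, 0)][m]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_3d(log_mel: Matrix, width: int = 2) -> List[Matrix]:
    """Stack static, delta, and delta-delta into a 3-channel input.

    Returned layout: [3][n_frames][n_mels], analogous to a 3-channel image.
    """
    d1 = delta(log_mel, width)   # first derivative (delta)
    d2 = delta(d1, width)        # second derivative (delta-delta)
    return [log_mel, d1, d2]


if __name__ == "__main__":
    # Toy input: 6 frames, 2 Mel bins, values rising linearly over time.
    toy = [[float(t), float(t)] for t in range(6)]
    channels = stack_3d(toy)
    print(len(channels), len(channels[0]), len(channels[0][0]))
```

The three channels play the role that colour channels play in an image, which is what lets a convolutional network consume them directly. In practice the input frames would come from an audio library's log Mel-spectrogram routine rather than a toy ramp.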
Pages: 4237-4249
Page count: 12