Variational mode decomposition based acoustic and entropy features for speech emotion recognition

Cited by: 18
Authors
Mishra, Siba Prasad [1]
Warule, Pankaj [1]
Deb, Suman [1]
Affiliations
[1] Sardar Vallabhbhai Natl Inst Technol, Surat, Gujarat, India
Keywords
Deep neural network; Speech emotion recognition; MFCC; Permutation entropy; Approximate entropy
DOI
10.1016/j.apacoust.2023.109578
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Automated speech emotion recognition (SER) is a machine-based method for identifying emotions from speech signals. SER has many practical applications, including improving man-machine interaction (MMI), online customer support, healthcare services, and online marketing. Because of this wide range of applications, SER has attracted growing research interest over the past three decades, and numerous studies have employed various combinations of features and classifiers to improve emotion classification performance. In this study, we pursue the same goal using variational mode decomposition (VMD)-based features. We extract MFCC, mel-spectrogram, approximate entropy (ApEn), and permutation entropy (PrEn) features from each VMD mode. Emotion classification performance is evaluated with a deep neural network (DNN) classifier using the proposed VMD-based features both individually (MFCC, mel-spectrogram, ApEn, and PrEn) and in combination (MFCC + mel-spectrogram + ApEn + PrEn). On the EMO-DB and RAVDESS datasets, we obtain classification accuracies of 91.59% and 80.83%, respectively. Comparing these results with other methods shows that the proposed VMD-based feature combination with a DNN classifier outperforms state-of-the-art SER approaches.
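To make the described pipeline concrete, the following is a minimal sketch of VMD-based feature extraction for one utterance. It is an illustration under stated assumptions, not the authors' code: it relies on the third-party packages vmdpy (VMD), librosa (MFCC and mel-spectrogram), and antropy (ApEn and PrEn), and the mode count K, penalty alpha, pooling, and downsampling choices are placeholders rather than the paper's settings. The resulting fixed-length vector per utterance would then be fed to a DNN classifier.

```python
# Hypothetical sketch of VMD-based acoustic and entropy feature extraction.
# Assumptions (not from the paper): vmdpy, librosa, and antropy are used in
# place of whatever implementation the authors employed; K, alpha, the
# downsampling factor, and mean pooling are illustrative placeholders.
import numpy as np
import librosa
from vmdpy import VMD
from antropy import app_entropy, perm_entropy

def vmd_features(path, K=5, alpha=2000, n_mfcc=13):
    """Decompose one utterance into K VMD modes and pool per-mode features."""
    y, sr = librosa.load(path, sr=None)
    y = y[:len(y) - (len(y) % 2)]            # vmdpy works on even-length signals
    # u holds the K band-limited modes, one per row.
    u, _, _ = VMD(y, alpha, 0.0, K, 0, 1, 1e-7)
    feats = []
    for mode in u:
        mfcc = librosa.feature.mfcc(y=mode, sr=sr, n_mfcc=n_mfcc)
        mel = librosa.feature.melspectrogram(y=mode, sr=sr)
        mode_ds = mode[::16]                 # crude downsampling so the O(N^2)
                                             # ApEn estimate stays tractable here
        feats.append(np.concatenate([
            mfcc.mean(axis=1),                        # time-pooled MFCCs
            np.log(mel + 1e-10).mean(axis=1),         # time-pooled log-mel bands
            [app_entropy(mode_ds)],                   # approximate entropy (ApEn)
            [perm_entropy(mode_ds, normalize=True)],  # permutation entropy (PrEn)
        ]))
    # One fixed-length vector per utterance, ready for a DNN classifier.
    return np.concatenate(feats)
```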
Pages: 12