Variational mode decomposition based acoustic and entropy features for speech emotion recognition

Cited by: 18
Authors
Mishra, Siba Prasad [1]
Warule, Pankaj [1]
Deb, Suman [1]
Affiliations
[1] Sardar Vallabhbhai Natl Inst Technol, Surat, Gujarat, India
Keywords
Deep neural network; Speech emotion recognition; MFCC; Permutation entropy; Approximate entropy
DOI
10.1016/j.apacoust.2023.109578
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Automated speech emotion recognition (SER) is a machine-based method for identifying emotions from speech signals. SER has many practical applications, including improving man-machine interaction (MMI), online customer support, healthcare services, and online marketing. Because of this wide range of applications, SER has attracted growing research interest over the past three decades, and numerous studies have employed various combinations of features and classifiers to improve emotion classification performance. In this study, we pursue the same goal using variational mode decomposition (VMD)-based features. We extract MFCC, mel-spectrogram, approximate entropy (ApEn), and permutation entropy (PrEn) features from each VMD mode. Emotion classification performance is evaluated with a deep neural network (DNN) classifier using the proposed VMD-based features both individually (MFCC, mel-spectrogram, ApEn, and PrEn) and in combination (MFCC + mel-spectrogram + ApEn + PrEn). On the EMO-DB and RAVDESS datasets, we obtain classification accuracies of 91.59% and 80.83%, respectively. Comparing these results with other methods shows that the proposed VMD-based feature combination with a DNN classifier outperforms state-of-the-art SER approaches.
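To make the described pipeline concrete, the following is a minimal sketch of VMD-based feature extraction for one utterance. It is an illustration under stated assumptions, not the authors' code: it relies on the third-party packages vmdpy (VMD), librosa (MFCC and mel-spectrogram), and antropy (ApEn and PrEn), and the mode count K, penalty alpha, pooling, and downsampling choices are placeholders rather than the paper's settings. The resulting fixed-length vector per utterance would then be fed to a DNN classifier.

```python
# Hypothetical sketch of VMD-based acoustic and entropy feature extraction.
# Assumptions (not from the paper): vmdpy, librosa, and antropy are used in
# place of whatever implementation the authors employed; K, alpha, the
# downsampling factor, and mean pooling are illustrative placeholders.
import numpy as np
import librosa
from vmdpy import VMD
from antropy import app_entropy, perm_entropy

def vmd_features(path, K=5, alpha=2000, n_mfcc=13):
    """Decompose one utterance into K VMD modes and pool per-mode features."""
    y, sr = librosa.load(path, sr=None)
    y = y[:len(y) - (len(y) % 2)]            # vmdpy works on even-length signals
    # u holds the K band-limited modes, one per row.
    u, _, _ = VMD(y, alpha, 0.0, K, 0, 1, 1e-7)
    feats = []
    for mode in u:
        mfcc = librosa.feature.mfcc(y=mode, sr=sr, n_mfcc=n_mfcc)
        mel = librosa.feature.melspectrogram(y=mode, sr=sr)
        mode_ds = mode[::16]                 # crude downsampling so the O(N^2)
                                             # ApEn estimate stays tractable here
        feats.append(np.concatenate([
            mfcc.mean(axis=1),                        # time-pooled MFCCs
            np.log(mel + 1e-10).mean(axis=1),         # time-pooled log-mel bands
            [app_entropy(mode_ds)],                   # approximate entropy (ApEn)
            [perm_entropy(mode_ds, normalize=True)],  # permutation entropy (PrEn)
        ]))
    # One fixed-length vector per utterance, ready for a DNN classifier.
    return np.concatenate(feats)
```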
Pages: 12