Speech emotion recognition using multi resolution Hilbert transform based spectral and entropy features

Times cited: 0

Authors:

Mishra, Siba Prasad [1]

Warule, Pankaj [1]

Deb, Suman [1]

Affiliation:

[1] Sardar Vallabhbhai Natl Inst Technol, Surat, Gujarat, India

Source:

APPLIED ACOUSTICS | 2025 / Vol. 229

Keywords:

Deep neural network; Speech emotion recognition; Mel frequency cepstral coefficient; MRHT; MRHAE; MRHPE; MRHIE; MRHSE; MRHSME; PERMUTATION ENTROPY; APPROXIMATE ENTROPY; CLASSIFICATION; DIAGNOSIS; DISEASE

DOI:

10.1016/j.apacoust.2024.110403

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline codes:

070206; 082403

Abstract:

Speech emotion recognition (SER) is essential for addressing many personal and professional challenges in everyday life. SER has shown potential in a number of domains, such as medical intervention, strengthening security systems, online marketing, educational platforms, personal communication, and improving human-device interaction. Because of this wide range of applications, the subject has attracted researchers' attention for more than three decades. SER performance can be improved by adopting suitable methods for extracting features and using them to classify speech emotion. In this study, we used a novel technique, the multi-resolution Hilbert transform (MRHT), to extract speech features. We used the multi-resolution signal decomposition (MRSD) method to break each speech signal frame (SSF) down into a number of sub-band signals, called modes or intrinsic mode functions (IMFs). The Hilbert transform (HT) is then applied to each IMF to obtain the MRHT-based instantaneous amplitude (MRHIA) and MRHT-based instantaneous frequency (MRHIF) signal vectors. MRHT-based approximate entropy (MRHAE), permutation entropy (MRHPE), increment entropy (MRHIE), spectral entropy (MRHSE), and sample entropy (MRHSME) features were computed from each MRHIA and MRHIF signal vector, and mel frequency cepstral coefficient (MFCC) features were extracted from the speech signals. The combination of the proposed MRHT-based features (MRHAE + MRHPE + MRHIE + MRHSE + MRHSME) is referred to as the MRHT-based entropy feature (MRHEF). The MRHEF and MFCC features are then used both individually and in combination to classify speech emotion with a deep neural network (DNN) classifier.
This yields emotion classification accuracies of 89.67%, 85.42%, and 83.48% on the EMO-DB, EMOVO, and SAVEE datasets, respectively. Compared with other approaches, the proposed feature combination (MFCC + MRHEF) with a DNN classifier outperformed state-of-the-art SER methods.
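As a rough illustration of the Hilbert-transform and entropy steps summarized in the abstract, the Python sketch below takes one mode (assumed to have already been produced by the MRSD step, which is not reproduced here), computes its instantaneous amplitude and frequency, and implements two of the five entropy measures. The analytic signal uses the standard DFT-based construction (zero the negative-frequency bins, double the positive ones), which is also what `scipy.signal.hilbert` does; a naive DFT is used only to keep the example dependency-free. The function names, the permutation-entropy order and delay, and the normalization choices are illustrative assumptions, not the authors' implementation; approximate, increment, and sample entropy are omitted.

```python
import cmath
import math
from collections import Counter


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stdlib only, fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def analytic_signal(x):
    """Analytic signal x + j*H{x}, built by zeroing negative-frequency DFT bins."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin for even n
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0                  # double the positive frequencies
    z = [spectrum[k] * weights[k] for k in range(n)]
    # Inverse DFT back to the time domain.
    return [sum(z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_frequency(mode, fs):
    """Hypothetical helper: return (IA, IF) of one mode; IF has one sample fewer."""
    z = analytic_signal(mode)
    amplitude = [abs(v) for v in z]
    # Unwrap the phase so each sample-to-sample step lies in [-pi, pi).
    phase = [cmath.phase(z[0])]
    for v in z[1:]:
        step = (cmath.phase(v) - phase[-1] + math.pi) % (2 * math.pi) - math.pi
        phase.append(phase[-1] + step)
    # Instantaneous frequency in Hz from the phase derivative.
    frequency = [(b - a) * fs / (2 * math.pi) for a, b in zip(phase, phase[1:])]
    return amplitude, frequency


def permutation_entropy(x, order=3, delay=1):
    """Bandt-Pompe permutation entropy, normalized by log(order!) to [0, 1]."""
    count = len(x) - (order - 1) * delay
    patterns = Counter(
        tuple(sorted(range(order), key=lambda j: x[i + j * delay]))
        for i in range(count)
    )
    probs = [c / count for c in patterns.values()]
    return -sum(p * math.log(p) for p in probs) / math.log(math.factorial(order))


def spectral_entropy(x):
    """Shannon entropy of the normalized one-sided power spectrum, scaled to [0, 1]."""
    spectrum = dft(x)
    power = [abs(spectrum[k]) ** 2 for k in range(len(x) // 2 + 1)]
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))


if __name__ == "__main__":
    fs = 64
    # Stand-in for one IMF: a 5 Hz cosine spanning exactly five periods.
    mode = [math.cos(2 * math.pi * 5 * t / fs) for t in range(fs)]
    ia, inst_f = instantaneous_amplitude_frequency(mode, fs)
    print(round(min(ia), 6), round(max(ia), 6))   # 1.0 1.0
    print(round(sum(inst_f) / len(inst_f), 6))    # 5.0
```

For a pure tone the amplitude envelope is flat and the instantaneous frequency equals the tone frequency, which is a convenient sanity check. In a pipeline like the one described above, the entropy measures would be applied to the IA and IF vectors of every mode and the results concatenated with MFCCs; how the authors aggregate these values across frames is not specified in this record.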

Pages: 15

Number of references: 67
