Optimized feature engineering for machine learning-based emotion recognition from human speech

Cited by: 0

Authors:

Thakur, Anuja [1]

Kumar Dhull, Sanjeev [1]

Affiliations:

[1] Guru Jambheshwar Univ Sci & Technol, Dept EEE, Hisar 125001, Haryana, India

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025 / Vol. 19 / No. 08

Keywords:

Feature Engineering; Feature Selection; GA-T; Machine learning; Speech Emotion Recognition; Spotted Hyena Optimization;

DOI:

10.1007/s11760-025-04271-9

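Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes: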

0808 ; 0809 ;

Abstract:

This paper introduces a novel framework for Speech Emotion Recognition (SER) based on feature selection (FS) with hybrid meta-heuristic algorithms, addressing persistent challenges in optimal feature choice and feature engineering. Despite progress in SER, selecting the most informative features remains complex and often limits model effectiveness. Our approach proposes three new hybrid models that integrate the Genetic Algorithm using Tournament selection (GA-T) with the Spotted Hyena Optimization (SHO) algorithm for feature optimization. The first model (GSHO-I) employs GA-T to refine the feature set generated by SHO, creating a robust filter that enhances feature relevance. In the second model (GSHO-II), GA-T and SHO are executed independently to assess feature importance, and their individual importance scores are averaged into a consensus-based metric for feature selection. In the third model (GSHO-III), SHO optimizes the feature set generated by GA-T, creating a dynamic loop that maximizes feature diversity. The feature set combines spectral, prosodic, and Wavelet Scattering (WS) features to improve SER model precision. We evaluate the models on two extensive SER datasets, EmoDB and SAVEE, using Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and neural network classifiers. The simulation analysis shows that GSHO-II achieves significantly improved performance, particularly when all three feature types (spectral, prosodic, and WS) are combined for the SVM classifier. GSHO-II reaches accuracies of 96.23% on EmoDB and 92.86% on SAVEE with the SVM classifier, establishing the efficacy of the hybrid model in advancing SER accuracy.
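
The consensus step described for GSHO-II (averaging two independent importance scores per feature) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the scores are scaled or how the final subset is cut, so the min-max normalization, the top-k rule, the function names, and the example score vectors below are all assumptions made for illustration.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def consensus_select(
    ga_scores: Sequence[float],
    sho_scores: Sequence[float],
    k: int,
) -> List[int]:
    """Average two per-feature importance vectors and keep the top-k indices.

    ga_scores and sho_scores are hypothetical stand-ins for the importance
    scores produced by GA-T and SHO; normalization and top-k are assumptions.
    """
    if len(ga_scores) != len(sho_scores):
        raise ValueError("score vectors must have the same length")
    ga = min_max_normalize(ga_scores)
    sho = min_max_normalize(sho_scores)
    # Consensus metric: simple mean of the two normalized scores.
    consensus = [(a + b) / 2.0 for a, b in zip(ga, sho)]
    # Rank feature indices by consensus score, highest first.
    ranked = sorted(range(len(consensus)), key=lambda i: consensus[i], reverse=True)
    return sorted(ranked[:k])


# Made-up scores for five features, purely for illustration.
ga_t_scores = [0.9, 0.1, 0.5, 0.7, 0.2]
sho_scores = [0.8, 0.3, 0.1, 0.9, 0.4]
selected = consensus_select(ga_t_scores, sho_scores, k=2)
print(selected)
```

In this toy example, features 0 and 3 score highly under both vectors, so they are the ones retained; a feature favoured by only one optimizer is ranked lower under the averaged metric.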


Pages: 9
