A hybrid deep feature selection framework for emotion recognition from human speeches

Cited by: 9
Authors
Marik, Aritra [1 ]
Chattopadhyay, Soumitri [1 ]
Singh, Pawan Kumar [1 ]
Affiliations
[1] Jadavpur Univ, Dept Informat Technol, Jadavpur Univ Second Campus, Plot 8, LB Block, Kolkata 700106, W Bengal, India
Funding
UK Research and Innovation (UKRI)
Keywords
Speech emotion recognition; Deep learning; Feature selection; Fuzzy entropy & similarity measures; Whale optimization algorithm; Algorithms
DOI
10.1007/s11042-022-14052-y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech Emotion Recognition (SER) is an active area of signal processing research that aims to identify emotional states from audio speech signals. Applications of SER range from psychological diagnosis to human-computer interaction, so a robust framework is needed for accurate classification. To this end, we propose a two-stage hybrid deep feature selection (HDFS) framework that combines deep learning with automated feature engineering for emotion recognition from human speech, performing well in terms of both accuracy and computational efficiency. Our pipeline extracts self-learned features from mel-spectrograms of raw audio signals using a customized Wide-ResNet-50-2 deep learning model; the dimensionality of these features is then reduced by a hybrid deep feature selection algorithm comprising a fuzzy entropy and similarity-based feature ranking method, followed by the Whale Optimization Algorithm, a popular meta-heuristic optimizer in the literature. A k-nearest neighbor classifier assigns the optimized feature subset to the respective emotion classes. The proposed pipeline is evaluated on three publicly available SER datasets using a 5-fold cross-validation scheme, where it outperforms several state-of-the-art works in the literature by significant margins, justifying the effectiveness and reliability of the proposed approach. The source codes of the proposed method can be found at: .
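The following is a minimal sketch of the two-stage pipeline the abstract describes, assuming Python with librosa, PyTorch/torchvision, and scikit-learn. All function names are illustrative: the Fisher-style separability score merely stands in for the paper's fuzzy entropy & similarity-based ranking, the binary whale optimizer is a heavily simplified rendition of WOA, and the cutoff and neighbor counts are assumed values, not taken from the paper.

```python
# Illustrative sketch of the HDFS pipeline; not the authors' exact code.
import numpy as np
import librosa
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score


def extract_deep_features(wav_paths, sr=16000):
    """Stage 1: mel-spectrogram -> Wide-ResNet-50-2 embedding (2048-D)."""
    backbone = models.wide_resnet50_2(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()              # keep penultimate features
    backbone.eval()
    feats = []
    with torch.no_grad():
        for path in wav_paths:
            y, _ = librosa.load(path, sr=sr)
            mel = librosa.power_to_db(
                librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
            x = torch.tensor(mel, dtype=torch.float32)[None, None]  # 1x1xHxW
            x = TF.resize(x, [224, 224]).repeat(1, 3, 1, 1)  # fake 3 channels
            feats.append(backbone(x).squeeze(0).numpy())
    return np.stack(feats)


def rank_features(X, y, keep=512):
    """Stage 2a (surrogate): a Fisher-style score stands in for the paper's
    fuzzy entropy & similarity-based ranking; keeps the top `keep` features."""
    mu, var = X.mean(0), X.var(0) + 1e-9
    score = sum(((X[y == c].mean(0) - mu) ** 2) / var for c in np.unique(y))
    return np.argsort(score)[::-1][:keep]


def fitness(mask, X, y):
    """5-fold CV accuracy of kNN on the selected subset, minus a size penalty."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    return acc - 0.01 * mask.mean()


def binary_woa(X, y, n_whales=10, iters=20, seed=0):
    """Stage 2b (simplified): binary Whale Optimization over feature masks."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = (rng.random((n_whales, d)) > 0.5).astype(float)
    best = max(pop, key=lambda m: fitness(m, X, y)).copy()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters          # linearly decreasing coefficient
        for i in range(n_whales):
            A = 2 * a * rng.random(d) - a
            C = 2 * rng.random(d)
            if rng.random() < 0.5:         # encircling-prey move
                step = best - A * np.abs(C * best - pop[i])
            else:                          # spiral bubble-net move
                l = rng.uniform(-1, 1, d)
                step = np.abs(best - pop[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            prob = 1.0 / (1.0 + np.exp(-step))   # sigmoid transfer -> binary
            pop[i] = (rng.random(d) < prob).astype(float)
            if fitness(pop[i], X, y) > fitness(best, X, y):
                best = pop[i].copy()
    return best.astype(bool)
```

Wiring the stages together on a hypothetical file list and label array: feats = extract_deep_features(paths), idx = rank_features(feats, labels), mask = binary_woa(feats[:, idx], labels), with the final kNN evaluated via 5-fold cross-validation on feats[:, idx][:, mask]. The k = 5 neighbors, the 512-feature ranking cutoff, and the 0.01 subset-size penalty are assumptions made for this sketch.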
Pages: 11461-11487
Number of pages: 27