Automatic speech emotion recognition based on hybrid features with ANN, LDA and K_NN classifiers

Cited: 1
Authors
Al Dujaili, Mohammed Jawad [1 ]
Ebrahimi-Moghadam, Abbas [2 ]
Affiliations
[1] Univ Kufa, Fac Engn, Dept Elect & Commun, Najaf, Iraq
[2] Ferdowsi Univ Mashhad, Elect Engn Dept, Fac Engn, Mashhad, Iran
Funding
UK Research and Innovation;
Keywords
Speech emotion recognition (SER); MFCC; Jitter; Shimmer; PCA; ANN; LDA; K_NN; MODELS;
DOI
10.1007/s11042-023-15413-x
CLC classification
TP [Automation technology, computer technology];
Subject classification
0812;
Abstract
Despite many efforts in speech emotion recognition, a large gap remains between natural human feelings and computer perception. This article examines the recognition of a speaker's emotions in Persian and German. For this purpose, Persian emotional speech utterances were collected, comprising 748 sentences with seven feelings: Neutral, Disgust, Fear, Anger, Sadness, Boredom, and Happiness. The German emotional speech utterances consist of 536 sentences created by professional actors in a laboratory environment, expressing seven different feelings: Happiness, Hatred, Neutrality, Fear, Sadness, Anger, and Fatigue. Widely used features are extracted from these databases: Mel-Frequency Cepstral Coefficients (MFCC) and their derivatives, the local frequency perturbation coefficient (jitter), and the local amplitude perturbation coefficient (shimmer). Because the resulting feature space is large, dimensionality reduction with the Principal Component Analysis (PCA) algorithm is applied before classification. Three classifiers are then used to recognize emotions: an Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), and K-Nearest Neighbor (K_NN). For the German database, the best results were obtained by fusing the MFCC + shimmer features with the LDA classifier, reaching a recognition accuracy of 91.26% with a runtime of 0.43 s; for the Persian database, the best results were obtained by fusing the jitter + shimmer features with the K_NN classifier, reaching a recognition accuracy of 91.5% with a runtime of 0.65 s. The results show that the discriminative power of the feature vectors differs considerably across emotional states, and that the expression of emotions and their effect on speech differ between Persian and German.
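The pipeline the abstract describes (perturbation features, PCA dimensionality reduction, then k-NN classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the jitter/shimmer formulas use the common relative mean-difference definitions, the classification stage runs on synthetic stand-in vectors rather than real MFCC/jitter/shimmer measurements, and all function names and parameters are illustrative.

```python
import numpy as np

def jitter(periods):
    """Local jitter: mean absolute difference of consecutive pitch
    periods, relative to the mean period (a common definition)."""
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Local shimmer: same relative mean-difference measure applied
    to consecutive peak amplitudes instead of periods."""
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def pca_fit(X, n_components):
    """Return (mean, components); principal directions come from the
    SVD of the mean-centered data (rows of Vt)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Two synthetic "emotion" clusters standing in for real feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 10)),
               rng.normal(3.0, 0.5, size=(30, 10))])
y = np.array([0] * 30 + [1] * 30)

mean, comps = pca_fit(X, n_components=3)      # reduce 10 -> 3 dims
Xp = pca_transform(X, mean, comps)
accuracy = float((knn_predict(Xp, y, Xp, k=3) == y).mean())
```

On well-separated synthetic clusters like these, the projected features remain separable and k-NN recovers the labels; real emotional-speech features overlap far more, which is why the paper compares several classifiers and feature fusions.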
Pages: 42783-42801
Number of pages: 19