Automatic speech emotion recognition based on hybrid features with ANN, LDA and K_NN classifiers

Cited: 1
Authors
Al Dujaili, Mohammed Jawad [1 ]
Ebrahimi-Moghadam, Abbas [2 ]
Affiliations
[1] Univ Kufa, Fac Engn, Dept Elect & Commun, Najaf, Iraq
[2] Ferdowsi Univ Mashhad, Elect Engn Dept, Fac Engn, Mashhad, Iran
Funding
UK Research and Innovation;
Keywords
Speech emotion recognition (SER); MFCC; Jitter; Shimmer; PCA; ANN; LDA; K_NN; MODELS;
DOI
10.1007/s11042-023-15413-x
CLC classification
TP [Automation technology, computer technology];
Subject classification
0812;
Abstract
Despite many efforts in speech emotion recognition, a large gap remains between natural human feelings and computer perception. This article examines the recognition of a speaker's emotions in Persian and German. For this purpose, Persian emotional speech utterances were collected, comprising 748 sentences with seven feelings: Neutral, Disgust, Fear, Anger, Sadness, Boredom, and Happiness. The German emotional speech utterances consist of 536 sentences created by professional actors in a laboratory environment, expressing seven different feelings: Happiness, Hatred, Neutrality, Fear, Sadness, Anger, and Fatigue. Widely used features are extracted from these databases: Mel-Frequency Cepstral Coefficients (MFCC) and their derivatives, the local frequency perturbation coefficient (jitter), and the local amplitude perturbation coefficient (shimmer). Because the resulting feature space is large, dimensionality reduction with the Principal Component Analysis (PCA) algorithm is applied before classification. Three classifiers are then used to recognize emotions: an Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), and K-Nearest Neighbor (K_NN). For the German database, the best results were obtained by fusing the MFCC + shimmer features with the LDA classifier, reaching a recognition accuracy of 91.26% with a runtime of 0.43 s; for the Persian database, the best results were obtained by fusing the jitter + shimmer features with the K_NN classifier, reaching a recognition accuracy of 91.5% with a runtime of 0.65 s. The results show that the discriminative power of the feature vectors differs considerably across emotional states, and that the expression of emotions and their effect on speech differ between Persian and German.
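The pipeline the abstract describes (perturbation features, PCA dimensionality reduction, then k-NN classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the jitter/shimmer formulas use the common relative mean-difference definitions, the classification stage runs on synthetic stand-in vectors rather than real MFCC/jitter/shimmer measurements, and all function names and parameters are illustrative.

```python
import numpy as np

def jitter(periods):
    """Local jitter: mean absolute difference of consecutive pitch
    periods, relative to the mean period (a common definition)."""
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Local shimmer: same relative mean-difference measure applied
    to consecutive peak amplitudes instead of periods."""
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def pca_fit(X, n_components):
    """Return (mean, components); principal directions come from the
    SVD of the mean-centered data (rows of Vt)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Two synthetic "emotion" clusters standing in for real feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 10)),
               rng.normal(3.0, 0.5, size=(30, 10))])
y = np.array([0] * 30 + [1] * 30)

mean, comps = pca_fit(X, n_components=3)      # reduce 10 -> 3 dims
Xp = pca_transform(X, mean, comps)
accuracy = float((knn_predict(Xp, y, Xp, k=3) == y).mean())
```

On well-separated synthetic clusters like these, the projected features remain separable and k-NN recovers the labels; real emotional-speech features overlap far more, which is why the paper compares several classifiers and feature fusions.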
Pages: 42783-42801
Number of pages: 19