MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Cited by: 27
Authors
Jothimani, S. [1]
Premalatha, K. [1]
Affiliations
[1] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, India
Keywords
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; Accuracy
DOI
10.1016/j.chaos.2022.112512
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Speech Emotion Recognition (SER) is a complex task because it depends on selecting features that reflect emotion in human speech. SER plays a vital role in Human-Computer Interaction (HCI) and remains very challenging. Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug method enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. On the pre-processed speech signals, the feature extraction techniques Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) are applied and combined to achieve substantial performance in emotion recognition. Augmentation is applied to the raw speech for a contrastive loss that maximizes agreement between differently augmented samples in the latent space and reconstructs the loss of input representation for better prediction accuracy. A state-of-the-art Convolution Neural Network (CNN) is proposed for enhanced speech representation learning and voice emotion classification. Further, the MFF-SAug method is compared with a CNN + LSTM model. The experimental analysis was carried out using the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition, with accuracies of 92.6 %, 89.9 %, 84.9 %, and 99.6 % on the RAVDESS, CREMA, SAVEE, and TESS datasets, respectively.
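The pipeline in the abstract (augment the raw speech, extract and fuse frame-level features, and train with a contrastive loss that rewards agreement between augmented views) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the frame and hop sizes, the noise factor, and the NT-Xent form of the contrastive loss are all assumptions made for illustration, since the abstract does not specify them. MFCC extraction and pitch tuning are left out to keep the sketch dependency-free.

```python
import numpy as np

def add_white_noise(signal, noise_factor=0.005, rng=None):
    """White noise injection (assumed form): add Gaussian noise scaled
    by the peak amplitude of the signal."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(signal.shape[0])
    return signal + noise_factor * np.max(np.abs(signal)) * noise

def frame_signal(signal, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    return signal[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs per frame whose sign differs."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def root_mean_square(frames):
    """RMS energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def fuse_features(signal, frame_length=2048, hop_length=512):
    """Feature fusion (assumed form): stack per-frame ZCR and RMS as columns.
    MFCC columns would be concatenated the same way."""
    frames = frame_signal(signal, frame_length, hop_length)
    return np.stack([zero_crossing_rate(frames),
                     root_mean_square(frames)], axis=1)

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss (an assumed choice of loss) for two batches
    of embeddings, where z1[i] and z2[i] are two views of the same sample."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_sum_exp - sim[np.arange(2 * n), pos]))

# Usage: one second of a 440 Hz tone at 16 kHz stands in for a speech clip.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = add_white_noise(clean, rng=np.random.default_rng(0))
features = fuse_features(noisy)  # shape: (n_frames, 2)
```

In a full system, the two augmented views of each clip would pass through the CNN encoder, and `nt_xent_loss` would be computed on the resulting embeddings, not on raw features. A library such as librosa provides MFCC extraction (`librosa.feature.mfcc`) that could supply the remaining feature columns.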
Pages: 18