Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network

Cited by: 0
Authors
Shalini Kapoor
Tarun Kumar
Affiliations
[1] Research Scholar, Department of Computer Science & Engineering
[2] Dr. A.P.J Abdul Kalam Technical University
[3] Radha Govind Group of Institution
Source
Multimedia Tools and Applications | 2022, Vol. 81
Keywords
Speech emotion recognition; Convolutional neural networks; Deep learning; Emotion change detection; Spectrograms;
DOI
Not available
Abstract
Stress and anger are two negative emotions that affect individuals both mentally and physically, so they need to be addressed as early as possible. Automated systems are therefore required to monitor mental states and to detect early signs of emotional health issues. In the present work, a convolutional neural network is proposed for anger and stress detection using handcrafted features combined with deep features learned from the speech spectrogram. The objective of using a combined feature set is to gather information from two different representations of the speech signal, yielding more discriminative features and boosting recognition accuracy. The proposed method of emotion assessment is also more computationally efficient than similar approaches. Preliminary experimental evaluation of the proposed approach on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicates that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed convolutional neural network (CNN) obtains training (T) and validation (V) categorical accuracies of T = 93.7%, V = 95.6% for TESS; T = 97.5%, V = 95.6% for EMO-DB; and T = 96.7%, V = 96.7% for RAVDESS.
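The fusion idea described in the abstract, concatenating handcrafted acoustic descriptors with features derived from the spectrogram, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the frame length, hop size, the particular descriptors (energy, zero-crossing rate, spectral centroid), and the random stand-in for a CNN embedding are all assumptions for demonstration.

```python
import numpy as np

def handcrafted_features(signal):
    # Classic low-level descriptors: energy, zero-crossing rate, spectral centroid
    energy = float(np.mean(signal ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal)))) / 2)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal))
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    return np.array([energy, zcr, centroid])

def spectrogram(signal, frame_len=256, hop=128):
    # Magnitude spectrogram from windowed, framed FFTs; in the paper this
    # 2-D representation is the input to the CNN
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def fuse(deep_embedding, signal):
    # Feature-level fusion: concatenate the deep embedding with the
    # handcrafted descriptor vector before classification
    return np.concatenate([deep_embedding, handcrafted_features(signal)])

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
spec = spectrogram(signal)        # 2-D input a CNN would consume
deep = rng.standard_normal(64)    # hypothetical stand-in for a CNN embedding
fused = fuse(deep, signal)
print(spec.shape, fused.shape)    # (31, 129) (67,)
```

In the actual system the stand-in embedding would come from the CNN's penultimate layer; the sketch only shows where the two feature streams meet.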
Pages: 31107-31128
Page count: 21