Speech emotion classification using attention based network and regularized feature selection

Cited by: 14
Authors
Akinpelu, Samson [1 ]
Viriri, Serestina [1 ]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, ZA-4000 Durban, South Africa
Keywords
RECURRENT NEURAL-NETWORKS;
DOI
10.1038/s41598-023-38868-2
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Classification Code
07 ; 0710 ; 09 ;
Abstract
Speech emotion classification (SEC) has attracted considerable attention within the research community in recent times, and its vital role in Human-Computer Interaction (HCI) and affective computing cannot be overemphasized. Many classical algorithms and deep neural network (DNN) models have been proposed for recognizing emotion from speech; however, their ability to accurately classify emotion in speech with a multilingual background, and in the presence of other factors that impede efficient classification, still demands critical consideration. This study proposes an attention-based network built on a pre-trained convolutional neural network, combined with a regularized neighbourhood component analysis (RNCA) feature selection technique, for improved classification of speech emotion. Attention models have proven successful in many sequence-based and time-series tasks. Extensive experiments were carried out with three major classifiers (SVM, MLP and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. The proposed model (attention-based DCNN + RNCA + RF) achieved 97.8% classification accuracy, a 3.27% improvement that outperforms state-of-the-art SEC approaches. Model evaluation revealed that the attention mechanism and feature selection are consistent with human behavioural patterns in classifying emotion from auditory speech.
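The abstract describes a pipeline of attention-weighted deep CNN features followed by RNCA feature selection and a conventional classifier. The Python sketch below is only an illustrative approximation, not the authors' implementation: it assumes the attention-based CNN embeddings have already been extracted from the speech signals, substitutes scikit-learn's (unregularized) NeighborhoodComponentsAnalysis for the paper's RNCA step, and uses placeholder arrays X and y in place of real TESS features and labels.

# Illustrative sketch only (not the authors' code). Assumes attention-weighted
# CNN embeddings were already extracted; scikit-learn's NeighborhoodComponentsAnalysis
# stands in for the paper's regularized NCA (RNCA) feature selection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; the real TESS corpus contains 2800 utterances over 7 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))      # hypothetical 128-d attention-pooled CNN features
y = rng.integers(0, 7, size=300)     # hypothetical emotion labels (7 classes)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),                                        # normalize features
    ("nca", NeighborhoodComponentsAnalysis(n_components=32,
                                           random_state=42)),           # supervised dimensionality reduction
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),  # final emotion classifier
])
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

In the paper, the same selected features were also fed to SVM and MLP classifiers for comparison; the Random Forest variant is the one reported to reach 97.8% accuracy.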
Pages: 14