A learning framework of modified deep recurrent neural network for classification and recognition of voice mood

Cited by: 7
Authors
Agarwal, Gaurav [1 ,2 ]
Om, Hari [1 ]
Gupta, Sachi [3 ]
Affiliations
[1] Indian Inst Technol ISM, Dept Comp Sci & Engn, Dhanbad, Jharkhand, India
[2] IMS Engn Coll, Dept Comp Sci & Engn, Ghaziabad, Uttar Pradesh, India
[3] Raj Kumar Goel Inst Technol, Dept Comp Sci & Engn, Ghaziabad, Uttar Pradesh, India
Keywords
audio features; classification; deep learning; emotion recognition; optimization; signal processing; voice emotion; EMOTION; FEATURES;
DOI
10.1002/acs.3425
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recognition of human emotions is a basic requirement in many real-time applications, and detecting the exact emotion conveyed by the voice provides relevant information for a variety of purposes. Several computational methods have been employed to analyze human emotions, but most previous approaches suffer from drawbacks such as degraded signal quality, high storage requirements, increased computational complexity, and poor classification accuracy. The proposed work aims to classify embedded emotions accurately while minimizing computational complexity using a modified deep duck and traveler recurrent neural network (MDDTRNN). The method comprises four steps: preprocessing, feature extraction, feature selection, and classification. In feature extraction, spectral and frequency features are extracted with a boosted MFCC (Mel-frequency cepstral coefficients) method to improve training speed. In feature selection, the best features are selected with an adaptive African vulture optimization algorithm (AAVOA). Classification is then performed by the MDDTRNN to produce the final emotion labels. The proposed work outperforms existing approaches, attaining accuracy of 95.86%, precision of 93.79%, specificity of 94.28%, sensitivity of 92.89%, and an error rate of 5.266 on the IEMOCAP dataset, and accuracy of 96.27%, precision of 94.83%, specificity of 93.16%, sensitivity of 94%, and an error rate of 4.982 on the EMODB dataset.
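The abstract does not give implementation details for the feature-extraction step. As background only, a minimal NumPy sketch of standard MFCC extraction (framing, windowing, power spectrum, mel filterbank, log, DCT) is shown below; this is the textbook pipeline, not the paper's "boosted" MFCC variant, and all parameter values (16 kHz sample rate, 400-sample frames, 26 filters, 13 coefficients) are illustrative assumptions.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                      # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Frame -> Hamming window -> power spectrum -> mel energies -> log -> DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = np.log(spec @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II basis keeps the first n_ceps cepstral coefficients
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (np.arange(n_filters) + 0.5) / n_filters)
    return energies @ basis.T
```

With a 1-second signal at 16 kHz and a 160-sample hop, this yields a 98 x 13 matrix of cepstral coefficients, one row per frame.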
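The feature-selection step uses AAVOA, whose exact adaptive rules are not described in the abstract. A heavily simplified, illustrative sketch of metaheuristic binary feature selection is given below: a population of candidate feature masks moves toward the best mask found so far and is re-binarized through a sigmoid transfer function. The Fisher-style separability fitness, the decaying exploration weight, and the sparsity penalty are all assumptions for this sketch, not the paper's AAVOA.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Fisher-style two-class separability of the selected features
    (higher is better), minus a small penalty per selected feature."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    m0, m1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    v0, v1 = Xs[y == 0].var(0), Xs[y == 1].var(0)
    return np.mean((m0 - m1) ** 2 / (v0 + v1 + 1e-9)) - 0.01 * mask.sum()

def select_features(X, y, n_vultures=20, iters=50):
    """Population-based binary feature selection (simplified sketch)."""
    d = X.shape[1]
    pop = rng.integers(0, 2, size=(n_vultures, d)).astype(float)
    for t in range(iters):
        fits = np.array([fitness(p, X, y) for p in pop])
        best = pop[np.argmax(fits)].copy()
        w = 1.0 - t / iters                      # exploration decays over time
        for i in range(n_vultures):
            step = w * rng.normal(size=d)
            cand = best + step * (pop[i] - best)  # move toward the best mask
            prob = 1.0 / (1.0 + np.exp(-cand))    # sigmoid transfer function
            pop[i] = (rng.random(d) < prob).astype(float)
        pop[0] = best                             # elitism: keep the best mask
    fits = np.array([fitness(p, X, y) for p in pop])
    return pop[np.argmax(fits)].astype(bool)
```

On synthetic data where only the first few features discriminate the classes, the returned mask concentrates on those informative features; the selected subset would then feed the classifier.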
Pages: 1835-1859 (25 pages)