Speech Emotion Recognition Using Clustering Based GA-Optimized Feature Set

Cited by: 31
Authors
Kanwal, Sofia [1,2]
Asghar, Sohail [2]
Affiliations
[1] Univ Poonch Rawalakot, Dept CS & IT, Azad Kashmir 12350, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad Campus, Islamabad 45550, Pakistan
Keywords
Optimization; Genetic algorithms; Statistics; Sociology; Feature extraction; Emotion recognition; Biological cells; Clustering; feature engineering; feature optimization; genetic algorithm; OpenSMILE tool kit; speech emotions; support vector machine; STRESS RECOGNITION; GENETIC ALGORITHM; SPECTRAL FEATURES; DATABASES; ENTROPY; PSO;
DOI
10.1109/ACCESS.2021.3111659
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Speech Emotion Recognition (SER) is a hot topic in academia and industry, and feature engineering plays a pivotal role in building an efficient SER system. Although researchers have done a tremendous amount of work in this field, the issues of speech feature choice and the correct application of feature engineering remain to be solved in the domain of SER. In this research, a feature optimization approach based on a clustering-based genetic algorithm is proposed. Instead of selecting the new generation randomly, clustering is applied at the fitness evaluation level to detect outliers, which are then excluded from the next generation. The approach is compared with the standard genetic algorithm in the context of audio emotion recognition using the Berlin Emotional Speech Database (EMO-DB), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset. Results show that the proposed technique effectively improves emotion classification in speech. In speaker-dependent experiments, recognition rates of 89.6% for all speakers (male and female combined), 86.2% for male speakers, and 88.3% for female speakers are obtained on EMO-DB; 82.5% for all speakers, 75.4% for male speakers, and 91.1% for female speakers on RAVDESS; and 77.7% for all speakers on SAVEE. In speaker-independent experiments, recognition rates of 77.5% on EMO-DB, 76.2% on RAVDESS, and 69.8% on SAVEE are achieved. All experiments were performed in MATLAB, and a Support Vector Machine (SVM) was used for classification. The results confirm that the proposed method discriminates emotions effectively and outperforms the other approaches used for comparison in terms of the reported performance measures.
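The abstract describes the selection step only at a high level, so the following is a minimal, hypothetical Python sketch of the idea (the paper itself used MATLAB with an SVM classifier): candidate feature-subset masks are scored by cross-validated SVM accuracy, the fitness values are clustered, and individuals falling into the weakest cluster are treated as outliers and excluded before the next generation is bred. The choice of k-means with k = 2, the GA operators, and all hyperparameters below are illustrative assumptions, not the authors' exact implementation.

# Hypothetical sketch of a clustering-based GA for feature selection, assuming
# k-means over fitness scores is used to flag outlier individuals; this is an
# illustration, not the authors' exact implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Mean cross-validated SVM accuracy on the selected feature columns.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf", gamma="scale"), X[:, mask], y, cv=3).mean()

def exclude_outliers(population, scores, k=2):
    # Cluster the fitness scores and drop the members of the weakest cluster,
    # instead of discarding individuals purely at random.
    if len(population) <= k:
        return population, scores
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores.reshape(-1, 1))
    worst = min(range(k), key=lambda c: scores[labels == c].mean())
    keep = labels != worst
    if not keep.any():
        return population, scores
    return population[keep], scores[keep]

def next_generation(parents, scores, pop_size, mut_rate=0.02):
    # Fitness-proportional selection, uniform crossover, bit-flip mutation.
    probs = scores / scores.sum() if scores.sum() > 0 else None
    children = []
    while len(children) < pop_size:
        i, j = rng.choice(len(parents), size=2, p=probs)
        cross = rng.random(parents.shape[1]) < 0.5
        child = np.where(cross, parents[i], parents[j])
        child ^= rng.random(parents.shape[1]) < mut_rate
        children.append(child)
    return np.array(children, dtype=bool)

def clustering_ga(X, y, pop_size=20, generations=10):
    # Each individual is a boolean mask over the feature columns of X.
    pop = rng.random((pop_size, X.shape[1])) < 0.5
    best_mask, best_score = pop[0].copy(), -1.0
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        if scores.max() > best_score:
            best_score, best_mask = scores.max(), pop[scores.argmax()].copy()
        pop, scores = exclude_outliers(pop, scores)  # clustering-based exclusion
        pop = next_generation(pop, scores, pop_size)
    return best_mask, best_score

Calling clustering_ga(features, labels) on an acoustic feature matrix (for example, one extracted with the openSMILE toolkit mentioned in the keywords) would return the best feature mask found and its cross-validated accuracy. The clustering step is what distinguishes this loop from a standard GA, where survivors are chosen by fitness alone.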
Pages: 125830-125842
Page count: 13