Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Times Cited: 0
Authors
Gondohanindijo, Jutono [1 ]
Muljono [1 ]
Noersasongko, Edi [1 ]
Pujiono [1 ]
Setiadi, De Rosal Moses [1 ]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
Keywords
Deep learning; multi-features extraction; RAVDESS; speech emotion recognition; classification
DOI
10.14569/IJACSA.2023.0140623
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
The growing demand for human-computer interaction has made the interaction process more advanced, one example being the use of voice recognition. A voice command system also needs to account for the user's emotional state, because users tend to treat computers much as they treat other people. If the computer can recognize a person's emotion, it can adjust the feedback it gives, making the human-computer interaction (HCI) process more humane. Previous research shows that improving the accuracy of recognizing human emotions remains a challenge, because not all emotions are expressed in the same way, particularly across differences in language and cultural accent. This study proposes recognizing emotion types from speech using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database, and features are extracted using MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz. PCA (Principal Component Analysis) and Min-Max Normalization are then applied to measure the impact of these pre-processing techniques. The pre-processed data are fed to a Deep Neural Network (DNN) model that identifies eight emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model testing uses the confusion matrix to measure the performance of the proposed method. The DNN model achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. Using multiple features improves the model's accuracy in determining emotion type on the RAVDESS dataset. In addition, applying PCA strengthens the pattern correlation between features, so the classifier shows improved performance, especially in accuracy, specificity, and sensitivity.
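The abstract describes the full pipeline (multi-feature extraction, Min-Max normalization, PCA, and a DNN classifier) but gives no implementation details. The sketch below shows one plausible way to assemble such a pipeline with librosa, scikit-learn, and Keras; these libraries, the feature sizes (e.g., 40 MFCCs), the mean pooling over frames, the reading of "Contrast" as spectral contrast, the PCA variance threshold, and the layer widths are assumptions for illustration, not settings taken from the paper.

import numpy as np
import librosa
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from tensorflow.keras import layers, models

# The eight RAVDESS emotion classes named in the abstract.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def extract_features(path):
    """Mean-pool MFCC, Chroma, Mel-Spectrogram, Spectral Contrast and
    Tonnetz over time and concatenate them into one fixed-length vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)        # 40
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)          # 12
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)          # 128
    contrast = np.mean(librosa.feature.spectral_contrast(y=y, sr=sr), axis=1)  # 7
    tonnetz = np.mean(librosa.feature.tonnetz(
        y=librosa.effects.harmonic(y), sr=sr), axis=1)                         # 6
    return np.concatenate([mfcc, chroma, mel, contrast, tonnetz])              # 193 values

def build_dnn(input_dim, n_classes=len(EMOTIONS)):
    """Fully connected classifier; the layer widths are illustrative only."""
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage: `files` and `labels` would come from the RAVDESS corpus.
# X = np.stack([extract_features(f) for f in files])
# X = MinMaxScaler().fit_transform(X)           # Min-Max normalization
# X = PCA(n_components=0.95).fit_transform(X)   # keep 95% of variance (assumed)
# model = build_dnn(X.shape[1])
# model.fit(X, np.array(labels), epochs=100, batch_size=32, validation_split=0.2)

With mean pooling, the concatenated vector has 40 + 12 + 128 + 7 + 6 = 193 values per clip; in practice the scaler and PCA would be fit on the training split only, and the reported accuracy, sensitivity, and specificity would then be read off the confusion matrix of the held-out predictions.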
Pages: 198-206
Number of Pages: 9