Empirical Comparison between Deep and Classical Classifiers for Speaker Verification in Emotional Talking Environments

Cited by: 2
Authors
Nassif, Ali Bou [1]
Shahin, Ismail [2]
Lataifeh, Mohammed [3]
Elnagar, Ashraf [3]
Nemmour, Nawel [1]
Affiliations
[1] Univ Sharjah, Comp Engn Dept, Sharjah 27272, U Arab Emirates
[2] Univ Sharjah, Elect Engn Dept, Sharjah 27272, U Arab Emirates
[3] Univ Sharjah, Comp Sci Dept, Sharjah 27272, U Arab Emirates
Keywords
classical classifiers; deep neural network; emotional speech; feature extraction; speaker verification;
DOI
10.3390/info13100456
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Speech signals carry various kinds of speaker-related information, such as age, gender, accent, language, health, and emotion. Emotions are conveyed through modulations of facial and vocal expressions. This paper presents an empirical comparison between classical classifiers, namely the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN), and deep learning classifiers, namely Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU) networks, as well as the i-vector approach, for a text-independent speaker verification task in neutral and emotional talking environments. The deep models undergo hyperparameter tuning using Grid Search optimization. The models are trained and tested on a private Arabic Emirati speech database, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the public Crowd-Sourced Emotional Multimodal Actors (CREMA) database. Performance is evaluated using the Equal Error Rate (EER) and the Area Under the Curve (AUC). Experimental results show that deep architectures do not necessarily outperform classical classifiers: among the classical classifiers, the GMM yields the lowest EER and the highest AUC across all datasets, and the i-vector model surpasses all of the tuned deep models (CNN, LSTM, and GRU) on both metrics in neutral as well as emotional speech. Moreover, the GMM outperforms the i-vector model on the Emirati and RAVDESS databases.
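The evaluation rests on EER and AUC computed over speaker verification trials. The following is a minimal illustrative sketch, not taken from the paper, of how these two metrics can be obtained from genuine and impostor trial scores; it assumes NumPy and scikit-learn, and the labels and scores below are synthetic placeholder values.

```python
# Hedged sketch: EER and AUC from verification scores (placeholder data).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# 1 = genuine (target) trial, 0 = impostor trial; scores are hypothetical
# classifier outputs, higher meaning "same speaker".
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([0.91, 0.74, 0.62, 0.35, 0.48, 0.12, 0.58, 0.41])

# ROC curve: false acceptance rate (fpr) vs. true positive rate (tpr).
fpr, tpr, thresholds = roc_curve(labels, scores)
fnr = 1 - tpr  # false rejection rate

# EER is the operating point where false acceptance equals false rejection;
# here it is approximated at the threshold where the two rates are closest.
eer_index = np.nanargmin(np.abs(fpr - fnr))
eer = (fpr[eer_index] + fnr[eer_index]) / 2

auc = roc_auc_score(labels, scores)
print(f"EER = {eer:.3f}, AUC = {auc:.3f}")
```

Lower EER and higher AUC indicate better verification performance, which is the sense in which the abstract compares the classifiers.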
Pages: 23