Emotional speaker identification using a novel capsule nets model

Cited by: 19

Authors:

Nassif, Ali Bou [1]

Shahin, Ismail [2]

Elnagar, Ashraf [3]

Velayudhan, Divya [1]

Alhudhaif, Adi [4]

Polat, Kemal [5]

Affiliations:

[1] Univ Sharjah, Dept Comp Engn, Sharjah 27272, U Arab Emirates

[2] Univ Sharjah, Dept Elect Engn, Sharjah 27272, U Arab Emirates

[3] Univ Sharjah, Dept Comp Sci, Sharjah 27272, U Arab Emirates

[4] Prince Sattam bin Abdulaziz Univ, Coll Comp Engn & Sci Al Kharj, Dept Comp Sci, POB 151, Al Kharj 11942, Saudi Arabia

[5] Abant Izzet Baysal Univ, Fac Engn, Dept Elect & Elect Engn, Bolu, Turkey

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2022 / Vol. 193

Keywords:

Capsule network; Convolutional Neural Network; Emotional speech; Speaker Identification; RECOGNITION; COMPUTER; SYSTEM;

DOI:

10.1016/j.eswa.2021.116469

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high variability of speech signals makes this a challenging task. Emotional variation is particularly difficult to handle because emotions alter a person's voice characteristics, so the acoustic features differ from those used to train models in a neutral environment. As a result, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although convolutional neural networks (CNNs) have brought considerable advances in speaker identification, they cannot exploit the spatial association between low-level features. Capsule networks (CapsNets) are a recently introduced deep learning architecture designed to overcome this inadequacy of CNNs, whose pooling technique fails to preserve the pose relationships between low-level features. Inspired by CapsNets, this study investigates their performance in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated on three distinct speech databases: the Emirati Speech Database, the SUSAS dataset, and the open-access RAVDESS. The proposed model is also compared with baseline systems. Experimental results demonstrate that the proposed CapsNet model trains faster and outperforms current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance was also studied by varying the number of routing iterations, both with and without a decoder network.
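The abstract refers to a routing algorithm whose number of iterations was varied. As a reference point, the following minimal sketch implements the dynamic routing-by-agreement procedure from the original CapsNet paper (Sabour, Frosst and Hinton, 2017) in pure Python. It is not the authors' code. The abstract does not specify their routing variant, capsule dimensions, or layer sizes. The function names (`squash`, `dynamic_routing`, `capsule_lengths`) and the toy inputs are hypothetical. Treating each output capsule as one enrolled speaker and choosing the longest capsule follows the usual CapsNet classification convention, and its use here is an assumption.

```python
import math


def squash(v):
    """CapsNet squash nonlinearity: keeps the direction of v and maps its
    length |v| to |v|^2 / (1 + |v|^2), so output lengths lie in [0, 1)."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        return [0.0 for _ in v]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement as in Sabour et al. (2017); assumed, not the paper's exact variant.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j. Returns the output vectors v_j of the upper capsules.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits b_ij start at zero, i.e. uniform coupling.
    b = [[0.0] * n_upper for _ in range(n_lower)]
    v = []
    for it in range(num_iterations):
        # Coupling coefficients: softmax of b_i over the upper capsules.
        c = [softmax(b_i) for b_i in b]
        # s_j = sum_i c_ij * u_hat_j|i, then v_j = squash(s_j).
        v = []
        for j in range(n_upper):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update b_ij += u_hat_j|i . v_j (skipped after the last
        # pass, where it would no longer affect the output).
        if it < num_iterations - 1:
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


def capsule_lengths(v):
    """Length of each output capsule, read here as a speaker score."""
    return [math.sqrt(sum(x * x for x in v_j)) for v_j in v]


if __name__ == "__main__":
    # Hypothetical toy predictions: 3 lower capsules, 2 upper capsules
    # standing in for 2 enrolled speakers, 2-D capsule vectors.
    u_hat = [
        [[1.0, 0.0], [0.0, 0.1]],
        [[0.9, 0.1], [0.1, 0.0]],
        [[1.1, -0.1], [-0.1, 0.1]],
    ]
    for r in (1, 3):
        lengths = capsule_lengths(dynamic_routing(u_hat, num_iterations=r))
        predicted = lengths.index(max(lengths))
        print(r, [round(x, 3) for x in lengths], "-> speaker", predicted)
```

With more iterations, the coupling coefficients shift toward the upper capsule whose output agrees with the lower-level predictions. In this toy input, capsule 0 grows longer at three iterations than at one. The decoder network mentioned in the abstract, which serves as a reconstruction branch in the original CapsNet, is not modeled in this sketch.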


Pages: 11
