Emotional speaker identification using a novel capsule nets model

Cited by: 19
Authors
Nassif, Ali Bou [1]
Shahin, Ismail [2]
Elnagar, Ashraf [3]
Velayudhan, Divya [1]
Alhudhaif, Adi [4]
Polat, Kemal [5]
Affiliations
[1] Univ Sharjah, Dept Comp Engn, Sharjah 27272, U Arab Emirates
[2] Univ Sharjah, Dept Elect Engn, Sharjah 27272, U Arab Emirates
[3] Univ Sharjah, Dept Comp Sci, Sharjah 27272, U Arab Emirates
[4] Prince Sattam bin Abdulaziz Univ, Coll Comp Engn & Sci Al Kharj, Dept Comp Sci, POB 151, Al Kharj 11942, Saudi Arabia
[5] Abant Izzet Baysal Univ, Fac Engn, Dept Elect & Elect Engn, Bolu, Turkey
Keywords
Capsule network; Convolutional Neural Network; Emotional speech; Speaker Identification; RECOGNITION; COMPUTER; SYSTEM;
DOI
10.1016/j.eswa.2021.116469
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment. Therefore, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although considerable advancements in speaker identification have been made using convolutional neural networks (CNNs), CNNs cannot exploit the spatial association between low-level features. Capsule networks (CapsNets) are deep learning models recently introduced to overcome the inadequacy of CNNs, caused by their pooling technique, in preserving the pose relationship between low-level features. Inspired by them, this study investigates the performance of CapsNets in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated using three distinct speech databases, i.e., the Emirati Speech Database, the SUSAS Dataset, and RAVDESS (open-access). The proposed model is also compared to baseline systems. Experimental results demonstrate that the proposed CapsNet model trains faster and provides better results than current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance was also studied by varying the number of iterations, both with and without a decoder network.
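The abstract does not say which routing procedure was used. As a rough illustration of what "varying the number of iterations" can mean, the sketch below assumes the routing-by-agreement scheme commonly used with capsule networks. It is not the authors' implementation: the names `squash`, `dynamic_routing`, `u_hat`, and `num_iterations`, and the toy input, are all hypothetical.

```python
import math


def squash(s):
    """Scale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement (assumed scheme, see lead-in).

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j. Returns (v, c): the upper-capsule output vectors and
    the coupling coefficients used in the final iteration.
    """
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(num_iterations):
        # Each lower capsule splits its vote across the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for upper capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Toy input: two lower capsules agree on upper capsule 0 and cancel out
# on upper capsule 1, so routing should favor capsule 0.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]

if __name__ == "__main__":
    for iterations in (1, 2, 3):
        v, c = dynamic_routing(u_hat, num_iterations=iterations)
        print(iterations, [round(x, 3) for x in c[0]])
```

With a single iteration the coupling coefficients stay uniform. Further iterations shift weight toward the upper capsule on which the lower capsules agree, which is one reason the iteration count can affect identification performance.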
Pages: 11