Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
DOI
10.3390/app10217522
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, approaches based on deep convolutional networks, such as SincNet, are used as an alternative to i-vectors. SincNet, which performs convolution with parameterized sinc functions, has demonstrated superior results in this area. It optimizes softmax loss, which is integrated in the classification layer responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this issue, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, are potential successors to SincNet. They are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own unique challenges, and show performance improvements over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
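The contrast the abstract draws between plain softmax loss and margin-based losses can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic additive angular margin formulation, and the function names, the margin of 0.5, and the scale of 30.0 are hypothetical choices made only for illustration.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` for the class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def angular_margin_logits(cosines, target, margin=0.5, scale=30.0):
    """Scaled logits with an additive angular margin on the target class.

    `cosines[j]` is assumed to be the cosine similarity between a
    normalized speaker embedding and the normalized weight vector of
    class j. Only the target class is penalized: its angle theta is
    replaced by theta + margin before taking the cosine again.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            theta = math.acos(c)
            # Clamp so the shifted angle stays inside [0, pi].
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)
    return logits


# Hypothetical cosine similarities of one embedding to three speaker classes.
cosines = [0.8, 0.1, -0.3]
target = 0

plain_loss = softmax_cross_entropy([30.0 * c for c in cosines], target)
margin_loss = softmax_cross_entropy(
    angular_margin_logits(cosines, target), target
)
print(plain_loss, margin_loss)
```

With the margin applied, the same embedding incurs a larger loss, so training has to pull it closer in angle to its own class weight vector before the loss becomes small. In this sketch, that is how a margin term encourages compact classes in addition to separating them.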
Pages: 1 / 17
Page count: 17