Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1 ]
Zunair, Hasib [2 ]
Mohammed, Nabeel [1 ]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
DOI
10.3390/app10217522
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. SincNet optimizes a softmax loss, which is integrated into the classification layer responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To address this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
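The contrast the abstract draws between a plain softmax loss and a margin-based loss can be illustrated with a minimal sketch. The abstract does not give the loss formula, so the additive angular margin form used here, the function names, and the `margin` and `scale` values are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def angular_margin_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """Cosine logits with an additive angular margin on the target class.

    Assumed formulation (not taken from the abstract): the angle between an
    embedding and its own class weight is increased by `margin` before the
    cosine is taken, so the target class is harder to satisfy.
    """
    # L2-normalise embeddings and class weights so their dot product is a cosine.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)            # shape: (batch, classes)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    # Penalise only the target-class angle; cap at pi to keep cos monotonic
    # (a simplification made for this sketch).
    theta[rows, labels] = np.minimum(theta[rows, labels] + margin, np.pi)
    return scale * np.cos(theta)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy data: 4 utterance embeddings of dimension 8, 3 hypothetical speakers.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
cls_w = rng.normal(size=(3, 8))
y = np.array([0, 2, 1, 0])

plain_loss = cross_entropy(angular_margin_logits(emb, cls_w, y, margin=0.0), y)
margin_loss = cross_entropy(angular_margin_logits(emb, cls_w, y, margin=0.5), y)
```

With `margin=0.0` the logits reduce to scaled cosine similarities, which is a normalised softmax. A positive margin lowers only the target-class logit, so the same embeddings incur a higher loss and training must pull each utterance closer to its own speaker's weight vector, which is how such losses encourage intraclass compactness in addition to interclass separation.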
Page numbers: 1 / 17
Page count: 17
Related papers
50 in total
  • [1] Latent discriminative representation learning for speaker recognition
    Huang, Duolin
    Mao, Qirong
    Ma, Zhongchen
    Zheng, Zhishen
    Routryar, Sidheswar
    Ocquaye, Elias-Nii-Noi
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2021, 22 (05) : 697 - 708
  • [2] Angular Margin Centroid Loss for Text-independent Speaker Recognition
    Wei, Yuheng
    Du, Junzhao
    Liu, Hui
    INTERSPEECH 2020, 2020, : 3820 - 3824
  • [3] Multi-Noise Representation Learning for Robust Speaker Recognition
    Cho, Sunyoung
    Wee, Kyungchul
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 681 - 685
  • [4] Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space
    Chowdhury, Labib
    Kamal, Mustafa
    Hasan, Najia
    Mohammed, Nabeel
    PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE OF THE BIOMETRICS SPECIAL INTEREST GROUP (BIOSIG 2021), 2021, 315
  • [5] Speaker recognition based on deep learning: An overview
    Bai, Zhongxin
    Zhang, Xiao-Lei
    NEURAL NETWORKS, 2021, 140 : 65 - 99
  • [6] A robust feature based on sparse representation for speaker recognition
    Xie, Yining
    Huang, Jinjie
    Wang, Xinlei
    Journal of Computational Information Systems, 2013, 9 (09) : 3553 - 3561
  • [7] A deep learning approach for speaker recognition
    Hourri, Soufiane
    Kharroubi, Jamal
    International Journal of Speech Technology, 2020, 23 : 123 - 131
  • [8] A deep learning approach for speaker recognition
    Hourri, Soufiane
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (01) : 123 - 131
  • [9] Disentangled Representation Learning for Multilingual Speaker Recognition
    Nam, Kihyun
    Kim, Youkyum
    Huh, Jaesung
    Heo, Hee-Soo
    Jung, Jee-weon
    Chung, Joon Son
    INTERSPEECH 2023, 2023, : 5316 - 5320
  • [10] Max-Margin Metric Learning for Speaker Recognition
    Li, Laitian
    Wang, Dong
    Xing, Chao
    Zheng, Thomas Fang
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,