Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
DOI
10.3390/app10217522
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. This system optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Since the nature of this loss is only to increase interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models AF-SincNet, Ensemble-SincNet, and ALL-SincNet serve as potential successors to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.
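The contrast the abstract draws between plain softmax loss and a margin-based loss can be illustrated with a minimal sketch. It assumes a generic additive angular margin formulation (ArcFace-style), which may differ from the exact losses used in the paper. The function names, the `scale` and `margin` defaults, and the toy vectors are hypothetical choices made for illustration only.

```python
import math


def softmax_cross_entropy(logits, target):
    """Plain softmax cross-entropy for a single sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_loss(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Additive angular margin loss for a single sample (illustrative sketch).

    Logits are scaled cosines between the embedding and each class weight.
    For the target class only, the angle is enlarged by `margin`, so the
    embedding must sit closer to its class weight to reach the same loss.
    `scale` and `margin` are assumed values, not taken from the paper.
    """
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = max(-1.0, min(1.0, cosine(embedding, w)))
        if j == target:
            theta = math.acos(cos_theta)
            # Clip at pi so the penalized cosine stays monotonic in theta.
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    return softmax_cross_entropy(logits, target)


# Toy example: a 2-D speaker embedding and two class weight vectors.
embedding = [1.0, 0.2]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
no_margin = angular_margin_loss(embedding, class_weights, target=0, margin=0.0)
with_margin = angular_margin_loss(embedding, class_weights, target=0, margin=0.5)
print(no_margin, with_margin)
```

With `margin=0.0` the function reduces to a scaled cosine softmax. A positive margin raises the loss for the same embedding, which pushes training toward tighter clusters within each speaker class in addition to separation between classes.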
Pages: 1 / 17
Page count: 17
Related papers
50 in total
  • [21] CENTROID-BASED DEEP METRIC LEARNING FOR SPEAKER RECOGNITION
    Wang, Jixuan
    Wang, Kuan-Chieh
    Law, Marc T.
    Rudzicz, Frank
    Brudno, Michael
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3652 - 3656
  • [22] A DISCRIMINATIVE UNSUPERVISED METHOD FOR SPEAKER RECOGNITION USING DEEP LEARNING
    Saleem, Muhammad Muneeb
    Hansen, John H. L.
    2016 IEEE 26TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2016,
  • [23] Robust Hearing-Impaired Speaker Recognition from Speech using Deep Learning Networks in Native Language
    Chelliah, Jeyalakshmi
    Benny, KiranBala
    Arunachalam, Revathi
    Balasubramanian, Viswanathan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2023, 20 (01) : 102 - 112
  • [24] Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
    Xiang, Xu
    Wang, Shuai
    Huang, Houjun
    Qian, Yanmin
    Yu, Kai
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1652 - 1656
  • [25] AN ITERATIVE FRAMEWORK FOR SELF-SUPERVISED DEEP SPEAKER REPRESENTATION LEARNING
    Cai, Danwei
    Wang, Weiqing
    Li, Ming
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6728 - 6732
  • [26] Barlow Twins self-supervised learning for robust speaker recognition
    Mohammadamini, Mohammad
    Matrouf, Driss
    Bonastre, Jean-Francois
    Dowerah, Sandipana
    Serizel, Romain
    Jouvet, Denis
    INTERSPEECH 2022, 2022, : 4033 - 4037
  • [27] Automatic Speaker Recognition using Transfer Learning Approach of Deep Learning Models
    Ganvir, Sonal
    Lal, Nidhi
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2021), 2021, : 595 - 601
  • [28] Latent discriminative representation learning for speaker recognition
    Duolin Huang
    Qirong Mao
    Zhongchen Ma
    Zhishen Zheng
    Sidheswar Routryar
    Elias-Nii-Noi Ocquaye
    Frontiers of Information Technology & Electronic Engineering, 2021, 22 : 697 - 708
  • [29] A deep learning approach to integrate convolutional neural networks in speaker recognition
    Hourri, Soufiane
    Nikolov, Nikola S.
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 615 - 623
  • [30] A deep learning approach to integrate convolutional neural networks in speaker recognition
    Soufiane Hourri
    Nikola S. Nikolov
    Jamal Kharroubi
    International Journal of Speech Technology, 2020, 23 : 615 - 623