Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding;
DOI
10.3390/app10217522
CLC classification number
O6 [Chemistry];
Subject classification code
0703 ;
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet has demonstrated superior results in this area. SincNet optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.
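The abstract's point that softmax loss only increases interclass distance can be illustrated with a minimal sketch of one common margin-based alternative, additive-margin softmax. This is an assumption for illustration only: the abstract does not state which margin formulation the proposed models use, and the function names, the `scale` and `margin` values, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(embedding, class_weights):
    """Cosine similarity between one embedding and each class weight vector."""
    e = l2_normalize(embedding)
    return [sum(a * b for a, b in zip(e, l2_normalize(w))) for w in class_weights]


def margin_softmax_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """Cross-entropy over scaled cosines, with a margin subtracted from the
    target-class cosine (additive-margin softmax; illustrative values only).
    Setting margin=0.0 gives a normalized softmax loss without a margin."""
    cosines = cosine_logits(embedding, class_weights)
    logits = [scale * (c - margin if i == target else c)
              for i, c in enumerate(cosines)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Toy example: a 2-D embedding and two hypothetical speaker classes.
embedding = [1.0, 0.2]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = margin_softmax_loss(embedding, class_weights, target=0, margin=0.0)
loss_margin = margin_softmax_loss(embedding, class_weights, target=0, margin=0.35)
```

In this sketch the same embedding incurs a larger loss once the margin is applied, so training has to pull it closer to its own class direction. That is the sense in which a margin term encourages intraclass compactness in addition to interclass separation.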
Pages: 1 / 17
Page count: 17