Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Times cited: 8

Authors:

Chowdhury, Labib [1]

Zunair, Hasib [2]

Mohammed, Nabeel [1]

Affiliations:

[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh

[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada

Source:

APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21

Keywords:

speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding

DOI:

10.3390/app10217522

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline code:

0703

Abstract:

Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. Deep-convolutional-network-based approaches such as SincNet are used for speaker identification as an alternative to i-vectors. SincNet, whose first convolutional layer is built from parameterized sinc functions acting as band-pass filters, has demonstrated superior results in this area. It is trained by optimizing the softmax loss in its classification layer, which is responsible for making predictions. Because this loss only encourages separation between classes and does not explicitly make embeddings of the same speaker compact, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this issue, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to SincNet. They are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges, and show performance improvements over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
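The two ingredients the abstract contrasts, SincNet's sinc-based front end and a margin-based replacement for plain softmax, can be made concrete in a short pure-Python sketch. The kernel follows the published SincNet band-pass form g[n] = 2 f2 sinc(2π f2 n) − 2 f1 sinc(2π f1 n) with a Hamming window; the loss is an additive angular margin (ArcFace-style) softmax, used here only as one representative angular margin loss. This is not the authors' AF-SincNet or ALL-SincNet code: the function names, the 251-sample kernel length, the 16 kHz sampling rate, the scale s = 30 and the margin m = 0.2 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass_kernel(f1_hz: float, f2_hz: float,
                         length: int = 251, fs: int = 16000) -> List[float]:
    """Hamming-windowed band-pass kernel in the SincNet form
    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n).
    In SincNet f1 and f2 are learned per filter; here they are fixed inputs (assumption)."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel is centred on n = 0")
    if not 0.0 <= f1_hz < f2_hz <= fs / 2:
        raise ValueError("need 0 <= f1 < f2 <= fs / 2")
    f1, f2 = f1_hz / fs, f2_hz / fs  # cutoffs in cycles per sample
    half = (length - 1) // 2
    kernel = []
    for i, n in enumerate(range(-half, half + 1)):
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))  # Hamming
        kernel.append(g * window)
    return kernel


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def additive_angular_margin_loss(embedding: Sequence[float],
                                 class_weights: Sequence[Sequence[float]],
                                 target: int, s: float = 30.0,
                                 m: float = 0.2) -> float:
    """ArcFace-style loss: -log softmax over s*cos(theta_j), where the target
    angle is widened to theta_y + m. With m = 0 this is a plain cosine softmax."""
    e = _unit(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(e, _unit(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if j == target:
            # Add the margin to the target angle, making the target logit smaller.
            # Full ArcFace implementations also handle theta + m > pi; omitted here.
            cos_t = math.cos(math.acos(cos_t) + m)
        logits.append(s * cos_t)
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


if __name__ == "__main__":
    kernel = sinc_bandpass_kernel(300.0, 3000.0)
    print(len(kernel))  # 251
    emb, weights = [1.0, 0.8], [[1.0, 0.0], [0.0, 1.0]]
    plain = additive_angular_margin_loss(emb, weights, 0, m=0.0)
    margin = additive_angular_margin_loss(emb, weights, 0, m=0.2)
    print(plain < margin)  # True
```

The embedding in the usage example is already classified correctly, so plain softmax assigns it a small loss; the margin version still penalizes it because its angle to the true class is not small enough. That extra pressure toward compact same-speaker clusters is the property the abstract says plain softmax lacks.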

Pages: 1-17

Page count: 17

Related articles (50 in total):

[1] Latent discriminative representation learning for speaker recognition
Huang, Duolin
Mao, Qirong
Ma, Zhongchen
Zheng, Zhishen
Routray, Sidheswar
Ocquaye, Elias-Nii-Noi
FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2021, 22 (05) : 697 - 708
[2] Angular Margin Centroid Loss for Text-independent Speaker Recognition
Wei, Yuheng
Du, Junzhao
Liu, Hui
INTERSPEECH 2020, 2020 : 3820 - 3824
[3] Multi-Noise Representation Learning for Robust Speaker Recognition
Cho, Sunyoung
Wee, Kyungchul
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 681 - 685
[4] Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space
Chowdhury, Labib
Kamal, Mustafa
Hasan, Najia
Mohammed, Nabeel
PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE OF THE BIOMETRICS SPECIAL INTEREST GROUP (BIOSIG 2021), 2021, 315
[5] Speaker recognition based on deep learning: An overview
Bai, Zhongxin
Zhang, Xiao-Lei
NEURAL NETWORKS, 2021, 140 : 65 - 99
[6] A robust feature based on sparse representation for speaker recognition
Xie, Yining
Huang, Jinjie
Wang, Xinlei
JOURNAL OF COMPUTATIONAL INFORMATION SYSTEMS, 2013, 9 (09) : 3553 - 3561
[7] A deep learning approach for speaker recognition
Hourri, Soufiane
Kharroubi, Jamal
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (01) : 123 - 131
[8] Disentangled Representation Learning for Multilingual Speaker Recognition
Nam, Kihyun
Kim, Youkyum
Huh, Jaesung
Heo, Hee-Soo
Jung, Jee-weon
Chung, Joon Son
INTERSPEECH 2023, 2023 : 5316 - 5320
[9] Max-Margin Metric Learning for Speaker Recognition
Li, Laitian
Wang, Dong
Xing, Chao
Zheng, Thomas Fang
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016