Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8

Authors:

Chowdhury, Labib [1]

Zunair, Hasib [2]

Mohammed, Nabeel [1]

Affiliations:

[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh

[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada

Source:

APPLIED SCIENCES-BASEL | 2020 / Volume 10 / Issue 21

Keywords:

speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding

DOI:

10.3390/app10217522

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. This system optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Since the nature of this loss is only to increase interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome the aforementioned issues, this study proposes a family of models that improve upon the state-of-the-art SincNet model. Proposed models AF-SincNet, Ensemble-SincNet, and ALL-SincNet serve as potential successors to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, with their own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.


Pages: 1 / 17

Page count: 17
