A Novel Self-supervised Representation Learning Model for an Open-Set Speaker Recognition

Cited by: 0
Authors
Ohi, Abu Quwsar [1]
Gavrilova, Marina L. [1]
Affiliation
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Source
COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2023 | 2023 / Vol. 14164
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Representation Learning; Self-supervised Learning; Deep Neural Network; Open-set Speaker Recognition; Behavioral Biometric;
DOI
10.1007/978-3-031-42823-4_20
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Speaker recognition is an important problem in the behavioral biometric domain. Supervised speaker recognition systems have evolved rapidly since the development of deep learning (DL) architectures. Despite advancements in supervised speaker recognition, open-set speaker clustering remains a challenging problem. This paper proposes a novel self-supervised representation learning architecture that leverages a bi-modal architecture based on CNN and MLP sub-networks. A novel combination of angular prototypical loss and cosine similarity loss ensures an excellent clustering parity, while data augmentation results in a better generalization of the model. The experimental results convincingly demonstrate that the proposed architecture outperforms state-of-the-art speaker verification methods on the VoxCeleb1 and LibriSpeech datasets.
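The abstract names a combination of angular prototypical loss and cosine similarity loss but does not give the formulation. The following is a minimal NumPy sketch of how two such terms might be combined, not the authors' implementation. It assumes that row i of `queries` and row i of `prototypes` are embeddings of two augmented views of the same utterance. The function names, the fixed `scale` and `bias` (learnable parameters in the usual angular prototypical formulation), and the weighting factor `alpha` are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def angular_prototypical_loss(queries, prototypes, scale=10.0, bias=-5.0):
    """Cross-entropy over scaled cosine similarities.

    queries, prototypes: arrays of shape (N, D). Row i of each is assumed
    to come from the same speaker, so the target class for query i is i.
    scale and bias are fixed here for illustration only.
    """
    cos = l2_normalize(queries) @ l2_normalize(prototypes).T  # (N, N)
    logits = scale * cos + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def cosine_similarity_loss(a, b):
    """Mean of (1 - cosine similarity) over matched rows of a and b."""
    cos = np.sum(l2_normalize(a) * l2_normalize(b), axis=1)
    return float(np.mean(1.0 - cos))


def combined_loss(queries, prototypes, alpha=1.0):
    """Hypothetical weighted sum of the two terms; alpha is an assumption."""
    return (angular_prototypical_loss(queries, prototypes)
            + alpha * cosine_similarity_loss(queries, prototypes))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.05 * rng.normal(size=(8, 16))  # stand-in for augmentation
    print(combined_loss(view_a, view_b))
```

In this sketch the prototypical term pushes each embedding toward its own pair and away from the other items in the batch, while the cosine term only pulls matched pairs together. How the paper actually weights or schedules the two terms is not stated in the abstract.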
Pages: 270-282
Page count: 13