Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition

Cited by: 17
Authors
Kim, Seong-Hu [1]
Park, Yong-Hwa [1]
Affiliation
[1] Korea Advanced Institute of Science and Technology (KAIST), School of Mechanical Engineering, Daejeon, South Korea
Source
INTERSPEECH 2021 | 2021
Keywords
speaker recognition; text-independent; adaptive convolutional neural network; frame-level speaker embedding
DOI
10.21437/Interspeech.2021-65
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
In text-independent speaker recognition, each utterance consists of different phonemes depending on the spoken text. Conventional neural networks for speaker recognition are static models whose kernels do not depend on the input, so they do not capture this phoneme-varying characteristic well. To address this limitation, we propose an adaptive convolutional neural network (ACNN) for text-independent speaker recognition. The utterance is divided along the time axis into short segments within which the phonemes fluctuate little. Frame-level features are extracted by applying input-dependent kernels adapted to each segment. Time average pooling and linear layers then extract utterance-level embeddings and perform speaker recognition. Adaptive VGG-M with 0.356-second segments outperforms the baseline models, achieving a Top-1 accuracy of 86.51% and an EER of 5.68%. Compared with the conventional method, it extracts more accurate frame-level embeddings for vowel and nasal phonemes, without overfitting or requiring a large number of parameters. This framework effectively exploits the phonemes and the text-varying characteristics of speech for text-independent speaker recognition.
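The pipeline outlined in the abstract (time-axis segmentation, per-segment input-dependent kernels, time average pooling, a linear layer) can be illustrated with a minimal NumPy sketch. It is not the authors' Adaptive VGG-M: it uses one convolutional layer with a single input channel, and the kernel generator is an assumption, since the abstract does not say how the adaptive kernels are produced. Here each segment's mean spectrum is mapped through a hypothetical matrix `w_gen` to per-channel scales applied to a static `base_kernel`. Averaging over frequency to obtain frame-level vectors is also assumed. All function names, shapes, and the segment length in frames are hypothetical.

```python
import numpy as np


def split_segments(x, seg_len):
    """Split a (freq, time) feature map into consecutive time segments of
    seg_len frames; a shorter final segment is kept as-is."""
    return [x[:, t:t + seg_len] for t in range(0, x.shape[1], seg_len)]


def kernel_generator(segment, w_gen, base_kernel):
    """HYPOTHETICAL kernel generator: the segment's mean spectrum is mapped by
    w_gen to per-output-channel scales that modulate a static base kernel."""
    descriptor = segment.mean(axis=1)             # (freq,)
    scales = np.tanh(w_gen @ descriptor) + 1.0    # (out_ch,), values in (0, 2)
    return base_kernel * scales[:, None, None]    # (out_ch, k_f, k_t)


def conv2d_same(x, kernels):
    """Zero-padded ('same') 2-D cross-correlation of a single-channel map x
    (freq, time) with each kernel in kernels (out_ch, k_f, k_t), odd sizes."""
    c, kf, kt = kernels.shape
    pf, pt = kf // 2, kt // 2
    xp = np.pad(x, ((pf, pf), (pt, pt)))
    f, t = x.shape
    out = np.zeros((c, f, t))
    for i in range(kf):
        for j in range(kt):
            out += kernels[:, i, j][:, None, None] * xp[i:i + f, j:j + t][None]
    return out


def acnn_forward(x, seg_len, base_kernel, w_gen, w_out):
    """Toy adaptive-CNN forward pass: per-segment kernels -> ReLU ->
    frame-level embeddings -> time average pooling -> linear classifier."""
    outputs = []
    for seg in split_segments(x, seg_len):
        kernels = kernel_generator(seg, w_gen, base_kernel)   # input-dependent
        outputs.append(np.maximum(conv2d_same(seg, kernels), 0.0))
    feature_map = np.concatenate(outputs, axis=2)   # (out_ch, freq, time)
    frame_emb = feature_map.mean(axis=1)            # (out_ch, time): one vector per frame (assumed)
    utt_emb = frame_emb.mean(axis=1)                # time average pooling -> (out_ch,)
    return utt_emb, w_out @ utt_emb                 # utterance embedding, speaker logits


# Usage with random stand-ins for a 40-band, 100-frame spectrogram.
rng = np.random.default_rng(0)
n_freq, n_frames, n_ch, n_spk = 40, 100, 8, 5
x = rng.standard_normal((n_freq, n_frames))
base_kernel = 0.1 * rng.standard_normal((n_ch, 3, 3))
w_gen = 0.1 * rng.standard_normal((n_ch, n_freq))
w_out = rng.standard_normal((n_spk, n_ch))
emb, logits = acnn_forward(x, 32, base_kernel, w_gen, w_out)
print(emb.shape, logits.shape)  # (8,) (5,)
```

Because each segment receives its own kernels, segments with different spectral content are filtered differently, whereas a static CNN would apply `base_kernel` everywhere. Padding each segment separately is a simplifying choice in this sketch; it ignores context across segment boundaries.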
Pages: 66-70
Page count: 5