Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation

Cited by: 5
Authors
Lin, Weiwei [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep speaker embedding; text-dependent speaker verification; meta-learning; model adaptation; MAML
DOI
10.1109/TASLP.2023.3275029
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
By constraining the lexical content of input speech, text-dependent speaker verification (TD-SV) offers more reliable performance than text-independent speaker verification (TI-SV) when dealing with short utterances. Because speech with constrained lexical content is harder to collect, TD models are often fine-tuned from a TI model using a small target-phrase dataset. However, the target-phrase dataset is sometimes too small for fine-tuning, which is the main obstacle to deploying TD-SV. One solution is to fine-tune the model using medium-size multi-phrase TD data and then deploy the model on the target phrase. Although this strategy does help in some cases, the performance is still sub-optimal because the model is not optimized for the target phrase. Inspired by recent progress in meta-learning, we propose a three-stage pipeline for adapting a TI model to a TD model for the target phrase. First, a TI model is trained using a large amount of speech data. Then, we use a multi-phrase TD dataset to tune the TI model via model-agnostic meta-learning (MAML). Finally, we perform fast adaptation using a small target-phrase dataset. Results show that the three-stage pipeline consistently outperforms multi-phrase and target-phrase fine-tuning.
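The meta-learning and fast-adaptation stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the speaker-embedding network with a single scalar weight `w`, treats each "phrase" as a regression task with its own slope, and stands in for the pre-trained TI model with the initial value of `w`. All function names and hyperparameters here (`maml_train`, `adapt`, `inner_lr`, `outer_lr`) are hypothetical. Because the loss is quadratic, the MAML gradient through one inner step can be written out by hand.

```python
import random

def loss_and_grad(w, xs, slope):
    """MSE of the model y = w*x against the task y = slope*x, and its gradient in w."""
    m = sum(x * x for x in xs) / len(xs)  # mean of x^2 over the batch
    return m * (w - slope) ** 2, 2.0 * m * (w - slope)

def curvature(xs):
    """Second derivative of the loss in w (constant, since the loss is quadratic)."""
    return 2.0 * sum(x * x for x in xs) / len(xs)

def adapt(w, xs, slope, inner_lr, steps=1):
    """Fast adaptation: a few plain gradient steps on one task's support data."""
    for _ in range(steps):
        _, g = loss_and_grad(w, xs, slope)
        w -= inner_lr * g
    return w

def maml_train(w, task_slopes, inner_lr=0.1, outer_lr=0.05, iters=500, seed=0):
    """Meta-train the initial weight w over several tasks (toy stand-in for phrases)."""
    rng = random.Random(seed)
    for _ in range(iters):
        meta_grad = 0.0
        for slope in task_slopes:
            support = [rng.uniform(-1, 1) for _ in range(5)]
            query = [rng.uniform(-1, 1) for _ in range(5)]
            # Inner loop: one adaptation step on the support set.
            w_fast = adapt(w, support, slope, inner_lr)
            # Outer loop: query-loss gradient at the adapted weight, pushed back
            # through the inner step (d w_fast / d w = 1 - inner_lr * curvature).
            _, g_query = loss_and_grad(w_fast, query, slope)
            meta_grad += g_query * (1.0 - inner_lr * curvature(support))
        w -= outer_lr * meta_grad / len(task_slopes)
    return w

# Assumed toy setup: w = 0.0 plays the role of the pre-trained starting point,
# three training tasks play the role of the multi-phrase data.
w_meta = maml_train(0.0, [1.0, 2.0, 3.0])
# A handful of samples from an unseen task plays the role of the target phrase.
w_target = adapt(w_meta, [0.5, -0.8, 0.3, 0.9, -0.4], slope=2.5, inner_lr=0.1, steps=3)
print(w_meta, w_target)
```

In this sketch the meta-trained weight is an initialization chosen so that a few gradient steps on a small dataset from a new task already reduce that task's loss, which is the role the abstract assigns to the MAML stage before target-phrase adaptation.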
Pages: 1866-1876
Page count: 11